ostrich's People

Contributors

bbeck, bivasdas, bvbuild, chriskramer, dump247, hguo0303, lpearson05, matthewbogner, mike-unitskyi, milosimpson, nbauernfeind, ohhatiya, olyhuta, philipflesher, robbytx, shawnsmith, tianx2

ostrich's Issues

Increase stability of TeamCity build

The TeamCity build randomly fails when no code has been changed (it's currently running hourly). We're seeing 1-2 random failures per day.

I believe this is an issue between Curator's TestingServer and Apache's ZooKeeper library. I've written a very simple test case that exposes this: https://gist.github.com/2891890. Given enough iterations the test eventually gets into an infinite loop. I believe this same problem is what is affecting our TeamCity build.

Support caching of Service instances

Currently in the ServicePool, every time we execute a callback we ask the ServiceFactory<S> to create a new instance of a service. Depending on the implementation of the ServiceFactory this could be an expensive operation (it may need to establish a new connection to the remote server, etc.). We should offer the ability to cache these instances so that they don't have to be recreated every time.

This functionality should probably be controlled by individual service providers, not by service consumers.
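
One way to do this without changing how the pool talks to the factory is a caching decorator that a service provider can wrap around their real factory. A minimal sketch, using simplified stand-ins for the Ostrich ServiceFactory and ServiceEndPoint interfaces (the real signatures may differ):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified stand-ins for the real Ostrich interfaces.
interface ServiceEndPoint {
    String getId();
}

interface ServiceFactory<S> {
    S create(ServiceEndPoint endPoint);
}

// Decorator that reuses one service instance per end point instead of creating a new
// one for every callback execution.
class CachingServiceFactory<S> implements ServiceFactory<S> {
    private final ServiceFactory<S> delegate;
    private final ConcurrentMap<String, S> cache = new ConcurrentHashMap<String, S>();

    CachingServiceFactory(ServiceFactory<S> delegate) {
        this.delegate = delegate;
    }

    @Override
    public S create(ServiceEndPoint endPoint) {
        S service = cache.get(endPoint.getId());
        if (service == null) {
            S created = delegate.create(endPoint);
            S existing = cache.putIfAbsent(endPoint.getId(), created);
            // Under a race the losing instance is simply discarded; fine for a sketch.
            service = (existing != null) ? existing : created;
        }
        return service;
    }
}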

Write documentation

We need several pieces of documentation:

  • A guide for service consumers
  • A guide for service providers
  • A guide for operators (describing how the system works, what connectivity is required, etc.)

Update documentation

Docs need to be updated for the new project dependencies (i.e., use of Curator directly instead of through ZooKeeperConnection).

Add a richer exception hierarchy

Right now ServiceException is thrown when most things go wrong. We should be more specific than that and have subclasses that represent different failures. The following cases are useful to users (a rough sketch of such a hierarchy follows the list):

  • Knowing when no hosts were available for the request (e.g. HostDiscovery reporting empty set)
  • Knowing when all retries were exhausted
  • Telling the ServicePool when the exception that happened should result in a retry
  • Telling the ServicePool (likely having it infer) when the exception that happened was a programming error.
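
A minimal sketch of what that hierarchy could look like; only ServiceException is an existing Ostrich name, and the subclasses, their names, and their constructors are invented here for discussion:

// Sketch only: the real ServiceException already exists in Ostrich and its base class
// and constructors may differ.
class ServiceException extends RuntimeException {
    ServiceException(String message) { super(message); }
    ServiceException(String message, Throwable cause) { super(message, cause); }
}

// HostDiscovery reported an empty set: there was nothing to even attempt.
class NoAvailableHostsException extends ServiceException {
    NoAvailableHostsException() { super("no hosts available"); }
}

// Every attempt permitted by the retry policy failed; carries the last failure.
class MaxRetriesException extends ServiceException {
    MaxRetriesException(Throwable lastFailure) { super("retries exhausted", lastFailure); }
}

// Thrown (or wrapped) by callbacks to tell the ServicePool the operation is retryable.
class RetryableServiceException extends ServiceException {
    RetryableServiceException(String message, Throwable cause) { super(message, cause); }
}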

Randomize the interval that health checks are polled on

We don't want to bombard a server that comes back up with a ton of health checks. It would be nice to space these out by waiting a random amount of time before the first check.

Maybe some type of backoff strategy should be employed.
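
A minimal sketch of a jittered exponential backoff schedule the health check poller could use; the class name and the base/max delays are illustrative, not taken from the Ostrich code base:

import java.util.Random;

// Sketch of a jittered exponential backoff schedule for polling an unhealthy end point.
class HealthCheckBackoff {
    private static final long BASE_DELAY_MILLIS = 1000;
    private static final long MAX_DELAY_MILLIS = 60000;
    private final Random random = new Random();

    // Delay before the attempt-th health check of an end point (attempt starts at 0).
    long nextDelayMillis(int attempt) {
        long ceiling = Math.min(MAX_DELAY_MILLIS, BASE_DELAY_MILLIS << Math.min(attempt, 16));
        // Full jitter: a uniform delay in [0, ceiling) keeps many clients from probing a
        // freshly recovered server at the same instant.
        return (long) (random.nextDouble() * ceiling);
    }
}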

Stability Testing - Havoc!!!!

  • Agent that simulates a rolling restart of ZK
  • Agent that always throws an exception
  • Service that registers and unregisters - system noise
  • Agent that never responds
  • Agent that creates an iptables network partition
  • Create DNS issues

Support partitioned services

It's highly likely that services will be partitioned (e.g. one node will only be able to service queries for a specific range of data). Ostrich needs to support people in authoring and consuming these types of services.

At the same time, a lot of services will only have a single partition. For these services, the complexity of using Ostrich shouldn't increase in a noticeable way.

At a high level this change will require the following (a rough consumer-facing sketch follows the list):

  • Updating the ServicePool to receive a partition key. This should probably be opaque from the perspective of Ostrich.
  • Updating the LoadBalanceAlgorithm to support receiving a partition key, so that the load balancer can choose a suitable service end point to use.
  • Maybe support letting the ServiceFactory know the partition key? I'm not sure this is useful.
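
A rough sketch of what the consumer-facing side could look like. Everything here (PartitionKey, the two-argument execute overload, the simplified ServiceCallback) is hypothetical and only meant to frame the discussion:

// Hypothetical API sketch; none of these types exist in Ostrich today.
interface PartitionKey {
    // Opaque to Ostrich; only the service's own LoadBalanceAlgorithm (and possibly its
    // ServiceFactory) knows how to interpret it.
    byte[] toBytes();
}

interface ServiceCallback<S, R> {
    R call(S service);
}

interface PartitionedServicePool<S> {
    // Unpartitioned execution, so single-partition services look exactly like today.
    <R> R execute(ServiceCallback<S, R> callback);

    // Partition-aware execution: the key is passed through to the load balancer so it can
    // pick an end point that owns the requested range of data.
    <R> R execute(PartitionKey key, ServiceCallback<S, R> callback);
}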

Make ServicePool creation simpler...

Currently when a user creates a ServicePool the code looks something like this:

ZooKeeperConnection connection = new ZooKeeperConfiguration()
            .setConnectString(connectString)
            .setRetryNTimes(new com.bazaarvoice.soa.zookeeper.RetryNTimes(3, 100))
            .connect();

ThreadFactory daemonThreadFactory = new ThreadFactoryBuilder()
            .setDaemon(true)
            .build();

ServicePool<CalculatorService> pool = new ServicePoolBuilder<CalculatorService>()
            .withHostDiscovery(new ZooKeeperHostDiscovery(connection, "calculator"))
            .withServiceFactory(new CalculatorServiceFactory())
            .withHealthCheckExecutor(Executors.newScheduledThreadPool(1, daemonThreadFactory))
            .build();

There are a few things that I consider wrong with this picture:

  1. When creating the ZooKeeperHostDiscovery instance, the user needs to know where in ZooKeeper the registration nodes are being stored (e.g. the "calculator" parameter). The CalculatorServiceFactory object actually has that knowledge inside of it, so we shouldn't bleed that information to the user of the service.
  2. The user is required to create a health check executor service without necessarily understanding why. Providing that should be completely optional for them.

I would like to see the above code be rewritten to something like:

ZooKeeperConnection connection = new ZooKeeperConfiguration()
            .setConnectString(connectString)
            .setRetryNTimes(new com.bazaarvoice.soa.zookeeper.RetryNTimes(3, 100))
            .connect();

ServicePool<CalculatorService> pool = new ServicePoolBuilder<CalculatorService>()
            .withZooKeeperHostDiscovery(connection)
            .withServiceFactory(new CalculatorServiceFactory())
            .build();

OnlyBadHostsExceptions should include underlying cause exceptions

Exceptions thrown during a service pool execute method can result in that endpoint being marked unhealthy. When no more endpoints are available an OnlyBadHostsException (OBHE) is thrown. It would be useful for debugging to include the underlying exceptions in the OBHE so that the root cause of the failing services can be determined.
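
One low-risk way to do this is to carry the per-end-point failures inside the exception and also attach them as suppressed exceptions so they show up in logged stack traces. A sketch, not the real OnlyBadHostsException class:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: the real OnlyBadHostsException lives in Ostrich and its constructors
// differ; the point is carrying the per-end-point failures that caused it.
class OnlyBadHostsException extends RuntimeException {
    private final List<Throwable> endPointFailures;

    OnlyBadHostsException(List<Throwable> endPointFailures) {
        super("all discovered end points are currently marked bad");
        this.endPointFailures = Collections.unmodifiableList(new ArrayList<Throwable>(endPointFailures));
        // Also attach the failures as suppressed exceptions (Java 7+) so they appear in
        // logged stack traces without any extra work by callers.
        for (Throwable failure : endPointFailures) {
            addSuppressed(failure);
        }
    }

    List<Throwable> getEndPointFailures() {
        return endPointFailures;
    }
}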

Integrate EmoDB Dropwizard helper classes.

EmoDB has a handful of classes that make it easier to use Ostrich and Dropwizard together. They are useful to other projects that, right now, pull these classes from EmoDB.

See the code here:

  • ConfiguredFixedHostDiscoverySource and ConfiguredPayload make it easier to configure fixed end points in YAML config files. The interface is a bit awkward, though, because you must create a service-specific subclass (example).
  • Payload and PayloadBuilder remove some of the tedious work required to create and parse ServiceEndPoint payloads.
  • ManagedRegistration ties host discovery registration and unregistration to Dropwizard lifecycle events (a rough sketch of the idea follows this list).
  • ResourceRegistry uses the ServiceName annotation and Jersey Path annotation to build ServiceEndPoint objects and register a resource with both Jersey and host discovery.

I don't expect you to take these classes as-is. Pick and choose and refactor as you see fit.
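
For reference, the ManagedRegistration idea boils down to something like the sketch below. The Managed, ServiceRegistry, and ServiceEndPoint types are simplified stand-ins, not the real Dropwizard, Ostrich, or EmoDB signatures:

// Simplified stand-ins for the real Dropwizard and Ostrich types.
interface Managed {
    void start() throws Exception;
    void stop() throws Exception;
}

interface ServiceEndPoint {
}

interface ServiceRegistry {
    void register(ServiceEndPoint endPoint);
    void unregister(ServiceEndPoint endPoint);
}

class ManagedRegistration implements Managed {
    private final ServiceRegistry registry;
    private final ServiceEndPoint endPoint;

    ManagedRegistration(ServiceRegistry registry, ServiceEndPoint endPoint) {
        this.registry = registry;
        this.endPoint = endPoint;
    }

    @Override
    public void start() {
        // Advertise the end point only once the service is actually up and serving.
        registry.register(endPoint);
    }

    @Override
    public void stop() {
        // Withdraw from host discovery before the service shuts down.
        registry.unregister(endPoint);
    }
}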

Support more advanced load balancing

We should probably support a load balancing strategy other than random. The most logical one would be something like least loaded. This could be based on the number of local or global connections to a service, or on something like the load reported by the remote server.
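
A minimal sketch of the "least locally loaded" variant, which needs no cooperation from the server: pick the end point with the fewest in-flight callbacks from this client. The ServiceEndPoint interface and the hook methods are simplified, hypothetical stand-ins for whatever the LoadBalanceAlgorithm contract ends up being:

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the Ostrich ServiceEndPoint.
interface ServiceEndPoint {
    String getId();
}

class LeastLoadedBalancer {
    private final ConcurrentMap<String, AtomicInteger> inFlight =
            new ConcurrentHashMap<String, AtomicInteger>();

    ServiceEndPoint choose(List<ServiceEndPoint> endPoints) {
        ServiceEndPoint best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (ServiceEndPoint endPoint : endPoints) {
            int load = counter(endPoint).get();
            if (load < bestLoad) {
                bestLoad = load;
                best = endPoint;
            }
        }
        return best;
    }

    // The ServicePool would call these around each callback execution.
    void onExecuteStart(ServiceEndPoint endPoint) { counter(endPoint).incrementAndGet(); }
    void onExecuteEnd(ServiceEndPoint endPoint) { counter(endPoint).decrementAndGet(); }

    private AtomicInteger counter(ServiceEndPoint endPoint) {
        AtomicInteger counter = inFlight.get(endPoint.getId());
        if (counter == null) {
            AtomicInteger created = new AtomicInteger();
            AtomicInteger existing = inFlight.putIfAbsent(endPoint.getId(), created);
            counter = (existing != null) ? existing : created;
        }
        return counter;
    }
}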

Investigate potential connection loss

The data team had an issue earlier today where an instance seemingly lost its connection to ZooKeeper. There wasn't a good way to diagnose this at runtime, so it may be useful to add some metrics to Ostrich that expose the connection state and what is happening with the connection (see the sketch after the list below).

Action:

  • Show how long the server has been connected
  • Show the number of host connect attempts
  • Show the current connection state
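
A rough sketch of how those values could be tracked with a Curator ConnectionStateListener, registered via CuratorFramework.getConnectionStateListenable().addListener(...). Note that the counter below counts successful (re)connects rather than raw attempts, and that older Curator releases use the com.netflix.curator package instead of org.apache.curator:

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

// Plain fields so they can be wired into whatever metrics library the project settles on.
class ConnectionMetricsListener implements ConnectionStateListener {
    private final AtomicReference<ConnectionState> currentState =
            new AtomicReference<ConnectionState>();
    private final AtomicLong connectedSinceMillis = new AtomicLong(-1);
    private final AtomicLong reconnects = new AtomicLong();

    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState) {
        currentState.set(newState);
        if (newState == ConnectionState.CONNECTED || newState == ConnectionState.RECONNECTED) {
            // Counts successful (re)connects; a true "attempt" counter would need hooks
            // deeper inside the ZooKeeper client.
            reconnects.incrementAndGet();
            connectedSinceMillis.set(System.currentTimeMillis());
        } else {
            connectedSinceMillis.set(-1);
        }
    }

    // How long the current session has been connected, or -1 if it is not connected.
    long connectedDurationMillis() {
        long since = connectedSinceMillis.get();
        return since < 0 ? -1 : System.currentTimeMillis() - since;
    }

    long reconnectCount() { return reconnects.get(); }

    ConnectionState currentState() { return currentState.get(); }
}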

Support for non-JVM languages as service providers

We have to make sure that this is needed.

If we wanted to support non-JVM language service providers it could probably be done really easily without having to write separate code for each language. We could write a simple dropwizard service that receives a POST with service endpoint info inside of it. The service would take the endpoint info and create an ephemeral node in ZooKeeper on behalf of the caller. After creating the node it would NOT close the HTTP connection. Instead it would monitor the connection, and if the caller ever closes it, the service would delete the ephemeral node. So in this model having an open connection to a webserver is a proxy for a service being alive. If that connection closes then the server is assumed to not be alive anymore. When HTTP timeouts happen the client will have to reestablish the connection if it still wants to be available.

Given that pretty much all modern languages can make an HTTP POST request, this should enable a service written in any language to be made available through Ostrich. Of course, the service provider would still have to write a client library for every language their users work in.
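
A rough sketch of the registration handler described above, using the JDK's built-in HTTP server for brevity rather than a full Dropwizard resource; the /ostrich/... ZooKeeper path layout and the five-second heartbeat interval are invented for illustration:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;

class RegistrationHandler implements HttpHandler {
    private final CuratorFramework curator;

    RegistrationHandler(CuratorFramework curator) {
        this.curator = curator;
    }

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        byte[] payload = readAll(exchange.getRequestBody());
        String path = "/ostrich" + exchange.getRequestURI().getPath();
        try {
            // Ephemeral: the node also disappears automatically if this proxy's own ZK session dies.
            curator.create().creatingParentsIfNeeded()
                    .withMode(CreateMode.EPHEMERAL).forPath(path, payload);

            exchange.sendResponseHeaders(200, 0);  // chunked response, held open
            OutputStream body = exchange.getResponseBody();
            while (true) {
                body.write('\n');   // heartbeat; throws IOException once the caller disconnects
                body.flush();
                Thread.sleep(5000);
            }
        } catch (Exception clientGoneOrZkError) {
            // Caller hung up (or something else failed): withdraw the registration.
            try {
                curator.delete().forPath(path);
            } catch (Exception ignored) {
                // the node may already be gone
            }
        } finally {
            exchange.close();
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        for (int n; (n = in.read(buffer)) != -1; ) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}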

Remove Service marker interface

The com.bazaarvoice.soa.Service interface serves no real purpose and requires modification of the service itself (assuming the consumer is following the Dropwizard project-api, project-client, project-service project structure).

ServiceCallback should have a "without result" sibling

com.bazaarvoice.soa.ServiceCallback requires some return type from a service method invocation.

In the case of a "void" return type on a service method (like Databus.subscribe), it would be nice to not have to "return null".

databusServicePool.execute(new RetryNTimes(3, 100, TimeUnit.MILLISECONDS), new ServiceCallback<Databus, Object>() {
    @Override
    public Object call(Databus service) throws ServiceException {
        service.subscribe(DATABUS_SUBSCRIPTION_NAME, 86400, 86400);
        return null;
    }
});

It would be nice to have the following instead:

databusServicePool.execute(new RetryNTimes(3, 100, TimeUnit.MILLISECONDS), new ServiceCallbackWithoutResult<Databus>() {
    @Override
    public void call(Databus service) throws ServiceException {
        service.subscribe(DATABUS_SUBSCRIPTION_NAME, 86400, 86400);
    }
});
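
One possible shape for this is a separate interface plus a small adapter that the ServicePool (or a new execute overload) could wrap it in. The imports assume both existing types live in com.bazaarvoice.soa, as the issues above suggest; everything else is illustrative:

import com.bazaarvoice.soa.ServiceCallback;
import com.bazaarvoice.soa.ServiceException;

// Illustrative sketch; the names and placement are suggestions only.
interface ServiceCallbackWithoutResult<S> {
    void call(S service) throws ServiceException;
}

final class ServiceCallbacks {
    private ServiceCallbacks() {}

    // Adapts a void callback to the existing ServiceCallback<S, R> contract.
    static <S> ServiceCallback<S, Void> noResult(final ServiceCallbackWithoutResult<S> callback) {
        return new ServiceCallback<S, Void>() {
            @Override
            public Void call(S service) throws ServiceException {
                callback.call(service);
                return null;
            }
        };
    }
}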

Stability testing

We should have a way to perform long-running stability testing (hours if not days) for Ostrich. We need to make sure that we correctly handle all sorts of error conditions, such as:

  • ZooKeeper nodes restarting (in a way that doesn't lose quorum)
  • Services being registered and unregistered
  • Services throwing exceptions and errors
  • Services being unhealthy for long periods of time

Refactor ServiceEndPoint

There are a few things wrong with ServiceEndpoint (a rough sketch of the proposed shape follows the list):

  1. It shouldn't require specific machine and port information in the name. It should be refactored to have an opaque name for the service that it's associated with, and an opaque id representing the node that it was created for (likely something like hostname:port). This will give operations the ability to look inside of ZooKeeper and make sense of what services are registered (mentally decoding the opaque id into hostname and port). At the same time this also ensures that people who are writing services and clients will include all necessary information for connecting to the service in the payload on the registration. This insulates service authors from changes to the way we represent and name the nodes.
  2. It is currently representing too many concerns. An endpoint is an identifier for a specific instance of a service. Mapping it to JSON and determining when it was registered shouldn't be part of its concerns. These are the concerns of the service registry.
  3. It should be named ServiceEndPoint with a capital P. This is consistent naming with other projects out there.
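
A rough sketch of the shape described in points 1 and 2; the method names are illustrative, and JSON (de)serialization plus registration timestamps would move into the service registry:

// Illustrative sketch of the refactored end point: opaque name, opaque id, opaque payload.
interface ServiceEndPoint {
    // Opaque name of the service this end point belongs to, e.g. "calculator".
    String getServiceName();

    // Opaque id for this instance; by convention something like "hostname:port".
    String getId();

    // Provider-defined payload carrying everything a client needs to connect.
    String getPayload();
}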

ServicePool should expose an isHealthy() method.

Client applications that use a ServicePool should integrate into their own health check a verification that their dependencies are also healthy.

The implementation may be as simple as checking that there's at least one end point that's not marked bad. It would be better to actually ping through to at least one end point as part of computing isHealthy().

We should find a way to expose this for proxies that wrap a ServicePool, too. For example, assuming Dropwizard:

MyService service = ServicePool.create(MyService.class)...buildProxy(retryPolicy);
environment.addHealthCheck(new HealthCheck("my-service") {
    @Override
    protected Result check() {
        // TODO: it's nice to include a string w/the name of the live endpoint + timing info
        // like "localhost 493us"
        return ServiceProxies.isHealthy(service) ? Result.healthy() : Result.unhealthy();
    }
});
environment.addManaged(new Managed() {
    @Override
    public void start() {}
    @Override
    public void stop() {
        ServiceProxies.closeQuietly(service);
    }
});
