
nrel / api-umbrella


Open source API management platform

Home Page: http://apiumbrella.io

License: MIT License

Ruby 52.27% HTML 0.40% JavaScript 7.28% Makefile 0.11% Shell 3.07% Lua 28.78% Dockerfile 0.27% SCSS 0.54% Handlebars 2.72% CUE 0.83% PLpgSQL 3.68% Go 0.03% Rust 0.03%
api-management nginx openresty api-gateway api-manager luajit lua

api-umbrella's Introduction

API Umbrella

What Is API Umbrella?

API Umbrella is an open source API management platform for exposing web service APIs. The basic goal of API Umbrella is to make life easier for both API creators and API consumers. How?

  • Make life easier for API creators: Allow API creators to focus on building APIs.
    • Standardize the boring stuff: APIs can assume the boring stuff (access control, rate limiting, analytics, etc.) is already taken care of if the API is being accessed through API Umbrella, so common functionality doesn't need to be implemented in the API code.
    • Easy to add: API Umbrella acts as a layer above your APIs, so your API code doesn't need to be modified to take advantage of the features provided.
    • Scalability: Make it easier to scale your APIs.
  • Make life easier for API consumers: Let API consumers easily explore and use your APIs.
    • Unify disparate APIs: Present separate APIs as a cohesive offering to API consumers. APIs running on different servers or written in different programming languages can be exposed at a single endpoint for the API consumer.
    • Standardize access: All your APIs can be accessed using the same API key credentials.
    • Standardize documentation: All your APIs are documented in a single place and in a similar fashion.

Download

Binary packages are available for download. Follow the quick setup instructions on the download page to begin running API Umbrella.

Getting Started

Once you have API Umbrella up and running, there are a variety of things you can do to start using the platform. For a quick tutorial, see getting started.

API Umbrella Development

Are you interested in working on the code behind API Umbrella? See our development setup guide to learn how you can get a local development environment set up.

Who's using API Umbrella?

Are you using API Umbrella? Edit this file and let us know.

License

API Umbrella is open sourced under the MIT license.

api-umbrella's People

Contributors

bmedici, brylie, cmc333333, darylrobbins, dependabot[bot], gbinal, gui, joubertredrat, martinzuern, perfaram, shaliko, snyk-bot, thibautgery


api-umbrella's Issues

Berks version error when running 'vagrant up'

When running the 'vagrant up' command, I get the following error:

==> default: The cookbook path '/root/.berkshelf/default/vagrant/berkshelf-20141020-8560-1ic1scs-default' doesn't exist. Ignoring...
Updating Vagrant's berkshelf: '/root/.berkshelf/default/vagrant/berkshelf-20141020-8560-1ic1scs-default'
RuntimeError: Couldn't determine Berks version: #<Buff::ShellOut::Response:0x00000002ec7640 @exitstatus=1, @stdout="", @stderr="/opt/chefdk/embedded/bin/ruby: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by /opt/chefdk/embedded/lib/libruby.so.2.1)\n">

It may be related to the order of paths in the $HOME variable. How can I resolve this issue?

Add option to make HTTP caching ignore URL query string ordering

Add something like boltsort or querystring so that we can treat these two requests as the same for caching purposes:

/api/whatever?limit=10&start=50
/api/whatever?start=50&limit=10

This sort-independent behavior should not be the default, however (since the query string ordering can matter to some backends). It should be an option that has to be explicitly turned on for specific API backends.
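
If this were implemented at the OpenResty layer, the normalization itself would be straightforward. Below is a minimal sketch, assuming an OpenResty rewrite phase and an illustrative per-backend opt-in flag (ngx.ctx.sort_query_string_enabled is a placeholder, not an existing API Umbrella setting):

-- Sort query parameters so equivalent requests produce the same cache key.
local function sorted_query_string()
  local args = ngx.req.get_uri_args()
  local keys = {}
  for key, _ in pairs(args) do
    table.insert(keys, key)
  end
  table.sort(keys)

  local parts = {}
  for _, key in ipairs(keys) do
    local value = args[key]
    if type(value) == "table" then
      -- Repeated parameters come back as a table; keep their original order.
      for _, v in ipairs(value) do
        table.insert(parts, ngx.escape_uri(key) .. "=" .. ngx.escape_uri(v))
      end
    elseif value == true then
      -- Valueless parameters (e.g. "?flag") are returned as boolean true.
      table.insert(parts, ngx.escape_uri(key))
    else
      table.insert(parts, ngx.escape_uri(key) .. "=" .. ngx.escape_uri(value))
    end
  end
  return table.concat(parts, "&")
end

-- Only rewrite the args when the backend has explicitly opted in.
if ngx.ctx.sort_query_string_enabled then
  ngx.req.set_uri_args(sorted_query_string())
end

With the arguments rewritten this way, the two example requests above would produce identical cache keys.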

Github auth callback URL?

I am trying to configure Github authentication strategy. On the Github OAuth Application Settings page, there is a field for Authorization callback URL. What value should I put here?

  • <example.com>/what/callback/url

DNS changes/TTL not respected by router

If an API backend is defined using domain names for the backend servers, the IP addresses for these backend servers are resolved only once when nginx first sees them. Subsequently, all requests to that API backend will use whatever IP address was first resolved. So let's say you add a backend server of "api.example.com" which resolves to 10.0.0.1 at the time. After some period of time, the DNS changes for "api.example.com" to 10.0.0.2. The problem is that because the nginx router first saw 10.0.0.1, it will continue to send all requests there.

This won't affect API backends that use static IPs, but this can lead to really unexpected results if an API backend does switch IPs (it'll either stop working or you might end up pointing your traffic at a different website altogether).

The problem is due to upstream usage with nginx, and is well established: https://www.ruby-forum.com/topic/4407628 The dynamic proxy_pass workaround won't entirely work here, since we're reliant on upstream blocks for providing load balancing (in the case where multiple backend servers are defined).

One simple, but kludgy, workaround is to automatically reload nginx on some semi-regular basis, so it can re-resolve domain names to new IPs. However, it would be nicer if nginx acknowledged the true TTL of the DNS (so we could accommodate backends with super short TTLs without needlessly reloading nginx).

Another option would be to use the proxy_pass workaround when only one API backend server is defined, and only use the upstream cluster when multiple servers are defined. It might not be perfect, but I suspect load balancing between multiple backend servers is much more likely when you're doing things locally and might not even be using DNS (or it's more in your control). Dynamic DNS entries seem more likely if we're proxying to a remote API that might already be doing its own load balancing on its end.
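
For reference, OpenResty can re-resolve hostnames itself and honor the record's TTL. A minimal sketch using the lua-resty-dns library (the hostname and nameserver below are placeholders, not API Umbrella configuration):

local resolver = require "resty.dns.resolver"

-- Resolve a backend hostname and return the first A record plus its TTL,
-- so the caller can schedule the next re-resolution (e.g. via ngx.timer.at)
-- instead of reloading nginx.
local function resolve_backend(host)
  local r, err = resolver:new({ nameservers = { "8.8.8.8" }, timeout = 2000 })
  if not r then
    return nil, nil, "failed to create resolver: " .. (err or "unknown")
  end

  local answers, query_err = r:query(host, { qtype = r.TYPE_A })
  if not answers or answers.errcode then
    return nil, nil, "query failed: " .. (query_err or tostring(answers and answers.errstr))
  end

  for _, answer in ipairs(answers) do
    if answer.address then
      return answer.address, answer.ttl
    end
  end
  return nil, nil, "no A records for " .. host
end

local ip, ttl, err = resolve_backend("api.example.com")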

Finally, because I keep wondering about switching this router component back to HAProxy, it has the same issue: http://permalink.gmane.org/gmane.comp.web.haproxy/7393

The DNS issues are semi-related to #5, so it might be worth considering them together.

User and API key do not get logged when gatekeeper returns an error

When the gatekeeper blocks a user for exceeding their rate limit, the request gets logged in our analytics system, but it's missing the API key and user information of the requester. This user information should be logged even when errors are returned by the gatekeeper. Here's an example of how the analytics looks when DEMO_KEY exceeded its rate limits:

http://cl.ly/image/3W0x001c3w2l

In each of those requests DEMO_KEY was being passed in, but as soon as it exceeded its rate limits (503 response code in this case), that information went missing from the logs.

I believe the gatekeeper isn't logging its own analytics in this case because the logging is only being handled if the proxy finishes the request gracefully: https://github.com/NREL/api-umbrella-gatekeeper/blob/6c62a5c7a735356d65b523b9334ad06842f596b0/lib/gatekeeper/worker.js#L156-L157 So in the current case, the analytics system is falling back to the more limited nginx-only log data. I believe this means the problem is also present for any other type of gatekeeper-level error responses (missing key, invalid key, unauthorized key, etc). We should ensure that all of those error responses get logged with the information available to the gatekeeper.

missing LSB tags and overrides

As a note, when installing api-umbrella on Debian 7, I get the following warning:

Unpacking api-umbrella (from api-umbrella_0.6.0-1_amd64.deb) ...
Setting up api-umbrella (0.6.0-1) ...
update-rc.d: using dependency based boot sequencing
insserv: warning: script 'api-umbrella' missing LSB tags and overrides

Split up URL pieces in analytics storage for better querying

Right now, the analytics system stores the full URL in an elasticsearch field:

{
  "request_url": "http://api.data.gov/nrel/alt-fuel-stations/v1.json?fuel_type=ELEC&state=CA&limit=2",
  "request_path": "/nrel/alt-fuel-stations/v1"
  ...
}

It would be beneficial to split this up more. Here's what I'm thinking:

{
  "request_url": "http://api.data.gov/nrel/alt-fuel-stations/v1.json?fuel_type=ELEC&state=CA&limit=2",
  "request_scheme": "http",
  "request_host": "api.data.gov",
  "request_path": "/nrel/alt-fuel-stations/v1.json",
  "request_path_hierarchy": "/nrel/alt-fuel-stations/v1"
  "request_query": {
    "fuel_type": "ELEC",
    "state": "CA",
    "limit": "2",
  },
  ...
}

This would make most querying easier and more efficient, since fewer wildcards would be needed, and fewer queries would need to start with a wildcard (since those are less efficient). Splitting the query string into a hash would also open up the door to queries that are difficult or impossible now, since you could AND and OR specific query parameters together. (A small sketch of this splitting follows the notes below.)

A couple of extra notes:

  • We don't necessarily need to store request_url anymore, but it might still be handy to have around for display purposes and as a more definitive view of the original URL.
  • What we currently store as request_path I'd recommend renaming to request_path_hierarchy. This separate field I think still has some use--it strips any file extensions and then uses Elasticsearch's path hierarchy tokenizer (whereas I think the new request_path should remain a not_analyzed field). The idea is that this field makes it easier to build certain types of hierarchical reports.
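
Below is a minimal sketch of how a logged URL could be split into the proposed fields; it is illustrative only, not the actual log processing code:

-- Build the proposed analytics fields from the pieces nginx already exposes.
local function split_request_url(scheme, host, uri, args)
  local path = uri
  -- Strip any file extension for the hierarchy field.
  local path_hierarchy = path:gsub("%.%w+$", "")

  return {
    request_url = scheme .. "://" .. host .. uri ..
      (next(args) and ("?" .. ngx.encode_args(args)) or ""),
    request_scheme = scheme,
    request_host = host,
    request_path = path,
    request_path_hierarchy = path_hierarchy,
    request_query = args,
  }
end

-- Hypothetical usage inside an OpenResty log phase:
-- local fields = split_request_url(ngx.var.scheme, ngx.var.host,
--   ngx.var.uri, ngx.req.get_uri_args())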

OAuth support?

Any plans for OAuth?

Not third party, but running your own OAuth identity server and using it for authentication.

Add API page: make 'server' field required

I successfully managed to add an API, via admin/#/apis/new, without entering a server. This led to some confusion and troubleshooting with regard to 502 errors.

How difficult would it be to validate this page on submission and check that a valid server is entered?

Restrict access by IP

A feature that would sometimes come in handy is the ability to restrict access based on the requester's IP. This could apply to both API backends (restricting access to a certain API so that it's only accessible to certain IPs) as well as individual keys (so the key is only valid for access if it's coming from certain IPs). If this feature is added, it should allow CIDR IP ranges to be specified.
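
A minimal sketch (plain Lua with the LuaJIT bit library, IPv4 only) of how a CIDR allow list could be checked; the allowed ranges here are placeholders, not an existing setting:

local bit = require "bit"

local function ip_to_int(ip)
  local o1, o2, o3, o4 = ip:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
  if not o1 then return nil end
  return tonumber(o1) * 2^24 + tonumber(o2) * 2^16 + tonumber(o3) * 2^8 + tonumber(o4)
end

-- Returns true if the IP falls inside the CIDR range (e.g. "10.0.0.0/8").
local function in_cidr(ip, cidr)
  local net, prefix = cidr:match("^(.-)/(%d+)$")
  local ip_int, net_int = ip_to_int(ip), ip_to_int(net or "")
  if not ip_int or not net_int then return false end
  local mask = prefix == "0" and 0 or bit.lshift(0xffffffff, 32 - tonumber(prefix))
  return bit.band(ip_int, mask) == bit.band(net_int, mask)
end

-- Placeholder allow list; in practice this would come from the API backend
-- or API key configuration.
local allowed = { "10.0.0.0/8", "192.168.1.0/24" }

local function ip_allowed(remote_addr)
  for _, cidr in ipairs(allowed) do
    if in_cidr(remote_addr, cidr) then return true end
  end
  return false
end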

Custom rate limit settings are cached and not refreshed when rate limit changes are made

  • Assign a custom rate limit to a specific user/API key (in the web admin). Let's say 5 requests per hour.
  • Make an API request as that API key.
  • Go back to the admin and update that user's custom rate limits. Adjust that same rate limit up to 100 requests per hour.
  • Make more API requests as that API key. Instead of being rate limited at 100 requests, the user's first custom rate limit of 5 requests will still take hold.

The same basic problem also applies for custom rate limits being assigned to API backends. If you add a custom rate limit, publish those changes, access that API, then edit the custom rate limit and re-publish, the first custom rate limit you defined is still being applied.

The only way to currently rectify the problem is to completely restart the gatekeeper processes on the server.

This stems from the RateLimit middleware inside the gatekeeper and the fact that we're caching the TimeWindow objects for each unique rate limit ID in memory indefinitely (see fetchTimeWindow). We have the config reload event we could use to easily clear these cached objects in the event API backend rate limits are changed and published. However, that doesn't help us for per-user custom rate limit changes (since we have no equivalent event being fired for the gatekeeper to listen to). So we either need to not cache per-user rate limit objects at all, or revisit this whole rate limit config caching setup more broadly (I'm not sure how necessary caching these config objects really is).
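
One option, sketched below in generic Lua (the gatekeeper code in question is actually Node.js), is to time-bound the cached rate limit objects so a stale per-user limit can only last for a known window; the TTL value and load function are illustrative:

local cache = {}          -- limit_id -> { value = ..., expires_at = ... }
local CACHE_TTL = 60      -- seconds; stale custom limits last at most this long

local function fetch_time_window(limit_id, load_from_db)
  local now = ngx.now()
  local entry = cache[limit_id]
  if entry and entry.expires_at > now then
    return entry.value
  end

  -- Cache miss or expired entry: reload from the data store so per-user
  -- custom rate limit changes are picked up within CACHE_TTL seconds.
  local value = load_from_db(limit_id)
  cache[limit_id] = { value = value, expires_at = now + CACHE_TTL }
  return value
end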

API backend timeouts can lead to multiple request retries

There are request timeouts set up at the nginx and Varnish reverse proxy layers (defaulting to 60 seconds, I think). So if a request doesn't start responding within 60 seconds, the request is aborted for the client. In the event that an API backend is super-slow to respond, I believe nginx retries the request after it's timed out. This leads to duplicate requests to the API backend. This is probably not what we want in the event of timeouts.

I haven't entirely debugged this, so this needs a bit more investigation, but since I'm seeing mysterious duplicate requests for long-running failed requests, my theory is that nginx is triggering these based on the proxy_next_upstream setting. It should probably be set to omit timeout.

In the case where I've seen this, there's only one API backend server, but since there are multiple gatekeeper servers defined for load balancing, I believe that's what's triggering the retries. So it's probably important to check how the retries and proxy error handling are affected by each proxy layer.

So to reproduce this, I think all that should be necessary is to introduce an API backend that takes longer than 60 seconds to respond. Then check to see that a single user request via API Umbrella leads to multiple API backend requests after it times out.

Having nginx consider a backend down after a timeout might be okay for some backends, but this should probably not be the default (since it can lead to an API backend getting overwhelmed if those slow requests are resource intensive, and you start making duplicate requests before one has even finished). And it definitely should not be enabled for the proxy that load balances against the gatekeeper processes, since we don't want to consider a single gatekeeper unavailable even if it happens to be serving up a slow API backend request.

bundle not found

Following the steps, I get to:

[vagrant@api api-umbrella-router]$ bundle install --path=vendor/bundle
-bash: bundle: command not found

I ran sudo gem install bundler.
Now:
[vagrant@api api-umbrella-router]$ bundle install --path=vendor/bundle
Fetching gem metadata from https://rubygems.org/......
Fetching gem metadata from https://rubygems.org/..
Fetching http://github.com/NREL/api-umbrella-gatekeeper.git
sh: git: command not found
Git error: command git clone 'http://github.com/NREL/api-umbrella-gatekeeper.git' "/vagrant/workspace/api-umbrella-router/vendor/bundle/ruby/1.8/cache/bundler/git/api-umbrella-gatekeeper-b46e83667f716dc0193045bfe83040df6d78c9d6" --bare --no-hardlinks in directory /vagrant/workspace/api-umbrella-router has failed.

So I ran:
sudo yum install git

Then:
[vagrant@api api-umbrella-router]$ cap vagrant deploy
-bash: cap: command not found

Then I ran: sudo gem install capistrano

Then:
[vagrant@api api-umbrella-router]$ cap vagrant deploy
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- capistrano_nrel_ext/recipes/defaults (LoadError)
        from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'

Easier query building interface for admin analytics

Currently in the analytics admin, nearly all filtering and querying of analytics is performed via the advanced filters using Lucene's query syntax. This gives a lot of flexibility in doing complex nested AND, OR, wildcard, range, etc. type queries, but the syntax is less than obvious to people getting started. A simpler interface would also accommodate common queries in a more intuitive way (for example, not having to remember to escape slashes in URL paths like request_path:\/nrel\/*).

I think this could be a fairly simple query building interface to start with: select a known field, select a matcher (begins with, ends with, contains, etc), and then enter your search value. Click a plus to add another query row, and all your inputs get ANDed together. As long as we keep around the advanced filters input for handling more advanced logic, I think this would be a simple way to make the analytics querying much easier to use 95% of the time.
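
A minimal sketch of how such query-builder rows might be translated into the Lucene syntax the advanced filter already accepts (the field names and matcher labels are illustrative):

-- Escape Lucene special characters, e.g. "/" in URL paths.
local function escape_lucene(value)
  return (value:gsub("([%+%-&|!%(%){}%[%]^\"~%*%?:\\/])", "\\%1"))
end

-- Each row is { field = ..., matcher = ..., value = ... }; all rows are
-- ANDed together, matching the simple interface described above.
local function build_query(rows)
  local clauses = {}
  for _, row in ipairs(rows) do
    local value = escape_lucene(row.value)
    if row.matcher == "begins_with" then
      table.insert(clauses, row.field .. ":" .. value .. "*")
    elseif row.matcher == "contains" then
      table.insert(clauses, row.field .. ":*" .. value .. "*")
    else -- "equals"
      table.insert(clauses, row.field .. ":" .. value)
    end
  end
  return table.concat(clauses, " AND ")
end

-- build_query({ { field = "request_path", matcher = "begins_with", value = "/nrel/" } })
-- => request_path:\/nrel\/*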

It might also be worth looking into Kibana again (demo). When I first set about building this interface, Kibana was only built for Logstash, but it's become a lot more generic and is now a client-side-only JS app that could easily be dropped anywhere Elasticsearch is running. I'm still not entirely sure this meets our needs since its interface seems pretty advanced, and I'm not sure it fits some of our other needs (like hiding certain sensitive fields or only making segments of the data available to certain users), but it's probably worth another look.

Move all public facing content to static site?

For api.data.gov, the public site is split between a static site and the web project. At this point, the only things the web project is actually providing are the signup form and the contact form--everything else is coming from the static site. One of the main downsides of this split is duplication in the stylesheets and themes between the two projects.

One solution might be to pull in the static site into the rails web app as a git submodule, so we could at least rely on most of the same resources (although the layout files would still be separate).

A better solution might be to move those last two pieces into the static site, and have them perform an ajax-ified signup process. Then at that point, the web project would only be providing public APIs for signing up or sending a contact message, as well as the web admin. Even if we end up introducing a more robust and dynamic user management section on the public site (login, etc), I think that could still be tackled inside the static site if we build it out as an Ember (or equivalent) app.

More granular admin permissions

Currently the admin web tool allows admins to access and administer anything and everything. If you have an admin account, you have full access. We need a better permissions model for restricting access to the admin. The main use-case for this is api.data.gov, where agency accounts should only be able to view and administer their own agency's data (so I'll use agencies as an example, but this concept should be more generic). The obvious places this extends to are:

  • Analytics: Agencies should only be able to view their own API's analytics.
  • API Backend: Agencies should only be able to edit their own API backends. Agencies should be able to add new API backends, but preferably in a way that wouldn't step on any other agency's toes (for example, NREL should not be able to overlap with GSA and define a new API backend with a path like api.data.gov/gsa/nrels-cool-service).
  • API Backend publishing: Publishing API backend changes should maybe be a separate feature that only super-admins have access to for now, since it has the potential to impact multiple APIs. Ideally, though, an agency could only publish their own backend changes, without pushing any other pending edits from other agencies live.
  • API key management: API keys are fundamentally meant to be shared across agencies, so this area will have to be shared to some degree. However, there should be restrictions around which roles can be assigned (so I can't assign a role that grants access to a private API belonging to another agency). Disabling API keys completely should be discussed, but I think all admins should still have the ability to disable any API key in the event it's causing abuse. How custom API key rate limits should be handled needs more discussion (the answer may be that we need a way to s

In short, I think we need a concept of admin groups (so multiple users from agencies can share access) and then we need to be able to specify which APIs and roles this group has ownership over. API ownership might involve tying the agency to specific API backends that get configured, or it may involve a broader mechanism for assigning domains and URL paths that the agency "owns." The latter needs a bit more thought to ensure it would work, but it may actually end up being simpler and give us more flexibility. It would also solve the issue of permissions around creating new API backends (since if I own "/nrel/*", then I can create all the new API backends I want under /nrel/whatever, but I would be forbidden from creating /gsa/something-else).

Failed to start process: distributed-rate-limits-sync

On a fresh installation of api-umbrella, on Debian 7, I get the following error when running sudo /etc/init.d/api-umbrella start:

sudo /etc/init.d/api-umbrella start
Starting api-umbrella............... [FAIL]


Stopping api-umbrella...Failed to start processes:
  log-processor (STARTING - /opt/api-umbrella/var/log/supervisor/log-processor.log)

  See /opt/api-umbrella/var/log/supervisor/supervisord_forever.log for more details
 [  OK  ]
error: Forever detected script exited with code: 1
error: Script restart attempt #1
Starting api-umbrella............... [FAIL]

Failed to start processes:
  distributed-rate-limits-sync (STARTING - /opt/api-umbrella/var/log/supervisor/distributed-rate-limits-sync.log)

  See /opt/api-umbrella/var/log/supervisor/supervisord_forever.log for more details

Stopping api-umbrella... [  OK  ]
error: Forever detected script exited with code: 1

Download all results from an analytics query

In the admin analytics UI, you can perform queries and get paginated lists of results. However, it would be very useful in some cases to be able to easily download the full set of results as a CSV spreadsheet.

The main use case is in the Users section where it would be nice to be able to download a list of all your users without having to page through the results.

This same functionality would perhaps be useful in the Filter Logs section, but that's maybe not as important right now, and it's also possible that those downloads could become very big.

SOAP and REST integration (i.e. ESB)?

What are your thoughts on integrating/bridging SOAP and REST interfaces? E.g. some older APIs were designed when SOAP was more prevalent, and now may wish for REST interfaces. On what levels would a SOAP / REST bridge exist in the API tooling?

Add benchmark suite

It would be good to have a basic benchmark suite to test the performance of the overall stack. I'd like to start with some pretty basic performance tests like maybe:

  • Measuring requests per second handled at a few different concurrency levels.
  • Run the tests for long enough (maybe 1 minute+) to let some of the async tasks kick in (like log parsing/analytics processing). This isn't the best way to measure the performance of those aspects, but it would at least sort of get factored into the overall performance by having them running in the background.
  • Gather some system metrics while things are being run to try to also ensure changes don't wildly change the system requirements (for example, requiring a lot more memory or eating a lot more CPU). Maybe use something like the JMeter PerfMon plugin.

Ideally, benchmarks would be something that would get run on every commit on dedicated hardware so we could track performance of the stack over time and ensure unexpected speed regressions don't happen (something like PyPy's Speed Center). But that's obviously a lot more involved to get set up (dedicated hardware, etc), so I'd like to just start with a basic suite that we could use for comparing things during development on your own machine.
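
As one possible starting point, wrk is scriptable in Lua and could drive a basic requests-per-second test; a minimal sketch (the URL path, API key, and invocation below are placeholders):

-- Benchmark script for wrk: send GET requests with an API key header.
wrk.method = "GET"
wrk.headers["X-Api-Key"] = "DEMO_KEY"

request = function()
  return wrk.format(nil, "/api/whatever?limit=10&start=50")
end

-- Print the headline numbers we care about tracking over time.
done = function(summary, latency, requests)
  io.write(string.format("requests/sec: %.2f\n",
    summary.requests / (summary.duration / 1e6)))
  io.write(string.format("p99 latency (ms): %.2f\n",
    latency:percentile(99) / 1000))
end

-- Hypothetical invocation:
--   wrk -t4 -c100 -d60s -s benchmark.lua http://localhost:8080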

Installation without Vagrant

I would like to test api-umbrella without Vagrant but cannot find a clear installation procedure.
Can anybody point me in the right direction?
Thanks.

Remove "api.vagrant" host requirement from vagrant setup

When setting up a vagrant box for local development, we currently assume the host machine can resolve the api.vagrant hostname. This requires explicitly adding that hostname to your computer's /etc/hosts file. Right now this is needed for the HTTPS redirects when accessing the admin, but it would be best to remove this hostname requirement, so editing your global hosts file is not necessary for local development.

Remove documentation piece from admin in favor of github model

There's currently a very limited CMS for managing the documentation in the web admin. It basically provides a WYSIWYG HTML editor for managing the documentation pages and some limited ways to organize the hierarchy of pages. At this point, the new static site model deployed on api.data.gov (https://github.com/GSA-OCSIT/api.data.gov) seems to be working well and offers a lot more flexibility. So I think it would be best to abandon this feature of the web admin and formalize using this type of static site. If a web-admin-based CMS is really needed in the future, I'd prefer to revisit it then.

To make this happen, I think we need a generic static site project for API Umbrella that gets set up during the install process. It would also be nice to consolidate our public-facing site as part of this (see #10), so things are a little more logical.

OPTIONS request with no body results in 503 errors when Varnish is enabled

When api-umbrella-gatekeeper is routing through Varnish for caching (the default), an OPTIONS request without a body will result in a 503 server error being returned from Varnish. If Varnish is bypassed, then most servers seem fine with the request.

The culprit seems to be a combination of node-http-proxy and Varnish. node-http-proxy seems to add a Transfer-Encoding: chunked header onto the request as it passes through. But without a body to the request, Varnish doesn't like this. Varnish 3 dies with a 503 error, while Varnish 4 dies with a slightly more telling 411 error ("length required - no chunk, no close, no size"). But if the HTTP body is set to anything (even an empty string), then Varnish has no problem with these chunked requests.

So on the one hand, Varnish seems to be pickier about this situation than other servers (nginx and haproxy don't seem to care if chunked requests are sent without an actual body), but on the other hand, node-http-proxy's behavior of adding the Transfer-Encoding: chunked header doesn't actually seem valid when there is no body.

The issue appears similar to this older issue with node-http-proxy and DELETE requests with no body: http-party/node-http-proxy#373 Similar to that fix, I think the most straightforward fix could be to delete this Transfer-Encoding header if there is no body for OPTIONS requests: https://github.com/nodejitsu/node-http-proxy/blob/v0.10.4/lib/node-http-proxy/http-proxy.js#L284-L287
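
For illustration only, here is the same general idea expressed in OpenResty terms; the actual fix discussed above would live in node-http-proxy, and whether this particular header can be cleared at this layer would need verification:

-- Drop a chunked Transfer-Encoding header on bodyless OPTIONS requests
-- before they are handed to Varnish.
if ngx.req.get_method() == "OPTIONS" then
  ngx.req.read_body()
  local body = ngx.req.get_body_data()
  if (not body or body == "") and
      ngx.req.get_headers()["Transfer-Encoding"] == "chunked" then
    ngx.req.clear_header("Transfer-Encoding")
  end
end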

Getting started instructions: Errno::ENAMETOOLONG on /var/folders/...

Following the getting started instructions, I get the following: Errno::ENAMETOOLONG: File name too long - /var/folders/zr/knfdc8010c1d71b36z4thctr0000gn/T/d20140616-12458-9av98y/]u>]ă?2?}??

Let me know if you need additional information.

neil@air[~/xxx/api-umbrella]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'nrel/CentOS-6.5-x86_64' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 1.2.0, < 2.0.0
==> default: Loading metadata for box 'nrel/CentOS-6.5-x86_64'
    default: URL: https://vagrantcloud.com/nrel/CentOS-6.5-x86_64
==> default: Adding box 'nrel/CentOS-6.5-x86_64' (v1.2.0) for provider: virtualbox
    default: Downloading: https://vagrantcloud.com/nrel/CentOS-6.5-x86_64/version/4/provider/virtualbox.box
==> default: Box download is resuming from prior download progress
==> default: Successfully added box 'nrel/CentOS-6.5-x86_64' (v1.2.0) for 'virtualbox'!
==> default: Importing base box 'nrel/CentOS-6.5-x86_64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'nrel/CentOS-6.5-x86_64' is up to date...
==> default: Setting the name of the VM: api-umbrella_default_1402959525056_6631
Updating Vagrant's berkshelf: '/Users/neil/.berkshelf/default/vagrant/berkshelf-20140616-12458-1d4e1w5-default'
Resolving cookbook dependencies...
Fetching 'ack' from git://github.com/NREL-cookbooks/ack.git (at master)
Fetching 'acl' from git://github.com/NREL-cookbooks/acl.git (at master)
Fetching 'api-umbrella' from git://github.com/NREL-cookbooks/api-umbrella.git (at master)
Fetching 'bundler' from git://github.com/NREL-cookbooks/bundler.git (at master)
Fetching 'envbuilder' from git://github.com/theodi/chef-envbuilder.git (at master)
Fetching 'etc' from git://github.com/NREL-cookbooks/etc.git (at master)
Fetching 'geoip' from git://github.com/NREL-cookbooks/geoip.git (at master)
Fetching 'iptables' from git://github.com/NREL-cookbooks/iptables.git (at master)
Fetching 'mongodb' from git://github.com/NREL-cookbooks/mongodb.git (at master)
Fetching 'nginx' from git://github.com/NREL-cookbooks/nginx.git (at api-umbrella)
Fetching 'nodejs' from git://github.com/NREL-cookbooks/nodejs.git (at master)
Fetching 'pygments' from git://github.com/NREL-cookbooks/pygments.git (at master)
Fetching 'rbenv' from git://github.com/NREL-cookbooks/rbenv.git (at master)
Fetching 'redis' from git://github.com/NREL-cookbooks/redis.git (at master)
Fetching 'rubygems' from git://github.com/NREL-cookbooks/rubygems.git (at master)
Fetching 'shasum' from git://github.com/NREL-cookbooks/shasum.git (at master)
Fetching 'sudo' from git://github.com/NREL-cookbooks/sudo.git (at master)
Fetching 'supervisor' from git://github.com/NREL-cookbooks/supervisor.git (at master)
Fetching 'vagrant_extras' from git://github.com/NREL-cookbooks/vagrant_extras.git (at master)
Fetching 'varnish' from git://github.com/NREL-cookbooks/varnish.git (at master)
Fetching 'yum' from git://github.com/NREL-cookbooks/yum.git (at master)
Fetching cookbook index from http://api.berkshelf.com...
Using ack (0.2.1) from git://github.com/NREL-cookbooks/ack.git (at master)
Using api-umbrella (0.3.3) from git://github.com/NREL-cookbooks/api-umbrella.git (at master)
Using acl (0.1.0) from git://github.com/NREL-cookbooks/acl.git (at master)
Installing apt (2.3.10)
Installing ark (0.8.2)
Installing aws (2.2.0)
Installing bluepill (2.3.1)
Installing build-essential (1.4.4)
Using bundler (0.1.5) from git://github.com/NREL-cookbooks/bundler.git (at master)
Installing chef-client (3.2.2)
Installing chef_handler (1.1.6)
Installing cron (1.3.12)
Installing dmg (2.2.0)
Installing elasticsearch (0.3.8)
Using envbuilder (0.2.0) from git://github.com/theodi/chef-envbuilder.git (at master)
Using etc (0.0.2) from git://github.com/NREL-cookbooks/etc.git (at master)
Installing fail2ban (2.0.4)
Using geoip (0.1.0) from git://github.com/NREL-cookbooks/geoip.git (at master)
Installing git (2.8.4)
Using iptables (0.10.2) from git://github.com/NREL-cookbooks/iptables.git (at master)
Installing java (1.17.6)
Installing logrotate (1.4.0)
Installing man (0.7.0)
Using mongodb (0.14.5) from git://github.com/NREL-cookbooks/mongodb.git (at master)
Installing monit (0.7.1)
Installing nano (1.0.0)
Using nginx (2.0.9) from git://github.com/NREL-cookbooks/nginx.git (at api-umbrella)
Using nodejs (1.3.0) from git://github.com/NREL-cookbooks/nodejs.git (at master)
Installing ntp (1.5.4)
Installing ohai (1.1.12)
Installing omnibus_updater (0.2.8)
Installing openssh (1.3.4)
Installing perl (1.2.2)
Using pygments (0.0.2) from git://github.com/NREL-cookbooks/pygments.git (at master)
Installing python (1.4.4)
Using rbenv (0.7.3) from git://github.com/NREL-cookbooks/rbenv.git (at master)
Using redis (0.1.8) from git://github.com/NREL-cookbooks/redis.git (at master)
Installing rsyslog (1.12.2)
Installing ruby_build (0.8.0)
Using rubygems (0.1.3) from git://github.com/NREL-cookbooks/rubygems.git (at master)
Installing runit (1.4.6)
Installing screen (0.8.0)
Installing selinux (0.6.2)
Using shasum (0.0.2) from git://github.com/NREL-cookbooks/shasum.git (at master)
Using sudo (2.5.3) from git://github.com/NREL-cookbooks/sudo.git (at master)
Using supervisor (0.4.11) from git://github.com/NREL-cookbooks/supervisor.git (at master)
Using vagrant_extras (0.2.2) from git://github.com/NREL-cookbooks/vagrant_extras.git (at master)
Using varnish (0.9.11) from git://github.com/NREL-cookbooks/varnish.git (at master)
Installing vim (1.1.2)
Installing windows (1.30.2)
E, [2014-06-16T16:01:36.527664 #12458] ERROR -- : Actor crashed!
Errno::ENAMETOOLONG: File name too long - /var/folders/zr/knfdc8010c1d71b36z4thctr0000gn/T/d20140616-12458-9av98y/]u>]ă?2?}??#?YB??z???1?6!????@Z???1???:??3??{d?p?&C??N?=S????P????Z?|???v???٤h
                                ^<?|
                                    6U5???#?Z?c?ި???-4[h????X6?
0X???                                                         ?0T|Lj?s???k???
    /Applications/Vagrant/embedded/lib/ruby/2.0.0/fileutils.rb:242:in `mkdir'
    /Applications/Vagrant/embedded/lib/ruby/2.0.0/fileutils.rb:242:in `fu_mkdir'
    /Applications/Vagrant/embedded/lib/ruby/2.0.0/fileutils.rb:219:in `block (2 levels) in mkdir_p'
    /Applications/Vagrant/embedded/lib/ruby/2.0.0/fileutils.rb:217:in `reverse_each'
    /Applications/Vagrant/embedded/lib/ruby/2.0.0/fileutils.rb:217:in `block in mkdir_p'
    /Applications/Vagrant/embedded/lib/ruby/2.0.0/fileutils.rb:203:in `each'
    /Applications/Vagrant/embedded/lib/ruby/2.0.0/fileutils.rb:203:in `mkdir_p'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:737:in `extract_entry'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:979:in `block (2 levels) in unpack'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:685:in `block in each'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:620:in `block in each_entry'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:611:in `loop'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:611:in `each_entry'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:593:in `each'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:685:in `each'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:977:in `block in unpack'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:661:in `open'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:970:in `unpack'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/community_rest.rb:15:in `unpack'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/community_rest.rb:101:in `download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:53:in `try_download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:33:in `block in download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:32:in `each'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:32:in `download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/installer.rb:101:in `install'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `public_send'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:63:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:60:in `block in invoke'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:71:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/actor.rb:362:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks.rb:55:in `block in initialize'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
    (celluloid):0:in `remote procedure call'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:92:in `value'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/proxies/sync_proxy.rb:33:in `method_missing'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/proxies/cell_proxy.rb:17:in `_send_'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/pool_manager.rb:41:in `_send_'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/pool_manager.rb:123:in `method_missing'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `public_send'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:63:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:60:in `block in invoke'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:71:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/actor.rb:362:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks.rb:55:in `block in initialize'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
Using yum (2.4.1) from git://github.com/NREL-cookbooks/yum.git (at master)
Installing xml (1.2.4)
Installing zsh (1.0.0)
E, [2014-06-16T16:02:37.543428 #12458] ERROR -- : Actor crashed!
Errno::EACCES: Permission denied - /var/folders/zr/knfdc8010c1d71b36z4thctr0000gn/T/d20140616-12458-r294l5/./
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:744:in `initialize'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:744:in `open'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:744:in `extract_entry'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:979:in `block (2 levels) in unpack'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:685:in `block in each'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:620:in `block in each_entry'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:611:in `loop'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:611:in `each_entry'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:593:in `each'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:685:in `each'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:977:in `block in unpack'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:661:in `open'
    /Users/neil/.vagrant.d/gems/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:970:in `unpack'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/community_rest.rb:15:in `unpack'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/community_rest.rb:101:in `download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:53:in `try_download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:33:in `block in download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:32:in `each'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/downloader.rb:32:in `download'
    /Users/neil/.vagrant.d/gems/gems/berkshelf-3.1.3/lib/berkshelf/installer.rb:101:in `install'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `public_send'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:63:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:60:in `block in invoke'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:71:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/actor.rb:362:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks.rb:55:in `block in initialize'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
    (celluloid):0:in `remote procedure call'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:92:in `value'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/proxies/sync_proxy.rb:33:in `method_missing'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/proxies/cell_proxy.rb:17:in `_send_'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/pool_manager.rb:41:in `_send_'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/pool_manager.rb:123:in `method_missing'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `public_send'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:26:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/calls.rb:63:in `dispatch'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:60:in `block in invoke'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/cell.rb:71:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/actor.rb:362:in `block in task'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks.rb:55:in `block in initialize'
    /Users/neil/.vagrant.d/gems/gems/celluloid-0.16.0.pre/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'

Saved analytics reports

In the analytics section, there should be a way to save off queries you have performed for viewing again at a later date. These reports should then be visible to your admin account for running the same queries again. This will help with performing the same kind of queries on a regular basis. An example might be that I want to only see the traffic for one of my specific APIs, excluding internal traffic, and maybe I only care about JSON requests. You can log in and filter down to that every month, but it would be much nicer (and less error prone) if there was an easy way to have a list of your available reports to perform.

We sort of have saved reports already - any query on the analytics page can be bookmarked or shared via the URL and you'll get back to the same view of the analytics. However, one issue with this is that the URL also contains a static date range, so if I was viewing the last 30 days on December 2 and bookmarked that URL, when I came back to that in January, I would still be seeing Nov 3 - Dec 2. This can also bite you if you're looking at analytics at midnight, since any queries you already performed will keep reflecting the previous date range after a refresh. This should be addressed as part of this issue, so the URLs are more intelligently based off the relative datepicker dates. For example, if I'm viewing "Last 30 days", that's what should be captured in the URL, and that URL should always lead to the last 30 days of results, regardless of when I'm viewing the URL. The only time static dates should be present in the URL is if a custom range is explicitly picked from the date picker. Since I mainly use the custom range to get further back than a month, the date picker options should perhaps be extended to include other common relative date ranges (YTD, fiscal year, fiscal quarters, etc).

With the relative date issue fixed in the URLs, then bookmarks could essentially be used for saving reports. However, I think it would still be useful to have a way to save these inside the admin UI to formalize your saved reports and also share them with other admins. But I think this feature then becomes relatively straightforward since you're just assigning a name with a URL and saving it to the admin for display somewhere.
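
A minimal sketch of resolving a relative date-range token from a bookmarked URL into concrete timestamps at view time (the token formats are illustrative):

-- Turn a URL token like "30d" into a trailing window ending now, or an
-- explicit "start,end" pair of Unix timestamps into fixed dates.
local function resolve_date_range(token)
  local now = os.time()
  local days = token:match("^(%d+)d$")
  if days then
    return now - tonumber(days) * 86400, now
  end
  local start_ts, end_ts = token:match("^(%d+),(%d+)$")
  if start_ts then
    return tonumber(start_ts), tonumber(end_ts)
  end
  return nil, nil
end

-- resolve_date_range("30d") --> (now - 30 days, now), regardless of when
-- the bookmarked URL is opened.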

Contemplate switching HTTP caching layer from Varnish to Apache Traffic Server

We have been using Varnish 3 to provide an HTTP caching layer. However, there are a few reasons I'm pondering a switch to Apache Traffic Server to provide the HTTP caching layer. I'm not entirely convinced of making the switch, so I wanted to start an issue to track some of the potential pros/cons.

First a bit of background:

  • Varnish 3 doesn't seem to handle streaming well. This gzip + chunked responses + streaming issue has bitten us. In general, it seems to be acknowledged that Varnish 3's streaming capabilities aren't perfect.
  • We want streaming, but with Varnish 3 in the stack, it doesn't seem like we can reliably provide streaming responses.
  • Varnish 4 streams everything by default, so these issues do seem to go away with Varnish 4.

So why think about Apache Traffic Server instead of just upgrading to Varnish 4? After doing more specific testing, there have been a few things that have come up:

  • The most concrete issue I discovered is that Varnish appears to be hardcoded to retry requests if the backend initially times out. See discussion 1 and discussion 2. I don't like this behavior for GET requests (if a server is slow to respond, I don't really want to make things worse and retry the same slow query), but this seems particularly dangerous for POST requests. The only way to prevent this from happening in Varnish is to disable keepalive, which I don't want to do. Apache Traffic Server also behaves this way by default, but its configuration allows it to be disabled.
  • The other reasons for considering Traffic Server are admittedly more nebulous and less well researched:
    • Varnish 4 does fix our previously seen streaming issue, but I've still seen some strange behavior under Varnish 4 where occasionally chunked responses get returned as non-chunked by Varnish. It's extremely sporadic, so maybe it's just a weird timing issue (maybe the whole response gets read more quickly than it can be delivered, so it just skips attempting a chunked response and delivers it non-chunked). The response body is correct under Varnish 4, but I just find it a little concerning that Varnish is sporadically changing the behavior of the backend's response.
    • Traffic Server seems to have more of a focus on being as transparent of a proxy as possible, whereas that's less of a priority for Varnish (at least that's my take after reading various docs and mailing list threads). There are logical reasons for Varnish's stance on not trying to be a compliant proxy, but since we have less control over our potential backend servers, it makes Traffic Server's stance a little more appealing.
    • I've always had a finicky time with putting Varnish's shared memory log on a tmpfs partition, as suggested. This is probably more of an oversight on my part, but the tmpfs approach has always led to varnish restarts sporadically not working because it doesn't clear the tmpfs partition before trying to start the varnish server again (so it runs out of space). It's probably solvable, I've just been pleasantly surprised how easy Traffic Server is to get running without tuning like this. Removing the need to have a tmpfs partition should also make it easier to install our entire stack.
    • Traffic Server benchmarks: I've seen a few different benchmark sets that seem to indicate Traffic Server performs better and is more efficient than Varnish. I'd definitely like to do our own benchmarking before making this a reason, but it seems promising.

So, reasons to stick with Varnish?

  • VCL: The configuration flexibility VCL gives is quite amazing. It's allowed us to implement things like Surrogate-Control support just using the config file.
  • Surrogate-Control: See above. We'd have to figure out how to implement Surrogate-Control in Traffic Server without something like VCL. I think it might be possible with some creative header manipulation, but it's something we'd need to figure out before seriously considering Traffic Server. Worst case scenario is writing a C extension.
  • Plugins: Traffic Server also has plugins, but Varnish's plugin ecosystem seems much bigger. For example, I'd like to add something like boltsort (see #32). There are a couple of different plugin options to do such things in Varnish, but none available for Traffic Server.
  • Community: Varnish's community does seem bigger. Traffic Server has several large players, and seems to be growing, but Varnish seems to have more mindshare and users at this point.

So there are definitely aspects of Traffic Server that tempt me to switch, but Varnish is also widely used and a fantastic piece of software. While I mull this over, I'll continue to update this thread, but I at least wanted to get some documentation out there on why we might eventually pick one or the other.

Intermittent mongo connection drops

The gatekeeper queries Mongo to verify an API key is valid. This query sometimes fails to occur, which leads to a user being denied even if they supplied a valid key. The problem only cropped up rarely, so it wasn't widely seen, which explains why it's only been discovered now. I have an ugly workaround currently implemented that seems to address the problem, but this deserves more investigation, since the workaround is not ideal and not performant.

What's happening: each gatekeeper proxy process holds open a persistent connection to Mongo that gets re-used across all the requests served by that process. The issue crops up when that persistent connection is randomly terminated. There's then a brief period of time when the mongo client isn't aware that it has a terminated connection, so its subsequent queries fail until the connection is re-established.

This may be related to the hosting environment and network or firewall settings that lead to the disconnects: https://support.mongolab.com/entries/23009358-handling-dropped-connections-on-windows-azure However, my attempts at fixing it with keepalive settings have been unsuccessful. The network nature of the problem also probably explains why this has never cropped up in unit tests or other local environments where mongo is on the same machine as the gatekeeper.

The workaround I have in place (see NREL/api-umbrella-gatekeeper@ff9da2a and the couple subsequent commits) basically just keeps retrying the mongo query every 50 ms for up to 100 times. Some retry mechanism may be needed, but the number of retries currently necessary makes no sense to me. In one environment where this is a problem, I can see it make up to 60 or 70 retries before finally succeeding. With the wait time in between each retry, this adds a somewhat significant amount of time to the request if a user happens to be the super-unlucky one to hit it when this connection drops.
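
For reference, the shape of that workaround, expressed as a generic Lua sketch rather than the gatekeeper's actual Node.js code:

local MAX_ATTEMPTS = 100
local RETRY_DELAY = 0.05 -- 50 ms between attempts

-- Retry a query a bounded number of times with a short sleep between
-- attempts. Assumes an OpenResty context where ngx.sleep yields without
-- blocking the worker.
local function query_with_retries(run_query)
  local result, err
  for attempt = 1, MAX_ATTEMPTS do
    result, err = run_query()
    if result then
      return result
    end
    ngx.sleep(RETRY_DELAY)
  end
  return nil, err
end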

An API for external API key management

There's a simple API that already exists in the web project to validate API keys and sign up for new keys. This should be extended and formalized to allow for complete API key management externally. The administrative functions obviously need to be locked down. Things this API needs to be able to do (a rough sketch of such a call follows this list):

  • API key/user creation
  • Updating API key rate limits
  • Assigning API key roles
  • Disabling an API key
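
A rough sketch (using the lua-resty-http client) of what an external key-management call could look like; the endpoint path, auth header, and payload fields are hypothetical placeholders, not a documented API:

local http = require "resty.http"
local cjson = require "cjson"

-- Create a new API user/key via a hypothetical admin endpoint.
local function create_api_user(admin_token, user)
  local httpc = http.new()
  local res, err = httpc:request_uri("https://example.com/admin/api/users", {
    method = "POST",
    body = cjson.encode({ user = user }),
    headers = {
      ["Content-Type"] = "application/json",
      ["X-Admin-Auth-Token"] = admin_token, -- hypothetical auth header
    },
    ssl_verify = true,
  })
  if not res then
    return nil, err
  end
  return cjson.decode(res.body)
end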

Make installation process easier and quicker

The process for installing API Umbrella could use some improvement to make it easier and quicker. The general plan is to offer self-contained RPMs and debs, so we have simple one-click installers for installing API Umbrella and all of its dependencies.

Work has already started on this in the omnibus-api-umbrella project. This uses the omnibus packaging system the Opscode folks came up with to easily create self-contained package installers across different OSes. The package installer will install all the dependencies, and then there's some initial work on the config branch to introduce a single command line utility, api-umbrella-ctl, that will run all the necessary dependencies on demand. The general idea is that once this work is done, running API Umbrella will be as simple as installing a package for your OS and then running something like api-umbrella-ctl start.

For a bit of background on why this has become an issue: the installation process has become quite dependent on our custom Chef scripts. The main problems with this are that it requires users to use Chef (which obviously not everyone uses), they must use some of our custom Chef cookbooks (and how we need something like nginx configured may not jibe with how someone else wants to install nginx on other systems Chef manages), and our cookbooks only target RHEL/CentOS 6. So while we can automate this for development purposes using our Vagrant setup, it's not ideal for people wanting to deploy this to different types of production environments. Plus, all of these changes should also make the Vagrant setup process go much faster, since everything will come bundled in the binary package installers.

Docker could also be a good candidate for simplifying this kind of thing, but that also places a dependency on Docker that I don't quite want to mandate (because again, not everyone uses it). However, we could certainly offer Docker images of API Umbrella in addition to the packages pretty easily once we get all this sorted out.

vagrant provisioning fails on nginx build

I'm trying to provision a clean vagrant box. The provisioning seems to first fail when it's trying to build nginx from source.

adding module in /opt/rbenv/versions/1.9.3-p484/gems/gems/passenger-4.0.23/ext/nginx
*** The Phusion Passenger support files are not yet compiled. Compiling them for you... ***
*** Running 'rake nginx CACHING=false' in /opt/rbenv/versions/1.9.3-p484/gems/gems/passenger-4.0.23/ext/nginx... ***
STDERR: /usr/lib/ruby/site_ruby/1.8/rubygems.rb:779:in 'report_activate_error': Could not find RubyGem rake (>= 0) (Gem::LoadError)
        from /usr/lib/ruby/site_ruby/1.8/rubygems.rb:214:in 'activate'
        from /usr/lib/ruby/site_ruby/1.8/rubygems.rb:1082:in 'gem'
        from /opt/chef/embedded/bin/rake:22
---- End output of "bash"  "/tmp/chef-script20140112-3380-byguq0" ----
Ran "bash"  "/tmp/chef-script20140112-3380-byguq0" returned 1

Apparently nginx fails to be built, and the provisioning finally stops at:

[2014-01-12T18:14:24+00:00] INFO: template[nginx.conf] sending reload action to service[nginx] (delayed)                                                                                                
[2014-01-12T18:14:24+00:00] ERROR: Running exception handlers                                       
...
Chef::Exceptions::Service
-------------------------
service[nginx]: unable to locate the init.d script!

Admin analytics should hide test monitoring traffic by default

We generate a fair amount of API traffic from automated remote checks (Nagios check_http requests and the like). There are ways to filter this out using the advanced query feature, but it would be helpful if this were excluded by default. It should still be possible to view the stats with this traffic included too, so it shouldn't be filtered out entirely.

We need some way to define what we consider test traffic. This should be relatively easy to configure, since it may change over time (as new monitoring services get added, change, etc.). One approach would be to simply assign a list of API keys that are considered test traffic and make sure all tests use those keys. A more complex approach might involve defining compound rules (for example, user agent contains "nagios" and the IP is X). In either case, but particularly the latter, it may be desirable to store a boolean field directly on the analytics records indicating whether or not the request was considered test traffic. This would speed up queries if it's going to be a commonly used filter.
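
As a rough sketch of how that boolean flag could be computed at logging time (every name below is hypothetical; nothing here exists in the codebase yet):

# Hypothetical sketch: decide whether a request should be flagged as test
# traffic before the analytics record is stored. The rule structure and key
# names are illustrative only.
TEST_TRAFFIC_RULES = {
  :api_keys => ["NAGIOS_KEY_1", "NAGIOS_KEY_2"],
  :compound => [
    { :user_agent => /nagios/i, :ip => "192.0.2.10" },
  ],
}

def test_traffic?(request)
  return true if TEST_TRAFFIC_RULES[:api_keys].include?(request[:api_key])

  TEST_TRAFFIC_RULES[:compound].any? do |rule|
    request[:user_agent] =~ rule[:user_agent] && request[:ip] == rule[:ip]
  end
end

# At logging time, store the flag directly on the analytics record so the
# default dashboard query can simply filter on it:
#   record[:test_traffic] = test_traffic?(request)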

Installed mongo-10gen is newer than candidate package

On vagrant up I encounter a Chef error:

================================================================================
Error executing action `install` on resource 'package[mongo-10gen]'
================================================================================


Chef::Exceptions::Package
-------------------------
Installed package mongo-10gen-2.4.6-mongodb_1 is newer than candidate package mongo-10gen-2.4.5-mongodb_1


Resource Declaration:
---------------------
# In /tmp/vagrant-chef-1/chef-solo-1/cookbooks/mongodb/recipes/default.rb

 32:   package "mongo-10gen" do
 33:     action :install
 34:     version node[:mongodb][:package_version]
 35:   end
 36: end



Compiled Resource:
------------------
# Declared in /tmp/vagrant-chef-1/chef-solo-1/cookbooks/mongodb/recipes/default.rb:32:in `from_file'

package("mongo-10gen") do
  action [:install]
  retries 0
  retry_delay 2
  package_name "mongo-10gen"
  version "2.4.5-mongodb_1"
  cookbook_name :mongodb
  recipe_name "default"
end

Let me know if there's some other info you need or how I can help.
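
One possible workaround, assuming the mongodb cookbook honors the node[:mongodb][:package_version] attribute shown in the resource above and that the Vagrantfile provisions via chef_solo, is to override the pinned version so it matches the package that's already installed (illustrative only, not a confirmed fix):

# Illustrative workaround: override the mongodb cookbook's pinned package
# version so Chef accepts the already-installed 2.4.6 build.
Vagrant.configure("2") do |config|
  config.vm.provision :chef_solo do |chef|
    chef.json = {
      :mongodb => {
        :package_version => "2.4.6-mongodb_1"
      }
    }
  end
end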

Vagrantfile error

$ vagrant
There is a syntax error in the following Vagrantfile. The syntax error
message is reproduced below for convenience:

/home/httpd/api-umbrella/Vagrantfile:28: syntax error, unexpected ':', expecting kEND
config.vm.network :forwarded_port, guest: 80, host: 8080
^
/home/httpd/api-umbrella/Vagrantfile:32: syntax error, unexpected ':', expecting kEND
config.vm.network :private_network, ip: "10.10.10.2"
^
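
That "expecting kEND" error is what a Ruby 1.8 parser produces when it hits the newer key: value hash syntax, so it looks like the Vagrantfile is being evaluated by Ruby 1.8 (for example, a gem-installed Vagrant running under the system Ruby). If switching to the official Vagrant installer or a newer Ruby isn't an option, rewriting those two lines with 1.8-compatible hash rockets should at least get past the parse error (a sketch, not a confirmed fix):

# Ruby 1.8 compatible equivalents of the two lines the parser rejects
config.vm.network :forwarded_port, :guest => 80, :host => 8080
config.vm.network :private_network, :ip => "10.10.10.2"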

"My Profile" page

Story

As a Developer
I would like a profile page to list details such as my API key, etc.
So that I can easily find my user information

Example usage

I forget my API Key, so I

  1. Log in to the API Umbrella portal
  2. Click a link to access my profile
  3. Copy the listed API key
  4. Proceed with development

Dependencies

  • Authentication
  • User details database

Installation: Berkshelf dependency error

Hi... after fighting my way through Vagrant/Berkshelf installation issues, I got as far as running vagrant up, which resulted in the following. Suggestions welcome; I'd really like to get API Umbrella up and running to evaluate.

Running on Ubuntu 13.10 x64 / Vagrant 1.5.3 / Berkshelf plugin 2.0.1... the latter two were the only combination I could get to install cleanly.

Resolving cookbook dependencies...
Fetching 'ack' from git://github.com/NREL-cookbooks/ack.git (at master)
Fetching 'acl' from git://github.com/NREL-cookbooks/acl.git (at master)
Fetching 'api-umbrella' from git://github.com/NREL-cookbooks/api-umbrella.git (at master)
Fetching 'bundler' from git://github.com/NREL-cookbooks/bundler.git (at master)
Fetching 'envbuilder' from git://github.com/theodi/chef-envbuilder.git (at master)
Fetching 'etc' from git://github.com/NREL-cookbooks/etc.git (at master)
Fetching 'geoip' from git://github.com/NREL-cookbooks/geoip.git (at master)
Fetching 'iptables' from git://github.com/NREL-cookbooks/iptables.git (at master)
Fetching 'mongodb' from git://github.com/NREL-cookbooks/mongodb.git (at master)
Fetching 'nginx' from git://github.com/NREL-cookbooks/nginx.git (at api-umbrella)
Fetching 'nodejs' from git://github.com/NREL-cookbooks/nodejs.git (at master)
Fetching 'pygments' from git://github.com/NREL-cookbooks/pygments.git (at master)
Fetching 'rbenv' from git://github.com/NREL-cookbooks/rbenv.git (at master)
Fetching 'redis' from git://github.com/NREL-cookbooks/redis.git (at master)
Fetching 'rubygems' from git://github.com/NREL-cookbooks/rubygems.git (at master)
Fetching 'shasum' from git://github.com/NREL-cookbooks/shasum.git (at master)
Fetching 'sudo' from git://github.com/NREL-cookbooks/sudo.git (at master)
Fetching 'supervisor' from git://github.com/NREL-cookbooks/supervisor.git (at master)
Fetching 'vagrant_extras' from git://github.com/NREL-cookbooks/vagrant_extras.git (at master)
Fetching 'varnish' from git://github.com/NREL-cookbooks/varnish.git (at master)
Fetching 'yum' from git://github.com/NREL-cookbooks/yum.git (at master)
Fetching cookbook index from http://api.berkshelf.com...
Berkshelf::NoSolutionError: Unable to satisfy constraints on package nginx due to solution constraint (api-umbrella = 0.2.1). Solution constraints that may result in a constraint on nginx: [(api-umbrella = 0.2.1) -> (nginx ~> 2.0.1)], [(nginx = 2.0.8)]
Demand that cannot be met: (api-umbrella = 0.2.1)
Artifacts for which there are conflicting dependencies: nginx = 2.0.8 -> [(apt ~> 2.2), (bluepill ~> 2.3), (build-essential ~> 1.4), (ohai ~> 1.1), (runit ~> 1.2), (yum < 3.0.0), (logrotate >= 0.0.0), (rbenv >= 0.0.0)],sudo = 2.5.3 -> []Unable to find a solution for demands: ack (0.2.1), acl (0.1.0), api-umbrella (0.2.1), bundler (0.1.5), chef-client (~> 3.2.0), envbuilder (0.2.0), etc (0.0.2), fail2ban (~> 2.1.2), geoip (0.1.0), iptables (0.10.2), mongodb (0.14.5), nano (~> 1.0.0), nginx (2.0.8), nodejs (1.3.0), omnibus_updater (~> 0.2.8), openssh (~> 1.3.2), pygments (0.0.2), rbenv (0.7.3), redis (0.1.8), rubygems (0.1.3), screen (~> 0.8.0), shasum (0.0.2), sudo (2.5.3), supervisor (0.4.11), vagrant_extras (0.2.2), varnish (0.9.11), vim (~> 1.1.2), yum (2.4.1), zsh (~> 1.0.0)
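
I can't tell the exact root cause from the solver output alone, but one thing that might be worth trying (purely a guess, not a confirmed fix) is dropping the explicit nginx version pin in the Berksfile and letting the api-umbrella cookbook's own constraint drive the resolution, while keeping the git source it already uses:

# Berksfile sketch: keep the NREL nginx fork but remove the hard version pin
# so the resolver has more room to find a consistent set of cookbooks.
cookbook "nginx",
  :git    => "git://github.com/NREL-cookbooks/nginx.git",
  :branch => "api-umbrella"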

GoogleOauth2: "Invalid credentials".

When configuring the Google settings for API Umbrella, I get the following error:

Could not authenticate you from GoogleOauth2 because "Invalid credentials".

The user email address is in the initial_superusers list, and the Google settings for client_id and client_secret seem correct.

On the Google configuration end of things, what should I put for:

  • Redirect URIs
  • Javascript Origins

I currently have:

  • Redirect URIs : <domain.com>/admins/auth/google_oauth2/callback
  • Javascript Origins: <domain.com>

Allow sorting of API backend configurations

In the admin tool, when setting up API backends, the ordering of the backends can matter for URL prefix matching. If one backend claims /something/ and another backend claims /something/else/, the latter backend must come first in the API backend ordering so it can be matched.

We do have sorting within an individual API backend's configuration, but we need something similar for the API backends themselves. The database already has a sort order field and orders by it, so we really just need to expose a drag-and-drop interface in the admin for sorting the backends.
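
A simplified illustration of the first-match behavior described above (not the actual gatekeeper code):

# Simplified illustration of first-match prefix routing. With this ordering,
# requests to /something/else/... reach the more specific backend; if the
# order were reversed, /something/ would swallow everything.
BACKENDS = [
  { :prefix => "/something/else/", :server => "http://else.internal" },
  { :prefix => "/something/",      :server => "http://something.internal" },
]

def match_backend(path)
  BACKENDS.find { |backend| path.start_with?(backend[:prefix]) }
end

match_backend("/something/else/foo") # => the else.internal backend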

Invalid API backend domain names prevent router from being restarted

If an API backend is defined using an invalid domain name, nginx cannot reload or start up. There are a couple of scenarios where this can crop up:

Scenario 1: A new API backend gets defined using an invalid domain name and then the configuration gets published. In this case, the new nginx configuration file gets written to disk with the invalid domain, but the nginx reload won't take effect because the configuration file test failed first (with a "host not found in upstream" error).

There are a few problems here:

  • The admin was able to save the backend with an invalid domain entry. We should probably validate that these domains resolve inside the web admin when saving (a quick sketch follows this list).
  • There's no indication that the nginx reload failed because the configtest failed. Nginx keeps running since the reload was never actually performed, but the admin user doesn't know this unless they happen to be looking at the server log files. This is harder to fix, since we don't really have a two-way communication mechanism between the web app and the gatekeeper when publishing changes. That would be nice long term (maybe using ZooKeeper), but in the short term the easier option is to try to ensure the reloads never fail (which validating the domains up front would probably accomplish 99% of the time).
  • The configuration file with the invalid domain name was written to disk and still remains the active config file, despite the fact that the configtest action failed. This causes problems if the server gets rebooted or nginx restarts completely. In those cases, nginx will fail to start altogether, because it can't resolve the domain.
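
A minimal sketch of that up-front validation idea, assuming it lives in the web admin and uses Ruby's standard Resolv library (the helper name is hypothetical):

require "resolv"

# Hypothetical helper for the web admin: returns true only if the backend's
# server host currently resolves, so an unresolvable domain can be rejected
# before an nginx config referencing it is ever written to disk.
def resolvable_host?(host)
  Resolv.getaddress(host)
  true
rescue Resolv::ResolvError
  false
end

resolvable_host?("example.com")          # => true (assuming DNS is reachable)
resolvable_host?("no-such-host.invalid") # => false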

Scenario 2: An API backend is defined using a domain name that resolved at one point, but then ceases to resolve at a later date. The bigger issue here is related to the last item above. While nginx is still running, things will be okay (that specific API backend will be the only thing that fails, as expected). However, further reloads can't be performed, and if the server or nginx needs to restart completely, nginx will fail to start since it has a config file with a domain that can't be resolved.

This one's a bit harder to fix, but it can happen (during the government shutdown, for example, one API backend domain disappeared from DNS completely). One thought is to always resolve the domains ourselves and use the resulting IP addresses, but that leads to other issues when DNS records change. Another thought is to use some type of local DNS cache or resolver on the server, which would at least ensure domains keep resolving to something, even if it's no longer valid (not sure how feasible this is, but I've used dnsmasq for local DNS caching in the past).

The DNS issues are semi-related to #4, so it might be worth considering them together.
