getsentry / freight

Freight is a service which aims to make application deployments better.

Home Page: https://freight.readthedocs.io

License: Apache License 2.0

Makefile 0.21% Shell 0.24% Python 75.20% Mako 0.11% JavaScript 17.53% HTML 0.07% Dockerfile 1.06% Less 4.23% Ruby 1.35%

freight's Introduction

Freight

This project is a work in progress and is not yet intended to be production ready.

This service is intended to augment your existing deployment processes. It should improve on what you may already have, or help you fill in what you're missing.

The overarching goal of the system is to provide easy manual and automated deploys, with a consistent central view of the world. It's heavily inspired by GitHub's processes (and its Heaven project) as well as personal experiences of internal tools from members of the Sentry team.

It's not designed to replace something like Heroku, or other PaaS services, but rather to work with your existing processes, no matter what they are.

Current Features

  • Works behind a firewall (no inbound traffic)
  • Multiple applications. All configuration is unique per application
  • Per-environment deployments (i.e. different versions on staging and production)
  • Workspace management (i.e. your deploy command may generate local artifacts, which should be cleaned up)
  • Support for at least Fabric-based (simple shell commands) deploys
  • API-accessible deploy logs
  • Hubot integration (starting deploys)
  • Slack integration (notifying when deploys start/finish/fail)
  • Sentry integration (release tracking, error reporting)
  • Integration with GitHub status checks (i.e. did Circle CI pass on sha XXX)
  • A GUI to get an overview of deploy status and history

Roadmap

What's coming up:

V0

  • Release state management (know what versions are active where, and provide a historical view)
  • Environment locking (i.e. prevent people from deploying to an environment)
  • Automatic deploys (i.e. by looking for VCS changes)
  • Actions within the GUI (deploy, cancel)

V1

  • Deploy queue (i.e. cramer queued sha XXX, armin queued sha YYY)

V2 and Beyond

Machine-consistency service

We could run a service on each machine that would check-in with the master. This would record the current version of the application. The service would be configured with a set of apps (their environment info, how to get app version). The service could also be aware of "how do I deploy a version" which could assist in pull-based deploys.
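A minimal sketch of such an agent's check-in payload (the app names, version commands, and payload shape are all assumptions, not an existing Freight protocol):

```python
import json

# Hypothetical per-machine agent config: which apps live here and how to
# discover their running version. Everything below is illustrative.
APPS = {
    "getsentry": {"environment": "production", "version_cmd": "git rev-parse HEAD"},
}

def build_checkin(hostname, apps, get_version):
    """Collect the current version of each configured app on this machine."""
    return {
        "host": hostname,
        "apps": [
            {
                "name": name,
                "environment": cfg["environment"],
                "version": get_version(cfg["version_cmd"]),
            }
            for name, cfg in apps.items()
        ],
    }

# A real agent would shell out (e.g. subprocess.check_output) for the
# version; we stub it here so the sketch is self-contained.
payload = build_checkin("web-1", APPS, get_version=lambda cmd: "a30331")
print(json.dumps(payload, sort_keys=True))
```

A real agent would POST this payload to the master on an interval, which is also the natural hook for pull-based deploys.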

freight's People

Contributors

asottile-sentry, beezz, billyvg, cameronmcefee, ckj, dcramer, dependabot[bot], egsy, evanpurkhiser, evralston, jamesftw, jasonious, jkimbo, joshuarli, jtcunning, marksteve, mattgauntseo-sentry, mattrobenolt, maxbittker, mitsuhiko, nampnq, oioki, rahul-kumar-saini, robindaugherty, rshk, scttcper, thoas, tkuijer, tonyo, zylphrex


freight's Issues

Heroku template does not launch

It fails every time with this output:

Detected 512 MB available memory, 512 MB limit per process (WEB_MEMORY)
Recommending WEB_CONCURRENCY=1
+ alembic upgrade head

500 error on heroku

When resolving auth, the Python version string embedded in the output breaks the app:

Invalid header value 'ds/0.0.0 (python 2.7.10 (default, May 27 2015, 20:38:41) \n[GCC 4.8.2])'
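The root cause is that sys.version contains a newline, which is illegal inside an HTTP header value. A hedged sketch of a fix (the ds/0.0.0 prefix is taken from the error above; the sanitizing approach itself is an assumption):

```python
import sys

# sys.version spans multiple lines; collapse all whitespace to single
# spaces before embedding it in a header value.
python_version = " ".join(sys.version.split())
user_agent = "ds/0.0.0 (python %s)" % python_version

# Header values must be a single line.
assert "\n" not in user_agent
print(user_agent)
```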

Deploys stalling

It's unclear what's causing this, but my theories are:

  1. Something in the underlying task worker is failing
  2. Something is causing a deadlock and completely hanging the execution

Can't introspect because it's on Heroku. The one thing I do know is that canceling the task correctly shows the message, which means the execute_task handler is still functional and responsive, which means the Celery worker should be working fine.

The only real way to recover atm is to ps:scale worker=0 and then bring them back up.

Error: socket hang up

We're running freight + hubot, and a few times a day, Hubot reports the following error trying to speak to the freight web process:

Error: socket hang up

We are using a supervisor to keep freight going, and the run environment looks something like this:

export PATH=~/.virtualenvs/freight/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
export PYTHONPATH=~/.virtualenvs/freight/lib/python2.7/site-packages/
export FREIGHT_CONF=~/freight/freight.conf.py
~/.virtualenvs/freight/bin/python ~/freight/bin/web --addr :5000 --debug

Unfortunately, there is nothing in the logs that's different from when freight is working normally. So far, the only fix we've found is to just restart the freight webserver.

Do you have any thoughts on why this might be happening?

Null read_timeout

worker_1 | TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'
worker_1 | Traceback (most recent call last):
worker_1 |   File "/usr/local/lib/python2.7/site-packages/rq/worker.py", line 568, in perform_job
worker_1 |     rv = job.perform()
worker_1 |   File "/usr/local/lib/python2.7/site-packages/rq/job.py", line 495, in perform
worker_1 |     self._result = self.func(*self.args, **self.kwargs)
worker_1 |   File "/usr/src/app/freight/queue.py", line 58, in inner
worker_1 |     rv = func(*args, **kwargs)
worker_1 |   File "/usr/src/app/freight/jobs/execute_task.py", line 48, in execute_task
worker_1 |     taskrunner.wait()
worker_1 |   File "/usr/src/app/freight/jobs/execute_task.py", line 251, in wait
worker_1 |     elif self._logreporter.last_recv and self._logreporter.last_recv < time() - self.read_timeout:
worker_1 | TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'
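A hedged sketch of a guard for the failing comparison (the attribute names are lifted from the traceback; the class itself is a minimal stand-in, not Freight's actual code):

```python
import time

class TaskRunner:
    """Minimal stand-in for the wait loop in execute_task.py."""
    def __init__(self, read_timeout=None, last_recv=None):
        self.read_timeout = read_timeout
        self.last_recv = last_recv

    def is_read_timed_out(self):
        # Only compare when a timeout is actually configured, avoiding
        # the "float - NoneType" TypeError from the traceback.
        if self.read_timeout is None or self.last_recv is None:
            return False
        return self.last_recv < time.time() - self.read_timeout

runner = TaskRunner(read_timeout=None, last_recv=time.time())
print(runner.is_read_timed_out())  # no TypeError when read_timeout is None
```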

Scheduled Deploys

Refs GH-3

We want to solve the following:

  • Deploy SHA when the checks pass
  • Automatically schedule deploys for changes to REF

There are a few considerations that we need to resolve:

  • When I schedule a deploy, can I still deploy other refs? The easiest solution is to not allow it, and to make a scheduled deploy act the same as a normal deploy, meaning it takes out a lock until it either deploys or the checks fail.
  • When I schedule a deploy of a ref (i.e. auto deploy master), do I always try to deploy latest, and wait for a green on latest? Or do I simply schedule a deploy of SHA, thus deploying every commit (and removing batching). Removing batching isn't ideal here, so this is a big question.

Mark deployment as bad/failed after the fact

While the deployment script may have succeeded, a deployment may have catastrophic consequences. Being able to mark a deployment as bad helps communicate this and prevents operators from rolling back into a deployment that is known to cause issues.

When marking the deployment as bad, it would thus be useful to also include some info, like a link to a Sentry issue or a text comment explaining why the deployment is bad.

Break out status checks into first class state

Right now status checks happen simply as part of the build (thus reporting in the build log). Let's break these out into a state (i.e. "waiting on tests") which will let us improve the UX around this behavior.

Improve Notifications

With the latest change around queueing there's some awful UX.

hubot deploy getsentry
.... wait until something becomes available. ...
[ Deploy starting message ]

The best solution I can come up with is to wait 5s and then send a notification, whether it's queued or started.

Record logs

These should be buffered. We need to determine if Postgres is efficient at string concatenation; if so, we can just do UPDATE log SET text = text || '...'. If not, we'll store chunks similar to how things are implemented in https://github.com/dropbox/changes
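The chunked alternative can be sketched as follows (in-memory stand-ins for the table rows; the row shape is an assumption):

```python
# Each chunk records its byte offset; the full log is reassembled by
# ordering on offset, so the API can stream partial logs cheaply.
def append_chunk(chunks, text):
    offset = sum(len(c["text"]) for c in chunks)
    chunks.append({"offset": offset, "text": text})

def read_log(chunks):
    return "".join(c["text"] for c in sorted(chunks, key=lambda c: c["offset"]))

chunks = []
append_chunk(chunks, ">> Running deploy\n")
append_chunk(chunks, "done\n")
print(read_log(chunks))
```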

Basic React frontend

Drop in Sentry's webpack-based configuration so we can have the beginnings of a web frontend to the API.

Bad log formatting for failed deployment

I tried to make a deployment but it failed.
The red text is not easy to read since all lines are joined.

I suggest either just placing a link to the log or doing better formatting. The former is less work.

Restart a cancelled/failed task

Currently, there is no way to restart a cancelled or a failed task.

We can implement this feature by using the same pattern as the task cancellation.

Opinions?

Support queued deploys

It's not clear what the best strategy for this is.

Right now the implementation for v0 is designed to say "you cannot execute a task on an environment that already has a task executing". For v1 we want to allow you to enqueue tasks, specifically targeting the auto-deploy use case.

The case is a bit complicated, as generally it will be "I want to deploy the latest sha in ref that is passing checks". This means we'd need to enqueue an "I have a pending deploy once you're done" task, but that task would not yet have a sha associated with it, and it couldn't be as simple as "master", as that branch may have newer commits and the latest commit may not have passed checks.

Requesting more deployments does not start a new GCP build

I deployed a change and tried to deploy it a few times, since the output showed an intermittent issue.
The right solution was to re-run the GCP build and then request a new Freight deployment.

IMHO we should either encourage the user to re-run their build first OR make new Freight deployments trigger new GCP builds.

I checked sentry and getsentry commits to see anything in the CI failing, however, it was successful on both repos.


Postgres integrity errors

Encountering the same issue as #52 while running the current master.

>> Running ['git', 'fetch', '--all', '-p']
Fetching origin
Raven is not configured (logging is disabled). Please see the documentation for more information.
[2015-12-08 22:41:23,950: WARNING/Worker-2] Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/srv/freight/releases/609f306b8d413505c0ab5a68168d55f8390b524d/freight/freight/tasks/execute_task.py", line 118, in run
    self._run()
  File "/srv/freight/releases/609f306b8d413505c0ab5a68168d55f8390b524d/freight/freight/tasks/execute_task.py", line 143, in _run
    self.save_chunk(result[:newline_pos])
  File "/srv/freight/releases/609f306b8d413505c0ab5a68168d55f8390b524d/freight/freight/tasks/execute_task.py", line 110, in save_chunk
    db.session.commit()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 150, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 801, in commit
    self.transaction.commit()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
    self._prepare_impl()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
    self.session.flush()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2015, in flush
    self._flush(objects)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2133, in _flush
    transaction.rollback(_capture_exception=True)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2097, in _flush
    flush_context.execute()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 174, in save_obj
    mapper, table, insert)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 785, in _emit_insert_statements
    execute(statement, params)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "unq_logchunk_source_offset"
DETAIL:  Key (task_id, "offset")=(8, 0) already exists.
 [SQL: 'INSERT INTO logchunk (task_id, "offset", size, text, date_created) VALUES (%(task_id)s, %(offset)s, %(size)s, %(text)s, %(date_created)s) RETURNING logchunk.id'] [parameters: {'date_created': datetime.datetime(2015, 12, 8, 22, 41, 23, 949021), 'text': u"Raven is not configured (logging is disabled). Please see the documentation for more information.\n>> Running ['git', 'fetch', '--all', '-p']\nFetching origin\nRaven is not configured (logging is disabled). Please see the documentation for more information.\n", 'offset': 0, 'task_id': 8, 'size': 255}]
[2015-12-08 22:43:33,367: WARNING/Worker-2] Raven is not configured (logging is disabled). Please see the documentation for more information.

Postgres integrity errors in LogReporter

I'm getting lots of errors from the LogReporter when it's trying to save a LogChunk due to unique constraints. Example:

IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "unq_logchunk_source_offset"
DETAIL:  Key (task_id, "offset")=(8, 0) already exists.
 [SQL: 'INSERT INTO logchunk (task_id, "offset", size, text, date_created) VALUES (%(task_id)s, %(offset)s, %(size)s, %(text)s, %(date_created)s) RETURNING logchunk.id'] [parameters: {'text': u"Raven is not configured (logging is disabled). Please see the documentation for more information.\n>> Running ['git', 'fetch', '--all', '-p']\nFetching origin\n", 'date_created': datetime.datetime(2015, 10, 27, 11, 1, 58, 679882), 'offset': 0, 'task_id': 8, 'size': 157}]

The offending code block seems to be:

def save_chunk(self, text):
    # we also want to pipe this to stdout
    sys.stdout.write(text)
    text = text.decode('utf-8', 'replace')
    text_len = len(text)
    db.session.add(LogChunk(
        task_id=self.task_id,
        text=text,
        offset=self.cur_offset,
        size=text_len,
    ))
    # we commit immediately to ensure the API can stream logs
    db.session.commit()
    self.cur_offset += text_len

I'm not very familiar with threads, but it seems that self.cur_offset is not being incremented correctly, which is leading to the key error. Can anyone with more context help?
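If the race is on cur_offset, one hedged fix is to allocate offsets under a lock, so two threads can never claim the same offset (the class and names here are illustrative, not Freight's actual code):

```python
import threading

class OffsetCounter:
    """Serialize offset allocation across writer threads."""
    def __init__(self):
        self.cur_offset = 0
        self._lock = threading.Lock()

    def claim(self, text_len):
        # Atomically reserve the range [cur_offset, cur_offset + text_len).
        with self._lock:
            offset = self.cur_offset
            self.cur_offset += text_len
            return offset

counter = OffsetCounter()
offsets = []

def worker():
    offsets.append(counter.claim(10))

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(offsets))  # five distinct, non-overlapping offsets
```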

Deploy Queue and Auto Deploy

The end goal is to have a way to track branches and automatically deploy whenever new commits land on them. However, the same problem arises if two commits are scheduled by any means. So to isolate the problem, we want the following:

  • queue a commit by enqueueing a commit sha
    • this can happen based on a hook
    • alternatively manually
  • when multiple commits are queued up, we skip irrelevant commits to the same target

The proposal is to have a deploy stack for each target (production, staging etc.).

  • When a new commit should be deployed, it's added to the queue.
  • If it's a named target (like a branch), the target is resolved immediately into a commit hash.
  • It's pushed to the top of the stack for the target.
  • Items can be removed from anywhere within the stack to revoke a pending deploy (a stack with benefits).

Independently of this there is the deploy logic:

  • There is a system that monitors the stack for each target and always deploys top.
  • Once the deploy is done it looks at the stack and if it's not empty, it deploys the top again.
  • At any point any stack items older than the last deploy are ignored.
  • In an ideal situation this ends up in an empty stack.

Example:

DEPLOY A (stack was empty, deploy starts)
DEPLOY B
DEPLOY C
DEPLOY D
DEPLOY E
STACK state: B C D [E] (E is top, A was already removed immediately)

Once A finished deploying, E is the top of the stack, as it's newer than the deploy of A. When E is done, D, C, and B are removed from the stack as they are too far in the past timestamp-wise. If E had been revoked before being deployed, D would have been deployed as expected.
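The stack semantics described above can be sketched as follows (illustrative Python, not Freight code):

```python
class DeployStack:
    """Per-target deploy stack: deploy the newest request, drop stale ones."""
    def __init__(self):
        self.items = []

    def push(self, sha):
        # Top of the stack is always the most recent request.
        self.items.append(sha)

    def revoke(self, sha):
        # Items can be removed from anywhere within the stack.
        self.items.remove(sha)

    def pop_next(self):
        # Deploy the top; everything older is discarded as stale.
        if not self.items:
            return None
        top = self.items.pop()
        self.items.clear()
        return top

stack = DeployStack()
for sha in ["B", "C", "D", "E"]:  # A deployed immediately, never stacked
    stack.push(sha)
print(stack.pop_next())  # E deploys; B, C, D are dropped as stale
```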

Docker container fails to startup

+ alembic upgrade head
Traceback (most recent call last):
  File "/usr/local/bin/alembic", line 9, in <module>
    load_entry_point('alembic==0.7.7', 'console_scripts', 'alembic')()
  File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 439, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 433, in main
    self.run_cmd(cfg, options)
  File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 416, in run_cmd
    **dict((k, getattr(options, k)) for k in kwarg)
  File "/usr/local/lib/python2.7/site-packages/alembic/command.py", line 165, in upgrade
    script.run_env()
  File "/usr/local/lib/python2.7/site-packages/alembic/script.py", line 390, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/usr/local/lib/python2.7/site-packages/alembic/util.py", line 244, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/local/lib/python2.7/site-packages/alembic/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "migrations/env.py", line 23, in <module>
    app = create_app()
  File "/usr/src/app/freight/config.py", line 149, in create_app
    configure_api(app)
  File "/usr/src/app/freight/config.py", line 160, in configure_api
    from freight.api.app_details import AppDetailsApiView
  File "/usr/src/app/freight/api/app_details.py", line 7, in <module>
    from freight.api.base import ApiView
  File "/usr/src/app/freight/api/base.py", line 11, in <module>
    from freight.utils.auth import get_current_user
  File "/usr/src/app/freight/utils/auth.py", line 7, in <module>
    from freight.testutils.fixtures import Fixtures
  File "/usr/src/app/freight/testutils/__init__.py", line 6, in <module>
    import_submodules(locals(), __name__, __path__)
  File "/usr/src/app/freight/utils/imports.py", line 13, in import_submodules
    module = loader.find_module(name).load_module(name)
  File "/usr/local/lib/python2.7/pkgutil.py", line 246, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.etc)
  File "/usr/src/app/freight/testutils/cases.py", line 6, in <module>
    import pytest
ImportError: No module named pytest

Multiple ssh keys?

We want to use freight to deploy multiple repos, and we'd also like to use GitHub deploy keys to give freight access to the repositories, since it seems like a perfect fit! However, deploy keys can only be used to access a single repo, so any thoughts about allowing freight to use multiple ssh keys?

Ability to preview commits to be deployed, and email authors that their stuff is going out

Given an incident in Sentry, I think it would be nice if hitting deploy sent an email to all commit authors before executing, to give them a little bit of time to sprint to the computer and watch dashboards.

We have something like this in Sentry releases, but the way we set it up currently only emails after the deploy has gone out.

Parts of this (the emailing part) might make sense to actually implement in Sentry releases, not Freight.

Implement an artifact build/push task.

Freight currently has just a notion of "deploy".

Being able to build an artifact and push it to a repository would significantly decrease the time spent on a lot of bespoke deployments.

Clean up old tasks/logs

Our storage is only additive, with nothing ever being removed. There's little value in retaining logs and deploy statuses from more than a month ago. We should add a script like bin/cleanup --days=30 to purge old data.

Or possibly by number of jobs rather than number of days; say, maintain a trailing record of the past 100.
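A sketch of the two retention policies (bin/cleanup doesn't exist yet; the function and flag names below just mirror the proposal):

```python
import datetime

def select_stale(tasks, days=None, keep_last=None):
    """tasks: list of (task_id, created_at) tuples; returns ids to purge.

    Either purge everything older than `days`, or keep only the most
    recent `keep_last` jobs.
    """
    tasks = sorted(tasks, key=lambda t: t[1], reverse=True)
    if keep_last is not None:
        return [tid for tid, _ in tasks[keep_last:]]
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=days)
    return [tid for tid, created in tasks if created < cutoff]

now = datetime.datetime.utcnow()
tasks = [(1, now - datetime.timedelta(days=60)), (2, now)]
print(select_stale(tasks, days=30))      # [1]
print(select_stale(tasks, keep_last=1))  # [1]
```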

Standardize task arguments

Task IDs:
app-name/env-name#task_id
i.e. getsentry/production#1

App versions:
app-name:version
i.e. getsentry:master or getsentry:a30331

This would make a lot of things more usable, and it's still very human readable. For example, in freight-cli:

freight tail getsentry/production#1

We could also default the environment like we do elsewhere:

freight tail getsentry#1

This should be clearly documented so it's obvious that all features should be built with this in mind, especially the API (and its responses).
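A sketch of a parser for this identifier scheme (the function name and the default-environment behavior are assumptions):

```python
def parse_task_id(value, default_env="production"):
    """Parse 'app-name/env-name#task_id', defaulting env when omitted."""
    app_env, _, task_id = value.partition("#")
    app, _, env = app_env.partition("/")
    return app, env or default_env, int(task_id)

print(parse_task_id("getsentry/production#1"))  # ('getsentry', 'production', 1)
print(parse_task_id("getsentry#1"))             # ('getsentry', 'production', 1)
```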

Automatically select application on build pages

It'd be nice if, when you're on a build page and you click deploy again, we defaulted to selecting the current app. I've wanted this a couple of times when I deploy something, then find I made a typo and need to deploy a second time.

Deploy spawning duplicate concurrent jobs

During my investigation of #54, I found that for each deploy triggered, multiple deploy jobs were being spawned and processed by Celery.

This issue seems to have started for me with cc2bea0; the previous commit, 39f290f, works fine.

Duplicate jobs on cc2bea0: [screenshot: spawn_multiple]

Working as desired on 39f290f: [screenshot: spawn_single]

I am not too familiar with celery but happy to provide any more troubleshooting information if required.
