getsentry / freight

Freight is a service which aims to make application deployments better.

Home Page: https://freight.readthedocs.io

License: Apache License 2.0

Makefile 0.21% Shell 0.24% Python 75.20% Mako 0.11% JavaScript 17.53% HTML 0.07% Dockerfile 1.06% Less 4.23% Ruby 1.35%

freight's Introduction

Freight

This project is a work in progress and is not yet intended to be production ready.

This service is intended to augment your existing deployment processes. It should improve on what you may already have, or help you fill in what you're missing.

The overarching goal of the system is to provide easy manual and automated deploys, with a consistent central view of the world. It's heavily inspired by GitHub's processes (and its Heaven project) as well as personal experiences of internal tools from members of the Sentry team.

It's not designed to replace something like Heroku, or other PaaS services, but rather to work with your existing processes, no matter what they are.

Current Features

  • Works behind a firewall (no inbound traffic)
  • Multiple applications. All configuration is unique per application
  • Per-environment deployments (i.e. different versions on staging and production)
  • Workspace management (i.e. your deploy command may generate local artifacts, which should be cleaned up)
  • Support for at least Fabric-based (simple shell commands) deploys
  • API-accessible deploy logs
  • Hubot integration (starting deploys)
  • Slack integration (notifying when deploys start/finish/fail)
  • Sentry integration (release tracking, error reporting)
  • Integration with GitHub status checks (i.e. did Circle CI pass on sha XXX)
  • A GUI to get an overview of deploy status and history

Roadmap

What's coming up:

V0

  • Release state management (know what versions are active where, and provide a historical view)
  • Environment locking (i.e. prevent people from deploying to an environment)
  • Automatic deploys (i.e. by looking for VCS changes)
  • Actions within the GUI (deploy, cancel)

V1

  • Deploy queue (i.e. cramer queued sha XXX, armin queued sha YYY)

V2 and Beyond

Machine-consistency service

We could run a service on each machine that would check-in with the master. This would record the current version of the application. The service would be configured with a set of apps (their environment info, how to get app version). The service could also be aware of "how do I deploy a version" which could assist in pull-based deploys.
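A minimal sketch of such an agent's check-in payload (the app names, version commands, and payload shape are all assumptions, not an existing Freight protocol):

```python
import json

# Hypothetical per-machine agent config: which apps live here and how to
# discover their running version. Everything below is illustrative.
APPS = {
    "getsentry": {"environment": "production", "version_cmd": "git rev-parse HEAD"},
}

def build_checkin(hostname, apps, get_version):
    """Collect the current version of each configured app on this machine."""
    return {
        "host": hostname,
        "apps": [
            {
                "name": name,
                "environment": cfg["environment"],
                "version": get_version(cfg["version_cmd"]),
            }
            for name, cfg in apps.items()
        ],
    }

# A real agent would shell out (e.g. subprocess.check_output) for the
# version; we stub it here so the sketch is self-contained.
payload = build_checkin("web-1", APPS, get_version=lambda cmd: "a30331")
print(json.dumps(payload, sort_keys=True))
```

A real agent would POST this payload to the master on an interval, which is also the natural hook for pull-based deploys.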

freight's People

Contributors

asottile-sentry, beezz, billyvg, cameronmcefee, ckj, dcramer, dependabot[bot], egsy, evanpurkhiser, evralston, jamesftw, jasonious, jkimbo, joshuarli, jtcunning, marksteve, mattgauntseo-sentry, mattrobenolt, maxbittker, mitsuhiko, nampnq, oioki, rahul-kumar-saini, robindaugherty, rshk, scttcper, thoas, tkuijer, tonyo, zylphrex


freight's Issues

Heroku template does not launch

It fails every time with this output:

Detected 512 MB available memory, 512 MB limit per process (WEB_MEMORY)
Recommending WEB_CONCURRENCY=1
+ alembic upgrade head

500 error on heroku

When resolving auth, the Python version string embedded in the output breaks the app:

Invalid header value 'ds/0.0.0 (python 2.7.10 (default, May 27 2015, 20:38:41) \n[GCC 4.8.2])'
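The root cause is that sys.version contains a newline, which is illegal inside an HTTP header value. A hedged sketch of a fix (the ds/0.0.0 prefix is taken from the error above; the sanitizing approach itself is an assumption):

```python
import sys

# sys.version spans multiple lines; collapse all whitespace to single
# spaces before embedding it in a header value.
python_version = " ".join(sys.version.split())
user_agent = "ds/0.0.0 (python %s)" % python_version

# Header values must be a single line.
assert "\n" not in user_agent
print(user_agent)
```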

Deploys stalling

It's unclear what's causing this, but my theories are:

  1. Something in the underlying task worker is failing
  2. Something is causing a deadlock and completely hanging the execution

Can't introspect because it's on Heroku. The one thing I do know is that canceling the task correctly shows the message, which means the execute_task handler is still functional and responsive, which means the Celery worker should be working fine.

The only real way to recover atm is to ps:scale worker=0 and then bring them back up.

Error: socket hang up

We're running freight + hubot, and a few times a day, Hubot reports the following error trying to speak to the freight web process:

Error: socket hang up

We are using a supervisor to keep freight going, and the run environment looks something like this:

export PATH=~/.virtualenvs/freight/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
export PYTHONPATH=~/.virtualenvs/freight/lib/python2.7/site-packages/
export FREIGHT_CONF=~/freight/freight.conf.py
~/.virtualenvs/freight/bin/python ~/freight/bin/web --addr :5000 --debug

Unfortunately, there is nothing in the logs that's different from when freight is working normally. So far, the only fix we've found is to just restart the freight webserver.

Do you have any thoughts on why this might be happening?

Null read_timeout

worker_1 | TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'
worker_1 | Traceback (most recent call last):
worker_1 |   File "/usr/local/lib/python2.7/site-packages/rq/worker.py", line 568, in perform_job
worker_1 |     rv = job.perform()
worker_1 |   File "/usr/local/lib/python2.7/site-packages/rq/job.py", line 495, in perform
worker_1 |     self._result = self.func(*self.args, **self.kwargs)
worker_1 |   File "/usr/src/app/freight/queue.py", line 58, in inner
worker_1 |     rv = func(*args, **kwargs)
worker_1 |   File "/usr/src/app/freight/jobs/execute_task.py", line 48, in execute_task
worker_1 |     taskrunner.wait()
worker_1 |   File "/usr/src/app/freight/jobs/execute_task.py", line 251, in wait
worker_1 |     elif self._logreporter.last_recv and self._logreporter.last_recv < time() - self.read_timeout:
worker_1 | TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'
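A hedged sketch of a guard for the failing comparison (the attribute names are lifted from the traceback; the class itself is a minimal stand-in, not Freight's actual code):

```python
import time

class TaskRunner:
    """Minimal stand-in for the wait loop in execute_task.py."""
    def __init__(self, read_timeout=None, last_recv=None):
        self.read_timeout = read_timeout
        self.last_recv = last_recv

    def is_read_timed_out(self):
        # Only compare when a timeout is actually configured, avoiding
        # the "float - NoneType" TypeError from the traceback.
        if self.read_timeout is None or self.last_recv is None:
            return False
        return self.last_recv < time.time() - self.read_timeout

runner = TaskRunner(read_timeout=None, last_recv=time.time())
print(runner.is_read_timed_out())  # no TypeError when read_timeout is None
```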

Scheduled Deploys

Refs GH-3

We want to solve the following:

  • Deploy SHA when the checks pass
  • Automatically schedule deploys for changes to REF

There are a few considerations that we need to resolve:

  • When I schedule a deploy, can I still deploy other refs? The easiest solution is to not allow it, and to make a scheduled deploy act the same as a normal deploy, meaning it takes out a lock until it either deploys or the checks fail.
  • When I schedule a deploy of a ref (i.e. auto deploy master), do I always try to deploy latest, and wait for a green on latest? Or do I simply schedule a deploy of SHA, thus deploying every commit (and removing batching). Removing batching isn't ideal here, so this is a big question.

Mark deployment as bad/failed after the fact

While the deployment script may have succeeded, a deployment may have catastrophic consequences. Being able to mark a deployment as bad helps communicate this and prevents operators from rolling back into a deployment that is known to cause issues.

When marking the deployment as bad, it would thus be useful to also include some info, like a link to a Sentry issue or a text comment explaining why the deployment is bad.

Break out status checks into first class state

Right now status checks happen simply as part of the build (thus reporting in the build log). Let's break these out into a state (i.e. "waiting on tests") which will let us improve the UX around this behavior.

Improve Notifications

With the latest change around queueing there's some awful UX.

hubot deploy getsentry
.... wait until something becomes available. ...
[ Deploy starting message ]

The best solution I can come up with is to wait 5s and then send a notification, whether it's queued or started.

Record logs

These should be buffered. We need to determine if Postgres is efficient at string concatenation; if so, we can just do UPDATE log SET text = text || '...'. If not, we'll store chunks similar to how things are implemented in https://github.com/dropbox/changes
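The chunked alternative can be sketched as follows (in-memory stand-ins for the table rows; the row shape is an assumption):

```python
# Each chunk records its byte offset; the full log is reassembled by
# ordering on offset, so the API can stream partial logs cheaply.
def append_chunk(chunks, text):
    offset = sum(len(c["text"]) for c in chunks)
    chunks.append({"offset": offset, "text": text})

def read_log(chunks):
    return "".join(c["text"] for c in sorted(chunks, key=lambda c: c["offset"]))

chunks = []
append_chunk(chunks, ">> Running deploy\n")
append_chunk(chunks, "done\n")
print(read_log(chunks))
```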

Basic React frontend

Drop in Sentry's webpack-based configuration so we can have the beginnings of a web frontend to the API.

Bad log formatting for failed deployment

I tried to make a deployment but it failed.
The red text is not easy to read since all lines are joined.

I suggest either just placing a link to the log or doing better formatting. The former is less work.

Restart a cancelled/failed task

Currently, there is no way to restart a cancelled or a failed task.

We can implement this feature by using the same pattern as the task cancellation.

Opinions?

Support queued deploys

It's not clear what the best strategy for this is.

Right now the implementation for v0 is designed to say "you cannot execute a task on an environment that already has a task executing". For v1 we want to allow you to enqueue tasks, specifically targeting the auto-deploy use case.

The case is a bit complicated, as generally it will be "I want to deploy the latest sha in ref that is passing checks". This means we'd need to enqueue an "I have a pending deploy once you're done" task, but that task would not yet have a sha associated with it, and it couldn't be as simple as "master", as that branch may have newer commits and the latest commit may not have passed checks.

Requesting more deployments does not start a new GCP build

I deployed a change and tried to deploy it a few times, since the output showed an intermittent issue.
The right solution was to re-run the GCP build and then request a new Freight deployment.

IMHO we should either encourage the user to re-run their build first OR make new Freight deployments trigger new GCP builds.

I checked sentry and getsentry commits to see anything in the CI failing, however, it was successful on both repos.


Postgres integrity errors

Encountering the same issue as #52 while running the current master.

>> Running ['git', 'fetch', '--all', '-p']
Fetching origin
Raven is not configured (logging is disabled). Please see the documentation for more information.
[2015-12-08 22:41:23,950: WARNING/Worker-2] Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/srv/freight/releases/609f306b8d413505c0ab5a68168d55f8390b524d/freight/freight/tasks/execute_task.py", line 118, in run
    self._run()
  File "/srv/freight/releases/609f306b8d413505c0ab5a68168d55f8390b524d/freight/freight/tasks/execute_task.py", line 143, in _run
    self.save_chunk(result[:newline_pos])
  File "/srv/freight/releases/609f306b8d413505c0ab5a68168d55f8390b524d/freight/freight/tasks/execute_task.py", line 110, in save_chunk
    db.session.commit()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 150, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 801, in commit
    self.transaction.commit()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
    self._prepare_impl()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
    self.session.flush()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2015, in flush
    self._flush(objects)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2133, in _flush
    transaction.rollback(_capture_exception=True)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2097, in _flush
    flush_context.execute()
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 174, in save_obj
    mapper, table, insert)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 785, in _emit_insert_statements
    execute(statement, params)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/srv/freight/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "unq_logchunk_source_offset"
DETAIL:  Key (task_id, "offset")=(8, 0) already exists.
 [SQL: 'INSERT INTO logchunk (task_id, "offset", size, text, date_created) VALUES (%(task_id)s, %(offset)s, %(size)s, %(text)s, %(date_created)s) RETURNING logchunk.id'] [parameters: {'date_created': datetime.datetime(2015, 12, 8, 22, 41, 23, 949021), 'text': u"Raven is not configured (logging is disabled). Please see the documentation for more information.\n>> Running ['git', 'fetch', '--all', '-p']\nFetching origin\nRaven is not configured (logging is disabled). Please see the documentation for more information.\n", 'offset': 0, 'task_id': 8, 'size': 255}]
[2015-12-08 22:43:33,367: WARNING/Worker-2] Raven is not configured (logging is disabled). Please see the documentation for more information.

Postgres integrity errors in LogReporter

I'm getting lots of errors from the LogReporter when it's trying to save a LogChunk due to unique constraints. Example:

IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "unq_logchunk_source_offset"
DETAIL:  Key (task_id, "offset")=(8, 0) already exists.
 [SQL: 'INSERT INTO logchunk (task_id, "offset", size, text, date_created) VALUES (%(task_id)s, %(offset)s, %(size)s, %(text)s, %(date_created)s) RETURNING logchunk.id'] [parameters: {'text': u"Raven is not configured (logging is disabled). Please see the documentation for more information.\n>> Running ['git', 'fetch', '--all', '-p']\nFetching origin\n", 'date_created': datetime.datetime(2015, 10, 27, 11, 1, 58, 679882), 'offset': 0, 'task_id': 8, 'size': 157}]

The offending code block seems to be:

def save_chunk(self, text):
    # we also want to pipe this to stdout
    sys.stdout.write(text)
    text = text.decode('utf-8', 'replace')
    text_len = len(text)
    db.session.add(LogChunk(
        task_id=self.task_id,
        text=text,
        offset=self.cur_offset,
        size=text_len,
    ))
    # we commit immediately to ensure the API can stream logs
    db.session.commit()
    self.cur_offset += text_len

I'm not very familiar with threads, but it seems that self.cur_offset is not being incremented correctly, which is leading to the key error. Can anyone with more context help?
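If the race is on cur_offset, one hedged fix is to allocate offsets under a lock, so two threads can never claim the same offset (the class and names here are illustrative, not Freight's actual code):

```python
import threading

class OffsetCounter:
    """Serialize offset allocation across writer threads."""
    def __init__(self):
        self.cur_offset = 0
        self._lock = threading.Lock()

    def claim(self, text_len):
        # Atomically reserve the range [cur_offset, cur_offset + text_len).
        with self._lock:
            offset = self.cur_offset
            self.cur_offset += text_len
            return offset

counter = OffsetCounter()
offsets = []

def worker():
    offsets.append(counter.claim(10))

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(offsets))  # five distinct, non-overlapping offsets
```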

Deploy Queue and Auto Deploy

The end goal is to have a way to track branches and automatically deploy whenever new commits land on them. However, the same problem arises if two commits are scheduled by any means. So to isolate the problem, we want the following:

  • queue a commit by enqueueing a commit sha
    • this can happen based on a hook
    • alternatively manually
  • when multiple commits are queued up, we skip irrelevant commits to the same target

The proposal is to have a deploy stack for each target (production, staging etc.).

  • When a new commit should be deployed, it's added to the queue.
  • If it's a named target (like a branch), the target is resolved immediately into a commit hash.
  • It's pushed to the top of the stack for the target.
  • Items can be removed from anywhere within the stack to revoke a pending deploy (a stack with benefits).

Independently of this there is the deploy logic:

  • There is a system that monitors the stack for each target and always deploys top.
  • Once the deploy is done it looks at the stack and if it's not empty, it deploys the top again.
  • At any point any stack items older than the last deploy are ignored.
  • In an ideal situation this ends up in an empty stack.

Example:

DEPLOY A (stack was empty, deploy starts)
DEPLOY B
DEPLOY C
DEPLOY D
DEPLOY E
STACK state: B C D [E] (E is top, A was already removed immediately)

Once A finished deploying, E is the top of the stack, as it's newer than the deploy of A. When E is done, D, C, and B are removed from the stack as they are too far in the past timestamp-wise. If E had been revoked before being deployed, D would have been deployed as expected.
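The stack semantics described above can be sketched as follows (illustrative Python, not Freight code):

```python
class DeployStack:
    """Per-target deploy stack: deploy the newest request, drop stale ones."""
    def __init__(self):
        self.items = []

    def push(self, sha):
        # Top of the stack is always the most recent request.
        self.items.append(sha)

    def revoke(self, sha):
        # Items can be removed from anywhere within the stack.
        self.items.remove(sha)

    def pop_next(self):
        # Deploy the top; everything older is discarded as stale.
        if not self.items:
            return None
        top = self.items.pop()
        self.items.clear()
        return top

stack = DeployStack()
for sha in ["B", "C", "D", "E"]:  # A deployed immediately, never stacked
    stack.push(sha)
print(stack.pop_next())  # E deploys; B, C, D are dropped as stale
```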

Docker container fails to startup

+ alembic upgrade head
Traceback (most recent call last):
  File "/usr/local/bin/alembic", line 9, in <module>
    load_entry_point('alembic==0.7.7', 'console_scripts', 'alembic')()
  File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 439, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 433, in main
    self.run_cmd(cfg, options)
  File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 416, in run_cmd
    **dict((k, getattr(options, k)) for k in kwarg)
  File "/usr/local/lib/python2.7/site-packages/alembic/command.py", line 165, in upgrade
    script.run_env()
  File "/usr/local/lib/python2.7/site-packages/alembic/script.py", line 390, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/usr/local/lib/python2.7/site-packages/alembic/util.py", line 244, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/local/lib/python2.7/site-packages/alembic/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "migrations/env.py", line 23, in <module>
    app = create_app()
  File "/usr/src/app/freight/config.py", line 149, in create_app
    configure_api(app)
  File "/usr/src/app/freight/config.py", line 160, in configure_api
    from freight.api.app_details import AppDetailsApiView
  File "/usr/src/app/freight/api/app_details.py", line 7, in <module>
    from freight.api.base import ApiView
  File "/usr/src/app/freight/api/base.py", line 11, in <module>
    from freight.utils.auth import get_current_user
  File "/usr/src/app/freight/utils/auth.py", line 7, in <module>
    from freight.testutils.fixtures import Fixtures
  File "/usr/src/app/freight/testutils/__init__.py", line 6, in <module>
    import_submodules(locals(), __name__, __path__)
  File "/usr/src/app/freight/utils/imports.py", line 13, in import_submodules
    module = loader.find_module(name).load_module(name)
  File "/usr/local/lib/python2.7/pkgutil.py", line 246, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.etc)
  File "/usr/src/app/freight/testutils/cases.py", line 6, in <module>
    import pytest
ImportError: No module named pytest

Multiple ssh keys?

We want to use freight to deploy multiple repos, and we'd also like to use GitHub deploy keys to give freight access to the repositories, since it seems like a perfect fit! However, deploy keys can only be used to access a single repo, so any thoughts about allowing freight to use multiple ssh keys?

Ability to preview commits to be deployed, and email authors that their stuff is going out

Given an incident in Sentry, I think it would be nice if hitting deploy sent an email to all commit authors before executing, to give them a little bit of time to sprint to the computer and watch dashboards.

We have something like this in Sentry releases, but the way we set it up currently only emails after the deploy has gone out.

Parts of this (the emailing part) might make sense to actually implement in Sentry releases, not Freight.

Implement an artifact build/push task.

Freight currently has just a notion of "deploy".

Being able to build an artifact and push it to a repository would significantly decrease the time spent on a lot of bespoke deployments.

Clean up old tasks/logs

Our storage is only additive, with nothing ever being removed. There's little value in retaining logs and deploy statuses from more than a month ago. We should add a script like bin/cleanup --days=30 to purge old data.

Or possibly by number of jobs rather than number of days; say, maintain a trailing record of the past 100.
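A sketch of the two retention policies (bin/cleanup doesn't exist yet; the function and flag names below just mirror the proposal):

```python
import datetime

def select_stale(tasks, days=None, keep_last=None):
    """tasks: list of (task_id, created_at) tuples; returns ids to purge.

    Either purge everything older than `days`, or keep only the most
    recent `keep_last` jobs.
    """
    tasks = sorted(tasks, key=lambda t: t[1], reverse=True)
    if keep_last is not None:
        return [tid for tid, _ in tasks[keep_last:]]
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=days)
    return [tid for tid, created in tasks if created < cutoff]

now = datetime.datetime.utcnow()
tasks = [(1, now - datetime.timedelta(days=60)), (2, now)]
print(select_stale(tasks, days=30))      # [1]
print(select_stale(tasks, keep_last=1))  # [1]
```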

Standardize task arguments

Task IDs:
app-name/env-name#task_id
i.e. getsentry/production#1

App versions:
app-name:version
i.e. getsentry:master or getsentry:a30331

This would make a lot of things more usable, and it's still very human readable. For example, in freight-cli:

freight tail getsentry/production#1

We could also default the environment like we do elsewhere:

freight tail getsentry#1

This should be clearly documented so it's obvious that all features should be built with this in mind, especially the API (and its responses).
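A sketch of a parser for this identifier scheme (the function name and the default-environment behavior are assumptions):

```python
def parse_task_id(value, default_env="production"):
    """Parse 'app-name/env-name#task_id', defaulting env when omitted."""
    app_env, _, task_id = value.partition("#")
    app, _, env = app_env.partition("/")
    return app, env or default_env, int(task_id)

print(parse_task_id("getsentry/production#1"))  # ('getsentry', 'production', 1)
print(parse_task_id("getsentry#1"))             # ('getsentry', 'production', 1)
```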

Automatically select application on build pages

It'd be nice if, when you're on a build page and you click deploy again, we defaulted to selecting the current app. I've wanted this a couple of times when I deploy something, then find I made a typo and need to deploy a second time.

Deploy spawning duplicate concurrent jobs

During my investigation of #54, I found that for each deploy triggered, multiple deploy jobs were being spawned and processed by Celery.

This issue seems to have started for me with cc2bea0; the previous commit, 39f290f, works fine.

Duplicate jobs on cc2bea0: [screenshot: spawn_multiple]

Working as desired on 39f290f: [screenshot: spawn_single]

I am not too familiar with celery but happy to provide any more troubleshooting information if required.
