pinterest / pinball
Pinball is a scalable workflow manager
License: Apache License 2.0
Hi
I am very new to Pinball and want to use it in my project. I have gone through the getting-started guide to get Pinball up and running, and imported the example workflow as described in the guide. But I don't know where to go after that, and I can't find enough tutorials online.
For my project, I might need to create my own workflows, for example: fetching web pages, sending them to parsers and indexers, archiving the data, storing some images, scanning through log files, restarting the app server, etc. Can somebody please guide me on where I should start learning and using Pinball?
Thanks in advance
Max
While running
python -m pinball.run_pinball -c tutorial/example_repo/tutorial.yaml -m master
I get an error:
Creating tables ...
Creating table active_tokens_NV2XG3DFNAWUQUBNINXW24DBOEWUK3DJORSS2OBTGAYC2Q2NKQ______
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/musleh/Documents/pinball/tmp/pinball/pinball/run_pinball.py", line 230, in <module>
main()
File "/home/musleh/Documents/pinball/tmp/pinball/pinball/run_pinball.py", line 208, in main
factory.create_master(DbStore())
File "pinball/persistence/store.py", line 46, in __init__
self.initialize()
File "pinball/persistence/store.py", line 130, in initialize
management.call_command('syncdb', interactive=False)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 161, in call_command
return klass.execute(*args, **defaults)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 255, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 385, in handle
return self.handle_noargs(options)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/commands/syncdb.py", line 102, in handle_noargs
cursor.execute(statement)
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/util.py", line 41, in execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 130, in execute
six.reraise(utils.DatabaseError, utils.DatabaseError(tuple(e.args)), sys.exc_info()[2])
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 120, in execute
return self.cursor.execute(query, args)
File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
self.errorhandler(self, exc, value)
File "/usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
django.db.utils.DatabaseError: (1103, "Incorrect table name 'active_tokens_NV2XG3DFNAWUQUBNINXW24DBOEWUK3DJORSS2OBTGAYC2Q2NKQ'")
When I created the table without the trailing underscores, it worked fine.
Failed MySQL command:
mysql> create table active_tokens_NV2XG3DFNAWUQUBNINXW24DBOEWUK3DJORSS2OBTGAYC2Q2NKQ______(id int);
ERROR 1103 (42000): Incorrect table name 'active_tokens_NV2XG3DFNAWUQUBNINXW24DBOEWUK3DJORSS2OBTGAYC2Q2NKQ______'
Successful MySQL command:
mysql> create table active_tokens_NV2XG3DFNAWUQUBNINXW24DBOEWUK3DJORSS2OBTGAYC2Q2NKQ(id int);
Query OK, 0 rows affected (0.08 sec)
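Counting characters suggests this is a length problem rather than an underscore problem: MySQL limits table names to 64 characters, and the two names above differ only by the six trailing underscores (a later report on this page with the same naming pattern gets error 1059, "is too long"). The snippet below only measures the two names copied from this report; `fits_mysql` is a hypothetical helper, not part of Pinball.

```python
# Table names copied from the failing and succeeding statements above.
FAILING = "active_tokens_NV2XG3DFNAWUQUBNINXW24DBOEWUK3DJORSS2OBTGAYC2Q2NKQ______"
WORKING = "active_tokens_NV2XG3DFNAWUQUBNINXW24DBOEWUK3DJORSS2OBTGAYC2Q2NKQ"

# MySQL's maximum length for a table name.
MYSQL_MAX_TABLE_NAME_LEN = 64


def fits_mysql(name):
    """Return True if name is within MySQL's table-name length limit."""
    return len(name) <= MYSQL_MAX_TABLE_NAME_LEN


for label, name in (("failing", FAILING), ("working", WORKING)):
    print(label, len(name), fits_mysql(name))
# failing 70 False
# working 64 True
```

The working name is exactly 64 characters, so it sits right at the limit.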
Hi there,
I'm a researcher studying software evolution. As part of my current research, I'm studying the implications of open-sourcing proprietary software, for instance, whether the project succeeds in attracting newcomers. However, I observed that some projects, like Pinball, deleted their software history.
Knowing that software history is indispensable for developers (e.g., developers need to refer to history several times a day), I would like to ask pinball developers the following four brief questions:
Thanks in advance for your collaboration,
Gustavo Pinto, PhD
http://www.gustavopinto.org
After pulling and building the latest Pinball with the commits that fix workflows_config, I'm now able to start the UI from the console without errors. However, when I visit http://localhost:8080/schedules, a DataTables error alert pops up:
Datatables warning (table id = "workflows"). Requested unknown parameter 'workflows_config' from the data source at row 0.
I then click OK and click on my workflow (tutorial_workflow), and get the following error:
AttributeError at /schedule/
'WorkflowSchedule' object has no attribute 'workflows_config'
Request Method: GET
Request URL: http://localhost:8080/schedule/?workflow=tutorial_workflow
Django Version: 1.5.4
Exception Type: AttributeError
Exception Value:
'WorkflowSchedule' object has no attribute 'workflows_config'
Exception Location: /usr/local/lib/python2.7/dist-packages/pinball-0.1.1-py2.7.egg/pinball/ui/data_builder.py in get_schedule, line 891
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path:
['',
'/usr/local/lib/python2.7/dist-packages/luigi-1.1.3-py2.7.egg',
'/usr/local/lib/python2.7/dist-packages/python_daemon-UNKNOWN-py2.7.egg',
'/usr/local/lib/python2.7/dist-packages/tornado-4.1-py2.7-linux-x86_64.egg',
'/usr/local/lib/python2.7/dist-packages/lockfile-0.10.2-py2.7.egg',
'/usr/local/lib/python2.7/dist-packages/backports.ssl_match_hostname-3.4.0.2-py2.7.egg',
'/usr/local/lib/python2.7/dist-packages/certifi-14.5.14-py2.7.egg',
'/usr/local/lib/python2.7/dist-packages/pinball-0.1.1-py2.7.egg',
'/mnt',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages']
Server time: Sat, 21 Mar 2015 00:44:33 -0400
It seems there are more references to workflows_config in the UI that still need to be fixed?
Thanks!
Collecting pydot==1.0.28 (from pinball)
Could not find a version that satisfies the requirement pydot==1.0.28 (from pinball) (from versions: 1.0.2)
Some externally hosted files were ignored as access to them may be unreliable (use --allow-external to allow).
No distributions matching the version for pydot==1.0.28 (from pinball)
I'm assuming it should be pydot==1.0.2 instead of 1.0.28
sudo pip install pydot
The directory '/Users/smallem/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want the -H flag.
You are using pip version 6.0.7, however version 6.0.8 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The directory '/Users/smallem/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want the -H flag.
Collecting pydot
Downloading pydot-1.0.2.tar.gz
Couldn't import dot_parser, loading of dot files will not be possible.
Requirement already satisfied (use --upgrade to upgrade): pyparsing in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from pydot)
Requirement already satisfied (use --upgrade to upgrade): setuptools in /Library/Python/2.7/site-packages (from pydot)
Installing collected packages: pydot
Running setup.py install for pydot
Couldn't import dot_parser, loading of dot files will not be possible.
Successfully installed pydot-1.0.2
In README.rst, the first line says "Pinball is a scalable workflow managemer." Was this meant to say "management tool"?
When building from source using 'python setup.py install --user', I get this error:
Installed /home/golharr/.local/lib/python2.7/site-packages/pinball-0.2.9-py2.7.egg
Processing dependencies for pinball==0.2.9
Traceback (most recent call last):
File "setup.py", line 81, in <module>
test_suite='tests',
File "/apps/sys/python/2.7.8/lib/python2.7/distutils/core.py", line 151, in setup
dist.run_commands()
File "/apps/sys/python/2.7.8/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/apps/sys/python/2.7.8/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 67, in run
File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 117, in do_egg_install
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 370, in run
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 594, in easy_install
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 645, in install_item
File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 704, in process_distribution
TypeError: not enough arguments for format string
Hi,
I'm just getting started with Pinball (installed from the master branch) and am running the master command, but I get a DB table creation error:
relation "active_tokens_mfxgc3tefvew443qnfzg63rngu2tema_" does not exist
It looks like Django's DB creation creates tables in an order that doesn't work with PostgreSQL. I'm using django.db.backends.postgresql_psycopg2 for the default database.
We mostly use Python scripts, and have set up our scripts to correctly use the stderr and stdout channels:
'stdout': {
'level': 'INFO',
'class': 'logging.StreamHandler',
'formatter': 'simple',
'stream': 'ext://sys.stdout',
},
'stderr': {
'level': 'ERROR',
'class': 'logging.StreamHandler',
'formatter': 'simple',
'stream': 'ext://sys.stderr',
},
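The handlers above are only a fragment of a `logging.config.dictConfig` dictionary. A minimal complete version might look like the sketch below. Note that a `StreamHandler` at level INFO also accepts ERROR records, so with the fragment as written, errors are printed to both stdout and stderr. The sketch adds a hypothetical `MaxLevelFilter` to keep the two streams apart; the filter, the `simple` formatter's format string, and the root-logger wiring are all assumptions, since the original config does not show them.

```python
import logging
import logging.config


class MaxLevelFilter(logging.Filter):
    """Let through only records strictly below max_level (hypothetical helper)."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno < self.max_level


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # Assumed format string; the original 'simple' formatter is not shown.
        "simple": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "filters": {
        # '()' tells dictConfig to call this factory with the remaining keys.
        "below_error": {"()": MaxLevelFilter, "max_level": logging.ERROR},
    },
    "handlers": {
        "stdout": {
            "level": "INFO",
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "filters": ["below_error"],
            "stream": "ext://sys.stdout",
        },
        "stderr": {
            "level": "ERROR",
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "stream": "ext://sys.stderr",
        },
    },
    # Assumed: the job scripts log through the root logger.
    "root": {"level": "INFO", "handlers": ["stdout", "stderr"]},
}

if __name__ == "__main__":
    logging.config.dictConfig(LOGGING)
    log = logging.getLogger("job")
    log.info("written to stdout only")
    log.error("written to stderr only")
```

This only changes how records are routed between the two streams on the machine running the job.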
We have one master machine running the master, UI, and scheduler services, and two worker machines, each running the worker service. When we try to view the logs of a job, we keep getting this error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/pinball-0.1.1-py2.7.egg/pinball/ui/views.py", line 185, in file_content
execution, log_type)
File "/usr/local/lib/python2.7/dist-packages/pinball-0.1.1-py2.7.egg/pinball/ui/data_builder.py", line 843, in get_file_content
f.close()
File "/usr/local/lib/python2.7/dist-packages/pinball-0.1.1-py2.7.egg/pinball/workflow/log_saver.py", line 101, in close
self._file_descriptor.close()
AttributeError: 'NoneType' object has no attribute 'close'
It seems the master is trying to read the logs from its own disk instead of from the worker. What is wrong with our setup that prevents us from reading the logs? It worked fine when we tested with the worker service running on the master itself.
I just installed pinball and am trying to get it up and running on CentOS 6. When I run
python -m pinball.run_pinball -c ./default.yaml -m master
I get:
Traceback (most recent call last):
File "/apps/sys/python/2.7.8/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/apps/sys/python/2.7.8/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/golharr/workspace/pinball/pinball/run_pinball.py", line 26, in <module>
from pinball.config.pinball_config import PinballConfig
File "pinball/config/pinball_config.py", line 28, in <module>
class PinballConfig(object):
File "pinball/config/pinball_config.py", line 97, in PinballConfig
'django.core.context_processors.request',
TypeError: can only concatenate list (not "tuple") to list
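The message means a tuple was added to a list. A hedged reading: the expression ending at pinball_config.py line 97 appends a tuple of context processors to a framework default, and newer Django releases changed several defaults in `django.conf.global_settings` from tuples to lists (the assumed cause here is a Django version newer than the one Pinball was written against). The sketch reproduces the error with stand-in values and shows a type-agnostic concatenation; `merge_processors` and the constant names are hypothetical.

```python
# Hypothetical stand-ins: the framework default is assumed to be a list
# (as in newer Django releases), while the extra entries are a tuple
# (as in code written against older releases).
FRAMEWORK_DEFAULTS = ["django.core.context_processors.debug"]
EXTRA_PROCESSORS = ("django.core.context_processors.request",)


def merge_processors(defaults, extra):
    """Concatenate settings sequences whether each one is a list or a tuple."""
    return tuple(defaults) + tuple(extra)


try:
    FRAMEWORK_DEFAULTS + EXTRA_PROCESSORS  # list + tuple
except TypeError as exc:
    print(exc)  # can only concatenate list (not "tuple") to list

print(merge_processors(FRAMEWORK_DEFAULTS, EXTRA_PROCESSORS))
```

Converting both operands to the same type keeps the setting valid under either Django behavior.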
Hi,
I am not sure if this is the right place to ask such questions, but I have not found a more suitable place.
My main question is: what is the major difference between your project and Luigi?
Why should one use Pinball and not Luigi?
Thanks,
The initial Pinball User Guide was added by @pgarbacki as a placeholder and has never been updated.
Config is supposed to be independent of code: you are free to change the config however it makes sense to you, even if you always use the same code. But the generation number should represent a property of the code; it doesn't make sense to change the number if you didn't change the code.
So my feeling is that we should put the generation info into the code and remove it from the config YAML file. What are your thoughts on this? @pgarbacki @MaoYe
After reading about Pinball, I can't see how to prevent a complete failure overnight if the server running the master or the scheduler dies.
Is there any way to run multiple masters and multiple schedulers?
We had a job that never ended and just kept running. The log showed "Job Succeeded", but the process did not end. This was quite a weird bug.
I noticed that a dependency was duplicated, which I believe was the cause. This is just an FYI in case it comes up again; feel free to close. Maybe add a check to make sure the dependencies are unique (e.g., stored as a set)?
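A minimal sketch of the suggested check, assuming a job's dependencies arrive as a list of job names. `dedupe_dependencies` is a hypothetical helper, not Pinball's API. It keeps first-seen order instead of converting to a plain `set`, so the list order is preserved, and it can optionally fail loudly instead of silently dropping repeats.

```python
def dedupe_dependencies(job_name, dependencies, strict=False):
    """Return dependencies without duplicates, keeping first-seen order.

    With strict=True, raise ValueError instead of silently dropping repeats.
    """
    seen = set()
    unique = []
    duplicates = []
    for dep in dependencies:
        if dep in seen:
            duplicates.append(dep)
        else:
            seen.add(dep)
            unique.append(dep)
    if duplicates and strict:
        raise ValueError(
            "job %r lists duplicate dependencies: %s"
            % (job_name, ", ".join(sorted(set(duplicates))))
        )
    return unique


print(dedupe_dependencies("load_logs", ["fetch", "parse", "fetch"]))
# ['fetch', 'parse']
```

Running the strict variant when a workflow is parsed would surface the duplicate before the job ever runs.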
Trying to install Pinball on a brand-new Ubuntu 14.04 EC2 instance, I receive the following error after running pip install pinball:
Cleaning up... Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/guppy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-_Rny0e-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_root/guppy Storing debug log for failure in /root/.pip/pip.log
See gist here: https://gist.github.com/arukaen/d0847fbd3a6b07230984#file-pinball-issue-github
I'm taking a look at Pinball and trying to Dockerize it so my colleagues can have a plug-and-play version. So far it's been a pretty nice experience putting this into Docker, but I'm afraid I've hit a wall.
Please check out my repository here.
docker-compose is a bit finicky, so you need to be careful about how you start this:
Now you should be serving all of these components on a docker host! huzzah!
NOT SO FAST!!!
There's nothing there?! Your guide prepared me for such an occasion (typo in the docs, by the way: Ctrl+F "reschedue").
root@d80b93983b5e:/code# python -m pinball.tools.workflow_util -c /code/pinball_config.yml -f reschedule
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/site-packages/pinball/tools/workflow_util.py", line 1285, in <module>
main()
File "/usr/local/lib/python2.7/site-packages/pinball/tools/workflow_util.py", line 1281, in main
print run_command(options)
File "/usr/local/lib/python2.7/site-packages/pinball/tools/workflow_util.py", line 1222, in run_command
return command.execute(client, store)
File "/usr/local/lib/python2.7/site-packages/pinball/tools/workflow_util.py", line 771, in execute
ParserCaller.CMD_RESCHEDULE)
File "/usr/local/lib/python2.7/site-packages/pinball/parser/utils.py", line 92, in load_parser_with_caller
return load_path(parser_name)(annotate_parser_caller(parser_params, parser_caller))
File "/usr/local/lib/python2.7/site-packages/pinball/workflow/utils.py", line 54, in load_path
module = importlib.import_module(module_name)
File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: No module named workflows.parser
Odd, because examining /usr/local/lib/python/site-packages/pinball_ext/, there is a module named workflows.parser. This has me thinking that maybe I need to do something to configure a workflow, or that my Python path is not set.
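One way to narrow this down is to check, inside the container, whether the dotted name that the parser loader passes to `importlib.import_module` resolves on the current `sys.path`. The Python 3 sketch below uses `importlib.util.find_spec`; `module_resolves` is a hypothetical helper (on Python 2.7, wrapping `importlib.import_module` in `try/except ImportError` is a cruder equivalent). A hedged guess: if the module sits at `pinball_ext/workflows/parser.py`, the top-level name `workflows.parser` resolves only when the `pinball_ext` directory itself is on `sys.path` (for example via `PYTHONPATH`); otherwise the importable name would be `pinball_ext.workflows.parser`.

```python
import importlib.util


def module_resolves(dotted_name):
    """Return True if dotted_name can be found on the current sys.path.

    Note: find_spec imports parent packages (e.g. 'workflows') as a side effect.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # A parent package is missing, broken, or a plain module rather than a package.
        return False


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:] or ["workflows.parser"]:
        print(name, module_resolves(name))
```

Comparing the result for `workflows.parser` and `pinball_ext.workflows.parser` should show which form the current path supports.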
ONE MORE CLUE!
Navigating to $DOCKER_HOST:8080/ you should now see this message...
No idea whether this is a symptom or a cause. That is the ID of the "master" container.
Anyways, let me know if you guys have any feedback or find these efforts interesting at all!
Cheers!
Pinball is rad.
Really bad documentation. I can't figure out anything.
Is this thing distributed? Can I run workers on different machines?
What is the datastore that holds job and schedule information: is it a database or a filesystem?
Some links don't seem to be available from outside.
In my company's case, we need to run a specific instance (i.e., workflow) on a specific worker pool because of the input source location. I implemented that feature by modifying the QueryAndOwnTransaction[1] protocol. I can contribute this feature if you want; let me know.
[1] https://github.com/pinterest/pinball/blob/master/pinball/master/transaction.py#L266
Most databases limit table names to about 64 characters (MySQL's limit is 64). Some table names are suffixed with a base32-encoded hostname and can end up longer than 64 characters.
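The suffixes in the reports on this page use the base32 alphabet, with `=` padding turned into `_`; for example, decoding the suffix from the report that hit MySQL error 1059 gives the EC2-style hostname `ip-192-168-128-247.ec2.internal`. The sketch below rebuilds a name that way (inferred from the reported names, not taken from Pinball's source) and shows one hypothetical mitigation: fall back to a fixed-length SHA-1 hex digest when the name would exceed 64 characters. `suffixed_table_name` and `shortened_table_name` are illustrative names.

```python
import base64
import hashlib

# MySQL rejects table names longer than 64 characters.
MAX_TABLE_NAME_LEN = 64


def suffixed_table_name(prefix, hostname):
    """Append base32(hostname) to prefix, with '=' padding replaced by '_'.

    This mirrors the shape of the names in the reports; it is inferred from
    those names, not copied from Pinball's source.
    """
    encoded = base64.b32encode(hostname.encode("utf-8")).decode("ascii")
    return prefix + encoded.replace("=", "_")


def shortened_table_name(prefix, hostname, max_len=MAX_TABLE_NAME_LEN):
    """Hypothetical mitigation: use a fixed-length digest if the name is too long."""
    name = suffixed_table_name(prefix, hostname)
    if len(name) <= max_len:
        return name
    # SHA-1 hex digests are always 40 characters, so the result has a bounded size.
    digest = hashlib.sha1(hostname.encode("utf-8")).hexdigest()
    return (prefix + digest)[:max_len]


host = "ip-192-168-128-247.ec2.internal"
print(len(suffixed_table_name("active_tokens_", host)))   # 70
print(len(shortened_table_name("active_tokens_", host)))  # 54
```

The trade-off is that a digest cannot be decoded back into the hostname, so any code that recovers the hostname from a table name would need another source for it, and existing deployments would see their table names change.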
In the project I am working on, I need to know who is calling the parser, for example the UI, the scheduler, or some other tool or library.
The parameter passed down to the parser is a map, which is great: we can add another key, 'caller', to it without affecting existing code.
What do you think? @pgarbacki @MaoYe
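A minimal sketch of the proposal, assuming the parser parameters are a plain `dict`. The traceback elsewhere on this page references `annotate_parser_caller` and `ParserCaller.CMD_RESCHEDULE` in `pinball/parser/utils.py`, which suggests something along these lines exists, but their implementation is not shown here; `Caller` and `with_caller` below are hypothetical names. Copying the dict rather than mutating it means any other code holding the original parameters is unaffected.

```python
from enum import Enum


class Caller(Enum):
    """Hypothetical caller identifiers; Pinball's own constants may differ."""

    UI = "ui"
    SCHEDULER = "scheduler"
    TOOL = "tool"


def with_caller(parser_params, caller):
    """Return a copy of parser_params with a 'caller' entry added.

    The input dict is left untouched, and parsers that ignore unknown keys
    keep working unchanged.
    """
    annotated = dict(parser_params)
    annotated["caller"] = caller.value
    return annotated


params = {"workflow": "tutorial_workflow"}
print(with_caller(params, Caller.SCHEDULER))
# {'workflow': 'tutorial_workflow', 'caller': 'scheduler'}
```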
I'm wondering why Pinball does not save current job metadata to MySQL. Currently, job metadata is loaded in the parser layer when the workflow is scheduled. This design makes it hard to pass a workflow definition with extra parameters through the UI. Is there a reason for it?
#TODO(mao): to make it flexible that allow users specify through UI
https://github.com/pinterest/pinball/blob/master/pinball/parser/repository_config_parser.py#L121
FYI: thank you, as always, for making a great workflow solution. Recently I implemented a backfilling form for a specific requirement of my company.
Hey guys,
First of all, great work! And way to open source it!
I have built a framework in the past quite similar for such workflow management, and it really seems as if you guys have hit the nail firmly on the head.
I would love to contribute. Obviously, with a codebase this large and this fresh out of the door, core code contributions are a fleeting dream. However, I would love to help improve the documentation.
I have checked https://github.com/pinterest/pinball/blob/master/CONTRIBUTING.rst and it says more docstrings and further documentation are welcome, but I would love an example of where to start. I would gladly add to whatever you think is most important.
Is there a slack room, IRC channel, or gitter.im room where I could talk to other contributors?
Or is there a kanban board of sorts to grab/make cards for documentation specific tasks?
If not, I'd gladly assist in whatever way I could to set it up. Look forward to hearing back from you guys.
Cheers,
Bobby
PS: Sorry for opening this as an issue; I just figured it was the easiest means of communication. Feel free to close it and email me at bobbygrayson[at]gmail[dot]com
Getting the following error when trying to start the UI:
[2015-03-19 Thu 16:01:23] - pinball.ui.cache_thread - INFO "Workflow data computation starting."
[2015-03-19 Thu 16:01:23] - pinball.ui.cache_thread - ERROR "'WorkflowSchedule' object has no attribute 'workflows_config'"
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/pinball/ui/cache_thread.py", line 67, in _compute_workflow
schedules_data = data_builder.get_schedules()
File "/usr/local/lib/python2.7/dist-packages/pinball/ui/data_builder.py", line 862, in get_schedules
workflows_config=schedule.workflows_config,
AttributeError: 'WorkflowSchedule' object has no attribute 'workflows_config'
[2015-03-19 Thu 16:01:23] - pinball.ui.cache_thread - INFO "Workflow data computation starting."
I also get this error on schedules:
An error occurred on the server: INTERNAL SERVER ERROR
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/pinball/ui/views.py", line 201, in schedules
schedules_data = data_builder.get_schedules()
File "/usr/local/lib/python2.7/dist-packages/pinball/ui/data_builder.py", line 862, in get_schedules
workflows_config=schedule.workflows_config,
AttributeError: 'WorkflowSchedule' object has no attribute 'workflows_config'
Setup:
MySQL: mysql Ver 14.14 Distrib 5.6.17, for osx10.9 (x86_64) using EditLine wrapper
Pinball given root credentials to MySQL db.
Error:
When I run:
python -m pinball.run_pinball -c path/to/pinball/yaml/configuration/file -m master
I get the following error:
Creating tables ...
Creating table active_tokens_NFYC2MJZGIWTCNRYFUYTEOBNGI2DOLTFMMZC42LOORSXE3TBNQ______
django.db.utils.DatabaseError: (1059, "Identifier name 'active_tokens_NFYC2MJZGIWTCNRYFUYTEOBNGI2DOLTFMMZC42LOORSXE3TBNQ______' is too long")
Hi guys,
I have been experiencing a performance issue for a few days. In my case, the Pinball logs contain over 50,000 records. I fixed the issue by moving some records to another table.
As you know, the token persistence layer of Pinball has a performance limitation due to the LIKE operation, even if the cache is enabled.
https://github.com/pinterest/pinball/blob/master/pinball/persistence/store.py#L267
How many days of workflow logs do you keep at Pinterest? Is it 4 weeks?
https://github.com/pinterest/pinball/blob/master/pinball/tools/workflow_util.py#L1260
Receiving the following error via pinlog:
executor failed to renew job ownership on time
I'm having trouble resolving this and was wondering if you know anything about it.
The job is a CommandLineJob that SSHes to another box. The job runs for about 30 minutes and the script finishes, but Pinball then fails with the above error.
I don't think it's the ServerAliveInterval, I've tried that.