Cavalcade-Runner
Daemon for Cavalcade, a scalable WordPress jobs system.
A Human Made project. Maintained by @rmccue.
What?
This is the runner for Cavalcade. Head over to the Cavalcade repo to learn more about running this.
Home Page: https://engineering.hmn.md/projects/cavalcade/
Job::reschedule() uses the value nextrun + interval when calculating the new time to run the job.
https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-job.php#L84
This causes the job to run constantly if the task was originally scheduled before the current timestamp, until nextrun + interval is after the current timestamp.
To reproduce, schedule an event with wp_schedule_event( 0, 'hourly', 'pwcc_every_hour' ) as a shorthand way of setting the task to start immediately and then run once every hour. The outcome is:
- pwcc_every_hour fires 426,388 times consecutively
- pwcc_every_hour continues firing once each hour
A better approach would be to record the start time when the lock is obtained and add the interval to that when rescheduling.
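The difference between the two rescheduling strategies described above can be sketched in a few lines. This is an illustration in Python rather than the runner's PHP; the function names are hypothetical, and the catch-up count assumes the clock does not advance while the backlog is being worked through.

```python
def reschedule_from_nextrun(nextrun: int, interval: int) -> int:
    """Behaviour described in the report: next time is nextrun + interval."""
    return nextrun + interval


def reschedule_from_start(started_at: int, interval: int) -> int:
    """Suggested alternative: next time is the lock-acquisition time + interval."""
    return started_at + interval


def catch_up_runs(nextrun: int, interval: int, now: int) -> int:
    """Count back-to-back runs before nextrun finally moves past `now`.

    Simplification: `now` is treated as fixed during the catch-up.
    """
    runs = 0
    while nextrun <= now:
        runs += 1
        nextrun = reschedule_from_nextrun(nextrun, interval)
    return runs


# A job scheduled at timestamp 0 with an hourly interval, checked ten hours later,
# runs once per missed hour before settling down.
backlog = catch_up_runs(nextrun=0, interval=3600, now=10 * 3600)

# With the suggested approach the job runs once and is next due one interval later.
next_due = reschedule_from_start(started_at=10 * 3600, interval=3600)
```

With a start time of 0 and a present-day timestamp, the same loop yields a backlog of several hundred thousand runs, which matches the order of magnitude reported above.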
If your DB endpoint becomes unreachable, cavalcade can effectively crash, and fail to resume running jobs once the DB becomes available again.
Related to #49, it would be nice to be able to install the Cavalcade Runner on a Mac using Homebrew. Ideally that would enable brew install cavalcade, and put the executable from bin/cavalcade into the Homebrew directory. By default, this is /usr/local/bin/.
Homebrew includes documentation to create and maintain a tap. It's likely this would be the necessary approach, as the use of Cavalcade, especially on Mac OS, is probably too obscure to qualify as a formula.
Currently logging looks like this:
worker.1 | [11] Running wp cavalcade run 11 (wp_scheduled_auto_draft_delete a:0:{})
worker.1 | [11] Started worker
worker.1 | [11] Worker status: Array
worker.1 | (
worker.1 | [command] => wp cavalcade run 11
worker.1 | [pid] => 52
worker.1 | [running] =>
worker.1 | [signaled] =>
worker.1 | [stopped] =>
worker.1 | [exitcode] => 0
worker.1 | [termsig] => 0
worker.1 | [stopsig] => 0
worker.1 | )
worker.1 |
worker.1 | [11] Worker shutting down...
worker.1 | [11] Worker out:
worker.1 | [11] Worker err:
worker.1 | [11] Worker ret: 0
Could we just log one line like nginx? I had something like this in mind:
[11] [28/Jul/2017:14:43:41 +0000] START wp_scheduled_auto_draft_delete a:0:{}
[11] [28/Jul/2017:14:43:45 +0000] END wp_scheduled_auto_draft_delete 0 ["output"] ["error text"]
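As a rough sketch of the proposed one-line format (not the runner's actual logger), the two lines above could be produced like this. The function names are hypothetical, and encoding the output and error text as JSON arrays is an assumption based on how the example lines look.

```python
import json
from datetime import datetime, timezone

# nginx-style timestamp, e.g. 28/Jul/2017:14:43:41 +0000
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def format_start(job_id, when, hook, args):
    """One line when a job starts: id, timestamp, hook name, serialized args."""
    return f"[{job_id}] [{when.strftime(STAMP_FORMAT)}] START {hook} {args}"


def format_end(job_id, when, hook, exit_code, output, error):
    """One line when a job ends: exit code plus captured stdout and stderr."""
    out = json.dumps([output])
    err = json.dumps([error])
    return f"[{job_id}] [{when.strftime(STAMP_FORMAT)}] END {hook} {exit_code} {out} {err}"


started = datetime(2017, 7, 28, 14, 43, 41, tzinfo=timezone.utc)
finished = datetime(2017, 7, 28, 14, 43, 45, tzinfo=timezone.utc)
start_line = format_start(11, started, "wp_scheduled_auto_draft_delete", "a:0:{}")
end_line = format_end(11, finished, "wp_scheduled_auto_draft_delete", 0, "output", "error text")
print(start_line)
print(end_line)
```

Note that the month abbreviation from %b depends on the process locale; the default C locale gives the English form shown in the example.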
We had an interesting failure mode where our galera cluster became inaccessible for about 8-9 seconds. That was long enough for every cavalcade worker to crash, then fail starting enough times that systemd gave up. We're now using RestartSec=300, which is probably too conservative, but I figure five minutes is long enough for transitory failures without having a negative impact on site performance.
This isn't so much an issue as something that I wanted to share for the benefit of others.
I set up Cavalcade for a local site (which is powered by Valet+), and I wanted to have Cavalcade automatically launch using the native Mac OS launchctl system.
Here's the config file I ended up using, with specifics replaced:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<!-- This label should be unique to the site -->
<key>Label</key>
<string>some-site.cavalcade</string>
<key>ProgramArguments</key>
<array>
<string>bash</string>
<string>-lc</string>
<!-- Update the cavalcade location as needed -->
<!-- Replace /path/to/wp/site with the correct path -->
<string>~/projects/Cavalcade-Runner/bin/cavalcade /path/to/wp/site</string>
</array>
<key>RunAtLoad</key>
<true/>
<!-- These log file paths are used with Homebrew, but I've mapped Cavalcade here too -->
<key>StandardOutPath</key>
<string>/usr/local/var/log/cavalcade.log</string>
<key>StandardErrorPath</key>
<string>/usr/local/var/log/cavalcade_error.log</string>
</dict>
</plist>
This file is placed into my ~/Library/LaunchAgents/ directory with a .plist extension. For consistency, I've matched the file name with the Label some-site.cavalcade above. To load it and get it running, I then do this:
launchctl load -w ~/Library/LaunchAgents/some-site.cavalcade.plist
I'm concerned about PHP's ability to run as a long-running script, so I'm wondering what you would think about the following:
What if, instead of using a while loop and sleeping, we run the script to completion and allow upstart/systemd to manage restarting the process? I've been testing this for the last ~3 hours and memory consumption is consistent; it's not growing.
I added the following to the systemd definition:
RestartSec=1
StartLimitBurst=0
I realize this means we'd have to spawn a connection to the database every second, which I don't think is terrible, but it's not ideal either. This could be changed to 5 or 10 seconds, perhaps.
We've been experiencing some issues with memory leaks lately, so I've been doing some testing. I used Xdebug's function tracing ability to see which functions are consuming memory, and here are the top 10 consumers:
PDOStatement->execute 896 1.2630 33392944 1.2630 33392944
sleep 896 895.2948 16176472 895.2948 16176472
PDO->prepare 896 1.1511 7825664 1.1511 7825664
HM\Platform\boostrap_cavalcade_runner 1 0.0044 531408 0.0013 220472
HM\Cavalcade\Runner\autoload 4 0.0011 76504 0.0009 76168
include 7 0.0148 575480 0.0017 34376
PDO->__construct 1 0.0389 11392 0.0389 11392
HM\Cavalcade\Runner\Runner->bootstrap 1 0.0551 591656 0.0002 4784
HM\Cavalcade\Runner\Hooks->register 3 0.0003 2704 0.0001 2704
define 51 0.0005 2000 0.0005 2000
Seeing as PDOStatement->execute was the top consumer, I went through the code and unset($statement) after each statement and ran Cavalcade for some time. Even with this, I can see memory usage growing.
A flaw in my thought process is that the function trace shows the memory consumed over the course of the script, but not necessarily what's being consumed at that moment. I made use of https://github.com/BitOne/php-meminfo and https://github.com/arnaud-lb/php-memory-profiler, and they unfortunately didn't reveal anything. Well, they did reveal that the application isn't leaking anything because the memory allocated was always well below what was allocated to the process.
One thing that is consistent is that sleep seems to consume a lot of memory. Reading around, I've seen that PHP isn't really designed to be a long running process, so I'm concerned about it in this instance.
In the event of the database server not being reachable, or queries failing for some other reason, it would be useful if Cavalcade-Runner added log items for failures.
I'm looking at a way to direct Cc logs to their own files in 16.04, since systemd seems to have its own ideas.
I would have liked to handle this in salt but I'm not sure how. The most direct approach is to change https://github.com/humanmade/Cavalcade-Runner/blob/master/systemd.service#L11 to something like ExecStart=/bin/bash -c 'exec /etc/cavalcade/bin/cavalcade &>> /var/log/cavalcade.log '.
cc @joehoyle
In the run method of the Runner class, $this->terminate() is called when the program breaks out of the while loop.
However, the terminate method requires a $signal argument, which is missing here. Should probably be SIGTERM.
The line in question: https://github.com/humanmade/Cavalcade-Runner/blob/master/lib/Runner.php#L95
When the cavalcade service is restarted, we do two things:
- wp cavalcade run processes
This works well to let the jobs complete, but the status of the process is changed, and cavalcade-runner interprets it as a failure.
is_done will return true here, but shutdown() will return -1. This is because (it seems) once a process has been sent SIGTERM, proc_get_status will return:
(
[command] => wp cavalcade run 440 --url='example.com/'
[pid] => 13589
[running] =>
[signaled] => 1
[stopped] =>
[exitcode] => -1
[termsig] => 15
[stopsig] => 0
)
(see exitcode)
According to the PHP docs: "The exit code returned by the process (which is only meaningful if running is FALSE). Only first call of this function return real value, next calls return -1." I think this might be an undocumented side-effect of a process ending with SIGTERM.
I think we need to have some logic to handle the case when signaled => 1 or termsig => 15, and maybe return 0 instead of -1 in those cases?
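The handling suggested above could look roughly like the following. It is a Python illustration, not the runner's PHP: the dictionary mirrors the keys of the proc_get_status output shown earlier, the function name is hypothetical, and whether a SIGTERM-ended worker should really count as a success is exactly the open question in this report.

```python
import signal


def effective_exit_code(status: dict) -> int:
    """Map a proc_get_status-like dict to the exit code the runner should record.

    Assumption: a worker that was signaled with SIGTERM (15) is treated as
    having shut down on request, so it is reported as 0 instead of -1.
    """
    if status.get("signaled") and status.get("termsig") == signal.SIGTERM:
        return 0
    return status["exitcode"]


# The status reported above for a worker that received SIGTERM.
terminated = {
    "running": False,
    "signaled": True,
    "stopped": False,
    "exitcode": -1,
    "termsig": 15,
    "stopsig": 0,
}
code = effective_exit_code(terminated)
```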
At the moment, the Job class relies on the existence of the wp_blogs table to determine the current site URL. However, this table doesn't exist on single installs.
The runner is quite noisy so we're sending all STDOUT to /dev/null. However it seems like since only printf() is used, all output is STDOUT and no STDERR is ever sent. Could we switch from printf() to something like fwrite()?
Where hosts do not or cannot support Cavalcade-Runner, or where you want to use dedicated resources for processing cron, the runner may need to connect to an external DB, e.g. Amazon RDS.
Best practice for external DB connections is to use SSL to protect data in transit.
It is possible to add support for DB requests via SSL to the runner.
This is not in WP core today, but there are plugins that provide it, e.g. (from Stack Overflow):
For those looking for a way to do this w/o hacking core or rolling your own plugin:
https://wordpress.org/plugins/secure-db-connection/
Created by the dev who initially reported the issue in WordPress: https://core.trac.wordpress.org/ticket/28625
When we configure WordPress to connect to the database through a MySQL socket, the runner fails to start. We can reproduce it by adding this line to wp-config.php:
define( 'DB_HOST', 'localhost:/tmp/mysql.sock' );
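One way a runner could cope with a value like the one above is to split DB_HOST into host, port and socket parts before building the connection string. The sketch below is in Python with hypothetical function names and is not the runner's code; it assumes the PDO MySQL DSN form with unix_socket, and it deliberately ignores IPv6 addresses and combined host:port:socket values.

```python
def parse_db_host(db_host):
    """Split a WordPress-style DB_HOST into (host, port, socket)."""
    host, sep, rest = db_host.partition(":")
    port = None
    socket = None
    if sep:
        if rest.startswith("/"):
            socket = rest      # e.g. localhost:/tmp/mysql.sock
        elif rest.isdigit():
            port = int(rest)   # e.g. db.example.com:3307
    return host, port, socket


def build_dsn(db_host, db_name):
    """Build a PDO-style MySQL DSN string from DB_HOST and a database name."""
    host, port, socket = parse_db_host(db_host)
    if socket:
        return f"mysql:unix_socket={socket};dbname={db_name}"
    dsn = f"mysql:host={host};dbname={db_name}"
    if port:
        dsn += f";port={port}"
    return dsn


dsn = build_dsn("localhost:/tmp/mysql.sock", "wordpress")
print(dsn)
```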
Right now we are seeing jobs being run more than once simultaneously because a job's "lock" is not quite atomic. We should probably use a SELECT ... FOR UPDATE in https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-runner.php#L221
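To illustrate what an atomic claim looks like, here is a small Python sketch using SQLite. Note that it swaps in a different technique from the SELECT ... FOR UPDATE suggested above: a conditional UPDATE whose affected-row count tells the caller whether it won the job. The table and column names are hypothetical and do not reflect the real Cavalcade schema.

```python
import sqlite3


def try_claim(conn, job_id):
    """Atomically move a job from 'waiting' to 'running'.

    Returns True only for the caller whose UPDATE actually changed the row,
    so two runners cannot both claim the same job.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'waiting'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'waiting')")
conn.commit()

first = try_claim(conn, 1)   # the first claim changes the row
second = try_claim(conn, 1)  # the second finds nothing left to change
```

The check and the state change happen in a single statement, so there is no window between reading the status and writing the lock.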
This would serve two purposes:
- wp-settings.php during bootstrap
- class_exists( ... )
I'm running this in local docker environment and I'm hitting an error which says that database refuses connection:
web_1 | worker.1 | Error: SQLSTATE[HY000] [2002] Connection refused
web_1 | worker.1 | #0 /var/www/project/vendor/humanmade/cavalcade-runner/lib/Runner.php(200): PDO->__construct('mysql:host=mysq...', 'web', 'password', Array)
web_1 | worker.1 | #1 /var/www/project/vendor/humanmade/cavalcade-runner/lib/Runner.php(89): HM\Cavalcade\Runner\Runner->connect_to_db()
web_1 | worker.1 | #2 /var/www/project/vendor/humanmade/cavalcade-runner/bin/cavalcade(26): HM\Cavalcade\Runner\Runner->bootstrap('/var/www/projec...')
web_1 | worker.1 | #3 {main}
This happens because my MySQL server is not ready yet. Could we add a default timeout which could be adjusted with a flag like --timeout=10?
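The requested behaviour amounts to retrying the connection until a deadline passes. A minimal sketch in Python follows; the function name and parameters are hypothetical, the connect callable stands in for whatever opens the database connection, and the --timeout flag itself does not exist in the runner as far as this report shows.

```python
import time


def connect_with_retry(connect, timeout=10.0, delay=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call `connect` until it succeeds or `timeout` seconds have elapsed.

    Only connection errors are retried; the last one is re-raised once
    another attempt would exceed the deadline.
    """
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        except ConnectionError:
            if clock() + delay > deadline:
                raise
            sleep(delay)
```

Passing sleep and clock as parameters keeps the helper easy to exercise in tests without real waiting.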
PHP Deprecated: Creation of dynamic property HM\Cavalcade\Runner\Logger::$db is deprecated in /etc/cavalcade/inc/class-logger.php on line 8
PHP Deprecated: Creation of dynamic property HM\Cavalcade\Runner\Logger::$table_prefix is deprecated in /etc/cavalcade/inc/class-logger.php on line 9
PHP Deprecated: Creation of dynamic property HM\Cavalcade\Runner\Job::$schedule is deprecated in /etc/cavalcade/inc/class-runner.php on line 251
Hey
Where do you recommend I install the runner? I'm using Debian
Right now, the runner loads wp-config in. With the new hook system (#23), this is going to overload wp-config a little. In addition, wp-config loading already doesn't work with some complex systems (like wordpress.org).
Proposal: add a CAVALCADE_CONFIG environment variable which specifies a PHP file to include instead of wp-config. The WP path would still be required on the command line, so the config acts as system-level configuration which can work with multiple installs. (For single install systems, it can replace wp-config instead.)
If run_job throws an exception because proc_open fails, the job will get stuck in a running state forever. As the lock is acquired at the start of the function, the status is never updated.
I'm not sure if it's intentional, but a failure to proc_open a worker also shuts down cavalcade in https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-runner.php#L131. I think that's probably a reasonable thing to do, but the job that tried to run never gets its status set gracefully, as it's not part of $this->workers.
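The stuck-lock problem described above comes down to acquiring the lock before a step that can throw, without a matching cleanup path. A Python sketch of one possible fix follows; the Job class, the status values and the spawn callable are all hypothetical stand-ins, and the real runner may want a different status or to release the lock instead.

```python
class Job:
    """Hypothetical stand-in for a queued job with a status column."""

    def __init__(self, job_id):
        self.id = job_id
        self.status = "waiting"


def run_job(job, spawn):
    """Acquire the lock, then make sure a failed spawn does not leave it held."""
    job.status = "running"  # lock acquired before the worker is spawned
    try:
        return spawn(job)
    except OSError:
        # The worker process never started, so record that on the job
        # instead of leaving it in 'running' forever, then re-raise so
        # the caller can still decide to shut down.
        job.status = "failed"
        raise
```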
A common use case is to have logging be sent to a specialized logging server, where monitoring can happen across all servers and platforms/services.
To make this feasible with Cavalcade, it needs a mechanism to replace the current Logger with a different one.
@rmccue proposed a very simple filter that does not change the current code, but allows for arbitrary loggers to be used, as long as they provide the correct methods.
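The idea of a filter that swaps in any object providing the right methods can be sketched as follows. This is a Python illustration only: the method names job_completed and job_failed, the filter registry and both logger classes are hypothetical and are not taken from the runner's Logger.

```python
class StdoutLogger:
    """Default logger: prints events to standard output."""

    def job_completed(self, job_id):
        print(f"[{job_id}] completed")

    def job_failed(self, job_id, message):
        print(f"[{job_id}] failed: {message}")


class MemoryLogger:
    """Replacement logger with the same methods; it collects events in a list
    (a real one might forward them to a central logging server)."""

    def __init__(self):
        self.records = []

    def job_completed(self, job_id):
        self.records.append((job_id, "completed", ""))

    def job_failed(self, job_id, message):
        self.records.append((job_id, "failed", message))


_logger_filters = []


def add_logger_filter(callback):
    """Register a callback that receives the current logger and returns one to use."""
    _logger_filters.append(callback)


def get_logger():
    """Start from the default logger and let each registered filter replace it."""
    logger = StdoutLogger()
    for callback in _logger_filters:
        logger = callback(logger)
    return logger
```

Because the replacement only needs to expose the same methods, the existing call sites stay unchanged.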