humanmade / cavalcade-runner

Daemon for Cavalcade, a scalable WordPress jobs system.

Home Page: https://engineering.hmn.md/projects/cavalcade/

PHP 100.00%

cavalcade-runner's Introduction

Cavalcade-Runner

Daemon for Cavalcade, a scalable WordPress jobs system.

A Human Made project. Maintained by @rmccue.

What?

This is the runner for Cavalcade. Head over to the Cavalcade repo to learn more about running this.

cavalcade-runner's Issues

Tag a release so people don't have to use dev-master

Improve logic for setting next run time in `Job::reschedule()`.

Job::reschedule() uses the value nextrun + interval when calculating the new time to run the job.

https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-job.php#L84

If the task was originally scheduled before the current timestamp, this causes the job to run back-to-back until nextrun + interval is later than the current timestamp.

To reproduce, schedule an event with wp_schedule_event( 0, 'hourly', 'pwcc_every_hour' ), a shorthand way of setting the task to start immediately and then repeat once every hour. The outcome is:

the action pwcc_every_hour fires 426,388 times consecutively
the action pwcc_every_hour then continues firing once each hour

A better approach would be to record the start time when the lock is obtained and add the interval to that when rescheduling.
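
The difference between the two strategies can be sketched as follows. This is a minimal illustration, not the code in class-job.php: the function names are hypothetical, a job is assumed to be due whenever nextrun is at or before the current time, and each run is treated as instantaneous.

```php
<?php
// Hypothetical helpers for illustration; none of these exist in Cavalcade-Runner.

// Behaviour described in the issue: add the interval to the stored nextrun.
function next_run_from_schedule(int $nextrun, int $interval): int {
    return $nextrun + $interval;
}

// Proposed behaviour: add the interval to the time the lock was obtained.
function next_run_from_start(int $started, int $interval): int {
    return $started + $interval;
}

// Counts the back-to-back runs needed before nextrun moves past $now
// when rescheduling is based on the stored nextrun.
function catch_up_runs(int $nextrun, int $interval, int $now): int {
    $runs = 0;
    while ($nextrun <= $now) {
        $runs++;
        $nextrun = next_run_from_schedule($nextrun, $interval);
    }
    return $runs;
}
```

With a start time of 0 and an hourly interval, catch_up_runs() grows with the number of hours since the Unix epoch, whereas next_run_from_start() always lands one interval after the run actually began.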

Cavalcade Runner crashes if DB becomes unreachable.

If your DB endpoint becomes unreachable, Cavalcade can effectively crash and then fail to resume running jobs once the DB becomes available again.

Add Cavalcade Runner to Homebrew

Related to #49, it would be nice to be able to install the Cavalcade Runner on a Mac using Homebrew. Ideally that would enable brew install cavalcade, and put the executable from bin/cavalcade into the Homebrew directory. By default, this is /usr/local/bin/.

Homebrew includes documentation to create and maintain a tap. It's likely this would be the necessary approach, as the use of Cavalcade, especially on Mac OS, is probably too obscure to qualify as a formula.

Change logging to less verbose

Currently logging looks like this:

worker.1  | [11] Running wp cavalcade run 11 (wp_scheduled_auto_draft_delete a:0:{})
worker.1  | [11] Started worker
worker.1  | [11] Worker status: Array
worker.1  | (
worker.1  |     [command] => wp cavalcade run 11
worker.1  |     [pid] => 52
worker.1  |     [running] =>
worker.1  |     [signaled] =>
worker.1  |     [stopped] =>
worker.1  |     [exitcode] => 0
worker.1  |     [termsig] => 0
worker.1  |     [stopsig] => 0
worker.1  | )
worker.1  |
worker.1  | [11] Worker shutting down...
worker.1  | [11] Worker out:
worker.1  | [11] Worker err:
worker.1  | [11] Worker ret: 0

Could we just log one line like nginx? I had something like this in mind:

 [11] [28/Jul/2017:14:43:41 +0000] START wp_scheduled_auto_draft_delete a:0:{}
 [11] [28/Jul/2017:14:43:45 +0000] END wp_scheduled_auto_draft_delete 0 ["output"] ["error text"]
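
A line in the proposed format could be produced with something like the sketch below. The function name and argument order are hypothetical, and the timestamp is always rendered in UTC, which is an assumption rather than something the issue specifies.

```php
<?php
// Hypothetical formatter; not part of Cavalcade-Runner.
function format_log_line(int $job_id, int $timestamp, string $event, string $details): string {
    // gmdate() formats in UTC, so the "O" offset is always +0000.
    $time = gmdate('d/M/Y:H:i:s O', $timestamp);
    return sprintf('[%d] [%s] %s %s', $job_id, $time, $event, $details);
}
```

Calling format_log_line( 11, gmmktime( 14, 43, 41, 7, 28, 2017 ), 'START', 'wp_scheduled_auto_draft_delete a:0:{}' ) reproduces the START line shown above, without its leading space.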

Default wait time on restarting in systemd

We had an interesting failure mode where our Galera cluster became inaccessible for about 8-9 seconds. That was long enough for every Cavalcade worker to crash, then fail to start enough times that systemd gave up. We're now using RestartSec=300, which is probably too conservative, but I figure five minutes is long enough for transitory failures without having a negative impact on site performance.

Mac OS launchctl config

This isn't so much an issue as something that I wanted to share for the benefit of others.

I set up Cavalcade for a local site (which is powered by Valet+), and I wanted to have Cavalcade automatically launch using the native Mac OS launchctl system.

Here's the config file I ended up using, with specifics replaced:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <!-- This label should be unique to the site -->
        <key>Label</key>
        <string>some-site.cavalcade</string>

        <key>ProgramArguments</key>
        <array>
                <string>bash</string>
                <string>-lc</string>
                <!-- Update the cavalcade location as needed -->
                <!-- Replace /path/to/wp/site with the correct path -->
                <string>~/projects/Cavalcade-Runner/bin/cavalcade /path/to/wp/site</string>
        </array>

        <key>RunAtLoad</key>
        <true/>

        <!-- These log file paths are used with Homebrew, but I've mapped Cavalcade here too -->
        <key>StandardOutPath</key>
        <string>/usr/local/var/log/cavalcade.log</string>

        <key>StandardErrorPath</key>
        <string>/usr/local/var/log/cavalcade_error.log</string>
</dict>
</plist>

This file is placed into my ~/Library/LaunchAgents/ directory with a .plist extension. For consistency, I've matched the file name with the Label some-site.cavalcade above. To load it and get it running, I then run:

launchctl load -w ~/Library/LaunchAgents/some-site.cavalcade.plist

Proposal: exit runner completely, defer to Systemd for restart

TL;DR:

I'm concerned about PHP's ability to run as a long-running script, so I'm wondering what you would think about the following:

What if, instead of using a while loop and sleeping, we run the script to completion and allow upstart/systemd to manage restarting the process? I've been testing this for the last ~3 hours and memory consumption is consistent; it's not growing.

I added the following to the systemd definition:

RestartSec=1
StartLimitBurst=0

I realize this means we'd have to open a new connection to the database every second, which I don't think is terrible, but it's not ideal either. The interval could be changed to 5 or 10 seconds, perhaps.

We've been experiencing some issues with memory leaks lately, so I've been doing some testing. I used Xdebug's function tracing ability to see which functions are consuming memory, and here are the top 10 consumers:

PDOStatement->execute                              896  1.2630 33392944  1.2630 33392944
sleep                                              896  895.2948 16176472  895.2948 16176472
PDO->prepare                                       896  1.1511  7825664  1.1511  7825664
HM\Platform\boostrap_cavalcade_runner                1  0.0044   531408  0.0013   220472
HM\Cavalcade\Runner\autoload                         4  0.0011    76504  0.0009    76168
include                                              7  0.0148   575480  0.0017    34376
PDO->__construct                                     1  0.0389    11392  0.0389    11392
HM\Cavalcade\Runner\Runner->bootstrap                1  0.0551   591656  0.0002     4784
HM\Cavalcade\Runner\Hooks->register                  3  0.0003     2704  0.0001     2704
define                                              51  0.0005     2000  0.0005     2000

Seeing as PDOStatement->execute was the top consumer, I went through the code and unset($statement) after each statement and ran Cavalcade for some time. Even with this, I can see memory usage growing.

A flaw in my thought process is that the function trace shows the memory consumed over the course of the script, but not necessarily what's being consumed at that moment. I made use of https://github.com/BitOne/php-meminfo and https://github.com/arnaud-lb/php-memory-profiler, and they unfortunately didn't reveal anything. Well, they did reveal that the application isn't leaking anything because the memory allocated was always well below what was allocated to the process.

One thing that is consistent is that sleep seems to consume a lot of memory. Reading around, I've seen that PHP isn't really designed for long-running processes, so I'm concerned about it in this instance.

DB errors are not logged

In the event of the database server not being reachable, or queries failing for some other reason, it would be useful if Cavalcade-Runner added log items for failures.

log location when using xenial xerus

I'm looking for a way to direct Cavalcade logs to their own file on Ubuntu 16.04, since systemd seems to have its own ideas.

I would have liked to handle this in Salt, but I'm not sure how. The most direct approach is to change https://github.com/humanmade/Cavalcade-Runner/blob/master/systemd.service#L11

to something like ExecStart=/bin/bash -c 'exec /etc/cavalcade/bin/cavalcade &>> /var/log/cavalcade.log '.

cc @joehoyle

Missing argument for `Runner->terminate()` method

In the run method of the Runner class, $this->terminate() is called when the program breaks out of the while loop.

However, the terminate method requires a $signal argument, which is missing here. Should probably be SIGTERM.

The line in question: https://github.com/humanmade/Cavalcade-Runner/blob/master/lib/Runner.php#L95

Jobs show "failed" when service is restarted.

When the cavalcade service is restarted, we do two things:

Ignore the signal in any wp cavalcade run processes
Wait for all running workers to complete in the cavalcade-runner

This works well to let the jobs complete, but the status of the process is changed, and cavalcade-runner interprets it as a fail.

is_done will return true here, but shutdown() will return -1. This is because (it seems) once a process has been sent SIGTERM, proc_get_status will return:

(
    [command] => wp cavalcade run 440 --url='example.com/'
    [pid] => 13589
    [running] =>
    [signaled] => 1
    [stopped] =>
    [exitcode] => -1
    [termsig] => 15
    [stopsig] => 0
)

(see exitcode)

According to the PHP docs: "The exit code returned by the process (which is only meaningful if running is FALSE). Only first call of this function return real value, next calls return -1." I think this might be an undocumented side-effect of a process ending with SIGTERM.

I think we need to have some logic to handle the case when signaled => 1 or termsig => 15, and maybe return 0 instead of -1 in those cases?
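
One way to express that logic is sketched below. The helper is hypothetical and only inspects an array shaped like the proc_get_status() output above; it assumes that a worker reported as signaled with signal 15 was allowed to finish its job, which may not hold in every case.

```php
<?php
// Hypothetical helper; not part of Cavalcade-Runner.
// Takes the array returned by proc_get_status() and decides which exit code to record.
function resolve_exit_code(array $status): int {
    // Value of SIGTERM on Linux, hard-coded so the pcntl extension is not required.
    $sigterm = 15;

    // Assumption: a finished worker that was sent SIGTERM completed its job normally.
    if (!$status['running'] && $status['signaled'] && $status['termsig'] === $sigterm) {
        return 0;
    }

    // Otherwise report whatever proc_get_status() gave us.
    return (int) $status['exitcode'];
}
```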

`get_site_url` on non-Multisite

At the moment, the Job class relies on the existence of the wp_blogs table to determine the current site URL. However, this table doesn't exist on single installs.

STDOUT and STDERR

The runner is quite noisy so we're sending all STDOUT to /dev/null. However it seems like since only printf() is used, all output is STDOUT and no STDERR is ever sent. Could we switch from printf() to something like fwrite()?
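
As a rough sketch of what that switch could look like, the hypothetical helper below takes the target stream as an argument, so informational messages can go to STDOUT and errors to STDERR. The function name is illustrative and not taken from the runner.

```php
<?php
// Hypothetical logging helper; not part of Cavalcade-Runner.
// Accepts any writable stream resource, e.g. the CLI constants STDOUT or STDERR.
function write_log($stream, string $message): void {
    fwrite($stream, $message . PHP_EOL);
}
```

In a CLI script, write_log( STDERR, 'Worker err: ...' ) would then survive redirecting STDOUT to /dev/null.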

SSL support for secure database connections

Where hosts do not or cannot support Cavalcade-Runner, or where you want to use dedicated resources for processing cron, the runner may need to connect to an external DB, e.g. Amazon RDS.

Best practice for external DB connections is to leverage SSL to protect data in transit.

It is possible to add support for DB requests via SSL to the runner.

This is not in WP core today, but there are plugins that provide it, e.g. (from Stack Overflow):

For those looking for a way to do this w/o hacking core or rolling your own plugin:

https://wordpress.org/plugins/secure-db-connection/

Created by the dev who initially reported the issue in WordPress: https://core.trac.wordpress.org/ticket/28625

Runner is not able to use mysql sockets

When we configure WordPress to connect to the database through a MySQL socket, the runner fails to start. We can reproduce it by adding this line to wp-config.php:

define( 'DB_HOST', 'localhost:/tmp/mysql.sock' );
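
A possible way to handle such a value is to split DB_HOST before building the PDO DSN, since the MySQL PDO driver accepts a unix_socket key in place of host. The helper below is a simplified, hypothetical sketch that only understands "host", "host:port" and "host:/path/to/socket"; it is not how the runner currently builds its DSN.

```php
<?php
// Hypothetical helper; not part of Cavalcade-Runner.
function build_mysql_dsn(string $db_host, string $db_name): string {
    $parts = explode(':', $db_host, 2);
    $host = $parts[0];
    $extra = $parts[1] ?? '';

    if ($extra !== '' && $extra[0] === '/') {
        // A leading slash means the part after the colon is a socket path.
        return sprintf('mysql:unix_socket=%s;dbname=%s', $extra, $db_name);
    }
    if ($extra !== '') {
        // Anything else after the colon is treated as a port number.
        return sprintf('mysql:host=%s;port=%d;dbname=%s', $host, (int) $extra, $db_name);
    }
    return sprintf('mysql:host=%s;dbname=%s', $host, $db_name);
}
```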

Lock a job when starting it

Right now we are seeing jobs being run more than once simultaneously because a job's "lock" is not quite atomic. We should probably use a SELECT... FOR UPDATE in https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-runner.php#L221
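
For comparison, an alternative to SELECT ... FOR UPDATE is a conditional UPDATE that only succeeds for the first runner to claim the job. The sketch below shows that swapped-in technique, not the locking query the runner uses: the table name, column names and status values are assumptions, and an in-memory SQLite database stands in for MySQL so the example is self-contained.

```php
<?php
// Sketch only: schema and status values are assumptions, and SQLite stands in for MySQL.
function try_lock_job(PDO $db, int $job_id): bool {
    // The WHERE clause makes the claim conditional, so only one caller can flip the status.
    $statement = $db->prepare(
        "UPDATE jobs SET status = 'running' WHERE id = :id AND status = 'waiting'"
    );
    $statement->execute([':id' => $job_id]);

    // Exactly one affected row means this caller obtained the lock.
    return $statement->rowCount() === 1;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)");
$db->exec("INSERT INTO jobs (id, status) VALUES (1, 'waiting')");

$first = try_lock_job($db, 1);  // Claims the job.
$second = try_lock_job($db, 1); // Finds the job already running.
```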

Define a constant with the real ABSPATH

This would serve two purposes:

handle edge cases when anyone wants to include something other than wp-settings.php during bootstrap
provide an easy way to check if it's a cavalcade runner process rather than class_exists( ... )

Allow waiting some time before timing out the initial connection to the database

I'm running this in a local Docker environment and I'm hitting an error which says that the database refuses the connection:

web_1       | worker.1  | Error: SQLSTATE[HY000] [2002] Connection refused
web_1       | worker.1  | #0 /var/www/project/vendor/humanmade/cavalcade-runner/lib/Runner.php(200): PDO->__construct('mysql:host=mysq...', 'web', 'password', Array)
web_1       | worker.1  | #1 /var/www/project/vendor/humanmade/cavalcade-runner/lib/Runner.php(89): HM\Cavalcade\Runner\Runner->connect_to_db()
web_1       | worker.1  | #2 /var/www/project/vendor/humanmade/cavalcade-runner/bin/cavalcade(26): HM\Cavalcade\Runner\Runner->bootstrap('/var/www/projec...')
web_1       | worker.1  | #3 {main}

This happens because MySQL is not ready yet. Could we add a default timeout which could be adjusted with a flag like --timeout=10?
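
The retry behaviour being asked for could look roughly like the sketch below. The helper is hypothetical: it takes any callable that either returns a connection or throws (for example a closure wrapping new PDO( ... )) and keeps retrying until the timeout elapses. How a --timeout flag would be parsed is left out.

```php
<?php
// Hypothetical helper; not part of Cavalcade-Runner.
// $connect is any callable that returns a connection or throws an exception.
function connect_with_timeout(callable $connect, int $timeout, int $delay = 1) {
    $deadline = time() + $timeout;

    while (true) {
        try {
            return $connect();
        } catch (Exception $e) {
            if (time() >= $deadline) {
                // Give up once the timeout has elapsed and surface the last error.
                throw $e;
            }
            // Wait before the next attempt so the database has time to come up.
            sleep($delay);
        }
    }
}
```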

PHP Warnings running under PHP 8.2

PHP Deprecated:  Creation of dynamic property HM\Cavalcade\Runner\Logger::$db is deprecated in /etc/cavalcade/inc/class-logger.php on line 8
PHP Deprecated:  Creation of dynamic property HM\Cavalcade\Runner\Logger::$table_prefix is deprecated in /etc/cavalcade/inc/class-logger.php on line 9
PHP Deprecated:  Creation of dynamic property HM\Cavalcade\Runner\Job::$schedule is deprecated in /etc/cavalcade/inc/class-runner.php on line 251

Install directory

Hey

Where do you recommend I install the runner? I'm using Debian

Add alternative configuration loading

Right now, the runner loads wp-config in. With the new hook system (#23), this is going to overload wp-config a little. In addition, wp-config loading already doesn't work with some complex systems (like wordpress.org).

Proposal: add a CAVALCADE_CONFIG environment variable which specifies a PHP file to include instead of wp-config. The WP path would still be required on the command line, so the config acts as system-level configuration which can work with multiple installs. (For single install systems, it can replace wp-config instead.)
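
The lookup could be as small as the sketch below. The helper name is hypothetical, and falling back to wp-config.php inside the WP path is an assumption about how the current loading would be preserved.

```php
<?php
// Hypothetical helper sketching the proposal; not part of Cavalcade-Runner.
// $env_value is the result of getenv('CAVALCADE_CONFIG'), which is false when unset.
function resolve_config_file(string $wp_path, $env_value): string {
    if (is_string($env_value) && $env_value !== '') {
        // An explicit config file takes precedence over wp-config.php.
        return $env_value;
    }
    return rtrim($wp_path, '/') . '/wp-config.php';
}
```

It would be called as resolve_config_file( '/path/to/wp/site', getenv( 'CAVALCADE_CONFIG' ) ).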

Exception in run_job doesn't set job to failed

If run_job throws an exception because it cannot proc_open, the job gets stuck in a running state forever. The lock is acquired at the start of the function, so the status is never updated afterwards.

I'm not sure if it's intentional, but a failure to proc_open a worker also shuts down Cavalcade in https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-runner.php#L131. I think that's probably a reasonable thing to do, but the status of the job that tried to run is never set gracefully, as it's not part of $this->workers.

Allow for the `Logger` to be replaced

A common use case is to have logging be sent to a specialized logging server, where monitoring can happen across all servers and platforms/services.

To make this feasible with Cavalcade, it needs a mechanism to replace the current Logger with a different one.

@rmccue proposed a very simple filter that does not change the current code, but allows for arbitrary loggers to be used, as long as they provide the correct methods.
