shopify / ci-queue Goto Github PK
View Code? Open in Web Editor NEWDistribute tests over many workers using a queue
License: MIT License
Distribute tests over many workers using a queue
License: MIT License
In some extreme cases, a single low level issue can make a huge portion of the test suite fail.
It can make sense in this context to stop running tests once X errors have been recorded, both to save CPU time and to keep a readable output.
cc @Shopify/pipeline
When Dependabot raises new PR's for ci-queue
updates there is no information contained in the PR about what changed which forces a developer to dive into this project to understand what has happened.
Once inside this project, there is no CHANGELOG or Github releases which forces developers to manually inspect PR's or commits to understand what has happened.
This is a poor use of developer's time.
Adding a CHANGELOG or adding information to Github releases can streamline this process and give more confidence to consumers when updates occur.
It looks like that skipped tests are showing up as errors in the list if there is another error happening.
[REDACTED]
I'm running ci-queue in a bit of an alternative setting. I've got a Ruby on Rails app which has a long test suite, and I'm running 8 concurrent instances of RSpec on Heroku CI, but on one and the same dyno, which also runs Redis in-dyno.
I start the whole chain with the following script
export CI_QUEUE_URL=$([ -z "$REDIS_URL" ] && echo "redis://127.0.0.1:6379" || echo "$REDIS_URL")
for num in $(seq 1 $PARALLEL_COUNT); do
RUBYOPT="-W0" \
CI_NODE_INDEX=$(expr $num - 1) \
DATABASE_URL=$([ "$num" -ne "1" ] && echo $DATABASE_URL$num || echo $DATABASE_URL) \
rspec-queue \
--timeout 180 \
--max-consecutive-failures 10 \
--max-requeues 5 \
--requeue-tolerance 5 $@ &
done
wait
rspec-queue --report
Every now and then when this runs on Heroku CI, 7 out of the 8 workers end at the same time, but it seems the 8th one keeps on running for hours until Heroku kills my build after 2 hours that usually passes in about 10-15 minutes. It's as if it doesn't understand that the queue is done.
Randomized with seed 57482
.
Finished in 9 minutes 51 seconds (files took 10.22 seconds to load)
85 examples, 0 failures, 1 pending
Randomized with seed 15517
< THIS IS WHERE I EXPECT THE REPORT >
.......................................................................................................................-----> test command `bin/test` failed with signal: terminated
While there are 8 instances running, the word Finished
only shows up 7 times in the log, so I assume what I see here is the finishing of workers 6 and 7, and the following dots being part of worker 8.
I have no idea how to debug this further, but I'd love to add more details if someone can help me in the right direction.
I've compared the output of a successful build and a failed build.
Since we updated redis to 5.0.5, we are facing errors whenever worker-id is not present:
$ bundle exec minitest-queue run -Itest $(find test -name \*_test.rb | sort)
bundler: failed to load command: minitest-queue (/tmp/bundle/ruby/3.1.0/bin/minitest-queue)
W, [2022-10-04T21:09:42.446048 #241] WARN -- : calling Tracer#shutdown multiple times.
/tmp/bundle/ruby/3.1.0/gems/redis-client-0.9.0/lib/redis_client/command_builder.rb:37:in `block in generate': Unsupported command argument type: NilClass (TypeError)
from /tmp/bundle/ruby/3.1.0/gems/redis-client-0.9.0/lib/redis_client/command_builder.rb:28:in `map!'
from /tmp/bundle/ruby/3.1.0/gems/redis-client-0.9.0/lib/redis_client/command_builder.rb:28:in `generate'
from /tmp/bundle/ruby/3.1.0/gems/redis-client-0.9.0/lib/redis_client.rb:211:in `call_v'
from /tmp/bundle/ruby/3.1.0/gems/redis-5.0.5/lib/redis/client.rb:73:in `call_v'
from /tmp/bundle/ruby/3.1.0/gems/redis-5.0.5/lib/redis.rb:167:in `block in send_command'
from /tmp/bundle/ruby/3.1.0/gems/redis-5.0.5/lib/redis.rb:166:in `synchronize'
from /tmp/bundle/ruby/3.1.0/gems/redis-5.0.5/lib/redis.rb:166:in `send_command'
from /tmp/bundle/ruby/3.1.0/gems/redis-5.0.5/lib/redis/commands/sets.rb:21:in `sadd'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/ci/queue/redis/worker.rb:218:in `register'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/ci/queue/redis/worker.rb:211:in `push'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/ci/queue/redis/worker.rb:27:in `populate'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/minitest/queue/runner.rb:282:in `populate_queue'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/minitest/queue/runner.rb:87:in `run_command'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/minitest/queue/runner.rb:38:in `public_send'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/minitest/queue/runner.rb:38:in `run!'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/lib/minitest/queue/runner.rb:19:in `invoke'
from /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.24.2/exe/minitest-queue:5:in `<top (required)>'
from /tmp/bundle/ruby/3.1.0/bin/minitest-queue:25:in `load'
from /tmp/bundle/ruby/3.1.0/bin/minitest-queue:25:in `<top (required)>'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/cli/exec.rb:58:in `load'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/cli/exec.rb:58:in `kernel_load'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/cli/exec.rb:23:in `run'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/cli.rb:483:in `exec'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/cli.rb:31:in `dispatch'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/cli.rb:25:in `start'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/exe/bundle:48:in `block in <top (required)>'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/lib/bundler/friendly_errors.rb:103:in `with_friendly_errors'
from /usr/local/ruby/lib/ruby/gems/3.1.0/gems/bundler-2.3.9/exe/bundle:36:in `<top (required)>'
from /usr/local/ruby/bin/bundle:25:in `load'
from /usr/local/ruby/bin/bundle:25:in `<main>'
This problem only appears when worker_id
is nil, since we send it down to redis which is now doing more strict type checking.
I think in the ideal scenario, worker_id should never be nil, but in that case, the command should fail in argument validation whenever no worker ID was provided or inferred, instead of failing when trying to access redis
1st party software must have a business impact classification. See our list of classification to help you determine the classification of your application.
It's critical that we understand the business impact of the software we write. The classification of your service determines the level of support you must provide as owners and helps us set standards.
This is mandatory. If this is not done you won't be able to have any runtimes.
At the latest, this should be done before 2021-07-22.
First, leave a comment explaining why it doesn't apply. Then, leave another comment as /not_applicable
, and close the issue.
If you change your mind, make sure to comment any reason and reopen the issue.
Please contact the Production Excellence team using Slack at #production-excellence-team.
Your service: ci-queue/rubygems
Owners:
Currently, all worker nodes
We can shave off some test time if a worker first
When an order-related failure happens is very difficult to reproduce it locally for large suites (which are likely among users of this library).
Having a way to reproduce the exact sequence locally would be a great help and would considerably reduce --bisect
times.
If there's consensus I can try to propose an implementation for this 🙋♂️
This PR introduced the usage of a method sadd
that always returns an integer value.
See failed CI run https://buildkite.com/shopify/shop-server/builds/167914#01886d13-625e-4fbe-a514-5c289ee654dd
👋 Running into a ENOENT
error when the gem tries to open up the log/test_order.log
file and the log
directory does not exist.
Running this command on CI:
bundle exec minitest-queue run --max-requeues 3 --requeue-tolerance 0.01 --timeout 45 -Itest:lib $(find test -name \*_test.rb | sort)
Causes this error:
/usr/local/bundle/gems/bundler-1.17.3/lib/bundler/rubygems_integration.rb:200: warning: constant Gem::ConfigMap is deprecated
--
| Traceback (most recent call last):
| 8: from /tmp/bundle/ruby/2.7.0/gems/minitest-5.14.1/lib/minitest.rb:68:in `block in autorun'
| 7: from /tmp/bundle/ruby/2.7.0/gems/minitest-5.14.1/lib/minitest.rb:139:in `run'
| 6: from /tmp/bundle/ruby/2.7.0/gems/minitest-5.14.1/lib/minitest.rb:849:in `start'
| 5: from /tmp/bundle/ruby/2.7.0/gems/minitest-5.14.1/lib/minitest.rb:849:in `each'
| 4: from /tmp/bundle/ruby/2.7.0/gems/minitest-reporters-1.4.2/lib/minitest/minitest_reporter_plugin.rb:16:in `start'
| 3: from /tmp/bundle/ruby/2.7.0/gems/minitest-reporters-1.4.2/lib/minitest/minitest_reporter_plugin.rb:16:in `each'
| 2: from /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue/order_reporter.rb:11:in `start'
| 1: from /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue/order_reporter.rb:11:in `open'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue/order_reporter.rb:11:in `initialize': No such file or directory @ rb_sysopen - log/test_order.log (Errno::ENOENT)
Committing an empty log
folder at the root of my repo fixes it. The folder should just be created on its own if it doesn't exist and is required at runtime.
eg: #181
then I get on buildkite
https://buildkite.com/shopify/bloodhound/builds/10574#69a9c3dc-3685-4b17-b0b2-c6a3b960703e
Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0. 61% Time: 00:02:00
--
|
| redis.multi do
| redis.get("key")
| end
|
| should be replaced by
|
| redis.multi do \|pipeline\|
| pipeline.get("key")
| end
|
| (called from /tmp/bundle/ruby/3.0.0/gems/ci-queue-0.21.1/lib/ci/queue/redis/base.rb:24:in `size'}
| Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0. 61% Time: 00:02:00
|
| redis.multi do
| redis.get("key")
| end
|
| should be replaced by
|
| redis.multi do \|pipeline\|
| pipeline.get("key")
| end
|
| (called from /tmp/bundle/ruby/3.0.0/gems/ci-queue-0.21.1/lib/ci/queue/redis/base.rb:24:in `size'}
| Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0.
|
| redis.multi do
| redis.get("key")
| end
|
| should be replaced by
|
| redis.multi do \|pipeline\|
| pipeline.get("key")
| end
This would make it easier to test for flakiness. Idea: add [ci no-requeue]
to commit message but I am open to other ideas.
cc @gmalette
Currently some of our projects use the following wrapper script:
command = %w(bundle exec minitest-queue)
command << (ENV['BUILDKITE_RETRY_COUNT'].to_i > 0 ? 'retry' : 'run')
This is pretty much always the behavior you want, so minitest-queue
and rspec-queue
should behave like this by default if they detect the job was retried.
I need to check CircleCI & Travis-ci docs to know if they have a similar env var.
Or remove the TODO from the README if it plug-and-play.
On otherwise good builds, retried tests show up as failures in Starscream.
Despite having been retried successfully.
Node 2:
tests/starscream/loaders/test_sqlalchemy_loaders.py::TestRedshiftLoaderIntegration::test_load_from_scratch_incremental[parquet] FAILED WILL_RETRY
tests/starscream/loaders/test_sqlalchemy_loaders.py::TestRedshiftLoaderIntegration::test_load_from_scratch_incremental[parquet] SKIPPED
Node 3:
tests/starscream/loaders/test_sqlalchemy_loaders.py::TestRedshiftLoaderIntegration::test_load_from_scratch_incremental[json] PASSED
cc/ @Shopify/dev-acceleration
After #34, you need to initialize the File and Static queue passing the file name or the test respectively and you also need to populate the queue with the list of tests. This seems contra-intuitive.
Maybe the initialize should only take the config and the populate the list of tests and add a .populate
class method (or any other name) that does what the initialize used to do.
Also those queues require a configuration object that is not needed in most of its use cases. Making it optional, or adding a high level method as .populate
that creates the configuration object would be great.
cc @casperisfine @wvanbergen @tjoyal
Nitpick:
The @index
instance variable is only set after you call #populate
so if you don't you get a "non-method error fetch for nil" in the to_a
method and a ruby warning because the instance variable is not initialized. Maybe we should try to write warning free code so we could avoid this kind of error?
By default no timeouts are set for the Redis client. It's poorly documented, but as far as I can tell, the Redis client will wait hang infinitely by default. This is not a sensible default for ci-queue clients, forcing every client to specify this value. socket_connect_timeout
defaults to whatever is specified for socket_timeout
, but since that is handled deep down in the Redis client and certainly not documented by ci-queue a responsible client might not rely on that.
I suggest explicitly setting sensible defaults for these two values to reduce the burden on clients.
ci-queue/python/ciqueue/_pytest/test_queue.py
Lines 59 to 81 in 3b7ac59
The way we insert our custom error reports make it so that Minitest won't print anything anymore.
We should stay as close as possible to minitest default behavior if no other reporters are configured.
CI::Queue::Redis::ReservationError: "./spec/models/redacted_spec.rb[1:20]" is already reserved. You have to acknowledge it before you can reserve another one
--
| /app/vendor/bundle/ruby/2.4.0/gems/ci-queue-0.11.1/lib/ci/queue/redis/worker.rb:117:in `reserve'
| /app/vendor/bundle/ruby/2.4.0/gems/ci-queue-0.11.1/lib/ci/queue/redis/worker.rb:48:in `poll'
| /app/vendor/bundle/ruby/2.4.0/gems/ci-queue-0.11.1/lib/rspec/queue.rb:347:in `block (2 levels) in run_specs'
| /app/vendor/bundle/ruby/2.4.0/gems/rspec-core-3.6.0/lib/rspec/core/configuration.rb:1894:in `with_suite_hooks'
| /app/vendor/bundle/ruby/2.4.0/gems/ci-queue-0.11.1/lib/rspec/queue.rb:346:in `block in run_specs'
| /app/vendor/bundle/ruby/2.4.0/gems/rspec-core-3.6.0/lib/rspec/core/reporter.rb:79:in `report'
| /app/vendor/bundle/ruby/2.4.0/gems/ci-queue-0.11.1/lib/rspec/queue.rb:343:in `run_specs'
| /app/vendor/bundle/ruby/2.4.0/gems/rspec-core-3.6.0/lib/rspec/core/runner.rb:87:in `run'
| /app/vendor/bundle/ruby/2.4.0/gems/rspec-core-3.6.0/lib/rspec/core/runner.rb:71:in `run'
| /app/vendor/bundle/ruby/2.4.0/gems/rspec-core-3.6.0/lib/rspec/core/runner.rb:45:in `invoke'
| /app/vendor/bundle/ruby/2.4.0/gems/ci-queue-0.11.1/exe/rspec-queue:4:in `<top (required)>'
| /app/vendor/bundle/ruby/2.4.0/bin/rspec-queue:23:in `load'
| /app/vendor/bundle/ruby/2.4.0/bin/rspec-queue:23:in `<top (required)>'
the python implementation of ciqueue should be made to work with python 3.6, as well as the current python 2.7. We should extend the test environment to cover both versions of python (tox may come in handy for this).
eg.
https://buildkite.com/shopify/starscream/builds/80219#9bbff676-e0e4-4ad2-b121-45c8584f82de
In this instance, we know that there are errors during collection, but we push them anyway. The tests can't run on the workers, but by pushing them to the queue, an reporter is fooled into thinking that they'll eventually complete.
Sometimes our Redis server that handle the ci-queue workload experience a failover or some other availability issues.
When this happens it break builds even though it recovers pretty fast.
Error connecting to Redis on redacted.svc.cluster.local.:6379 (SocketError) (Redis::CannotConnectError)
./tmp/bundle/ruby/3.1.0/gems/redis-4.8.0/lib/redis/client.rb:162:in `call': MASTERDOWN Link with MASTER is down and replica-serve-stale-data is set to 'no'. (Redis::CommandError)
(that later one need to be better categorized by the redis gem though)
Ideally we'd be resilient to these small transient errors, this means retrying all or most commands and possibly waiting a bit before retrying. The redis
gem has the necessary elements for that it's mostly just configuration.
However we need to go over all the commands we emit and make sure they are idempotent, otherwise retrying could result in a corrupted state, lost tests etc.
cc @ChrisBr
I think I'd like to split up populating the queue, and working off the queue, so we can put a filter/order step in between. This allows us to remove tests we don't want to run, or order the list so the slowest ones get run first.
Then, the existing worker code can work off this queue.
CI::Queue::Entry
. This class would hold all annotation.@DazWorrall @casperisfine what do you think?
In the following scenario:
Expected behavior:
Worker A should see that the test was already acknowledged by another worker and simply drop the result.
Current behavior:
The test is requeued by worker A
Discovered by @lugray
Because both minitest and rspec integration are provided by the same gem, we can't define explicit dependencies in the gemspec, even though that would make testing much much easier.
I think we should split in 3 gems:
Errors in the python-tests
job in the Tests
workflow were preventing new versions of the Ruby gem from being released. To unblock the release, we stopped running the tests against Python 3.8 and removed the lint check with #170. Support for Python 3.8 and the lint check should be re-added.
I believe this gist should help reproduce this error: https://gist.github.com/mrsimo/9cfc750719a7230c2a96803a4c861042
I added a puts
inside the redis
gem just before executes a command, hence the redis output we see:
$ be minitest-queue retry --build wadus-1 --queue $queue test.rb
["*2", "$4", "auth", "$32", <zip>"]
["*4", "$6", "lrange", "$27", "build:wadus-1:worker::queue", "$1", "0", "$2", "-1"]
["*4", "$4", "hset", "$27", "build:wadus-1:error-reports", "$57", "BugTest#test_doesn’t_allow_requests_from_inactive_users", "$333", "\x04\b{\b:\x19test_and_module_nameI\">BugTest#test_doesn\xE2\x80\x99t_allow_requests_from_inactive_users\x06:\x06ET:\x0Etest_nameI\"6test_doesn\xE2\x80\x99t_allow_requests_from_inactive_users\x06;\x06T:\voutputI\"\x01\x9F\e[31mFAIL\e[0m BugTest#test_doesn\xE2\x80\x99t_allow_requests_from_inactive_users\n\e[33mExpected true to not be truthy.\e[0m\n test.rb:10:in `block in <class:BugTest>'\n\n\x06;\x06T"]
Traceback (most recent call last):
35: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-5.11.3/lib/minitest.rb:63:in `block in autorun'
34: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-5.11.3/lib/minitest.rb:136:in `run'
33: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/ci-queue-0.13.4/lib/minitest/queue.rb:145:in `__run'
32: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/ci-queue-0.13.4/lib/minitest/queue.rb:156:in `run_from_queue'
31: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/ci-queue-0.13.4/lib/ci/queue/static.rb:52:in `poll'
30: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/ci-queue-0.13.4/lib/minitest/queue.rb:177:in `block in run_from_queue'
29: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-5.11.3/lib/minitest.rb:802:in `record'
28: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-5.11.3/lib/minitest.rb:802:in `each'
27: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-5.11.3/lib/minitest.rb:803:in `block in record'
26: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-reporters-1.3.0/lib/minitest/minitest_reporter_plugin.rb:20:in `record'
25: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-reporters-1.3.0/lib/minitest/minitest_reporter_plugin.rb:20:in `each'
24: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/minitest-reporters-1.3.0/lib/minitest/minitest_reporter_plugin.rb:21:in `block in record'
23: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/ci-queue-0.13.4/lib/minitest/queue/build_status_recorder.rb:52:in `record'
22: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/ci-queue-0.13.4/lib/ci/queue/redis/build_record.rb:20:in `record_error'
21: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis.rb:2303:in `pipelined'
20: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis.rb:45:in `synchronize'
19: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
18: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis.rb:45:in `block in synchronize'
17: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis.rb:2307:in `block in pipelined'
16: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:153:in `call_pipeline'
15: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:291:in `with_reconnect'
14: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:155:in `block in call_pipeline'
13: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:185:in `call_pipelined'
12: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:218:in `process'
11: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:304:in `logging'
10: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:219:in `block in process'
9: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:366:in `ensure_connected'
8: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:220:in `block (2 levels) in process'
7: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:220:in `each'
6: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:226:in `block (3 levels) in process'
5: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:267:in `write'
4: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:248:in `io'
3: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/client.rb:269:in `block in write'
2: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/connection/ruby.rb:356:in `write'
1: from /home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/connection/command_helper.rb:30:in `build_command'
/home/albert/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/redis-4.0.1/lib/redis/connection/command_helper.rb:30:in `join': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
If I change the ’
from doesn’t
things go well.
I can hopefully just change all these in my test suite, but it seems we have very enthusiastic developers when comes to proper apostrophes in our team 😆 and we'll just be one ’
away from test catastrophe.
We are getting LoadError: cannot load such file
errors from the ci-queue
ruby gem when trying to use it. It looks to me that the gem specifies several development_dependencies
but uses them outside of development at runtime.
The fix on our end is to require this gems dependencies in our gemfile
. The gem should properly specify its dependencies so installing it via bundler
will grab everything it needs.
Here are the errors I've encountered:
bundler: failed to load command: minitest-queue (/tmp/bundle/ruby/2.7.0/bin/minitest-queue)
--
| LoadError: cannot load such file -- redis
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/ci/queue/redis.rb:3:in `require'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/ci/queue/redis.rb:3:in `<top (required)>'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/ci/queue.rb:47:in `require'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/ci/queue.rb:47:in `from_uri'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue/runner.rb:31:in `run!'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue/runner.rb:19:in `invoke'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/exe/minitest-queue:5:in `<top (required)>'
| /tmp/bundle/ruby/2.7.0/bin/minitest-queue:23:in `load'
| /tmp/bundle/ruby/2.7.0/bin/minitest-queue:23:in `<top (required)>'
bundler: failed to load command: minitest-queue (/tmp/bundle/ruby/2.7.0/bin/minitest-queue)
--
| LoadError: cannot load such file -- minitest/reporters
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue.rb:4:in `require'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue.rb:4:in `<top (required)>'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue/runner.rb:4:in `require'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/lib/minitest/queue/runner.rb:4:in `<top (required)>'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/exe/minitest-queue:4:in `require'
| /tmp/bundle/ruby/2.7.0/gems/ci-queue-0.20.5/exe/minitest-queue:4:in `<top (required)>'
| /tmp/bundle/ruby/2.7.0/bin/minitest-queue:23:in `load'
| /tmp/bundle/ruby/2.7.0/bin/minitest-queue:23:in `<top (required)>'
I think this is causing the issue where skipped tests are reported in the buildkite annotation, instead of any actual failures:
<testsuite name="ShopSearchIndexingJobTest" filepath="components/platform/test/unit/jobs/shop_search_indexing_job_test.rb" skipped="1" failures="0" errors="0" tests="1" assertions="0" time="0.06969359999993685">
<testcase name="test_update_shop_documents_when_shop's_active_field_changes" lineno="30" classname="ShopSearchIndexingJobTest" assertions="0" time="0.06969359999993685" flaky_test="true">
<failure type="test_update_shop_documents_when_shop's_active_field_changes" message="This test is very slow: [url]">
Skipped: test_update_shop_documents_when_shop's_active_field_changes(ShopSearchIndexingJobTest) [/app/components/platform/test/unit/jobs/shop_search_indexing_job_test.rb:31]: This test is very slow: [url]
</failure>
</testcase>
</testsuite>
This was a minitest skipped test (as in: skip
at the top of the test definition, not testability magic). 2 things:
cc @Shopify/pipeline
Hello!
We're in the process of evaluating ci-queue for our RSpec suite and I'd like to know: is the timing of a previous example run taken into account when scheduling the example in future builds? E.g. "schedule slower examples first".
Thanks in advance!
As discussed on Slack: https://shopify.slack.com/archives/C9BPSMUDC/p1521483816000555
When a failing (flaky) test turns green after a retry, the test results always keep failing on retry. The failing tests returned are the one before the retry, so the only way to get a passing build is to retry the entire suite 🔥💰
I've just encountered the following behaviour with the rspec runner of ci-queue:
NameError
, e.g. by referring to a non-existent class:describe MyNonExistentClass do
...
end
I guessed wrongly that rspec-queue
would fail with a non-zero exit code, but it exited with 0
.
Bare rspec
exits with 1
in the above situation.
I'm wondering if this is intended behaviour - perhaps its needed for retries to work?
The error message output is:
NameError: uninitialized constant MyNonExistentClass
Debugging shows that line 400 is where the 0
exit code is returned, because the NameError
causes @world.non_example_failure
to be true
. Syntax errors appear to have the same result:
ci-queue/ruby/lib/rspec/queue.rb
Lines 387 to 401 in cba2cdd
With this test suite:
# test_example.py
from unittest import TestCase
class TestExample(TestCase):
def test_fail(self):
assert False
Running works as expected:
$ pytest -p ciqueue.pytest --queue 'redis://localhost:6379/0?worker=0&build=repro&retry=0&timeout=10' test_example.py
========================================================= test session starts =========================================================
platform darwin -- Python 2.7.13, pytest-4.0.1, py-1.7.0, pluggy-0.6.0
...
collected 1 item
test_example.py F
============================================================== FAILURES ===============================================================
________________________________________________________ TestExample.test_fail ________________________________________________________
self = <test_example.TestExample testMethod=test_fail>
def test_fail(self):
> assert False
E AssertionError: assert False
test_example.py:5: AssertionError
====================================================== 1 failed in 0.06 seconds =======================================================
However, reporting shows that the test passed:
$ pytest -p ciqueue.pytest_report --queue 'redis://localhost:6379/0?worker=0&build=repro&retry=0&timeout=10' test_example.py
========================================================= test session starts =========================================================
platform darwin -- Python 2.7.13, pytest-4.0.1, py-1.7.0, pluggy-0.6.0
...
collected 1 item
test_example.py . [100%]
====================================================== 1 passed in 0.02 seconds =======================================================
This appears to be due to ciqueue's makereport triggering before the one in pytest which is responsible for setting call.excinfo for TestCase-style tests.
From these docs it looks like:
tryfirst=True
hooks is undefinedhookwrapper=True
can execute code before a tryfirst
hookSo, one fix could be to get pytest to switch to using hookwrapper
. I figured I'd raise it with you folks first, though, in case it's something that's easier to fix on this end (by making ciqueue execute later, somehow).
See #113 (comment)
ciqueue for pytest got somewhat outdated:
👋 This would be a nice to have, as it would more closely mirror how the ruby cli operates. In the cli I can do something like this:
ruby -Ilib -Itest ./test/some_test
which is equivalent to:
ruby -Ilib:test ./test/some_test
For some command like this:
bundle exec minitest-queue run --queue redis://maestro-service-prototype.railgun --max-requeues 3 --requeue-tolerance 0.01 --timeout 45 -Ilib -Itest ./test/maestro_service/conductor_test.rb
The gem will only respect the last -I
flag. It would be nice to have it allow multiple -I
flags, or maybe output some warning if it sees multiple of the same flag.
Apologies if this is not the place to ask questions, but I couldn't find a better one (Discussions would be a great fit here).
I have a few questions regarding how ci-queue splits tests in an RSpec test suite.
it
block)? In other words, are individual items in the queue just specific examples? Or maybe is it whole example groups (i.e. describe
with multiple it
blocks inside)?describe "foo" do
# a very expensive factory
let!(:car) { Factory.create(:car) }
it "works" do
# some tests...
end
it "moves" do
# some tests...
end
end
it
blocks in different workers might be less optimal than scheduling the whole describe
block (or whole file) in a single worker. That's because in the latter case, the expensive factory would be created once, while in the former case it'll be created two times (one in each worker).In general, I'm interested in any insights you might have regarding the optimal way to split a large RSpec suite, with regards to the "smallest unit of work".
Thanks in advance!
Looking at our graphs of redis memory usage, it looks like this gem only removes keys when it hits the memory limit of redis.
This is causing us some problems, as the redis SaaS we are using has a total memory limit for all our instances, and our db for the ci-test-queue using all the memory it can use means other databases have less memory available than they need to operate properly.
I thought I would investigate whether there was a way to clear out old test runs from the ci-queue.
If this isn't currently possible, I'd be happy to put in the work to make it possible. My thought would be to add an optional ttl to the redis commands storing data, but I'd be interested in hearing alternative approaches.
We're using the rubygem version 0.16.0.
Support for Ruby 2.7 is ended, so it can be removed from CI and the gem
Since ci-queue cannot by necessity respect before(:all)
hooks, I have a workaround to run the required setup in my CI script. However, I'd like to avoid running it if the queue is empty (i.e. other workers have already exhausted it), so it would be cool to have something like:
unless `rspec-queue --query queue-size`.chomp! == '0'
# do expensive setup
end
If there's already a straightforward programmatic way to do this, I'm happy to use that instead of adding a feature!
Looks like this gsub
is removing all the project roots to make paths relative, but causes issues when the project root path/name is also within a nested subdirectory:
Take this example path in a backtrace: components/fooservices/foo/bar.rb
. It should be components/foo/app/services/foo/bar.rb
. The app/
project root is being removed from within paths when it should just be at the start.
We probably have to iterate over each line in backtrace
and only remove it from the start of the line? Or use the relative path helpers.
cc @wvanbergen
It would be great if ci-queue supported testing in other languages/frameworks, such as python's pytest.
The current way we get python tests to run tests off a ci queue is to load the queue with tests using the following command:
py.test -q --collect-only
and use pytest_redis to pull the tests off the queue and run them
py.test -vv --junitxml=pytest.xml -p pytest_redis \
--redis-host=#{redis_host} \
--redis-port=#{redis_port} \
--redis-list-key=#{redis_list_key} \
--redis-backup-list-key=#{redis_backup_list_key}"
cc/ @Shopify/byroot @casperisfine
👋 This gem looks really interesting! Forgive me for asking what might be a dumb question...
I have an application whose test suite I've parallelized in two levels. First, the test files are split up across a dozen or two containers (using circleci tests glob "test/system/**/*_test.rb" | circleci tests split --split-by=timings
). Second, each container runs its test using a few parallel processes (via the parallelize
method from Rails 6). The second level helps to further reduce the overall runtime and can parallelize individual tests within the largest test files.
This has worked pretty well, but sometimes one container is just really slow to start up so we end up with the job waiting on that one container when all the rest have already finished. Here's a particularly egregious example:
This has led me to look into queueing solutions that would minimize the effects of one slow container, and I happened to find this one, which looks promising. My question is: if I were to adopt this, would I still be able to leverage the second level of Rails' multiple-process parallelism within each container, or are there some incompatibilities that prevent that?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.