Comments (66)
Any update on this? We can confirm this started happening as well after upgrading to Sidekiq 7 / Redis 7. We are using Redis Cloud by Redis Labs.
from sidekiq.
FYI, many people using a cloud service are using Redis Sentinel under the covers and could be hitting this newly discovered bug: redis-rb/redis-client#197 (review)
We're having the same problem. A lightly used Heroku instance running Sidekiq with a Redis instance hosted on Redis Cloud. Every few minutes we get new errors in the log:
Jan 21 13:23:15 staging app/worker.1 pid=2 tid=20be INFO: Redis is online, 3.303780886111781 sec downtime
Jan 21 13:23:15 staging app/worker.1 pid=2 tid=20by INFO: Redis is online, 3.302901866962202 sec downtime
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=1x3u ERROR: Error fetching job: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=1x3u WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=1x3u WARN:
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=206e ERROR: Error fetching job: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=206e WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=206e WARN:
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=2096 ERROR: Error fetching job: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=2096 WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=2096 WARN:
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=20by ERROR: Error fetching job: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=20by WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
Jan 21 13:27:44 staging app/worker.1 pid=2 tid=20by WARN:
This is our redis-cli info output for our Redis Cloud instance:
# Server
redis_version:7.2.0
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:0000000000000000000000000000000000000000
redis_mode:standalone
os:Linux 5.4.0-1049-aws x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:7.5.0
process_id:12105563
run_id:cd3398a50315a259bb36537848cffd8bace34995
tcp_port:12516
server_time_usec:1705939219000000
uptime_in_seconds:86046
uptime_in_days:0
hz:10
lru_clock:0
config_file:
# Clients
connected_clients:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
maxclients:30
cluster_connections:0
# Memory
used_memory:3066544
used_memory_human:2.92M
used_memory_rss:3066544
used_memory_peak:3597648
used_memory_peak_human:3.43M
used_memory_lua:35840
maxmemory_policy:noeviction
mem_fragmentation_ratio:1
mem_allocator:jemalloc-5.3.0
# Persistence
loading:0
rdb_changes_since_last_save:58378
rdb_bgsave_in_progress:0
rdb_last_save_time:1705853176
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
# Stats
total_connections_received:141
total_commands_processed:325380
instantaneous_ops_per_sec:0
total_net_input_bytes:29456100
total_net_output_bytes:231400860
instantaneous_input_kbps:0.07
instantaneous_output_kbps:2.54
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:69
evicted_keys:0
keyspace_hits:50243
keyspace_misses:434611
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
migrate_cached_sockets:0
total_forks:0
total_error_replies:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
total_active_defrag_time:0
current_active_defrag_time:0
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.00
used_cpu_user:0.00
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
used_cpu_sys_main_thread:0.00
used_cpu_user_main_thread:0.00
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=39,expires=36,avg_ttl=17470728473
From our Gemfile:
redis (5.0.8)
redis-client (>= 0.17.0)
redis-client (0.19.1)
redis-client (>= 0.19.0)
sidekiq (7.2.1)
Our initializer:
Sidekiq.configure_server do |config|
config.redis = { ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end
Sidekiq.configure_client do |config|
config.redis = { ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end
Our config:
:max_retries: 7
:queues:
- default
- low
I've contacted Redis Cloud support but they haven't provided any helpful guidance yet.
I would recommend trying OpenSSL 3. Mixing modern versions (Ruby 3.3, redis 7.2) with legacy versions like OpenSSL 1.1 may cause subtle issues. That version is almost 4 years old now.
The only difference is Render has a TLS URL whereas Redis Cloud provides a non-TLS URL. Could this have anything to do with SSL verify mode being set to "none" when a non-TLS URL is being used?
No. I had the same issue when I tried with Heroku Data for Redis which provides a TLS URL.
I would recommend trying OpenSSL 3. Mixing modern versions (Ruby 3.3, redis 7.2) with legacy versions like OpenSSL 1.1 may cause subtle issues. That version is almost 4 years old now.
@mperham, my OpenSSL version is 3.2.0, but I have the same issue.
@mperham I don't remember if we discussed this before, but it just hit me that many of these reports might simply be due to redis-client's default timeout being much stricter than the old redis 4.x gem's. It used to be 5 seconds (which is a bit bonkers); for redis-client it's just 1 second.
That longer default probably papered over lots of transient issues with various cloud providers, especially slow ones.
Maybe it would make sense to set a higher default timeout in Sidekiq, as it's much less latency-sensitive than a web server.
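For anyone wanting to try this as a workaround before any default changes, Sidekiq 7 forwards its redis options to redis-client, so the timeout can be raised in the initializer. A minimal sketch; the 5-second value here is illustrative, not an official recommendation:

```ruby
# Sketch: raise the network timeout Sidekiq passes through to redis-client.
# redis-client's default is 1 second; the old redis 4.x gem used 5.
Sidekiq.configure_server do |config|
  config.redis = { timeout: 5 } # seconds
end

Sidekiq.configure_client do |config|
  config.redis = { timeout: 5 }
end
```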
Hi Mike. Do you have a timeframe for when you'll be pushing 7.3 to RubyGems?
Just to add more info: I am using redis-client 0.22.1 and I still see RedisClient::ReadTimeoutError (3 seconds) at least twice per day.
What version is the Redis server?
Can you include the output from redis-cli info?
Sure.
# Server
redis_version:7.2.0
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:0000000000000000000000000000000000000000
redis_mode:standalone
os:Linux 5.4.0-1071-aws x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:7.5.0
process_id:12089385
run_id:f28c63be14220b56c1c14028bab584487d4d0fb1
tcp_port:13144
server_time_usec:1705005037000000
uptime_in_seconds:121430
uptime_in_days:1
hz:10
lru_clock:0
config_file:
# Clients
connected_clients:7
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:5
maxclients:256
cluster_connections:0
# Memory
used_memory:4135064
used_memory_human:3.94M
used_memory_rss:4135064
used_memory_peak:4319784
used_memory_peak_human:4.11M
used_memory_lua:40960
maxmemory_policy:noeviction
mem_fragmentation_ratio:1
mem_allocator:jemalloc-5.3.0
# Persistence
loading:0
rdb_changes_since_last_save:16636
rdb_bgsave_in_progress:0
rdb_last_save_time:1704883611
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
# Stats
total_connections_received:50
total_commands_processed:97020
instantaneous_ops_per_sec:4
total_net_input_bytes:18699852
total_net_output_bytes:223692477
instantaneous_input_kbps:0.45
instantaneous_output_kbps:2.58
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:5
evicted_keys:0
keyspace_hits:16247
keyspace_misses:235563
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
migrate_cached_sockets:0
total_forks:0
total_error_replies:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
total_active_defrag_time:0
current_active_defrag_time:0
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.00
used_cpu_user:0.00
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
used_cpu_sys_main_thread:0.00
used_cpu_user_main_thread:0.00
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=15,expires=11,avg_ttl=28671441902
I'm assuming that is your local Redis and not a production Redis. "db0:keys=15" is basically empty. I need to see production.
This is production. I am using Redis Cloud.
Well that certainly looks lightly used. I can't explain why you'd see any errors. Make sure the redis-client gem is up to date.
Confirmed we are using 0.19.1.
Ruby and OpenSSL versions in use?
ruby 3.3.0 (2023-12-25 revision 5124f9ac75)
OpenSSL 1.1.1f 31 Mar 2020
I also ran redis-cli --latency and got:
min: 24, max: 566, avg: 38.93 (1081 samples)
I'm starting to wonder if it was a zombie process. I created a Redis instance on Render and pointed my config there instead and have not been seeing the errors. I'll point back at Redis Cloud and see if the errors come back...
As soon as I switched back to Redis Cloud I started getting the errors again:
2024-01-22T16:39:27.504660+00:00 app[worker.1]: pid=2 tid=206y class=ActiveStorage::PurgeJob jid=2526685e9b6fdb2d4caf744a elapsed=0.059 INFO: done
2024-01-22T16:41:01.945720+00:00 app[worker.1]: pid=2 tid=20sm ERROR: Error fetching job: Waited 3.0 seconds
2024-01-22T16:41:01.946268+00:00 app[worker.1]: pid=2 tid=20sm WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
2024-01-22T16:41:01.946843+00:00 app[worker.1]: pid=2 tid=20sm WARN:
2024-01-22T16:41:01.947817+00:00 app[worker.1]: pid=2 tid=20t6 ERROR: Error fetching job: Waited 3.0 seconds
2024-01-22T16:41:01.947818+00:00 app[worker.1]: pid=2 tid=20t6 WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
2024-01-22T16:41:01.948097+00:00 app[worker.1]: pid=2 tid=20t6 WARN:
2024-01-22T16:41:01.948898+00:00 app[worker.1]: pid=2 tid=20tq ERROR: Error fetching job: Waited 3.0 seconds
2024-01-22T16:41:01.948964+00:00 app[worker.1]: pid=2 tid=20tq WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
2024-01-22T16:41:01.949192+00:00 app[worker.1]: pid=2 tid=20tq WARN:
The only difference is Render has a TLS URL whereas Redis Cloud provides a non-TLS URL. Could this have anything to do with SSL verify mode being set to "none" when a non-TLS URL is being used?
After upgrading to the heroku-22 stack with OpenSSL 3.0.2, the problem appears to have gone away.
Perhaps Redis Cloud is running something on their end that does not play nicely with older versions of OpenSSL, something Render and Heroku Redis aren't running.
@mperham thanks for your help.
I spoke too soon. The error is back again. It took an hour or so to start happening. I restarted the dyno and it went away. I'll wait another hour and see if it comes back.
It came back after 30 mins this time. Not sure what's going on. Same logs as before:
2024-01-22T23:24:20.474062+00:00 app[worker.1]: pid=2 tid=206y ERROR: Error fetching job: Waited 3.0 seconds
2024-01-22T23:24:20.474209+00:00 app[worker.1]: pid=2 tid=206y WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
2024-01-22T23:24:20.474391+00:00 app[worker.1]: pid=2 tid=206y WARN:
2024-01-22T23:24:20.475225+00:00 app[worker.1]: pid=2 tid=20tq ERROR: Error fetching job: Waited 3.0 seconds
2024-01-22T23:24:20.475369+00:00 app[worker.1]: pid=2 tid=20tq WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
2024-01-22T23:24:20.475636+00:00 app[worker.1]: pid=2 tid=20tq WARN:
2024-01-22T23:24:20.476392+00:00 app[worker.1]: pid=2 tid=20ve ERROR: Error fetching job: Waited 3.0 seconds
2024-01-22T23:24:20.476465+00:00 app[worker.1]: pid=2 tid=20ve WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
2024-01-22T23:24:20.476694+00:00 app[worker.1]: pid=2 tid=20ve WARN:
2024-01-22T23:24:20.478731+00:00 app[worker.1]: pid=2 tid=20vy ERROR: Error fetching job: Waited 3.0 seconds
2024-01-22T23:24:20.478911+00:00 app[worker.1]: pid=2 tid=20vy WARN: RedisClient::ReadTimeoutError: Waited 3.0 seconds
2024-01-22T23:24:20.479083+00:00 app[worker.1]: pid=2 tid=20vy WARN:
Experiencing the same or a similar issue here. We're using Heroku Redis on the heroku-22 stack, with the exact same versions as originally reported. Our error message is
RedisClient::ReadTimeoutError: Waited 2.0 seconds
which is different from the 3.0 seconds everyone else is reporting.
Ruby version: 3.2.2
Rails version: 7.1.2
Sidekiq / Pro / Enterprise version(s): Sidekiq 7.2.0
..however our redis-client version is 0.19.1.
I've raised it with Heroku support; they've sent me here. It's happening very rarely, so it's really just noise (bearing in mind this is an application with very low utilisation).
Here is the output from redis-cli info:
# Server
redis_version:7.0.14
redis_git_sha1:c1d92a69
redis_git_dirty:1
redis_build_id:b1de837cc15df969
redis_mode:standalone
os:Linux 5.15.0-1049-aws x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:9.4.0
process_id:4
process_supervised:no
run_id:a64abd3b6e3e88ecfd416eb1d2954d26c1238338
tcp_port:29459
server_time_usec:1706154004108726
uptime_in_seconds:4840594
uptime_in_days:56
hz:10
configured_hz:10
lru_clock:11655188
executable:/usr/lib/redis/7.0/bin/redis-server
config_file:/etc/redis/heroku-redis.conf
io_threads_active:0
# Clients
connected_clients:5
cluster_connections:0
maxclients:40
client_recent_max_input_buffer:20617
client_recent_max_output_buffer:20504
blocked_clients:2
tracking_clients:0
clients_in_timeout_table:2
# Memory
used_memory:3384600
used_memory_human:3.23M
used_memory_rss:13467648
used_memory_rss_human:12.84M
used_memory_peak:3685064
used_memory_peak_human:3.51M
used_memory_peak_perc:91.85%
used_memory_overhead:1804346
used_memory_startup:534480
used_memory_dataset:1580254
used_memory_dataset_perc:55.45%
allocator_allocated:3545968
allocator_active:4493312
allocator_resident:7188480
total_system_memory:16456396800
total_system_memory_human:15.33G
used_memory_lua:44032
used_memory_vm_eval:44032
used_memory_lua_human:43.00K
used_memory_scripts_eval:4928
number_of_cached_scripts:4
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:76800
used_memory_vm_total_human:75.00K
used_memory_functions:184
used_memory_scripts:5112
used_memory_scripts_human:4.99K
maxmemory:52428800
maxmemory_human:50.00M
maxmemory_policy:noeviction
allocator_frag_ratio:1.27
allocator_frag_bytes:947344
allocator_rss_ratio:1.60
allocator_rss_bytes:2695168
rss_overhead_ratio:1.87
rss_overhead_bytes:6279168
mem_fragmentation_ratio:4.01
mem_fragmentation_bytes:10105344
mem_not_counted_for_evict:36024
mem_replication_backlog:1048592
mem_total_replication_buffers:1086712
mem_clients_slaves:38136
mem_clients_normal:70746
mem_cluster_links:0
mem_aof_buffer:2048
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:5624547
rdb_bgsave_in_progress:0
rdb_last_save_time:1701313410
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:802816
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:0
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:7
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:1073152
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:28595763
aof_base_size:124430
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
# Stats
total_connections_received:163387
total_commands_processed:44273118
instantaneous_ops_per_sec:16
total_net_input_bytes:4346006451
total_net_output_bytes:1504233864
total_net_repl_input_bytes:0
total_net_repl_output_bytes:504714686
instantaneous_input_kbps:1.34
instantaneous_output_kbps:0.93
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.83
rejected_connections:0
sync_full:1
sync_partial_ok:0
sync_partial_err:0
expired_keys:69298
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:61829
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:2016167
keyspace_misses:2208268
pubsub_channels:1
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:516
total_forks:8
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:3
dump_payload_sanitizations:0
total_reads_processed:36964298
total_writes_processed:33642871
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:9707
reply_buffer_expands:3740
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.191.150,port=11789,state=online,offset=504714441,lag=0
master_failover_state:no-failover
master_replid:6056d1f9af710dbc8b88c1eb948d192cf3ac8cf9
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:504714441
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:503649833
repl_backlog_histlen:1064609
# CPU
used_cpu_sys:2214.261827
used_cpu_user:5403.988730
used_cpu_sys_children:0.015823
used_cpu_user_children:0.011875
used_cpu_sys_main_thread:2159.962719
used_cpu_user_main_thread:5395.535469
# Modules
# Errorstats
errorstat_NOAUTH:count=3
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=1136,expires=1126,avg_ttl=14160610755
Very similar Sidekiq config, except we're using Sidekiq Pro:
# frozen_string_literal: true
Sidekiq.configure_server do |config|
config.super_fetch!
config.reliable_scheduler!
# Need to configure SSL to VERIFY_NONE to work with redis 6 on Heroku
# https://stackoverflow.com/questions/65834575/how-to-enable-tls-for-redis-6-on-sidekiq
config.redis = { ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end
Sidekiq.configure_client do |config|
# Need to configure SSL to VERIFY_NONE to work with redis 6 on Heroku
# https://stackoverflow.com/questions/65834575/how-to-enable-tls-for-redis-6-on-sidekiq
config.redis = { ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end
Experiencing the same issue on a self-managed server running Ubuntu 22.04. Like others, we're seeing this intermittently. We get a burst of 5-15 exception reports from our app, then the problem disappears for another week or so. Logs show CPU and memory are NOT under load when this problem occurs.
Here is the redis-cli info:
# Server
redis_version:7.2.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:44edf16bebe6fed2
redis_mode:standalone
os:Linux 5.15.0-84-generic x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:11.4.0
process_id:2140443
process_supervised:systemd
run_id:2cd2d22e1e896a2755a48e51b75b96e8b571b655
tcp_port:6379
server_time_usec:1707529719360288
uptime_in_seconds:1587085
uptime_in_days:18
hz:10
configured_hz:10
lru_clock:13030903
executable:/usr/bin/redis-server
config_file:/etc/redis/redis.conf
io_threads_active:0
listener0:name=tcp,bind=127.0.0.1,bind=-::1,port=6379
# Clients
connected_clients:9
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20523
client_recent_max_output_buffer:0
blocked_clients:5
tracking_clients:0
clients_in_timeout_table:5
total_blocking_keys:1
total_blocking_keys_on_nokey:0
# Memory
used_memory:2223584
used_memory_human:2.12M
used_memory_rss:9883648
used_memory_rss_human:9.43M
used_memory_peak:2449608
used_memory_peak_human:2.34M
used_memory_peak_perc:90.77%
used_memory_overhead:1200687
used_memory_startup:866056
used_memory_dataset:1022897
used_memory_dataset_perc:75.35%
allocator_allocated:2363464
allocator_active:2703360
allocator_resident:5783552
total_system_memory:8323088384
total_system_memory_human:7.75G
used_memory_lua:39936
used_memory_vm_eval:39936
used_memory_lua_human:39.00K
used_memory_scripts_eval:312
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:72704
used_memory_vm_total_human:71.00K
used_memory_functions:184
used_memory_scripts:496
used_memory_scripts_human:496B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.14
allocator_frag_bytes:339896
allocator_rss_ratio:2.14
allocator_rss_bytes:3080192
rss_overhead_ratio:1.71
rss_overhead_bytes:4100096
mem_fragmentation_ratio:4.45
mem_fragmentation_bytes:7661888
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:140399
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:246
rdb_bgsave_in_progress:0
rdb_last_save_time:1707529507
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:5272
rdb_last_cow_size:1007616
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:1581
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:525
total_commands_processed:7283763
instantaneous_ops_per_sec:2
total_net_input_bytes:420025646
total_net_output_bytes:61687497
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.10
instantaneous_output_kbps:0.01
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:18570
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:229262
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:794864
keyspace_misses:317229
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:732
total_forks:5272
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:7
dump_payload_sanitizations:0
total_reads_processed:5411013
total_writes_processed:5410491
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:8665
reply_buffer_expands:8442
eventloop_cycles:20543014
eventloop_duration_sum:4680801201
eventloop_duration_cmd_sum:78258250
instantaneous_eventloop_cycles_per_sec:11
instantaneous_eventloop_duration_usec:175
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:c0659ec6f47274cc82baf25f4411dbe40e2b8263
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:2012.975291
used_cpu_user:2544.071009
used_cpu_sys_children:21.285451
used_cpu_user_children:32.497599
used_cpu_sys_main_thread:2012.578427
used_cpu_user_main_thread:2543.784237
# Modules
# Errorstats
errorstat_NOAUTH:count=6
errorstat_NOSCRIPT:count=1
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=2005,expires=2000,avg_ttl=86398957318
I've noticed that no one posted a backtrace. Can anyone supply a backtrace or is it swallowed in this case?
@mperham this is what our app reported to Sentry:
RedisClient::ReadTimeoutError
Waited 3.0 seconds
redis-client (0.19.1) lib/redis_client/ruby_connection/buffered_io.rb in block in fill_buffer at line 140
redis-client (0.19.1) lib/redis_client/ruby_connection/buffered_io.rb in loop at line 122
redis-client (0.19.1) lib/redis_client/ruby_connection/buffered_io.rb in fill_buffer at line 122
redis-client (0.19.1) lib/redis_client/ruby_connection/buffered_io.rb in ensure_remaining at line 114
redis-client (0.19.1) lib/redis_client/ruby_connection/buffered_io.rb in getbyte at line 85
redis-client (0.19.1) lib/redis_client/ruby_connection/resp3.rb in parse at line 113
redis-client (0.19.1) lib/redis_client/ruby_connection/resp3.rb in load at line 50
redis-client (0.19.1) lib/redis_client/ruby_connection.rb in block in read at line 96
redis-client (0.19.1) lib/redis_client/ruby_connection/buffered_io.rb in with_timeout at line 48
redis-client (0.19.1) lib/redis_client/ruby_connection.rb in read at line 96
redis-client (0.19.1) lib/redis_client/connection_mixin.rb in call at line 31
redis-client (0.19.1) lib/redis_client.rb in block (2 levels) in blocking_call at line 318
redis-client (0.19.1) lib/redis_client/middlewares.rb in call at line 16
redis-client (0.19.1) lib/redis_client.rb in block in blocking_call at line 317
redis-client (0.19.1) lib/redis_client.rb in ensure_connected at line 677
redis-client (0.19.1) lib/redis_client.rb in blocking_call at line 316
redis-client (0.19.1) lib/redis_client/decorator.rb in blocking_call at line 26
sidekiq (7.2.0) lib/sidekiq/fetch.rb in block in retrieve_work at line 47
sidekiq (7.2.0) lib/sidekiq/capsule.rb in block in redis at line 102
connection_pool (2.4.1) lib/connection_pool.rb in block (2 levels) in with at line 110
connection_pool (2.4.1) lib/connection_pool.rb in handle_interrupt at line 109
connection_pool (2.4.1) lib/connection_pool.rb in block in with at line 109
connection_pool (2.4.1) lib/connection_pool.rb in handle_interrupt at line 106
connection_pool (2.4.1) lib/connection_pool.rb in with at line 106
sidekiq (7.2.0) lib/sidekiq/capsule.rb in redis at line 99
sidekiq (7.2.0) lib/sidekiq/component.rb in redis at line 28
sidekiq (7.2.0) lib/sidekiq/fetch.rb in retrieve_work at line 47
sidekiq (7.2.0) lib/sidekiq/processor.rb in get_one at line 87
sidekiq (7.2.0) lib/sidekiq/processor.rb in fetch at line 99
sidekiq (7.2.0) lib/sidekiq/processor.rb in process_one at line 81
sidekiq (7.2.0) lib/sidekiq/processor.rb in run at line 72
sidekiq (7.2.0) lib/sidekiq/component.rb in watchdog at line 10
sidekiq (7.2.0) lib/sidekiq/component.rb in block in safe_thread at line 19
Ugh, that looks like a bog-standard read timeout from an overloaded Redis. I can't see any other hints about what's wrong. I'm still flummoxed.
RedisClient::ReadTimeoutError: Unknown Error (hiredis_read) (RedisClient::ReadTimeoutError)
from hiredis-client (0.20.0) lib/redis_client/hiredis_connection.rb:95:in `_read'
from hiredis-client (0.20.0) lib/redis_client/hiredis_connection.rb:95:in `read'
from redis-client (0.20.0) lib/redis_client/connection_mixin.rb:31:in `call'
from redis-client (0.20.0) lib/redis_client.rb:318:in `block (2 levels) in blocking_call'
from redis-client (0.20.0) lib/redis_client/middlewares.rb:16:in `call'
from redis-client (0.20.0) lib/redis_client.rb:317:in `block in blocking_call'
from redis-client (0.20.0) lib/redis_client.rb:677:in `ensure_connected'
from redis-client (0.20.0) lib/redis_client.rb:316:in `blocking_call'
from redis-client (0.20.0) lib/redis_client/decorator.rb:26:in `blocking_call'
from sidekiq (7.2.1) lib/sidekiq/fetch.rb:47:in `block in retrieve_work'
from sidekiq (7.2.1) lib/sidekiq/capsule.rb:102:in `block in redis'
from connection_pool (2.4.1) lib/connection_pool.rb:110:in `block (2 levels) in with'
from connection_pool (2.4.1) lib/connection_pool.rb:109:in `handle_interrupt'
from connection_pool (2.4.1) lib/connection_pool.rb:109:in `block in with'
from connection_pool (2.4.1) lib/connection_pool.rb:106:in `handle_interrupt'
from connection_pool (2.4.1) lib/connection_pool.rb:106:in `with'
from sidekiq (7.2.1) lib/sidekiq/capsule.rb:99:in `redis'
from sidekiq (7.2.1) lib/sidekiq/component.rb:28:in `redis'
from sidekiq (7.2.1) lib/sidekiq/fetch.rb:47:in `retrieve_work'
from sidekiq (7.2.1) lib/sidekiq/processor.rb:87:in `get_one'
from sidekiq (7.2.1) lib/sidekiq/processor.rb:99:in `fetch'
from sidekiq (7.2.1) lib/sidekiq/processor.rb:81:in `process_one'
from sidekiq (7.2.1) lib/sidekiq/processor.rb:72:in `run'
from sidekiq (7.2.1) lib/sidekiq/component.rb:10:in `watchdog'
from sidekiq (7.2.1) lib/sidekiq/component.rb:19:in `block in safe_thread'
We have been using Sidekiq in the same environment for a long time, and this issue started only after updating to Sidekiq 7. Maybe it's a bug in the new redis-client gem (since there were no such issues with redis-rb), maybe something related to the new capsules feature, or maybe the new Redis protocol used by redis-client behaves worse than the old one... It's not clear where the issue is. Perhaps if the Redis client were made "pluggable" in Sidekiq (i.e. selectable between redis-client and redis-rb), it would help narrow down where it comes from.
Right now it looks like the best way to avoid this issue is not to update to Sidekiq 7.
- Redis is not overloaded, and neither is Sidekiq (CPU load for both is under 10%)
- Redis 7.0.14
- No TLS connection
- Latest stable OpenSSL, Ruby, Rails, and redis-client gem
@casperisfine do you have any advice for tracking down random read timeouts in redis-client?
Not particularly, no. You have to experience it yourself and know TCP/SSL etc. to poke at it and figure out what's wrong. It's not really possible to do proxy debugging with these things.
What's really not clear to me is why it would have started with Sidekiq 7. There was of course a bunch of change in the Redis client, but fundamentally they don't work very differently. Timeouts are unfortunately a common thing, especially on public clouds, so I wonder if it's not just a reporting difference: were previous versions of Sidekiq reporting these errors, or just swallowing them and retrying?
Also, redis-rb retries commands by default, but IIRC Sidekiq configures redis-client with 1 retry too, right?
Again, it's super hard to tell from reports like this, and I don't see any common point among them: some TLS, some non-TLS, some hiredis-client, some Ruby driver, so that's pretty much all the codepaths in redis-client. I often write bugs, but I have a hard time believing all these configurations are similarly bugged.
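One way to get more signal would be command-level instrumentation. A sketch using redis-client's middleware API; the `SlowCommandLogger` name, the 0.5s threshold, and the `StubConnection` demo class are all hypothetical, and a real middleware would likely also wrap `call_pipelined` and `connect`:

```ruby
# Hypothetical middleware that warns when a Redis command takes longer than a
# threshold, to help correlate ReadTimeoutError bursts with specific commands.
module SlowCommandLogger
  THRESHOLD = 0.5 # seconds; pick something below your read timeout

  def call(command, redis_config)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    super
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    warn "slow redis command #{command.first}: #{elapsed.round(3)}s" if elapsed > THRESHOLD
  end
end

# With the redis-client gem loaded, register it globally:
#   RedisClient.register(SlowCommandLogger)

# Standalone demonstration with a stub connection (no Redis needed):
class StubConnection
  def call(command, redis_config)
    sleep(0.01) # pretend work; fast enough to stay under THRESHOLD
    :ok
  end
  prepend SlowCommandLogger
end
```

Because the module is prepended, its `call` runs first and `super` dispatches to the underlying connection, so the return value is passed through unchanged.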
Good questions.
I've also seen compatibility issues with OpenSSL 3 and Ruby 3 along with actual Redis server bugs with OpenSSL 3. Make sure you're upgrading those stack bits to get network fixes. These errors are "below" Sidekiq in the stack and sadly I don't have really any insight into what's causing the issue.
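When checking this, note that the OpenSSL your Ruby is linked against can differ from what the system `openssl version` reports; Ruby's openssl extension exposes both:

```ruby
require "openssl"

# Version of the OpenSSL headers Ruby's openssl extension was built against:
puts OpenSSL::OPENSSL_VERSION
# Version of the OpenSSL library actually loaded at runtime:
puts OpenSSL::OPENSSL_LIBRARY_VERSION
```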
I also have the same problem. My app is on Heroku. Versions:
ruby 3.3.1
rails 6.1.7.7
sidekiq 7.2.4
redis (5.2.0)
redis-client (>= 0.22.0)
redis-client (0.22.1)
connection_pool
Heroku Data for Redis version 7.0.15
OpenSSL version on the heroku-22 stack: openssl: '3.0.10+quic'
This is the backtrace:
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection/buffered_io.rb:214:in `block in fill_buffer'
<internal:kernel>:187:in `loop'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection/buffered_io.rb:197:in `fill_buffer'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection/buffered_io.rb:187:in `ensure_remaining'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection/buffered_io.rb:152:in `getbyte'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection/resp3.rb:113:in `parse'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection/resp3.rb:50:in `load'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection.rb:98:in `block in read'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection/buffered_io.rb:114:in `with_timeout'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/ruby_connection.rb:98:in `read'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/connection_mixin.rb:31:in `call'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client.rb:340:in `block (2 levels) in blocking_call'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/middlewares.rb:16:in `call'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client.rb:339:in `block in blocking_call'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client.rb:699:in `ensure_connected'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client.rb:338:in `blocking_call'
/app/vendor/bundle/ruby/3.3.0/gems/redis-client-0.22.1/lib/redis_client/decorator.rb:26:in `blocking_call'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/fetch.rb:47:in `block in retrieve_work'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/capsule.rb:102:in `block in redis'
/app/vendor/bundle/ruby/3.3.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:110:in `block (2 levels) in with'
/app/vendor/bundle/ruby/3.3.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `handle_interrupt'
/app/vendor/bundle/ruby/3.3.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `block in with'
/app/vendor/bundle/ruby/3.3.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `handle_interrupt'
/app/vendor/bundle/ruby/3.3.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `with'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/capsule.rb:99:in `redis'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/component.rb:28:in `redis'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/fetch.rb:47:in `retrieve_work'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/processor.rb:87:in `get_one'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/processor.rb:99:in `fetch'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/processor.rb:81:in `process_one'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/processor.rb:72:in `run'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/component.rb:10:in `watchdog'
/app/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.2.4/lib/sidekiq/component.rb:19:in `block in safe_thread'
The connection error messages kept repeating, so I downgraded to Sidekiq 6.5.5, and it looks like that resolved it.
I'm sorry I can't provide any clues toward a solution.
I appreciate your hard work and look forward to a fix.
from sidekiq.
Not sure how much this will help, but figured I would add some details that might give someone some clues?
This week I upgraded 3 small Heroku apps to Sidekiq 7.2.4 using Redis Cloud 30MB and 100MB with Redis 7.2.3. All are Ruby 3.3.1, Rails 7.1.3.2. All use 1 worker dyno each with 5 concurrency.
App 1:
- High volume of Sidekiq Jobs triggered by ecommerce webhooks. 100MB Redis Cloud. 8 sidekiq queues
- Zero RedisTimeoutError encountered
App 2:
- Staging for App 1. Single user, mostly. 30MB Redis Cloud. Scheduler kicks off Sidekiq jobs that run for ~10 seconds every 10 minutes. 8 sidekiq queues
- Infrequent RedisTimeoutError: a few every couple hours. Timeouts do not seem correlated to scheduler frequency.
- Recent Timeouts: 8:30a, 11:22p, 6:28p, 6:28p, 3:11p, 3:11p, 3:11p, 3:11p, 1:58p, 12:39p, 12:39p, 8:39a
App 3:
- Different app, different Gemfile, but 80% the same. 30MB Redis Cloud. High usage 9-5. Jobs mainly triggered by user actions: ~ a job every minute or so. 5 Sidekiq queues.
- Frequent RedisTimeoutError, a few per hour. More frequent during low throughput.
- Recent Timeouts (all am): 10:05, 10:03, 8:18, 8:17, 7:27, 7:26, 7:26, 7:26, 7:12, 6:36, 6:36, 5:43, 5:43, 5:43, 5:40, 5:40, 5:40, 5:40, 5:40, 5:22, 5:22, 5:22, 5:15, 5:09
from sidekiq.
Update: I upgraded from Redis Cloud 30MB ($0) to Redis Cloud 100MB ($10) and the timeouts went away. I have a ticket open with Redis Cloud regarding this and will report back when it reaches a resolution.
from sidekiq.
Hi @axlekb We are on 1GB plan and this still occurs daily.
from sidekiq.
@netwire88 Those who work with me to reproduce the problem and debug the problem are going to be the ones who fix it. Do you want it fixed? Then grant me access to your production system so I can debug it.
from sidekiq.
Hi, I can confirm this issue after upgrading from Sidekiq 6 to 7.
We are on Rails 7.0.8, Ruby 3.2.3 (it's the same on Ruby 3.3.1), Sidekiq 7.2.4, redis-server 7.2.4, and OpenSSL 3.0.2, with medium to low load.
from sidekiq.
Upgrading redis-client to v0.22.2 didn't fix the issues.
from sidekiq.
@lu-x Could you post the exception backtrace and also confirm that you have a read_timeout set? For example, from my Rails console, I see a default 1 second:
irb(main):014:0> Sidekiq.redis { |redis| puts redis.config.read_timeout }
1.0
from sidekiq.
v0.22.2 would add another second by default to the Sidekiq BRPOP. Instead of:
RedisClient::ReadTimeoutError: Waited 3.0 seconds
You would probably see:
RedisClient::ReadTimeoutError: Waited 4.0 seconds
Because the error happens even with the additional second, this suggests that the problem isn't a race condition as described in redis/redis-rb#1279, but something else.
The error suggests that RedisClient was waiting for another byte but for some reason never got it. I'd be curious what the state of the RESP3 buffer was at the time, and how many bytes were expected. I wonder if we should add some sort of logging when this happens.
I hope this isn't Nagle's algorithm again. 😄 https://brooker.co.za/blog/2024/05/09/nagle.html It shouldn't be, since redis-client sets TCP_NODELAY: https://github.com/redis-rb/redis-client/blob/3759fc908b3fed59392157990895ad199714ffba/lib/redis_client/ruby_connection.rb#L124
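For anyone who wants to verify for themselves that Nagle's algorithm is off on a socket, here is a small stdlib-only sketch of setting and checking TCP_NODELAY (the throwaway loopback server is just for illustration):

```ruby
require "socket"

# Stand up a throwaway loopback server so we have a real TCP socket to inspect.
server = TCPServer.new("127.0.0.1", 0)
sock = TCPSocket.new("127.0.0.1", server.addr[1])

# Disable Nagle's algorithm, the same socket option redis-client sets on connect.
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
nodelay = sock.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY).int

sock.close
server.close
```

A nonzero value for `nodelay` means small writes are flushed immediately instead of being coalesced.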
from sidekiq.
@stanhu read_timeout is 2 (as set by me).
This is the stack trace:
RedisClient::ReadTimeoutError: Waited 2 seconds (RedisClient::ReadTimeoutError)
from redis-client (0.22.2) lib/redis_client/ruby_connection/buffered_io.rb:214:in `block in fill_buffer'
from <internal:kernel>:187:in `loop'
from redis-client (0.22.2) lib/redis_client/ruby_connection/buffered_io.rb:197:in `fill_buffer'
from redis-client (0.22.2) lib/redis_client/ruby_connection/buffered_io.rb:187:in `ensure_remaining'
from redis-client (0.22.2) lib/redis_client/ruby_connection/buffered_io.rb:152:in `getbyte'
from redis-client (0.22.2) lib/redis_client/ruby_connection/resp3.rb:113:in `parse'
from redis-client (0.22.2) lib/redis_client/ruby_connection/resp3.rb:50:in `load'
from redis-client (0.22.2) lib/redis_client/ruby_connection.rb:96:in `read'
from redis-client (0.22.2) lib/redis_client/connection_mixin.rb:52:in `block in call_pipelined'
from <internal:numeric>:237:in `times'
from redis-client (0.22.2) lib/redis_client/connection_mixin.rb:50:in `call_pipelined'
from redis-client (0.22.2) lib/redis_client.rb:765:in `block in connect'
from redis-client (0.22.2) lib/redis_client/middlewares.rb:16:in `call'
from redis-client (0.22.2) lib/redis_client.rb:764:in `connect'
from redis-client (0.22.2) lib/redis_client.rb:732:in `raw_connection'
from redis-client (0.22.2) lib/redis_client.rb:697:in `ensure_connected'
from redis-client (0.22.2) lib/redis_client.rb:431:in `pipelined'
from redis-client (0.22.2) lib/redis_client/decorator.rb:51:in `pipelined'
from sidekiq (7.2.4) lib/sidekiq/client.rb:228:in `block in raw_push'
from connection_pool (2.4.1) lib/connection_pool.rb:110:in `block (2 levels) in with'
from connection_pool (2.4.1) lib/connection_pool.rb:109:in `handle_interrupt'
from connection_pool (2.4.1) lib/connection_pool.rb:109:in `block in with'
from connection_pool (2.4.1) lib/connection_pool.rb:106:in `handle_interrupt'
from connection_pool (2.4.1) lib/connection_pool.rb:106:in `with'
from sidekiq (7.2.4) lib/sidekiq/client.rb:225:in `raw_push'
from sidekiq (7.2.4) lib/sidekiq/client.rb:93:in `push'
from sidekiq (7.2.4) lib/sidekiq/client.rb:174:in `push'
from activejob (7.0.8.1) lib/active_job/queue_adapters/sidekiq_adapter.rb:22:in `enqueue'
from activejob (7.0.8.1) lib/active_job/enqueuing.rb:67:in `block in enqueue'
from activesupport (7.0.8.1) lib/active_support/callbacks.rb:118:in `block in run_callbacks'
from activejob (7.0.8.1) lib/active_job/instrumentation.rb:25:in `block in instrument'
from activesupport (7.0.8.1) lib/active_support/notifications.rb:206:in `block in instrument'
from activesupport (7.0.8.1) lib/active_support/notifications/instrumenter.rb:24:in `instrument'
from sentry-rails (5.7.0) lib/sentry/rails/tracing.rb:54:in `instrument'
from activesupport (7.0.8.1) lib/active_support/notifications.rb:206:in `instrument'
from activejob (7.0.8.1) lib/active_job/instrumentation.rb:35:in `instrument'
from activejob (7.0.8.1) lib/active_job/instrumentation.rb:9:in `block (2 levels) in <module:Instrumentation>'
from activesupport (7.0.8.1) lib/active_support/callbacks.rb:127:in `instance_exec'
from activesupport (7.0.8.1) lib/active_support/callbacks.rb:127:in `block in run_callbacks'
from activejob (7.0.8.1) lib/active_job/logging.rb:27:in `tag_logger'
from activejob (7.0.8.1) lib/active_job/logging.rb:14:in `block (2 levels) in <module:Logging>'
from activesupport (7.0.8.1) lib/active_support/callbacks.rb:127:in `instance_exec'
from activesupport (7.0.8.1) lib/active_support/callbacks.rb:127:in `block in run_callbacks'
from activesupport (7.0.8.1) lib/active_support/callbacks.rb:138:in `run_callbacks'
from activejob (7.0.8.1) lib/active_job/enqueuing.rb:63:in `enqueue'
from activejob (7.0.8.1) lib/active_job/enqueuing.rb:30:in `perform_later'
from sidekiq.
@lu-x Oh, interesting. The stack trace above suggests that:
- RedisClient::ReadTimeoutError is happening at the Redis connection setup, specifically the connection prelude (HELLO 3, ROLE, etc.).
- Redis Sentinels are being used.
- There was no additional time added to the timeout, because this was not a BRPOP/BLPOP command.
The error suggests that the Redis host didn't respond within 2 seconds to one of the connection setup commands for some reason. Can you think of a reason why the Redis host might not have responded? Some ideas:
- Too many connections to handle.
- Redis server blocked by some other command.
- Redis server blocked by a RDB snapshot.
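If you can reach the affected instance, a few standard redis-cli checks can help rule these possibilities in or out (diagnostic commands only, to be run against the affected server):

```shell
# Was a background save running around the time of the timeout?
redis-cli INFO persistence | grep -E 'rdb_bgsave_in_progress|rdb_last_bgsave_time_sec'

# Any slow commands that could have blocked the single-threaded event loop?
redis-cli SLOWLOG GET 10

# How close is the server to its connection limit?
redis-cli INFO clients | grep connected_clients
redis-cli CONFIG GET maxclients
```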
from sidekiq.
Redis Sentinels are being used.
👍 A cluster of three.
Too many connections to handle.
I don't think so; we have < 1000 connections, and redis-cli --latency shows fast response times (~20ms).
Redis server blocked by some other command.
Redis server blocked by a RDB snapshot.
I'm not aware of anything, but it could be possible, of course.
from sidekiq.
@casperisfine yeah, especially if the CPU is saturated and only switching every 100ms. I will get this into Sidekiq 7.3.
from sidekiq.
@pmichaeljones You can pick up the change in 7.x with this:
Sidekiq.configure_server do |cfg|
cfg.redis = { timeout: 3, ... }
end
The 7.3 milestone is in the issues.
from sidekiq.
Forgot to update: I upgraded to redis-client 0.22.2 and the timeouts on the free 30MB Redis Cloud plan went away.
Update: I upgraded from Redis Cloud 30MB ($0) to Redis Cloud 100MB ($10) and the timeouts went away. I have a ticket open with Redis Cloud regarding this and will report back when it reaches a resolution.
from sidekiq.
Upgraded to redis-client 0.22.2 and the timeouts on the free 30MB Redis Cloud plan went away.
That's great, thanks. Based on redis-rb/redis-client#197 (comment), this suggests that increasing the BRPOP timeout from 3 to 4 seconds might be making the difference. With redis-client 0.22.1, the timeout is 3 because of conn.read_timeout + TIMEOUT (1 + 2), but with redis-client 0.22.2, conn.read_timeout is automatically added, so the timeout is now 4 seconds.
This might give credence to the timeout theory in #6162 (comment) and 1c2a37d.
With Sidekiq 7.3, I believe the timeout will actually be 3+3+2 = 8, so I think we'd want to drop the extra conn.read_timeout.
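To make that arithmetic concrete, here's a small sketch; the constant and method names are illustrative, not Sidekiq's actual code:

```ruby
# Illustrative model of the total wait on a blocking BRPOP.
BRPOP_WAIT = 2 # the fixed TIMEOUT Sidekiq passes to BRPOP

# sidekiq_adds: Sidekiq passes read_timeout + TIMEOUT as the blocking-call timeout
# client_adds:  redis-client 0.22.2+ automatically adds read_timeout on top
def effective_wait(read_timeout, sidekiq_adds:, client_adds:)
  BRPOP_WAIT +
    (sidekiq_adds ? read_timeout : 0) +
    (client_adds ? read_timeout : 0)
end

effective_wait(1, sidekiq_adds: true, client_adds: false) # => 3 (redis-client 0.22.1)
effective_wait(1, sidekiq_adds: true, client_adds: true)  # => 4 (redis-client 0.22.2)
effective_wait(3, sidekiq_adds: true, client_adds: true)  # => 8 (7.3 with timeout: 3)
```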
from sidekiq.
Upgraded to redis-client 0.22.2 from redis-client 0.21.0, and I haven't seen any RedisClient::ReadTimeoutError: Waited 3.0 seconds errors during the last 24 hours. Previously there were at least 3-5 per day. 🤞
from sidekiq.
I upgraded to redis-client 0.22.2 five days ago and it looks solved.
Thanks!!
from sidekiq.
It's been 3 days since I upgraded to redis-client 0.22.2 and I haven't had another error since. Looks promising!
from sidekiq.
Unfortunately, even after upgrading to redis-client 0.22.2 + sidekiq 7.2.4 on Redis Cloud 1GB running Redis Stack 7.2.3, it's not working. We are now getting RedisClient::ReadTimeoutError: Waited 4.0 seconds.
from sidekiq.
Same here!
redis-client 0.22.2 + sidekiq 7.2.4 running Redis on Heroku; we are getting RedisClient::ReadTimeoutError: Waited 4.0 seconds.
from sidekiq.
It is happening to us as well: RedisClient::ReadTimeoutError: Waited 3.0 seconds.
We are using Heroku Data for Redis.
I also contacted Heroku support and was sent here. We're using sidekiq 7.2.4 and redis-client 0.22.1. Our Heroku app is on the heroku-22 stack.
from sidekiq.
@mperham here is our raw stack trace:
RedisClient::ReadTimeoutError: Waited 3.0 seconds (RedisClient::ReadTimeoutError)
from redis_client/ruby_connection/buffered_io.rb:214:in `block in fill_buffer'
from redis_client/ruby_connection/buffered_io.rb:197:in `loop'
from redis_client/ruby_connection/buffered_io.rb:197:in `fill_buffer'
from redis_client/ruby_connection/buffered_io.rb:187:in `ensure_remaining'
from redis_client/ruby_connection/buffered_io.rb:152:in `getbyte'
from redis_client/ruby_connection/resp3.rb:113:in `parse'
from redis_client/ruby_connection/resp3.rb:50:in `load'
from redis_client/ruby_connection.rb:98:in `block in read'
from redis_client/ruby_connection/buffered_io.rb:114:in `with_timeout'
from redis_client/ruby_connection.rb:98:in `read'
from redis_client/connection_mixin.rb:31:in `call'
from redis_client.rb:340:in `block (2 levels) in blocking_call'
from redis_client/middlewares.rb:16:in `call'
from redis_client.rb:339:in `block in blocking_call'
from redis_client.rb:699:in `ensure_connected'
from redis_client.rb:338:in `blocking_call'
from redis_client/decorator.rb:26:in `blocking_call'
from sidekiq/fetch.rb:47:in `block in retrieve_work'
from sidekiq/capsule.rb:102:in `block in redis'
from connection_pool.rb:110:in `block (2 levels) in with'
from connection_pool.rb:109:in `handle_interrupt'
from connection_pool.rb:109:in `block in with'
from connection_pool.rb:106:in `handle_interrupt'
from connection_pool.rb:106:in `with'
from sidekiq/capsule.rb:99:in `redis'
from sidekiq/component.rb:28:in `redis'
from sidekiq/fetch.rb:47:in `retrieve_work'
from sidekiq/processor.rb:87:in `get_one'
from sidekiq/processor.rb:99:in `fetch'
from sidekiq/processor.rb:81:in `process_one'
from sidekiq/processor.rb:72:in `run'
from sidekiq/component.rb:10:in `watchdog'
from sidekiq/component.rb:19:in `block in safe_thread'
from sidekiq.
@jeremyhaile what's your process concurrency? The only other cause of timeouts could be that your CPUs are way overloaded. Reduce your concurrency.
from sidekiq.
< 100 connections (from several servers/pools). Low activity. Yeah, I'm not sure what's causing it. Perhaps we need to move off of Heroku. Or perhaps there's a bug somewhere...
from sidekiq.
@jeremyhaile Your environment is almost the same as mine, except your redis-client version is 0.22.1. How about upgrading your redis-client? My setup:
redis-client (0.22.2)
sidekiq (7.2.4)
heroku-22 stack
Heroku Data for Redis
And my sidekiq.rb Redis config is like this:
{url: "#{ENV["REDIS_TLS_URL"]}/0", ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE }, timeout: 3}
I upgraded redis-client on Jun 5th and the timeout error hasn't appeared since.
from sidekiq.
@kikikiblue thanks for the idea. We updated to 0.22.2 yesterday, which also increases the default read timeout to 4s. Unfortunately, we've seen the error 3 times today already, now as a 4s timeout error. Our load is almost nothing today too, since most of our load is on weekdays.
from sidekiq.
In my case, version updates for the sidekiq and redis-client gems did not help. What helped was changing the config:
SIDEKIQ_REDIS_OPTIONS = {
network_timeout: 3,
pool_timeout: 2,
reconnect_attempts: [0.1, 0.25, 0.75], # I think this is major change, which helped to remove RedisTimeoutError errors
size: Integer(ENV.fetch('RAILS_MAX_THREADS', 10))
}.freeze
Sidekiq.configure_client do |config|
config.redis = SIDEKIQ_REDIS_OPTIONS
end
Sidekiq.configure_server do |config|
config.redis = SIDEKIQ_REDIS_OPTIONS
end
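For context, passing an array to reconnect_attempts means one retry per element, sleeping that many seconds before each attempt. Here is a gem-free sketch of that retry shape (the method name is hypothetical, not redis-client's actual implementation):

```ruby
# Retry with per-attempt backoff, the shape that reconnect_attempts: [0.1, 0.25, 0.75]
# configures: up to three retries, waiting a bit longer before each one.
def call_with_retries(backoffs)
  attempt = 0
  begin
    yield
  rescue IOError => e # stand-in for a connection error
    raise e if attempt >= backoffs.size # backoffs exhausted, give up
    sleep(backoffs[attempt])
    attempt += 1
    retry
  end
end

calls = 0
result = call_with_retries([0.01, 0.02, 0.03]) do
  calls += 1
  raise IOError, "transient network blip" if calls < 3
  :ok
end
# The block fails twice, then succeeds on the third attempt.
```

This would explain why the array form helps with transient network blips: a single failed reconnect no longer surfaces as an error.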
from sidekiq.
Oh, right, as mentioned above: for those of you having trouble, can you see if increasing the connection pool size helps? Perhaps you're running into #5929 (comment).
Also, do your jobs call Sidekiq.redis?
from sidekiq.
For reference: running redis-client 0.22.2 + sidekiq 7.2.4 gives me the RedisClient::ReadTimeoutError 3-4x more often than before with 0.22.1. (But it could be a coincidence.)
@le0pard can you confirm if your suggestion does improve the situation?
from sidekiq.
@stanhu nope, my code does not use Sidekiq.redis; I have a separate connection for that use.
@rafaelbiriba I can repeat, in case it was not clear from my first message: the errors disappeared only after I arrived at these settings for Redis in Sidekiq (both server and client); version updates alone did not help.
Now (90-day range in our error-tracking system, filtered by Redis errors):
The situation 4 months ago:
from sidekiq.
the errors disappeared only after I arrived at these settings for Redis in Sidekiq (both server and client); version updates alone did not help.
To be clear, the default concurrency value is 5, and you've doubled the Redis pool size from 5 to 10, correct? This suggests to me that for some reason there aren't enough connections in the pool for those threads. It would be nice to understand why that's the case, because that would explain why #6158 might be needed.
As I mentioned in #5929 (comment):
- In Sidekiq 6, there was one connection pool of size concurrency + 5.
- In Sidekiq 7, a capsule has its own connection pool of size concurrency, and there is an internal Sidekiq connection pool with a fixed size of 10.
By increasing the pool size to 10, I think you're effectively doing what Sidekiq 6 did before.
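A stdlib-only toy illustrates why an undersized pool hurts: once every connection is checked out, the next caller blocks before any Redis command is even sent. The Queue and timeout here are illustrative stand-ins for the connection_pool gem:

```ruby
require "timeout"

# Toy checkout pool: 5 "connections", like a capsule pool sized to concurrency.
pool = Queue.new
5.times { |i| pool << "conn-#{i}" }

# Five busy threads hold every connection...
checked_out = Array.new(5) { pool.pop }

# ...so a sixth caller times out waiting for a checkout, not for a Redis reply.
pool_exhausted =
  begin
    Timeout.timeout(0.2) { pool.pop }
    false
  rescue Timeout::Error
    true
  end

checked_out.each { |c| pool << c } # return the connections
```

With the real gem, that stall surfaces as a ConnectionPool::TimeoutError rather than a read timeout, which is one way to tell the two failure modes apart.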
from sidekiq.
Increasing the pool alone did not help; reconnect_attempts with an array is what helped. The pool size was just the first suggestion I tried at the beginning, but it did not remove the errors.
from sidekiq.
I'm glad these fixes are working for people, but I'm a bit confused by the suggestions. Increasing pool sizes, adding reconnect attempts, etc. seem like connection fixes, but all of our stack traces, and the ones I've seen other people post, are read timeouts.
I'm not a Redis expert, but I thought that read timeouts occur when you already have a connection and are running a command. So shouldn't increasing pool sizes or reconnect attempts have nothing to do with a read timeout?
I would think that a read timeout is caused either by a bug somewhere, or by contention for the CPU that makes Redis commands run slowly on the server, or by a problem with your cloud provider (for example, if Heroku is overloading instances without enough processing power). Someone who understands Redis and the Redis client, please correct me if I'm wrong!
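That distinction can be demonstrated with a stdlib-only sketch (no Redis involved; the fake server and the 1-second wait are just for illustration): the TCP connect succeeds immediately, and only the read times out.

```ruby
require "socket"

# A server that accepts connections but never replies.
server = TCPServer.new("127.0.0.1", 0)
Thread.new do
  conn = server.accept # the TCP handshake completes fine...
  sleep                # ...but no byte is ever written back
end

client = TCPSocket.new("127.0.0.1", server.addr[1]) # connect succeeds instantly
client.write("PING\r\n")

# Wait up to 1 second for a reply, like redis-client's read_timeout does.
result = IO.select([client], nil, nil, 1.0) ? "got reply" : "read timed out"
client.close
```

So a read timeout means the connection and the write were fine, but the server (or the network path back) never produced a response byte in time.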
RedisClient::ReadTimeoutError: Waited 3.0 seconds (RedisClient::ReadTimeoutError)
from redis_client/ruby_connection/buffered_io.rb:214:in `block in fill_buffer'
from redis_client/ruby_connection/buffered_io.rb:197:in `loop'
from redis_client/ruby_connection/buffered_io.rb:197:in `fill_buffer'
from redis_client/ruby_connection/buffered_io.rb:187:in `ensure_remaining'
from redis_client/ruby_connection/buffered_io.rb:152:in `getbyte'
from redis_client/ruby_connection/resp3.rb:113:in `parse'
from redis_client/ruby_connection/resp3.rb:50:in `load'
from redis_client/ruby_connection.rb:98:in `block in read'
from redis_client/ruby_connection/buffered_io.rb:114:in `with_timeout'
from redis_client/ruby_connection.rb:98:in `read'
from redis_client/connection_mixin.rb:31:in `call'
from redis_client.rb:340:in `block (2 levels) in blocking_call'
from redis_client/middlewares.rb:16:in `call'
from redis_client.rb:339:in `block in blocking_call'
from redis_client.rb:699:in `ensure_connected'
from redis_client.rb:338:in `blocking_call'
from redis_client/decorator.rb:26:in `blocking_call'
from sidekiq/fetch.rb:47:in `block in retrieve_work'
from sidekiq/capsule.rb:102:in `block in redis'
from sidekiq.