
Comments (9)

amarts commented on June 2, 2024

@mohammaddawoodshaik thanks for the detailed debugging and logs for the issue.

Just opened gluster/glusterfs#4224, which should hopefully fix the issue.

Let's wait for comments from the experts; once it is found to be fine, we will merge it into our base branch and make a kadalu release.

@aravindavk @vatsa287

amarts commented on June 2, 2024

@amarts a quick question, now that simple-quota is in gluster release branches, can we directly take gluster builds and containerize them for Kadalu?

We are actually doing almost the same thing. The main reason we decided to have our own branch was that glusterfs's release timelines didn't match our requirements, and considering that Red Hat (now IBM) has reduced the frequency of releases too, we had to manage the builds ourselves to provide the best possible experience to kadalu users.

If you notice, all the PRs added to our branch have already been submitted and merged in the glusterfs devel branch.

mohammaddawoodshaik commented on June 2, 2024

Any update on this issue? This is blocking our promotions. Any help would be appreciated.

mohammaddawoodshaik commented on June 2, 2024

Some more info collected from the FUSE client logs:

[2023-08-18 05:45:52.101240 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:52.101751 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xabb0)[0x7fda0dff2bb0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5efb)[0x7fda0dfbbefb] (--> /opt/lib/libglusterfs.so.0(default_stat+0xac)[0x7fda138325ec] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:52.101780 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=50, fop=STAT, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): picking the request for winding
[2023-08-18 05:45:52.101801 +0000] D [MSGID: 0] [afr-read-txn.c:448:afr_read_txn] 0-common-storage-pool-replicate-0: df1f4e82-89e1-4a20-9985-b6ddc28651f4: generation now vs cached: 2, 2
[2023-08-18 05:45:52.101860 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:52.102037 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xabb0)[0x7fda0dff2bb0] ))))) 0-common-storage-pool-write-behind: (unique = 50, fop=STAT, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:52.102368 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:52.102473 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:52.103284 +0000] W [MSGID: 108027] [afr-common.c:2888:afr_attempt_readsubvol_set] 0-common-storage-pool-replicate-0: no read subvols for /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
[2023-08-18 05:45:52.103324 +0000] D [MSGID: 0] [afr-common.c:3082:afr_lookup_done] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-replicate-0 returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103372 +0000] D [MSGID: 0] [utime.c:218:gf_utime_set_mdata_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-utime returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103394 +0000] D [MSGID: 0] [write-behind.c:2371:wb_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-write-behind returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103406 +0000] D [MSGID: 0] [io-stats.c:2284:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103421 +0000] W [fuse-bridge.c:1052:fuse_entry_cbk] 0-glusterfs-fuse: 51: LOOKUP() /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee => -1 (Transport endpoint is not connected)
[2023-08-18 05:45:52.103611 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:52.103705 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:52.104244 +0000] W [MSGID: 108027] [afr-common.c:2888:afr_attempt_readsubvol_set] 0-common-storage-pool-replicate-0: no read subvols for /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
[2023-08-18 05:45:52.104278 +0000] D [MSGID: 0] [afr-common.c:3082:afr_lookup_done] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-replicate-0 returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104295 +0000] D [MSGID: 0] [utime.c:218:gf_utime_set_mdata_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-utime returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104307 +0000] D [MSGID: 0] [write-behind.c:2371:wb_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-write-behind returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104324 +0000] D [MSGID: 0] [io-stats.c:2284:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104332 +0000] W [fuse-bridge.c:1052:fuse_entry_cbk] 0-glusterfs-fuse: 52: LOOKUP() /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee => -1 (Transport endpoint is not connected)
[2023-08-18 05:45:53.922949 +0000] D [name.c:171:client_fill_address_family] 0-common-storage-pool-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: server-common-storage-pool-2-0.common-storage-pool)
[2023-08-18 05:45:53.924929 +0000] E [MSGID: 101073] [name.c:254:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=0}, {ret=Name or service not known}]
[2023-08-18 05:45:53.924969 +0000] E [name.c:383:af_inet_client_get_remote_sockaddr] 0-common-storage-pool-client-2: DNS resolution failed on host server-common-storage-pool-2-0.common-storage-pool
[2023-08-18 05:45:53.925123 +0000] D [MSGID: 0] [client.c:2235:client_rpc_notify] 0-common-storage-pool-client-2: got RPC_CLNT_DISCONNECT
[2023-08-18 05:45:53.925634 +0000] D [rpc-clnt-ping.c:90:rpc_clnt_remove_ping_timer_locked] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/libgfrpc.so.0(+0x6aae)[0x7fda1375caae] (--> /opt/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xd8)[0x7fda13763978] (--> /opt/lib/libgfrpc.so.0(+0xe8b8)[0x7fda137648b8] (--> /opt/lib/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fda1375fbf6] ))))) 0-: : ping timer event already removed






[2023-08-18 05:45:56.925800 +0000] D [name.c:171:client_fill_address_family] 0-common-storage-pool-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: server-common-storage-pool-2-0.common-storage-pool)
[2023-08-18 05:45:56.927391 +0000] E [MSGID: 101073] [name.c:254:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=0}, {ret=Name or service not known}]
[2023-08-18 05:45:56.927424 +0000] E [name.c:383:af_inet_client_get_remote_sockaddr] 0-common-storage-pool-client-2: DNS resolution failed on host server-common-storage-pool-2-0.common-storage-pool
[2023-08-18 05:45:56.927581 +0000] D [MSGID: 0] [client.c:2235:client_rpc_notify] 0-common-storage-pool-client-2: got RPC_CLNT_DISCONNECT
[2023-08-18 05:45:56.927895 +0000] D [rpc-clnt-ping.c:90:rpc_clnt_remove_ping_timer_locked] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/libgfrpc.so.0(+0x6aae)[0x7fda1375caae] (--> /opt/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xd8)[0x7fda13763978] (--> /opt/lib/libgfrpc.so.0(+0xe8b8)[0x7fda137648b8] (--> /opt/lib/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fda1375fbf6] ))))) 0-: : ping timer event already removed
[2023-08-18 05:45:57.346944 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.347005 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=53, fop=LOOKUP, gfid=00000000-0000-0000-0000-000000000001, gen=0): picking the request for winding
[2023-08-18 05:45:57.347159 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.347312 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.347533 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 53, fop=LOOKUP, gfid=00000000-0000-0000-0000-000000000001, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.348530 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.348552 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=54, fop=LOOKUP, gfid=99c3c687-2955-4ddc-9468-bf8ce5dbc2f3, gen=0): picking the request for winding
[2023-08-18 05:45:57.348629 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.348728 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.348900 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 54, fop=LOOKUP, gfid=99c3c687-2955-4ddc-9468-bf8ce5dbc2f3, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.349811 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.349847 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=55, fop=LOOKUP, gfid=a0d03120-a76d-4ec4-afda-64ef29befdda, gen=0): picking the request for winding
[2023-08-18 05:45:57.349958 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.350059 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.350240 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 55, fop=LOOKUP, gfid=a0d03120-a76d-4ec4-afda-64ef29befdda, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.350854 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.350878 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=56, fop=LOOKUP, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): picking the request for winding
[2023-08-18 05:45:57.350964 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.351058 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.351511 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 56, fop=LOOKUP, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.351760 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.351873 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.352404 +0000] W [MSGID: 108027] [afr-common.c:2888:afr_attempt_readsubvol_set] 0-common-storage-pool-replicate-0: no read subvols for /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
[2023-08-18 05:45:57.352446 +0000] D [MSGID: 0] [afr-common.c:3082:afr_lookup_done] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-replicate-0 returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352511 +0000] D [MSGID: 0] [utime.c:218:gf_utime_set_mdata_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-utime returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352537 +0000] D [MSGID: 0] [write-behind.c:2371:wb_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-write-behind returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352574 +0000] D [MSGID: 0] [io-stats.c:2284:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352600 +0000] W [fuse-bridge.c:1052:fuse_entry_cbk] 0-glusterfs-fuse: 57: LOOKUP() /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee => -1 (Transport endpoint is not connected)
[2023-08-18 05:45:59.928025 +0000] D [name.c:171:client_fill_address_family] 0-common-storage-pool-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: server-common-storage-pool-2-0.common-storage-pool)
[2023-08-18 05:45:59.929565 +0000] E [MSGID: 101073] [name.c:254:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=0}, {ret=Name or service not known}]
[2023-08-18 05:45:59.929592 +0000] E [name.c:383:af_inet_client_get_remote_sockaddr] 0-common-storage-pool-client-2: DNS resolution failed on host server-common-storage-pool-2-0.common-storage-pool
[2023-08-18 05:45:59.929748 +0000] D [MSGID: 0] [client.c:2235:client_rpc_notify] 0-common-storage-pool-client-2: got RPC_CLNT_DISCONNECT
[2023-08-18 05:45:59.930056 +0000] D [rpc-clnt-ping.c:90:rpc_clnt_remove_ping_timer_locked] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/libgfrpc.so.0(+0x6aae)[0x7fda1375caae] (--> /opt/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xd8)[0x7fda13763978] (--> /opt/lib/libgfrpc.so.0(+0xe8b8)[0x7fda137648b8] (--> /opt/lib/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fda1375fbf6] ))))) 0-: : ping timer event already removed


^C
root@kadalu-csi-nodeplugin-lcc25:/var/log/gluster# ls /mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
ls: cannot access '/mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee': Transport endpoint is not connected
root@kadalu-csi-nodeplugin-lcc25:/var/log/gluster# ls /mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
ls: cannot access '/mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee': Transport endpoint is not connected
root@kadalu-csi-nodeplugin-lcc25:/var/log/gluster#
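A quick, hedged way to confirm the DNS failure shown above (the getaddrinfo errors for server-common-storage-pool-2-0.common-storage-pool) is to resolve that name from inside the same nodeplugin pod. The namespace and pod name below are taken from this thread and may differ in your setup, this assumes getent is available in the image, and you may need -c to pick the right container:

# run from a machine with kubectl access to the cluster
kubectl -n kadalu exec -it kadalu-csi-nodeplugin-lcc25 -- \
  getent hosts server-common-storage-pool-2-0.common-storage-pool
# no output (getent exits with status 2) means the headless-service record for the
# third server pod is not resolvable, matching the "DNS resolution failed" lines above

If the name does not resolve, checking that the server-common-storage-pool-2-0 pod and its headless Service actually exist would be the next step.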

mohammaddawoodshaik commented on June 2, 2024

From further debugging on the issue, I found some more info:

  • When subVol directories are stuck in the healing state, we observed the following.
root@maglev-master-10-104-241-73:/data/srv/data/brick/subvol/a1/cb# getfattr -m . -d -e hex pvc-36830367-6b27-4d50-baf1-9c00e0c5d805/
# file: pvc-36830367-6b27-4d50-baf1-9c00e0c5d805/
trusted.afr.common-storage-pool-client-0=0x0000000000010acb00000000
trusted.afr.common-storage-pool-client-1=0x000000000000ea0600000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xce671f3b681640f2948a709dc9694875
trusted.gfs.squota.limit=0x31303733373431383234
trusted.glusterfs.mdata=0x0100000000000000000000000064ea4c0c000000001a5f9a940000000064ea4bfd000000000c43bd360000000064ea4bfd000000000c43bd36
trusted.glusterfs.namespace=0x74727565 
  • Here, heal is yet to be done on client-0 and client-1 (note the non-zero pending counters in the trusted.afr xattrs above; see the decode sketch after the logs below).
  • Now client-2 is picking up this entry as heal-pending and trying to heal the other two bricks.
  • On the bricks, the heal's removexattr is failing with "Operation not supported". Client-2 logs:
The message "I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875. sources=[2]  sinks=" repeated 117 times between [2023-08-28 09:12:58.468222 +0000] and [2023-08-28 09:14:57.207600 +0000]
[2023-08-28 09:14:58.215228 +0000] I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875 
[2023-08-28 09:14:58.218331 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-0: remote operation failed. [{errno=95}, {error=Operation not supported}] 
[2023-08-28 09:14:58.220900 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-1: remote operation failed. [{errno=95}, {error=Operation not supported}] 
[2023-08-28 09:14:58.224025 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875. sources=[2]  sinks= 
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875" repeated 117 times between [2023-08-28 09:14:58.215228 +0000] and [2023-08-28 09:16:56.952424 +0000] 

On the other bricks (server-side logs):

[2023-08-28 09:22:13.429601 +0000] I [MSGID: 115058] [server-rpc-fops_v2.c:777:server4_removexattr_cbk] 0-common-storage-pool-server: REMOVEXATTR info [{frame=613931}, {path=/subvol/a1/cb/pvc-36830367-6b27-4d50-baf1-9c00e0c5d805}, {uuid_utoa=ce671f3b-6816-40f2-948a-709dc9694875}, {name=}, {client=CTX_ID:322bad7c-cb23-4ae1-b717-8533410f2326-GRAPH_ID:0-PID:18-HOST:server-common-storage-pool-0-0-PC_NAME:common-storage-pool-client-1-RECON_NO:-0}, {error-xlator=-}, {errno=95}, {error=Operation not supported}] 
[2023-08-28 09:22:13.430435 +0000] I [MSGID: 0] [simple-quota.c:260:sq_update_hard_limit] 0-common-storage-pool-simple-quota: hardlimit update: ce671f3b-6816-40f2-948a-709dc9694875 1073741824 0 
[2023-08-28 09:22:14.444170 +0000] E [MSGID: 0] [server-rpc-fops_v2.c:2846:server4_removexattr_resume] 0-/bricks/common-storage-pool/data/brick: /subvol/a1/cb/pvc-36830367-6b27-4d50-baf1-9c00e0c5d805: removal of namespace is not allowed [Operation not supported]
[2023-08-28 09:22:14.444220 +0000] I [MSGID: 115058] [server-rpc-fops_v2.c:777:server4_removexattr_cbk] 0-common-storage-pool-server: REMOVEXATTR info [{frame=613940}, {path=/subvol/a1/cb/pvc-36830367-6b27-4d50-baf1-9c00e0c5d805}, {uuid_utoa=ce671f3b-6816-40f2-948a-709dc9694875}, {name=}, {client=CTX_ID:322bad7c-cb23-4ae1-b717-8533410f2326-GRAPH_ID:0-PID:18-HOST:server-common-storage-pool-0-0-PC_NAME:common-storage-pool-client-1-RECON_NO:-0}, {error-xlator=-}, {errno=95}, {error=Operation not supported}] 
[2023-08-28 09:22:14.444679 +0000] I [MSGID: 0] [simple-quota.c:260:sq_update_hard_limit] 0-common-storage-pool-simple-quota: hardlimit update: ce671f3b-6816-40f2-948a-709dc9694875 1073741824 0  
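For readers following along: each trusted.afr.<volname>-client-N value in the getfattr output above is three big-endian 32-bit counters (data, metadata, entry pending operations). A minimal bash sketch to decode them (the helper name is ours, not part of gluster):

decode_afr() {
  # strip the 0x prefix and split the value into its three 4-byte counters
  local hex=${1#0x}
  printf 'data=%d metadata=%d entry=%d\n' \
    "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}
decode_afr 0x0000000000010acb00000000   # client-0 -> data=0 metadata=68299 entry=0
decode_afr 0x000000000000ea0600000000   # client-1 -> data=0 metadata=59910 entry=0

So this brick is blaming client-0 and client-1 for tens of thousands of pending metadata operations, which is consistent with the repeating metadata selfheal in the logs above.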

@amarts @aravindavk @leelavg

leelavg commented on June 2, 2024

As the two other server pods are running fine, FUSE should not be showing an error while accessing the data.

  • You mentioned bringing down the network interface on that node; are you sure components other than Kadalu came up fine?
  • Along with the server pod, the nodeplugin would also have gone down; after they came back up, did you restart your application pod?

  • Please run heal info or trigger heals from the provisioner pod, not directly on the server pods.
  • Server pods are StatefulSets; when the network was brought down, maybe k8s interfered in some unknown way 🤔

  • Having said the above, I couldn't possibly decode the glusterfs logs and provide a way forward.
  • I'm inclined not to try this scenario myself, as this isn't at the integration layer.

mohammaddawoodshaik commented on June 2, 2024

@leelavg
Forgot to mention some more info here.
The info I shared above is from a system where all server and nodeplugin pods are running fine. Despite having tried a full heal multiple times, we still see the above subVol stuck in the HealPending state.

Coming back to the "Transport endpoint is not connected" issue:
We are seeing this issue as a consequence of the above one. Since the subVol is stuck in the heal-pending state, after a couple of network flaps on different nodes (one at a time) the client is left with no brick holding correct data for it to read, which leads to the "Transport endpoint is not connected" error.

leelavg commented on June 2, 2024

@amarts a quick question, now that simple-quota is in gluster release branches, can we directly take gluster builds and containerize them for Kadalu?

mohammaddawoodshaik commented on June 2, 2024

@mohammaddawoodshaik thanks for the detailed debugging and logs for the issue.

Just opened gluster/glusterfs#4224, which should hopefully fix the issue.

Let's wait for comments from the experts; once it is found to be fine, we will merge it into our base branch and make a kadalu release.

@aravindavk @vatsa287

Hello @amarts -
I have patched my clusters with the latest fix you have given. Despite our efforts to rectify the issue, it still persists.
Following is the error log I see on the brick:
[2024-02-26 10:10:40.901386 +0000] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-common-storage-pool-server: disconnecting connection [{client-uid=CTX_ID:5365c705-8706-4c1e-b4d9-706543c870c0-GRAPH_ID:0-PID:217-HOST:server-common-storage-pool-2-0-PC_NAME:common-storage-pool-client-0-RECON_NO:-0}]
[2024-02-26 10:10:40.901536 +0000] I [MSGID: 101054] [client_t.c:374:gf_client_unref] 0-common-storage-pool-server: Shutting down connection CTX_ID:5365c705-8706-4c1e-b4d9-706543c870c0-GRAPH_ID:0-PID:217-HOST:server-common-storage-pool-2-0-PC_NAME:common-storage-pool-client-0-RECON_NO:-0
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b" repeated 173 times between [2024-02-26 10:09:35.004196 +0000] and [2024-02-26 10:11:02.194570 +0000]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-2: remote operation failed. [{errno=95}, {error=Operation not supported}]" repeated 173 times between [2024-02-26 10:09:35.006733 +0000] and [2024-02-26 10:11:02.195794 +0000]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b. sources=[0] 1 sinks=" repeated 173 times between [2024-02-26 10:09:35.009663 +0000] and [2024-02-26 10:11:02.197650 +0000]
[2024-02-26 10:11:03.200718 +0000] I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b
[2024-02-26 10:11:03.201815 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-2: remote operation failed. [{errno=95}, {error=Operation not supported}]
[2024-02-26 10:11:03.203594 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b. sources=[0] 1 sinks=

I also have two subVols (PVCs) created, and I see that these directories are stuck in the HealPending state, although the data inside them is healing properly.
List of files needing a heal on common-storage-pool:
Brick server-common-storage-pool-0-0.common-storage-pool:/bricks/common-storage-pool/data/brick
/subvol/3a/74/pvc-bac13d60-444a-400d-a7e3-e343fd8b45c6
/subvol/aa/84/pvc-01833d52-a37d-4ded-b455-9a454d11343a
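If useful, a hedged follow-up check (paths taken from the heal list above, brick path from the server logs earlier in the thread) is to look at the pending AFR counters on each brick for the two stuck directories, the same way as earlier:

# run on each server pod / brick
for d in subvol/3a/74/pvc-bac13d60-444a-400d-a7e3-e343fd8b45c6 \
         subvol/aa/84/pvc-01833d52-a37d-4ded-b455-9a454d11343a; do
  getfattr -m . -d -e hex "/bricks/common-storage-pool/data/brick/$d"
done

Non-zero trusted.afr.*-client-* metadata counters that never drain would be consistent with the metadata selfheal repeatedly failing to converge because of the rejected removexattr.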

cc: @leelavg @aravindavk @vatsa287
