Comments (9)
@mohammaddawoodshaik thanks for the detailed debugging and the logs for the issue.
Just opened gluster/glusterfs#4224, which should hopefully fix the issue.
Let's wait for comments from the experts; once it is found to be fine, we will merge it into our base branch and make a kadalu release.
from kadalu.
@amarts a quick question, now that simple-quota is in gluster release branches, can we directly take gluster builds and containerize them for Kadalu?
We are actually doing almost the same thing. The main reason we decided to maintain our own branch was that glusterfs's release timelines didn't match our requirements, and considering Red Hat (now IBM) has reduced the frequency of releases too, we had to manage the patches ourselves to provide the best possible experience to kadalu users.
If you notice, all the PRs added to our branch have already been submitted and merged in the glusterfs devel branch.
Any update on this issue? This is blocking our promotions. Any help would be appreciated.
Some more info collected from the FUSE client logs:
[2023-08-18 05:45:52.101240 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:52.101751 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xabb0)[0x7fda0dff2bb0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5efb)[0x7fda0dfbbefb] (--> /opt/lib/libglusterfs.so.0(default_stat+0xac)[0x7fda138325ec] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:52.101780 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=50, fop=STAT, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): picking the request for winding
[2023-08-18 05:45:52.101801 +0000] D [MSGID: 0] [afr-read-txn.c:448:afr_read_txn] 0-common-storage-pool-replicate-0: df1f4e82-89e1-4a20-9985-b6ddc28651f4: generation now vs cached: 2, 2
[2023-08-18 05:45:52.101860 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:52.102037 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xabb0)[0x7fda0dff2bb0] ))))) 0-common-storage-pool-write-behind: (unique = 50, fop=STAT, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:52.102368 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:52.102473 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:52.103284 +0000] W [MSGID: 108027] [afr-common.c:2888:afr_attempt_readsubvol_set] 0-common-storage-pool-replicate-0: no read subvols for /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
[2023-08-18 05:45:52.103324 +0000] D [MSGID: 0] [afr-common.c:3082:afr_lookup_done] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-replicate-0 returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103372 +0000] D [MSGID: 0] [utime.c:218:gf_utime_set_mdata_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-utime returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103394 +0000] D [MSGID: 0] [write-behind.c:2371:wb_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-write-behind returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103406 +0000] D [MSGID: 0] [io-stats.c:2284:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.103421 +0000] W [fuse-bridge.c:1052:fuse_entry_cbk] 0-glusterfs-fuse: 51: LOOKUP() /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee => -1 (Transport endpoint is not connected)
[2023-08-18 05:45:52.103611 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:52.103705 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:52.104244 +0000] W [MSGID: 108027] [afr-common.c:2888:afr_attempt_readsubvol_set] 0-common-storage-pool-replicate-0: no read subvols for /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
[2023-08-18 05:45:52.104278 +0000] D [MSGID: 0] [afr-common.c:3082:afr_lookup_done] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-replicate-0 returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104295 +0000] D [MSGID: 0] [utime.c:218:gf_utime_set_mdata_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-utime returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104307 +0000] D [MSGID: 0] [write-behind.c:2371:wb_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-write-behind returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104324 +0000] D [MSGID: 0] [io-stats.c:2284:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:52.104332 +0000] W [fuse-bridge.c:1052:fuse_entry_cbk] 0-glusterfs-fuse: 52: LOOKUP() /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee => -1 (Transport endpoint is not connected)
[2023-08-18 05:45:53.922949 +0000] D [name.c:171:client_fill_address_family] 0-common-storage-pool-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: server-common-storage-pool-2-0.common-storage-pool)
[2023-08-18 05:45:53.924929 +0000] E [MSGID: 101073] [name.c:254:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=0}, {ret=Name or service not known}]
[2023-08-18 05:45:53.924969 +0000] E [name.c:383:af_inet_client_get_remote_sockaddr] 0-common-storage-pool-client-2: DNS resolution failed on host server-common-storage-pool-2-0.common-storage-pool
[2023-08-18 05:45:53.925123 +0000] D [MSGID: 0] [client.c:2235:client_rpc_notify] 0-common-storage-pool-client-2: got RPC_CLNT_DISCONNECT
[2023-08-18 05:45:53.925634 +0000] D [rpc-clnt-ping.c:90:rpc_clnt_remove_ping_timer_locked] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/libgfrpc.so.0(+0x6aae)[0x7fda1375caae] (--> /opt/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xd8)[0x7fda13763978] (--> /opt/lib/libgfrpc.so.0(+0xe8b8)[0x7fda137648b8] (--> /opt/lib/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fda1375fbf6] ))))) 0-: : ping timer event already removed
[2023-08-18 05:45:56.925800 +0000] D [name.c:171:client_fill_address_family] 0-common-storage-pool-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: server-common-storage-pool-2-0.common-storage-pool)
[2023-08-18 05:45:56.927391 +0000] E [MSGID: 101073] [name.c:254:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=0}, {ret=Name or service not known}]
[2023-08-18 05:45:56.927424 +0000] E [name.c:383:af_inet_client_get_remote_sockaddr] 0-common-storage-pool-client-2: DNS resolution failed on host server-common-storage-pool-2-0.common-storage-pool
[2023-08-18 05:45:56.927581 +0000] D [MSGID: 0] [client.c:2235:client_rpc_notify] 0-common-storage-pool-client-2: got RPC_CLNT_DISCONNECT
[2023-08-18 05:45:56.927895 +0000] D [rpc-clnt-ping.c:90:rpc_clnt_remove_ping_timer_locked] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/libgfrpc.so.0(+0x6aae)[0x7fda1375caae] (--> /opt/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xd8)[0x7fda13763978] (--> /opt/lib/libgfrpc.so.0(+0xe8b8)[0x7fda137648b8] (--> /opt/lib/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fda1375fbf6] ))))) 0-: : ping timer event already removed
[2023-08-18 05:45:57.346944 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.347005 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=53, fop=LOOKUP, gfid=00000000-0000-0000-0000-000000000001, gen=0): picking the request for winding
[2023-08-18 05:45:57.347159 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.347312 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.347533 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 53, fop=LOOKUP, gfid=00000000-0000-0000-0000-000000000001, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.348530 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.348552 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=54, fop=LOOKUP, gfid=99c3c687-2955-4ddc-9468-bf8ce5dbc2f3, gen=0): picking the request for winding
[2023-08-18 05:45:57.348629 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.348728 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.348900 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 54, fop=LOOKUP, gfid=99c3c687-2955-4ddc-9468-bf8ce5dbc2f3, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.349811 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.349847 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=55, fop=LOOKUP, gfid=a0d03120-a76d-4ec4-afda-64ef29befdda, gen=0): picking the request for winding
[2023-08-18 05:45:57.349958 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.350059 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.350240 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 55, fop=LOOKUP, gfid=a0d03120-a76d-4ec4-afda-64ef29befdda, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.350854 +0000] D [write-behind.c:1742:wb_process_queue] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7fe9)[0x7fda0dfeffe9] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] (--> /opt/lib/glusterfs/2023.04.17/xlator/debug/io-stats.so(+0x5c95)[0x7fda0dfbbc95] (--> /opt/lib/libglusterfs.so.0(default_lookup+0xb4)[0x7fda138326e4] ))))) 0-common-storage-pool-write-behind: processing queues
[2023-08-18 05:45:57.350878 +0000] D [MSGID: 0] [write-behind.c:1689:__wb_pick_winds] 0-common-storage-pool-write-behind: (unique=56, fop=LOOKUP, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): picking the request for winding
[2023-08-18 05:45:57.350964 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.351058 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.351511 +0000] D [write-behind.c:411:__wb_request_unref] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x76b1)[0x7fda0dfef6b1] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x7ead)[0x7fda0dfefead] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0x8ca8)[0x7fda0dff0ca8] (--> /opt/lib/glusterfs/2023.04.17/xlator/performance/write-behind.so(+0xc1d0)[0x7fda0dff41d0] ))))) 0-common-storage-pool-write-behind: (unique = 56, fop=LOOKUP, gfid=df1f4e82-89e1-4a20-9985-b6ddc28651f4, gen=0): destroying request, removing from all queues
[2023-08-18 05:45:57.351760 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-0: ping timeout is 0, returning
[2023-08-18 05:45:57.351873 +0000] D [rpc-clnt-ping.c:290:rpc_clnt_start_ping] 0-common-storage-pool-client-1: ping timeout is 0, returning
[2023-08-18 05:45:57.352404 +0000] W [MSGID: 108027] [afr-common.c:2888:afr_attempt_readsubvol_set] 0-common-storage-pool-replicate-0: no read subvols for /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
[2023-08-18 05:45:57.352446 +0000] D [MSGID: 0] [afr-common.c:3082:afr_lookup_done] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-replicate-0 returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352511 +0000] D [MSGID: 0] [utime.c:218:gf_utime_set_mdata_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-utime returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352537 +0000] D [MSGID: 0] [write-behind.c:2371:wb_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool-write-behind returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352574 +0000] D [MSGID: 0] [io-stats.c:2284:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7fd9fc000d08, common-storage-pool returned -1 [Transport endpoint is not connected]
[2023-08-18 05:45:57.352600 +0000] W [fuse-bridge.c:1052:fuse_entry_cbk] 0-glusterfs-fuse: 57: LOOKUP() /subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee => -1 (Transport endpoint is not connected)
[2023-08-18 05:45:59.928025 +0000] D [name.c:171:client_fill_address_family] 0-common-storage-pool-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: server-common-storage-pool-2-0.common-storage-pool)
[2023-08-18 05:45:59.929565 +0000] E [MSGID: 101073] [name.c:254:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=0}, {ret=Name or service not known}]
[2023-08-18 05:45:59.929592 +0000] E [name.c:383:af_inet_client_get_remote_sockaddr] 0-common-storage-pool-client-2: DNS resolution failed on host server-common-storage-pool-2-0.common-storage-pool
[2023-08-18 05:45:59.929748 +0000] D [MSGID: 0] [client.c:2235:client_rpc_notify] 0-common-storage-pool-client-2: got RPC_CLNT_DISCONNECT
[2023-08-18 05:45:59.930056 +0000] D [rpc-clnt-ping.c:90:rpc_clnt_remove_ping_timer_locked] (--> /opt/lib/libglusterfs.so.0(_gf_log_callingfn+0x182)[0x7fda137c12f2] (--> /opt/lib/libgfrpc.so.0(+0x6aae)[0x7fda1375caae] (--> /opt/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xd8)[0x7fda13763978] (--> /opt/lib/libgfrpc.so.0(+0xe8b8)[0x7fda137648b8] (--> /opt/lib/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fda1375fbf6] ))))) 0-: : ping timer event already removed
^C
root@kadalu-csi-nodeplugin-lcc25:/var/log/gluster# ls /mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
ls: cannot access '/mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee': Transport endpoint is not connected
root@kadalu-csi-nodeplugin-lcc25:/var/log/gluster# ls /mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee
ls: cannot access '/mnt/common-storage-pool_dawood/subvol/73/6d/pvc-b966a7fd-c00b-446c-9818-99fae5794aee': Transport endpoint is not connected
root@kadalu-csi-nodeplugin-lcc25:/var/log/gluster#
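The "DNS resolution failed" errors above come from `getaddrinfo()` on the brick pod's headless-service name (`gf_resolve_ip6` passes AF_UNSPEC, as the `client_fill_address_family` line shows). A minimal sketch to reproduce the same check from inside the client pod; the hostname is taken from the logs and only resolves inside the cluster network:

```python
import socket

def resolve(host: str):
    """Mimic gluster's gf_resolve_ip6: getaddrinfo with AF_UNSPEC (family=0)."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC)
        return [info[4][0] for info in infos]
    except socket.gaierror:
        # Corresponds to the log line "ret=Name or service not known"
        return None

# Hostname from the FUSE log above; outside the cluster this returns None.
print(resolve("server-common-storage-pool-2-0.common-storage-pool"))
```

If this returns None from within the nodeplugin pod while the server pod is running, the problem is the headless service record for the StatefulSet pod, not gluster itself.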
From further debugging on the issue, we found some more info:
- When subVol directories are stuck in the healing state, we observed the following:
root@maglev-master-10-104-241-73:/data/srv/data/brick/subvol/a1/cb# getfattr -m . -d -e hex pvc-36830367-6b27-4d50-baf1-9c00e0c5d805/
# file: pvc-36830367-6b27-4d50-baf1-9c00e0c5d805/
trusted.afr.common-storage-pool-client-0=0x0000000000010acb00000000
trusted.afr.common-storage-pool-client-1=0x000000000000ea0600000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xce671f3b681640f2948a709dc9694875
trusted.gfs.squota.limit=0x31303733373431383234
trusted.glusterfs.mdata=0x0100000000000000000000000064ea4c0c000000001a5f9a940000000064ea4bfd000000000c43bd360000000064ea4bfd000000000c43bd36
trusted.glusterfs.namespace=0x74727565
- Here, the heal is yet to be done on client-0 and client-1.
- Now client-2 picks up this entry as heal-pending and tries to heal the other two bricks.
- On the bricks, the heal is failing with the error:
Remove-Xattr: operation is not supported
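The getfattr output above can be decoded: each trusted.afr.* value is three big-endian 32-bit counters (data, metadata, entry pending operations), and trusted.gfs.squota.limit is the limit in ASCII digits encoded as hex. A decoding sketch (the helper name is ours; the counter layout is standard gluster AFR):

```python
def parse_afr(hexval: str):
    """Split an AFR pending xattr into (data, metadata, entry) counters."""
    raw = bytes.fromhex(hexval.removeprefix("0x"))
    return tuple(int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8))

# Values from the brick above: both replicas have metadata heals pending,
# which matches the repeating "metadata selfheal" log lines.
for client, val in {
    "client-0": "0x0000000000010acb00000000",
    "client-1": "0x000000000000ea0600000000",
}.items():
    data, meta, entry = parse_afr(val)
    print(client, f"data={data} metadata={meta} entry={entry}")

# The simple-quota limit is plain ASCII digits:
print(bytes.fromhex("31303733373431383234").decode())  # -> 1073741824 (1 GiB)
```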
Client-2 logs:
The message "I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875. sources=[2] sinks=" repeated 117 times between [2023-08-28 09:12:58.468222 +0000] and [2023-08-28 09:14:57.207600 +0000]
[2023-08-28 09:14:58.215228 +0000] I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875
[2023-08-28 09:14:58.218331 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-0: remote operation failed. [{errno=95}, {error=Operation not supported}]
[2023-08-28 09:14:58.220900 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-1: remote operation failed. [{errno=95}, {error=Operation not supported}]
[2023-08-28 09:14:58.224025 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875. sources=[2] sinks=
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on ce671f3b-6816-40f2-948a-709dc9694875" repeated 117 times between [2023-08-28 09:14:58.215228 +0000] and [2023-08-28 09:16:56.952424 +0000]
In the other bricks' logs:
[2023-08-28 09:22:13.429601 +0000] I [MSGID: 115058] [server-rpc-fops_v2.c:777:server4_removexattr_cbk] 0-common-storage-pool-server: REMOVEXATTR info [{frame=613931}, {path=/subvol/a1/cb/pvc-36830367-6b27-4d50-baf1-9c00e0c5d805}, {uuid_utoa=ce671f3b-6816-40f2-948a-709dc9694875}, {name=}, {client=CTX_ID:322bad7c-cb23-4ae1-b717-8533410f2326-GRAPH_ID:0-PID:18-HOST:server-common-storage-pool-0-0-PC_NAME:common-storage-pool-client-1-RECON_NO:-0}, {error-xlator=-}, {errno=95}, {error=Operation not supported}]
[2023-08-28 09:22:13.430435 +0000] I [MSGID: 0] [simple-quota.c:260:sq_update_hard_limit] 0-common-storage-pool-simple-quota: hardlimit update: ce671f3b-6816-40f2-948a-709dc9694875 1073741824 0
[2023-08-28 09:22:14.444170 +0000] E [MSGID: 0] [server-rpc-fops_v2.c:2846:server4_removexattr_resume] 0-/bricks/common-storage-pool/data/brick: /subvol/a1/cb/pvc-36830367-6b27-4d50-baf1-9c00e0c5d805: removal of namespace is not allowed [Operation not supported]
[2023-08-28 09:22:14.444220 +0000] I [MSGID: 115058] [server-rpc-fops_v2.c:777:server4_removexattr_cbk] 0-common-storage-pool-server: REMOVEXATTR info [{frame=613940}, {path=/subvol/a1/cb/pvc-36830367-6b27-4d50-baf1-9c00e0c5d805}, {uuid_utoa=ce671f3b-6816-40f2-948a-709dc9694875}, {name=}, {client=CTX_ID:322bad7c-cb23-4ae1-b717-8533410f2326-GRAPH_ID:0-PID:18-HOST:server-common-storage-pool-0-0-PC_NAME:common-storage-pool-client-1-RECON_NO:-0}, {error-xlator=-}, {errno=95}, {error=Operation not supported}]
[2023-08-28 09:22:14.444679 +0000] I [MSGID: 0] [simple-quota.c:260:sq_update_hard_limit] 0-common-storage-pool-simple-quota: hardlimit update: ce671f3b-6816-40f2-948a-709dc9694875 1073741824 0
As the two other server pods are running fine, FUSE should not be showing an error while accessing the data.
- You mentioned bringing down the network interface on that node; are you sure components other than Kadalu came back up fine?
- Along with the server pod, the nodeplugin would also have gone down; after they came back up, did you restart your application pod?
- Please run heal info or trigger heals from the provisioner pod, not directly on the server pods.
- Server pods are StatefulSets; when the network was brought down, maybe k8s interfered in some unknown way 🤔
- Having said the above, I can't really decode the glusterfs logs and provide a way forward.
- I'm inclined not to try this scenario, as it isn't at the integration layer.
@leelavg
Forgot to mention some more info here.
The info I shared above is from a system where all server and node-plugin pods are running fine. Despite having tried a full heal multiple times, we still see the above subVol stuck in the HealPending state.
Coming back to the "Transport endpoint is not connected" issue:
We are seeing this issue as a consequence of the above one. Because the subVol is stuck in the heal-pending state, after a couple of network flaps on different nodes (one at a time) the client is left with no brick holding correct data for it to access, and thus hits the "Transport endpoint is not connected" error.
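The "no read subvols" condition follows from the pending counters: a brick only qualifies as a read source if it is up and not blamed by another brick's pending xattrs. A toy model of that selection (our simplification for illustration, not gluster's actual AFR source-selection code):

```python
def readable_sources(up, blames):
    """up[j]: brick j reachable; blames[i][j]: brick i holds pending ops for brick j.
    A brick is a read source only if it is up and no brick blames it."""
    n = len(up)
    blamed = {j for i in range(n) for j in range(n) if i != j and blames[i][j]}
    return [j for j in range(n) if up[j] and j not in blamed]

# Scenario from this issue: brick 2 blames bricks 0 and 1 (heal pending),
# so brick 2 is the only good copy.
blames = [
    [0, 0, 0],  # brick 0 blames nobody
    [0, 0, 0],  # brick 1 blames nobody
    [1, 1, 0],  # brick 2 blames bricks 0 and 1
]
print(readable_sources([True, True, True], blames))   # -> [2]
# After a network flap takes brick 2 away, no valid source remains,
# which surfaces to FUSE as "Transport endpoint is not connected":
print(readable_sources([True, True, False], blames))  # -> []
```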
Hello @amarts,
I have patched my clusters with the latest fix you provided, but despite our efforts the issue still persists.
Following is the error log I see in the brick:
[2024-02-26 10:10:40.901386 +0000] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-common-storage-pool-server: disconnecting connection [{client-uid=CTX_ID:5365c705-8706-4c1e-b4d9-706543c870c0-GRAPH_ID:0-PID:217-HOST:server-common-storage-pool-2-0-PC_NAME:common-storage-pool-client-0-RECON_NO:-0}]
[2024-02-26 10:10:40.901536 +0000] I [MSGID: 101054] [client_t.c:374:gf_client_unref] 0-common-storage-pool-server: Shutting down connection CTX_ID:5365c705-8706-4c1e-b4d9-706543c870c0-GRAPH_ID:0-PID:217-HOST:server-common-storage-pool-2-0-PC_NAME:common-storage-pool-client-0-RECON_NO:-0
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b" repeated 173 times between [2024-02-26 10:09:35.004196 +0000] and [2024-02-26 10:11:02.194570 +0000]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-2: remote operation failed. [{errno=95}, {error=Operation not supported}]" repeated 173 times between [2024-02-26 10:09:35.006733 +0000] and [2024-02-26 10:11:02.195794 +0000]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b. sources=[0] 1 sinks=" repeated 173 times between [2024-02-26 10:09:35.009663 +0000] and [2024-02-26 10:11:02.197650 +0000]
[2024-02-26 10:11:03.200718 +0000] I [MSGID: 108026] [afr-self-heal-metadata.c:50:__afr_selfheal_metadata_do] 0-common-storage-pool-replicate-0: performing metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b
[2024-02-26 10:11:03.201815 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1057:client4_0_removexattr_cbk] 0-common-storage-pool-client-2: remote operation failed. [{errno=95}, {error=Operation not supported}]
[2024-02-26 10:11:03.203594 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1758:afr_log_selfheal] 0-common-storage-pool-replicate-0: Completed metadata selfheal on e51affac-4d26-4649-94dc-2eb7d765dc7b. sources=[0] 1 sinks=
Also, I have two subVols (PVCs) created, and I see that these directories are stuck in the HealPending state, though the data inside them is healing properly.
List of files needing a heal on common-storage-pool:
Brick server-common-storage-pool-0-0.common-storage-pool:/bricks/common-storage-pool/data/brick
/subvol/3a/74/pvc-bac13d60-444a-400d-a7e3-e343fd8b45c6
/subvol/aa/84/pvc-01833d52-a37d-4ded-b455-9a454d11343a
cc: @leelavg @aravindavk @vatsa287