We are seeing this issue in our case as well. We have one volume in a 3-brick, replica-3 setup across 3 nodes. On one node we restarted all gluster services, and after that glusterd was continuously crashing with: 0-management: Initialization of volume 'management' failed, review your volfile again.
Logs from GlusterD:
[2023-10-23 04:35:15.477485] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f50547c96db] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xfd) [0x55de29541b4d] -->/usr/sbin/glusterd(cleanup_and_exit+0x54) [0x55de29541994] ) 0-: received signum (15), shutting down
[2023-10-23 04:38:56.022520] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 6.10 (args: /usr/sbin/glusterd -N -p /var/run/glusterd.pid)
[2023-10-23 04:38:56.022658] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 88
[2023-10-23 04:38:56.071590] I [MSGID: 106478] [glusterd.c:1422:init] 0-management: Maximum allowed open file descriptors set to 65536
[2023-10-23 04:38:56.071663] I [MSGID: 106479] [glusterd.c:1478:init] 0-management: Using /var/lib/glusterd/ as working directory
[2023-10-23 04:38:56.071704] I [MSGID: 106479] [glusterd.c:1484:init] 0-management: Using /var/run/gluster as pid file working directory
[2023-10-23 04:38:56.073867] I [socket.c:1022:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2023-10-23 04:38:56.097616] I [socket.c:965:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 11
[2023-10-23 04:38:56.097971] I [MSGID: 106059] [glusterd.c:1860:init] 0-management: base-port override: 49152
[2023-10-23 04:38:56.097982] I [MSGID: 106059] [glusterd.c:1865:init] 0-management: max-port override: 49152
[2023-10-23 04:39:04.956934] I [MSGID: 106513] [glusterd-store.c:2394:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 60000
[2023-10-23 04:39:34.308779] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: ddc1d9f0-a6e5-4751-8e35-0f91d326fa41
[2023-10-23 04:39:34.912425] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2023-10-23 04:39:34.912618] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2023-10-23 04:39:34.912669] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2023-10-23 04:39:34.912720] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2023-10-23 04:39:34.916260] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.listen-backlog 1024
8: option max-port 49152
9: option base-port 49152
10: option transport.address-family inet
11: option transport.socket.listen-port 24007
12: option event-threads 1
13: option ping-timeout 0
14: option transport.socket.read-fail-log off
15: option transport.socket.keepalive-interval 2
16: option transport.socket.keepalive-time 10
17: option glusterd-sockfile /etc/glusterd_socket/gluster.sock
18: option transport-type socket
19: option working-directory /var/lib/glusterd/
20: end-volume
21:
+------------------------------------------------------------------------------+
[2023-10-23 04:39:34.916252] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2023-10-23 04:39:35.015736] I [MSGID: 101190] [event-epoll.c:688:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2023-10-23 04:39:35.017093] I [MSGID: 106487] [glusterd-handler.c:1516:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2023-10-23 04:39:35.017342] I [MSGID: 106487] [glusterd-handler.c:1516:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2023-10-23 04:39:35.021484] I [MSGID: 106163] [glusterd-handshake.c:1389:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 60000
[2023-10-23 04:39:35.022776] I [MSGID: 106493] [glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 0c333528-b176-4229-8df5-230844b7ee6f, host: <>, port: 0
[2023-10-23 04:39:35.562534] I [glusterd-utils.c:6314:glusterd_brick_start] 0-management: starting a fresh brick process for brick /mnt/bricks/ndp_brick
[2023-10-23 04:39:36.599410] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2023-10-23 04:40:04.394492] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2023-10-23 04:40:04.394624] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2023-10-23 04:40:04.394652] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is stopped
[2023-10-23 04:40:04.394972] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2023-10-23 04:40:04.395758] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: glustershd already stopped
[2023-10-23 04:40:04.395783] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: glustershd service is stopped
[2023-10-23 04:40:04.395806] I [MSGID: 106567] [glusterd-svc-mgmt.c:220:glusterd_svc_start] 0-management: Starting glustershd service
[2023-10-23 04:40:05.400050] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2023-10-23 04:40:05.400307] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: quotad already stopped
[2023-10-23 04:40:05.400340] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: quotad service is stopped
[2023-10-23 04:40:05.400376] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2023-10-23 04:40:05.400515] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped
[2023-10-23 04:40:05.400542] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is stopped
[2023-10-23 04:40:05.400571] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2023-10-23 04:40:05.400693] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already stopped
[2023-10-23 04:40:05.400711] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is stopped
[2023-10-23 04:40:05.400754] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2023-10-23 04:40:05.400888] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2023-10-23 04:40:05.401121] I [MSGID: 106492] [glusterd-handler.c:2796:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 0c333528-b176-4229-8df5-230844b7ee6f
[2023-10-23 04:40:23.695495] I [MSGID: 106502] [glusterd-handler.c:2837:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2023-10-23 04:40:23.695661] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 0c333528-b176-4229-8df5-230844b7ee6f
[2023-10-23 04:40:23.714829] I [MSGID: 106493] [glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: c03c5b65-fe44-4342-bacd-76947c016646, host: <>, port: 0
[2023-10-23 04:40:25.356022] I [MSGID: 106490] [glusterd-handler.c:2611:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 0c333528-b176-4229-8df5-230844b7ee6f
[2023-10-23 04:40:25.356665] I [MSGID: 106009] [glusterd-utils.c:3466:glusterd_compare_friend_volume] 0-management: Version of volume ndp_vol differ. local version = 7370, remote version = 7382 on peer <>
[2023-10-23 04:40:25.357955] I [MSGID: 106493] [glusterd-handler.c:3883:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to <> (0), ret: 0, op_ret: 0
[2023-10-23 04:40:26.595064] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f1e9eb816db] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xfd) [0x55d33e735b4d] -->/usr/sbin/glusterd(cleanup_and_exit+0x54) [0x55d33e735994] ) 0-: received signum (15), shutting down
[2023-10-23 04:40:30.011947] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 6.10 (args: /usr/sbin/glusterd -N -p /var/run/glusterd.pid)
[2023-10-23 04:40:30.012251] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 77
[2023-10-23 04:40:30.096321] I [MSGID: 106478] [glusterd.c:1422:init] 0-management: Maximum allowed open file descriptors set to 65536
[2023-10-23 04:40:30.096375] I [MSGID: 106479] [glusterd.c:1478:init] 0-management: Using /var/lib/glusterd/ as working directory
[2023-10-23 04:40:30.096392] I [MSGID: 106479] [glusterd.c:1484:init] 0-management: Using /var/run/gluster as pid file working directory
[2023-10-23 04:40:30.099942] I [socket.c:1022:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2023-10-23 04:40:30.101375] I [socket.c:965:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 11
[2023-10-23 04:40:30.101751] I [MSGID: 106059] [glusterd.c:1860:init] 0-management: base-port override: 49152
[2023-10-23 04:40:30.101766] I [MSGID: 106059] [glusterd.c:1865:init] 0-management: max-port override: 49152
[2023-10-23 04:40:33.389696] I [MSGID: 106513] [glusterd-store.c:2394:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 60000
[2023-10-23 04:40:33.390046] E [MSGID: 106206] [glusterd-store.c:3139:glusterd_store_update_volinfo] 0-management: Failed to get next store iter
[2023-10-23 04:40:33.390062] E [MSGID: 106207] [glusterd-store.c:3404:glusterd_store_retrieve_volume] 0-management: Failed to update volinfo for ndp_vol volume
[2023-10-23 04:40:33.390092] E [MSGID: 106201] [glusterd-store.c:3641:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: ndp_vol
[2023-10-23 04:40:33.390119] E [MSGID: 101019] [xlator.c:629:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2023-10-23 04:40:33.390128] E [MSGID: 101066] [graph.c:362:glusterfs_graph_init] 0-management: initializing translator failed
[2023-10-23 04:40:33.390134] E [MSGID: 101176] [graph.c:725:glusterfs_graph_activate] 0-graph: init failed
[2023-10-23 04:40:33.390230] W [glusterfsd.c:1570:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xf7) [0x561488e5cc27] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x22f) [0x561488e5cacf] -->/usr/sbin/glusterd(cleanup_and_exit+0x54) [0x561488e58994] ) 0-: received signum (-1), shutting down
FYI:
We are running GlusterFS 6.10.
Some more info:
When we looked directly at the GlusterFS configuration on disk, we found that some files were missing.
/var/lib/glusterd/vols/ndp_vol/ndp_vol.<<brick-hostnames*>>.mnt-bricks-ndp_brick.vol - this file went missing on the node where glusterd was crashing.
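A quick way to spot this kind of corruption is to scan the glusterd working directory for volume directories that contain no generated volfile at all. The helper below is our own minimal sketch, not a GlusterFS tool; the default path is taken from the working-directory line in the logs above and may need adjusting:

```shell
#!/bin/sh
# Sketch: list volume directories under a glusterd working directory
# that contain no *.vol files (the symptom we observed above).
check_volfiles() {
    vols_dir="$1"
    for vol in "$vols_dir"/*/; do
        [ -d "$vol" ] || continue
        # Flag the volume if its directory holds no generated volfile.
        if ! ls "$vol"*.vol >/dev/null 2>&1; then
            echo "missing volfiles: $(basename "$vol")"
        fi
    done
}

# Default working directory seen in the logs; override via VOLS_DIR.
check_volfiles "${VOLS_DIR:-/var/lib/glusterd/vols}"
```

Any volume it reports will fail to restore at glusterd startup, matching the glusterd_store_retrieve_volume errors in the log above.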
@git2212 - Just a suggestion: could you also check whether these config files are missing in your setup?
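For anyone hitting the same state: the usual recovery is to copy the volume's configuration back from a healthy peer and restart glusterd. The sketch below is hypothetical and only PRINTS the commands so they can be reviewed before running; "healthy-node" and the volume name are placeholders, and the paths assume the default working directory seen in the logs:

```shell
#!/bin/sh
# Hypothetical recovery helper: print (not execute) the commands to
# restore a volume's config directory from a healthy peer.
print_restore_cmds() {
    peer="$1"; vol="$2"
    echo "systemctl stop glusterd"
    echo "rsync -av ${peer}:/var/lib/glusterd/vols/${vol}/ /var/lib/glusterd/vols/${vol}/"
    echo "systemctl start glusterd"
}

print_restore_cmds healthy-node ndp_vol
```

Review the printed commands against your own node names and working directory before running anything on a production cluster.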