Split-Brain on Arbiter Volume #4613

@rafikc30

On our 2*3 cluster with an arbiter setup, we observed a split-brain. We had some brick-side issues that caused write operations to fail. Here is the relevant snippet from the client log:

[2024-10-27 08:09:31.006068] W [MSGID: 114031] [client-rpc-fops_v2.c:674:client4_0_writev_cbk] 0-vol-client-3: remote operation failed. [{errno=22},{error=Invalid argument}]
[2024-10-27 08:09:31.006188] W [MSGID: 114031] [client-rpc-fops_v2.c:674:client4_0_writev_cbk] 0-vol-client-3: remote operation failed. [{errno=22},{error=Invalid argument}]
[2024-10-27 08:09:31.006436] W [MSGID: 114031] [client-rpc-fops_v2.c:674:client4_0_writev_cbk] 0-vol-client-4: remote operation failed. [{errno=22},{error=Invalid argument}]
[2024-10-27 08:09:31.006519] W [MSGID: 114031] [client-rpc-fops_v2.c:674:client4_0_writev_cbk] 0-vol-client-4: remote operation failed. [{errno=22},{error=Invalid argument}]
[2024-10-27 08:09:31.008177] W [MSGID: 108001] [afr-transaction.c:1016:afr_handle_quorum] 0-vol-replicate-1: 6d857c1a-6cc6-464b-9dc1-0ea4ba45a36e: Failing WRITE as quorum is not met [Invalid argument]
[2024-10-27 08:09:32.225315] E [MSGID: 108008] [afr-transaction.c:2802:afr_write_txn_refresh_done] 0-vol-replicate-1: Failing SETATTR on gfid 6d857c1a-6cc6-464b-9dc1-0ea4ba45a36e: split-brain observed. [Input/output error]
[2024-10-27 08:09:32.242549] W [MSGID: 108027] [afr-common.c:2897:afr_attempt_readsubvol_set] 0-vol-replicate-1: no read subvols for /vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
The message "W [MSGID: 108027] [afr-common.c:2897:afr_attempt_readsubvol_set] 0-vol-replicate-1: no read subvols for /vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar" repeated 35 times between [2024-10-27 08:09:32.242549 and [2024-10-27 08:09:32.311238]
[2024-10-27 08:09:32.721827] W [MSGID: 114031] [client-rpc-fops_v2.c:674:client4_0_writev_cbk] 0-vol-client-4: remote operation failed. [{errno=22},{error=Invalid argument}]

From our audit logs

2024-10-27T08:09:30.852669+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|open|ok|w|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:30.855084+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|fchmod|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar|744
2024-10-27T08:09:30.855195+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|create_file|ok|0x40000080|file|overwrite_if|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:30.859502+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|create_file|ok|0x100|file|open|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:30.861911+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|ntimes|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar|||Sun Oct 27 08:09:31 AM 2024 UTC|Sun Oct 27 08:09:31 AM 2024 UTC
2024-10-27T08:09:30.862380+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|ntimes|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar|||Sun Oct 27 08:09:31 AM 2024 UTC|
2024-10-27T08:09:31.002971+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.006857+00:00 NODE01 smbd_audit: message repeated 7 times: [ stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar]
2024-10-27T08:09:31.006984+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.007402+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.008381+00:00 NODE01 smbd_audit: message repeated 2 times: [ stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar]
2024-10-27T08:09:31.008467+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|fail (Success)|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.008940+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.012798+00:00 NODE01 smbd_audit: message repeated 7 times: [ stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar]
2024-10-27T08:09:31.013150+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.013307+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.013544+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.013743+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.013935+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.014023+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.014195+00:00 NODE01 smbd_audit: message repeated 2 times: [ stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar]
2024-10-27T08:09:31.014434+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.014512+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.015057+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.015206+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.015572+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.017377+00:00 NODE01 smbd_audit: message repeated 3 times: [ stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar]
2024-10-27T08:09:31.017666+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|fail (Success)|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.017928+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.018006+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|fail (Success)|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.018400+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.020056+00:00 NODE01 smbd_audit: message repeated 3 times: [ stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar]
2024-10-27T08:09:31.020165+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|fail (Success)|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.020779+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_send|ok|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
2024-10-27T08:09:31.020955+00:00 NODE01 smbd_audit: stprage_audit|vol|192.168.1.22|vol|vol|pwrite_recv|fail (Success)|/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar

Extended attributes on the bricks

NODE01

NODE01:/home/ # stat /gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar; getfattr -d -m . -e hex /gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
File: /gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
Size: 655360 Blocks: 1032 IO Block: 4096 regular file
Device: 850h/2128d Inode: 14119185214 Links: 2
Access: (0744/-rwxr--r--) Uid: ( 2000/fsnobody) Gid: ( 2000/fsnobody)
Access: 2024-10-27 08:09:30.000000000 +0000
Modify: 2024-10-27 08:09:31.007783346 +0000
Change: 2025-08-22 15:26:01.420490610 +0000
Birth: -
getfattr: Removing leading '/' from absolute path names

# file: gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar

trusted.afr.vol-client-4=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0400000000000000670d11f00009dbaf
trusted.gfid=0x6d857c1a6cc6464b9dc10ea4ba45a36e
trusted.gfid2path.14ee8ca49d3d2deb=0x38336236656234362d393965662d343432332d623734302d3932356161396165376339382f37354236324443342d3537303830312e746172
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.mdata=0x01000000000000000000000000671df53b000000000074ab1000000000671df53b000000000074ab1000000000671df53a0000000000000000
trusted.pgfid.83b6eb46-99ef-4423-b740-925aa9ae7c98=0x00000001
trusted.worm.attr=0x312f302f3000
trusted.worm_file=0x3100

NODE02

NODE02:/home/# stat /gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar; getfattr -d -m . -e hex /gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
File: /gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
Size: 655360 Blocks: 1032 IO Block: 4096 regular file
Device: 850h/2128d Inode: 3919463680 Links: 2
Access: (0744/-rwxr--r--) Uid: ( 2000/fsnobody) Gid: ( 2000/fsnobody)
Access: 2024-10-27 08:09:30.000000000 +0000
Modify: 2024-10-27 08:09:31.004406543 +0000
Change: 2025-08-22 15:26:02.017620385 +0000
Birth: -
getfattr: Removing leading '/' from absolute path names

# file: gluster/brick4/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar

trusted.afr.vol-client-3=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0400000000000000670d11f3000649c0
trusted.gfid=0x6d857c1a6cc6464b9dc10ea4ba45a36e
trusted.gfid2path.14ee8ca49d3d2deb=0x38336236656234362d393965662d343432332d623734302d3932356161396165376339382f37354236324443342d3537303830312e746172
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.mdata=0x01000000000000000000000000671df53b000000000074ab1000000000671df53b000000000074ab1000000000671df53a0000000000000000
trusted.pgfid.83b6eb46-99ef-4423-b740-925aa9ae7c98=0x00000001
trusted.worm.attr=0x312f302f3000
trusted.worm_file=0x3100

NODEARBITER

NODEARBITER:/home/ # stat /gluster/arbiter*/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar; getfattr -d -m . -e hex /gluster/arbiter*/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
File: /gluster/arbiter3/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar
Size: 0 Blocks: 8 IO Block: 4096 regular empty file
Device: 830h/2096d Inode: 1426077 Links: 2
Access: (0744/-rwxr--r--) Uid: ( 2000/fsnobody) Gid: ( 2000/fsnobody)
Access: 2024-10-27 08:09:30.000000000 +0000
Modify: 2024-10-27 08:09:31.000000000 +0000
Change: 2024-10-27 08:09:32.216380246 +0000
Birth: -
getfattr: Removing leading '/' from absolute path names

# file: gluster/arbiter3/glusterbrick/vol/data/2022/1/5/15/4EB16FFC/75B62DC4-570801.tar

trusted.afr.vol-client-3=0x000000010000000000000000
trusted.afr.vol-client-4=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000670d11f3000ad130
trusted.gfid=0x6d857c1a6cc6464b9dc10ea4ba45a36e
trusted.gfid2path.14ee8ca49d3d2deb=0x38336236656234362d393965662d343432332d623734302d3932356161396165376339382f37354236324443342d3537303830312e746172
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.mdata=0x01000000000000000000000000671df53b000000000029520000000000671df53b000000000029520000000000671df53a0000000000000000
trusted.pgfid.83b6eb46-99ef-4423-b740-925aa9ae7c98=0x00000001
trusted.worm.attr=0x312f302f3000
trusted.worm_file=0x3100
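
To make the xattrs above easier to read, here is a small Python sketch that decodes the trusted.afr.* values. It relies on the AFR changelog layout of three big-endian 32-bit counters (data, metadata, entry). The function names decode_afr and blamed_for_data are only illustrative, and the xattr values are copied from the getfattr output above. That NODE01 hosts vol-client-3 and NODE02 hosts vol-client-4 is an inference from which changelog each brick carries, not something confirmed by the volume info.

```python
import struct


def decode_afr(hex_value):
    """Split a trusted.afr.* value into its data/metadata/entry pending counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    # Three unsigned 32-bit big-endian counters: data, metadata, entry.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# brick -> {client whose changelog is stored on that brick: raw xattr value}
pending = {
    "NODE01": {"vol-client-4": "0x000000010000000000000000"},
    "NODE02": {"vol-client-3": "0x000000010000000000000000"},
    "NODEARBITER": {
        "vol-client-3": "0x000000010000000000000000",
        "vol-client-4": "0x000000010000000000000000",
    },
}


def blamed_for_data(pending_xattrs):
    """Return the clients that at least one brick marks as having pending data ops."""
    blamed = set()
    for xattrs in pending_xattrs.values():
        for client, value in xattrs.items():
            if decode_afr(value)["data"] > 0:
                blamed.add(client)
    return blamed


if __name__ == "__main__":
    for brick, xattrs in pending.items():
        for client, value in xattrs.items():
            print(brick, "blames", client, decode_afr(value))
    print("blamed for data:", sorted(blamed_for_data(pending)))
```

Read this way, each data brick holds a pending data counter of 1 against the other, and the arbiter holds one against both, so neither data brick is left as a clean source. That would be consistent with the "no read subvols" messages in the client log.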
