elastio / elastio-snap
This project is forked from datto/dattobd
kernel module for taking block-level snapshots and incremental backups of Linux block devices
License: GNU General Public License v2.0
Hi,
I see that elastio_snap_get_super is used to obtain the address of get_super, and that GET_SUPER_ADDR is the address of get_super taken from System.map. My question: why does the code add the address of kfree and subtract KFREE_ADDR? Can the runtime address of kfree differ from KFREE_ADDR? Please see the following code.
struct super_block* (*elastio_snap_get_super)(struct block_device *) = (GET_SUPER_ADDR != 0) ?
(struct super_block* (*)(struct block_device*)) (GET_SUPER_ADDR + (long long)(((void *)kfree) - (void *)KFREE_ADDR)) : NULL;
Thanks!
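One likely reason for this arithmetic, offered as an explanation rather than as a statement from the maintainers: the addresses in System.map are link-time addresses, and with kernel address space layout randomization (KASLR) the kernel image is shifted by a random offset at boot. kfree is an exported symbol, so the module sees its real runtime address; subtracting KFREE_ADDR (kfree's System.map address) gives that offset, which is then added to GET_SUPER_ADDR. Below is a user-space sketch of the same calculation. The function names and the addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between where a known symbol really is at runtime and where
 * System.map says it is. Without KASLR this is 0. */
int64_t kaslr_slide(uint64_t runtime_known, uint64_t sysmap_known)
{
    /* The reference symbol must have been found in System.map. */
    assert(sysmap_known != 0);
    return (int64_t)(runtime_known - sysmap_known);
}

/* Translate the System.map address of an unexported symbol into its
 * runtime address. A System.map address of 0 means "not found" and is
 * passed through as 0, mirroring the NULL branch in the quoted code. */
uint64_t resolve_runtime_addr(uint64_t sysmap_target,
                              uint64_t runtime_known,
                              uint64_t sysmap_known)
{
    if (sysmap_target == 0)
        return 0;
    return sysmap_target + (uint64_t)kaslr_slide(runtime_known, sysmap_known);
}
```

For example, if System.map lists kfree at 0xffffffff81200000 but the running kernel has it at 0xffffffff9a200000, the slide is 0x19000000, and a get_super listed at 0xffffffff812d0000 is actually at 0xffffffff9a2d0000.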
It hangs on the setup snapshot operation.
dmesg:
[Jun17 16:12] elastio_snap: loading out-of-tree module taints kernel.
[ +0.000050] elastio_snap: module verification failed: signature and/or required key missing - tainting kernel
[ +0.000755] elastio-snap: module init
[ +0.000001] elastio-snap: get major number
[ +0.000001] elastio-snap: allocate global device array
[ +0.000000] elastio-snap: registering proc file
[ +0.000004] elastio-snap: registering control device
[ +0.001784] elastio-snap: locating system call table
[ +0.000000] elastio-snap: failed to locate system call table, persistence disabled
[ +0.227487] loop: module loaded
[ +0.344622] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[ +0.002557] elastio-snap: ioctl command received: 1076379905
[ +0.000004] elastio-snap: received setup snap ioctl - 9 : /dev/loop0 : /tmp/elastio-snap_911/cow.snap
[ +0.000005] elastio-snap: allocating device struct
[ +0.000001] elastio-snap: initializing tracer
[ +0.000000] elastio-snap: finding block device
[ +0.000001] elastio-snap: checking block device is not already being traced
[ +0.000001] elastio-snap: fetching the absolute pathname for the base device
[ +0.000003] elastio-snap: calculating block device size and offset
[ +0.000000] elastio-snap: bdev size = 524288, offset = 0
[ +0.000001] elastio-snap: creating cow manager
[ +0.000001] elastio-snap: allocating cow manager, seqid = 1
[ +0.000000] elastio-snap: creating cow file
[ +0.000294] elastio-snap: allocating cow manager array (16 sections)
[ +0.000001] elastio-snap: allocating cow file (26843545 bytes)
[ +0.000071] elastio-snap: finding cow file inode
[ +0.000001] elastio-snap: getting relative pathname of cow file
[ +0.000137] elastio-snap: allocating queue
[ +0.000023] elastio-snap: setting up make request function
[ +0.000000] elastio-snap: setting queue limits
[ +0.000003] elastio-snap: allocating gendisk
[ +0.000002] elastio-snap: initializing gendisk
[ +0.000001] elastio-snap: naming gendisk
[ +0.000001] elastio-snap: block device size: 524288
[ +0.000001] elastio-snap: adding disk
[ +0.000364] elastio-snap: starting mrf kernel thread
[ +0.000043] elastio-snap: creating kernel cow thread
[ +0.000050] elastio-snap: getting the base block device's make_request_fn
[ +0.000000] elastio-snap: freezing 'loop0'
[ +0.032588] elastio-snap: starting tracing
[ +0.000001] elastio-snap: thawing 'loop0'
[ +0.000007] elastio-snap: error finding original_mrf: -14
[Jun17 16:16] INFO: task python3:38157 blocked for more than 120 seconds.
[ +0.000022] Tainted: G OE --------- - - 4.18.0-305.3.1.el8.x86_64 #1
[ +0.000018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000018] python3 D 0 38157 38140 0x80004080
[ +0.000002] Call Trace:
[ +0.000008] __schedule+0x2c4/0x700
[ +0.000003] ? bit_wait_timeout+0x90/0x90
[ +0.000001] schedule+0x38/0xa0
[ +0.000002] io_schedule+0x12/0x40
[ +0.000001] bit_wait_io+0xd/0x50
[ +0.000001] __wait_on_bit+0x6c/0x80
[ +0.000002] out_of_line_wait_on_bit+0x91/0xb0
[ +0.000003] ? init_wait_var_entry+0x50/0x50
[ +0.000003] __sync_dirty_buffer+0xcf/0xe0
[ +0.000020] ext4_commit_super+0x209/0x2b0 [ext4]
[ +0.000006] ? ioctl_transition_inc+0x320/0x320 [elastio_snap]
[ +0.000013] ext4_unfreeze+0x4d/0x60 [ext4]
[ +0.000003] thaw_super_locked+0x2f/0xb0
[ +0.000003] __tracer_transition_tracing+0xb0/0x110 [elastio_snap]
[ +0.000003] __tracer_setup_tracing+0x74/0x120 [elastio_snap]
[ +0.000002] __ioctl_setup+0x356/0x3c0 [elastio_snap]
[ +0.000002] ctrl_ioctl+0x740/0x8d0 [elastio_snap]
[ +0.000004] ? do_vfs_ioctl+0xa4/0x680
[ +0.000001] do_vfs_ioctl+0xa4/0x680
[ +0.000003] ksys_ioctl+0x60/0x90
[ +0.000002] __x64_sys_ioctl+0x16/0x20
[ +0.000002] do_syscall_64+0x5b/0x1a0
[ +0.000003] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ +0.000002] RIP: 0033:0x7f13b3f6d62b
[ +0.000005] Code: Unable to access opcode bytes at RIP 0x7f13b3f6d601.
[ +0.000001] RSP: 002b:00007fffe1d1f758 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ +0.000002] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f13b3f6d62b
[ +0.000000] RDX: 00007fffe1d1f790 RSI: 0000000040284101 RDI: 0000000000000003
[ +0.000001] RBP: 00007fffe1d1f7c0 R08: 0000000000000000 R09: 00007f13b4cf953d
[ +0.000001] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f13b1a060b0
[ +0.000000] R13: 0000000000000001 R14: 0000000000000028 R15: 0000000000000001
Some storage is quite slow. For instance, there is a performance problem with AWS EBS volumes. The elastio-snap
driver uses asynchronous logic for submitting bio requests to the original device and to the copy-on-write storage file, and EBS
seems to suffer from this behavior. As a result, the split bios accumulate in memory. See #96.
This problem can be partially resolved by moving the COW file to another device.
Hosting the COW file on another device would also, hypothetically, make it possible to track changes on a device that has no file system or has a foreign FS.
In 8539d91, you accidentally replaced Datto with Assurio in the note about it being a fork of dattobd. I suspect this was an unintentional mistake caused by a rather greedy Find+Replace. 😉
It'd be great if you fixed this as soon as reasonably possible.
dkms-assurio-snap package doesn't detect installed kernel-headers.
Installing : dkms-assurio-snap-0.10.13-1.fc31.noarch
Running scriptlet: dkms-assurio-snap-0.10.13-1.fc31.noarch
Loading new assurio-snap-0.10.13 DKMS files...
Building for 5.5.8-200.fc31.x86_64
Module build for kernel 5.5.8-200.fc31.x86_64 was skipped since the
kernel headers for this kernel does not seem to be installed.
rpm -qa kernel-headers
kernel-headers-5.5.8-200.fc31.x86_64
The tests hardcode the 'loop0' loop device, so they cannot run when a loop device with id 0 already exists. That is the common case on Ubuntu 20.04 with the snapd service.
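One way to avoid a fixed device number is to ask the kernel for the first unused loop device through /dev/loop-control, which is what `losetup --find` does. The sketch below is not taken from the project's test suite; the helper names are hypothetical, and opening /dev/loop-control normally requires root.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

/* Format the device node path for loop device number `nr`.
 * Returns 0 on success, -1 if `nr` is negative or the buffer is too small. */
int loop_path(char *buf, size_t len, int nr)
{
    assert(buf != NULL);
    if (nr < 0)
        return -1;
    int n = snprintf(buf, len, "/dev/loop%d", nr);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Ask the kernel for the number of the first unused loop device.
 * Returns the number, or -1 on error (for example, insufficient rights). */
int first_free_loop(void)
{
    int fd = open("/dev/loop-control", O_RDWR);
    if (fd < 0)
        return -1;
    int nr = ioctl(fd, LOOP_CTL_GET_FREE);
    close(fd);
    return nr;
}
```

A test harness could call first_free_loop() once, pass the result through loop_path(), and use that path everywhere 'loop0' is currently assumed.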
Fedora34 and Fedora35
This Linux kernel version is the default on Ubuntu 20.04 and, presumably, will be the default on Debian 11 (bullseye), which is due to be released soon, this year.
The elastio-snap build fails on Fedora 32 with kernel 5.8.9-200.
make -C /lib/modules/5.8.9-200.fc32.x86_64/build M=/home/elastio/elastio-snap/src modules
make[2]: Entering directory '/usr/src/kernels/5.8.9-200.fc32.x86_64'
CC [M] /home/elastio/elastio-snap/src/elastio-snap.o
In file included from ./include/linux/umh.h:4,
from ./include/linux/kmod.h:9,
from ./include/linux/module.h:16,
from /home/elastio/elastio-snap/src/includes.h:11,
from /home/elastio/elastio-snap/src/elastio-snap.c:8:
/home/elastio/elastio-snap/src/elastio-snap.c: In function ‘__tracer_setup_snap’:
./include/linux/gfp.h:297:20: warning: passing argument 1 of ‘blk_alloc_queue’ makes pointer from integer without a cast [-Wint-conversion]
297 | #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| unsigned int
/home/elastio/elastio-snap/src/elastio-snap.c:3584:34: note: in expansion of macro ‘GFP_KERNEL’
3584 | dev->sd_queue = blk_alloc_queue(GFP_KERNEL);
| ^~~~~~~~~~
In file included from /home/elastio/elastio-snap/src/includes.h:13,
from /home/elastio/elastio-snap/src/elastio-snap.c:8:
./include/linux/blkdev.h:1172:55: note: expected ‘blk_qc_t (*)(struct request_queue *, struct bio *)’ {aka ‘unsigned int (*)(struct request_queue *, struct bio *)’} but argument is of type ‘unsigned int’
1172 | struct request_queue *blk_alloc_queue(make_request_fn make_request, int node_id);
| ~~~~~~~~~~~~~~~~^~~~~~~~~~~~
/home/elastio/elastio-snap/src/elastio-snap.c:3584:18: error: too few arguments to function ‘blk_alloc_queue’
3584 | dev->sd_queue = blk_alloc_queue(GFP_KERNEL);
| ^~~~~~~~~~~~~~~
In file included from /home/elastio/elastio-snap/src/includes.h:13,
from /home/elastio/elastio-snap/src/elastio-snap.c:8:
./include/linux/blkdev.h:1172:23: note: declared here
1172 | struct request_queue *blk_alloc_queue(make_request_fn make_request, int node_id);
| ^~~~~~~~~~~~~~~
/home/elastio/elastio-snap/src/elastio-snap.c:3593:2: error: implicit declaration of function ‘blk_queue_make_request’; did you mean ‘blk_queue_max_segments’? [-Werror=implicit-function-declaration]
3593 | blk_queue_make_request(dev->sd_queue, snap_mrf);
| ^~~~~~~~~~~~~~~~~~~~~~
| blk_queue_max_segments
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:281: /home/elastio/elastio-snap/src/elastio-snap.o] Error 1
make[2]: *** [Makefile:1756: /home/elastio/elastio-snap/src] Error 2
make[2]: Leaving directory '/usr/src/kernels/5.8.9-200.fc32.x86_64'
make[1]: *** [Makefile:14: default] Error 2
make[1]: Leaving directory '/home/elastio/elastio-snap/src'
make: *** [Makefile:24: driver] Error 2
After creating the snapshot, if the original volume's data changes very frequently, will system memory be exhausted?
For each incoming bio, the driver has to clone the bio and add the clone to the queue via bio_queue_add.
If writes to the cow file are slower than writes to the original volume, system memory will eventually be exhausted and the system will crash, right?
Fedora 31 (with set -x in the dkms script)
Building initial module for 5.5.8-200.fc31.x86_64
+ set +e
+ dkms build -m assurio-snap -v 0.10.13 -k 5.5.8-200.fc31.x86_64
find: ‘/var/lib/dkms/assurio-snap/0.10.13/build/configure-tests/feature-tests/build/’: No such file or directory
+ case $? in
+ set -e
+ echo Done.
Done.
CentOS 6.10
Installing : dkms-assurio-snap-0.10.13-1.el6.noarch 1/1
Loading new assurio-snap-0.10.13 DKMS files...
Building for 2.6.32-754.3.5.el6.x86_64
Building initial module for 2.6.32-754.3.5.el6.x86_64
find: `/var/lib/dkms/assurio-snap/0.10.13/build/configure-tests/feature-tests/build/': No such file or directory
Done.
Steps to reproduce:
elastio@debian11-amd64-build:~$ sudo losetup --find --show ~/disk1.img
/dev/loop0
elastio@debian11-amd64-build:~$ sudo losetup --find --show ~/disk2.img
/dev/loop1
elastio@debian11-amd64-build:~$ sudo mkfs.ext4 /dev/loop0 -F
mke2fs 1.46.2 (28-Feb-2021)
/dev/loop0 contains a ext4 file system
last mounted on /home/elastio/mount1 on Thu Jan 6 23:41:03 2022
Discarding device blocks: done
Creating filesystem with 262144 1k blocks and 65536 inodes
Filesystem UUID: ae199fc6-4f8e-4634-8594-b42bfa78b4af
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
elastio@debian11-amd64-build:~$ sudo mkfs.ext4 /dev/loop1 -F
mke2fs 1.46.2 (28-Feb-2021)
/dev/loop1 contains a ext4 file system
last mounted on /home/elastio/mount2 on Thu Jan 6 23:41:07 2022
Discarding device blocks: done
Creating filesystem with 262144 1k blocks and 65536 inodes
Filesystem UUID: 9e441755-b1cc-47f4-ba0b-b17b600d5233
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
elastio@debian11-amd64-build:~$ sudo mount /dev/loop0 ~/mount1
elastio@debian11-amd64-build:~$ sudo mount /dev/loop1 ~/mount2
elastio@debian11-amd64-build:~$ cd elastio-snap/
elastio@debian11-amd64-build:~/elastio-snap$ sudo insmod src/elastio-snap.ko debug=1
elastio@debian11-amd64-build:~/elastio-snap$ sudo elioctl setup-snapshot /dev/loop0 ~/mount1/cow 0
elastio@debian11-amd64-build:~/elastio-snap$ sudo elioctl setup-snapshot /dev/loop1 ~/mount1/cow 1
driver returned an error performing specified action. check dmesg for more info: Invalid argument
elastio@debian11-amd64-build:~/elastio-snap$ uname -r
5.10.0-8-amd64
dmesg:
[Jan 6 23:43] elastio_snap: loading out-of-tree module taints kernel.
[ +0.000196] elastio_snap: module verification failed: signature and/or required key missing - tainting kernel
[ +0.001122] elastio-snap: module init
[ +0.000002] elastio-snap: get major number
[ +0.000005] elastio-snap: allocate global device array
[ +0.000001] elastio-snap: registering proc file
[ +0.000011] elastio-snap: registering control device
[ +0.000190] elastio-snap: locating system call table
[ +0.000002] elastio-snap: failed to locate system call table, persistence disabled
[ +24.468829] elastio-snap: ioctl command received: 1076379905
[ +0.000010] elastio-snap: received setup snap ioctl - 0 : /dev/loop0 : /home/elastio/mount1/cow
[ +0.000046] elastio-snap: allocating device struct
[ +0.000002] elastio-snap: initializing tracer
[ +0.000001] elastio-snap: finding block device
[ +0.000004] elastio-snap: checking block device is not already being traced
[ +0.000001] elastio-snap: fetching the absolute pathname for the base device
[ +0.000007] elastio-snap: calculating block device size and offset
[ +0.000002] elastio-snap: bdev size = 524288, offset = 0
[ +0.000002] elastio-snap: creating cow manager
[ +0.000002] elastio-snap: allocating cow manager, seqid = 1
[ +0.000001] elastio-snap: creating cow file
[ +0.000510] elastio-snap: allocating cow manager array (16 sections)
[ +0.000003] elastio-snap: allocating cow file (26843545 bytes)
[ +0.000364] elastio-snap: finding cow file inode
[ +0.000002] elastio-snap: getting relative pathname of cow file
[ +0.002046] elastio-snap: setting up make request function
[ +0.000004] elastio-snap: setting queue limits
[ +0.000005] elastio-snap: allocating gendisk
[ +0.000009] elastio-snap: initializing gendisk
[ +0.000002] elastio-snap: naming gendisk
[ +0.000015] elastio-snap: block device size: 524288
[ +0.000007] elastio-snap: adding disk
[ +0.000445] elastio-snap: starting mrf kernel thread
[ +0.000300] elastio-snap: creating kernel cow thread
[ +0.000344] elastio-snap: getting the base block device's make_request_fn
[ +0.000005] elastio-snap: original mrf is empty, set to elastio_snap_null_mrf
[ +0.000017] elastio-snap: freezing 'loop0'
[ +0.092565] elastio-snap: starting tracing
[ +0.000121] elastio-snap: thawing 'loop0'
[ +0.003585] elastio-snap: minor range = 0 - 0
[ +8.322135] elastio-snap: ioctl command received: 1076379905
[ +0.000003] elastio-snap: received setup snap ioctl - 1 : /dev/loop1 : /home/elastio/mount1/cow
[ +0.000020] elastio-snap: allocating device struct
[ +0.000001] elastio-snap: initializing tracer
[ +0.000000] elastio-snap: finding block device
[ +0.000002] elastio-snap: checking block device is not already being traced
[ +0.000002] elastio-snap: fetching the absolute pathname for the base device
[ +0.000004] elastio-snap: calculating block device size and offset
[ +0.000002] elastio-snap: bdev size = 524288, offset = 0
[ +0.000002] elastio-snap: creating cow manager
[ +0.000000] elastio-snap: allocating cow manager, seqid = 1
[ +0.000002] elastio-snap: creating cow file
[ +0.000067] elastio-snap: allocating cow manager array (16 sections)
[ +0.000001] elastio-snap: allocating cow file (26843545 bytes)
[ +0.000060] elastio-snap: '/home/elastio/mount1/cow' is not on 'loop1': -22
[ +0.000080] elastio-snap: error setting up cow manager: -22
[ +0.000045] elastio-snap: destroying cow manager. close method: 0
[ +0.000008] elastio-snap: error setting up tracer as active snapshot: -22
[ +0.000042] elastio-snap: freeing base block device path
[ +0.000000] elastio-snap: freeing base block device
[ +0.000001] elastio-snap: error during setup ioctl handler: -22
[ +0.000041] elastio-snap: minor range = 0 - 0
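The second setup-snapshot call fails because the driver checks that the COW file lives on the device being snapshotted ("is not on 'loop1': -22"). A similar check can be approximated in user space before calling elioctl. The sketch below is an assumption about how one might do that, not code from the driver: it compares the st_dev of a path with the st_rdev of the block device node, which holds for file systems mounted directly from a single block device such as the ext4 loop devices above, but not necessarily for multi-device file systems.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if `file_path` lives on the file system backed by the block
 * device node `bdev_path`, 0 if it does not, and -1 if either path cannot
 * be examined or `bdev_path` is not a block device. */
int file_is_on_bdev(const char *file_path, const char *bdev_path)
{
    struct stat file_st, bdev_st;

    assert(file_path != NULL && bdev_path != NULL);
    if (stat(file_path, &file_st) != 0 || stat(bdev_path, &bdev_st) != 0)
        return -1;
    if (!S_ISBLK(bdev_st.st_mode))
        return -1;
    /* st_dev: device holding the file; st_rdev: device the node refers to. */
    return file_st.st_dev == bdev_st.st_rdev;
}
```

Because the COW file does not exist before setup, the check would be run on its parent directory: for the commands above, file_is_on_bdev("/home/elastio/mount1", "/dev/loop1") would be expected to return 0.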
When extensive write operations are happening and the driver is about to exhaust memory (which is to be fixed in #96), the driver should set an error state and stop tracing, to avoid crashing the system by consuming all available memory.
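The requested behavior can be pictured as a byte budget with a latching fail state. This is a plain user-space sketch under assumed names (cow_budget, cow_budget_reserve, cow_budget_release); it is not the driver's implementation and it ignores locking, which real kernel code would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting for cloned bios that are queued but not yet
 * written to the COW file. */
struct cow_budget {
    size_t queued_bytes;  /* bytes currently held in queued clones */
    size_t limit_bytes;   /* cap on queued bytes */
    bool   failed;        /* once set, tracing is abandoned */
};

/* Called when a write is intercepted. Returns true if the clone may be
 * queued. Returns false and latches the fail state if queuing it would
 * exceed the cap, so the snapshot is lost but the system stays alive. */
bool cow_budget_reserve(struct cow_budget *b, size_t bytes)
{
    assert(b != NULL);
    if (b->failed)
        return false;
    if (bytes > b->limit_bytes - b->queued_bytes) {
        b->failed = true;
        return false;
    }
    b->queued_bytes += bytes;
    return true;
}

/* Called after a queued clone has been written out and freed. */
void cow_budget_release(struct cow_budget *b, size_t bytes)
{
    assert(b != NULL && bytes <= b->queued_bytes);
    b->queued_bytes -= bytes;
}
```

The fail state is deliberately sticky: once a write has been passed through without being copied, the snapshot is no longer consistent, so later releases must not re-enable tracing.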
Linux kernel 5.9 has just been released. Fedora 33 isn't released yet, and even the Fedora 33 beta has kernel 5.8 by default.
But it's possible to install a vanilla 5.9 kernel even on Fedora 31 using these instructions: https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories
What is the known problem?
The make_request_fn has been moved to struct block_device_operations and renamed to submit_bio: torvalds/linux@c62b37d
So even after the fix for #48, compilation of the module will fail like this:
[elastio@fedora31-amd64-build elastio-snap]$ make
make -C src
make[1]: Entering directory '/home/elastio/elastio-snap/src'
if [ ! -f kernel-config.h ] || tail -1 kernel-config.h | grep -qv '#endif'; then mkdir configure-tests/feature-tests/build; ./genconfig.sh "5.9.1-36.vanilla.1.fc31.x86_64" "-w"; fi;
generating configurations for kernel-5.9.1-36.vanilla.1.fc31.x86_64
make[2]: Entering directory '/home/elastio/elastio-snap/src/configure-tests/feature-tests'
make[3]: Entering directory '/usr/src/kernels/5.9.1-36.vanilla.1.fc31.x86_64'
make[3]: Leaving directory '/usr/src/kernels/5.9.1-36.vanilla.1.fc31.x86_64'
make[2]: Leaving directory '/home/elastio/elastio-snap/src/configure-tests/feature-tests'
performing configure test: HAVE_BDOPS_OPEN_INT - not present
performing configure test: HAVE_BDOPS_OPEN_INODE - not present
performing configure test: HAVE_BDEV_STACK_LIMITS - not present
performing configure test: HAVE_BD_SUPER - present
performing configure test: HAVE_BIO_BI_REMAINING - not present
performing configure test: HAVE_BIO_BI_BDEV - not present
performing configure test: HAVE_BIO_BI_POOL - present
performing configure test: HAVE_BIO_ENDIO_1 - present
performing configure test: HAVE_BIO_ENDIO_INT - not present
performing configure test: HAVE_BIOSET_CREATE_3 - not present
performing configure test: HAVE_BIO_LIST - present
performing configure test: HAVE_BIOSET_INIT - present
performing configure test: HAVE_BIOSET_NEED_BVECS_FLAG - present
performing configure test: HAVE_BLK_ALLOC_QUEUE_MK_REQ_FN_NODE_ID - not present
performing configure test: HAVE_BLK_ALLOC_QUEUE_GFP_T - present
performing configure test: HAVE_BLKDEV_GET_BY_PATH - present
performing configure test: HAVE_BLKDEV_PUT_1 - not present
performing configure test: HAVE_BLK_SET_DEFAULT_LIMITS - present
performing configure test: HAVE_BLK_SET_STACKING_LIMITS - present
performing configure test: HAVE_BLK_STATUS_T - present
performing configure test: HAVE_BVEC_MERGE_DATA - not present
performing configure test: HAVE_BVEC_ITER - present
performing configure test: HAVE_COMPOUND_HEAD - present
performing configure test: HAVE___DENTRY_PATH - not present
performing configure test: HAVE_DENTRY_PATH_RAW - present
performing configure test: HAVE_D_UNLINKED - present
performing configure test: HAVE_ENUM_REQ_OP - not present
performing configure test: HAVE_ENUM_REQ_OPF - present
performing configure test: HAVE_FILE_INODE - present
performing configure test: HAVE_FMODE_T - present
performing configure test: HAVE_FOPS_FALLOCATE - present
performing configure test: HAVE_GENHD_FL_NO_PART_SCAN - present
performing configure test: HAVE_IOPS_FALLOCATE - not present
performing configure test: HAVE_INODE_LOCK - present
performing configure test: HAVE_KERNEL_READ_PPOS - present
performing configure test: HAVE_KERNEL_WRITE_PPOS - present
performing configure test: HAVE_MAKE_REQUEST_FN_INT - not present
performing configure test: HAVE_KERN_PATH - present
performing configure test: HAVE_MAKE_REQUEST_FN_VOID - not present
performing configure test: HAVE_MERGE_BVEC_FN - not present
performing configure test: HAVE_NOTIFY_CHANGE_2 - not present
performing configure test: HAVE_MNT_WANT_WRITE - present
performing configure test: HAVE_NOOP_LLSEEK - present
performing configure test: HAVE_PART_NR_SECTS_READ - not present
performing configure test: HAVE_PROC_CREATE_FN_FILE_OPERATIONS - not present
performing configure test: HAVE_PATH_PUT - present
performing configure test: HAVE_PROC_CREATE_FN_PROC_OPS - present
performing configure test: HAVE_SB_START_WRITE - present
performing configure test: HAVE_SUBMIT_BIO_WAIT - not present
performing configure test: HAVE_STRUCT_PATH - present
performing configure test: HAVE_SUBMIT_BIO_1 - present
performing configure test: HAVE_SYS_OLDUMOUNT - not present
performing configure test: HAVE_TASK_STRUCT_TASK_WORKS_HLIST - not present
performing configure test: HAVE_THAW_BDEV_INT - not present
performing configure test: HAVE_TASK_STRUCT_TASK_WORKS_CB_HEAD - present
performing configure test: HAVE_UAPI_MOUNT_H - present
performing configure test: HAVE_USER_PATH_AT - present
performing configure test: HAVE_UUID_H - present
performing configure test: HAVE_VFS_FALLOCATE - present
performing configure test: HAVE_VFS_UNLINK_2 - not present
performing configure test: HAVE_VZALLOC - present
make[2]: Entering directory '/home/elastio/elastio-snap/src/configure-tests/feature-tests'
make[3]: Entering directory '/usr/src/kernels/5.9.1-36.vanilla.1.fc31.x86_64'
make[3]: Leaving directory '/usr/src/kernels/5.9.1-36.vanilla.1.fc31.x86_64'
make[2]: Leaving directory '/home/elastio/elastio-snap/src/configure-tests/feature-tests'
performing sys_mount lookup
grep: /lib/modules/5.9.1-36.vanilla.1.fc31.x86_64/System.map: Permission denied
performing sys_umount lookup
grep: /lib/modules/5.9.1-36.vanilla.1.fc31.x86_64/System.map: Permission denied
performing sys_oldumount lookup
grep: /lib/modules/5.9.1-36.vanilla.1.fc31.x86_64/System.map: Permission denied
performing sys_call_table lookup
grep: /lib/modules/5.9.1-36.vanilla.1.fc31.x86_64/System.map: Permission denied
performing printk lookup
grep: /lib/modules/5.9.1-36.vanilla.1.fc31.x86_64/System.map: Permission denied
make -C /lib/modules/5.9.1-36.vanilla.1.fc31.x86_64/build M=/home/elastio/elastio-snap/src modules
make[2]: Entering directory '/usr/src/kernels/5.9.1-36.vanilla.1.fc31.x86_64'
CC [M] /home/elastio/elastio-snap/src/elastio-snap.o
/home/elastio/elastio-snap/src/elastio-snap.c:546:41: error: unknown type name ‘make_request_fn’
546 | static inline int elastio_snap_call_mrf(make_request_fn *fn, struct request_queue *q, struct bio *bio){
| ^~~~~~~~~~~~~~~
/home/elastio/elastio-snap/src/elastio-snap.c:848:2: error: unknown type name ‘make_request_fn’
848 | make_request_fn *sd_orig_mrf; //block device's original make request function
| ^~~~~~~~~~~~~~~
/home/elastio/elastio-snap/src/elastio-snap.c: In function ‘snap_mrf_thread’:
/home/elastio/elastio-snap/src/elastio-snap.c:2749:9: error: implicit declaration of function ‘elastio_snap_call_mrf’; did you mean ‘elastio_snap_get_mnt’? [-Werror=implicit-function-declaration]
2749 | ret = elastio_snap_call_mrf(dev->sd_orig_mrf, elastio_snap_bio_get_queue(bio), bio);
| ^~~~~~~~~~~~~~~~~~~~~
| elastio_snap_get_mnt
/home/elastio/elastio-snap/src/elastio-snap.c: In function ‘tracing_mrf’:
/home/elastio/elastio-snap/src/elastio-snap.c:3091:2: error: unknown type name ‘make_request_fn’
3091 | make_request_fn *orig_mrf = NULL;
| ^~~~~~~~~~~~~~~
/home/elastio/elastio-snap/src/elastio-snap.c: At top level:
/home/elastio/elastio-snap/src/elastio-snap.c:3177:53: error: unknown type name ‘make_request_fn’
3177 | static int find_orig_mrf(struct block_device *bdev, make_request_fn **mrf){
| ^~~~~~~~~~~~~~~
/home/elastio/elastio-snap/src/elastio-snap.c: In function ‘__tracer_should_reset_mrf’:
/home/elastio/elastio-snap/src/elastio-snap.c:3210:6: error: ‘struct request_queue’ has no member named ‘make_request_fn’
3210 | if(q->make_request_fn != tracing_mrf) return 0;
| ^~
/home/elastio/elastio-snap/src/elastio-snap.c: At top level:
/home/elastio/elastio-snap/src/elastio-snap.c:3224:92: error: unknown type name ‘make_request_fn’
3224 | static int __tracer_transition_tracing(struct snap_device *dev, struct block_device *bdev, make_request_fn *new_mrf, struct snap_device **dev_ptr){
| ^~~~~~~~~~~~~~~
/home/elastio/elastio-snap/src/elastio-snap.c: In function ‘__tracer_setup_snap’:
/home/elastio/elastio-snap/src/elastio-snap.c:3607:2: error: implicit declaration of function ‘blk_queue_make_request’; did you mean ‘blk_queue_max_segments’? [-Werror=implicit-function-declaration]
3607 | blk_queue_make_request(dev->sd_queue, snap_mrf);
| ^~~~~~~~~~~~~~~~~~~~~~
| blk_queue_max_segments
/home/elastio/elastio-snap/src/elastio-snap.c: In function ‘__tracer_destroy_tracing’:
/home/elastio/elastio-snap/src/elastio-snap.c:3734:38: error: implicit declaration of function ‘__tracer_transition_tracing’; did you mean ‘__tracer_destroy_tracing’? [-Werror=implicit-function-declaration]
3734 | if(__tracer_should_reset_mrf(dev)) __tracer_transition_tracing(NULL, dev->sd_base_dev, dev->sd_orig_mrf, &snap_devices[dev->sd_minor]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
| __tracer_destroy_tracing
/home/elastio/elastio-snap/src/elastio-snap.c: In function ‘__tracer_setup_tracing’:
/home/elastio/elastio-snap/src/elastio-snap.c:3766:8: error: implicit declaration of function ‘find_orig_mrf’ [-Werror=implicit-function-declaration]
3766 | ret = find_orig_mrf(dev->sd_base_dev, &dev->sd_orig_mrf);
| ^~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:283: /home/elastio/elastio-snap/src/elastio-snap.o] Error 1
make[2]: *** [Makefile:1784: /home/elastio/elastio-snap/src] Error 2
make[2]: Leaving directory '/usr/src/kernels/5.9.1-36.vanilla.1.fc31.x86_64'
make[1]: *** [Makefile:14: default] Error 2
make[1]: Leaving directory '/home/elastio/elastio-snap/src'
make: *** [Makefile:24: driver] Error 2
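The build failures on 5.8 and 5.9 come from two successive changes to how a bio-based driver registers its request handler. The sketch below summarizes them as a lookup on the upstream kernel version. The 5.9 boundary is from the commit referenced in this issue and the 5.8 signature is from the Fedora 32 build log; the 5.7 boundary is stated from memory and should be treated as an assumption. The enum and function names are hypothetical. Distribution kernels backport block-layer changes, so a real driver is safer relying on compile-time feature tests, as this project's configure tests do, than on version numbers.

```c
#include <assert.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

enum bio_hook_api {
    /* Older kernels: blk_alloc_queue(gfp_t), then blk_queue_make_request(). */
    HOOK_BLK_QUEUE_MAKE_REQUEST,
    /* 5.7 (assumed) and 5.8: blk_alloc_queue(make_request_fn, node_id). */
    HOOK_BLK_ALLOC_QUEUE_MRF,
    /* 5.9 and later: submit_bio in struct block_device_operations. */
    HOOK_BDOPS_SUBMIT_BIO
};

/* Pick the registration style for an upstream kernel major.minor. */
enum bio_hook_api bio_hook_api_for(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor < 256);
    int v = KVER(major, minor, 0);

    if (v >= KVER(5, 9, 0))
        return HOOK_BDOPS_SUBMIT_BIO;
    if (v >= KVER(5, 7, 0))
        return HOOK_BLK_ALLOC_QUEUE_MRF;
    return HOOK_BLK_QUEUE_MAKE_REQUEST;
}
```

Under these assumptions, the Fedora 32 kernel 5.8.9 falls in the middle group and the vanilla 5.9.1 kernel in the last one, which matches the two different sets of compiler errors reported.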
It's used in Fedora 34.
The centos_packaging pipeline now has 3 almost identical build steps for CentOS 6, 7 and 8.
They differ only in the installation of some packages (6 versus 7/8) and in the Docker images used, which is why the build steps are duplicated.
This copy-paste can be avoided by implementing a Starlark script instead of the YAML config file.
Taking a snapshot fails if there is no free space for the .cow file. We have a requirement to keep the .cow file on the device for which we are taking a snapshot. Is it possible to place and use the .cow file on another device? It would be useful in cases where there is no free space left on the device.
This functionality was requested here:
https://elastio.slack.com/archives/C01KP7YSG13/p1633876963237800
cc @e-kov
Fedora 33 is released.
Time to build packages for that version.
On a fresh Amazon Linux 2 EC2 instance, I follow the instructions in INSTALL.md, installing the package repo and then the dkms-elastio-snap package. I expect that this means I can now take snapshots of local disks, but this isn't the case:
$ sudo target/release/elastio block backup --scalez-stor-url http://s0test.elastio.dev:61234 /dev/nvme1n1p1 -c 8 -h 8 -s 32 -t cbt --catalog-service local
Oct 19 07:12:40.062 INFO console: Backing up 1 block device(s) to http://s0test.elastio.dev:61234/
Oct 19 07:12:40.144 ERROR console: The Elastio change block tracking driver is not available or is not supported on this system
[ec2-user@ip-172-31-13-79 elastio]$ sudo insmod elastio-snap
insmod: ERROR: could not load module elastio-snap: No such file or directory
Looking more closely at the installer output from when we installed the dkms-elastio-snap package:
Running transaction
Installing : yum-plugin-dkms-build-requires-1.0-2.amzn2.noarch 1/5
Installing : dkms-2.6.1-1.amzn2.0.1.noarch 2/5
Installing : dkms-elastio-snap-0.10.13-1.amzn2.noarch 3/5
Loading new elastio-snap-0.10.13 DKMS files...
Building for 4.14.193-149.317.amzn2.x86_64
Module build for kernel 4.14.193-149.317.amzn2.x86_64 was skipped since the
kernel headers for this kernel does not seem to be installed.
Installing : libelastio-snap-0.10.13-1.amzn2.x86_64 4/5
Installing : elastio-snap-utils-0.10.13-1.amzn2.x86_64 5/5
Configuring dracut, please wait...
Verifying : dkms-2.6.1-1.amzn2.0.1.noarch 1/5
Verifying : elastio-snap-utils-0.10.13-1.amzn2.x86_64 2/5
Verifying : libelastio-snap-0.10.13-1.amzn2.x86_64 3/5
Verifying : dkms-elastio-snap-0.10.13-1.amzn2.noarch 4/5
Verifying : yum-plugin-dkms-build-requires-1.0-2.amzn2.noarch 5/5
Installed:
dkms-elastio-snap.noarch 0:0.10.13-1.amzn2 elastio-snap-utils.x86_64 0:0.10.13-1.amzn2
Dependency Installed:
dkms.noarch 0:2.6.1-1.amzn2.0.1 libelastio-snap.x86_64 0:0.10.13-1.amzn2 yum-plugin-dkms-build-requires.noarch 0:1.0-2.amzn2
Complete!
Note this line in particular:
Module build for kernel 4.14.193-149.317.amzn2.x86_64 was skipped since the
kernel headers for this kernel does not seem to be installed.
That's annoying. Why wasn't this one of the package's dependencies? Fine I'll do it myself.
$ sudo yum install kernel-headers
Loaded plugins: dkms-build-requires, extras_suggestions, langpacks, priorities, update-motd
195 packages excluded due to repository priority protections
Package kernel-headers-4.14.198-152.320.amzn2.x86_64 already installed and latest version
Nothing to do
What fresh hell is this??
$ sudo yum install kernel-devel-$(uname -r)
Aha that fixed the problem.
This package should have been a dependency then.
I don't know if this impacts other RHEL-derived distros but it definitely impacts Amazon Linux 2.
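For context: kernel-headers contains the user-space API headers, while an out-of-tree module build is pointed at /lib/modules/<release>/build, which on RPM-based systems is provided by the kernel-devel package for that exact release. Note also that the installed kernel-headers (4.14.198) did not match the running kernel (4.14.193). The sketch below shows a simple pre-flight check for the build tree. The helper names are hypothetical, and the exact test that DKMS itself performs is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/utsname.h>

/* Build the path that `make -C` is pointed at for kernel `release`.
 * Returns 0 on success, -1 if the buffer is too small. */
int kernel_build_dir(char *buf, size_t len, const char *release)
{
    assert(buf != NULL && release != NULL);
    int n = snprintf(buf, len, "/lib/modules/%s/build", release);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Returns 1 if the build tree for the running kernel is present, 0 if it
 * is missing or is a dangling symlink, -1 on error. */
int running_kernel_has_build_dir(void)
{
    struct utsname u;
    struct stat st;
    char path[512];

    if (uname(&u) != 0 || kernel_build_dir(path, sizeof path, u.release) != 0)
        return -1;
    /* stat() follows the symlink, so a dangling link counts as missing. */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

On the instance described above, running_kernel_has_build_dir() would be expected to return 0 until kernel-devel-$(uname -r) is installed.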
In a Debian 11 LVM environment, the system becomes unstable when a snapshot is taken.
A call trace is logged at the time the snapshot is taken.
This issue does not occur in a Debian 10 LVM environment or in a basic (non-LVM) Debian 11 environment.
The command that was executed:
# elioctl setup-snapshot /dev/mapper/debian11lvm--vg-root /.elastio 0
LVM Environment:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 512M 0 part /boot/efi
├─sda2 8:2 0 488M 0 part /boot
└─sda3 8:3 0 39G 0 part
├─debian11lvm--vg-root 254:0 0 7.6G 0 lvm /
├─debian11lvm--vg-var 254:1 0 2.8G 0 lvm /var
├─debian11lvm--vg-swap_1 254:2 0 976M 0 lvm [SWAP]
├─debian11lvm--vg-tmp 254:3 0 568M 0 lvm /tmp
└─debian11lvm--vg-home 254:4 0 27.1G 0 lvm /home
sr0 11:0 1 1024M 0 rom
elastio-snap0 253:0 0 7.6G 1 disk
Sep 16 13:05:26 debian11lvm kernel: [ 235.727134] elastio-snap: error finding original_mrf for the traced bio
Sep 16 13:05:26 debian11lvm kernel: [ 235.727158] BUG: kernel NULL pointer dereference, address: 00000000000000a8
Sep 16 13:05:26 debian11lvm kernel: [ 235.727163] #PF: supervisor read access in kernel mode
Sep 16 13:05:26 debian11lvm kernel: [ 235.727165] #PF: error_code(0x0000) - not-present page
Sep 16 13:05:26 debian11lvm kernel: [ 235.727167] PGD 8000000007393067 P4D 8000000007393067 PUD 7394067 PMD 0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727180] Oops: 0000 [#1] SMP PTI
Sep 16 13:05:26 debian11lvm kernel: [ 235.727184] CPU: 0 PID: 256 Comm: kworker/u4:30 Tainted: G OE 5.10.0-8-amd64 #1 Debian 5.10.46-4
Sep 16 13:05:26 debian11lvm kernel: [ 235.727191] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.13989454.B64.1906190538 06/19/2019
Sep 16 13:05:26 debian11lvm kernel: [ 235.727204] Workqueue: writeback wb_workfn (flush-254:1)
Sep 16 13:05:26 debian11lvm kernel: [ 235.727212] RIP: 0010:__blk_mq_sched_bio_merge+0xd3/0x100
Sep 16 13:05:26 debian11lvm kernel: [ 235.727216] Code: 74 05 48 83 45 78 01 48 89 ef c6 07 00 0f 1f 40 00 5d 44 89 c0 41 5c 41 5d 41 5e 41 5f c3 31 c0 84 d2 0f 94 c0 48 8b 44 c5 50 80 a8 00 00 00 01 75 93 eb 06 4c 3b 78 10 75 a7 45 31 c0 5d 41
Sep 16 13:05:26 debian11lvm kernel: [ 235.727218] RSP: 0018:ffffa78240a3f738 EFLAGS: 00010202
Sep 16 13:05:26 debian11lvm kernel: [ 235.727221] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa78240a3f778
Sep 16 13:05:26 debian11lvm kernel: [ 235.727223] RDX: 0000000000000001 RSI: ffff8dc4c97ae540 RDI: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.727225] RBP: ffff8dc53dc00000 R08: 0000000000000001 R09: 0000000000001000
Sep 16 13:05:26 debian11lvm kernel: [ 235.727227] R10: ffff8dc4f2932788 R11: ffffffff912cb3e8 R12: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.727228] R13: ffff8dc4c97ae540 R14: 0000000000000001 R15: 0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.727231] FS: 0000000000000000(0000) GS:ffff8dc53dc00000(0000) knlGS:0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.727233] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 13:05:26 debian11lvm kernel: [ 235.727235] CR2: 00000000000000a8 CR3: 0000000002750003 CR4: 00000000001706f0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727306] Call Trace:
Sep 16 13:05:26 debian11lvm kernel: [ 235.727315] blk_mq_submit_bio+0xd9/0x520
Sep 16 13:05:26 debian11lvm kernel: [ 235.727331] tracing_mrf.cold+0x95/0x1a4 [elastio_snap]
Sep 16 13:05:26 debian11lvm kernel: [ 235.727337] ? submit_bio_checks+0x1be/0x5a0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727341] ? __mod_memcg_lruvec_state+0x21/0xe0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727345] submit_bio_noacct+0xf8/0x420
Sep 16 13:05:26 debian11lvm kernel: [ 235.727381] ext4_bio_write_page+0x30c/0x580 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.727403] mpage_submit_page+0x4b/0x80 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.727424] mpage_process_page_bufs+0x112/0x120 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.727444] mpage_prepare_extent_to_map+0x1c4/0x290 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.727466] ext4_writepages+0x210/0xfc0 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.727488] ? ext4_writepages+0x57/0xfc0 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.727492] ? __find_get_block+0xb6/0x2c0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727497] ? update_sd_lb_stats.constprop.0+0x814/0x8a0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727502] do_writepages+0x34/0xc0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727507] ? fprop_reflect_period_percpu.isra.0+0x7b/0xc0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727511] __writeback_single_inode+0x39/0x2a0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727521] writeback_sb_inodes+0x200/0x470
Sep 16 13:05:26 debian11lvm kernel: [ 235.727527] __writeback_inodes_wb+0x4c/0xe0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727530] wb_writeback+0x1d8/0x290
Sep 16 13:05:26 debian11lvm kernel: [ 235.727534] wb_workfn+0x292/0x4d0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727538] ? check_preempt_curr+0x4f/0x60
Sep 16 13:05:26 debian11lvm kernel: [ 235.727541] ? ttwu_do_wakeup+0x17/0x130
Sep 16 13:05:26 debian11lvm kernel: [ 235.727546] process_one_work+0x1b6/0x350
Sep 16 13:05:26 debian11lvm kernel: [ 235.727551] worker_thread+0x53/0x3e0
Sep 16 13:05:26 debian11lvm kernel: [ 235.727554] ? process_one_work+0x350/0x350
Sep 16 13:05:26 debian11lvm kernel: [ 235.727557] kthread+0x11b/0x140
Sep 16 13:05:26 debian11lvm kernel: [ 235.727560] ? __kthread_bind_mask+0x60/0x60
Sep 16 13:05:26 debian11lvm kernel: [ 235.727565] ret_from_fork+0x22/0x30
Sep 16 13:05:26 debian11lvm kernel: [ 235.727573] Modules linked in: rfkill nft_counter xt_tcpudp nft_compat nf_tables libcrc32c nfnetlink nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd glue_helper rapl vmw_balloon joydev serio_raw efi_pstore pcspkr sg vmw_vmci ac evdev msr elastio_snap(OE) fuse configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod hid_generic usbhid hid sr_mod cdrom ata_generic vmwgfx sd_mod t10_pi crc_t10dif crct10dif_generic ttm crct10dif_pclmul crct10dif_common crc32_pclmul drm_kms_helper psmouse crc32c_intel cec ehci_pci ahci libahci ata_piix vmxnet3 uhci_hcd drm ehci_hcd usbcore usb_common vmw_pvscsi libata scsi_mod i2c_piix4 button
Sep 16 13:05:26 debian11lvm kernel: [ 235.727686] CR2: 00000000000000a8
Sep 16 13:05:26 debian11lvm kernel: [ 235.727690] ---[ end trace 1a58c68817fcdd96 ]---
Sep 16 13:05:26 debian11lvm kernel: [ 235.729059] elastio-snap: error finding original_mrf for the traced bio
Sep 16 13:05:26 debian11lvm kernel: [ 235.729064] BUG: kernel NULL pointer dereference, address: 00000000000000a8
Sep 16 13:05:26 debian11lvm kernel: [ 235.729065] #PF: supervisor read access in kernel mode
Sep 16 13:05:26 debian11lvm kernel: [ 235.729066] #PF: error_code(0x0000) - not-present page
Sep 16 13:05:26 debian11lvm kernel: [ 235.729066] PGD 8000000007393067 P4D 8000000007393067 PUD 7394067 PMD 0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729070] Oops: 0000 [#2] SMP PTI
Sep 16 13:05:26 debian11lvm kernel: [ 235.729071] CPU: 1 PID: 257 Comm: kworker/u4:31 Tainted: G D OE 5.10.0-8-amd64 #1 Debian 5.10.46-4
Sep 16 13:05:26 debian11lvm kernel: [ 235.729072] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.13989454.B64.1906190538 06/19/2019
Sep 16 13:05:26 debian11lvm kernel: [ 235.729074] Workqueue: writeback wb_workfn (flush-254:1)
Sep 16 13:05:26 debian11lvm kernel: [ 235.729077] RIP: 0010:__blk_mq_sched_bio_merge+0xd3/0x100
Sep 16 13:05:26 debian11lvm kernel: [ 235.729079] Code: 74 05 48 83 45 78 01 48 89 ef c6 07 00 0f 1f 40 00 5d 44 89 c0 41 5c 41 5d 41 5e 41 5f c3 31 c0 84 d2 0f 94 c0 48 8b 44 c5 50 80 a8 00 00 00 01 75 93 eb 06 4c 3b 78 10 75 a7 45 31 c0 5d 41
Sep 16 13:05:26 debian11lvm kernel: [ 235.729080] RSP: 0018:ffffa78240a47888 EFLAGS: 00010202
Sep 16 13:05:26 debian11lvm kernel: [ 235.729081] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa78240a478c8
Sep 16 13:05:26 debian11lvm kernel: [ 235.729082] RDX: 0000000000000001 RSI: ffff8dc4ceb2d840 RDI: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.729082] RBP: ffff8dc53dd00000 R08: 0000000000000001 R09: 0000000000001000
Sep 16 13:05:26 debian11lvm kernel: [ 235.729083] R10: ffff8dc4f2932788 R11: ffffffff912cb3e8 R12: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.729084] R13: ffff8dc4ceb2d840 R14: 0000000000000001 R15: 0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.729085] FS: 0000000000000000(0000) GS:ffff8dc53dd00000(0000) knlGS:0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.729086] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 13:05:26 debian11lvm kernel: [ 235.729087] CR2: 00000000000000a8 CR3: 0000000002750002 CR4: 00000000001706e0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729115] Call Trace:
Sep 16 13:05:26 debian11lvm kernel: [ 235.729118] blk_mq_submit_bio+0xd9/0x520
Sep 16 13:05:26 debian11lvm kernel: [ 235.729121] tracing_mrf.cold+0x95/0x1a4 [elastio_snap]
Sep 16 13:05:26 debian11lvm kernel: [ 235.729122] ? submit_bio_checks+0x1be/0x5a0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729124] submit_bio_noacct+0xf8/0x420
Sep 16 13:05:26 debian11lvm kernel: [ 235.729135] ext4_io_submit+0x49/0x60 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.729145] ext4_writepages+0x22e/0xfc0 [ext4]
Sep 16 13:05:26 debian11lvm kernel: [ 235.729148] ? __switch_to+0x114/0x460
Sep 16 13:05:26 debian11lvm kernel: [ 235.729151] ? out_of_line_wait_on_bit_lock+0xb0/0xb0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729152] ? update_group_capacity+0x25/0x1d0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729153] ? update_sd_lb_stats.constprop.0+0x816/0x8a0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729156] do_writepages+0x34/0xc0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729157] ? fprop_reflect_period_percpu.isra.0+0x7b/0xc0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729159] __writeback_single_inode+0x39/0x2a0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729160] writeback_sb_inodes+0x200/0x470
Sep 16 13:05:26 debian11lvm kernel: [ 235.729162] __writeback_inodes_wb+0x4c/0xe0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729164] wb_writeback+0x1d8/0x290
Sep 16 13:05:26 debian11lvm kernel: [ 235.729165] wb_workfn+0x292/0x4d0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729167] ? __switch_to_asm+0x42/0x70
Sep 16 13:05:26 debian11lvm kernel: [ 235.729169] process_one_work+0x1b6/0x350
Sep 16 13:05:26 debian11lvm kernel: [ 235.729170] worker_thread+0x53/0x3e0
Sep 16 13:05:26 debian11lvm kernel: [ 235.729172] ? process_one_work+0x350/0x350
Sep 16 13:05:26 debian11lvm kernel: [ 235.729173] kthread+0x11b/0x140
Sep 16 13:05:26 debian11lvm kernel: [ 235.729174] ? __kthread_bind_mask+0x60/0x60
Sep 16 13:05:26 debian11lvm kernel: [ 235.729176] ret_from_fork+0x22/0x30
Sep 16 13:05:26 debian11lvm kernel: [ 235.729177] Modules linked in: rfkill nft_counter xt_tcpudp nft_compat nf_tables libcrc32c nfnetlink nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd glue_helper rapl vmw_balloon joydev serio_raw efi_pstore pcspkr sg vmw_vmci ac evdev msr elastio_snap(OE) fuse configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod hid_generic usbhid hid sr_mod cdrom ata_generic vmwgfx sd_mod t10_pi crc_t10dif crct10dif_generic ttm crct10dif_pclmul crct10dif_common crc32_pclmul drm_kms_helper psmouse crc32c_intel cec ehci_pci ahci libahci ata_piix vmxnet3 uhci_hcd drm ehci_hcd usbcore usb_common vmw_pvscsi libata scsi_mod i2c_piix4 button
Sep 16 13:05:26 debian11lvm kernel: [ 235.729207] CR2: 00000000000000a8
Sep 16 13:05:26 debian11lvm kernel: [ 235.729208] ---[ end trace 1a58c68817fcdd97 ]---
Sep 16 13:05:26 debian11lvm kernel: [ 235.773383] RIP: 0010:__blk_mq_sched_bio_merge+0xd3/0x100
Sep 16 13:05:26 debian11lvm kernel: [ 235.773386] Code: 74 05 48 83 45 78 01 48 89 ef c6 07 00 0f 1f 40 00 5d 44 89 c0 41 5c 41 5d 41 5e 41 5f c3 31 c0 84 d2 0f 94 c0 48 8b 44 c5 50 80 a8 00 00 00 01 75 93 eb 06 4c 3b 78 10 75 a7 45 31 c0 5d 41
Sep 16 13:05:26 debian11lvm kernel: [ 235.773387] RSP: 0018:ffffa78240a3f738 EFLAGS: 00010202
Sep 16 13:05:26 debian11lvm kernel: [ 235.773389] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa78240a3f778
Sep 16 13:05:26 debian11lvm kernel: [ 235.773390] RDX: 0000000000000001 RSI: ffff8dc4c97ae540 RDI: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.773391] RBP: ffff8dc53dc00000 R08: 0000000000000001 R09: 0000000000001000
Sep 16 13:05:26 debian11lvm kernel: [ 235.773391] R10: ffff8dc4f2932788 R11: ffffffff912cb3e8 R12: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.773392] R13: ffff8dc4c97ae540 R14: 0000000000000001 R15: 0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.773393] FS: 0000000000000000(0000) GS:ffff8dc53dc00000(0000) knlGS:0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.773394] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 13:05:26 debian11lvm kernel: [ 235.773397] RIP: 0010:__blk_mq_sched_bio_merge+0xd3/0x100
Sep 16 13:05:26 debian11lvm kernel: [ 235.773399] Code: 74 05 48 83 45 78 01 48 89 ef c6 07 00 0f 1f 40 00 5d 44 89 c0 41 5c 41 5d 41 5e 41 5f c3 31 c0 84 d2 0f 94 c0 48 8b 44 c5 50 80 a8 00 00 00 01 75 93 eb 06 4c 3b 78 10 75 a7 45 31 c0 5d 41
Sep 16 13:05:26 debian11lvm kernel: [ 235.773400] RSP: 0018:ffffa78240a3f738 EFLAGS: 00010202
Sep 16 13:05:26 debian11lvm kernel: [ 235.773411] CR2: 00000000000000a8 CR3: 0000000002750003 CR4: 00000000001706f0
Sep 16 13:05:26 debian11lvm kernel: [ 235.773412] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa78240a3f778
Sep 16 13:05:26 debian11lvm kernel: [ 235.773413] RDX: 0000000000000001 RSI: ffff8dc4c97ae540 RDI: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.773414] RBP: ffff8dc53dc00000 R08: 0000000000000001 R09: 0000000000001000
Sep 16 13:05:26 debian11lvm kernel: [ 235.773415] R10: ffff8dc4f2932788 R11: ffffffff912cb3e8 R12: ffff8dc4f2932788
Sep 16 13:05:26 debian11lvm kernel: [ 235.773416] R13: ffff8dc4c97ae540 R14: 0000000000000001 R15: 0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.773417] FS: 0000000000000000(0000) GS:ffff8dc53dd00000(0000) knlGS:0000000000000000
Sep 16 13:05:26 debian11lvm kernel: [ 235.773418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 13:05:26 debian11lvm kernel: [ 235.773419] CR2: 00007f98f6f0f990 CR3: 0000000002750002 CR4: 00000000001706e0
kern.log
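The "#PF" lines in the oops above can be read mechanically. The following is a minimal sketch of decoding an x86 page-fault error code, based on the architectural bit layout rather than anything specific to elastio-snap; the function name decode_pf_error_code is a hypothetical helper introduced here for illustration.

```python
# Bit layout of the x86 page-fault error code (architectural definition).
# Each entry: (mask, meaning when the bit is set, meaning when it is clear).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return human-readable descriptions for an x86 #PF error code."""
    parts = []
    for mask, when_set, when_clear in PF_BITS:
        if code & mask:
            parts.append(when_set)
        elif when_clear is not None:
            parts.append(when_clear)
    return parts


# error_code(0x0000) from the log: a supervisor-mode read of a not-present page.
print(decode_pf_error_code(0x0000))
```

For 0x0000 this yields a supervisor-mode read of a not-present page, which matches the two "#PF" lines in the log. The faulting address 00000000000000a8 is a small offset from zero, which is consistent with the "NULL pointer dereference" message.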
The scenario is simple: the elastio_snap kernel module is not loaded after the elastio-snap-dkms package is installed.
Say I'm installing the elastio-snap-utils package. It installs elastio-snap-dkms as a dependency. The dkms and linux-headers packages are installed. The module is built but not loaded, so it's necessary to run modprobe elastio-snap manually after the installation to load it.
Here is an example from Ubuntu 20.04. The same issue is observed on Debian 8, 9, 10.
root@strontium:~# apt install elastio-snap-utils
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
dkms elastio-snap-dkms libelastio-snap1
Suggested packages:
menu
The following NEW packages will be installed:
dkms elastio-snap-dkms elastio-snap-utils libelastio-snap1
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 82,6 kB/149 kB of archives.
After this operation, 768 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.assur.io/master/linux/deb/Debian/10 unstable/main all elastio-snap-dkms all 0.10.14-1debian10 [41,6 kB]
Get:2 http://repo.assur.io/master/linux/deb/Debian/10 unstable/main amd64 libelastio-snap1 amd64 0.10.14-1debian10 [10,9 kB]
Get:3 http://repo.assur.io/master/linux/deb/Debian/10 unstable/main amd64 elastio-snap-utils amd64 0.10.14-1debian10 [30,1 kB]
Fetched 82,6 kB in 6s (13,7 kB/s)
Selecting previously unselected package dkms.
(Reading database ... 269575 files and directories currently installed.)
Preparing to unpack .../dkms_2.8.1-5ubuntu1_all.deb ...
Unpacking dkms (2.8.1-5ubuntu1) ...
Setting up dkms (2.8.1-5ubuntu1) ...
Selecting previously unselected package elastio-snap-dkms.
(Reading database ... 269603 files and directories currently installed.)
Preparing to unpack .../elastio-snap-dkms_0.10.14-1debian10_all.deb ...
Unpacking elastio-snap-dkms (0.10.14-1debian10) ...
Selecting previously unselected package libelastio-snap1.
Preparing to unpack .../libelastio-snap1_0.10.14-1debian10_amd64.deb ...
Unpacking libelastio-snap1 (0.10.14-1debian10) ...
Selecting previously unselected package elastio-snap-utils.
Preparing to unpack .../elastio-snap-utils_0.10.14-1debian10_amd64.deb ...
Unpacking elastio-snap-utils (0.10.14-1debian10) ...
Setting up elastio-snap-dkms (0.10.14-1debian10) ...
Loading new elastio-snap-0.10.14 DKMS files...
Building for 5.4.0-56-generic 5.4.0-58-generic
Building initial module for 5.4.0-56-generic
Secure Boot not enabled on this system.
Done.
elastio-snap.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-56-generic/updates/dkms/
depmod...
DKMS: install completed.
Building initial module for 5.4.0-58-generic
Secure Boot not enabled on this system.
Done.
elastio-snap.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-58-generic/updates/dkms/
depmod...
DKMS: install completed.
Setting up libelastio-snap1 (0.10.14-1debian10) ...
Setting up elastio-snap-utils (0.10.14-1debian10) ...
Configuring initramfs, please wait...
update-initramfs: deferring update (trigger activated)
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.1) ...
Processing triggers for initramfs-tools (0.136ubuntu6.3) ...
update-initramfs: Generating /boot/initrd.img-5.4.0-58-generic
I: The initramfs will attempt to resume from /dev/nvme0n1p3
I: (UUID=b5415352-0b23-4ed5-be7c-5d3c632636fa)
I: Set the RESUME variable to override this.
root@strontium:~#
root@strontium:~# lsmod | grep elastio_snap
root@strontium:~# modprobe elastio-snap
root@strontium:~# lsmod | grep elastio_snap
elastio_snap 53248 0
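The check done by hand above (lsmod | grep elastio_snap) can be scripted. This is a minimal sketch, assuming the listing has the same column layout as /proc/modules or lsmod output; is_module_loaded and normalize are hypothetical helper names, not part of elastio-snap. Note that the kernel reports the module as elastio_snap (underscore) while modprobe is given elastio-snap (hyphen), so the name is normalized before comparison.

```python
def normalize(name):
    # The kernel lists module names with underscores, while modprobe
    # accepts hyphens and underscores interchangeably.
    return name.replace("-", "_")


def is_module_loaded(name, modules_listing):
    """Return True if `name` appears as the first column of the listing."""
    wanted = normalize(name)
    for line in modules_listing.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


# Sample listing in the shape of the lsmod output shown above.
sample_listing = "Module Size Used by\nelastio_snap 53248 0\next4 774144 2\n"

print(is_module_loaded("elastio-snap", sample_listing))
```

On a real system the listing could be read from /proc/modules, and a package post-install step could run modprobe only when this check returns False.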
My steps:
losetup
), part it, create an ext4 fs, mount it./elastio
like /dev/loop0p1
).sudo elioctl destroy 0
and this is the point where the driver gets stuck
Expected behavior: the driver should handle this case somehow, perhaps by just logging an error somewhere, but it definitely shouldn't get stuck. In my case it led to a hard reboot (pressing and holding the power button), which is very bad: it could corrupt data if some process was writing critical data to a disk at that moment.
It's possible to mount a snapshot after the fix for #59, but xfs_repair -n still complains when the backup is attached as a loop device:
[elastio@amazon2-amd64-gpt_xfs elastio-snap]$ sudo xfs_repair -n /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
- scan filesystem freespace and inode maps...
sb_fdblocks 4409714, counted 4416246
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
Ideally, all the logs should be flushed before the snapshot device is created, and xfs_repair -n shouldn't find mismatches between sb_fdblocks in the logs/superblock and the counted value.
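To track this mismatch across runs, the relevant line can be extracted from the xfs_repair -n output. The following is a rough sketch, assuming the line always has the "sb_fdblocks N, counted M" shape seen above; fdblocks_mismatch is a hypothetical helper name.

```python
import re

# Matches the mismatch line printed by xfs_repair, e.g.
# "sb_fdblocks 4409714, counted 4416246".
FDBLOCKS_RE = re.compile(r"sb_fdblocks\s+(\d+),\s+counted\s+(\d+)")


def fdblocks_mismatch(xfs_repair_output):
    """Return (recorded, counted, difference), or None if no mismatch line exists."""
    match = FDBLOCKS_RE.search(xfs_repair_output)
    if match is None:
        return None
    recorded = int(match.group(1))
    counted = int(match.group(2))
    return recorded, counted, counted - recorded


# Excerpt of the output quoted above.
sample_output = """Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
sb_fdblocks 4409714, counted 4416246
- found root inode chunk
"""

print(fdblocks_mismatch(sample_output))
```

For the output quoted above, the counted value exceeds the superblock value by 6532 blocks. A clean run, with no mismatch line, returns None.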
It is needed if the backup was aborted but the old snapshot wasn't deleted. For now, when a new block backup is run in that situation, we have to start over from the base backup.
The driver should provide functionality that allows us to cancel the old snapshot and take a new one with all changes since the last successful backup.
apt-get remove elastio-snap-dkms fails with an error and, as a result, the package remains installed.
ek@strontium:~/elastio/elastio(master)$ sudo apt-get remove elastio-snap-dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
dkms libelastio-snap1
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
elastio-snap-dkms
0 upgraded, 0 newly installed, 1 to remove and 2 not upgraded.
After this operation, 238 kB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 239920 files and directories currently installed.)
Removing elastio-snap-dkms (0.10.14-1debian10) ...
dpkg: error processing package elastio-snap-dkms (--remove):
installed elastio-snap-dkms package pre-removal script subprocess returned error exit status 1
dpkg: too many errors, stopping
Errors were encountered while processing:
elastio-snap-dkms
Processing was halted because there were too many errors.
E: Sub-process /usr/bin/dpkg returned an error code (1)
ek@strontium:~/elastio/elastio(master)$ dpkg -l | grep elastio
ii elastio-repo 0.0.2-1debian10 all Repository package for installation of Elastio software
ii elastio-snap-dkms 0.10.14-1debian10 all Kernel module source for elastio-snap managed by DKMS
rc elastio-snap-utils 0.10.14-1debian10 amd64 Utilities for using elastio-snap kernel module
ii libelastio-snap1 0.10.14-1debian10 amd64 Library for communicating with elastio-snap kernel module
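The two-letter prefix in the dpkg -l listing above encodes the desired action and the current status of each package. Here is a small sketch for decoding it, using the state letters from the dpkg -l header legend; parse_dpkg_line is a hypothetical helper name introduced for illustration.

```python
# First letter: desired action. Second letter: current status.
DESIRED = {"u": "unknown", "i": "install", "h": "hold", "r": "remove", "p": "purge"}
STATUS = {
    "n": "not-installed",
    "c": "config-files",
    "H": "half-installed",
    "U": "unpacked",
    "F": "half-configured",
    "W": "triggers-awaiting",
    "t": "triggers-pending",
    "i": "installed",
}


def parse_dpkg_line(line):
    """Parse one package row of `dpkg -l`; return None for rows that do not fit."""
    fields = line.split(None, 4)
    if len(fields) < 4 or len(fields[0]) < 2:
        return None
    flags, name, version = fields[0], fields[1], fields[2]
    return {
        "name": name,
        "version": version,
        "desired": DESIRED.get(flags[0], "?"),
        "status": STATUS.get(flags[1], "?"),
    }


row = "ii elastio-snap-dkms 0.10.14-1debian10 all Kernel module source for elastio-snap managed by DKMS"
print(parse_dpkg_line(row))
```

In the listing above, elastio-snap-dkms is still "ii" (installed) after the failed removal, while elastio-snap-utils is "rc" (removed, with configuration files left behind).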
Preconditions:
CentOS 7 or Amazon Linux 2 machine with the root volume, formatted with XFS.
Steps to reproduce:
sudo elioctl setup-snapshot /dev/vda1 /.elastio 0
sudo mount /dev/elastio-snap0 /mnt/
It's failing:
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/elastio-snap0, missing codepage or helper program, or other error.
sudo dd if=/dev/elastio-snap0 of=/home/elastio/big_vol/restore/dd_snap.img bs=1M
sudo losetup --find --show ~/big_vol/restore/dd_snap.img
/dev/loop0
sudo mount /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
sudo mount -t xfs -o ro,norecovery /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
Mount throws the same error regardless of the mount options.
sudo xfs_repair -n /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
- scan filesystem freespace and inode maps...
sb_fdblocks 4030093, counted 4036625
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 1
- agno = 3
- agno = 2
- agno = 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
It always reports a wrong sb_fdblocks count. In this particular case the xfs_repair output is fairly clean. From time to time it also reports disconnected inodes, disconnected buckets, a corrupted superblock, a missing secondary superblock, etc.
sudo xfs_repair /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
It says to mount the file system, but the mount command was failing before. I tried to mount it again, and it still fails.
Run xfs_repair -L as a last resort and then repeat xfs_repair without parameters:
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo xfs_repair -L /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
sb_fdblocks 4030093, counted 4036625
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (20:2281) is ahead of log (1:2).
Format log to cycle 23.
done
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo xfs_repair /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
The file system seems to be repaired.
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo mount -t xfs -o ro,norecovery /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo mount -t xfs /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo mount /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
Expected result:
Mount shouldn't fail on the snapshot device (step 2) or on the backup file attached as a loop device (step 5). The file system check shouldn't complain about a corrupted superblock or secondary superblock (step 6).
Right now we only have the package builds.
I have to add a simple step that builds the module without installing it. This is possible now that the builds have moved from Docker containers to VMs with real kernels.
Currently there is no support for snapping btrfs filesystems.
Example: a build of the branch bug/wrong-dev-name.
The source branch is detected with git rev-parse --abbrev-ref HEAD, and in the first build job it is bug/wrong-dev-name.
It's detected in the same way in the second build job, but there the result is already wrong: master instead of the real bug/wrong-dev-name.
As a result, the packaging build doesn't find the artifacts for the latest master branch build, because it was actually a build of the bug/wrong-dev-name branch.
It looks like something has changed in GitHub Actions and this wrong behavior appeared only recently.
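One possible way to make the detection independent of the checkout state is to prefer the environment variables that GitHub Actions sets for each run and fall back to git only when they are absent. This is a sketch under that assumption, not the fix used in the repository: detect_branch is a hypothetical helper, and the pull request ref in the usage example is made up for illustration.

```python
def detect_branch(env, git_branch=None):
    """Pick the source branch name from GitHub Actions variables.

    env        -- a mapping such as os.environ
    git_branch -- fallback, e.g. the output of `git rev-parse --abbrev-ref HEAD`
    """
    # GITHUB_HEAD_REF is only set for pull request events and holds the
    # source branch of the pull request.
    head_ref = env.get("GITHUB_HEAD_REF", "")
    if head_ref:
        return head_ref

    # For push events GITHUB_REF looks like refs/heads/<branch>. Only the
    # prefix is stripped, so branch names containing slashes survive intact.
    ref = env.get("GITHUB_REF", "")
    prefix = "refs/heads/"
    if ref.startswith(prefix):
        return ref[len(prefix):]

    return git_branch


# Hypothetical push event for the branch from the example above.
push_env = {"GITHUB_REF": "refs/heads/bug/wrong-dev-name"}
print(detect_branch(push_env, git_branch="master"))
```

Passing the mapping in as a parameter keeps the function easy to test; in a workflow step it would be called with os.environ. Whether this addresses the behavior described above depends on why the second job sees master, which the report does not establish.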
The tests are failing: https://github.com/elastio/elastio-snap/runs/1190012942?check_suite_focus=true#step:10:181
elastio-snap: 6bba25b
kernel: 4.9.0-13-amd64
gcc: 6.3.0-18+deb9u1)
bash: 4.4.12(1)-release
python: Python 3.5.3
test_destroy_active_incremental (test_destroy.TestDestroy) ... FAIL
test_destroy_active_snapshot (test_destroy.TestDestroy) ... FAIL
test_destroy_dormant_incremental (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
test_destroy_dormant_snapshot (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
test_destroy_nonexistent_device (test_destroy.TestDestroy) ... FAIL
test_destroy_unverified_incremental (test_destroy.TestDestroy) ... umount: /tmp/elastio-snap: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
ERROR
test_destroy_unverified_snapshot (test_destroy.TestDestroy) ... umount: /tmp/elastio-snap: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
umount: /tmp/elastio-snap: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
ERROR
ERROR
ERROR
ERROR
ERROR
======================================================================
ERROR: test_destroy_unverified_incremental (test_destroy.TestDestroy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/test_destroy.py", line 81, in test_destroy_unverified_incremental
util.unmount(self.mount)
File "/home/elastio/elastio-snap/tests/util.py", line 23, in unmount
subprocess.check_call(cmd, timeout=10)
File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['umount', '/tmp/elastio-snap']' returned non-zero exit status 32
======================================================================
ERROR: test_destroy_unverified_snapshot (test_destroy.TestDestroy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/test_destroy.py", line 72, in test_destroy_unverified_snapshot
util.unmount(self.mount)
File "/home/elastio/elastio-snap/tests/util.py", line 23, in unmount
subprocess.check_call(cmd, timeout=10)
File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['umount', '/tmp/elastio-snap']' returned non-zero exit status 32
======================================================================
ERROR: tearDownClass (test_destroy.TestDestroy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/devicetestcase.py", line 34, in tearDownClass
util.unmount(cls.mount)
File "/home/elastio/elastio-snap/tests/util.py", line 23, in unmount
subprocess.check_call(cmd, timeout=10)
File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['umount', '/tmp/elastio-snap']' returned non-zero exit status 32
======================================================================
ERROR: setUpClass (test_setup.TestSetup)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/devicetestcase.py", line 24, in setUpClass
cls.kmod.load(debug=1)
File "/home/elastio/elastio-snap/tests/kmod.py", line 31, in load
timeout=self.timeout)
File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['insmod', '../src/elastio-snap.ko', 'debug=1']' returned non-zero exit status 1
======================================================================
ERROR: setUpClass (test_snapshot.TestSnapshot)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/devicetestcase.py", line 24, in setUpClass
cls.kmod.load(debug=1)
File "/home/elastio/elastio-snap/tests/kmod.py", line 31, in load
timeout=self.timeout)
File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['insmod', '../src/elastio-snap.ko', 'debug=1']' returned non-zero exit status 1
======================================================================
ERROR: setUpClass (test_transition_incremental.TestTransitionToIncremental)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/devicetestcase.py", line 24, in setUpClass
cls.kmod.load(debug=1)
File "/home/elastio/elastio-snap/tests/kmod.py", line 31, in load
timeout=self.timeout)
File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['insmod', '../src/elastio-snap.ko', 'debug=1']' returned non-zero exit status 1
======================================================================
FAIL: test_destroy_active_incremental (test_destroy.TestDestroy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/test_destroy.py", line 40, in test_destroy_active_incremental
self.assertEqual(elastio_snap.transition_to_incremental(self.minor), 0)
AssertionError: 5 != 0
======================================================================
FAIL: test_destroy_active_snapshot (test_destroy.TestDestroy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/test_destroy.py", line 32, in test_destroy_active_snapshot
self.assertEqual(elastio_snap.setup(self.minor, self.device, self.cow_full_path), 0)
AssertionError: 16 != 0
======================================================================
FAIL: test_destroy_nonexistent_device (test_destroy.TestDestroy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/elastio/elastio-snap/tests/test_destroy.py", line 29, in test_destroy_nonexistent_device
self.assertEqual(elastio_snap.destroy(self.minor), errno.ENOENT)
AssertionError: 0 != 2
----------------------------------------------------------------------
Ran 7 tests in 1.150s
FAILED (failures=3, errors=6, skipped=2)
Error: Process completed with exit code 1.
Or even hanging: https://github.com/elastio/elastio-snap/runs/1189842448?check_suite_focus=true#step:10:184
elastio-snap: fedecce
kernel: 4.9.0-13-amd64
gcc: 6.3.0-18+deb9u1)
bash: 4.4.12(1)-release
python: Python 3.5.3
test_destroy_active_incremental (test_destroy.TestDestroy) ... FAIL
test_destroy_active_snapshot (test_destroy.TestDestroy) ... FAIL
test_destroy_dormant_incremental (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
test_destroy_dormant_snapshot (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
Error: The operation was canceled.
In order to prevent Elastio CLI from crashing with SIGBUS signal when trying to read memory mapped to invalid snapshot device, elastio-snap should handle this case and return some readable data.
This issue was observed while working on 3592 and was reproduced without the CLI, just by running the stress_copy script.
It's happening after #61
More details are here: https://github.com/elastio/elastio/issues/1036
See elastio/elastio#477 for the user-mode description of this task.
Spin-off of the https://github.com/elastio/assurio/pull/201 issue in the assurio repository.
If we run this script with the following driver CLI commands:
aioctl setup-snapshot /dev/sda4 "/home/cow0.bin" 0
aioctl destroy 0
it fails with the following errors in /var/log/syslog:
May 15 10:39:31 osboxes kernel: [ 2556.577899] assurio-snap: device specified is busy: -16
May 15 10:39:31 osboxes kernel: [ 2556.577904] assurio-snap: error during destroy ioctl handler: -16
I did this on a 270 Gb volume.
Now assurio-snap build is failing on Fedora 31 with the kernel 5.6.7-200.
make -C /lib/modules/5.6.7-200.fc31.x86_64/build M=/home/assurio/assurio-snap/src modules
make[2]: Entering directory '/usr/src/kernels/5.6.7-200.fc31.x86_64'
CC [M] /home/assurio/assurio-snap/src/assurio-snap.o
/home/assurio/assurio-snap/src/assurio-snap.c: In function ‘agent_init’:
/home/assurio/assurio-snap/src/assurio-snap.c:5166:51: error: passing argument 4 of ‘proc_create’ from incompatible pointer type [-Werror=incompatible-pointer-types]
5166 | info_proc = proc_create(INFO_PROC_FILE, 0, NULL, &assurio_snap_proc_fops);
| ^~~~~~~~~~~~~~~~~~~~~~~
| |
| const struct file_operations *
In file included from /home/assurio/assurio-snap/src/includes.h:17,
from /home/assurio/assurio-snap/src/assurio-snap.c:8:
./include/linux/proc_fs.h:64:24: note: expected ‘const struct proc_ops *’ but argument is of type ‘const struct file_operations *’
64 | struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
| ^~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:268: /home/assurio/assurio-snap/src/assurio-snap.o] Error 1
make[2]: *** [Makefile:1683: /home/assurio/assurio-snap/src] Error 2
make[2]: Leaving directory '/usr/src/kernels/5.6.7-200.fc31.x86_64'
make[1]: *** [Makefile:14: default] Error 2
make[1]: Leaving directory '/home/assurio/assurio-snap/src'
make: *** [Makefile:24: driver] Error 2
[assurio@fedora31-build assurio-snap]$ grep -rnw file_operations
src/assurio-snap.c:585:static inline struct proc_dir_entry *proc_create(const char *name, mode_t mode, struct proc_dir_entry *parent, const struct file_operations *proc_fops){
src/assurio-snap.c:901:static const struct file_operations snap_control_fops = {
src/assurio-snap.c:922:static const struct file_operations assurio_snap_proc_fops = {
I found this error message in /var/log/syslog:
elastio-snap: failed to locate system call table, persistence disabled
During my investigation I found that this is because SYS_MOUNT_ADDR and SYS_UMOUNT_ADDR are zero in kernel-config.h.
This is probably because my kernel version has no sys_mount and sys_umount functions. I use XUbuntu 20.04 with kernel 5.13.0-28-generic.
As I understand it, with this problem it is not possible to properly reload snapshots after a reboot.
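On x86_64 kernels since 4.17 the syscall entry points carry an architecture prefix (for example `__x64_sys_mount`), so a lookup for the bare name `sys_mount` in the symbol map finds nothing. The sketch below illustrates that lookup on a System.map-style listing. How kernel-config.h is actually generated is not shown in this report, so `find_symbol` is a hypothetical helper and the addresses in `SAMPLE_MAP` are made-up illustration values, not output from a real kernel.

```python
def find_symbol(system_map_text, names):
    """Return (name, address) for the first candidate found, else (None, 0)."""
    table = {}
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # "<hex address> <type> <symbol>"
            addr, _sym_type, name = parts
            table[name] = int(addr, 16)
    for name in names:
        if name in table:
            return name, table[name]
    return None, 0


# Made-up sample in System.map format (addresses are NOT from a real kernel).
SAMPLE_MAP = """\
ffffffff81300000 T __x64_sys_mount
ffffffff81300100 T __x64_sys_umount
ffffffff81200000 T kfree
"""

# Looking only for the legacy name yields address 0, matching the symptom.
print(find_symbol(SAMPLE_MAP, ["sys_mount"]))
# Adding the prefixed name as a fallback finds the entry point.
print(find_symbol(SAMPLE_MAP, ["sys_mount", "__x64_sys_mount"]))
```

Finding the prefixed symbol only explains why the addresses come out as zero. Whether the driver can use those entry points for persistence is a separate question that this sketch does not address.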
NOTE FROM @anelson: This behavior is a consequence of the way the driver is implemented. I hesitate to even call it a "bug", it's a limitation of the design.
Leaving this open since it does in fact describe the current behavior of the driver, although fixing it would require a substantial change to how we track changes.
Original bug in elastio repo - https://github.com/elastio/elastio/issues/474.
Run in debian10 Vagrant build box.
Steps to repro (with driver's CLI):
system logs:
elastio@debian10-amd64-build:~$ sudo dmesg --ctime | grep error
[Tue Sep 8 12:01:16 2020] elastio-snap: error writing cow data: -27
[Tue Sep 8 12:01:16 2020] elastio-snap: error writing cow data and mapping: -27
[Tue Sep 8 12:01:16 2020] elastio-snap: error handling write bio: -27
[Tue Sep 8 12:01:16 2020] elastio-snap: error handling write bio in kernel thread: -27
driver's interface file:
elastio@debian10-amd64-build:~/elastio/target/release$ cat /proc/elastio-snap-info
{
"version": "0.10.13",
"devices": [
{
"minor": 0,
"cow_file": "/.elastio/cow0.bin",
"block_device": "/dev/vda1",
"max_cache": 314572800,
"fallocate": 4508876800,
"seq_id": 1,
"uuid": "affe491d129e42aca4677d6a76d67ab1",
"version": 1,
"nr_changed_blocks": 1079501,
"error": -27,
"state": 3
}
]
}
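The `-27` in the dmesg lines and in the `error` field above is a negated errno value, as is usual for kernel return codes. A small sketch for translating such values into their symbolic names, using only the Python standard library (`decode_kernel_error` is a hypothetical helper name, and the mapping assumes a Linux host):

```python
import errno
import os


def decode_kernel_error(code):
    """Map a negative kernel return code to (errno name, description)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = decode_kernel_error(-27)
print(f"-27: {name} ({message})")
```

On Linux, errno 27 is EFBIG, which suggests the write to the cow file went past a size limit. What exactly imposed the limit here is not established by the logs alone.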
Branch: master
Commit: 648239388a4cc2c35eb484c5938f1b1aae488677
Machines: ubuntu2004-amd64-build, debian10-amd64-build
Reproducing steps:
lib64 folder, but instead of it:
elastio@debian10-amd64-build:~$ ls -al /usr/local/lib64
-rwxr-xr-x 1 root root 16592 Apr 24 03:36 /usr/local/lib64
diff shows this is indeed libelastio-snap.so:
elastio@ubuntu2004-amd64-build:~/elastio-snap$ diff lib/libelastio-snap.so.1 /usr/local/lib64
elastio@ubuntu2004-amd64-build:~/elastio-snap$ echo $?
0
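One plausible way to end up with the library stored as a regular file named `lib64` is a copy to a destination directory that does not exist yet. This is an assumption, since the install rule itself is not shown in this report. The sketch below reproduces that effect in a temporary directory and shows a guarded variant. `install_lib` is a hypothetical helper, not part of the project.

```python
import os
import shutil
import tempfile


def install_lib(src, dest_dir):
    """Copy src into dest_dir, creating dest_dir first if it is missing."""
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy(src, dest_dir)


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "libelastio-snap.so.1")
    with open(src, "wb") as f:
        f.write(b"placeholder")  # stand-in for the real shared object

    # Naive copy: "lib64" does not exist, so the library itself is
    # written as a regular file called "lib64".
    naive_target = os.path.join(root, "naive", "lib64")
    os.makedirs(os.path.dirname(naive_target))
    shutil.copy(src, naive_target)
    naive_is_file = os.path.isfile(naive_target)

    # Guarded copy: the directory is created first, the library lands inside it.
    safe_target = os.path.join(root, "safe", "lib64")
    installed = install_lib(src, safe_target)
    safe_is_dir = os.path.isdir(safe_target)
    installed_name = os.path.basename(installed)

print(naive_is_file, safe_is_dir, installed_name)
```

The same idea applies to a Makefile install rule: create the target directory before copying into it.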
The test output hangs on test_destroy_unverified_incremental (test_destroy.TestDestroy), interrupted here by Ctrl+C:
Python module CFFI is not installed. Installing it...
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Requirement already satisfied: cffi in /usr/local/lib64/python3.9/site-packages (1.14.4)
Requirement already satisfied: pycparser in /usr/local/lib/python3.9/site-packages (from cffi) (2.20)
elastio-snap: 6b5d915
kernel: 5.8.15-301.fc33.x86_64
gcc: 10.2.1
bash: 5.0.17(1)-release
python: Python 3.9.0
test_destroy_active_incremental (test_destroy.TestDestroy) ... ok
test_destroy_active_snapshot (test_destroy.TestDestroy) ... ok
test_destroy_dormant_incremental (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
test_destroy_dormant_snapshot (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
test_destroy_nonexistent_device (test_destroy.TestDestroy) ... ok
test_destroy_unverified_incremental (test_destroy.TestDestroy) ... ^CTraceback (most recent call last):
File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/lib64/python3.9/unittest/__main__.py", line 18, in <module>
main(module=None)
File "/usr/lib64/python3.9/unittest/main.py", line 101, in __init__
self.runTests()
File "/usr/lib64/python3.9/unittest/main.py", line 271, in runTests
self.result = testRunner.run(self.test)
File "/usr/lib64/python3.9/unittest/runner.py", line 176, in run
test(result)
File "/usr/lib64/python3.9/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
File "/usr/lib64/python3.9/unittest/suite.py", line 122, in run
test(result)
File "/usr/lib64/python3.9/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
File "/usr/lib64/python3.9/unittest/suite.py", line 122, in run
test(result)
File "/usr/lib64/python3.9/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
File "/usr/lib64/python3.9/unittest/suite.py", line 122, in run
test(result)
File "/usr/lib64/python3.9/unittest/case.py", line 653, in __call__
return self.run(*args, **kwds)
File "/usr/lib64/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/lib64/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/home/elastio/elastio-snap/tests/test_destroy.py", line 78, in test_destroy_unverified_incremental
util.unmount(self.mount)
File "/home/elastio/elastio-snap/tests/util.py", line 23, in unmount
subprocess.check_call(cmd, timeout=10)
File "/usr/lib64/python3.9/subprocess.py", line 368, in check_call
retcode = call(*popenargs, **kwargs)
File "/usr/lib64/python3.9/subprocess.py", line 351, in call
return p.wait(timeout=timeout)
File "/usr/lib64/python3.9/subprocess.py", line 1185, in wait
return self._wait(timeout=timeout)
File "/usr/lib64/python3.9/subprocess.py", line 1909, in _wait
time.sleep(delay)
KeyboardInterrupt
Tail of the dmesg:
[ +0,035524] elastio-snap: ioctl command received: -1605353208
[ +0,000007] elastio-snap: received elastio-snap info ioctl - 12
[ +0,000001] elastio-snap: device specified does not exist: -2
[ +0,000001] elastio-snap: error during reconfigure ioctl handler: -2
[ +0,000001] elastio-snap: minor range = 23 - 0
[ +0,002920] elastio-snap: ioctl command received: 1074020612
[ +0,000002] elastio-snap: received destroy ioctl - 12
[ +0,000002] elastio-snap: device specified does not exist: -2
[ +0,000001] elastio-snap: error during destroy ioctl handler: -2
[ +0,000002] elastio-snap: minor range = 23 - 0
[ +0,023255] ------------[ cut here ]------------
[ +0,000009] percpu ref (blk_queue_usage_counter_release) <= 0 (-149) after switching to atomic
[ +0,000058] WARNING: CPU: 2 PID: 0 at lib/percpu-refcount.c:161 percpu_ref_switch_to_atomic_rcu+0x12f/0x140
[ +0,000001] Modules linked in: loop elastio_snap(OE) fuse xt_conntrack xt_MASQUERADE nf_conntrack_netlink nft_counter xt_addrtype nft_compat br_netfilter bridge stp llc nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set overlay nf_tables nfnetlink intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel cirrus rapl drm_kms_helper cec virtio_net net_failover failover virtio_balloon joydev i2c_piix4 drm zram ip_tables virtio_blk ata_generic serio_raw qemu_fw_cfg pata_acpi
[ +0,000043] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G OE 5.8.15-301.fc33.x86_64 #1
[ +0,000002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ +0,000005] RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x12f/0x140
[ +0,000004] Code: eb 99 80 3d b4 ef 62 01 00 0f 85 4d ff ff ff 48 8b 55 d8 48 8b 75 e8 48 c7 c7 a0 f9 3f a7 c6 05 98 ef 62 01 01 e8 f7 72 ae ff <0f> 0b e9 2b ff ff ff 0f 0b eb a2 cc cc cc cc cc cc 8d 8c 16 ef be
[ +0,000002] RSP: 0018:ffffb70a000e4ef0 EFLAGS: 00010286
[ +0,000002] RAX: 0000000000000052 RBX: 7fffffffffffff6a RCX: 0000000000000000
[ +0,000001] RDX: 0000000000000052 RSI: ffffffffa83f6452 RDI: 0000000000000246
[ +0,000002] RBP: ffff97258d101688 R08: 0000000a9e96093b R09: 0000000000000052
[ +0,000001] R10: 0000000080000002 R11: ffffffffa83f6437 R12: 00003fe44803e438
[ +0,000002] R13: 0000000000000000 R14: ffff9725b6e6cd80 R15: ffff9725b7d2b0d0
[ +0,000002] FS: 0000000000000000(0000) GS:ffff9725b7d00000(0000) knlGS:0000000000000000
[ +0,000029] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0,000002] CR2: 00007f996e9ab9b0 CR3: 0000000215308003 CR4: 0000000000360ee0
[ +0,000008] Call Trace:
[ +0,000018] <IRQ>
[ +0,000008] rcu_do_batch+0x197/0x3e0
[ +0,000013] rcu_core+0x189/0x2e0
[ +0,000006] ? sched_clock+0x5/0x10
[ +0,000005] __do_softirq+0xd9/0x2c4
[ +0,000010] asm_call_irq_on_stack+0xf/0x20
[ +0,000007] </IRQ>
[ +0,000002] do_softirq_own_stack+0x37/0x40
[ +0,000004] irq_exit_rcu+0xc2/0x100
[ +0,000006] sysvec_apic_timer_interrupt+0x34/0x80
[ +0,000017] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ +0,000003] RIP: 0010:native_safe_halt+0xe/0x10
[ +0,000003] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d d6 59 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d c6 59 49 00 f4 c3 cc cc 0f 1f 44 00
[ +0,000002] RSP: 0018:ffffb70a0007bed0 EFLAGS: 00000246
[ +0,000002] RAX: ffffffffa6b73480 RBX: 0000000000000002 RCX: 0000000000000000
[ +0,000001] RDX: 0000000000000002 RSI: ffffb70a0007bea0 RDI: 0000000a9de98b85
[ +0,000001] RBP: 0000000000000002 R08: 0000000000000001 R09: ffff9725b1597a00
[ +0,000002] R10: 00000000000003a1 R11: 0000000000000000 R12: 0000000000000000
[ +0,000001] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ +0,000003] ? __sched_text_end+0x3/0x3
[ +0,000004] default_idle+0x1a/0x140
[ +0,000003] do_idle+0x1f3/0x2a0
[ +0,000003] ? arch_cpu_idle_exit+0x40/0x40
[ +0,000003] cpu_startup_entry+0x19/0x20
[ +0,000004] start_secondary+0x144/0x170
[ +0,000008] secondary_startup_64+0xb6/0xc0
[ +0,000004] ---[ end trace e4e6c8608ee7fe2b ]---
Full dmesg: dmesg.log
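The "ioctl command received" numbers in the dmesg tail are raw ioctl request codes printed as signed integers. A minimal sketch for splitting them into their fields, assuming the generic Linux `_IOC` layout used on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). `decode_ioctl` is a hypothetical helper name.

```python
# Direction values in the generic _IOC encoding.
IOC_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd):
    """Split an ioctl request code into direction, size, type and number."""
    cmd &= 0xFFFFFFFF  # undo the signed 32-bit formatting used in the log
    return {
        "dir": IOC_DIRECTIONS[(cmd >> 30) & 0x3],
        "size": (cmd >> 16) & 0x3FFF,
        "type": (cmd >> 8) & 0xFF,
        "nr": cmd & 0xFF,
    }


for raw in (-1605353208, 1074020612):
    print(raw, decode_ioctl(raw))
```

Decoded this way, `1074020612` is a write-direction request with number 4 and a 4-byte argument, and `-1605353208` is a read-direction request with number 8. Both use the same type byte, 0x41.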
Replace all uses of assurio with elastio.
error: Installed (but unpackaged) file(s) found:
/usr/src/assurio-snap-0.10.11/.assurio-snap.ko.cmd
/usr/src/assurio-snap-0.10.11/.assurio-snap.mod.cmd
/usr/src/assurio-snap-0.10.11/.assurio-snap.mod.o.cmd
/usr/src/assurio-snap-0.10.11/.assurio-snap.o.cmd
/usr/src/assurio-snap-0.10.11/Module.symvers
/usr/src/assurio-snap-0.10.11/assurio-snap.ko
/usr/src/assurio-snap-0.10.11/assurio-snap.mod
/usr/src/assurio-snap-0.10.11/assurio-snap.mod.c
/usr/src/assurio-snap-0.10.11/assurio-snap.mod.o
/usr/src/assurio-snap-0.10.11/assurio-snap.o
/usr/src/assurio-snap-0.10.11/kernel-config.h
/usr/src/assurio-snap-0.10.11/modules.order
elastio-snap: eb31cd1
kernel: 4.18.0-348.7.1.el8_5.x86_64
gcc: 8.5.0
bash: 4.4.20(1)-release
python: Python 3.6.8
test_destroy_active_incremental (test_destroy.TestDestroy) ... ok
test_destroy_active_snapshot (test_destroy.TestDestroy) ... ok
test_destroy_dormant_incremental (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
test_destroy_dormant_snapshot (test_destroy.TestDestroy) ... skipped 'Broken since 4.17 (see #144)'
test_destroy_nonexistent_device (test_destroy.TestDestroy) ... ok
Error: The action has timed out.
https://github.com/elastio/elastio-snap/runs/4880852550?check_suite_focus=true#step:12:29