Crash when removing a snapshot while alive nodes < number of copies

Bug #1389125 reported by sirio81
Affects: sheepdog
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Reproduced with
-c 2 and 1 node, cluster manager local
-c 3 and 2 nodes, cluster manager zookeeper

Sheepdog daemon version 0.9.0_1_g9d67dec

How to reproduce:

dog cluster format -c 2
Number of copies (2) is larger than number of nodes (1).
Are you sure you want to continue? [yes/no]: yes
using backend plain store

dog vdi create -P test 1G
dog vdi snapshot test

dd if=/dev/urandom bs=1M count=100 | dog vdi write test
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 10.5425 s, 9.9 MB/s

dog vdi list
  Name   Id  Size    Used    Shared  Creation time     VDI id  Copies  Tag
s test    1  1.0 GB  1.0 GB  0.0 MB  2014-11-04 09:18  7c2b25       2
  test    0  1.0 GB  104 MB  920 MB  2014-11-04 09:18  7c2b26       2

dog vdi delete -s 1 test^C    (first attempt interrupted with Ctrl-C)

dog vdi delete -s 1 test
failed to read a response
Failed to write object 807c2b2500000000
failed to update inode for discarding objects: 807c2b2500000000
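
For reference, the object id in these error messages is the snapshot's inode object: with sheepdog's usual vid-to-oid mapping (the top bit marks a VDI object and the vdi id is shifted into the upper half of the 64-bit oid), vdi id 7c2b25 from the listing above yields exactly 807c2b2500000000. A minimal sketch of that mapping, assuming the standard constants:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Assumed constants: the top bit marks a VDI (inode) object and the
     * vdi id occupies the upper half of the 64-bit object id. */
    #define VDI_BIT         (UINT64_C(1) << 63)
    #define VDI_SPACE_SHIFT 32

    static uint64_t vid_to_vdi_oid(uint32_t vid)
    {
            return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
    }

    int main(void)
    {
            /* 7c2b25 is the snapshot's VDI id from `dog vdi list` above */
            printf("%" PRIx64 "\n", vid_to_vdi_oid(0x7c2b25));
            /* prints 807c2b2500000000, the oid in the error message */
            return 0;
    }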

sheep.log
Nov 04 09:18:03 INFO [main] md_add_disk(343) /mnt/sheep/0/obj, vdisk nr 50, total disk 1
Nov 04 09:18:03 INFO [main] send_join_request(1006) IPv4 ip:127.0.0.1 port:7000 going to join the cluster
Nov 04 09:18:03 NOTICE [main] nfs_init(607) nfs server service is not compiled
Nov 04 09:18:03 WARN [main] check_host_env(497) Allowed open files 1024 too small, suggested 6144000
Nov 04 09:18:03 INFO [main] main(951) sheepdog daemon (version 0.9.0_1_g9d67dec) started
Nov 04 09:18:06 INFO [main] rx_main(830) req=0x21a74d0, fd=19, client=127.0.0.1:59692, op=MAKE_FS, data=(not string)
Nov 04 09:18:06 INFO [main] tx_main(882) req=0x21a74d0, fd=19, client=127.0.0.1:59692, op=MAKE_FS, result=00
Nov 04 09:18:31 INFO [main] rx_main(830) req=0x21a74d0, fd=19, client=127.0.0.1:59700, op=MAKE_FS, data=(not string)
Nov 04 09:18:31 INFO [main] tx_main(882) req=0x21a74d0, fd=19, client=127.0.0.1:59700, op=MAKE_FS, result=00
Nov 04 09:18:38 INFO [main] rx_main(830) req=0x21a74d0, fd=15, client=127.0.0.1:59703, op=NEW_VDI, data=(not string)
Nov 04 09:18:38 INFO [main] tx_main(882) req=0x21a74d0, fd=15, client=127.0.0.1:59703, op=NEW_VDI, result=00
Nov 04 09:18:45 INFO [main] rx_main(830) req=0x21ac470, fd=15, client=127.0.0.1:59707, op=NEW_VDI, data=(not string)
Nov 04 09:18:46 INFO [main] tx_main(882) req=0x21ac470, fd=15, client=127.0.0.1:59707, op=NEW_VDI, result=00
Nov 04 09:20:08 EMERG [io 12419] oid_to_vnodes(80) PANIC: can't find a valid vnode
Nov 04 09:20:08 EMERG [io 12419] crash_handler(268) sheep exits unexpectedly (Aborted).
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(833) sheep.c:270: crash_handler
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf09f) [0x7f17751ed09f]
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7f17747e1164]
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7f17747e43df]
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(833) sheep.h:80: oid_to_vnodes
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(833) ops.c:1923: do_process_work
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(833) work.c:340: worker_routine
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b4f) [0x7f17751e4b4f]
Nov 04 09:20:08 EMERG [io 12419] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f177488b7bc]
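
The panic named in the backtrace (oid_to_vnodes, sheep.h:80) fires when an object's replicas cannot all be mapped onto distinct vnodes: with -c 2 but only one alive node, there is no second vnode for the second copy. Below is a toy model of that failure condition, under the assumption that each replica index must land in a distinct zone; it is not sheepdog's actual ring code, which hashes objects onto virtual nodes.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy model: replica index copy_idx must map to a vnode in a distinct
     * zone.  With only nr_zones alive zones, any copy_idx >= nr_zones has
     * nowhere to go, matching the "can't find a valid vnode" panic. */
    static int oid_to_vnode(int nr_zones, int copy_idx)
    {
            if (copy_idx >= nr_zones) {
                    fprintf(stderr, "PANIC: can't find a valid vnode\n");
                    abort();  /* the io worker takes the whole daemon down */
            }
            return copy_idx;  /* the copy_idx-th distinct zone */
    }

    int main(void)
    {
            int nr_zones = 1;   /* one alive node, as reproduced above */
            int nr_copies = 2;  /* cluster formatted with -c 2 */

            for (int idx = 0; idx < nr_copies; idx++)
                    oid_to_vnode(nr_zones, idx);  /* aborts at idx == 1 */
            return 0;
    }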

sirio81 (sirio81) wrote:

The same crash happens when using erasure coding:
-c 2:1 and 2 nodes

Nov 04 09:53:11 EMERG [io 8186] oid_to_vnodes(80) PANIC: can't find a valid vnode
Nov 04 09:53:11 EMERG [io 8186] crash_handler(268) sheep exits unexpectedly (Aborted).
Nov 04 09:53:11 EMERG [io 8186] sd_backtrace(833) sheep.c:270: crash_handler
Nov 04 09:53:11 EMERG [io 8186] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7fbdab64402f]
Nov 04 09:53:11 EMERG [io 8186] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7fbdaac39474]
Nov 04 09:53:11 EMERG [io 8186] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7fbdaac3c6ef]
Nov 04 09:53:11 EMERG [io 8186] sd_backtrace(833) sheep.h:80: oid_to_vnodes
Nov 04 09:53:12 EMERG [io 8186] sd_backtrace(833) ops.c:1923: do_process_work
Nov 04 09:53:12 EMERG [io 8186] sd_backtrace(833) work.c:340: worker_routine
Nov 04 09:53:12 ERROR [gway 8213] wait_forward_request(416) remote node might have gone away
Nov 04 09:53:12 EMERG [io 8186] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b4f) [0x7fbdab63bb4f]
Nov 04 09:53:12 EMERG [io 8186] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7fbdaace313c]

sirio81 (sirio81) wrote:

This bug doesn't affect 0.8.3. I guess it's specific to the new GC algorithm.
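
If the cause is indeed the new reclaim path asking for more placements than the cluster can currently satisfy, a guard of roughly the following shape would turn the abort into a client-visible error. This is a hypothetical illustration only, not necessarily the committed fix:

    #include <stdio.h>

    /* Hypothetical guard (not necessarily the committed fix): refuse to
     * start object reclaim when the cluster cannot currently place
     * nr_copies replicas, instead of letting an io worker panic. */
    static int check_copies_placeable(int nr_alive_zones, int nr_copies)
    {
            if (nr_copies > nr_alive_zones)
                    return -1;  /* report an error to the client */
            return 0;
    }

    int main(void)
    {
            /* the reported setup: 2 copies requested, 1 node alive */
            if (check_copies_placeable(1, 2) < 0)
                    fprintf(stderr, "not enough alive nodes for 2 copies\n");
            return 0;
    }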

Changed in sheepdog-project:
status: New → Fix Committed