sheep crashes after OS crash with 0 sized files in the cache

Bug #1516067 reported by mhex on 2015-11-13
This bug affects 1 person
Affects: sheepdog
Importance: Undecided
Assigned to: Unassigned

Bug Description

After an OS crash, some of the files in the cache were truncated to zero size. sheep crashed whenever these files were accessed, and the log messages were not informative.

I'd like to know:
 * How serious is this?
 * What was likely to be lost? That is, do these files refer to blocks that were being written and had not yet been confirmed to the qemu/guest OS as written, or were some blocks that were supposed to be on disk lost?
 * Does the cache code use O_DIRECT, and could there be a problem related to the metadata not being written while the data was?
 * Can you add code to check for this and repair it if possible?
 * Can you add a utility to check for consistency (something like fsck)?
 * Would using a journal prevent this?
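
To make the consistency-check request concrete, here is a rough sketch of the kind of scan I have in mind. It is not part of sheepdog and knows nothing about the cache's internal layout; it only walks a directory tree and lists regular files whose size is 0. The function name find_empty_files is made up for illustration, and the default path is just the dir= value from my sheep command line.

```python
import os
import sys


def find_empty_files(cache_dir):
    """Return a sorted list of regular files under cache_dir whose size is 0.

    Hypothetical helper, not a sheepdog API. It makes no assumption about
    how the cache names or arranges its files.
    """
    empty = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Default taken from the dir= option I passed to sheep; adjust as needed.
    cache_dir = sys.argv[1] if len(sys.argv) > 1 else "/mnt/sheep/cache/"
    for path in find_empty_files(cache_dir):
        print(path)
```

This only reports suspect files and does not attempt any repair. Whether a zero-sized cache file can safely be discarded depends on whether it held data that had not yet been pushed to the backend, which is exactly what I am asking about above.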

In my specific case I doubt I lost any actual data: the last few hundred megabytes of modified data were copied and all seems OK. But I worry that this may happen again with more damage.

Versions:
Sheepdog daemon version 0.7.5
Linux compucom 3.16.0-53-generic #72~14.04.1-Ubuntu SMP Fri Nov 6 18:17:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.19), Copyright (c) 2003-2008 Fabrice Bellard

sheep started as:
/usr/sbin/sheep -c local /mnt/sheep/0/ -z 0 -p 7000 -r 127.0.0.1:4000 -w size=4096 dir=/mnt/sheep/cache/

the last part of the log:

Nov 12 12:32:26 INFO [main] md_add_disk(141) /mnt/sheep/0/obj, nr 1
Nov 12 12:32:27 INFO [main] send_join_request(770) IPv4 ip:127.0.0.1 port:7000
Nov 12 12:32:27 ERROR [main] for_each_object_in_stale(383) /mnt/sheep/0/obj/.stale
Nov 12 12:32:28 NOTICE [main] http_init(451) http service is not complied
Nov 12 12:32:28 ERROR [main] check_host_env(461) WARN: Allowed open files 1024 too small, suggested 1024000
Nov 12 12:32:28 INFO [main] main(853) sheepdog daemon (version 0.7.5) started
Nov 12 12:32:28 INFO [main] recover_object_main(624) object c1968a000007d3 is recovered (1/9182)
Nov 12 12:32:28 INFO [main] recover_object_main(624) object c1968c00000520 is recovered (2/9182)
Nov 12 12:32:28 INFO [main] recover_object_main(624) object 2de2f7000001ee is recovered (3/9182)
 ...... more of the same .....
Nov 12 12:32:29 INFO [main] recover_object_main(624) object 2de2f7000001bc is recovered (9180/9182)
Nov 12 12:32:29 INFO [main] recover_object_main(624) object c1968c000009ac is recovered (9181/9182)
Nov 12 12:32:29 INFO [main] recover_object_main(624) object 128a750000011b is recovered (9182/9182)
Nov 12 12:33:02 ERROR [oc_push 2661] read_cache_object_noupdate(362) size 0, count:4194304, offset 0 Success
Nov 12 12:33:02 EMERG [oc_push 2661] do_push_object(901) PANIC: push failed but should never fail
Nov 12 12:33:02 EMERG [oc_push 2661] crash_handler(250) sheep exits unexpectedly (Aborted).
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /usr/sbin/sheep() [0x405bd7]
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /lib/x86_64-linux-gnu/libpthread.so.0(+0x1033f) [0x7f5df413633f]
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7f5df346ecc8]
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /lib/x86_64-linux-gnu/libc.so.6(abort+0x147) [0x7f5df34720d7]
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /usr/sbin/sheep() [0x415c5e]
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /usr/sbin/sheep() [0x428f49]
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8181) [0x7f5df412e181]
Nov 12 12:33:02 EMERG [oc_push 2661] sd_backtrace(857) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f5df353247c]
Nov 12 12:33:04 INFO [oc_push 2661] dump_stack_frames(796) Cannot get info from GDB
Nov 12 12:33:04 INFO [oc_push 2661] dump_stack_frames(798) Set /proc/sys/kernel/yama/ptrace_scope to zero if you are using Ubuntu.
Nov 12 12:33:04 EMERG [oc_push 2661] __sd_dump_variable(731) dump __sys
Nov 12 12:33:07 ERROR [main] crash_handler(490) sheep pid 2648 exited unexpectedly.

Hitoshi Mitake (mitake-hitoshi) wrote:

Thanks for your report, mhex.

We do not use Launchpad for bug tracking. We use GitHub issues: https://github.com/sheepdog/sheepdog/issues
Could you report this bug on GitHub?

In addition, the object cache is a really unstable feature. Please do not use it.
