long main thread blocking in recovery process

Bug #1386477 reported by Hitoshi Mitake
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
sheepdog
Fix Committed
High
Hitoshi Mitake

Bug Description

Quoting Tsuji-san's email:

By the way, I found other blocking during this test. it was happen when
start sheepdog process at new noder.
Disk I/O was blocked since 19:51:40 to 19:52:40.
19:51:40 is start time of sheepdog by new node.
19:52:40 is the time when recovery was just started (refer log2)

Because there are many object files, so I guess it needs long time to
move obj to .stale and it cause blocking.

==log1==
Oct 24 10:17:17 INFO [main] recover_object_main(624) object c7ab8e00001ada is recovered (263966/263967)
Oct 24 10:17:17 INFO [main] recover_object_main(624) object c7ab8e0000027e is recovered (263967/263967)
Oct 24 10:17:17 INFO [main] default_cleanup(168) cleaning pruge_state_dir
Oct 24 10:17:23 INFO [main] default_cleanup(170) cleaning pruge_state_dir done

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
 0 3 0 64512 64836 741012 0 0 12 58172 296 352 1 14 0 85 0 2014-10-24 10:17:10 JST
 0 3 0 71696 64832 733860 0 0 112 55452 216 230 1 11 0 88 0 2014-10-24 10:17:11 JST
 0 3 0 65252 64832 739952 0 0 0 51408 190 181 0 10 0 90 0 2014-10-24 10:17:12 JST
 0 1 0 68480 64832 737216 0 0 296 62572 261 235 2 13 0 85 0 2014-10-24 10:17:13 JST
 0 2 0 71328 64820 733684 0 0 0 37224 178 179 1 7 0 92 0 2014-10-24 10:17:14 JST
 0 3 0 63264 64824 742696 0 0 0 30792 172 249 1 10 0 89 0 2014-10-24 10:17:15 JST
 0 3 0 71816 64824 732916 0 0 0 24136 89 115 1 3 0 96 0 2014-10-24 10:17:16 JST
 0 3 0 71956 64824 733332 0 0 0 9584 119 146 1 4 0 95 0 2014-10-24 10:17:17 JST
 0 3 0 63524 64824 741704 0 0 0 21672 113 111 1 6 0 93 0 2014-10-24 10:17:18 JST
 0 3 0 63508 64824 741700 0 0 0 0 62 79 1 1 0 98 0 2014-10-24 10:17:19 JST
 0 3 0 63508 64824 741700 0 0 0 0 52 78 1 1 0 98 0 2014-10-24 10:17:20 JST
 0 3 0 63508 64824 741700 0 0 0 0 50 82 1 1 0 98 0 2014-10-24 10:17:21 JST
 0 3 0 63508 64824 741700 0 0 0 0 58 79 1 1 0 98 0 2014-10-24 10:17:22 JST
 0 3 0 63508 64824 741700 0 0 0 0 52 76 1 1 0 98 0 2014-10-24 10:17:23 JST
 2 2 0 69712 64820 734736 0 0 156 105840 273 164 1 21 0 77 1 2014-10-24 10:17:24 JST
 2 3 0 62384 64828 741864 0 0 124 115948 407 366 1 27 0 71 1 2014-10-24 10:17:25 JST
 2 1 0 62528 64828 743024 0 0 0 111620 369 308 0 23 0 77 0 2014-10-24 10:17:26 JST
 0 2 0 63892 64824 741328 0 0 296 122968 361 271 1 25 0 73 1 2014-10-24 10:17:27 JST
 0 2 0 62660 64820 742612 0 0 0 92744 300 201 1 20 0 79 0 2014-10-24 10:17:28 JST
 0 2 0 71824 64816 733144 0 0 156 87192 276 177 2 18 0 80 0 2014-10-24 10:17:29 JST
 0 3 0 71828 64820 732912 0 0 124 76140 264 191 1 17 0 82 0 2014-10-24 10:17:30 JST
 0 3 0 67612 64816 737592 0 0 0 18264 109 129 1 4 0 95 0 2014-10-24 10:17:31 JST
 0 3 0 67852 64816 738096 0 0 0 33264 169 171 1 9 0 90 0 2014-10-24 10:17:32 JST
 0 3 0 60796 64816 744524 0 0 296 23184 129 152 1 5 0 94 0 2014-10-24 10:17:33 JST

==log2==
Oct 23 19:51:39 INFO [main] md_add_disk(141) /home/sheepdog/obj, nr 1
Oct 23 19:51:39 INFO [main] send_join_request(770) IPv4 ip:10.0.0.3 port:7000
Oct 23 19:51:40 INFO [main] main(887) sheepdog daemon (version 0.7.6) started
Oct 23 19:52:39 ALERT [rw] get_vdi_copy_number(104) copy number for 0 not found, set 3
Oct 23 19:52:39 ALERT [rw] get_vdi_copy_number(104) copy number for 0 not found, set 3
Oct 23 19:52:40 INFO [main] recover_object_main(624) object c7ab8e0002d944 is recovered (1/263967)
Oct 23 19:52:41 INFO [main] recover_object_main(624) object c7ab8e000091ee is recovered (2/263967)
Oct 23 19:52:41 INFO [main] recover_object_main(624) object c7ab8e0003af7f is recovered (3/263967)
Oct 23 19:52:41 INFO [main] recover_object_main(624) object c7ab8e00025f80 is recovered (4/263967)
Oct 23 19:52:41 INFO [main] recover_object_main(624) object c7ab8e00004d0c is recovered (5/263967)
Oct 23 19:52:41 INFO [main] recover_object_main(624) object c7ab8e00004b06 is recovered (6/263967)
Oct 23 19:52:41 INFO [main] recover_object_main(624) object c7ab8e0002df48 is recovered (7/263967)
Oct 23 19:52:42 INFO [main] recover_object_main(624) object c7ab8e0003c7c3 is recovered (8/263967)

Changed in sheepdog-project:
status: New → Fix Committed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.