problem with pack on two zeoraids

Bug #484919 reported by ChrisW
This bug affects 1 person
Affects         Status     Importance  Assigned to       Milestone
gocept.zeoraid  Confirmed  Critical    Christian Theune

Bug Description

Okay, so I finally got brave enough to try packing the two zeos. So, on each server I did:

${buildout:bin-directory}/zeopack 127.0.0.1:${zeo:port}:packed -t 00

So, the pack goes directly to zeo, as we discussed, not via zeoraid, but it does use a fixed pack time.
I did this a few times on each server, but only noticed problems later on. zeoraid1 stopped answering for the packed storage, and zeoraid2 showed the packed storage as degraded with zeo1 failed.
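As a sanity check on the fixed pack time: the zeo logs in this report record every pack as pack(time=1258502400.0). A minimal sketch for decoding that value follows. The function name is mine, and reading the result as "midnight from -t 00" assumes the servers' clocks are on UTC, which the report doesn't state.

```python
from datetime import datetime, timezone

# Pack time copied from the pack(time=...) entries in the zeo logs.
PACK_TIME = 1258502400.0


def describe_pack_time(epoch_seconds):
    """Return the pack cutoff as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


print(describe_pack_time(PACK_TIME))  # 2009-11-18T00:00:00+00:00
```

Every pack on both servers used the same cutoff, so repeated packs should in principle be removing the same set of old revisions.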

Here's when I did the packs on zeo1:
2009-11-18T15:56:49 INFO ZEO.StorageServer (12276/127.0.0.1:53977) pack(time=1258502400.0) started...
2009-11-18T16:14:37 INFO ZEO.StorageServer (12276/127.0.0.1:53977) pack(time=1258502400.0) complete
2009-11-18T16:47:27 INFO ZEO.StorageServer (12276/127.0.0.1:54364) pack(time=1258502400.0) started...
2009-11-18T16:50:08 INFO ZEO.StorageServer (12276/127.0.0.1:54364) pack(time=1258502400.0) complete
2009-11-18T16:56:30 INFO ZEO.StorageServer (12276/127.0.0.1:54528) pack(time=1258502400.0) started...
2009-11-18T16:57:45 INFO ZEO.StorageServer (12276/127.0.0.1:54528) pack(time=1258502400.0) complete

Here's when I did the packs on zeo2:
2009-11-18T15:57:05 INFO ZEO.StorageServer (26939/127.0.0.1:36290) pack(time=1258502400.0) started...
2009-11-18T16:15:44 INFO ZEO.StorageServer (26939/127.0.0.1:36290) pack(time=1258502400.0) complete
2009-11-18T16:37:26 INFO ZEO.StorageServer (26939/127.0.0.1:36829) pack(time=1258502400.0) started...
2009-11-18T16:39:10 INFO ZEO.StorageServer (26939/127.0.0.1:36829) pack(time=1258502400.0) complete
2009-11-18T16:42:49 INFO ZEO.StorageServer (26939/127.0.0.1:36911) pack(time=1258502400.0) started...
2009-11-18T16:44:06 INFO ZEO.StorageServer (26939/127.0.0.1:36911) pack(time=1258502400.0) complete
2009-11-18T16:55:08 INFO ZEO.StorageServer (26939/127.0.0.1:37059) pack(time=1258502400.0) started...
2009-11-18T16:57:53 INFO ZEO.StorageServer (26939/127.0.0.1:37059) pack(time=1258502400.0) complete
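To make the pack timings easier to compare, here is a rough sketch that pairs the "started..." and "complete" entries per connection and reports how long each pack took. The regular expression and helper name are my own and only assume the log line layout shown above; it is not part of zeo or zeoraid.

```python
import re
from datetime import datetime

# Matches the pack entries as they appear in the zeo logs quoted above.
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) INFO ZEO\.StorageServer \((?P<conn>[^)]+)\) "
    r"pack\(time=(?P<pack_time>[\d.]+)\) (?P<event>started|complete)"
)


def pack_durations(lines):
    """Return (start, end, seconds) for each started/complete pair."""
    started = {}
    results = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not a pack entry
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S")
        key = (match["conn"], match["pack_time"])
        if match["event"] == "started":
            started[key] = ts
        elif key in started:
            start = started.pop(key)
            results.append((start, ts, int((ts - start).total_seconds())))
    return results


# The zeo1 entries from this report.
ZEO1_LOG = [
    "2009-11-18T15:56:49 INFO ZEO.StorageServer (12276/127.0.0.1:53977) pack(time=1258502400.0) started...",
    "2009-11-18T16:14:37 INFO ZEO.StorageServer (12276/127.0.0.1:53977) pack(time=1258502400.0) complete",
    "2009-11-18T16:47:27 INFO ZEO.StorageServer (12276/127.0.0.1:54364) pack(time=1258502400.0) started...",
    "2009-11-18T16:50:08 INFO ZEO.StorageServer (12276/127.0.0.1:54364) pack(time=1258502400.0) complete",
    "2009-11-18T16:56:30 INFO ZEO.StorageServer (12276/127.0.0.1:54528) pack(time=1258502400.0) started...",
    "2009-11-18T16:57:45 INFO ZEO.StorageServer (12276/127.0.0.1:54528) pack(time=1258502400.0) complete",
]

for start, end, seconds in pack_durations(ZEO1_LOG):
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: {seconds}s")
```

On the zeo1 entries this gives 1068s, 161s and 75s, so the first pack took about 18 minutes and the repeats were much shorter.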

From zeoraid1's event log:
2009-11-18T17:03:14 INFO ZEO.zrpc.Connection(S) (127.0.0.1:54617) lastTransaction() raised exception: RAID is inconsistent and was closed.
2009-11-18T17:03:30 INFO ZEO.zrpc.Connection(S) (127.0.0.1:54634) register() raised exception: Storage has been closed.

Neither zeoraid2's event log nor its debug log indicates when zeo1 failed for the packed storage; I'll open a separate bug for that.

Now, zeo1 and zeo2 both seem to be running happily, but Packed.fs is 575 MB on zeo1 and 671 MB on zeo2.

If I try and do:

bin/zeoraid-packed-manage recover zeo1

...on zeoraid2, I *was* getting the connection refused error, but I've just tried again and it now seems to be recovering okay...
I'm shutting down zeoraid1 until recovery is complete just to be safe...

Changed in gocept.zeoraid:
assignee: nobody → Christian Theune (ct-gocept)
importance: Undecided → Critical
Changed in gocept.zeoraid:
milestone: none → 1.0b7
Christian Theune (ctheune) wrote:

So, the lastTransaction() call is very likely the monitor script trying to connect again.

Changed in gocept.zeoraid:
milestone: 1.0b7 → 1.0b8
Changed in gocept.zeoraid:
status: New → Confirmed