The multi-zeoraid startup problem.

Bug #486598 reported by ChrisW
This bug affects 1 person
Affects         Status     Importance  Assigned to  Milestone
gocept.zeoraid  Confirmed  Medium      Unassigned

Bug Description

So, the setup is this: one machine hosts zeoraid1 and zeo1, and another machine hosts zeoraid2 and zeo2. Both zeoraids connect to both zeos, and multiple client machines have both zeoraid1 and zeoraid2 configured as their zeo servers.

Imagine a power failure to the rack holding all these machines. When power is restored, all machines start at roughly the same time.

Bad things may happen:

- the zeoraids may come up before the zeos are ready, so both of them fail because neither has any backend available

- worst case: zeoraid1 ends up connected only to zeo2 and zeoraid2 ends up connected only to zeo1, while clients are connected to a mix of zeoraid1 and zeoraid2 and write transactions through both. zeo1 and zeo2 then get out of sync in an unrecoverable fashion :-(

The optimal case would be to have zeoraid1 and zeoraid2 up and connected to zeo1 and zeo2, with everything in sync.
An acceptable case would be to have one zeoraid up and both zeos in sync.
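
To make the cases above concrete, here is a minimal sketch that classifies a snapshot of which zeoraid is connected to which zeo. It is only an illustration of the scenarios in this report: the function `classify`, the state names, and the "degraded" state are hypothetical and are not part of gocept.zeoraid.

```python
def classify(connections, all_backends):
    """Classify a cluster snapshot (illustrative only, not zeoraid code).

    connections  -- maps each zeoraid name to the set of zeo backends it is
                    currently connected to (empty set = failed / not serving)
    all_backends -- every zeo backend that should exist
    """
    # A zeoraid with no backends cannot serve clients at all.
    live = {name: frozenset(b) for name, b in connections.items() if b}
    if not live:
        return "down"

    # If running zeoraids see different backend sets, clients writing through
    # a mix of them send transactions to different zeos: the divergence case.
    if len(set(live.values())) > 1:
        return "unsafe"

    shared_view = next(iter(live.values()))
    if shared_view == frozenset(all_backends):
        # Everything connected to everything, or only some zeoraids up
        # but all zeos still receiving the same writes.
        return "optimal" if len(live) == len(connections) else "acceptable"

    # Running zeoraids agree with each other but are missing a backend.
    return "degraded"


ALL = {"zeo1", "zeo2"}
print(classify({"zeoraid1": {"zeo2"}, "zeoraid2": {"zeo1"}}, ALL))
```

The last line feeds in the worst case described above, where each zeoraid is connected to a different single zeo.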

Christian Theune (ctheune) wrote :

A possible solution would be to allow a grace period on startup (say 10-15 seconds) during which the storages can connect successfully.

ChrisW (chris-simplistix) wrote :

Indeed, but as long as the same grace period isn't used for "zeoraid status" or "zeoraid details"...

...but I guess it wouldn't be.
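
For illustration, the grace period idea discussed above could look roughly like the sketch below. This is not how gocept.zeoraid is implemented: `wait_for_backends`, the `probes` mapping, and the parameter names are all hypothetical, and each probe stands in for whatever "try to connect to this zeo" actually means.

```python
import time


def wait_for_backends(probes, grace=15.0, interval=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll backends until all are reachable or the grace period expires.

    probes -- maps a backend name to a zero-argument callable that returns
              True once that backend is reachable (hypothetical stand-in
              for a real connection attempt)
    Returns the set of backend names that became reachable in time.
    """
    deadline = clock() + grace
    connected = set()
    while True:
        for name, probe in probes.items():
            if name not in connected and probe():
                connected.add(name)
        # Stop early once everything is up; otherwise give up at the deadline.
        if len(connected) == len(probes) or clock() >= deadline:
            return connected
        sleep(interval)
```

On startup the caller would pass a grace period in the 10-15 second range and decide what to do if the returned set is incomplete. A status-style query could call the same function with `grace=0`, which makes exactly one pass over the probes and returns immediately, so it would not be slowed down by the startup grace period.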

Changed in gocept.zeoraid:
status: New → Confirmed
importance: Undecided → Medium