MAAS

maas takes a while to recover after primary database is moved

Bug #1822618 reported by Jason Hobbs on 2019-04-01

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
MAAS	Expired	Undecided	Unassigned

Bug Description

This is with 2.5.2-7523-ge4ecbd54d-0ubuntu1~18.04.1.

We have a test that kills the PostgreSQL master, waits for the failover to happen, then tries to use MAAS.

We wait 75 seconds between killing the database master and trying to use MAAS.

However, we still occasionally get an error back from MAAS:

"500 Internal Server Error (SSL SYSCALL error: EOF detected"
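The test flow described above (kill the master, wait, then use MAAS) can also be written as a polling loop rather than a single fixed wait. The following is only a rough sketch under stated assumptions: `probe` is a hypothetical callable standing in for whatever MAAS request the test issues, and the 5-second interval is illustrative. Only the 75-second figure comes from this report.

```python
import time


def wait_until_ready(probe, timeout=75.0, interval=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or `timeout` seconds elapse.

    `probe` is a hypothetical callable: it should return True when a
    request to MAAS succeeds, and return False or raise when it fails.
    Returns the number of attempts made; raises TimeoutError otherwise.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except Exception:
            # Treat any error (for example a 500 response) as "not ready yet".
            pass
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {attempts} attempts")
        sleep(interval)
```

Recording the number of attempts and the elapsed time would show how long recovery actually takes after the failover, which a fixed sleep cannot.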

Andres Rodriguez (andreserl) wrote on 2019-04-02:

Marking this as incomplete given that:

1. There are no regiond.log files attached for any of the regions.
2. It is not clear where the error above comes from; it mentions 'SSL SYSCALL', but MAAS neither supports nor configures SSL.
3. My understanding is that the configuration for postgresql HA has recently changed. Is there any indication that this could be the result of that?
4. There's no information about the state of postgresql when you see that error. Can you confirm that it recovered successfully, that the database is up to date, and that postgresql reports no errors?

Changed in maas:
status:	New → Incomplete

Chris Gregan (cgregan) wrote on 2019-05-30:

logs-2019-05-30-11.09.37.tar (187.8 MiB, application/x-tar)

Changed in maas:
status:	Incomplete → New

Björn Tillenius (bjornt) wrote on 2019-09-20:

We recently made some changes to how MAAS detects that a database connection is broken and needs to be re-established. Do you see the same behavior with MAAS 2.6.0?
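For readers unfamiliar with the general idea, detecting a broken connection and reconnecting can be sketched roughly as below. This is not MAAS's actual implementation. `ConnectionBroken`, `connect`, and `operation` are hypothetical names used only for illustration, and a real database driver would raise its own error type.

```python
class ConnectionBroken(Exception):
    """Hypothetical stand-in for a driver error such as a dropped socket."""


def run_with_reconnect(connect, operation, max_retries=1):
    """Run operation(conn); on ConnectionBroken, reconnect and retry.

    `connect` is a hypothetical factory returning a fresh connection.
    `operation` is a hypothetical callable that uses the connection.
    """
    conn = connect()
    for attempt in range(max_retries + 1):
        try:
            return operation(conn)
        except ConnectionBroken:
            if attempt == max_retries:
                raise  # out of retries, let the caller see the error
            # Discard the broken connection and open a new one.
            conn = connect()
```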