[doc] create a production troubleshooting guide for "state changing too quickly"

Bug #2069365 reported by Leon
This bug affects 1 person
Affects: Canonical Juju
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

> The "state changing too quickly" error is a generic error when we've retried mongo transactions a few times. (@nvinuesa)

The only solution I encountered is to forcefully remove the app. This can be costly.

1. Would you be able to improve the error message?

2. It would be great if there was a doc describing how an admin could get out of this situation without forceful removals.

Harry Pidcock (hpidcock) wrote :

Which version of Juju? I would expect this to never happen on 3.x

Leon (sed-i) wrote :

This was on 3.4.3.

Pietro Pasotti (ppasotti) wrote :

Got this a couple of days ago on 3.5.0

Simon Richardson (simonrichardson) wrote :

> Would you be able to improve the error message?

This is extremely difficult to do in a generalised way with the existing tools we have at hand. We can annotate the error messages, but in reality it might not indicate the actual underlying problem emphatically.

Each time we get this, it does require investigation as the underlying problem might not be resolved with:

> The only solution I encountered is to forcefully remove the app

Having said that, it's clear that there is a specific problem.

Can we get a reproducer and a more detailed description of how it happened?

Changed in juju:
status: New → Incomplete
selcem artan (selcem) wrote :

The reproducer for this issue is described in https://docs.google.com/document/d/1DdAONbv4fRZW8j8jFTzf9VxdfslcHZYwk0X1Md-LdeM/edit .

Once the loki charm went into the Blocked state after the Juju controller upgrade, we tried to refresh the charm, which returned the error in the bug description.

Changed in juju:
status: Incomplete → New
Joseph Phillips (manadart) wrote :

Mongo transactions are collections of "operations". These operations can be CRUD or they can be assertions.

The assertions are workarounds for the lack of proper ACID guarantees. They lock down the conditions a transaction relies on at the time it runs: for example, if I am doing something that assumes there are no units on this machine, I add a transaction assertion for that.

The error happens when state is inconsistent, and we exhaust our retry budget without being able to satisfy all assertions in the transaction.
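
To make the mechanism above concrete, here is a minimal sketch of an assert-then-apply loop with a retry budget. It is a simulation over a plain dict, not Juju's transaction code: `Op`, `run_transaction`, `ExcessiveContention`, `remove_machine_ops` and the budget of 3 attempts are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


class ExcessiveContention(Exception):
    """Hypothetical error raised when the retry budget is exhausted."""


@dataclass
class Op:
    assertion: Callable[[dict], bool]  # must hold when the op is applied
    update: Callable[[dict], None]     # the change to make if it holds


def run_transaction(state, build_ops, max_attempts=3):
    """Rebuild and retry the ops until every assertion holds (assumed budget: 3)."""
    for attempt in range(max_attempts):
        ops = build_ops(state, attempt)
        if all(op.assertion(state) for op in ops):
            for op in ops:
                op.update(state)
            return attempt + 1  # number of attempts it took
    raise ExcessiveContention("state changing too quickly")


def remove_machine_ops(state, attempt):
    # Deliberately does not check for units before building the op, so the
    # assertion is the only guard. If units exist, no retry can ever pass.
    return [Op(
        assertion=lambda s: not s["machine-0"]["units"],
        update=lambda s: s.pop("machine-0"),
    )]


state = {"machine-0": {"units": ["loki/0"]}}
try:
    run_transaction(state, remove_machine_ops)
    outcome = "applied"
except ExcessiveContention as err:
    outcome = str(err)
print(outcome)
```

In this sketch nothing is actually changing quickly: the state is simply in a shape the assertion never accepts, so every attempt fails the same way and the generic error is all the caller sees.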

We do have logging that outputs failed assertions at TRACE level. I propose that we bump this to WARNING, so that they will be emitted under the default *Juju* logging level.
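
As a rough illustration of why the level matters, the sketch below uses Python's standard `logging` module (not Juju's logger) with a logger set to WARNING, which is assumed here to stand in for the default level. The `TRACE` level value, the logger names and the message text are hypothetical.

```python
import io
import logging

TRACE = 5  # hypothetical numeric level; Python's logging has no built-in TRACE
logging.addLevelName(TRACE, "TRACE")


def emitted_lines(assertion_level):
    """Return the lines that reach the log when the logger runs at WARNING."""
    buf = io.StringIO()
    logger = logging.getLogger(f"txn-sketch-{assertion_level}")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(buf)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    # The failed assertion is logged at the level under test...
    logger.log(assertion_level, "assertion failed: %s", "machine-0 has no units")
    # ...while the final, generic failure is always logged as an error.
    logger.error("state changing too quickly")
    return buf.getvalue().splitlines()


at_trace = emitted_lines(TRACE)
at_warning = emitted_lines(logging.WARNING)
print(at_trace)
print(at_warning)
```

With the assertion logged at TRACE only the generic error appears; logged at WARNING, the failed assertion shows up next to it.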

These warnings will be transient in some cases, because a retried transaction may subsequently succeed, but at least we would have some indication as to what caused a given wholesale failure.

I'll discuss with the team.

Natalia Litvinova (natalytvinova) wrote :

Encountered this on Juju 3.5.1, more details on the bundle are here: https://github.com/canonical/grafana-k8s-operator/issues/339

John A Meinel (jameinel) wrote :

I'd be ok with reporting more on failed assertions, but it is fairly likely
to not quite give enough context. The basic form of the assertion is "a
document of this shape exists/doesn't exist". So you might put in a thing
that says "the field foo has the value 'bar'", and all we know is that we
are unable to query for a document that has the value 'bar'. It may be that
the document doesn't exist at all. It may be that the document exists but
has the value "bing". It may be that the document exists but doesn't have
the field, etc.
In general, if you get "state changing too quickly" it is almost always a
"poorly assessed transaction, that has a prerequisite that isn't being
checked correctly". And if we were checking it correctly, then you'd be
getting a better error message. It isn't something that can be achieved
holistically.
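
The ambiguity described above can be shown with a small sketch. It models an assertion as "a document with this id and these field values exists" over a plain dict; `assertion_holds` and the sample documents are hypothetical and are not how Juju or MongoDB evaluate assertions internally.

```python
def assertion_holds(collection, doc_id, required):
    """True only if a document with this id matches every required field."""
    doc = collection.get(doc_id)
    return doc is not None and all(
        key in doc and doc[key] == value for key, value in required.items()
    )


required = {"foo": "bar"}
cases = {
    "document missing": {},
    "field has another value": {"app": {"foo": "bing"}},
    "field absent": {"app": {"other": 1}},
    "matches": {"app": {"foo": "bar"}},
}
results = {
    name: assertion_holds(collection, "app", required)
    for name, collection in cases.items()
}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

The three failing cases return the same `False`, so reporting the failed assertion alone cannot say which of them actually happened.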

Looking through:
https://docs.google.com/document/d/1DdAONbv4fRZW8j8jFTzf9VxdfslcHZYwk0X1Md-LdeM/edit

The first issue appears to be a bug that we have addressed (during
migration if you had a model that was referenced multiple times, it could
cause a conflict.)

The `juju-debug-log` issue is more a case of the framework's own data
being inconsistent. (It clearly has a reference to an event, but that
event's snapshot doesn't appear to exist.)

Natalia Litvinova (natalytvinova) wrote :

In addition to the bug, my problem was caused by a controller upgrade with model migration. When I renamed the offer to a different name, the integration went fine.
