cache-miss-rate / Invalid JSON

Bug #1585533 reported by Marc-Aurele Brothier
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

Hi,

We have VMs which were started with an older version than qemu 2.1 which added "cache-miss-rate" property for XBZRLECacheStats. While trying to migrate the VM to a new host which is running a higher version (2.3) of Qemu we got an exception:

virJSONValueFromString:1642 : internal error: cannot parse json {"return": {"expected-downtime": 1, "xbzrle-cache": {"bytes": 0, "cache-size": 67108864, "cache-miss-rate": -nan, "pages": 0, "overflow": 0, "cache-miss": 8933}, "status": "active", "disk": {"total": 429496729600, "dirty-sync-count": 0, "remaining": 193896382464, "mbps": 0, "transferred": 235600347136, "duplicate": 0, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 0, "normal": 0}, "setup-time": 13, "total-time": 1543124, "ram": {"total": 8599183360, "dirty-sync-count": 4, "remaining": 30695424, "mbps": 830.636997, "transferred": 3100448901, "duplicate": 1358341, "dirty-pages-rate": 7, "skipped": 0, "normal-bytes": 3082199040, "normal": 752490}}, "id": "libvirt-186200"}: lexical error: malformed number, a digit is required after the minus sign.
          67108864, "cache-miss-rate": -nan, "pages": 0, "overflow": 0
                     (right here) ------^

virNetClientStreamRaiseError:191 : stream aborted at client request

Would it be possible to improve the JSON parser to skip the key if the value is incorrect instead of throwing an exception? Then hopefully qemu 2.3 or higher is able to handle the data without this property, falling back to its default.

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

Sorry, the bug is in libvirt

Changed in qemu:
status: New → Invalid
Revision history for this message
Eric Blake (eblake) wrote : Re: [Qemu-devel] [Bug 1585533] [NEW] cache-miss-rate / Invalid JSON

On 05/25/2016 02:46 AM, Marc Brothier wrote:
> Public bug reported:
>
> Hi,
>
> We have VMs which were started with an older version than qemu 2.1 which
> added "cache-miss-rate" property for XBZRLECacheStats. While trying to
> migrate the VM to a new host which is running a higher version (2.3) of
> Qemu we got an exception:
>
> virJSONValueFromString:1642 : internal error: cannot parse json {"return": {"expected-downtime": 1, "xbzrle-cache": {"bytes": 0, "cache-size": 67108864, "cache-miss-rate": -nan, "pages": 0, "overflow": 0, "cache-miss": 8933}, "status": "active", "disk": {"total": 429496729600, "dirty-sync-count": 0, "remaining": 193896382464, "mbps": 0, "transferred": 235600347136, "duplicate": 0, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 0, "normal": 0}, "setup-time": 13, "total-time": 1543124, "ram": {"total": 8599183360, "dirty-sync-count": 4, "remaining": 30695424, "mbps": 830.636997, "transferred": 3100448901, "duplicate": 1358341, "dirty-pages-rate": 7, "skipped": 0, "normal-bytes": 3082199040, "normal": 752490}}, "id": "libvirt-186200"}: lexical error: malformed number, a digit is required after the minus sign.
> 67108864, "cache-miss-rate": -nan, "pages": 0, "overflow": 0
> (right here) ------^
>
> virNetClientStreamRaiseError:191 : stream aborted at client request

Wow - I've known we have a problem with qemu emitting non-compliant
JSON, but this proves that it is fatal to libvirt. I guess my series on
improving the JSON parser [1] should consider doing a fallback to
s/NaN/0/ and s/Inf/DBL_MAX/ rather than completely erroring out when a
client tries to request it. Meanwhile, it's an easy patch to qemu to
avoid division by zero when generating cache-miss-rate.

[1] https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg03424.html

>
>
> Would it be possible to improve the JSON parser to skip the key if the value is incorrect

Libvirt uses libyajl to parse JSON, and libyajl has an outstanding bug
request to support extensions to JSON such as parsing non-finite floats.
 Since there has been no upstream reaction to the bug request, I
seriously doubt it will happen any time soon, so any change to tolerate
NaN in libvirt would have to be a one-off patch. It sounds like the
better fix is to make qemu emit valid JSON in the first place, rather
than making libvirt deal with broken JSON from qemu.

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

Ok, then I'll reopen the bug with a request to fix JSON output in qemu.

Changed in qemu:
status: Invalid → New
Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

Also found a patch in qemu 2.4.0 (?) which fix a zero division: https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg01173.html which might be the source of the problem.

Revision history for this message
Paolo Bonzini (bonzini) wrote :

Marc, what distro are you running on? QEMU 2.3 is not maintained anymore upstream, so unless you are running on Ubuntu (and then we can reuse this bug tracker) you'll have to reopen the bug in your distro.

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

We're using Ubuntu, and we manually patched the version 2.3 with the fix referenced. It will be soon deployed and I'll see if that fixes the problem.

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

We tried to migrate a VM, and now we have a new error:

error : virNetClientProgramDispatchError:177 : internal error: info migration reply was missing return status

:(

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

I didn't look properly, it's the same error with the patch applied.

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

Well, to make this work we updated the libvirt code to correct such invalid value and put a "0" in that case for the "cache-miss-rate" value. Can someone confirm that it putting a "0" is a valid choice, or shall we have something else?

Thanks

Revision history for this message
Thomas Huth (th-huth) wrote :

Is there still something to be done for upstream QEMU here? ... otherwise, I assume we can close this bug now?

Changed in qemu:
status: New → Incomplete
Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

I'm not able to test that issue anymore, you can close the ticket.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.