Restore failed: error fetching address

Bug #1499571 reported by Curtis Hovey
This bug affects 1 person
Affects     Status         Importance   Assigned to     Milestone
juju-core   Fix Released   Critical     Michael Foord
1.24        Fix Released   Critical     Michael Foord
1.25        Fix Released   Critical     Michael Foord

Bug Description

Restore can fail when bootstrapping a new server and the provider is slow to provide public addresses, as seen in
    http://reports.vapour.ws/releases/3102/job/functional-ha-backup-restore/attempt/2512

    http://reports.vapour.ws/releases/issue/5604a1a7749a561f57c47dc4

This looks like a timing-related issue. Retrying or waiting longer might fix this.

Curtis Hovey (sinzui)
summary: - public no address
+ Restore failed: error fetching address
Curtis Hovey (sinzui)
description: updated
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Medium → Critical
importance: Critical → Medium
Curtis Hovey (sinzui)
description: updated
Curtis Hovey (sinzui)
tags: added: blocker ci regression
Michael Foord (mfoord) wrote :

These failures are definitely related to the recent changes around determining the public (and private) address of a machine, at least in that they hit that code. "no address" is a new error, returned when no preferred address has been set.

However, the restore code does set machine and provider addresses using the standard machine.Set*Addresses methods, which ought to set the preferred addresses (in state/backups/restore.go:updateMachineAddresses).

I'll dig in a bit further and see if I can find anything more.

Michael Foord (mfoord) wrote :

Ok, I think I have it. When a new machine is created (in this case the bootstrap machine), a new record is created using State.machineDocForTemplates. This *does not* set any preferred addresses, so PreferredPublicAddress (and Private) will return a "no address" error until the first time SetMachineAddresses or SetProviderAddresses is called. I can update the creation of new machines to also set the preferred addresses.
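
The behaviour described above can be illustrated with a small toy model. This is only a sketch, not juju's actual code: the names machineDoc, newMachineDoc, setAddresses, publicAddress and errNoAddress are hypothetical, and picking the first address as the preferred one is a simplifying assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoAddress is a hypothetical stand-in for the "no address" error.
var errNoAddress = errors.New("no address")

// machineDoc is a toy model of a machine record, not juju's real schema.
type machineDoc struct {
	addresses              []string
	preferredPublicAddress string
}

// newMachineDoc models creating a record from a template.
// setPreferred=false models the behaviour described in the comment above;
// setPreferred=true models the proposed fix.
func newMachineDoc(addresses []string, setPreferred bool) *machineDoc {
	doc := &machineDoc{addresses: addresses}
	if setPreferred && len(addresses) > 0 {
		// Assumption: the first address is treated as the preferred one.
		doc.preferredPublicAddress = addresses[0]
	}
	return doc
}

// setAddresses models a later Set*Addresses call, which fills in the
// preferred address if it has not been set yet.
func (d *machineDoc) setAddresses(addresses []string) {
	d.addresses = addresses
	if d.preferredPublicAddress == "" && len(addresses) > 0 {
		d.preferredPublicAddress = addresses[0]
	}
}

// publicAddress returns the preferred public address, or errNoAddress
// if none has been set.
func (d *machineDoc) publicAddress() (string, error) {
	if d.preferredPublicAddress == "" {
		return "", errNoAddress
	}
	return d.preferredPublicAddress, nil
}

func main() {
	addrs := []string{"203.0.113.10"}

	// Without the fix: the lookup fails until setAddresses is called.
	before := newMachineDoc(addrs, false)
	_, err := before.publicAddress()
	fmt.Println("without fix, at creation:", err)
	before.setAddresses(addrs)
	addr, _ := before.publicAddress()
	fmt.Println("without fix, after setAddresses:", addr)

	// With the fix: the address is available immediately.
	after := newMachineDoc(addrs, true)
	addr, _ = after.publicAddress()
	fmt.Println("with fix, at creation:", addr)
}
```

In this model, a restore that asks for the public address right after the record is created fails only when it runs before the first setAddresses call, which is consistent with a failure that depends on timing.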

Michael Foord (mfoord)
Changed in juju-core:
assignee: nobody → Michael Foord (mfoord)
Michael Foord (mfoord)
Changed in juju-core:
status: Triaged → In Progress
Canonical Juju QA Bot (juju-qa-bot) wrote : Fix Released in juju-core 1.24

Juju-CI verified that this issue is Fix Released in juju-core 1.24:
    http://reports.vapour.ws/releases/3144

Canonical Juju QA Bot (juju-qa-bot) wrote : Fix Released in juju-core 1.25

Juju-CI verified that this issue is Fix Released in juju-core 1.25:
    http://reports.vapour.ws/releases/3145

Martin Packman (gz) wrote :

This is an intermittent failure; the blesses on 1.24 and 1.25 do not mean it has been fixed. The regression is due to bug 1435283, which is present on all three branches.

tags: added: intermittent-failure
Michael Foord (mfoord) wrote :

It *may* have been intermittent because as soon as SetMachineAddresses or SetProviderAddresses was called on the bootstrap machine the PublicAddress and PrivateAddress would become available. With the fix already landed they should be available immediately.

Have they failed since my fix landed?

Martin Packman (gz) wrote :

I had not realised these fixes had landed:

<https://github.com/juju/juju/pull/3446>
<https://github.com/juju/juju/pull/3447>

And this forward port is ready to go for master:

<https://github.com/juju/juju/pull/3448>

Changed in juju-core:
importance: Medium → Critical
Michael Foord (mfoord) wrote :

A misunderstanding. mgz didn't see that I'd landed a fix and thought the transition to "fix released" was spurious.

tags: removed: intermittent-failure
Curtis Hovey (sinzui) wrote :

The commits to 1.24 and 1.25 match the tested and blessed revisions in CI, so these branches are fix committed.

http://reports.vapour.ws/releases/issue/5604a1a7749a561f57c47dc4 shows that master did not have this issue at the time of the bug report, but on Friday, Oct 2 it appears and is a cause of failure for 3 builds, so the bad behaviour was eventually merged into master and now needs fixing.

Changed in juju-core:
milestone: none → 1.26-alpha1
Martin Packman (gz)
Changed in juju-core:
status: In Progress → Fix Committed
description: updated
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released