curtin should retry fetching from archives after transient failure

Bug #1403133 reported by Larry Michel
This bug affects 2 people
Affects          Status        Importance  Assigned to  Milestone
curtin           Fix Released  High        Blake Rouse
curtin (Ubuntu)  Fix Released  Medium      Unassigned
  Trusty         Fix Released  Medium      Unassigned
  Vivid          Fix Released  Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Description]
During installation, curtin runs 'apt-get update' in the target root. This is required before new packages can be installed in the target.

'apt-get update' is widely known to fail as a result of transient network failures. This is commonly worked around by simply sleeping and re-trying the operation.

The solution implemented is to improve the 'subp' (subprocess) helper in curtin/util to take a 'retries' argument.
If provided, it is an iterable whose items are the times to sleep before each retry. If no 'retries' argument is provided, the command is tried only once.
The curtin/util.py helper apt_update then invokes subp with retries=(1, 2, 3).
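The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not curtin's actual code: the real subp takes more arguments, the 'sleep' parameter here is a hypothetical hook added so the waiting can be stubbed out, and the sleep times are treated as seconds only because this sketch passes them to time.sleep. The apt-get command line is the one shown in the console output of this report.

```python
import subprocess
import time


def subp(args, retries=None, sleep=time.sleep):
    """Run args; after a non-zero exit, sleep and retry once per item in retries.

    Simplified stand-in, not curtin's actual subp. The 'sleep' parameter is
    hypothetical and exists only so the waiting can be stubbed out.
    """
    waits = list(retries) if retries is not None else []
    for attempt in range(len(waits) + 1):
        proc = subprocess.run(args, capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout, proc.stderr
        if attempt == len(waits):
            # No sleep times left: report the final failure to the caller.
            raise subprocess.CalledProcessError(
                proc.returncode, args, proc.stdout, proc.stderr)
        sleep(waits[attempt])


def apt_update(target):
    """Run 'apt-get update' in the target root, sleeping 1, 2, 3 between tries."""
    return subp(['chroot', target, 'apt-get', 'update', '--quiet'],
                retries=(1, 2, 3))
```

With retries=None the loop body runs exactly once, which matches the "only one try" case in the description.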

[Impact]
Installation fails when a simple retry of 'apt-get update' would have succeeded.

[Test Case]
As this is a transient failure, it is hard to catch and hard to test for.

Installation should be more reliable now, with any 'apt-get update' operation that returns non-zero being retried up to 3 times.
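Although the real failure is hard to reproduce, the retry behaviour itself can be exercised against a stand-in that fails a fixed number of times. The sketch below is only an illustration under stated assumptions: run_with_retries and FlakyAptUpdate are hypothetical names, not part of curtin, and the exit code 100 is taken from the console output in this report.

```python
def run_with_retries(run, waits, sleep):
    """Call run() until it returns 0, sleeping for each item in waits between tries."""
    waits = list(waits)
    for attempt in range(len(waits) + 1):
        rc = run()
        if rc == 0 or attempt == len(waits):
            return rc
        sleep(waits[attempt])


class FlakyAptUpdate:
    """Stand-in for 'apt-get update': exits 100 for the first `failures` calls, then 0."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return 100 if self.calls <= self.failures else 0


# Two transient failures are absorbed: three calls in total, two sleeps.
slept = []
flaky = FlakyAptUpdate(failures=2)
result = run_with_retries(flaky, (1, 2, 3), slept.append)

# A persistent failure is still reported after 1 initial try plus 3 retries.
broken = FlakyAptUpdate(failures=10)
broken_result = run_with_retries(broken, (1, 2, 3), lambda wait: None)
```

Passing the sleep function in as an argument keeps the check fast, since no real waiting happens.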

[Regression Potential]
The only likely regression path here would be retrying 'apt-get update' after it has returned successfully. That seems fairly unlikely, as the code in subp that checks the exit status has not changed.

[Other]
Related bugs:
 * bug 972077: apt repository disk format has race conditions
=== End SRU Template ===

We run into transient network issues where index files fail to download, and the deployment ends up being marked as failed. A subsequent deployment succeeds, but by then the test has already failed. Curtin should be able to retry when such an error happens.

Here's console output:

========================================================================
Get:28 http://archive.ubuntu.com trusty-security/multiverse Translation-en [587 B]
Get:29 http://archive.ubuntu.com trusty-security/restricted Translation-en [2266 B]
Get:30 http://archive.ubuntu.com trusty-security/universe Translation-en [41.5 kB]
Fetched 13.8 MB in 5s (2426 kB/s)
W: Failed to fetch http://archive.ubuntu.com//ubuntu/dists/trusty-updates/universe/i18n/Translation-en Hash Sum mismatch

E: Some index files failed to download. They have been ignored, or old ones used instead.
Unexpected error while running command.
Command: ['chroot', '/tmp/tmp8mxme7/target', 'apt-get', 'update', '--quiet']
Exit code: 100
Reason: -
Stdout: ''
Stderr: ''
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'curthooks']
Exit code: 3
Reason: -
Stdout: "Ign http://archive.ubuntu.com trusty InRelease\nIgn http://archive.ubuntu.com trusty-updates InRelease\nIgn http://archive.ubuntu.com trusty-security InRelease\nGet:1 http://archive.ubuntu.com trusty Release.gpg [933 B]\nGet:2 http://archive.ubuntu.com trusty-updates Release.gpg [933 B]\nGet:3 http://archive.ubuntu.com trusty-security Release.gpg [933 B]\nGet:4 http://archive.ubuntu.com trusty Release [58.5 kB]\nGet:5 http://archive.ubuntu.com trusty-updates Release [62.0 kB]\nGet:6 http://archive.ubuntu.com trusty-security Release [62.0 kB]\nGet:7 http://archive.ubuntu.com trusty/main amd64 Packages [1350 kB]\nGet:8 http://archive.ubuntu.com trusty/restricted amd64 Packages [13.0 kB]\nGet:9 http://archive.ubuntu.com trusty/universe amd64 Packages [5859 kB]\nGet:10 http://archive.ubuntu.com trusty/multiverse amd64 Packages [132 kB]\nGet:11 http://archive.ubuntu.com trusty/main Translation-en [762 kB]\nGet:12 http://archive.ubuntu.com trusty/multiverse Translation-en [102 kB]\nGet:13 http://archive.ubuntu.com trusty/restricted Translation-en [3457 B]\nGet:14 http://archive.ubuntu.com trusty/universe Translation-en [4089 kB]\nGet:15 http://archive.ubuntu.com trusty-updates/main amd64 Packages [384 kB]\nGet:16 http://archive.ubuntu.com trusty-updates/restricted amd64 Packages [8861 B]\nGet:17 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [228 kB]\nGet:18 http://archive.ubuntu.com trusty-updates/multiverse amd64 Packages [9356 B]\nGet:19 http://archive.ubuntu.com trusty-updates/main Translation-en [179 kB]\nGet:20 http://archive.ubuntu.com trusty-updates/multiverse Translation-en [4719 B]\nGet:21 http://archive.ubuntu.com trusty-updates/restricted Translation-en [2266 B]\nGet:22 http://archive.ubuntu.com trusty-updates/universe Translation-en [117 kB]\nGet:23 http://archive.ubuntu.com trusty-security/main amd64 Packages [181 kB]\nGet:24 http://archive.ubuntu.com trusty-security/restricted amd64 Packages [8861 B]\nGet:25 
http://archive.ubuntu.com trusty-security/universe amd64 Packages [76.0 kB]\nGet:26 http://archive.ubuntu.com trusty-security/multiverse amd64 Packages [1143 B]\nGet:27 http://archive.ubuntu.com trusty-security/main Translation-en [90.8 kB]\nGet:28 http://archive.ubuntu.com trusty-security/multiverse Translation-en [587 B]\nGet:29 http://archive.ubuntu.com trusty-security/restricted Translation-en [2266 B]\nGet:30 http://archive.ubuntu.com trusty-security/universe Translation-en [41.5 kB]\nFetched 13.8 MB in 5s (2426 kB/s)\nW: Failed to fetch http://archive.ubuntu.com//ubuntu/dists/trusty-updates/universe/i18n/Translation-en Hash Sum mismatch\n\nE: Some index files failed to download. They have been ignored, or old ones used instead.\nUnexpected error while running command.\nCommand: ['chroot', '/tmp/tmp8mxme7/target', 'apt-get', 'update', '--quiet']\nExit code: 100\nReason: -\nStdout: ''\nStderr: ''\n"
Stderr: ''
Success
ci-info: +++++++Authorized keys
========================================================================

Larry Michel (lmic)
description: updated
tags: added: oil
Changed in curtin:
status: New → Triaged
importance: Undecided → High
Blake Rouse (blake-rouse) wrote :
Changed in curtin:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
status: In Progress → Fix Committed
Scott Moser (smoser)
Changed in curtin (Ubuntu):
importance: Undecided → Medium
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr213-0ubuntu1

---------------
curtin (0.1.0~bzr213-0ubuntu1) wily; urgency=medium

  * New upstream snapshot.
    * retry apt-get update to avoid transient failures (LP: #1403133)
    * detect and handle multipath devices (LP: #1371634)
    * udevadm settle before unmounting target's /dev (LP: #1462139)
    * doc/ improved developer doc and tools using maas images for test
    * use --no-nvram option to grub-install if available (LP: #1311827)

 -- Scott Moser <email address hidden> Fri, 05 Jun 2015 15:06:31 -0400

Changed in curtin (Ubuntu):
status: In Progress → Fix Released
Scott Moser (smoser)
Changed in curtin (Ubuntu Trusty):
status: New → Confirmed
Changed in curtin (Ubuntu Vivid):
status: New → Confirmed
Changed in curtin (Ubuntu Trusty):
importance: Undecided → Medium
Changed in curtin (Ubuntu Vivid):
importance: Undecided → Medium
Scott Moser (smoser)
description: updated
Chris J Arges (arges) wrote : Please test proposed package

Hello Larry, or anyone else affected,

Accepted curtin into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr221-0ubuntu1~14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Chris J Arges (arges) wrote :

Hello Larry, or anyone else affected,

Accepted curtin into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr221-0ubuntu1~14.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Vivid):
status: Confirmed → Fix Committed
Scott Moser (smoser)
tags: added: verification-done
removed: verification-needed
Scott Moser (smoser) wrote :

Attaching a console log showing the curtin retry.

Scott Moser (smoser) wrote :

Related information: I opened bug 1474417 against squid-deb-proxy, which did not treat updates of translation files the way it did updates of Packages/Release files. The fix for that bug will help reduce translation-based hash sum mismatches.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr221-0ubuntu1~14.04.1

---------------
curtin (0.1.0~bzr221-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * New upstream snapshot.
    - support installation to multipath devices. (LP: #1371634)
    - know that kernel version 4.2.0 maps to linux-generic-lts-wily
    - support install to arm64 systems that use UEFI for boot (LP: #1447834)
    - fix remaining usage of 'lsblk --out' rather than 'lsblk --output'
      (LP: #1386275)
    - retry 'apt-get update' on failure to avoid transient failures
      (LP: #1403133)
    - run udevadm settle before unmounting /dev in a target to avoid transient
      failures (LP: #1462139)
    - fixes and additions to tools used in development.
    - Add --no-nvram to the grub-install command for UEFI. (LP: #1311827)
    - avoid race condition and transient failure due busy device in mkfs
      (LP: #1443542)
    - improvements to device and partition naming code which allow installation
      devices with HP cciss smart array drives(LP: #1401190, #1263181)
    - do not consider devices < 1G as installable targets
  * debian/README.source fix doc on how to create new upstream snapshots

 -- Scott Moser <email address hidden> Wed, 24 Jun 2015 14:31:14 -0400

Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr221-0ubuntu1~14.10.1

---------------
curtin (0.1.0~bzr221-0ubuntu1~14.10.1) vivid-proposed; urgency=medium

  * New upstream snapshot.
    - support installation to multipath devices. (LP: #1371634)
    - know that kernel version 4.2.0 maps to linux-generic-lts-wily
    - support install to arm64 systems that use UEFI for boot (LP: #1447834)
    - fix remaining usage of 'lsblk --out' rather than 'lsblk --output'
      (LP: #1386275)
    - retry 'apt-get update' on failure to avoid transient failures
      (LP: #1403133)
    - run udevadm settle before unmounting /dev in a target to avoid transient
      failures (LP: #1462139)
    - fixes and additions to tools used in development.
    - Add --no-nvram to the grub-install command for UEFI. (LP: #1311827)
    - avoid race condition and transient failure due busy device in mkfs
      (LP: #1443542)
    - improvements to device and partition naming code which allow installation
      devices with HP cciss smart array drives(LP: #1401190, #1263181)
    - do not consider devices < 1G as installable targets
  * debian/README.source fix doc on how to create new upstream snapshots

 -- Scott Moser <email address hidden> Wed, 24 Jun 2015 16:12:59 -0400

Changed in curtin (Ubuntu Vivid):
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin 17.1. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released