facter upgrade crashes puppet

Bug #885998 reported by majordom
This bug affects 19 people
Affects          Status        Importance  Assigned to  Milestone
puppet           Fix Released  Undecided   Unassigned
facter (Ubuntu)  Fix Released  Critical    Unassigned
  Lucid          Fix Released  Critical    Unassigned
  Maverick       Fix Released  Critical    Unassigned
  Natty          Fix Released  Critical    Unassigned

Bug Description

On 10/31/2011, an Ubuntu update upgraded facter from facter_1.5.6-2ubuntu2.1_all.deb to facter_1.5.6-2ubuntu2.2_all.deb.

Since then nothing works, and I get this message: "Could not run Puppet configuration client: Could not retrieve local facts: execution expired"

All the clients and the server are running the latest Ubuntu 10.04 LTS versions of puppet and puppetmaster (puppet_0.25.4-2ubuntu6.5_all and puppetmaster_0.25.4-2ubuntu6.5_all).

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in facter (Ubuntu):
status: New → Confirmed
Colin Watson (cjwatson)
tags: added: regression-update
Changed in facter (Ubuntu):
importance: Undecided → Critical
Steve Huff (shuff) wrote :

I can confirm that rolling back to facter_1.5.6-2ubuntu2 resolves the issue on 10.04 LTS.

I'm using puppet 2.6.1-0ubuntu2 from the lucid-backports repo.

In addition, I see the following traceback using facter_1.5.6-2ubuntu2.2 when running `facter --debug`:

# facter --debug
/usr/lib/ruby/1.8/timeout.rb:60:in `rbuf_fill': execution expired (Timeout::Error)
 from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout'
 from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout'
 from /usr/lib/ruby/1.8/net/protocol.rb:134:in `rbuf_fill'
 from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
 from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
 from /usr/lib/ruby/1.8/net/http.rb:2024:in `read_status_line'
 from /usr/lib/ruby/1.8/net/http.rb:2013:in `read_new'
 from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
 from /usr/lib/ruby/1.8/open-uri.rb:248:in `open_http'
 from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
 from /usr/lib/ruby/1.8/open-uri.rb:242:in `open_http'
 from /usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
 from /usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
 from /usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
 from /usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
 from /usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
 from /usr/lib/ruby/1.8/open-uri.rb:518:in `open'
 from /usr/lib/ruby/1.8/open-uri.rb:30:in `open'
 from /usr/lib/ruby/1.8/facter/ec2.rb:10:in `can_connect?'
 from /usr/lib/ruby/1.8/facter/ec2.rb:10:in `can_connect?'
 from /usr/lib/ruby/1.8/facter/ec2.rb:33
 from /usr/lib/ruby/1.8/facter/util/loader.rb:72:in `load'
 from /usr/lib/ruby/1.8/facter/util/loader.rb:72:in `load_file'
 from /usr/lib/ruby/1.8/facter/util/loader.rb:38:in `load_all'
 from /usr/lib/ruby/1.8/facter/util/loader.rb:33:in `each'
 from /usr/lib/ruby/1.8/facter/util/loader.rb:33:in `load_all'
 from /usr/lib/ruby/1.8/facter/util/loader.rb:30:in `each'
 from /usr/lib/ruby/1.8/facter/util/loader.rb:30:in `load_all'
 from /usr/lib/ruby/1.8/facter/util/collection.rb:94:in `load_all'
 from /usr/lib/ruby/1.8/facter.rb:91:in `to_hash'
 from /usr/bin/facter:138

Note the calls to ec2.rb; I would look first at the changes implemented in this branch:

https://code.launchpad.net/~gandelman-a/ubuntu/lucid/facter/lp732953_876130

-Steve

David Brewer (david-brewer) wrote :

I ran into this issue this afternoon on my own servers. It is definitely related to the ec2.rb changes in facter 1.5.6-2ubuntu2.2. As a temporary workaround, I got Puppet working again by manually editing /usr/lib/ruby/1.8/facter/ec2.rb and reverting the "can_connect?" method to the code from the previous version of the package. Below are the commented-out "new" version of the method and my inserted "old" version.

# This version of the method causes timeouts even if your machine is not using EC2
#def can_connect?(ip,port,wait_sec=2)
# url = "http://#{ip}:#{port}"
# Timeout::timeout(wait_sec) {open(url)}
# return true
#rescue
# return false
#end

# This version of the method seems to work, although I can't guarantee it works with EC2
# as my servers are not running there
def can_connect?(ip,port,wait_sec=2)
  Timeout::timeout(wait_sec) {open(ip, port)}
  return true
rescue
  return false
end
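
A possible explanation for why the "new" version lets the timeout escape, offered as an assumption rather than something stated in this thread: on Ruby 1.8, Timeout::Error appears to derive from Interrupt rather than StandardError, so a bare rescue would not catch it. The sketch below models that with a stand-in class so it runs on any Ruby version and needs no network. LegacyTimeoutError, probe_with_bare_rescue and outcome are hypothetical names, not part of facter.

```ruby
# Stand-in for an exception that is NOT a StandardError. This is an
# assumption modelling Timeout::Error on Ruby 1.8; the name is hypothetical.
class LegacyTimeoutError < Exception; end

# Same shape as the "new" can_connect?: a bare rescue at method level.
def probe_with_bare_rescue
  yield
  true
rescue # a bare rescue only catches StandardError and its subclasses
  false
end

# Runs the probe and reports whether an exception escaped it.
def outcome
  probe_with_bare_rescue { yield }
rescue Exception => e
  "escaped: #{e.class}"
end

# An ordinary connection error is a StandardError, so it is swallowed.
connection_refused = outcome { raise Errno::ECONNREFUSED }

# The stand-in timeout is not a StandardError, so it propagates.
timed_out = outcome { raise LegacyTimeoutError, 'execution expired' }

puts connection_refused.inspect
puts timed_out.inspect
```

If that assumption holds, it would match the traceback above, where the error surfaces from can_connect? while ec2.rb is being loaded.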

Jacob Helwig (jhelwig) wrote :

I just checked the change that David posted and it breaks retrieval of EC2 facts.

In the upstream repository (as of d62e079489c07201cb343f2ca109fecd62d6e567, later refactored in cc67a0148b97e315572cdb905476df1224a78dd5), the can_connect? method is only called if a couple of additional checks pass.

I'm looking for a good way to back-port some of these changes.
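
The gating described above might look roughly like the following. This is only a sketch: the thread does not say what the upstream checks are, so the prechecks here are placeholder lambdas, and load_ec2_facts and ec2_probe_needed? are hypothetical names rather than facter's API.

```ruby
# Hypothetical sketch: run the slow network probe only when cheap local
# checks suggest the host might be on EC2.
def ec2_probe_needed?(prechecks)
  prechecks.all?(&:call)
end

# prechecks: array of callables returning true/false (placeholders).
# probe: callable standing in for the can_connect? network check.
def load_ec2_facts(prechecks, probe)
  return :skipped unless ec2_probe_needed?(prechecks)

  probe.call ? :loaded : :unreachable
end

probe_calls = 0
probe = lambda do
  probe_calls += 1
  true
end

# A failing precheck means the probe is never attempted.
puts load_ec2_facts([-> { false }], probe)
# With every precheck passing, the probe runs.
puts load_ec2_facts([-> { true }], probe)
```

The point of this design is that hosts failing the cheap checks never pay for the network timeout at all.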

Jacob Helwig (jhelwig) wrote :

Adam Gandelman graciously pointed out that he already had branches to fix this. Looking at the changes, they're the minimum to fix the regression and are a better fix than the more invasive back-porting I was looking at. I've also verified his changes on lucid running on EC2, and in a VM.

Steve Langasek (vorlon) wrote : Please test proposed package

Hello majordom, or anyone else affected,

Accepted into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in facter (Ubuntu Lucid):
importance: Undecided → Critical
status: New → In Progress
Changed in facter (Ubuntu Maverick):
importance: Undecided → Critical
status: New → In Progress
Changed in facter (Ubuntu Natty):
importance: Undecided → Critical
status: New → In Progress
Changed in facter (Ubuntu Lucid):
status: In Progress → Fix Committed
Changed in facter (Ubuntu Maverick):
status: In Progress → Fix Committed
Steve Langasek (vorlon) wrote :

Hello majordom, or anyone else affected,

Accepted into maverick-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Changed in facter (Ubuntu Natty):
status: In Progress → Fix Committed
Steve Langasek (vorlon) wrote :

Hello majordom, or anyone else affected,

Accepted into natty-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Tarjei Huse (tarjei-huse) wrote :

Hi, I tested the package on Lucid and confirm that it fixes the bug.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package facter - 1.5.6-2ubuntu2.3

---------------
facter (1.5.6-2ubuntu2.3) lucid-proposed; urgency=low

  * lib/facter/ec2.rb: Rescue condition in can_connect() when timeout()
    actually has a chance to timeout. (LP: #885998)
 -- Adam Gandelman <email address hidden> Mon, 07 Nov 2011 10:18:18 -0800
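
One way to read this changelog entry is that can_connect? now rescues the timeout explicitly instead of relying on a bare rescue. The following is a minimal sketch of that pattern, not the actual patch: reachable? is a hypothetical helper, and the block stands in for the HTTP request so the example needs no network access.

```ruby
require 'timeout'

# Hypothetical helper mirroring the shape of can_connect?. The block
# stands in for the request to the metadata service.
def reachable?(wait_sec = 2)
  Timeout.timeout(wait_sec) { yield }
  true
rescue Timeout::Error
  # Named explicitly so the timeout is handled whatever its ancestry.
  false
rescue StandardError
  # Connection refused, unreachable host, and similar failures.
  false
end

slow_probe    = reachable?(0.05) { sleep 1 }
refused_probe = reachable?(1) { raise Errno::ECONNREFUSED }
quick_probe   = reachable?(1) { :ok }

puts [slow_probe, refused_probe, quick_probe].inspect
```

Naming Timeout::Error in its own rescue clause keeps the method returning false on a slow or absent endpoint regardless of where that class sits in the exception hierarchy of the running Ruby version.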

Changed in facter (Ubuntu Lucid):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Skipping the usual 7 day waiting period as this was a regression. I'll release them as soon as testing reports come in.

Paul Hirst (paul-hirst) wrote :

I tested the lucid-proposed package. The regression is fixed (in my non-EC2 environment).

Paul Hirst (paul-hirst) wrote :

Just tested the maverick package. Again, it's fixed the problem.

Paul Hirst (paul-hirst) wrote :

The natty package fixes the problem too.

Unfortunately, I don't have anywhere to test oneiric.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package facter - 1.5.7-1ubuntu1.3

---------------
facter (1.5.7-1ubuntu1.3) maverick-proposed; urgency=low

  * lib/facter/ec2.rb: Rescue condition in can_connect() when timeout()
    actually has a chance to timeout. (LP: #885998)
 -- Adam Gandelman <email address hidden> Mon, 07 Nov 2011 10:27:58 -0800

Changed in facter (Ubuntu Maverick):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package facter - 1.5.8-2ubuntu2.2

---------------
facter (1.5.8-2ubuntu2.2) natty-proposed; urgency=low

  * debian/patches/fix_ec2_metadata_facts.patch: Refreshed to rescue condition
    in can_connect() when timeout() actually has a chance to timeout.
    (LP: #885998)
 -- Adam Gandelman <email address hidden> Mon, 07 Nov 2011 10:47:01 -0800

Changed in facter (Ubuntu Natty):
status: Fix Committed → Fix Released
Martijn van Brummelen (martijn-brumit) wrote :

I confirm that 1.5.8-2ubuntu2.3 fixes the problem in facter and makes puppet run again.

Steve Huff (shuff) wrote :

Another confirmed fix in lucid (non-EC2) for 1.5.8-2ubuntu2.3, on a system that had previously exhibited the issue.

Changed in puppet:
status: New → Fix Released
Changed in facter (Ubuntu):
status: Confirmed → Fix Released