amulet _get_proc_start_time has a race which causes service restart checks to fail
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| Charm Helpers |
Medium
|
Ryan Beisner | ||
| cinder (Juju Charms Collection) |
Medium
|
Ryan Beisner | ||
| glance (Juju Charms Collection) |
Medium
|
Ryan Beisner | ||
| heat (Juju Charms Collection) |
Medium
|
Ryan Beisner | ||
| neutron-api (Juju Charms Collection) |
Medium
|
Ryan Beisner | ||
| neutron-gateway (Juju Charms Collection) |
Medium
|
Ryan Beisner | ||
| neutron-openvswitch (Juju Charms Collection) |
Medium
|
Ryan Beisner | ||
| openstack-dashboard (Juju Charms Collection) |
Medium
|
Ryan Beisner |
Bug Description
Although most frequently observed in the openstack-
The test is to make a charm config change, then inspect the unit for process start time, and compare that start time with file modification time of the relevant config file. In some cases, that arbitrary sleep results in the _get_proc_
2015-07-13 14:57:39,934 test_900_
2015-07-13 14:57:40,527 test_900_
2015-07-13 14:59:29,632 _get_proc_
2015-07-13 14:59:29,632 _get_proc_
Traceback (most recent call last):
File "tests/
deployment.
File "/home/
getattr(self, test)()
File "/home/
pgrep_
File "/home/
if (self._
File "/home/
return self._get_
File "/home/
return sentry_
File "/usr/lib/
return self._fs_data(path)
File "/usr/lib/
return self._run_
File "/usr/lib/
raise IOError(output)
IOError: Traceback (most recent call last):
File "/tmp/amulet/
s = os.stat(
OSError: [Errno 2] No such file or directory: '/proc/1242'
ERROR subprocess encountered error code 1
real 17m21.555s
Related branches
- Billy Olsen: Pending requested 2015-09-01
- OpenStack Charmers: Pending requested 2015-08-27
- Corey Bryant: Pending requested 2015-08-27
- James Page: Pending requested 2015-08-27
-
Diff: 965 lines (+362/-232)8 files modifiedMakefile (+11/-12)
tests/00-setup (+7/-3)
tests/020-basic-trusty-liberty (+11/-0)
tests/021-basic-wily-liberty (+9/-0)
tests/README (+33/-0)
tests/basic_deployment.py (+171/-164)
tests/charmhelpers/contrib/amulet/utils.py (+101/-53)
tests/tests.yaml (+19/-0)
- Billy Olsen: Approve on 2015-09-02
- Adam Collard (community): Abstain on 2015-09-01
- charmers: Pending requested 2015-08-25
-
Diff: 241 lines (+101/-53)1 file modifiedcharmhelpers/contrib/amulet/utils.py (+101/-53)
- Billy Olsen: Pending requested 2015-09-01
- OpenStack Charmers: Pending requested 2015-08-24
-
Diff: 699 lines (+287/-180)7 files modifiedMakefile (+10/-10)
metadata.yaml (+3/-1)
tests/00-setup (+7/-3)
tests/README (+10/-0)
tests/basic_deployment.py (+136/-113)
tests/charmhelpers/contrib/amulet/utils.py (+101/-53)
tests/tests.yaml (+20/-0)
- Billy Olsen: Pending requested 2015-09-02
- OpenStack Charmers: Pending requested 2015-08-28
-
Diff: 335 lines (+118/-65)5 files modifiedMakefile (+1/-1)
tests/00-setup (+3/-1)
tests/basic_deployment.py (+11/-9)
tests/charmhelpers/contrib/amulet/utils.py (+101/-53)
tests/tests.yaml (+2/-1)
- Chris Glass (community): Approve on 2015-09-22
-
Diff: 122 lines (+47/-16)1 file modifiedcharmhelpers/contrib/amulet/utils.py (+47/-16)
- Liam Young: Approve on 2015-09-28
-
Diff: 2164 lines (+1242/-341)11 files modifiedMakefile (+1/-1)
tests/00-setup (+7/-3)
tests/020-basic-trusty-liberty (+11/-0)
tests/021-basic-wily-liberty (+9/-0)
tests/README (+10/-0)
tests/basic_deployment.py (+514/-264)
tests/charmhelpers/contrib/amulet/deployment.py (+4/-2)
tests/charmhelpers/contrib/amulet/utils.py (+284/-62)
tests/charmhelpers/contrib/openstack/amulet/deployment.py (+23/-9)
tests/charmhelpers/contrib/openstack/amulet/utils.py (+359/-0)
tests/tests.yaml (+20/-0)
- David Ames: Approve on 2016-01-21
-
Diff: 71 lines (+17/-13)1 file modifiedtests/basic_deployment.py (+17/-13)
- David Ames: Approve on 2016-01-21
-
Diff: 190 lines (+76/-25)1 file modifiedtests/basic_deployment.py (+76/-25)
- OpenStack Charmers: Pending requested 2016-01-13
-
Diff: 75 lines (+17/-10)1 file modifiedtests/basic_deployment.py (+17/-10)
no longer affects: | neutron-gateway (Ubuntu) |
Ryan Beisner (1chb1n) wrote : | #1 |
Adam Collard (adam-collard) wrote : | #2 |
I hit this same traceback but diagnosed it differently.
When pgrep_full is used, it looks for the script name in the full command line of all processes. Unfortunately "juju run pgrep foo" is also running, so when we pgrep -a for foo we'll find the juju process. In charm-helpers, there's a "| grep -v pgrep" which I assume is trying to work around this case. Given that the main call to pgrep is only configured to output PIDs though, it's something which will never get excluded. The fix I was going to propose added -a to the pgrep command and then processed the results accordingly.
We should still have handling for when pgrep finds a PID but then fails to look at the mtime of /proc/$pid though since the process might well go away between the two commands.
Ryan Beisner (1chb1n) wrote : | #3 |
I'm not sure I completely follow. What I do know, is I have not experienced the race when using the proposed changes, where it was actually fairly prevalent with a few of these charms. I'd sure entertain further clarification or an example though. :-)
Ryan Beisner (1chb1n) wrote : | #4 |
Note that pgrep on precise doesn't take have the -a option. :-/
Ryan Beisner (1chb1n) wrote : | #5 |
Shifted gears to use pidof instead of pgrep, via an existing helper. Manual tests against keystone, neutron-api and openstack-dashboard are OK. That encompasses charm tests which had previously used pgrep_full True and pgrep_full False. So, functionally backwards-
Changed in neutron-api (Juju Charms Collection): | |
importance: | Undecided → Medium |
Changed in neutron-gateway (Juju Charms Collection): | |
importance: | Undecided → Medium |
Changed in openstack-dashboard (Juju Charms Collection): | |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in neutron-gateway (Juju Charms Collection): | |
status: | New → In Progress |
Changed in neutron-api (Juju Charms Collection): | |
status: | New → In Progress |
Changed in openstack-dashboard (Juju Charms Collection): | |
milestone: | none → 15.10 |
Changed in neutron-gateway (Juju Charms Collection): | |
milestone: | none → 15.10 |
Changed in neutron-api (Juju Charms Collection): | |
milestone: | none → 15.10 |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in neutron-gateway (Juju Charms Collection): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in openstack-dashboard (Juju Charms Collection): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in neutron-api (Juju Charms Collection): | |
status: | In Progress → Fix Committed |
Changed in charm-helpers: | |
status: | New → Fix Released |
Changed in openstack-dashboard (Juju Charms Collection): | |
status: | In Progress → Fix Committed |
Ryan Beisner (1chb1n) wrote : | #6 |
Similar race, nearby place, will propose something more resilient in the form of an update to contrib/
13:29:22 Traceback (most recent call last):
13:29:22 File "tests/
13:29:22 deployment.
13:29:22 File "/var/lib/
13:29:22 getattr(self, test)()
13:29:22 File "/var/lib/
13:29:22 pgrep_full=True, sleep_time=time):
13:29:22 File "/var/lib/
13:29:22 self._get_
13:29:22 File "/var/lib/
13:29:22 return sentry_
13:29:22 File "/usr/lib/
13:29:22 return self._fs_
13:29:22 File "/usr/lib/
13:29:22 return self._run_
13:29:22 File "/usr/lib/
13:29:22 raise IOError(output)
13:29:22 IOError: Warning: Permanently added '172.17.105.93' (ECDSA) to the list of known hosts.
Ryan Beisner (1chb1n) wrote : | #7 |
Neutron-gateway & nova-cloud-
Ryan Beisner (1chb1n) wrote : | #8 |
Status update: we fixed the process-
Changed in neutron-gateway (Juju Charms Collection): | |
status: | In Progress → Fix Committed |
Changed in neutron-api (Juju Charms Collection): | |
status: | Fix Committed → Fix Released |
Changed in neutron-gateway (Juju Charms Collection): | |
status: | Fix Committed → Fix Released |
Changed in openstack-dashboard (Juju Charms Collection): | |
status: | Fix Committed → Fix Released |
Changed in neutron-openvswitch (Ubuntu): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in heat (Ubuntu): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in glance (Ubuntu): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in cinder (Juju Charms Collection): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
status: | New → In Progress |
Changed in glance (Ubuntu): | |
status: | New → In Progress |
Changed in heat (Ubuntu): | |
status: | New → In Progress |
Changed in neutron-openvswitch (Ubuntu): | |
status: | New → In Progress |
Changed in glance (Ubuntu): | |
status: | In Progress → Confirmed |
Changed in heat (Ubuntu): | |
status: | In Progress → Confirmed |
Changed in cinder (Juju Charms Collection): | |
milestone: | none → 16.01 |
no longer affects: | neutron-openvswitch (Ubuntu) |
no longer affects: | heat (Ubuntu) |
no longer affects: | glance (Ubuntu) |
Changed in heat (Juju Charms Collection): | |
status: | New → Confirmed |
milestone: | none → 16.01 |
Changed in glance (Juju Charms Collection): | |
status: | New → Confirmed |
milestone: | none → 16.01 |
Changed in neutron-openvswitch (Juju Charms Collection): | |
status: | New → In Progress |
milestone: | none → 16.01 |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in cinder (Juju Charms Collection): | |
status: | In Progress → Fix Committed |
Changed in neutron-openvswitch (Juju Charms Collection): | |
status: | In Progress → Fix Committed |
Changed in heat (Juju Charms Collection): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
importance: | Undecided → Medium |
Changed in heat (Juju Charms Collection): | |
milestone: | 16.01 → 16.04 |
Changed in cinder (Juju Charms Collection): | |
status: | Fix Committed → Fix Released |
Changed in neutron-openvswitch (Juju Charms Collection): | |
status: | Fix Committed → Fix Released |
Changed in glance (Juju Charms Collection): | |
milestone: | 16.01 → 16.04 |
Changed in heat (Juju Charms Collection): | |
milestone: | 16.04 → 16.07 |
Changed in glance (Juju Charms Collection): | |
milestone: | 16.04 → 16.07 |
Changed in heat (Juju Charms Collection): | |
milestone: | 16.07 → 16.10 |
Changed in glance (Juju Charms Collection): | |
milestone: | 16.07 → 16.10 |
Changed in heat (Juju Charms Collection): | |
milestone: | 16.10 → 17.01 |
Changed in neutron-openvswitch (Juju Charms Collection): | |
importance: | Undecided → Medium |
Changed in glance (Juju Charms Collection): | |
importance: | Undecided → Medium |
Changed in charm-helpers: | |
importance: | Undecided → Medium |
Changed in cinder (Juju Charms Collection): | |
importance: | Undecided → Medium |
Changed in charm-helpers: | |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in glance (Juju Charms Collection): | |
assignee: | nobody → Ryan Beisner (1chb1n) |
Changed in charm-glance: | |
assignee: | nobody → Ryan Beisner (1chb1n) |
importance: | Undecided → Medium |
status: | New → Confirmed |
Changed in glance (Juju Charms Collection): | |
status: | Confirmed → Invalid |
Changed in charm-heat: | |
assignee: | nobody → Ryan Beisner (1chb1n) |
importance: | Undecided → Medium |
status: | New → Confirmed |
Changed in heat (Juju Charms Collection): | |
status: | Confirmed → Invalid |
Corey Bryant (corey.bryant) wrote : | #9 |
Ryan, is this still an issue? I know there've been a lot of amulet fixes in the past year+ and this is an old bug. Can we close it out?
no longer affects: | charm-heat |
no longer affects: | charm-glance |
Switching from service_restarted to validate_ service_ config_ changed, the race in this bug still exists, as the process may not be started at the moment that it is checked. I think it needs a try/retry loop with a timeout threshold.
2015-08-24 16:27:24,641 test_900_ restart_ on_config_ change DEBUG: Making config change on neutron-api... logger: Making config change on neutron-api... restart_ on_config_ change DEBUG: Checking that service restarted: neutron-server logger: Checking that service restarted: neutron-server service_ config_ changed DEBUG: Checking neutron-server restarted since 1440433643.0 logger: Checking neutron-server restarted since 1440433643.0 restarted_ since DEBUG: Checking neutron-server restarted since 1440433643.0 logger: Checking neutron-server restarted since 1440433643.0 start_time DEBUG: CMDout: (u'15786', 0) logger: CMDout: (u'15786', 0) start_time DEBUG: Pid for neutron-server 15786 logger: Pid for neutron-server 15786 019-basic- vivid-kilo" , line 9, in <module> run_tests( ) ubuntu/ bzr/next/ neutron- api/tests/ charmhelpers/ contrib/ amulet/ deployment. py", line 93, in run_tests ubuntu/ bzr/next/ neutron- api/tests/ basic_deploymen t.py", line 548, in test_900_ restart_ on_config_ change time=sleep_ time): ubuntu/ bzr/next/ neutron- api/tests/ charmhelpers/ contrib/ amulet/ utils.py" , line 411, in validate_ service_ config_ changed count=retry_ count) ubuntu/ bzr/next/ neutron- api/tests/ charmhelpers/ contrib/ amulet/ utils.py" , line 327, in service_ restarted_ since ubuntu/ bzr/next/ neutron- api/tests/ charmhelpers/ contrib/ amulet/ utils.py" , line 289, in _get_proc_ start_time dir_mtime( sentry_ unit, proc_dir) ubuntu/ bzr/next/ neutron- api/tests/ charmhelpers/ contrib/ amulet/ utils.py" , line 270, in _get_dir_mtime unit.directory_ stat(directory) ['mtime' ] python2. 7/dist- packages/ amulet/ sentry. py", line 88, in directory_stat python2. 7/dist- packages/ amulet/ sentry. py", line 75, in _fs_data unit_script( "filesystem_ data.py {}".format(path)) python2. 7/dist- packages/ amulet/ sentry. py", line 126, in _run_unit_script filesystem_ data.py" , line 7, in <module> sys.argv[ 1])
DEBUG:amulet-
2015-08-24 16:27:26,794 test_900_
DEBUG:amulet-
2015-08-24 16:27:26,796 validate_
DEBUG:amulet-
2015-08-24 16:28:26,856 service_
DEBUG:amulet-
2015-08-24 16:29:15,571 _get_proc_
DEBUG:amulet-
2015-08-24 16:29:15,571 _get_proc_
DEBUG:amulet-
Traceback (most recent call last):
File "tests/
deployment.
File "/home/
getattr(self, test)()
File "/home/
sleep_
File "/home/
retry_
File "/home/
pgrep_full)
File "/home/
return self._get_
File "/home/
return sentry_
File "/usr/lib/
return self._fs_data(path)
File "/usr/lib/
return self._run_
File "/usr/lib/
raise IOError(output)
IOError: Traceback (most recent call last):
File "/tmp/amulet/
s = os.stat(
OSError: [Errno 2] No such file or directory: '/proc/15786'
ERROR subprocess encountered error code 1