Anaconda gets stuck while installing the master node

Bug #1557433 reported by Roman Podoliaka
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Invalid
Medium
Fuel DevOps
Mitaka
Won't Fix
Medium
Unassigned
Newton
Fix Released
Critical
Pawel Brzozowski

Bug Description

Master node never becomes available over SSH during setup:

======================================================================
ERROR: Create environment and set up master node
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/9.0.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 96, in wrapper
    with args[0].env.d_env.get_admin_remote()\
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/models/environment.py", line 395, in get_admin_remote
    password=password)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/models/node.py", line 171, in remote
    password=password, private_keys=private_keys)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/helpers.py", line 257, in __init__
    self.reconnect()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/helpers.py", line 300, in reconnect
    self.connect()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/retry.py", line 27, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/helpers.py", line 295, in connect
    password=self.password)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/paramiko/client.py", line 311, in connect
    raise NoValidConnectionsError(errors)
NoValidConnectionsError: [Errno None] Unable to connect to port 22 on or 10.109.40.2

Screen (attached) shows that anaconda is stuck waiting for one of its threads to finish, but that never happens after revert of an environment.

Failed build: https://product-ci.infra.mirantis.net/job/9.0.main.ubuntu.bvt_2/63/console

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Anaconda logs

Changed in fuel:
milestone: none → 9.0
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I restarted the provisioning process and it went fine. Seems to be not reproducible

Changed in fuel:
importance: Undecided → Critical
assignee: nobody → Fuel Library Team (fuel-library)
status: New → Incomplete
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → MOS Linux (mos-linux)
tags: added: area-linux
Revision history for this message
Dmitry Teselkin (teselkin-d) wrote :

Looks like networking issue according to [1] and [2].

The solution might be to skip geoip request by setting geoloc=0 to kernel parameters.

Anyway, it's not critical, moving to medium.

[1] https://github.com/clalancette/oz/issues/166
[2] https://github.com/clalancette/oz/pull/169

Changed in fuel:
importance: Critical → Medium
Changed in fuel:
status: Incomplete → Confirmed
status: Confirmed → Incomplete
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Revision history for this message
Dmitry Teselkin (teselkin-d) wrote :

UPDATE
======

The root cause is 'hwclock' that failed to set up time inside virtual machine (master node) from it's host system time. The last record in anaconda.log is:

01:03:40,507 DEBUG anaconda: Waiting for thread AnaTimeInitThread to exit

In syslog there are the following logs at the same time:

01:03:35,146 INFO chronyd: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
01:03:35,146 INFO chronyd: Generated key 1
01:03:35,266 INFO kernel:[ 22.785932] vda: vda1 vda2 vda3
01:03:39,328 NOTICE kernel:[ 26.847538] type=1400 audit(1458003819.327:15): avc: denied { read write } for pid=1643 comm="hwclock" path="/dev/pts/0" dev="devpts" ino=3 scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file
01:03:40,500 NOTICE kernel:[ 28.020069] type=1400 audit(1458003820.499:16): avc: denied { ioctl } for pid=1643 comm="hwclock" path="/dev/pts/0" dev="devpts" ino=3 scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file
01:03:40,500 NOTICE kernel:[ 28.020091] type=1400 audit(1458003820.499:17): avc: denied { getattr } for pid=1643 comm="hwclock" path="/dev/pts/0" dev="devpts" ino=3 scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file
04:02:09,382 NOTICE kernel:[10736.901898] type=1400 audit(1458014529.381:18): avc: denied { read } for pid=2429 comm="bash" name=".profile" dev="dm-0" ino=34610 scontext=system_u:system_r:getty_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file

It's not a selinux issues as it might seem because we can see the same errors in logs of successfull run [2]:

08:21:08,961 INFO chronyd: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
08:21:08,964 INFO chronyd: Generated key 1
08:21:09,185 INFO kernel:[ 24.756431] vda: vda1 vda2 vda3
08:21:10,000 NOTICE kernel:[ 25.422004] type=1400 audit(1458289270.000:15): avc: denied { ioctl } for pid=1551 comm="hwclock" path="/dev/pts/0" dev="devpts" ino=3 scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file
08:21:10,000 NOTICE kernel:[ 25.422013] type=1400 audit(1458289270.000:16): avc: denied { getattr } for pid=1551 comm="hwclock" path="/dev/pts/0" dev="devpts" ino=3 scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file
08:21:14,284 INFO chronyd: Selected source 95.213.132.254
08:21:14,286 WARNING kernel:[ 29.707263] Adjusting kvm-clock more than 11% (9437184 vs 9311354)

On failed run [2] we've never reached state 'Adjusting kvm-clock' and the test was aborted in 3 hrs.

It looks like an issue on infra side.

[1] https://product-ci.infra.mirantis.net/job/9.0.main.ubuntu.bvt_2/80/
[2] https://product-ci.infra.mirantis.net/job/9.0.main.ubuntu.bvt_2/63/

Changed in fuel:
assignee: MOS Linux (mos-linux) → MOS Infra (mos-infra)
assignee: MOS Infra (mos-infra) → nobody
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Revision history for this message
Dmitry Klenov (dklenov) wrote :

Moving the issue to infra side according to Dmitry's comment.

Revision history for this message
Alexander Charykov (acharykov) wrote :

@devops I think its bug to you.

Revision history for this message
Igor Shishkin (teran) wrote :

Roman, is the issue still relevant(job logs are 404 at the moment)?

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Well, I don't know. I suppose the issues on the Infra side (according to https://bugs.launchpad.net/fuel/+bug/1557433/comments/6 , if there were any) were not fixed. It's just this is reproduced very rarely, and we haven't seen it lately on BVT runs.

Revision history for this message
Igor Shishkin (teran) wrote :

Discussed with Roman and Dmitry T.: since we don't have any option to investigate the issue setting the bug to incomplete and waiting to get it reproduced.
Next time it would happen please perform the following steps on stucked environment:

#) `thread apply all bt` output from gdb.
#) provide qemu/kvm and kernel versions on HN.
#) perform nmi inject command to stucked VM and reply the question if it helped.

Revision history for this message
Igor Shishkin (teran) wrote :

Setting to Invalid according to bug triaging practices with 1 month of inactivity in Incomplete state.

Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

BVT failed today with this issue, also we got the same symptoms on several runs during the week but not investigated deeply

https://product-ci.infra.mirantis.net/view/10.0/job/10.0.main.ubuntu.bvt_2/1354/
https://puu.sh/uxMWx/5f6d3f9e73.png

Revision history for this message
Pawel Brzozowski (pbrzozowski) wrote :

This server (srv28-bud) in fact was behaving weird before running this test - it will be redeployed soon because it may be the reason why it happened.

Revision history for this message
Pawel Brzozowski (pbrzozowski) wrote :

Server redeployed.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.