subnode fails to be accessible over ssh

Bug #1531187 reported by Sean Dague
This bug affects 2 people
Affects: OpenStack-Gate
Status: Fix Released
Importance: High
Assigned to: Paul Belanger

Bug Description

Sometimes when running multinode jobs the subnodes aren't accessible over ssh. This causes an early failure when we try to build the layer 2 network.

message:"bash: install_package: command not found"

This message is the trigger, because we try to install openvswitch to build a layer 2 network between the nodes.

Curiously, this is only seen in RAX IAD.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22bash%3A%20install_package%3A%20command%20not%20found%5C%22

Jeffrey Zhang (jeffrey4l) wrote :

According to /etc/nodepool, eth1 should have an IP address, but it actually has none (only a link-local IPv6 address):

2017-01-04 03:46:26.155480 | [Zuul] Launched by zl07
2017-01-04 03:46:26.155576 | [Zuul] Building remotely on centos-7-rax-ord-6464929 in workspace /home/jenkins/workspace/gate-kolla-dsvm-build-centos-source-centos-7
2017-01-04 03:46:29.365853 | Detailed logs: http://logs.openstack.org/25/416425/1/check/gate-kolla-dsvm-build-centos-source-centos-7/bbdec6a//
2017-01-04 03:46:29.366358 | [Zuul] Task exit code: 0
2017-01-04 03:46:31.356313 | Image build date
2017-01-04 03:46:31.356437 | ================
2017-01-04 03:46:31.357525 | 2017-01-03 04:38
2017-01-04 03:46:31.358093 | Host & kernel
2017-01-04 03:46:31.358160 | =============
2017-01-04 03:46:31.359492 | Linux centos-7-rax-ord-6464929 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
2017-01-04 03:46:31.359573 | Network interface addresses...
2017-01-04 03:46:31.359636 | ==============================
2017-01-04 03:46:31.362633 | 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
2017-01-04 03:46:31.362739 | link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2017-01-04 03:46:31.362797 | inet 127.0.0.1/8 scope host lo
2017-01-04 03:46:31.362861 | valid_lft forever preferred_lft forever
2017-01-04 03:46:31.362919 | inet6 ::1/128 scope host
2017-01-04 03:46:31.363005 | valid_lft forever preferred_lft forever
2017-01-04 03:46:31.363081 | 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
2017-01-04 03:46:31.363153 | link/ether bc:76:4e:10:05:d1 brd ff:ff:ff:ff:ff:ff
2017-01-04 03:46:31.363220 | inet 104.239.192.163/24 brd 104.239.192.255 scope global eth0
2017-01-04 03:46:31.363277 | valid_lft forever preferred_lft forever
2017-01-04 03:46:31.363334 | inet6 fe80::be76:4eff:fe10:5d1/64 scope link
2017-01-04 03:46:31.363391 | valid_lft forever preferred_lft forever
2017-01-04 03:46:31.363470 | 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
2017-01-04 03:46:31.363537 | link/ether bc:76:4e:10:14:95 brd ff:ff:ff:ff:ff:ff
2017-01-04 03:46:31.363600 | inet6 fe80::be76:4eff:fe10:1495/64 scope link
2017-01-04 03:46:31.363662 | valid_lft forever preferred_lft forever

Paul Belanger (pabelanger) wrote :

I believe https://review.openstack.org/450436 will fix this issue. There was a race condition in glean when using more than one NIC: glean would call 'systemctl enable network.service' more than once, which caused issues in network.service, as it doesn't appear to be multi-process safe.

Now we run 'systemctl enable network.service' at image build time, which stops glean from doing it on boot.
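
For anyone hitting a similar race elsewhere: the fix above avoids the concurrent calls altogether, but a general alternative is to serialize them with a file lock. The sketch below is NOT what the review does and is not glean code; run_serialized, enable_network_service and the lock path are hypothetical names chosen for illustration.

```python
import fcntl
import subprocess


def run_serialized(lock_path, action):
    """Run action() while holding an exclusive lock on lock_path.

    Concurrent callers using the same lock_path run one at a time.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until any other holder of the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return action()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def enable_network_service(lock_path="/run/lock/enable-network.lock"):
    """Hypothetical wrapper: enable network.service under the lock."""
    return run_serialized(
        lock_path,
        lambda: subprocess.check_call(
            ["systemctl", "enable", "network.service"]
        ),
    )
```

With this pattern, one process per NIC could call enable_network_service() and the systemctl invocations would never overlap. Doing the enable once at image build time is simpler, since nothing has to run at boot.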

Changed in openstack-gate:
assignee: nobody → Paul Belanger (pabelanger)
Clark Boylan (cboylan)
Changed in openstack-gate:
status: New → Fix Released
importance: Undecided → High