OpenStack-Gate

subnode fails to be accessible over ssh

Bug #1531187 reported by Sean Dague on 2016-01-05

This bug affects 2 people

Affects         Status        Importance  Assigned to    Milestone
OpenStack-Gate  Fix Released  High        Paul Belanger

Bug Description

Sometimes when running multinode jobs the subnodes aren't accessible over ssh. This causes an early failure when we try to build the layer 2 network.

message:"bash: install_package: command not found"

This message is the trigger because we try to install openvswitch to build a layer 2 network between nodes.

Curiously, this is only seen in RAX IAD.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22bash%3A%20install_package%3A%20command%20not%20found%5C%22
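
For anyone adapting the search, the `query=` parameter in the link above appears to be a percent-encoded search string with backslash-escaped quotes around the message. A minimal sketch of rebuilding it, assuming Python's standard `urllib.parse`; the helper name `logstash_url` is hypothetical and the base URL is copied from the link above:

```python
from urllib.parse import quote

# Base URL copied from the link in this report.
BASE = "http://logstash.openstack.org/#dashboard/file/logstash.json?query="


def logstash_url(message):
    """Build a dashboard URL that searches for an exact log message.

    Hypothetical helper: the quotes around the message are backslash-escaped,
    matching the encoded form seen in the link above.
    """
    query = 'message:\\"' + message + '\\"'
    # safe="" makes quote() percent-encode ':' as well as spaces and quotes.
    return BASE + quote(query, safe="")


url = logstash_url("bash: install_package: command not found")
print(url)
```

Swapping in a different message string should give the equivalent query for another failure signature.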

Jeffrey Zhang (jeffrey4l) wrote on 2017-01-04:

According to /etc/nodepool, eth1 should have an IP address, but it actually does not.

2017-01-04 03:46:26.155480 | [Zuul] Launched by zl07
2017-01-04 03:46:26.155576 | [Zuul] Building remotely on centos-7-rax-ord-6464929 in workspace /home/jenkins/workspace/gate-kolla-dsvm-build-centos-source-centos-7
2017-01-04 03:46:29.365853 | Detailed logs: http://logs.openstack.org/25/416425/1/check/gate-kolla-dsvm-build-centos-source-centos-7/bbdec6a//
2017-01-04 03:46:29.366358 | [Zuul] Task exit code: 0
2017-01-04 03:46:31.356313 | Image build date
2017-01-04 03:46:31.356437 | ================
2017-01-04 03:46:31.357525 | 2017-01-03 04:38
2017-01-04 03:46:31.358093 | Host & kernel
2017-01-04 03:46:31.358160 | =============
2017-01-04 03:46:31.359492 | Linux centos-7-rax-ord-6464929 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
2017-01-04 03:46:31.359573 | Network interface addresses...
2017-01-04 03:46:31.359636 | ==============================
2017-01-04 03:46:31.362633 | 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
2017-01-04 03:46:31.362739 |     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2017-01-04 03:46:31.362797 |     inet 127.0.0.1/8 scope host lo
2017-01-04 03:46:31.362861 |        valid_lft forever preferred_lft forever
2017-01-04 03:46:31.362919 |     inet6 ::1/128 scope host 
2017-01-04 03:46:31.363005 |        valid_lft forever preferred_lft forever
2017-01-04 03:46:31.363081 | 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
2017-01-04 03:46:31.363153 |     link/ether bc:76:4e:10:05:d1 brd ff:ff:ff:ff:ff:ff
2017-01-04 03:46:31.363220 |     inet 104.239.192.163/24 brd 104.239.192.255 scope global eth0
2017-01-04 03:46:31.363277 |        valid_lft forever preferred_lft forever
2017-01-04 03:46:31.363334 |     inet6 fe80::be76:4eff:fe10:5d1/64 scope link 
2017-01-04 03:46:31.363391 |        valid_lft forever preferred_lft forever
2017-01-04 03:46:31.363470 | 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
2017-01-04 03:46:31.363537 |     link/ether bc:76:4e:10:14:95 brd ff:ff:ff:ff:ff:ff
2017-01-04 03:46:31.363600 |     inet6 fe80::be76:4eff:fe10:1495/64 scope link 
2017-01-04 03:46:31.363662 |        valid_lft forever preferred_lft forever
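
The check made by eye in the log above (eth1 has an inet6 link-local address but no inet line) can be sketched as a small parser. This is only an illustration, assuming the `ip addr` output format shown in this comment; the function name `interfaces_without_ipv4` is hypothetical and not part of nodepool or glean:

```python
import re


def interfaces_without_ipv4(ip_addr_output):
    """Return names of interfaces that have no 'inet' (IPv4) address line."""
    has_ipv4 = {}
    current = None
    for line in ip_addr_output.splitlines():
        # Drop the "timestamp | " prefix that the job console adds, if present.
        line = line.split(" | ", 1)[-1]
        header = re.match(r"^(\d+): ([^:@]+)[:@]", line)
        if header:
            # A line such as "3: eth1: <...>" starts a new interface block.
            current = header.group(2)
            has_ipv4[current] = False
        elif current and line.strip().startswith("inet "):
            # "inet " matches IPv4 only; "inet6" lines are deliberately ignored.
            has_ipv4[current] = True
    return [name for name, ok in has_ipv4.items() if not ok]


# Abbreviated sample taken from the log in this comment.
sample = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 104.239.192.163/24 brd 104.239.192.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet6 fe80::be76:4eff:fe10:1495/64 scope link
"""
missing = interfaces_without_ipv4(sample)
print(missing)
```

For the sample above this reports eth1 as the only interface without an IPv4 address, which matches what the log shows.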

Jeffrey Zhang (jeffrey4l) wrote on 2017-01-04:

Check this log link: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22bash:%20install_package:%20command%20not%20found%5C%22

Paul Belanger (pabelanger) wrote on 2017-03-29:

I believe https://review.openstack.org/450436 will fix this issue. There was a race condition in glean when using more than one NIC: glean would call 'systemctl enable network.service' more than once, which caused issues in network.service, as it doesn't appear to be multi-process safe.

Now we run 'systemctl enable network.service' at image build time, which stops glean from doing it on boot.
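
To illustrate what "multi-process safe" means here: the review above avoids the concurrent calls by enabling the service at image build time. A different, purely hypothetical way to avoid overlapping calls would be to serialize them behind a file lock. The sketch below is not what glean or the review does; the function name `run_serialized` and the lock path are assumptions for illustration only:

```python
import fcntl
import subprocess


def run_serialized(cmd, lock_path="/tmp/enable-network.lock"):
    """Run cmd while holding an exclusive file lock.

    Concurrent callers using the same lock_path run one at a time,
    because flock() blocks until the current holder releases the lock.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd, check=False).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Example call (needs root, so it is left commented out):
# run_serialized(["systemctl", "enable", "network.service"])
```

With one boot-time process per NIC, each caller would wait its turn instead of running the enable step at the same moment.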

Changed in openstack-gate:
assignee:	nobody → Paul Belanger (pabelanger)

Clark Boylan (cboylan) on 2017-08-04

Changed in openstack-gate:
status:	New → Fix Released
importance:	Undecided → High

Duplicates of this bug

Bug #1535850
