Time sometimes synced in wrong order after revert of snapshots

Bug #1316537 reported by Tatyanka
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Invalid
Medium
Tatyanka

Bug Description

{"build_id": "2014-05-06_01-31-29", "mirantis": "yes", "build_number": "183", "ostf_sha": "fe718434f88f2ab167779770828a195f06eb29f8", "nailgun_sha": "c61100f34a12c597df32f7498697acd84035957f", "production": "docker", "api": "1.0", "fuelmain_sha": "185fac4a937b970fc82aa48a9c31c7981f8f7659", "astute_sha": "3cffebde1e5452f5dbf8f744c6525fc36c7afbf3", "release": "5.0", "fuellib_sha": "edaecb643f34ca73be3716c5a722bfdd40e06128"}

After investigating the issue, we found the following problem: the time on the nodes was changed from 12:30 to 9:26. This is incorrect behavior and may be caused by reverting snapshots.

Steps to Reproduce:
1. Deploy one node cluster using KVM
2. Delete it
3. Deploy a Neutron GRE cluster with 4 nodes (1 controller, 2 computes, 1 cinder+mongo) with Murano/Savanna/Ceilometer

Expected Result:
Deployment passes, cluster is ready, OSTF passes

Actual Result:
Controller node goes offline during installation of OpenStack.
The node shows as offline in the UI, but is marked with an error in `fuel nodes`:
 id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-----------|------------------|---------|--------------|-------------------|---------------|---------------|-------
5 | deploying | Untitled (3c:b2) | 2 | 10.108.0.6 | e6:b9:c3:32:6b:48 | cinder, mongo | | True
4 | ready | Untitled (8a:3d) | 2 | 10.108.0.5 | 6a:bf:99:49:e3:44 | compute | | True
3 | ready | Untitled (8e:5c) | 2 | 10.108.0.4 | c6:c2:fe:83:78:4b | compute | | True
2 | error | Untitled (cd:66) | 2 | 10.108.0.203 | 64:d1:8c:f1:cd:66 | controller | | False

mco ping

4 time=54.02 ms
3 time=56.93 ms
5 time=58.93 ms
1 time=62.11 ms
master time=106.54 ms

The node in the error state is node-2, but the hostname on this node says it is node-1.

arp -a on master
 [root@nailgun log]# arp -a
? (172.17.0.11) at de:94:a6:ae:17:03 [ether] on docker0
node-4.test.domain.local (10.108.0.5) at 6a:bf:99:49:e3:44 [ether] on eth0
? (172.17.0.10) at d6:ee:16:fc:ad:07 [ether] on docker0
? (172.17.0.8) at f6:7a:3d:10:f2:46 [ether] on docker0
? (172.17.0.4) at fa:3c:04:24:47:d0 [ether] on docker0
node-3.test.domain.local (10.108.0.4) at c6:c2:fe:83:78:4b [ether] on eth0
node-5.test.domain.local (10.108.0.6) at e6:b9:c3:32:6b:48 [ether] on eth0
node-2.test.domain.local (10.108.0.3) at 64:c6:f9:17:f6:87 [ether] on eth0
? (10.108.0.1) at 52:54:00:2f:85:46 [ether] on eth0
? (172.17.0.3) at aa:f1:99:a8:59:c9 [ether] on docker0
? (172.17.0.7) at 3a:18:a7:aa:a0:b0 [ether] on docker0
03903be110e3 (172.17.0.2) at 0a:4a:38:9a:fc:3a [ether] on docker0
? (172.17.0.9) at da:70:23:b1:75:b8 [ether] on docker0

ip addr on node-2
 ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:c6:f9:17:f6:87 brd ff:ff:ff:ff:ff:ff
    inet 10.108.0.3/24 brd 10.108.0.255 scope global eth0
    inet6 fe80::66c6:f9ff:fe17:f687/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:0e:11:ea:81:55 brd ff:ff:ff:ff:ff:ff
    inet 10.108.1.2/24 brd 10.108.1.255 scope global eth1
    inet6 fe80::660e:11ff:feea:8155/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:b9:0b:f7:02:79 brd ff:ff:ff:ff:ff:ff
    inet 10.108.2.2/24 brd 10.108.2.255 scope global eth2
    inet6 fe80::66b9:bff:fef7:279/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:0e:8f:31:f9:75 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::660e:8fff:fe31:f975/64 scope link
       valid_lft forever preferred_lft forever
6: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:52:46:7b:19:2d brd ff:ff:ff:ff:ff:ff
    inet 10.108.4.2/24 brd 10.108.4.255 scope global eth4
    inet6 fe80::6652:46ff:fe7b:192d/64 scope link
       valid_lft forever preferred_lft forever
7: eth3.103@eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 64:0e:8f:31:f9:75 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::660e:8fff:fe31:f975/64 scope link
       valid_lft forever preferred_lft forever

Info: ADanin - an IP conflict takes place here (for the case: remove one cluster, then deploy a new one)
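The conflict noted above can be spotted by cross-checking the `fuel nodes` inventory against the master node's ARP table: node-2 is recorded at 10.108.0.203, but the host answering on the old cluster's address 10.108.0.3 carries a different MAC. A minimal sketch of that cross-check (illustrative only, not Fuel code; the dictionaries transcribe the outputs shown above):

```python
# IP -> MAC as reported by `fuel nodes` (transcribed from the table above)
fuel_inventory = {
    "10.108.0.5": "6a:bf:99:49:e3:44",    # node-4
    "10.108.0.4": "c6:c2:fe:83:78:4b",    # node-3
    "10.108.0.203": "64:d1:8c:f1:cd:66",  # node-2 (per Fuel)
}

# IP -> MAC as seen in `arp -a` on the master (transcribed from above)
arp_table = {
    "10.108.0.5": "6a:bf:99:49:e3:44",
    "10.108.0.4": "c6:c2:fe:83:78:4b",
    "10.108.0.3": "64:c6:f9:17:f6:87",  # node-2 answers on a stale IP/MAC
}

def find_conflicts(inventory, arp):
    """Return inventory IPs whose observed MAC differs from the recorded
    one, or that never appear on the wire at all."""
    return {ip for ip, mac in inventory.items()
            if arp.get(ip, mac) != mac or ip not in arp}

print(find_conflicts(fuel_inventory, arp_table))  # flags node-2's address
```

Here node-2's recorded address is never seen on the wire, which is consistent with the node having come up with a leftover IP from the deleted cluster.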

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
Revision history for this message
Andrey Danin (gcon-monolake) wrote :

Node-1 was not removed because MCollective temporarily failed to communicate with node-1.
MCO log from node-1:
2014-05-06T09:33:27.707065+00:00 debug: W, [2014-05-06T12:37:40.491489 #2248] WARN -- : runner.rb:122:in `receive' PLMC9: Expired Message: MCollective::MsgTTLExpired: message f301a3ed51e459ed91557e71165bceea from uid=0@6de00bbd14dc created at 1399368802 is 11058 seconds old, TTL is 5400

The first timestamp in the logs is the master-node time, and the second timestamp is the slave-node time.
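The expired-message warning quoted above comes down to a simple age check: the message's age (receive time minus creation time) exceeded the TTL, so it was dropped. A sketch of that check under the values from the log line (illustrative, not MCollective's actual code):

```python
def is_expired(created_at: int, now: int, ttl: int) -> bool:
    """A message is discarded when its age exceeds the TTL (in seconds)."""
    age = now - created_at
    return age > ttl

# Values from the log line: the message created at 1399368802 was
# 11058 seconds old when received, against a TTL of 5400 seconds.
created_at = 1399368802
received_at = created_at + 11058
print(is_expired(created_at, received_at, ttl=5400))  # → True
```

With the master's clock jumped back by about three hours, messages it created appear hours "old" to the slaves, which is exactly the 11058 s > 5400 s failure seen here.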

A time shift occurred on the master node from 2014-05-06T12:30:05 to 2014-05-06T09:26:17:
2014-05-06T12:30:05.014187+00:00 debug: D, [2014-05-06T12:29:57.876365 #2248] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat
2014-05-06T09:26:17.335250+00:00 debug: D, [2014-05-06T12:30:26.377906 #2248] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat

and after approximately 15 minutes node-1 made its own time shift:
2014-05-06T09:39:58.047350+00:00 debug: D, [2014-05-06T12:44:12.494154 #2248] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat
2014-05-06T09:40:28.077188+00:00 debug: D, [2014-05-06T09:40:24.296672 #2248] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat
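Since each log line carries both clocks (syslog timestamp = master-node time, bracketed logger timestamp = slave-node time), the skew between the two nodes can be read directly off a single line. A small sketch of that extraction (illustrative helper, not part of Fuel):

```python
import re
from datetime import datetime

TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+")
FMT = "%Y-%m-%dT%H:%M:%S.%f"

def clock_skew(line: str) -> float:
    """Return slave-node time minus master-node time, in seconds,
    from an MCO log line carrying both timestamps."""
    master_s, slave_s = TS_RE.findall(line)[:2]
    master = datetime.strptime(master_s, FMT)
    slave = datetime.strptime(slave_s, FMT)
    return (slave - master).total_seconds()

# First line of the pair above: node-1 still ~3 hours ahead of the master.
line = ("2014-05-06T09:39:58.047350+00:00 debug: D, "
        "[2014-05-06T12:44:12.494154 #2248] DEBUG -- : "
        "rabbitmq.rb:64:in `on_hbfire' Received heartbeat")
print(clock_skew(line))  # roughly 11054 seconds, i.e. ~3 hours
```

In the second line of the pair the two timestamps nearly agree, showing the moment node-1's clock was shifted back to match the master.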

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Tatyana (tatyana-leontovich)
summary: - Controller goes offline on deploying phase If deletion of cluser take
- place previously
+ Time sometimes sync in wrong order after revert of snapshots
summary: - Time sometimes sync in wrong order after revert of snapshots
+ Time sometimes synced in wrong order after revert of snapshots
description: updated
Changed in fuel:
importance: High → Medium
milestone: 5.0 → 5.1
Mike Scherbakov (mihgen)
Changed in fuel:
status: New → Confirmed
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Fixed in another issue.

Changed in fuel:
status: Confirmed → Invalid