Time sometimes synced in wrong order after revert of snapshots

Bug #1316537 reported by Tatyanka
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Invalid
Medium
Tatyanka

Bug Description

{"build_id": "2014-05-06_01-31-29", "mirantis": "yes", "build_number": "183", "ostf_sha": "fe718434f88f2ab167779770828a195f06eb29f8", "nailgun_sha": "c61100f34a12c597df32f7498697acd84035957f", "production": "docker", "api": "1.0", "fuelmain_sha": "185fac4a937b970fc82aa48a9c31c7981f8f7659", "astute_sha": "3cffebde1e5452f5dbf8f744c6525fc36c7afbf3", "release": "5.0", "fuellib_sha": "edaecb643f34ca73be3716c5a722bfdd40e06128"}

After investigating the issue, we found the following problem: the time on the nodes was changed from 12:30 to 9:26. This is incorrect behavior and may be caused by reverting snapshots.

Steps to Reproduce:
1. Deploy one node cluster using KVM
2. Delete it
3. Deploy a Neutron GRE cluster with 4 nodes (1 controller, 2 computes, 1 cinder+mongo) with Murano/Savanna/Ceilometer

Expected Result:
Deployment passes, cluster is ready, OSTF passes

Actual Result:
Controller node goes offline during installation of OpenStack.
The node shows as offline in the UI, but is marked with an error in `fuel nodes`:
 id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-----------|------------------|---------|--------------|-------------------|---------------|---------------|-------
5 | deploying | Untitled (3c:b2) | 2 | 10.108.0.6 | e6:b9:c3:32:6b:48 | cinder, mongo | | True
4 | ready | Untitled (8a:3d) | 2 | 10.108.0.5 | 6a:bf:99:49:e3:44 | compute | | True
3 | ready | Untitled (8e:5c) | 2 | 10.108.0.4 | c6:c2:fe:83:78:4b | compute | | True
2 | error | Untitled (cd:66) | 2 | 10.108.0.203 | 64:d1:8c:f1:cd:66 | controller | | False

mco ping

4 time=54.02 ms
3 time=56.93 ms
5 time=58.93 ms
1 time=62.11 ms
master time=106.54 ms

The node in the error state is node-2, but the hostname on this node says it is node-1.

arp -a on master
 [root@nailgun log]# arp -a
? (172.17.0.11) at de:94:a6:ae:17:03 [ether] on docker0
node-4.test.domain.local (10.108.0.5) at 6a:bf:99:49:e3:44 [ether] on eth0
? (172.17.0.10) at d6:ee:16:fc:ad:07 [ether] on docker0
? (172.17.0.8) at f6:7a:3d:10:f2:46 [ether] on docker0
? (172.17.0.4) at fa:3c:04:24:47:d0 [ether] on docker0
node-3.test.domain.local (10.108.0.4) at c6:c2:fe:83:78:4b [ether] on eth0
node-5.test.domain.local (10.108.0.6) at e6:b9:c3:32:6b:48 [ether] on eth0
node-2.test.domain.local (10.108.0.3) at 64:c6:f9:17:f6:87 [ether] on eth0
? (10.108.0.1) at 52:54:00:2f:85:46 [ether] on eth0
? (172.17.0.3) at aa:f1:99:a8:59:c9 [ether] on docker0
? (172.17.0.7) at 3a:18:a7:aa:a0:b0 [ether] on docker0
03903be110e3 (172.17.0.2) at 0a:4a:38:9a:fc:3a [ether] on docker0
? (172.17.0.9) at da:70:23:b1:75:b8 [ether] on docker0

ip addr on node-2
 ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:c6:f9:17:f6:87 brd ff:ff:ff:ff:ff:ff
    inet 10.108.0.3/24 brd 10.108.0.255 scope global eth0
    inet6 fe80::66c6:f9ff:fe17:f687/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:0e:11:ea:81:55 brd ff:ff:ff:ff:ff:ff
    inet 10.108.1.2/24 brd 10.108.1.255 scope global eth1
    inet6 fe80::660e:11ff:feea:8155/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:b9:0b:f7:02:79 brd ff:ff:ff:ff:ff:ff
    inet 10.108.2.2/24 brd 10.108.2.255 scope global eth2
    inet6 fe80::66b9:bff:fef7:279/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:0e:8f:31:f9:75 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::660e:8fff:fe31:f975/64 scope link
       valid_lft forever preferred_lft forever
6: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:52:46:7b:19:2d brd ff:ff:ff:ff:ff:ff
    inet 10.108.4.2/24 brd 10.108.4.255 scope global eth4
    inet6 fe80::6652:46ff:fe7b:192d/64 scope link
       valid_lft forever preferred_lft forever
7: eth3.103@eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 64:0e:8f:31:f9:75 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::660e:8fff:fe31:f975/64 scope link
       valid_lft forever preferred_lft forever

Info: ADanin - an IP conflict takes place here (for the case: remove one cluster, then deploy a new one)
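The conflict noted above can be spotted by cross-checking the `fuel nodes` inventory against the master node's ARP table: node-2 is recorded at 10.108.0.203, but the host answering on the old cluster's address 10.108.0.3 carries a different MAC. A minimal sketch of that cross-check (illustrative only, not Fuel code; the dictionaries transcribe the outputs shown above):

```python
# IP -> MAC as reported by `fuel nodes` (transcribed from the table above)
fuel_inventory = {
    "10.108.0.5": "6a:bf:99:49:e3:44",    # node-4
    "10.108.0.4": "c6:c2:fe:83:78:4b",    # node-3
    "10.108.0.203": "64:d1:8c:f1:cd:66",  # node-2 (per Fuel)
}

# IP -> MAC as seen in `arp -a` on the master (transcribed from above)
arp_table = {
    "10.108.0.5": "6a:bf:99:49:e3:44",
    "10.108.0.4": "c6:c2:fe:83:78:4b",
    "10.108.0.3": "64:c6:f9:17:f6:87",  # node-2 answers on a stale IP/MAC
}

def find_conflicts(inventory, arp):
    """Return inventory IPs whose observed MAC differs from the recorded
    one, or that never appear on the wire at all."""
    return {ip for ip, mac in inventory.items()
            if arp.get(ip, mac) != mac or ip not in arp}

print(find_conflicts(fuel_inventory, arp_table))  # flags node-2's address
```

Here node-2's recorded address is never seen on the wire, which is consistent with the node having come up with a leftover IP from the deleted cluster.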

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
Revision history for this message
Andrey Danin (gcon-monolake) wrote :

Node-1 was not removed because MCollective temporarily failed to communicate with node-1.
MCO log from node-1:
2014-05-06T09:33:27.707065+00:00 debug: W, [2014-05-06T12:37:40.491489 #2248] WARN -- : runner.rb:122:in `receive' PLMC9: Expired Message: MCollective::MsgTTLExpired: message f301a3ed51e459ed91557e71165bceea from uid=0@6de00bbd14dc created at 1399368802 is 11058 seconds old, TTL is 5400

The first timestamp in the logs is the master-node time, and the second timestamp is the slave-node time.
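The expired-message warning quoted above comes down to a simple age check: the message's age (receive time minus creation time) exceeded the TTL, so it was dropped. A sketch of that check under the values from the log line (illustrative, not MCollective's actual code):

```python
def is_expired(created_at: int, now: int, ttl: int) -> bool:
    """A message is discarded when its age exceeds the TTL (in seconds)."""
    age = now - created_at
    return age > ttl

# Values from the log line: the message created at 1399368802 was
# 11058 seconds old when received, against a TTL of 5400 seconds.
created_at = 1399368802
received_at = created_at + 11058
print(is_expired(created_at, received_at, ttl=5400))  # → True
```

With the master's clock jumped back by about three hours, messages it created appear hours "old" to the slaves, which is exactly the 11058 s > 5400 s failure seen here.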

A time shift occurred on the master node from 2014-05-06T12:30:05 to 2014-05-06T09:26:17:
2014-05-06T12:30:05.014187+00:00 debug: D, [2014-05-06T12:29:57.876365 #2248] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat
2014-05-06T09:26:17.335250+00:00 debug: D, [2014-05-06T12:30:26.377906 #2248] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat

and after approximately 15 minutes node-1 made its own time shift:
2014-05-06T09:39:58.047350+00:00 debug: D, [2014-05-06T12:44:12.494154 #2248] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat
2014-05-06T09:40:28.077188+00:00 debug: D, [2014-05-06T09:40:24.296672 #2248] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat
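Since each log line carries both clocks (syslog timestamp = master-node time, bracketed logger timestamp = slave-node time), the skew between the two nodes can be read directly off a single line. A small sketch of that extraction (illustrative helper, not part of Fuel):

```python
import re
from datetime import datetime

TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+")
FMT = "%Y-%m-%dT%H:%M:%S.%f"

def clock_skew(line: str) -> float:
    """Return slave-node time minus master-node time, in seconds,
    from an MCO log line carrying both timestamps."""
    master_s, slave_s = TS_RE.findall(line)[:2]
    master = datetime.strptime(master_s, FMT)
    slave = datetime.strptime(slave_s, FMT)
    return (slave - master).total_seconds()

# First line of the pair above: node-1 still ~3 hours ahead of the master.
line = ("2014-05-06T09:39:58.047350+00:00 debug: D, "
        "[2014-05-06T12:44:12.494154 #2248] DEBUG -- : "
        "rabbitmq.rb:64:in `on_hbfire' Received heartbeat")
print(clock_skew(line))  # roughly 11054 seconds, i.e. ~3 hours
```

In the second line of the pair the two timestamps nearly agree, showing the moment node-1's clock was shifted back to match the master.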

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Tatyana (tatyana-leontovich)
summary: - Controller goes offline on deploying phase If deletion of cluser take
- place previously
+ Time sometimes sync in wrong order after revert of snapshots
summary: - Time sometimes sync in wrong order after revert of snapshots
+ Time sometimes synced in wrong order after revert of snapshots
description: updated
Changed in fuel:
importance: High → Medium
milestone: 5.0 → 5.1
Mike Scherbakov (mihgen)
Changed in fuel:
status: New → Confirmed
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Fixed in another issue.

Changed in fuel:
status: Confirmed → Invalid