Security group creation randomly failed with Timed out waiting for a reply to message ID
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Invalid | High | Unassigned |
5.1.x | Won't Fix | High | MOS Oslo |
6.0.x | Won't Fix | High | MOS Oslo |
6.1.x | Invalid | High | Unassigned |
Bug Description
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "5.1.1"
api: "1.0"
build_number: "24"
build_id: "2014-11-
astute_sha: "fce051a6d013b1
fuellib_sha: "5611c516362bea
ostf_sha: "64cb59c681658a
nailgun_sha: "7580f6341a726c
fuelmain_sha: "eac9e2704424d1
Issue reproduced on Ubuntu and CentOS
1. Deploy environment
Env configuration:
mode: HA
3 nodes with controller role
2 nodes with compute role
1 node with cinder role
network_provider: neutron gre
storage back end cinder
interfaces:
INTERFACES = {
'admin': 'eth0',
'public': 'eth1',
'management': 'eth2',
'private': 'eth3',
'storage': 'eth4',
}
2. Make sure that the date command shows the same result on all slave nodes
3. Run the OSTF HA tests until they pass
4. Run the OSTF sanity tests until they pass
5. Run the OSTF smoke tests; they fail on security group creation
6. SSH to a controller node and create security groups manually
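The time check in step 2 comes down to comparing the clocks reported by the slave nodes. A minimal Python sketch of that comparison follows, assuming the epoch timestamps have already been collected from each node (for example with `date +%s` over ssh). The node names, the helper names and the 1-second tolerance are illustrative assumptions, not values from this report.

```python
def max_clock_skew(timestamps):
    """Return the largest difference, in seconds, between any two node clocks.

    `timestamps` maps a node name to the epoch time that node reported.
    """
    if not timestamps:
        raise ValueError("no timestamps supplied")
    values = list(timestamps.values())
    return max(values) - min(values)


def nodes_in_sync(timestamps, tolerance=1.0):
    """True when all node clocks agree within `tolerance` seconds (assumed value)."""
    return max_clock_skew(timestamps) <= tolerance


# Hypothetical readings; real values would come from running `date +%s` on each node.
readings = {"node-1": 1416830400, "node-2": 1416830400, "node-3": 1416830403}
skew = max_clock_skew(readings)
in_sync = nodes_in_sync(readings)
print("max skew: %ss, in sync: %s" % (skew, in_sync))
```

Readings gathered one node at a time also include the ssh round-trip delay, so a small nonzero skew does not necessarily mean the clocks disagree.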
The issue was found on CI, so you can use a snapshot of the environment with the problem already reproduced. In this case you need to:
1. Revert the current deployment (you can use the command dos.py revert <env_name> --snapshot-name <snapshot_name> && dos.py resume <env_name> && virsh net-dumpxml <env_name>_admin | grep -P "(\d+\.){3}" -o | awk '{print "Admin node IP: "$0"2"}')
2. Make sure that the "date" command shows the same result on all slave nodes. If it does not, synchronize the time on the slaves manually
3. Run the OSTF HA tests until they pass
4. Run the OSTF sanity tests until they pass
5. Run the OSTF smoke tests; they fail on security group creation
6. SSH to a controller node and create security groups manually
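The last part of the command in step 1 derives the admin node address from the libvirt network definition: grep keeps the first three octets of each IPv4 address it finds, and awk appends "2" to each. The same derivation can be sketched in Python as below. The sample XML is a made-up fragment for illustration only, and `admin_node_ips` is a hypothetical helper name.

```python
import re

# Same pattern as the grep call: three groups of "digits followed by a dot".
PREFIX_RE = re.compile(r"(?:\d+\.){3}")


def admin_node_ips(net_xml):
    """Return one candidate admin IP per matched prefix, like grep -o piped to awk."""
    return [match.group(0) + "2" for match in PREFIX_RE.finditer(net_xml)]


# Hypothetical output of `virsh net-dumpxml <env_name>_admin`.
sample_xml = """<network>
  <name>env_admin</name>
  <ip address='10.108.0.1' prefix='24'/>
</network>"""

for ip in admin_node_ips(sample_xml):
    print("Admin node IP: " + ip)
```

Like the shell pipeline, this prints one line per address found in the XML, so a definition with several addresses yields several candidates.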
Also you can execute test ha_destroy_
1. Revert the current deployment (you can use the command dos.py revert <env_name> --snapshot-name <snapshot_name> && dos.py resume <env_name> && virsh net-dumpxml <env_name>_admin | grep -P "(\d+\.){3}" -o | awk '{print "Admin node IP: "$0"2"}')
2. Make sure that the "date" command shows the same result on all slave nodes. If it does not, synchronize the time on the slaves manually
3. Run the OSTF HA tests until they pass
4. Run the OSTF sanity tests until they pass
5. Run the OSTF smoke tests; they fail on security group creation
6. SSH to a controller node and create security groups manually
Actual result:
When I try to create a security group manually:
http://
In compute.log on the compute node:
http://
At the same time, one of the retries of the manual creation (the same group with the same name) finishes successfully:
http://
So there are some random failures, and it would be great if object creation were more stable.
"Timed out waiting for a reply to message ID" is probably related to heartbeats and oslo.messaging.
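Because the same request fails on some attempts and succeeds on others, it can help to measure how often it fails. The Python sketch below repeats a creation call and tallies the outcomes. It makes several assumptions: `create` is a hypothetical zero-argument callable standing in for whatever client call creates the security group, and the flaky stand-in only simulates the timeout with a seeded random generator. Neither comes from the actual environment.

```python
import random


def tally_attempts(create, attempts=10):
    """Call `create` repeatedly and return (success_count, list_of_error_messages)."""
    successes = 0
    failures = []
    for _ in range(attempts):
        try:
            create()
            successes += 1
        except Exception as exc:  # broad on purpose: the real exception type is an assumption
            failures.append(str(exc))
    return successes, failures


def make_flaky_create(failure_rate, seed=0):
    """Build a simulated creation call that fails with the given probability."""
    rng = random.Random(seed)

    def create():
        if rng.random() < failure_rate:
            raise TimeoutError("Timed out waiting for a reply to message ID")

    return create


successes, failures = tally_attempts(make_flaky_create(0.3), attempts=20)
print("%d succeeded, %d failed" % (successes, len(failures)))
```

In a real run, `create` would wrap the manual creation command from step 6, and the collected messages could be compared against the timeout text above.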