Nodes aren't bootstrapped after stopping deployment during provisioning

Bug #1316554 reported by Andrey Sledzinskiy
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: Medium
Assigned to: Vladimir Sharshov
Milestone: 5.1

Bug Description

Reproduced on {"build_id": "2014-05-06_01-31-29", "mirantis": "yes", "build_number": "183", "ostf_sha": "fe718434f88f2ab167779770828a195f06eb29f8", "nailgun_sha": "c61100f34a12c597df32f7498697acd84035957f", "production": "docker", "api": "1.0", "fuelmain_sha": "185fac4a937b970fc82aa48a9c31c7981f8f7659", "astute_sha": "3cffebde1e5452f5dbf8f744c6525fc36c7afbf3", "release": "5.0", "fuellib_sha": "edaecb643f34ca73be3716c5a722bfdd40e06128"}

Steps:
1. Create environment with all default values
2. Add 1 controller, 1 compute, 1 cinder node
3. Click deploy changes button
4. Wait until provisioning starts
5. Click stop deployment
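
For non-UI reproduction, steps 3 and 5 roughly correspond to the Nailgun API calls sketched below; the endpoint paths are my assumption based on the Fuel 5.0 API, and the master address and cluster id are placeholders, not values taken from this report.

    # Assumed Nailgun endpoints (Fuel 5.0); 10.20.0.2 and cluster id 1 are placeholders.
    # Step 3: apply "deploy changes" for the cluster.
    curl -X PUT http://10.20.0.2:8000/api/clusters/1/changes
    # Step 5: stop the running deployment once provisioning has started.
    curl -X PUT http://10.20.0.2:8000/api/clusters/1/stop_deployment/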

Expected - nodes go offline and come back up bootstrapped
Actual - nodes go offline and get stuck in the provisioning state; they are bootstrapped only after a forced reboot

Diagnostic snapshot and logs from the master node are attached

Changed in fuel:
assignee: nobody → Vladimir Sharshov (vsharshov)
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

I went through the logs and found that the second command, the one that should reboot the nodes, failed.

2014-05-06T10:50:42 debug: [395] Run shell command ' echo "Run node rebooting command using 'SB' to sysrq-trigger"
        echo "1" > /proc/sys/kernel/panic_on_oops
        echo "10" > /proc/sys/kernel/panic
        echo "b" > /proc/sysrq-trigger
' using ssh
2014-05-06T10:50:42 debug: [395] Run shell command using ssh. Retry 0
2014-05-06T10:50:42 debug: [395] Affected nodes: ["node-20", "node-19", "node-21"]
2014-05-06T10:50:42 debug: [395] Retry result: success nodes: [], error nodes: [], inaccessible nodes: ["node-20", "node-19", "node-21"]
2014-05-06T10:51:12 warning: [395] 7403a914-0f54-4243-a711-0df1f2768e5a: Running shell command on nodes ["20", "19", "21"] finished with errors. Nodes [{"uid"=>"20"}, {"uid"=>"19"}, {"uid"=>"21"}] are inaccessible
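
For context, the reboot snippet in the log relies on standard Linux kernel interfaces; here is a commented copy of the same shell commands:

    # Panic instead of continuing when the kernel hits an oops.
    echo "1" > /proc/sys/kernel/panic_on_oops
    # Reboot automatically 10 seconds after a panic.
    echo "10" > /proc/sys/kernel/panic
    # Magic SysRq 'b': reboot immediately, without syncing or unmounting filesystems.
    echo "b" > /proc/sysrq-trigger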

Erasing nodes can occasionally show this behavior. At the moment I cannot reproduce the case. What we can do:
- show an error to the user if the nodes could not be rebooted (a rough sketch follows below);
- try to reproduce the issue and find out why the nodes were inaccessible.
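
A rough sketch of the first option, assuming the check runs from the master node with root SSH access to the slave nodes; the node names and timeout are illustrative, and this is a shell illustration rather than actual Astute code:

    # Hypothetical pre-reboot check: collect nodes that cannot be reached
    # over SSH and surface them as an error instead of failing silently.
    failed=""
    for node in node-19 node-20 node-21; do
        if ! ssh -o ConnectTimeout=5 -o BatchMode=yes root@"$node" true; then
            failed="$failed $node"
        fi
    done
    if [ -n "$failed" ]; then
        echo "ERROR: nodes could not be rebooted (ssh inaccessible):$failed" >&2
    fi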

Changed in fuel:
importance: High → Medium
status: New → Triaged
milestone: 5.0 → 5.1
Changed in fuel:
status: Triaged → Confirmed
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

This issue was also reproduced on ISO #208: a cluster with all default values and 1 controller; after stopping deployment, the controller got stuck in the provisioning state.
Logs are attached below

Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

Andrey, hi!

I think we discussed this bug here: https://bugs.launchpad.net/fuel/+bug/1316583/comments/15

Initially we started discussing it in that other report because we thought the two bugs had the same nature.
