Fuel cannot generate diagnostic snapshot
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Fuel for OpenStack | Fix Released | High | Maciej Kwiek | |
| 8.0.x | Fix Released | High | Maciej Kwiek | |
| Mitaka | Fix Released | High | Maciej Kwiek | |
Bug Description
MOS 8.0, ISO #328
Steps to reproduce:
1. Deploy an environment with Ceilometer and Ceph on 50 nodes 1-2 times (3 controllers, 47 computes, 5 computes with Ceph)
2. Try to generate a diagnostic snapshot via the Fuel Web UI
Expected result:
1. Snapshot is created
Actual result:
1. Snapshot is not created
2. User gets an 'exit code: 1 stderr:' message near the "Generate diagnostic snapshot" button
3. /var disk space is completely exhausted
It is really hard to gather any logs, so I will describe the behavior as I see it. Right now, on the Fuel node we have several mount points:
[root@fuel ~]# df -h | grep /var
/dev/mapper/os-var 9,5G 6,6G 2,5G 73% /var
/dev/mapper/
As you can see, /var is 9.5G (with usually ~2.5G free), while /var/log is about 100G. Fuel generates the snapshot by copying logs from /var/log to /var, and at some point disk space on /var is exhausted because the logs can exceed 3G. At that point everything gets stuck: the user gets an error in the web UI, disk space is exhausted, and nothing happens.
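The failure mode described above can be checked ahead of time by comparing free space on /var against the size of /var/log before copying anything. This is a hypothetical pre-flight sketch, not part of Fuel's shotgun tooling; the paths are assumed from the df output above:

```shell
#!/bin/sh
# Hypothetical pre-flight check: refuse to start a snapshot if /var
# does not have room for a copy of /var/log (the pattern that
# exhausted the disk in this bug).

free_kb=$(df -Pk /var | awk 'NR==2 {print $4}')   # available KB on /var
logs_kb=$(du -sk /var/log | awk '{print $1}')     # total KB under /var/log

if [ "$logs_kb" -ge "$free_kb" ]; then
    echo "not enough space on /var: need ${logs_kb}K, have ${free_kb}K" >&2
    exit 1
fi
echo "ok: ${free_kb}K free on /var, ${logs_kb}K of logs to copy"
```

A check like this (or staging the archive on the large /var/log volume instead of /var) would turn the silent disk exhaustion into an explicit error before any copying starts.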
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "8.0"
api: "1.0"
build_number: "328"
build_id: "328"
fuel-nailgun_sha: "ec25ae8df28e1f
python-
fuel-agent_sha: "d354cbe61b055d
fuel-
astute_sha: "c56dfde2da0341
fuel-library_sha: "bcc3d042a47547
fuel-ostf_sha: "b2ebf15a3530b5
fuel-mirror_sha: "8100acb3a56635
fuelmenu_sha: "2942a85796d37f
shotgun_sha: "cacb93cbc28910
network-
fuel-upgrade_sha: "718aa3d7021fee
fuelmain_sha: "3faa824728ce60
tags: added: area-python
Changed in fuel:
status: New → Confirmed
assignee: nobody → Fuel Python Team (fuel-python)
tags: added: team-bugfix
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Maciej Kwiek (maciej-iai)
milestone: 8.0 → 9.0
Changed in fuel:
assignee: Maciej Kwiek (maciej-iai) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Maciej Kwiek (maciej-iai)
tags: added: hit-hcf
Changed in fuel:
status: Fix Committed → Fix Released
Hi, I've checked on #361 and it seems that it is not my case: there is no such error in astute.log, and the problem is definitely free disk space exhaustion.
Again, what I did:
* Have 50 nodes
* Enable debug logging
* Deploy cluster
* Click on 'Generate diagnostic snapshot'
* Get the error above
Please check the attached logs (I've tarred the whole /var/log; snapshotting started at ~14:43 UTC): https://drive.google.com/a/mirantis.com/file/d/0B9tzODpFABxkVHBCWmVWbTFfVVU/view?usp=sharing