[Update][Ceph] Health check fails with warning "too many PGs per OSD"

Bug #1651973 reported by Ilya Bumarskov
This bug affects 1 person
Affects             Status        Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Released  Low         Andrey Epifanov  9.2

Bug Description

Fuel 9.2 snapshot-id #662
mos_mu tool was installed from git (https://github.com/aepifanov/mos_mu.git, version date: 21.12.2016)

Steps to reproduce:
   - Deploy a 9.0 env with the following nodes:
         * Controller
         * Controller
         * Controller
         * Compute, ceph-osd
         * Compute, ceph-osd
         * Compute, ceph-osd
   - Install the mos-playbooks tool (mos_mu) on the master node (git clone https://github.com/aepifanov/mos_mu.git && cd mos_mu && git checkout 9.2 && ./install_ansible.sh)
   - Execute preparation playbooks (ansible-playbook playbooks/mos9_prepare_fuel.yml && ansible-playbook playbooks/mos9_prepare_env.yml -e '{"env_id":1, "snapshot_repo":"9.0-2016-12-20-174323"}')
   - Update the Fuel master node (ansible-playbook playbooks/update_fuel.yml -e '{"rebuild_bootstrap":false}')
   - Update the env (fuel2 update install --env 1 --repos mos9.2)
   - Upgrade the kernel to 4.4 for the bootstrap image (ansible-playbook playbooks/mos9_fuel_upgrade_kernel_4.4.yml)
   - Upgrade the kernel to 4.4 on the env nodes (ansible-playbook playbooks/mos9_env_upgrade_kernel_4.4.yml -e '{"env_id":1}')
   - Update Ceph (ansible-playbook playbooks/update_ceph -e '{"env_id":1,"restart_ceph":false}')

Observed behavior:
TASK [Show health of Core OpenStack services] **********************************
skipping: [node-6.test.domain.local]
ok: [node-1.test.domain.local] => {
    "msg": [
        "### ceph:",
        " cluster 07b4d8e0-3d73-41ff-90f9-7a76832d1bc7",
        " health HEALTH_WARN",
        " too many PGs per OSD (352 > max 300)",
        " monmap e3: 3 mons at {node-1=10.109.2.2:6789/0,node-2=10.109.2.5:6789/0,node-3=10.109.2.4:6789/0}",
        " election epoch 16, quorum 0,1,2 node-1,node-3,node-2",
        " osdmap e65: 6 osds: 6 up, 6 in",
        " pgmap v541: 704 pgs, 10 pools, 22052 kB data, 50 objects",
        " 12591 MB used, 283 GB / 296 GB avail",
        " 704 active+clean",
        ""
    ]
}
skipping: [node-5.test.domain.local]
skipping: [node-4.test.domain.local]
ok: [node-2.test.domain.local] => {
    "msg": [
        "### ceph:",
        " cluster 07b4d8e0-3d73-41ff-90f9-7a76832d1bc7",
        " health HEALTH_WARN",
        " too many PGs per OSD (352 > max 300)",
        " monmap e3: 3 mons at {node-1=10.109.2.2:6789/0,node-2=10.109.2.5:6789/0,node-3=10.109.2.4:6789/0}",
        " election epoch 16, quorum 0,1,2 node-1,node-3,node-2",
        " osdmap e65: 6 osds: 6 up, 6 in",
        " pgmap v541: 704 pgs, 10 pools, 22052 kB data, 50 objects",
        " 12591 MB used, 283 GB / 296 GB avail",
        " 704 active+clean",
        ""
    ]
}
ok: [node-3.test.domain.local] => {
    "msg": [
        "### ceph:",
        " cluster 07b4d8e0-3d73-41ff-90f9-7a76832d1bc7",
        " health HEALTH_WARN",
        " too many PGs per OSD (352 > max 300)",
        " monmap e3: 3 mons at {node-1=10.109.2.2:6789/0,node-2=10.109.2.5:6789/0,node-3=10.109.2.4:6789/0}",
        " election epoch 16, quorum 0,1,2 node-1,node-3,node-2",
        " osdmap e65: 6 osds: 6 up, 6 in",
        " pgmap v541: 704 pgs, 10 pools, 22052 kB data, 50 objects",
        " 12591 MB used, 283 GB / 296 GB avail",
        " 704 active+clean",
        ""
    ]
}
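The 352 in the warning can be reproduced from the `ceph -s` figures above. A minimal sketch, assuming that all 10 pools are replicated with size 3 (not stated in the report, although 704 × 3 / 6 = 352 matches exactly), that the monitor's check divides the total number of PG copies by the number of `in` OSDs, and that 300 is the default value of mon_pg_warn_max_per_osd:

```shell
# Figures taken from the "ceph -s" output above
total_pgs=704     # "pgmap v541: 704 pgs, 10 pools, ..."
num_in_osds=6     # "osdmap e65: 6 osds: 6 up, 6 in"
# Assumption: every pool uses replication size 3
replica_size=3
# Threshold reported in the warning ("max 300")
warn_max=300

# Each PG is stored replica_size times and the copies are spread over the
# in OSDs; shell arithmetic uses integer division.
pgs_per_osd=$(( total_pgs * replica_size / num_in_osds ))
echo "PGs per OSD: $pgs_per_osd"    # prints: PGs per OSD: 352

if [ "$pgs_per_osd" -gt "$warn_max" ]; then
  # prints: too many PGs per OSD (352 > max 300)
  echo "too many PGs per OSD ($pgs_per_osd > max $warn_max)"
fi
```

The warning is therefore a consequence of pool sizing relative to the number of OSDs in this 6-OSD test environment, not of a failure in the update itself.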

Changed in fuel:
importance: Undecided → High
milestone: none → 9.2
Oleksiy Molchanov (omolchanov) wrote:

As you can see, all 704 PGs are active+clean, so the cluster is fully operational.
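Whether every PG is active+clean can be checked directly from the same `ceph -s` output. A minimal sketch; the function name `all_pgs_active_clean` is hypothetical, and it assumes the plain-text status layout shown in this report (a "pgmap ...: N pgs" line and an "N active+clean" line). It counts only PGs whose state is exactly active+clean, so a healthy but scrubbing PG (active+clean+scrubbing) makes it return false.

```shell
# Hypothetical helper: succeed only if every PG is exactly active+clean.
# Reads "ceph -s" style text on stdin.
all_pgs_active_clean() {
  status="$(cat)"
  # Total PG count from the "pgmap vNNN: <N> pgs, ..." line
  total=$(printf '%s\n' "$status" | sed -n 's/.*pgmap v[0-9]*: \([0-9]*\) pgs.*/\1/p')
  # Count from a line that reads exactly "<N> active+clean"
  clean=$(printf '%s\n' "$status" | sed -n 's/^[[:space:]]*\([0-9]*\) active+clean[[:space:]]*$/\1/p')
  [ -n "$total" ] && [ "$total" = "$clean" ]
}

# Usage on a controller node:
#   ceph -s | all_pgs_active_clean && echo "all PGs are active+clean"
```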

Changed in fuel:
assignee: nobody → Andrey Epifanov (aepifanov)
Vitaly Sedelnik (vsedelnik) wrote:

Per feedback from the Ceph team: although 352 PGs per OSD is more than recommended, it is completely OK and the cluster is fully functional. So setting importance to Low.

There is a mon_pg_warn_max_per_osd parameter in ceph.conf which can be set to 0 to suppress this warning. That could be a fix for this issue.
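A minimal sketch of applying that suggestion by hand on the controller (monitor) nodes. The helper name `disable_pg_per_osd_warning` is hypothetical and this is not the change that was committed for this bug; it assumes the file already has a `[global]` section and that GNU sed is available. Putting the option under `[global]` is also an assumption; any section the monitors read would do.

```shell
# Hypothetical helper: persist "mon_pg_warn_max_per_osd = 0" in a ceph.conf
# so the monitors keep the setting after a restart.
disable_pg_per_osd_warning() {
  conf="$1"
  # Fail instead of guessing where to put the option if [global] is missing
  grep -q '^\[global\]' "$conf" || return 1
  # Nothing to do if the option is already set (underscore spelling only)
  if grep -q '^[[:space:]]*mon_pg_warn_max_per_osd' "$conf"; then
    return 0
  fi
  # GNU sed: append the option on the line right after the [global] header
  sed -i '/^\[global\]/a mon_pg_warn_max_per_osd = 0' "$conf"
}

# Usage on each controller (monitor) node:
#   disable_pg_per_osd_warning /etc/ceph/ceph.conf
# The running monitors can be updated without a restart with:
#   ceph tell 'mon.*' injectargs '--mon_pg_warn_max_per_osd 0'
```

Calling the helper twice leaves a single copy of the option, so it is safe to run from an idempotent playbook step.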

Changed in fuel:
status: New → Confirmed
importance: High → Low
Ilya Bumarskov (ibumarskov) wrote:
Changed in fuel:
status: Confirmed → Fix Committed
Ilya Bumarskov (ibumarskov) wrote:

Verified on snapshot-id #778

Changed in fuel:
status: Fix Committed → Fix Released