Applying security-hardening post 3.19 kernel upgrade fails if pre-upgrade kernel was 3.13

Bug #1579963 reported by Wade Holler
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
OpenStack-Ansible  Fix Released  Undecided   Robb Romans

Bug Description

Opening per an email exchange with cloudnull.

Sorry about the title; that is the best I could do.

1. Deployed with Liberty OSAD (underlying kernel 3.13.0-83).

2. Upgraded to Mitaka, but user_variables.yml had apply_security_hardening: true commented out.

3. Updated the compute and storage nodes to kernel 3.19.0-51 for EMC ScaleIO integration.

4. Went back and ran openstack-ansible security-hardening.yml after uncommenting apply_security_hardening: true in user_variables.yml.

5. The run failed because the br_netfilter kernel module was missing on the storage node.

6. Ran modprobe br_netfilter on the storage node(s).

7. Re-ran security-hardening.yml successfully.
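The failure in step 5 comes down to sysctl keys whose /proc/sys entries do not exist until the module that registers them is loaded. A minimal pre-flight sketch in Python, assuming keys use only dots as separators (dotted interface names would need extra handling); the function names are hypothetical and not part of OpenStack-Ansible:

```python
from __future__ import annotations

import os
from typing import Callable


def sysctl_key_to_path(key: str, proc_root: str = "/proc/sys") -> str:
    """Map a dotted sysctl key to its file under /proc/sys.

    Assumes '.' is the only separator, so keys that embed dotted
    interface names (e.g. VLAN devices) are not handled.
    """
    return os.path.join(proc_root, *key.split("."))


def missing_sysctl_keys(
    keys: list[str],
    proc_root: str = "/proc/sys",
    exists: Callable[[str], bool] = os.path.exists,
) -> list[str]:
    """Return the keys whose /proc/sys entry is absent, typically because
    the kernel module that registers them has not been loaded."""
    return [k for k in keys if not exists(sysctl_key_to_path(k, proc_root))]
```

Run against the three net.bridge.bridge-nf-call-* keys before applying the hardening role, a non-empty result would flag the same problem before sysctl fails. The `exists` parameter is only there so the check can be exercised without a real /proc.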

Major Hayden (rackerhacker) wrote :

Thanks for the bug, Wade. I'm looking into this one now.

Changed in openstack-ansible:
assignee: nobody → Major Hayden (rackerhacker)
Major Hayden (rackerhacker) wrote :

I'm having trouble reproducing this on Ubuntu 14.04 with kernel 3.19. The playbook ran through without a problem even without br_netfilter loaded.

Do you have a copy of the Ansible error you saw? That might help me track down which task was trying to do something with netfilter.

Changed in openstack-ansible:
status: New → Incomplete
Wade Holler (wade-holler) wrote : Re: [Bug 1579963] Re: Applying security-hardening post 3.19 kernel upgrade fails if pre-upgrade kernel was 3.13

Of course; I apologize for not putting it in there.

........(earlier tasks omitted; all succeeded)......

TASK: [openstack-ansible-security | Check if csh is installed (for V-38649)] ***
skipping: [stor]

TASK: [openstack-ansible-security | V-38649 - System default umask for csh must be 077] ***
skipping: [stor]

TASK: [openstack-ansible-security | V-38651 - System default umask for bash must be 077] ***
skipping: [stor]

TASK: [openstack-ansible-security | V-38528 - The system must log martian packets] ***
ok: [stor]

TASK: [openstack-ansible-security | V-38537 - The system must ignore ICMPv4 bogus error responses] ***
failed: [stor] => {"failed": true}
msg: Failed to reload sysctl: fs.inotify.max_user_watches = 36864
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.ip_forward = 1
net.netfilter.nf_conntrack_max = 262144
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.swappiness = 5
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 16384
net.ipv4.route.gc_thresh = 16384
net.ipv4.neigh.default.gc_interval = 60
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv6.neigh.default.gc_thresh1 = 4096
net.ipv6.neigh.default.gc_thresh2 = 8192
net.ipv6.neigh.default.gc_thresh3 = 16384
net.ipv6.route.gc_thresh = 16384
net.ipv6.neigh.default.gc_interval = 60
net.ipv6.neigh.default.gc_stale_time = 120
fs.aio-max-nr = 131072
fs.inotify.max_user_instances = 1024
net.ipv4.conf.all.log_martians = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-arptables: No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/security-hardening.retry

stor : ok=64 changed=6 unreachable=0 failed=1
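The three "cannot stat" lines are what identify the missing settings. A hypothetical helper like the one below could pull the offending keys out of sysctl's error output; the `\s+` between words tolerates the line wrapping that appears when such output is pasted into email. It is a sketch that assumes the message format shown in this log, not a documented sysctl interface.

```python
from __future__ import annotations

import re

# Matches sysctl's complaint about a key with no /proc/sys entry, e.g.
#   sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
# \s+ between the words tolerates the message being wrapped across lines.
CANNOT_STAT = re.compile(
    r"sysctl: cannot stat /proc/sys/(\S+?):\s+No\s+such\s+file\s+or\s+directory"
)


def unknown_keys_from_sysctl_output(output: str) -> list[str]:
    """Return the dotted sysctl keys that sysctl could not find."""
    return [m.group(1).replace("/", ".") for m in CANNOT_STAT.finditer(output)]
```

Applied to the msg text above, it would return the three net.bridge.bridge-nf-call-* keys and ignore the settings that were applied successfully.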

(In reply to Major Hayden's message of Tue, May 10, 2016 at 10:40 AM.)

Major Hayden (rackerhacker) wrote :

Thanks for the additional output, Wade. It looks like something in your sysctl.conf might be causing the problem, and those configurations might not be from the security role itself.

Would you be able to share your /etc/sysctl.conf in the bug ticket?

Wade Holler (wade-holler) wrote :

I don't have a copy from before the hardening run, but here is the current file with comments stripped (grep -v ^#):

fs.inotify.max_user_watches=36864
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.default.rp_filter=0
net.ipv4.ip_forward=1
net.netfilter.nf_conntrack_max=262144
vm.dirty_background_ratio=5
vm.dirty_ratio=10
vm.swappiness=5
net.bridge.bridge-nf-call-ip6tables=0
net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-arptables=0
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=8192
net.ipv4.neigh.default.gc_thresh3=16384
net.ipv4.route.gc_thresh=16384
net.ipv4.neigh.default.gc_interval=60
net.ipv4.neigh.default.gc_stale_time=120
net.ipv6.neigh.default.gc_thresh1=4096
net.ipv6.neigh.default.gc_thresh2=8192
net.ipv6.neigh.default.gc_thresh3=16384
net.ipv6.route.gc_thresh=16384
net.ipv6.neigh.default.gc_interval=60
net.ipv6.neigh.default.gc_stale_time=120
fs.aio-max-nr=131072
fs.inotify.max_user_instances=1024
net.ipv4.conf.all.log_martians=1
net.ipv4.icmp_ignore_bogus_error_responses=1
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.tcp_syncookies=1
kernel.randomize_va_space=2
net.ipv4.conf.default.send_redirects=0
net.ipv4.conf.all.send_redirects=0
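A rough Python equivalent of `grep -v ^#` plus key/value splitting, assuming the simple `key=value` layout shown above (sysctl.conf(5) also treats lines starting with `;` as comments). `parse_sysctl_conf` is a hypothetical name, not an existing tool:

```python
from __future__ import annotations


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse sysctl.conf-style text into {key: value}.

    Later assignments to the same key override earlier ones, mirroring
    how sysctl applies the file top to bottom.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings
```

For example, `[k for k in parse_sysctl_conf(text) if k.startswith("net.bridge.")]` would list the three bridge entries from the file above.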

Major Hayden (rackerhacker) wrote :

Ah, I've found the issue. 3.13 had 'nf_conntrack' and 3.19 has 'br_netfilter'. There are four entries in your sysctl.conf that are causing issues:

net.netfilter.nf_conntrack_max=262144
net.bridge.bridge-nf-call-ip6tables=0
net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-arptables=0
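One way to act on this list is to map each key to the module that registers it. The table below is an assumption covering only these four keys: `net.netfilter.nf_conntrack_*` comes from `nf_conntrack`, and the `net.bridge.bridge-nf-*` keys come from `br_netfilter` on kernels that ship it as a separate module (3.18 and later; earlier kernels registered them from the bridge code itself). The function name is hypothetical.

```python
from __future__ import annotations

# Assumed prefix -> module table covering only the keys discussed here;
# illustrative, not an exhaustive kernel reference.
PROVIDING_MODULE = {
    "net.netfilter.nf_conntrack_": "nf_conntrack",
    "net.bridge.bridge-nf-": "br_netfilter",  # separate module on 3.18+
}


def modules_for_keys(keys: list[str]) -> list[str]:
    """Return the modules to load for keys matching a known prefix,
    in first-seen order without duplicates; unknown keys are ignored."""
    needed: list[str] = []
    for key in keys:
        for prefix, module in PROVIDING_MODULE.items():
            if key.startswith(prefix) and module not in needed:
                needed.append(module)
    return needed
```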

However, those aren't added by the security role, so these may have been added by the openstack_hosts role. Have you run that role recently to ensure all of the appropriate kernel modules are loaded for your kernel? There's some logic in there to ensure that br_netfilter is loaded:

https://github.com/openstack/openstack-ansible-openstack_hosts/blob/master/defaults/main.yml#L37-L39
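The linked defaults are the authoritative source. Purely as an illustration of version-dependent module loading, and not a copy of the role's actual logic, here is a sketch that assumes br_netfilter must be loaded explicitly on kernels 3.18 and newer; `modules_to_load` and its `base_modules` parameter are hypothetical:

```python
from __future__ import annotations

import re

# Assumption: br_netfilter was split out of the bridge module in Linux 3.18,
# so only kernels at or above that version need it loaded explicitly.
BR_NETFILTER_MIN_VERSION = (3, 18)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.19.0-51-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def modules_to_load(release: str, base_modules: list[str]) -> list[str]:
    """Return base_modules, plus br_netfilter when the kernel needs it."""
    modules = list(base_modules)
    needs_br = parse_kernel_release(release) >= BR_NETFILTER_MIN_VERSION
    if needs_br and "br_netfilter" not in modules:
        modules.append("br_netfilter")
    return modules
```

With this rule, the 3.13.0-83 kernel from the original deployment would not get br_netfilter, while the upgraded 3.19.0-51 kernel would, which is why re-running the hosts setup after the upgrade matters.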

Wade Holler (wade-holler) wrote :

Great. I had not run that role recently, and re-running it was one of the ~bug~ mitigations I mentioned to Kevin in the email chain outside the bug notes.

So is the moral of the story to re-run the playbooks after upgrading the kernel? If so, I think that's just fine. It might be worth adding to the Operations Guide.

Major Hayden (rackerhacker) wrote :

This was definitely an edge case since most folks don't upgrade to a modern kernel like that too often. However, this would be worth documenting for sure.

Robb Romans (rromans)
Changed in openstack-ansible:
assignee: Major Hayden (rackerhacker) → Robb Romans (rromans)
Robb Romans (rromans) wrote :

I'll take this on as a docs bug. If I'm reading this correctly: if you upgrade the kernel from version 3.13 or earlier, re-run the openstack-hosts-setup.yml playbook. I'll put that information here: http://docs.openstack.org/developer/openstack-ansible/install-guide/ops.html. Is this correct?

Thanks.

Major Hayden (rackerhacker) wrote :

That's right. That should solve it.

Robb Romans (rromans)
Changed in openstack-ansible:
status: Incomplete → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/317654

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/317654
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=fbd1f3fae195cf148be3c624335c17c55ac551cd
Submitter: Jenkins
Branch: master

commit fbd1f3fae195cf148be3c624335c17c55ac551cd
Author: Robb Romans <email address hidden>
Date: Tue May 17 12:46:52 2016 -0500

    Docs: Troubleshooting info for 3.13 kernel upgrade

    Add a note to the OSA Operations Guide about the need to re-run the
    hosts setup playbook after a kernel upgrade from v3.13. This patch also
    incidentally moves the introductory section text to be above the TOC.

    Change-Id: Ia04cec6c689d02bff98a6adb623914703b9e3f7b
    Closes-Bug: 1579963

Changed in openstack-ansible:
status: In Progress → Fix Released
Thierry Carrez (ttx) wrote : Fix included in openstack/openstack-ansible 14.0.0.0b1

This issue was fixed in the openstack/openstack-ansible 14.0.0.0b1 development milestone.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 14.0.0.0b2

This issue was fixed in the openstack/openstack-ansible 14.0.0.0b2 development milestone.
