Neutron namespace metadata proxy triggers kernel crash on Ubuntu 12.04/3.2 kernel

Bug #1273386 reported by Salvatore Orlando
Affects          Status        Importance  Assigned to        Milestone
devstack         Invalid       Undecided   Unassigned
neutron          Fix Released  High        Salvatore Orlando  2014.1
linux (Ubuntu)   Incomplete    Undecided   Unassigned

Bug Description

In the past 9 days we have been seeing very frequent occurrences of this kernel crash: http://paste.openstack.org/show/61869/

Although the particular crash pasted here is triggered by dnsmasq, in almost all cases the crash is actually triggered by the neutron metadata proxy.

This also affects nova badly, since this issue, which appears to be namespace-related, results in a hang while mounting the nbd device for key injection.

logstash query: http://logstash.openstack.org/#eyJzZWFyY2giOiJcImtlcm5lbCBCVUcgYXQgL2J1aWxkL2J1aWxkZC9saW51eC0zLjIuMC9mcy9idWZmZXIuYzoyOTE3XCIgYW5kIGZpbGVuYW1lOnN5c2xvZy50eHQiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImN1c3RvbSIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJmcm9tIjoiMjAxNC0wMS0xNlQxODo1MDo0OCswMDowMCIsInRvIjoiMjAxNC0wMS0yN1QxOToxNjoxMSswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sInN0YW1wIjoxMzkwODUwMzI2ODY0fQ==

We have seen about 398 hits since the bug started to manifest.
The decreased hit rate in the past few days is due to fewer neutron patches being pushed.

James Page (james-page)
summary: - Neutron namespace metadata proxy triggers kernel crash
+ Neutron namespace metadata proxy triggers kernel crash on Ubuntu
+ 12.04/3.2 kernel
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1273386

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Revision history for this message
Russell Bryant (russellb) wrote :

Related patch: https://review.openstack.org/#/c/69445/

This patch updates the kernel we use on devstack test nodes.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
tags: added: gate-failure
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

I have worked a bit more on this issue, and it seems that the crash happens when a process is executed in a namespace. So it's not the metadata proxy doing something that crashes the kernel, but rather the act of launching the metadata proxy that causes the crash.
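
Reduced to plain shell, the operation in question is simply executing a process inside a network namespace (a minimal sketch; the namespace name and the long-running process are placeholders for the router namespace and the metadata proxy):

sudo ip netns add testns                  # placeholder namespace name
sudo ip netns exec testns sleep 3600 &    # stand-in for the metadata proxy process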

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

More info.
The crash actually happens when a process is killed. This has consistently been the case in the 10 failures I've analyzed.
So the first conclusion is that this might, after all, be a red herring, with the kernel dump merely being a consequence of the abrupt killing of a process.

I've pushed this to verify whether this is the case: https://review.openstack.org/#/c/69579/

In this patch, I'm trying to use, just for testing purposes, SIGTERM instead of SIGKILL.
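
In shell terms the change under test amounts to the following (a sketch; $PID stands for the pid of the process launched in the namespace, e.g. taken from its pidfile):

# previously the process was killed abruptly:
#   sudo kill -KILL "$PID"
# the test patch switches to a catchable signal:
sudo kill -TERM "$PID"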

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → icehouse-3
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

I've run the related patch twice, and the kernel bug was hit even with SIGTERM rather than SIGKILL.

I think this kind of confirms that the "failure to become ACTIVE" error is related to issues while terminating metadata proxies and dnsmasq instances.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

pasting one trace here for the record

Jan 24 19:59:19 localhost kernel: [ 1028.120275] ------------[ cut here ]------------
Jan 24 19:59:19 localhost kernel: [ 1028.120848] kernel BUG at /build/buildd/linux-3.2.0/fs/buffer.c:2917!
Jan 24 19:59:19 localhost kernel: [ 1028.121602] invalid opcode: 0000 [#1] SMP
Jan 24 19:59:19 localhost kernel: [ 1028.122098] CPU 1
Jan 24 19:59:19 localhost kernel: [ 1028.122314] Modules linked in: xt_mac xt_physdev xt_conntrack ipt_REDIRECT xfs rmd160 crypto_null xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key camellia lzo cast6 cast5 deflate zlib_deflate cts ctr gcm ccm serpent blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common xcbc sha512_generic des_generic cryptd aes_x86_64 kvm ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp openvswitch(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath nbd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables psmouse serio_raw virtio_balloon acpiphp floppy
Jan 24 19:59:19 localhost kernel: [ 1028.130686]
Jan 24 19:59:19 localhost kernel: [ 1028.130686] Pid: 12224, comm: dnsmasq Tainted: G O 3.2.0-58-virtual #88-Ubuntu Bochs Bochs
Jan 24 19:59:19 localhost kernel: [ 1028.130686] RIP: 0010:[<ffffffff811a76d3>] [<ffffffff811a76d3>] submit_bh+0x113/0x120
Jan 24 19:59:19 localhost kernel: [ 1028.130686] RSP: 0000:ffff88015b2df738 EFLAGS: 00010246
Jan 24 19:59:19 localhost kernel: [ 1028.130686] RAX: 0000000000000005 RBX: ffff880072d71f08 RCX: 00000000000b2a1e
Jan 24 19:59:19 localhost kernel: [ 1028.130686] RDX: 000000000000008d RSI: ffff880072d71f08 RDI: 0000000000000211
Jan 24 19:59:19 localhost kernel: [ 1028.130686] RBP: ffff88015b2df758 R08: ffffffff811190c7 R09: 0000000180150015
Jan 24 19:59:19 localhost kernel: [ 1028.130686] R10: ffff8801ec485180 R11: 0000000001000000 R12: 0000000000000211
Jan 24 19:59:19 localhost kernel: [ 1028.130686] R13: ffff880004a8c824 R14: 0000000000000001 R15: ffff880072d71f08
Jan 24 19:59:19 localhost kernel: [ 1028.130686] FS: 00007fd15c3aa700(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Jan 24 19:59:19 localhost kernel: [ 1028.130686] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 24 19:59:19 localhost kernel: [ 1028.130686] CR2: 00000000042c5560 CR3: 000000020f9b6000 CR4: 00000000000006e0
Jan 24 19:59:19 localhost kernel: [ 1028.130686] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 24 19:59:19 localhost kernel: [ 1028.130686] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 24 19:59:19 localhost kernel: [ 1028.130686] Process dnsmasq (pid: 12224, threadinfo ffff88015b2de000, task ffff8801a7178000)
Jan 24 19:59:19 localhost kernel: [ 1028.130686] Stack:
Jan 24 19:59:19 localhost kernel: [ 1028.130686] ffff8801ec485180 ffff880072d71f08 0...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/69653

Revision history for this message
Stefan Bader (smb) wrote :

Did you find out whether this actually was a regression compared to the previous kernel? Also, since the problem seems to be something submitting a bufferhead that is not mapped (anymore?) on releasing the mount namespace, the content of /proc/<pid>/mount* of the process about to be killed might be useful.
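
For whoever can catch it in time, those files can be snapshotted just before the kill with something like this (a sketch; <pid> is a placeholder for the process about to be killed):

PID=<pid>                                        # placeholder: pid of the namespaced process
sudo cat /proc/$PID/mounts     > mounts.$PID
sudo cat /proc/$PID/mountinfo  > mountinfo.$PID
sudo cat /proc/$PID/mountstats > mountstats.$PID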

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Hi Stefan,

I have a bit more information.
I am trying to run the tests which trigger the issue on two machines, one with 3.2.0-57 and the other with 3.2.0-58.
So far, after 20 runs on each machine, the crash did not happen, so I can't yet confirm the regression.

However, back in October we had another kernel issue, which we solved by no longer executing the process that was causing it (arping). This was originally tracked under bug 1224001.

I've looked at a crash dump I saved from the time [1], and it's exactly the same.
So I think the update from 3.2.0-57 to -58, rather than introducing a bug, exacerbated a pre-existing condition.
Indeed, I also verified that the diff did not introduce any change in the do_exit code path.

Finally, just for the sake of it, I tried to apply [2], but that just ended up crashing the kernel at every exit call.
Stefan, let me know if there is anything I can do to help you nail down the issue once the bug manifests; I'm now trying to write a script that obsessively reproduces the sequence of operations leading to the crash, in the hope of having something easily reproducible.

[1] http://paste.openstack.org/show/48056/
[2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/khilman/linux-omap-pm/+/8aac62706adaaf0fab02c4327761561c8bda9448%5E%21/#F0

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Forgot to mention that when we dealt with bug 1224001 we were running 3.2.0-54.

Revision history for this message
Stefan Bader (smb) wrote :

Hi Salvatore, so that old crash really looks quite similar. One interesting difference is that this process simply exited and was not killed. Which suggests, as you suspected as well, that this is not a regression in a recent kernel, but that something in the way things are executed has changed and makes it happen more often now.

I hear from James Page that right now some changes in OpenStack will cause more things to happen in parallel. That may trigger some races which haven't been observed (that often) before.

To get a better understanding of the exact state of things when this happens, would it be possible to enable crashdump on one of the development machines experiencing the issue and set it to panic on the oops? A description of how to set up crashdump can be found at https://wiki.ubuntu.com/Kernel/CrashdumpRecipe. To cause the oops to become a panic, echo 1 into /proc/sys/kernel/panic_on_oops after boot.
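
On an Ubuntu host the recipe reduces to roughly the following (a sketch; see the wiki page above for the authoritative steps):

sudo apt-get install linux-crashdump                # pulls in kdump-tools and sets crashkernel=
sudo reboot                                         # so the crash kernel memory gets reserved
grep crashkernel /proc/cmdline                      # verify the reservation took effect
echo 1 | sudo tee /proc/sys/kernel/panic_on_oops    # turn the oops into a panic so kdump fires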

Revision history for this message
Chris J Arges (arges) wrote :

@salvatore-orlando

I've built a test kernel with:
8aac6270 and a pre-req c99fe536

I've been able to boot it but haven't done much testing. Here it is:
http://people.canonical.com/~arges/lp1273386/

If you have time, you can try reproducing on this kernel to see if it fixes the issue as you originally suspected.
However, getting the crashdump should be the priority for solving this issue.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Thanks Chris, I will try your kernel ASAP!

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

We have now rolled the new kernel out to the nodes used for the experimental jobs.
This did not help, however. In the logs the 'waiting for thing to become ACTIVE' failures are still present, but there is a different failure mode (no kernel crash).

However, openvswitch is just not working [1], and the VM boot times out after 120 seconds as it is unable to reach vswitchd [2].
I reckon we should install the saucy version of ovs (1.10) [3].

On another note, there still is something interesting happening during nbd mount for key injection, as it seems no suitable nbd device is found [4]. I am not sure this is desired behaviour, but at least it won't hang VM boot.

Finally, the new kernel is also unable to resolve the hostname. This might not be a problem but puts error-level statements in pretty much all log files.

[1] http://logs.openstack.org/64/61964/21/experimental/experimental-tempest-dsvm-neutron-isolated/39133b0/logs/screen-q-agt.txt.gz?level=WARNING#_2014-01-29_15_46_25_922
[2] http://logs.openstack.org/64/61964/21/experimental/experimental-tempest-dsvm-neutron-isolated/39133b0/logs/screen-n-cpu.txt.gz?level=WARNING#_2014-01-29_15_48_34_969
[3] http://packages.ubuntu.com/saucy/openvswitch-switch
[4] http://logs.openstack.org/64/61964/21/experimental/experimental-tempest-dsvm-neutron-isolated/39133b0/logs/screen-n-cpu.txt.gz?level=WARNING#_2014-01-29_15_46_13_781

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Of course I did not mean that "the new kernel is also unable to resolve the hostname"; the comment referred to the node, not the kernel.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

It looks like the OVS and DNS failures were due to nodes not being properly built, and those issues have been resolved.
However, experimental jobs seem to be revealing "Timed out waiting to get to ACTIVE" errors without any kernel crash dump.

The error is still what appears to be a hang while mounting the nbd device.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Sorry I was blind.

Kernel crash even with 3.11, same trace.
http://paste.openstack.org/show/62152/

Revision history for this message
Chris J Arges (arges) wrote :

@salvatore-orlando:

If you've tested with 3.11 then you've tested with the patches identified (8aac6270 and c99fe536) and thus these patches are not the fixes we're looking for.
Therefore the next most important experiment would be to get the crash dump as outlined in #12. This data would tell us much more about the state of the machine when the OOPS occurs. Let us know if you need help getting this up and running.
Thanks!

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Stefan, regarding comment #12

A few of us have noticed that recent changes in the testing framework are causing neutron network resources to not be deleted (I filed bug 1274410 this morning, but we have known about this for almost a week unfortunately).

A side effect is that this leaves a rather significant number of processes running in namespaces. I don't know if this might be a possible cause for this particular failure.

I will try to apply crashdump to the dev machine which I'm using as a repro environment. On this machine, however, I've not seen the failure so far. I won't be able to do this in the next few hours, so if somebody else from the neutron team wants to take over, please go ahead.

Revision history for this message
Kyle Mestery (mestery) wrote :

Salvatore, I'm going to try and find a way to reproduce this one in my local setup, which I hope will make this easier to diagnose and fix. Going by the comments you recently posted, perhaps creating a large number of namespaces is a clue, so I will give that a shot.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

The kernel bug manifests when mounting of nbd devices is combined with ip namespace operations.
Using openstack it can be reproduced only with the following configuration:
- compute service must run on the same node as the dhcp-agent and/or the l3-agent
- file injection should be turned on: libvirt.inject_partition != -2
- key injection should be enabled: libvirt.inject_key = True
- config drive should be disabled: force_config_drive = 'False' or empty string or None

If these conditions are met, nbd mount will be used to inject the key into the instance. This will trigger something in the kernel which subsequently will cause the crash in a process running in a network namespace. After this crash nbd mount won't work anymore.
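
Expressed as nova.conf settings, the conditions above look roughly like this (a sketch only; it assumes the Icehouse-era [libvirt] option group and is shown as an append for brevity, so adjust to whatever your deployment actually uses):

sudo tee -a /etc/nova/nova.conf > /dev/null <<'EOF'
[DEFAULT]
force_config_drive =        # empty/False: config drive disabled

[libvirt]
inject_partition = -1       # any value other than -2 enables file injection
inject_key = True           # inject the ssh key, forcing the nbd mount path
EOF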

Using openstack the crash can be reproduced within a few minutes with the scripts available here: https://gist.github.com/salv-orlando/8715991

- keep_booting_stuff.sh creates and destroys vms continuously, ensuring a key is always injected
- stress_me_to_death creates network namespaces, launches a process in them, kills the process and then the namespace

The two scripts require openstack and should be executed concurrently.
Crashdump data will be posted as soon as possible.

It should not be too hard to provide a script that reproduces the issue and is independent of openstack.
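
As a starting point, the namespace half of that workload boils down to something like the loop below (a minimal sketch of what stress_me_to_death does, per the description above; names and timings are arbitrary, and the nbd/key-injection half still requires nova or an equivalent qemu-nbd mount loop):

#!/bin/bash
# run as root: continuously create a netns, launch a process in it,
# kill that process abruptly, then delete the netns
i=0
while true; do
    ns="stress-$i"
    ip netns add "$ns"
    ip netns exec "$ns" sleep 60 &   # ip execs the command, so $! is its pid
    pid=$!
    sleep 1
    kill -9 "$pid"                   # abrupt kill, as the agents originally did
    wait "$pid" 2>/dev/null
    ip netns delete "$ns"
    i=$((i + 1))
done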

Changed in neutron:
status: New → Triaged
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Crash dumps from test machine.

Revision history for this message
Edgar Magana (emagana) wrote :

Folks,

One of the kernel guys on our team suggests trying ext4 instead of ext3, or disabling ext3 journaling!
Also, 3.2 is becoming an old kernel; we may want to plan a move to a newer one.
Just some suggestions!

Revision history for this message
Clark Boylan (cboylan) wrote :

A couple of things. We get ext3 on these filesystems because that is what Rackspace gives us. I have a hunch that if OpenStack (nova) were to change its default fs to ext4, within a week and a half we would start seeing that in the nodes Rackspace gives us. But the first iteration of that change got reverted, so it will need to be sorted. 3.2 is a bit old, but we recently tried 3.11 (using the saucy precise kernel backport) and that didn't fix the problem according to the comments above.

I do have a change (https://review.openstack.org/70283) that will mount the ext3 filesystems we are given as ext4. We can see if that will help. But really we have the ability to fix these problems as an upstream, we should take advantage of that.
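
That change relies on the ext4 driver being able to mount ext3-formatted filesystems in place; as a quick illustration (device and mountpoint are just examples):

sudo mount -t ext4 /dev/xvde1 /mnt    # mount an ext3 filesystem using the ext4 driver
mount | grep /mnt                     # should now report type ext4
# for a persistent switch, change the fstab entry's type from ext3 to ext4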

Revision history for this message
Stefan Bader (smb) wrote :

Adding some notes here which we gathered by talking to James Page who in turn was talking to <forgotwho>... One thing to get hold of is real crashdumps. But those will be relatively big and Chris is currently trying to see whether we can trigger this on a devstack environment.

From what I understand, when forcing the config drive off, the process preparing the image (which I assume runs in its own namespace) will mount the image just before providing it to the guest and write some config data directly into the image. So this makes the namespace of that process have modified mount information, and that might be our problem. I'm not sure whether there is also a timing side to it or whether it only depends on doing these steps. But at least this gets us ahead in understanding what goes on (and why it would happen more frequently just now).

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

The bug is still hitting stable builds. A backport is possible, but we should first solve the related guestfs issues (bug 1275267).

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

I would rather think that is unrelated. The backtrace looks completely different, and I would think the other issue triggers much more reliably than this one. Also, I think the older occurrences that Salvatore saw were from before newer versions of the ovs dkms were backported into Precise.

Revision history for this message
Chris J Arges (arges) wrote :

At this point I'm having a difficult time reproducing on my end. I've followed the steps above, but it seems that I can't trigger it. Are there any other suggestions or simpler reproducers I can attempt? Thanks

Revision history for this message
Allison Randal (allison) wrote :

Stefan, what's the most important thing we could do to help you help us? I see a few threads of inquiry in the comments that trailed off:

- abstracting Salvatore's script for high thrash rates on VM creation/destruction and network namespace creation, to try reproducing the issue without running openstack

- ext4 vs ext3

- capturing crashdumps from a devstack gate run (or simulated gate run on a local machine)

- determining whether the failure is a regression from a previous version of the kernel (seems not)

- determining whether the failure occurs on 3.11 kernel in addition to 3.2 kernel (seems it does)

What looks most promising? This is a critical issue, and still producing intermittent failures, so it's worth a prod or two on the OpenStack side to get things rolling again.

Revision history for this message
Stefan Bader (smb) wrote : Re: [Bug 1273386] Re: Neutron namespace metadata proxy triggers kernel crash on Ubuntu 12.04/3.2 kernel

Hey Allison, long time no see. :)

> Stefan, what's the most important thing we could do to help you help us?
> I see a few threads of inquiry in the comments that trailed off:

I would say either find a way to reproduce without OpenStack or manage to get some crashdumps. The crashdumps might be simpler to prepare for, and with luck we catch the issue happening on those hosts set up to crashdump.
While that can be done without too much thought, being able to reproduce it locally with minimal dependencies would allow us quicker iterations when trying fixes. Also, Chris and I approached the devstack setup in two different ways and maybe both of us made little mistakes there (his being the more successful, as I thought it a good idea to use a Xen guest but had not considered whether openstack supports nesting with Xen).

> - abstracting Salvatore's script for high thrash rates on VM
> creation/destruction and network namespace creation, to try reproducing
> the issue without running openstack
> - capturing crashdumps from a devstack gate run (or simulated gate run
> on a local machine)

Changed in neutron:
importance: Critical → High
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-3 → icehouse-rc1
Changed in neutron:
milestone: icehouse-rc1 → none
Revision history for this message
Alan Pevec (apevec) wrote :

AFAICT this is worked around in master by switching to config drive, but stable/havana devstack is missing support for that, so I've proposed https://review.openstack.org/82874
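
For reference, the workaround amounts to forcing a config drive so the nbd-based key-injection path is never taken; in nova.conf terms it is something like this (a sketch; the exact accepted values vary by release, and in practice you would edit the existing [DEFAULT] section rather than append):

sudo tee -a /etc/nova/nova.conf > /dev/null <<'EOF'
[DEFAULT]
force_config_drive = always    # metadata and keys go onto a config drive instead of an nbd-mounted image
EOF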

Changed in devstack:
assignee: nobody → Alan Pevec (apevec)
status: New → In Progress
Changed in neutron:
status: Triaged → Fix Committed
milestone: none → icehouse-rc1
Joe Gordon (jogo)
no longer affects: nova
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-rc1 → 2014.1
Alan Pevec (apevec)
Changed in devstack:
assignee: Alan Pevec (apevec) → nobody
Revision history for this message
Sean Dague (sdague) wrote :

This devstack bug was last updated over 180 days ago. As devstack is a fast-moving project and we'd like to get the tracker down to currently actionable bugs, this is being marked as Invalid. If the issue still exists, please feel free to reopen it.

Changed in devstack:
status: In Progress → Invalid