A instance cannot ping itself after taken snapshot

Bug #1040255 reported by Sam Su
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Artem Andreev
Milestone: (none)

Bug Description

I am running a mini Essex cluster (Precise packages 2012.1-0ubuntu2.3) with one controller and one compute node.

In this environment, I created two instances, test1 and test10, on the only compute node. A snapshot has been taken of test1, which is associated with the floating IP 10.100.20.17. No snapshot has ever been taken of test10, which is associated with the floating IP 10.100.20.18. The instance test1 cannot ping its own floating IP 10.100.20.17, but test10 can ping its own.

Here is my info:
/etc/nova/nova.conf:
http://pastebin.com/MUZcwUhn

qemu version:
root@cnode-02:~# qemu-system-x86_64 --version
QEMU emulator version 1.0 (qemu-kvm-1.0), Copyright (c) 2003-2008 Fabrice Bellard

root@cnode-02:~# libvirtd --version
libvirtd (libvirt) 0.9.8

Because of this problem, any application based on OpenStack cannot perform a self-check via its floating IP. If someone can solve this issue, it would be much appreciated. If you need more info, please let me know.

Thanks,
sam

Artem Andreev (just-wow) wrote :

I think I know the reason for that

Changed in nova:
assignee: nobody → Artem Andreev (just-wow)
Mark McLoughlin (markmc) wrote :

Assuming Artem has confirmed the issue

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → folsom-rc1
Artem Andreev (just-wow) wrote :

The root of the problem is that hairpin mode is turned off on the domain's NICs, which causes packets to be lost. By default, libvirt creates domains (instances) with hairpin mode off, and nova then re-enables it with a self._enable_hairpin(instance) call. Since in Folsom the snapshot method uses self._create_domain to restore the instance's state, _enable_hairpin does not get called. A possible solution is to move the _enable_hairpin call from self._create_domain_and_network into the self._create_domain method. This seems ambiguous to me and requires additional consideration by a veteran. Nevertheless, a patch is attached to the bug.
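To make the call-order problem concrete, here is a minimal sketch. It is not nova's actual code: the class SketchDriver, the flag hairpin_in_create_domain and the helper hairpin_after_snapshot are hypothetical names made up for illustration, and the hairpin state is modelled as a plain dictionary instead of touching libvirt or the bridge. Only the method names _enable_hairpin, _create_domain and _create_domain_and_network come from the description above.

```python
class SketchDriver:
    """Toy stand-in for the libvirt driver (hypothetical, not nova code)."""

    def __init__(self, hairpin_in_create_domain):
        # False models the reported behaviour, True models the proposed fix.
        self.hairpin_in_create_domain = hairpin_in_create_domain
        self.hairpin_enabled = {}

    def _enable_hairpin(self, instance):
        self.hairpin_enabled[instance] = True

    def _create_domain(self, instance):
        # Assumption taken from the comment above: a freshly created
        # domain starts with hairpin mode off.
        self.hairpin_enabled[instance] = False
        if self.hairpin_in_create_domain:
            self._enable_hairpin(instance)

    def _create_domain_and_network(self, instance):
        self._create_domain(instance)
        if not self.hairpin_in_create_domain:
            self._enable_hairpin(instance)

    def spawn(self, instance):
        self._create_domain_and_network(instance)

    def snapshot(self, instance):
        # The snapshot path restores the instance via _create_domain only.
        self._create_domain(instance)


def hairpin_after_snapshot(hairpin_in_create_domain):
    """Spawn an instance, snapshot it, and report its hairpin state."""
    driver = SketchDriver(hairpin_in_create_domain)
    driver.spawn("test1")
    driver.snapshot("test1")
    return driver.hairpin_enabled["test1"]
```

With the flag set to False, the instance ends up with hairpin mode off after a snapshot, which matches the symptom in the report. With the flag set to True, every path through _create_domain re-enables it.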

PS: Since the problem is reproducible in Essex and the fix there is rather trivial, the patch is going to be submitted to Gerrit straight away.

Changed in nova:
status: Confirmed → In Progress
Artem Andreev (just-wow) wrote :

Meanwhile, I found out that a patch fixing this problem for the Folsom release has already been submitted: https://review.openstack.org/#/c/11925/ So I am only fixing this for the Essex release.

Sam Su (sam-su) wrote :

Thank you for your great fix.
I will run some tests to verify your patch.

Artem Andreev (just-wow) wrote :

Here is the patch for the essex release. I suppose the target milestone can be changed to essex.

Artem Andreev (just-wow) wrote :

Sam,

Since we're considering packaged Essex, check whether the patch was applied
properly by consulting
your /usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py line
712. It should contain a self._enable_hairpin call. If it does not,
you're patching something other than the nova code in use.
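
One way to run that check without opening the file by hand is sketched below. The helper names find_hairpin_calls and check_file are made up for this example. It only lists the line numbers where a self._enable_hairpin( call appears, so it cannot tell whether the call sits in the right method.

```python
import re


def find_hairpin_calls(source_text):
    """Return 1-based line numbers that contain a self._enable_hairpin( call."""
    pattern = re.compile(r"self\._enable_hairpin\(")
    return [
        number
        for number, line in enumerate(source_text.splitlines(), start=1)
        if pattern.search(line)
    ]


def check_file(path):
    """Read a Python source file and list the lines calling _enable_hairpin."""
    with open(path) as handle:
        return find_hairpin_calls(handle.read())
```

Calling check_file on the connection.py path mentioned above should return a list that includes a line near the patched location. If the expected line is missing, the patch probably went to a different copy of the code.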

On Tue, Aug 28, 2012 at 12:48 AM, Sam <email address hidden> wrote:

> I added this patch to my mini Essex environment, but it does not seem to
> work for me.
> Can you help me double-check why it didn't work?
>
> Thanks,
> Sam
>
>
> On Sat, Aug 25, 2012 at 8:23 AM, Sam Su <email address hidden> wrote:
>
> > Great, thank you so much for your help. I will try this later.
> >
> > Thanks,
> > Sam
> >
> > Sent from my iPad
> >
> > On 2012-8-25, at 1:27 AM, Artem Andreev <email address hidden> wrote:
> >
> > > Here is the patch for the essex release. I suppose the target milestone
> > > can be changed to essex.
> > >
> > > ** Patch added: "bug1040255_essex.diff"
> > > https://bugs.launchpad.net/nova/+bug/1040255/+attachment/3276286/+files/bug1040255_essex.diff

Sam Su (sam-su) wrote :

Great, it works, thank you so much for your help.

Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → none
Artem Andreev (just-wow) wrote :

I disagree that this bug is a duplicate, although the root cause is the same. For the stable/essex branch, the fix provided by Yaguang Tang does not resolve the problem in the snapshot case. I've submitted my Essex patch for review.

