services no longer reliably stop in stable/liberty / master

Bug #1446583 reported by Sean Dague
This bug affects 1 person
Affects                        Status        Importance  Assigned to     Milestone
Cinder                         Fix Released  Critical    Brant Knudson
  Kilo                         Fix Released  Critical    Unassigned
OpenStack Compute (nova)       Fix Released  High        Sean Dague
  Kilo                         Fix Released  Undecided   Unassigned
OpenStack Identity (keystone)  Fix Released  Critical    Brant Knudson
  Kilo                         Fix Released  Critical    Brant Knudson
oslo-incubator                 Fix Released  Critical    Julien Danjou
oslo.service                   Fix Released  Critical    Marian Horban

Bug Description

In attempting to update the upgrade branch structure to support stable/kilo -> master in devstack gate, we found the project could no longer pass Grenade testing. The reason is that pkill -g is no longer reliably killing off the services:

http://logs.openstack.org/91/175391/5/gate/gate-grenade-dsvm/0ad4a94/logs/grenade.sh.txt.gz#_2015-04-21_03_15_31_436

It has been seen with keystone-all and cinder-api on this patch series:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiVGhlIGZvbGxvd2luZyBzZXJ2aWNlcyBhcmUgc3RpbGwgcnVubmluZ1wiIEFORCBtZXNzYWdlOlwiZGllXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjE3MjgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0Mjk2MTU0NTQ2MzB9

There were a number of changes to the oslo-incubator service.py code during kilo; it's unclear at this point which one is the issue.

Note: this has returned in stable/liberty / master and oslo.service; see comment #50 for where it reemerges.

Sean Dague (sdague)
Changed in oslo-incubator:
importance: Undecided → Critical
Revision history for this message
Sean Dague (sdague) wrote :

Here's what a working shutdown for keystone looks like:

http://logs.openstack.org/33/175533/1/check/check-grenade-dsvm/51e9a99/logs/old/screen-key.txt.gz#_2015-04-20_20_08_24_354

2015-04-20 20:08:24.354 9448 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.358 9450 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.359 9442 INFO keystone.openstack.common.service [-] Caught SIGTERM, stopping children
2015-04-20 20:08:24.360 9457 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.361 9451 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.361 9452 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.363 9453 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.367 9455 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.369 9456 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.369 9442 INFO keystone.openstack.common.service [-] Waiting on 12 children to exit
2015-04-20 20:08:24.369 9442 INFO keystone.openstack.common.service [-] Child 9447 exited with status 1
2015-04-20 20:08:24.370 9442 INFO keystone.openstack.common.service [-] Child 9448 exited with status 1
2015-04-20 20:08:24.370 9442 INFO keystone.openstack.common.service [-] Child 9449 exited with status 1
2015-04-20 20:08:24.370 9442 INFO keystone.openstack.common.service [-] Child 9450 exited with status 1
2015-04-20 20:08:24.371 9442 INFO keystone.openstack.common.service [-] Child 9452 exited with status 1
2015-04-20 20:08:24.371 9442 INFO keystone.openstack.common.service [-] Child 9457 exited with status 1
2015-04-20 20:08:24.371 9454 INFO keystone.openstack.common.service [-] Child caught SIGTERM, exiting
2015-04-20 20:08:24.371 9442 INFO keystone.openstack.common.service [-] Child 9451 exited with status 1
2015-04-20 20:08:24.372 9442 INFO keystone.openstack.common.service [-] Child 9453 exited with status 1
2015-04-20 20:08:24.372 9442 INFO keystone.openstack.common.service [-] Child 9446 exited with status 1
2015-04-20 20:08:24.373 9442 INFO keystone.openstack.common.service [-] Child 9455 exited with status 1
2015-04-20 20:08:24.378 9442 INFO keystone.openstack.common.service [-] Child 9454 exited with status 1
2015-04-20 20:08:24.378 9442 INFO keystone.openstack.common.service [-] Child 9456 exited with status 1
2015-04-20 20:08:24.379 9442 DEBUG keystone.openstack.common.service [-] Full set of CONF: wait /opt/stack/old/keystone/keystone/openstack/common/service.py:387
2015-04-20 20:08:24.379 9442 DEBUG keystone.openstack.common.service [-] ******************************************************************************** log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2052
2015-04-20 20:08:24.379 9442 DEBUG keystone.openstack.common.service [-] Configuration options gathered from: log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2053
2015-04-20 20:08:24.379 9442 DEBUG keystone.openstack.common.se...

Revision history for this message
Sean Dague (sdague) wrote :

The leftover processes in the failing run are as follows:

2015-04-21 03:15:40.613 | stack 26996 0.1 0.9 149236 73800 ? Ss 03:01 0:01 /usr/bin/python /usr/local/bin/keystone-all --config-file /etc/keystone/keystone.conf
2015-04-21 03:15:40.613 | stack 27055 0.4 1.0 261868 83552 ? S 03:01 0:03 /usr/bin/python /usr/local/bin/keystone-all --config-file /etc/keystone/keystone.conf
2015-04-21 03:15:40.613 | stack 27060 0.5 1.0 261828 83584 ? S 03:01 0:04 /usr/bin/python /usr/local/bin/keystone-all --config-file /etc/keystone/keystone.conf

26996 is the session leader and the process that got the original kill signal. It looks like 27055 and 27060 never got the signal, or never logged it.

Changed in keystone:
importance: Undecided → Critical
milestone: none → kilo-rc2
Changed in cinder:
importance: Undecided → Critical
Thierry Carrez (ttx)
tags: added: kilo-rc-potential
Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

A few suspicious changes:

optimizing waiting for children: https://review.openstack.org/#/c/156345/
reverting ^^: https://review.openstack.org/#/c/168924/
reverting the revert: https://review.openstack.org/#/c/174365/

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

The plan is to revert https://review.openstack.org/#/c/156345/ and then sync the change into the stable/kilo branches of cinder and keystone with depends-on set to the incubator change.

Changed in oslo-incubator:
assignee: nobody → Julien Danjou (jdanjou)
status: New → Triaged
Revision history for this message
Doug Hellmann (doug-hellmann) wrote :
Revision history for this message
Julien Danjou (jdanjou) wrote :

It's not clear if https://review.openstack.org/#/c/156345/ is the culprit, as it is not and has never been present in Cinder.

Revision history for this message
Brant Knudson (blk-u) wrote :

It looks like if there are any client connections open then keystone won't shut down. I tried to recreate this with curl and keystone shuts down just fine, but if I use `nc localhost 5000` and CTRL-C the key screen it doesn't exit until I kill the nc process.

Revision history for this message
Brant Knudson (blk-u) wrote :

I should say keystone doesn't exit until I kill the nc process or until I send another SIGINT to keystone.
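
A minimal sketch of this repro in Python, assuming a default devstack where keystone-all listens on localhost:5000 and shows up in the process table as "keystone-all":

    import os
    import signal
    import socket
    import subprocess
    import time

    # Hold a TCP connection open, like `nc localhost 5000` would.
    conn = socket.create_connection(("localhost", 5000))

    # SIGTERM the keystone parent (the oldest matching process).
    parent = int(subprocess.check_output(["pgrep", "-o", "-f", "keystone-all"]))
    os.kill(parent, signal.SIGTERM)

    time.sleep(5)
    # With the bug present, the workers holding the connection are still up:
    leftover = subprocess.run(["pgrep", "-f", "keystone-all"],
                              capture_output=True, text=True)
    print(leftover.stdout or "all keystone processes exited")

    conn.close()  # once the client disconnects, the stuck workers exit too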

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/175857

Changed in keystone:
assignee: nobody → Julien Danjou (jdanjou)
status: New → In Progress
Revision history for this message
Brant Knudson (blk-u) wrote : Re: services no long reliably stop in stable/kilo

keystone doesn't have https://review.openstack.org/#/c/156345/ either. I can try it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/175859

Revision history for this message
Julien Danjou (jdanjou) wrote : Re: services no long reliably stop in stable/kilo

Still proposed to Keystone, since Sean thinks the Cinder issue might be something else:

master: https://review.openstack.org/#/c/175857/
stable/kilo: https://review.openstack.org/#/c/175859/

Revision history for this message
Brant Knudson (blk-u) wrote :

It would be interesting to know if there were any open connections to keystone when it failed to stop, or if this is caused by something else, too.

Revision history for this message
Brant Knudson (blk-u) wrote :

I don't see any change in behavior with https://review.openstack.org/#/c/175851/ applied to keystone.

Sean Dague (sdague)
summary: - services no long reliably stop in stable/kilo
+ services no longer reliably stop in stable/kilo
Revision history for this message
Sean Dague (sdague) wrote : Re: services no longer reliably stop in stable/kilo

Brant, I specifically tear down all the services in reverse order here, so there should not be any other openstack services still running.

Revision history for this message
Eric Harney (eharney) wrote :

Looking briefly at Cinder...

Grenade calls devstack's stop_cinder, which calls stop_process, which in turn calls pkill -g. pkill doesn't appear to wait here.

Grenade then calls "ensure_services_stopped cinder-api" which looks with ps for the service.

Unless I'm missing something, if Cinder takes a long time to handle SIGTERM, this could be a race: grenade may be checking whether the service stopped while it is still in the process of shutting down.
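
A sketch of that sequence in Python (pkill/pgrep are the real tools; the group ID is hypothetical):

    import subprocess

    GROUP = "26996"  # hypothetical session-leader PID / process group

    # stop_process: pkill returns as soon as the signal is sent, without
    # waiting for the processes to actually exit.
    subprocess.run(["pkill", "-g", GROUP])

    # ensure_services_stopped: an immediate process-table check, so a
    # service that is merely slow to handle SIGTERM looks identical to a
    # hung one.
    still_up = subprocess.run(["pgrep", "-g", GROUP], capture_output=True)
    if still_up.returncode == 0:
        print("The following services are still running...")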

Revision history for this message
Sean Dague (sdague) wrote :

Eric, so why would it take a "very long time"? At a certain point the service does need to shut down. It's not doing anything when shutdown is called on it (per the logs), so being slow for no reason seems... hand-wavy.

Revision history for this message
Sean Dague (sdague) wrote :

Here is another run with a full process dump when we fail on keystone - http://logs.openstack.org/91/175391/6/check/check-grenade-dsvm-partial-ncpu/12ddd99/logs/grenade.sh.txt.gz

Nothing that talks to keystone is still up. So I don't think Brant's theory holds.

Revision history for this message
Brant Knudson (blk-u) wrote :

It would be interesting to see the output of the following command:

$ ss -p -t -o state established '( dport = :5000 or dport = :35357 )'
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 127.0.0.1:41798 127.0.0.1:5000 users:(("nc",17051,3))

The process dump shows that horizon is still running.

Revision history for this message
Sean Dague (sdague) wrote :

Feel free to add to the grenade debugging patch:

https://review.openstack.org/#/c/175935/

And then recheck:

https://review.openstack.org/#/c/175391

Revision history for this message
Brant Knudson (blk-u) wrote :

http://logs.openstack.org/91/175391/6/check/check-grenade-dsvm/a98210d/logs/grenade.sh.txt.gz#_2015-04-21_18_59_20_324

There are a couple of connections open to keystone:

2015-04-21 18:59:20.324 | + ss -p -t -o state established '( dport = :5000 or dport = :35357 )'
2015-04-21 18:59:20.327 | Recv-Q Send-Q Local Address:Port Peer Address:Port
2015-04-21 18:59:20.337 | 0 0 127.0.0.1:51305 127.0.0.1:5000
2015-04-21 18:59:20.337 | 0 0 127.0.0.1:51321 127.0.0.1:5000

I haven't been able to look at this much yet to figure out how to get keystone to kill active connection threads.

Revision history for this message
Brant Knudson (blk-u) wrote :

An alternative "fix" would be to figure out where those connections are coming from and kill them off, but it seems like a bug in keystone to not shut down on a regular shutdown signal if there's a client connection. I haven't seen docs either way... in some ways it might be nice to not shut down right away while there are active connections. The admin can always kill keystone with SIGKILL (SIGQUIT also worked) if they really want it to go away.

Revision history for this message
Brant Knudson (blk-u) wrote :

Just as an example, glance is happy to shut down on SIGINT even with a client connected... I wonder how it does it. (And why every service does it differently...)

Revision history for this message
Josh Durgin (jdurgin) wrote :

Ceph ran into a similar signal handling issue in its cli. The root cause there was the main python thread calling a C lib when SIGINT came in. When this happens, python can be delayed from receiving SIGINT. Some discussion of a similar issue here: https://bugs.python.org/issue5315

Is it possible some calls to C libs are not being wrapped safely for eventlet somewhere, especially long-running ones similar to select?
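
A self-contained sketch of the CPython behaviour Josh points at: the Python-level handler only runs once the interpreter regains control, so a single long C call delays it (the buffer size is just tuned so the call outlasts the timer):

    import hashlib
    import os
    import signal
    import threading
    import time

    fired = []
    signal.signal(signal.SIGTERM, lambda s, f: fired.append(time.monotonic()))

    # Deliver SIGTERM ~0.1s after we enter the C call below.
    threading.Timer(0.1, os.kill, (os.getpid(), signal.SIGTERM)).start()

    start = time.monotonic()
    hashlib.sha256(b"\x00" * (1 << 29)).hexdigest()  # one long C-level call
    print("handler delayed by %.2fs" % (fired[0] - start - 0.1))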

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to keystone (stable/kilo)

Related fix proposed to branch: stable/kilo
Review: https://review.openstack.org/176306

Changed in oslo-incubator:
status: Triaged → Fix Released
milestone: none → ongoing
milestone: ongoing → liberty-1
Revision history for this message
Brant Knudson (blk-u) wrote : Re: services no longer reliably stop in stable/kilo

Posted a change to oslo-incubator that worked for keystone: https://review.openstack.org/#/c/176151/

Thierry Carrez (ttx)
Changed in keystone:
milestone: kilo-rc2 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (stable/kilo)

Change abandoned by Doug Hellmann (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/175859
Reason: duplicate

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Doug Hellmann (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/176306
Reason: duplicate

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/176391

Changed in keystone:
assignee: Julien Danjou (jdanjou) → Brant Knudson (blk-u)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/176392

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (stable/kilo)

Change abandoned by Thierry Carrez (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/175859
Reason: superseded by https://review.openstack.org/#/c/176392/

Changed in cinder:
assignee: nobody → Brant Knudson (blk-u)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/176455

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/176457

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (stable/kilo)

Reviewed: https://review.openstack.org/176392
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=65a50eebb8d0a53a2c4c226eb9a564c4d535ac68
Submitter: Jenkins
Branch: stable/kilo

commit 65a50eebb8d0a53a2c4c226eb9a564c4d535ac68
Author: Brant Knudson <email address hidden>
Date: Wed Apr 22 11:33:00 2015 -0500

    Sync oslo-incubator Ie51669bd278288b768311ddf56ad31a2f28cc7ab

    This syncs to oslo-incubator to commit 64b5819 and also includes
    51280db.

    Change-Id: I7b43a67a0b67fe0ff5ac3d87708ecc4ab52102f8
    Depends-On: Ie51669bd278288b768311ddf56ad31a2f28cc7ab
    Closes-Bug: #1446583
    (cherry picked from commit 797da5f05444e7cfbf55df52867ade6107834f00)

Changed in keystone:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/176391
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=797da5f05444e7cfbf55df52867ade6107834f00
Submitter: Jenkins
Branch: master

commit 797da5f05444e7cfbf55df52867ade6107834f00
Author: Brant Knudson <email address hidden>
Date: Wed Apr 22 11:33:00 2015 -0500

    Sync oslo-incubator Ie51669bd278288b768311ddf56ad31a2f28cc7ab

    This syncs to oslo-incubator to commit 64b5819 and also includes
    51280db.

    Change-Id: I7b43a67a0b67fe0ff5ac3d87708ecc4ab52102f8
    Depends-On: Ie51669bd278288b768311ddf56ad31a2f28cc7ab
    Closes-Bug: #1446583

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/176455
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d73ac96d18c66aa4dd5b7d7f8d7c22e8f8434683
Submitter: Jenkins
Branch: master

commit d73ac96d18c66aa4dd5b7d7f8d7c22e8f8434683
Author: Brant Knudson <email address hidden>
Date: Wed Apr 22 14:57:53 2015 -0500

    service child process normal SIGTERM exit

    service.py had some code where the child process would catch the
    SIGTERM from the parent just so it could exit with 1 status rather
    than with an indication that it exited due to SIGTERM. When
    shutting down the parent doesn't care in what way the child ended,
    only that they're all gone, so this code is unnecessary.

    Also, for some reason this caused the child to never exit while
    there was an open connection from a client. Probably something
    with eventlet and signal handling.

    This is a cherry-pick of oslo-incubator commit
    702bc569987854b602ef189655c201c348de84cb .

    Change-Id: I87f3ca4da64fb8070e4d6c3876a2f1ce1a3ca71d
    Closes-Bug: #1446583
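
The change described above amounts to not installing a Python-level SIGTERM handler in the child at all; a rough sketch (not the literal oslo-incubator diff):

    import signal
    import sys

    # Before: the child trapped SIGTERM only to convert it into exit
    # status 1, which requires eventlet to re-enter Python code before
    # the exit can happen.
    def _child_sigterm(signo, frame):
        sys.exit(1)
    signal.signal(signal.SIGTERM, _child_sigterm)

    # After: leave the default disposition, so the kernel terminates the
    # child directly, even while it sits on an open client connection.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)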

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in oslo-incubator:
status: Fix Released → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (master)

Change abandoned by Julien Danjou (<email address hidden>) on branch: master
Review: https://review.openstack.org/175857

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/kilo)

Reviewed: https://review.openstack.org/176457
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=b05274c96bc48e749e6ad21633b39158838c313e
Submitter: Jenkins
Branch: stable/kilo

commit b05274c96bc48e749e6ad21633b39158838c313e
Author: Brant Knudson <email address hidden>
Date: Wed Apr 22 14:57:53 2015 -0500

    service child process normal SIGTERM exit

    service.py had some code where the child process would catch the
    SIGTERM from the parent just so it could exit with 1 status rather
    than with an indication that it exited due to SIGTERM. When
    shutting down the parent doesn't care in what way the child ended,
    only that they're all gone, so this code is unnecessary.

    Also, for some reason this caused the child to never exit while
    there was an open connection from a client. Probably something
    with eventlet and signal handling.

    This is a cherry-pick of oslo-incubator commit
    702bc569987854b602ef189655c201c348de84cb .

    Change-Id: I87f3ca4da64fb8070e4d6c3876a2f1ce1a3ca71d
    Closes-Bug: #1446583
    (cherry picked from commit d73ac96d18c66aa4dd5b7d7f8d7c22e8f8434683)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/176777

Changed in nova:
assignee: nobody → Sean Dague (sdague)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/176802

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/176806

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
Thierry Carrez (ttx)
no longer affects: nova/kilo
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/176777
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f0f158af375efcc91e7c75859fceeec0111f424b
Submitter: Jenkins
Branch: master

commit f0f158af375efcc91e7c75859fceeec0111f424b
Author: Sean Dague <email address hidden>
Date: Thu Apr 23 10:40:34 2015 -0400

    sync oslo: service child process normal SIGTERM exit

    service.py had some code where the child process would catch the
    SIGTERM from the parent just so it could exit with 1 status rather
    than with an indication that it exited due to SIGTERM. When
    shutting down the parent doesn't care in what way the child ended,
    only that they're all gone, so this code is unnecessary.

    Also, for some reason this caused the child to never exit while
    there was an open connection from a client. Probably something
    with eventlet and signal handling.

    Syncs commit: 702bc569987854b602ef189655c201c348de84cb

    Change-Id: I3c5249f5e59bb396bcb50964907ea61ebb2a3c8a
    Closes-Bug: #1446583

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Elena Ezhova (eezhova) wrote : Re: services no longer reliably stop in stable/kilo

Are there any plans to revert [1]? Am I right to assume that [2] wasn't the core reason for this bug?

[1] https://review.openstack.org/#/c/175851/1
[2] https://review.openstack.org/#/c/156345/

Revision history for this message
Sean Dague (sdague) wrote :

So, the current belief is that the bug was always there. #2 makes it show up *much* more often, my guess being that it removes the eventlet sleep. So the master code in oslo-incubator now has something which builds on 175851.
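
If that guess is right, the mechanism would look roughly like this (a hypothetical sketch, not the actual service.py code; reap_exited_children is an illustrative helper):

    import eventlet

    def wait_on_children(children):
        while children:
            # This yield is what the optimization removed: it lets the
            # eventlet hub run other green threads, including any deferred
            # signal processing, between reaping attempts.
            eventlet.sleep(0.01)
            reap_exited_children(children)  # hypothetical os.waitpid helper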

Thierry Carrez (ttx)
tags: removed: kilo-rc-potential
Thierry Carrez (ttx)
Changed in oslo-incubator:
status: Fix Committed → Fix Released
milestone: liberty-1 → 2015.1.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179287

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179288

Changed in keystone:
milestone: none → liberty-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/176806
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c1b6e1aef9c528b9fb183c8d607f0f99e1856468
Submitter: Jenkins
Branch: stable/kilo

commit c1b6e1aef9c528b9fb183c8d607f0f99e1856468
Author: Sean Dague <email address hidden>
Date: Thu Apr 23 10:40:34 2015 -0400

    sync oslo: service child process normal SIGTERM exit

    service.py had some code where the child process would catch the
    SIGTERM from the parent just so it could exit with 1 status rather
    than with an indication that it exited due to SIGTERM. When
    shutting down the parent doesn't care in what way the child ended,
    only that they're all gone, so this code is unnecessary.

    Also, for some reason this caused the child to never exit while
    there was an open connection from a client. Probably something
    with eventlet and signal handling.

    Syncs commit: 702bc569987854b602ef189655c201c348de84cb

    Change-Id: I3c5249f5e59bb396bcb50964907ea61ebb2a3c8a
    Closes-Bug: #1446583
    (cherry picked from commit f0f158af375efcc91e7c75859fceeec0111f424b)

tags: added: in-stable-kilo
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/179287
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=cabe7c1a1d5b35e58fc4ed34b12fcccd4416835e
Submitter: Jenkins
Branch: master

commit 5987bb2290f629e59b0bcced2f8fe22cdeb9cc6d
Author: John Griffith <email address hidden>
Date: Thu Apr 23 12:07:12 2015 -0600

    Add external genconfig calls

    After moving to oslo.config we still were using
    incubator config generator. This was ok, but the
    problem is we haven't been pulling config options
    from the oslo libs.

    This is a hack that just appends external lib calls
    and appends those options to the sample file being built.

    Change-Id: I2634b20ef4abd3bf7990f845d59ad3d208db234f
    (cherry picked from commit 51a22591a44932463847ed3247899db32ac49444)
    Closes-Bug: #1447380

commit b05274c96bc48e749e6ad21633b39158838c313e
Author: Brant Knudson <email address hidden>
Date: Wed Apr 22 14:57:53 2015 -0500

    service child process normal SIGTERM exit

    service.py had some code where the child process would catch the
    SIGTERM from the parent just so it could exit with 1 status rather
    than with an indication that it exited due to SIGTERM. When
    shutting down the parent doesn't care in what way the child ended,
    only that they're all gone, so this code is unnecessary.

    Also, for some reason this caused the child to never exit while
    there was an open connection from a client. Probably something
    with eventlet and signal handling.

    This is a cherry-pick of oslo-incubator commit
    702bc569987854b602ef189655c201c348de84cb .

    Change-Id: I87f3ca4da64fb8070e4d6c3876a2f1ce1a3ca71d
    Closes-Bug: #1446583
    (cherry picked from commit d73ac96d18c66aa4dd5b7d7f8d7c22e8f8434683)

commit 2727e8865ce7b9ef4eec81f7f07b7a0726eb304b
Author: Lucian Petrut <email address hidden>
Date: Fri Mar 27 14:15:25 2015 +0200

    Windows SMBFS: fix volume extend

    The Windows SMBFS driver inherits the Linux SMBFS driver,
    overriding Windows specific methods.

    This commit Ic89cffc93940b7b119cfcde3362f304c9f2875df added the
    volume name as an extra argument to the _do_extend_volume in order
    to check if differencing images are pointing to backing files other
    than the according volume disks.

    Although this is not required on Windows, this method should accept
    this extra argument in order to have the same signature as the
    method it overrides. At the moment, this raises the following
    exception:

    TypeError: _do_extend_volume() takes exactly 3 arguments (4 given)

    Closes-Bug: #1437290
    (cherry picked from commit dca29e9ab3cdde210d3777e7c6b4a6849447058a)
    Change-Id: I868d7de4a2c68f3fc520ba476a5660a84f440bb1

commit cc9bd73479ab4f0d14ee66eccab6fa285b8836b9
Author: Daisuke Fujita <email address hidden>
Date: Wed Apr 15 14:03:31 2015 +0900

    Fix a wrong argument of create method

    Change the argument 'QoSSpecs.create' to 'qos_specs.create'.

    Closes-Bug: #1443331
    (cherry picked from commit a3c0a4104f95acff00d3a9721caa4da730619fb7)
    Change-Id: Iabebc5f1681be75fb06d83...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/179288
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=9bc6043eb06199b8d4dbf6698e129d984a59cc11
Submitter: Jenkins
Branch: master

commit 65a50eebb8d0a53a2c4c226eb9a564c4d535ac68
Author: Brant Knudson <email address hidden>
Date: Wed Apr 22 11:33:00 2015 -0500

    Sync oslo-incubator Ie51669bd278288b768311ddf56ad31a2f28cc7ab

    This syncs to oslo-incubator to commit 64b5819 and also includes
    51280db.

    Change-Id: I7b43a67a0b67fe0ff5ac3d87708ecc4ab52102f8
    Depends-On: Ie51669bd278288b768311ddf56ad31a2f28cc7ab
    Closes-Bug: #1446583
    (cherry picked from commit 797da5f05444e7cfbf55df52867ade6107834f00)

commit 579a065c0dcce554a5dca86164eb8f1d6fb43c4d
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Apr 20 17:55:55 2015 +0000

    Updated from global requirements

    Change-Id: I72af7a36f2c3ba206be06fa35323386801e6ff81

commit 906485152a8ec886cf4a45cbe1037184ce39f1a1
Author: Andreas Jaeger <email address hidden>
Date: Mon Apr 20 11:11:25 2015 +0200

    Release Import of Translations from Transifex

    Manual import of Translations from Transifex. This change also removes
    all po files that are less than 66 per cent translated since such
    partially translated files will not help users.

    This change needs to be done manually since the automatic import does
    not handle the proposed branches and we need to sync with latest
    translations.

    Change-Id: Iaf4bdae303b06c1af4023fe2daa3a6b03c195ee9

commit 18ca7fabece4837ad56e435bc9d5f0b6278fa4be
Author: Alexander Makarov <email address hidden>
Date: Mon Apr 6 15:49:41 2015 +0300

    Make memcache client reusable across threads

    memcache.Client is inherited from threading._local so instances are only
    accessible from current thread or eventlet. Present workaround broke
    inheritance chain so super() call is unusable.

    This patch makes artificial client class mimic inheritance from
    threading._local while using generic object methods allowing reusability.

    Change-Id: Ic5d5709695877afb995fd816bb0e4ce711b99b60
    Closes-Bug: #1440493
    (cherry picked from commit 33a95575fc3778bf8ef054f7b9d24fcb7c75100b)

commit cedce339a08d475617c7f57c148e192dc3709a34
Author: Thierry Carrez <email address hidden>
Date: Thu Apr 16 22:19:42 2015 +0200

    Set default branch to stable/kilo

    Open stable/kilo branch by setting defaultbranch for git-review.

    Change-Id: If5b35b0fc5a85ba8dda16dc6b365537ed0d839bc

commit 86df39c01e96ad3b15e33eb6fc1bf726a0a704c5
Author: Eric Brown <email address hidden>
Date: Mon Apr 13 11:37:53 2015 -0700

    backend_argument should be marked secret

    Since the backend_argument can potentially contain a password,
    it should be marked secret to avoid leakage into the logs.

    Closes-Bug: #1443598

    Change-Id: I55663db4cf2df84a66de8f64fba4b4f129ae827d
    (cherry picked from commit f9db1a65bd4d83d12c572ba4d5807845996ef410)

commit b679e7d6be18d33ebdfe133161a3daf2f305d954
Author: Lance Bragstad <email address hidden>
Date: Tue Apr 7 18:47:34 2015 +0000

    Update man p...


Thierry Carrez (ttx)
Changed in cinder:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Changed in keystone:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Revision history for this message
Sean Dague (sdague) wrote : Re: services no longer reliably stop in stable/kilo

This has returned in liberty and is currently blocking grenade in much the same way as last time.

https://review.openstack.org/#/c/190175/ is most likely part of the problem. It looks like after we ensured that SIGTERM actually terminated the program at the kilo release, folks from hds.com complained that some work units weren't processed after that patch, so it was flipped back to a graceful version which may never exit.

Current unit tests in oslo.service have gotten much flakier since then as well, which may be related.

Changed in oslo.service:
importance: Undecided → Critical
summary: - services no longer reliably stop in stable/kilo
+ services no longer reliably stop in stable/liberty / master
description: updated
Revision history for this message
Elena Ezhova (eezhova) wrote :

According to the grenade.sh logs [1] of the gate-grenade-dsvm job in https://review.openstack.org/#/c/229363/7 , it failed shutting down nova-compute. The function that stops services waits only 10 seconds for them to exit [2]. Meanwhile, I experimented with shutting down nova-compute on my devstack (with code from master) and it turned out that it can take more than 10 seconds to stop (sometimes 12 or even 15 seconds). That's why I think increasing wait_for to, say, 20 seconds could solve the problem with grenade.

[1] http://logs.openstack.org/63/229363/5/check/gate-grenade-dsvm/badeb08/logs/grenade.sh.txt.gz
[2] https://github.com/openstack-dev/grenade/blob/master/inc/upgrade#L141

Revision history for this message
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

Another solution to this problem is the following patch.
It adds a 10-second loop to stop_process in devstack that checks the service status every 0.3 seconds.
At the end, the process is killed with SIGKILL if it has not finished within 10 seconds.

Ensure a service stop during stop_process
https://review.openstack.org/#/c/208064/
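
The shape of that workaround, sketched in Python (the real patch is to devstack's bash functions; the numbers match the description above):

    import os
    import signal
    import time

    def stop_process(pid, timeout=10.0, interval=0.3):
        os.kill(pid, signal.SIGTERM)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                os.kill(pid, 0)          # signal 0: existence check only
            except ProcessLookupError:
                return True              # exited within the timeout
            time.sleep(interval)
        os.kill(pid, signal.SIGKILL)     # still alive after 10s: force it
        return False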

Revision history for this message
Elena Ezhova (eezhova) wrote :

Observed a failure shutting down nova-api in the gate-grenade-dsvm-neutron job [1]. Nova-api failed to shut down within 30 seconds, and the following trace can be seen in the old/screen-n-api log [2].

[1] http://logs.openstack.org/30/228730/6/check/gate-grenade-dsvm-neutron/29a9eeb/logs/grenade.sh.txt.gz
[2] http://paste.openstack.org/show/475272/

Revision history for this message
Elena Ezhova (eezhova) wrote :

As I found out, all services are stopped using pkill -g <service_pid> [1], which means that SIGTERM is sent to all processes in a given process group simultaneously. That doesn't seem 100% correct, because a parent process is supposed to terminate its children itself [2]. So from my POV it would be more reasonable to stop services using kill <service_pid>.

[1] https://github.com/openstack-dev/devstack/blob/master/functions-common#L1478
[2] https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L536-L551
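
The distinction, as a sketch (the parent PID is hypothetical):

    import os
    import signal

    parent = 26996  # hypothetical oslo.service parent / session leader

    # What devstack does today (`pkill -g`): SIGTERM every process in the
    # group at once, so the children race their parent's own stop logic.
    os.killpg(os.getpgid(parent), signal.SIGTERM)

    # The suggestion (`kill`): signal only the parent and let
    # oslo.service's ProcessLauncher terminate its children itself.
    os.kill(parent, signal.SIGTERM)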

Revision history for this message
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

This patch was merged as a workaround today.
https://review.openstack.org/#/c/230107/

Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
Thierry Carrez (ttx)
Changed in keystone:
milestone: liberty-1 → 8.0.0
Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-1 → 7.0.0
Marian Horban (mhorban)
Changed in oslo.service:
assignee: nobody → Marian Horban (mhorban)
Marian Horban (mhorban)
Changed in oslo.service:
status: New → In Progress
Revision history for this message
Marian Horban (mhorban) wrote :

I added an oslo.service patch: https://review.openstack.org/#/c/238694/. In this patch I added a graceful_shutdown_timeout config option, which is responsible for interrupting graceful shutdown after a timeout. The default value is 0, meaning the service will wait forever. This behaviour is taken from Apache's GracefulShutdownTimeout directive (but could be discussed and changed if needed).
Also I added a devstack review, https://review.openstack.org/#/c/238983/, for configuring the graceful_shutdown_timeout option for devstack services.
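
The semantics of the option, as a minimal sketch (the real implementation is in oslo.service; the helper callables here are illustrative):

    import time

    def graceful_shutdown(stop_workers, workers_alive, timeout=0, poll=0.5):
        stop_workers()                   # ask children to finish in-flight work
        deadline = None if timeout == 0 else time.monotonic() + timeout
        while workers_alive():
            if deadline is not None and time.monotonic() >= deadline:
                return False             # timed out; caller escalates
            time.sleep(poll)
        return True                      # 0 means wait forever, as in Apache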

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Changed in oslo.service:
status: In Progress → Fix Released
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/oslo.service 1.3.0

This issue was fixed in the openstack/oslo.service 1.3.0 release.
