Actions do not start executing

Bug #1847091 reported by Alexandros Soumplis on 2019-10-07
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
watcher
Undecided
Unassigned

Bug Description

When I start an action plan, the first action is executed successfully and then no other action starts, they all remain in PENDING state. The last logs in watcher-applier.log are:

019-10-07 11:53:01.011 35814 DEBUG oslo_messaging._drivers.amqpdriver [req-d1d1f4bb-c757-48c7-b750-1f7928d6fb13 - - - - -] CAST unique_id: 9712b83da2434b048ff670e96d487edf NOTIFY exchange 'openstack' topic 'notifications.info' _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:613
2019-10-07 11:53:01.020 35814 DEBUG watcher.notifications.base [req-c2f77228-7f7a-493a-8def-398b130117f5 - - - - -] Emitting notification `action.execution.end` _emit /usr/local/lib/python2.7/dist-packages/watcher/notifications/base.py:177
2019-10-07 11:53:01.022 35814 DEBUG oslo_messaging._drivers.amqpdriver [req-c2f77228-7f7a-493a-8def-398b130117f5 - - - - -] CAST unique_id: 775ee978babd427481346df8535b1ff9 NOTIFY exchange 'openstack' topic 'notifications.info' _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:613
2019-10-07 11:53:01.969 35814 DEBUG watcher.applier.workflow_engine.default [req-c2f77228-7f7a-493a-8def-398b130117f5 - - - - -] Post-condition action: action_type:change_nova_service_state uuid:8515b07d-6d4c-46b8-b530-f693bff7984a do_post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/default.py:136

An strace on the watcher-applier process indicates an infinite wait on some break.
root@wa:~# strace -f -s 1024 -p 35814
strace: Process 35814 attached
strace: [ Process PID=35814 runs in x32 mode. ]
strace: [ Process PID=35814 runs in 64 bit mode. ]
brk(0x55d96a9f6000) = 0x55d96a9f6000
brk(0x55d96aa17000) = 0x55d96aa17000
brk(0x55d96aa38000) = 0x55d96aa38000
brk(0x55d96aa59000) = 0x55d96aa59000
brk(0x55d96aa7a000) = 0x55d96aa7a000
brk(0x55d96aa9b000) = 0x55d96aa9b000
brk(0x55d96aabc000) = 0x55d96aabc000
brk(0x55d96aadd000) = 0x55d96aadd000
brk(0x55d96aafe000) = 0x55d96aafe000
(...)

I am running the latest stable/rocky from github on Ubuntu 18.04

licanwei (li-canwei2) wrote :

Hi,
Watcher Applier uses taskflow as workflow engine, actions are added to taskflow as tasks.
Maybe there are something wrong about taskflow.
Could you paste more info ? such as beginning from debug log: ' Emitting notification `action_plan.execution.start'

Alexandros Soumplis (soumplis) wrote :

Please find the full logs (some fqdns and id are obfuscated) of a complete run (watcher-applier start and actionplan start) here:

https://haste.tools.grnet.gr/wicijoyida.bash

licanwei (li-canwei2) wrote :

It's strange, from the log :2019-10-08 06:22:34.176 1900 DEBUG watcher.applier.workflow_engine.base [req-875a653c-3136-40b1-881a-d3e5fbf005a3 - - - - -] POST_EXECUTE ---> action_type:change_nova_service_state uuid:460b2ae8-9e00-4dd3-833c-5e56b49a5415==1.0 post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/base.py:245
line 245 in base.py should be a log.debug line, but in rocky version it's LOG.exception(e) https://github.com/openstack/watcher/blob/stable/rocky/watcher/applier/workflow_engine/base.py#L245

I wish you check the files workflow_engine/base.py and default.py

Alexandros Soumplis (soumplis) wrote :

You are right, this is because I have added a couple of debug log lines and changed everything. I have downloaded again the upstream code (applying this patch https://review.opendev.org/#/c/686110/2 ) and you can find the relevant log here:

https://haste.tools.grnet.gr/upamiwerak

licanwei (li-canwei2) wrote :

2019-10-08 13:18:56.994 38357 DEBUG watcher.applier.workflow_engine.default [req-ff29921f-9c0e-4e36-ae57-ceaf71e0370c - - - - -] Post-condition action: action_type:change_nova_service_state uuid:4e14809d-9019-4343-a1bc-18b0cc455f2c do_post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/default.py:136
2019-10-08 13:18:57.000 38357 DEBUG watcher.applier.workflow_engine.default [req-ff29921f-9c0e-4e36-ae57-ceaf71e0370c - - - - -] Post-condition action: action_type:change_nova_service_state uuid:b139d297-c64c-49bd-9efe-e409f4987a92 do_post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/default.py:136

There are two actions were executed successfully.
How many actions in the actionplan?
Could you run some cli cmd?

nova service-list
watcher audit list
watcher actionplan show db4b4ead-af3e-4a51-b915-1b9f5b1493e4
watcher action list --detail --action-plan=db4b4ead-af3e-4a51-b915-1b9f5b1493e4

and we can talk in irc channel #openstack-watcher if you will, thanks

Alexandros Soumplis (soumplis) wrote :

For the record, changing the parallelization parameter in watcher.conf to "disable" parallelization seems to allow actions to start and the actionplan completes.

The change in watcher.conf is:
parallelization = change_node_power_state:1,change_nova_service_state:1,migrate:1,nop:1,resize:1,sleep:1,turn_host_to_acpi_s3_state:1,volume_migrate:1

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers