Actions do not start executing

Bug #1847091 reported by Alexandros Soumplis
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
watcher
New
Undecided
Unassigned

Bug Description

When I start an action plan, the first action is executed successfully and then no other action starts, they all remain in PENDING state. The last logs in watcher-applier.log are:

019-10-07 11:53:01.011 35814 DEBUG oslo_messaging._drivers.amqpdriver [req-d1d1f4bb-c757-48c7-b750-1f7928d6fb13 - - - - -] CAST unique_id: 9712b83da2434b048ff670e96d487edf NOTIFY exchange 'openstack' topic 'notifications.info' _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:613
2019-10-07 11:53:01.020 35814 DEBUG watcher.notifications.base [req-c2f77228-7f7a-493a-8def-398b130117f5 - - - - -] Emitting notification `action.execution.end` _emit /usr/local/lib/python2.7/dist-packages/watcher/notifications/base.py:177
2019-10-07 11:53:01.022 35814 DEBUG oslo_messaging._drivers.amqpdriver [req-c2f77228-7f7a-493a-8def-398b130117f5 - - - - -] CAST unique_id: 775ee978babd427481346df8535b1ff9 NOTIFY exchange 'openstack' topic 'notifications.info' _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:613
2019-10-07 11:53:01.969 35814 DEBUG watcher.applier.workflow_engine.default [req-c2f77228-7f7a-493a-8def-398b130117f5 - - - - -] Post-condition action: action_type:change_nova_service_state uuid:8515b07d-6d4c-46b8-b530-f693bff7984a do_post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/default.py:136

An strace on the watcher-applier process indicates an infinite wait on some break.
root@wa:~# strace -f -s 1024 -p 35814
strace: Process 35814 attached
strace: [ Process PID=35814 runs in x32 mode. ]
strace: [ Process PID=35814 runs in 64 bit mode. ]
brk(0x55d96a9f6000) = 0x55d96a9f6000
brk(0x55d96aa17000) = 0x55d96aa17000
brk(0x55d96aa38000) = 0x55d96aa38000
brk(0x55d96aa59000) = 0x55d96aa59000
brk(0x55d96aa7a000) = 0x55d96aa7a000
brk(0x55d96aa9b000) = 0x55d96aa9b000
brk(0x55d96aabc000) = 0x55d96aabc000
brk(0x55d96aadd000) = 0x55d96aadd000
brk(0x55d96aafe000) = 0x55d96aafe000
(...)

I am running the latest stable/rocky from github on Ubuntu 18.04

Revision history for this message
licanwei (li-canwei2) wrote :

Hi,
Watcher Applier uses taskflow as workflow engine, actions are added to taskflow as tasks.
Maybe there are something wrong about taskflow.
Could you paste more info ? such as beginning from debug log: ' Emitting notification `action_plan.execution.start'

Revision history for this message
Alexandros Soumplis (soumplis) wrote :

Please find the full logs (some fqdns and id are obfuscated) of a complete run (watcher-applier start and actionplan start) here:

https://haste.tools.grnet.gr/wicijoyida.bash

Revision history for this message
licanwei (li-canwei2) wrote :

It's strange, from the log :2019-10-08 06:22:34.176 1900 DEBUG watcher.applier.workflow_engine.base [req-875a653c-3136-40b1-881a-d3e5fbf005a3 - - - - -] POST_EXECUTE ---> action_type:change_nova_service_state uuid:460b2ae8-9e00-4dd3-833c-5e56b49a5415==1.0 post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/base.py:245
line 245 in base.py should be a log.debug line, but in rocky version it's LOG.exception(e) https://github.com/openstack/watcher/blob/stable/rocky/watcher/applier/workflow_engine/base.py#L245

I wish you check the files workflow_engine/base.py and default.py

Revision history for this message
Alexandros Soumplis (soumplis) wrote :

You are right, this is because I have added a couple of debug log lines and changed everything. I have downloaded again the upstream code (applying this patch https://review.opendev.org/#/c/686110/2 ) and you can find the relevant log here:

https://haste.tools.grnet.gr/upamiwerak

Revision history for this message
licanwei (li-canwei2) wrote :

2019-10-08 13:18:56.994 38357 DEBUG watcher.applier.workflow_engine.default [req-ff29921f-9c0e-4e36-ae57-ceaf71e0370c - - - - -] Post-condition action: action_type:change_nova_service_state uuid:4e14809d-9019-4343-a1bc-18b0cc455f2c do_post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/default.py:136
2019-10-08 13:18:57.000 38357 DEBUG watcher.applier.workflow_engine.default [req-ff29921f-9c0e-4e36-ae57-ceaf71e0370c - - - - -] Post-condition action: action_type:change_nova_service_state uuid:b139d297-c64c-49bd-9efe-e409f4987a92 do_post_execute /usr/local/lib/python2.7/dist-packages/watcher/applier/workflow_engine/default.py:136

There are two actions were executed successfully.
How many actions in the actionplan?
Could you run some cli cmd?

nova service-list
watcher audit list
watcher actionplan show db4b4ead-af3e-4a51-b915-1b9f5b1493e4
watcher action list --detail --action-plan=db4b4ead-af3e-4a51-b915-1b9f5b1493e4

and we can talk in irc channel #openstack-watcher if you will, thanks

Revision history for this message
Alexandros Soumplis (soumplis) wrote :

For the record, changing the parallelization parameter in watcher.conf to "disable" parallelization seems to allow actions to start and the actionplan completes.

The change in watcher.conf is:
parallelization = change_node_power_state:1,change_nova_service_state:1,migrate:1,nop:1,resize:1,sleep:1,turn_host_to_acpi_s3_state:1,volume_migrate:1

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.