Jobs fail in upstream infra when setting iptables rules in host prepare

Bug #1885697 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

We see frequent failures when setting iptables rules while preparing a host for a job:

2020-06-30 10:09:39.887078 | TASK [persistent-firewall : Persist ipv4 rules]
2020-06-30 10:09:40.034707 | primary | ERROR
2020-06-30 10:09:40.035937 | primary | {
2020-06-30 10:09:40.036571 | primary | "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '/var/lib/zuul/builds/7cdd1b201d0e462680ea7ac71d0777b6/untrusted/project_0/opendev.org/zuul/zuul-jobs/roles/persistent-firewall/tasks/persist/RedHat.yaml': line 1, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Persist ipv4 rules\n ^ here\n"
2020-06-30 10:09:40.036712 | primary | }
2020-06-30 10:09:41.637665 | secondary | changed

https://fa41114c73dc4ffe3f14-2bb0e09cfc1bf1e619272dff8ccf0e99.ssl.cf2.rackcdn.com/738557/2/check/tripleo-ci-centos-8-containers-multinode/7cdd1b2/job-output.txt

This causes the affected jobs to be retried repeatedly.

Sagi (Sergey) Shnaidman (sshnaidm) wrote :

Another occurrence from today:
https://56e72acce5e67d635b2f-b5bb81507bf0dacae1b327a9b41f478a.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/tripleo-quickstart-extras/master/tripleo-ci-centos-8-containers-multinode-ussuri/1ea74dd/job-output.txt

2020-07-01 07:29:04.673502 | primary | MODULE FAILURE

2020-07-01 07:29:07.845507 | TASK [persistent-firewall : Persist ipv4 rules]
2020-07-01 07:29:07.988329 | primary | ERROR
2020-07-01 07:29:07.989444 | primary | {
2020-07-01 07:29:07.989565 | primary | "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '/var/lib/zuul/builds/1ea74dd63e4e48d6a5b5266b67ecca36/untrusted/project_0/opendev.org/zuul/zuul-jobs/roles/persistent-firewall/tasks/persist/RedHat.yaml': line 1, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Persist ipv4 rules\n ^ here\n"
2020-07-01 07:29:07.989680 | primary | }
2020-07-01 07:29:09.888427 | secondary | changed

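iptables-save for ipv4 fails to execute (see the MODULE FAILURE above), which would explain why the result read by the "Persist ipv4 rules" task has no 'stdout' attribute.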

wes hayutin (weshayutin)
tags: added: promotion-blocker
Alex Schultz (alex-schultz) wrote :

Further review of this bug points to a system-wide issue rather than one specific to this role. I think it's an executor/ansible issue around core module execution, specifically the fcntl calls at https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/basic.py#L2665-L2667, which can return EACCES (i.e. errno 13). It probably needs to be reproduced outside of zuul and fixed in ansible itself. The proposed patch for that role adds a retry, but the retry does not take effect for MODULE FAILURE errors.
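For anyone trying to reproduce this outside of zuul, the kind of hardening being suggested might look like the sketch below. It is not the Ansible code and not a proposed patch: the helper name `set_nonblocking`, its parameters, and the choice of `F_GETFL`/`F_SETFL` are assumptions made for illustration, since the report does not say which fcntl command returned EACCES. It only shows the general pattern of retrying an fcntl call when it fails with EACCES or EAGAIN.

```python
import errno
import fcntl
import os
import time


def set_nonblocking(fd, attempts=3, delay=0.1, fcntl_call=fcntl.fcntl):
    """Set O_NONBLOCK on fd, retrying when fcntl fails with EACCES or EAGAIN.

    Hypothetical helper for illustration only. fcntl_call is injectable so
    that a transient failure can be simulated without a broken host.
    Returns the number of the attempt that succeeded (1-based).
    """
    for attempt in range(1, attempts + 1):
        try:
            flags = fcntl_call(fd, fcntl.F_GETFL)
            fcntl_call(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
            return attempt
        except OSError as exc:
            transient = exc.errno in (errno.EACCES, errno.EAGAIN)
            # Re-raise anything that is not transient, or the last failure.
            if not transient or attempt == attempts:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    try:
        print("succeeded on attempt", set_nonblocking(read_fd))
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

A retry inside the module utility would cover the case described above, where a task-level retry never runs because the module itself has already failed.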

Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
status: Triaged → Fix Released