Plan delete fails due to missing, but present .gitignore

Bug #1646450 reported by Steven Hardy
This bug affects 1 person
Affects    Status     Importance    Assigned to    Milestone
tripleo    Triaged    High          Unassigned

Bug Description

Swift list shows a .gitignore object, but deleting it then fails, and I'm not sure why at the moment. This breaks the plan delete (we also return 0 here, which looks wrong):

Deleting plan overcloud...
Starting new HTTP connection (1): 192.0.2.1
"POST /v2/action_executions HTTP/1.1" 201 319
HTTP POST http://192.0.2.1:8989/v2/action_executions 201
Object DELETE failed
clean_up DeletePlan:
END return value: 0

[stack@instack ~]$ cd /var/log/mistral/
[stack@instack mistral]$ tail executor.log
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan File "/usr/lib/python2.7/site-packages/tripleo_common/actions/plan.py", line 233, in run
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan swift.delete_object(self.container, data['name'])
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1844, in delete_object
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan headers=headers)
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1673, in _retry
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan service_token=self.service_token, **kwargs)
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1480, in delete_object
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan raise ClientException.from_response(resp, 'Object DELETE failed', body)
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan ClientException: Object DELETE failed: http://192.0.2.1:8080/v1/AUTH_5e6152a640374da8bdd78043087e2ce8/overcloud/.gitignore 404 Not Found [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2016-12-01 10:22:55.748 27190 ERROR tripleo_common.actions.plan
[stack@instack mistral]$ swift list overcloud | grep gitignore
.gitignore

[stack@instack ~]$ swift download overcloud .gitignore
Object 'overcloud/.gitignore' not found

[stack@instack ~]$ swift list overcloud | grep gitignore
.gitignore

[stack@instack ~]$ swift download overcloud ".gitignore"
Object 'overcloud/.gitignore' not found
[stack@instack ~]$ swift delete overcloud ".gitignore"
Error Deleting: overcloud/.gitignore: Object DELETE failed: http://192.0.2.1:8080/v1/AUTH_5e6152a640374da8bdd78043087e2ce8/overcloud/.gitignore 404 Not Found [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
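
To illustrate the "we return 0 here, which looks wrong" point, here is a minimal sketch of a delete loop that records every failed object delete and turns any failure into a non-zero exit code. It is not the tripleo-common code: DeleteFailed, FakeSwift, delete_objects and exit_code are hypothetical names made up for this example, and the fake client only imitates an object that is listed but returns 404 on delete.

```python
class DeleteFailed(Exception):
    """Hypothetical stand-in for the client error raised on a failed DELETE."""

    def __init__(self, msg, http_status=None):
        super().__init__(msg)
        self.http_status = http_status


def delete_objects(client, container, names):
    """Try to delete every object and return a list of (name, status) failures."""
    failures = []
    for name in names:
        try:
            client.delete_object(container, name)
        except DeleteFailed as exc:
            # Keep going so that one bad object does not hide the others.
            failures.append((name, exc.http_status))
    return failures


def exit_code(failures):
    """Report failure to the caller instead of returning 0 unconditionally."""
    return 1 if failures else 0


class FakeSwift:
    """In-memory fake (assumption): some listed objects 404 on delete."""

    def __init__(self, listed, missing):
        self.listed = list(listed)
        self.missing = set(missing)

    def delete_object(self, container, name):
        if name in self.missing:
            raise DeleteFailed("Object DELETE failed", http_status=404)
        self.listed.remove(name)


swift = FakeSwift([".gitignore", "overcloud.yaml"], missing=[".gitignore"])
failed = delete_objects(swift, "overcloud", list(swift.listed))
print("failed deletes:", failed)
print("exit code:", exit_code(failed))
```

With a real Swift client the except clause would catch that client's own exception type; the point of the sketch is only that the failure list drives the return value.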

Steven Hardy (shardy)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → ocata-2
Julie Pichon (jpichon) wrote :

I'm hitting a similar problem, but with many more 'not found' files. I think that is because more of them were deleted successfully before the operation failed on the Conflict.

After the failed delete, this shows up in the Swift logs: undercloud object-server: ERROR with remote server 192.168.24.1:6001/1: ConnectionTimeout (0.5s)
192.168.24.1 is not a valid IP on my undercloud (it should be in the 192.0.2.0 range). I did have an undercloud update fail because of the new networking range. I got past that by updating undercloud.conf, but I wonder if something got misconfigured along the way. The config files in /etc/swift look correct though, so it's hard to say whether this is actually related.

Julie Pichon (jpichon) wrote :

Also for me it is failing on http://192.0.2.1:8080/v1/AUTH_7c9bb1cd08c2430dadc10963157fe160/overcloud/all-nodes-validation.yaml, likely because this is the first file in my container.

Julie Pichon (jpichon) wrote :

In my case it looks like some IPs got added wrongly to the swift rings when I hit bug 1645267.

# swift-ring-builder container.builder search --region 1
Devices:  id  region  zone  ip address    port  replication ip  replication port  name  weight  partitions  balance  meta
           0       1     1  192.0.2.1     6001  192.0.2.1                   6001     1    1.00      131072     0.00
           1       1     1  192.168.24.1  6001  192.168.24.1                6001     1    1.00      131072     0.00
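
A check like the one above can be scripted. The following is a rough sketch that scans swift-ring-builder search output for devices whose IP lies outside the expected network. It assumes the column order shown in the output above (id, region, zone, ip address, port, ...), which may differ between Swift versions, and it assumes 192.0.2.0/24 as the expected range; the function name unexpected_ring_devices is made up for this example.

```python
import ipaddress

# Sample copied from the search output in this comment.
SAMPLE = """\
Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
             0 1 1 192.0.2.1 6001 192.0.2.1 6001 1 1.00 131072 0.00
             1 1 1 192.168.24.1 6001 192.168.24.1 6001 1 1.00 131072 0.00
"""


def unexpected_ring_devices(search_output, expected_network):
    """Return (id, ip, port) for each device whose IP is outside expected_network."""
    network = ipaddress.ip_network(expected_network)
    bad = []
    for line in search_output.splitlines():
        fields = line.split()
        # Device rows start with a numeric id; this skips the header line.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        dev_id, ip, port = int(fields[0]), fields[3], int(fields[4])
        if ipaddress.ip_address(ip) not in network:
            bad.append((dev_id, ip, port))
    return bad


# The /24 prefix is an assumption; adjust it to the undercloud's real range.
for dev_id, ip, port in unexpected_ring_devices(SAMPLE, "192.0.2.0/24"):
    print("device %d uses unexpected address %s:%d" % (dev_id, ip, port))
```

The same function could be run against the account and object builders as well, since all three rings were affected here.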

Running the following commands from /etc/swift helped me get past the problem and finally delete the container:

# swift-ring-builder account.builder remove 192.168.24.1
# swift-ring-builder container.builder remove 192.168.24.1
# swift-ring-builder object.builder remove 192.168.24.1

# swift-ring-builder container.builder rebalance 42
# swift-ring-builder object.builder rebalance 42
# swift-ring-builder account.builder rebalance 42

(Looking at the help for swift-ring-builder maybe 42 isn't actually needed? I'm not sure what it does.)
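
For anyone who wants to repeat the workaround for a different stale address, here is a small sketch that only builds the same six commands as argument lists and prints them; it does not run anything. The helper name ring_cleanup_commands is hypothetical, and the trailing rebalance argument is passed through exactly as in the commands above (set it to None to leave it out), without assuming what it does.

```python
def ring_cleanup_commands(stale_ip, rebalance_arg="42",
                          builders=("account", "container", "object")):
    """Build remove and rebalance commands for each ring builder file."""
    commands = []
    # First drop the stale device from every ring.
    for name in builders:
        commands.append(
            ["swift-ring-builder", name + ".builder", "remove", stale_ip])
    # Then rebalance every ring; the extra argument mirrors the comment above.
    for name in builders:
        cmd = ["swift-ring-builder", name + ".builder", "rebalance"]
        if rebalance_arg is not None:
            cmd.append(rebalance_arg)
        commands.append(cmd)
    return commands


# Print the commands for review; they are meant to be run from /etc/swift.
for cmd in ring_cleanup_commands("192.168.24.1"):
    print(" ".join(cmd))
```

Printing rather than executing is deliberate: ring changes are easy to get wrong, so reviewing the commands before running them by hand seems safer.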

Steven Hardy (shardy) wrote :

Since the root cause of this was https://bugs.launchpad.net/tripleo/+bug/1645267, I'll mark this as a duplicate.
