dlrnapi promoter: promotion of older link > newer link

Bug #1756892 reported by Matt Young
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Matt Young

Bug Description

The dlrn promoter [1] is promoting older symlinks atop newer symlinks. This was observed in ocata recently.

the following is roughly what's happened:

```
2018-03-16 03:34:16, a0/81/a081db24c546edb189738376f584155f349060ed_e0df978b, current-tripleo-rdo-internal
<snip/>
2018-03-08 10:43:57, ff/d1/ffd1c0125f4a4a96df68f779ed76dd5efd706b96_756c2a38, current-tripleo-rdo-internal
2018-03-08 05:14:15, ff/d1/ffd1c0125f4a4a96df68f779ed76dd5efd706b96_756c2a38, current-tripleo-rdo
2018-03-08 01:43:34, ff/d1/ffd1c0125f4a4a96df68f779ed76dd5efd706b96_756c2a38, current-tripleo
```

On 3/8 [2] we promoted the following symlink:

https://trunk.rdoproject.org/centos7-ocata/ff/d1/ffd1c0125f4a4a96df68f779ed76dd5efd706b96_756c2a38

Which was fine. This updated the current-tripleo-rdo-internal symlink [3]

<snip>
2018-03-08 15:43:52,481 5969 INFO promoter sftp> ln -s /var/www/html/images/ocata/rdo_trunk/ffd1c0125f4a4a96df68f779ed76dd5efd706b96_756c2a38 /var/www/html/images/ocata/rdo_trunk/current-tripleo-rdo-internal
</snip>

This link was created 3/8
---

then on 3/16 [4] we promoted a symlink that's older than previously promoted link! This was from 2/9. The snippit from that log is [5]

The full ocata promotion history log is here: [6] showing a unified progression (and the error). Current ocata state (which is presently incorrect) is shown on the RDO dashboard [7]

[1] https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/dlrnapi_promoter
[2] http://38.145.34.55/ocata.log-20180308
[3] http://paste.openstack.org/show/704514
[4] http://38.145.34.55/ocata.log-20180316
[5] http://paste.openstack.org/show/704521
[6] http://paste.openstack.org/show/704527
[7] https://dashboards.rdoproject.org/ocata

Tags: ci quickstart
Revision history for this message
Matt Young (halcyondude) wrote :

Update. The ocata promoter is currently attempting to promote the most recent rdo1 hash:

2018-03-20 02:53:25,581 9490 INFO promoter Trying to promote current-tripleo-rdo to current-tripleo-rdo-internal

2018-03-20 02:53:29,525 9490 INFO promoter The currently promoted hash is {'timestamp': 1521185656, 'promote_name': 'current-tripleo-rdo-internal', 'distro_hash': 'e0df978be1aa1dd7a133a2e017d1496964bd2648', 'user': 'ciuser', 'commit_hash': 'a081db24c546edb189738376f584155f349060ed'}

2018-03-20 02:53:33,279 9490 DEBUG promoter Hashes fetched (tried to get the last 5): [{'timestamp': 1521056055, 'promote_name': 'current-tripleo-rdo', 'distro_hash': '19239ca40e09fe11281233c55315e306c9ecc1b1', 'user': 'ciuser', 'commit_hash': '8d52b8c644cbfc03c60a42058a2ec723d821b684'}, {'timestamp': 1520504055, 'promote_name': 'current-tripleo-rdo', 'distro_hash': '756c2a382eb90da3fc255e481313313829bdd5e1', 'user': None, 'commit_hash': 'ffd1c0125f4a4a96df68f779ed76dd5efd706b96'}, {'timestamp': 1519121659, 'promote_name': 'current-tripleo-rdo', 'distro_hash': 'e0df978be1aa1dd7a133a2e017d1496964bd2648', 'user': None, 'commit_hash': '98b61f4ceed804e222c4cb22b45b23bc35ee833d'}, {'timestamp': 1518258254, 'promote_name': 'current-tripleo-rdo', 'distro_hash': 'e0df978be1aa1dd7a133a2e017d1496964bd2648', 'user': None, 'commit_hash': 'a081db24c546edb189738376f584155f349060ed'}, {'timestamp': 1518171853, 'promote_name': 'current-tripleo-rdo', 'distro_hash': '302accd1ee673b2c6323d38e02ec5d77159bfe3b', 'user': None, 'commit_hash': '7aadc716a7b30992974a31281af9ffeeb989c6f3'}]

## it finds the correct (next) hash to promote...

2018-03-20 02:53:44,832 9490 DEBUG promoter Remaining hashes after removing already promoted ones: [{'timestamp': 1521056055, 'promote_name': 'current-tripleo-rdo', 'distro_hash': '19239ca40e09fe11281233c55315e306c9ecc1b1', 'user': 'ciuser', 'commit_hash': '8d52b8c644cbfc03c60a42058a2ec723d821b684'}]

## But then removes it because it determines that the hash is older than the one currently promoted...which is wrong...

2018-03-20 02:53:44,832 9490 DEBUG promoter Remaining hashes after removing ones older than the currently promoted: []

This is happening here:

https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/dlrnapi_promoter.py#L193

```
    if old_hashes is not None:
        latest_hashes = [new_hashes for new_hashes in latest_hashes
                         if new_hashes['timestamp'] > old_hashes['timestamp']]
    logger.debug('Remaining hashes after removing ones older than the '
'currently promoted: %s', latest_hashes)
```

Revision history for this message
Matt Young (halcyondude) wrote :

I've confirmed that the promoter script is working as designed. Because we promoted the older hash, it's seeing the "newly created" old hash as "new"

---

new_hashes = {dict} {'timestamp': 1521056055, 'promote_name': 'current-tripleo-rdo', 'distro_hash': '19239ca40e09fe11281233c55315e306c9ecc1b1', 'user': 'ciuser', 'commit_hash': '8d52b8c644cbfc03c60a42058a2ec723d821b684'}
 'commit_hash' (140489939556512) = {str} '8d52b8c644cbfc03c60a42058a2ec723d821b684'
 'distro_hash' (140489939556608) = {str} '19239ca40e09fe11281233c55315e306c9ecc1b1'
 'promote_name' (140489935413576) = {str} 'current-tripleo-rdo'
 'timestamp' (140490018491920) = {int} 1521056055
 'user' (140490018616496) = {str} 'ciuser'
 __len__ = {int} 5

old_hashes = {dict} {'timestamp': 1521185656, 'promote_name': 'current-tripleo-rdo-internal', 'distro_hash': 'e0df978be1aa1dd7a133a2e017d1496964bd2648', 'user': 'ciuser', 'commit_hash': 'a081db24c546edb189738376f584155f349060ed'}
 'commit_hash' (140489939556512) = {str} 'a081db24c546edb189738376f584155f349060ed'
 'distro_hash' (140489939556608) = {str} 'e0df978be1aa1dd7a133a2e017d1496964bd2648'
 'promote_name' (140489935413576) = {str} 'current-tripleo-rdo-internal'
 'timestamp' (140490018491920) = {int} 1521185656
 'user' (140490018616496) = {str} 'ciuser'
 __len__ = {int} 5

---

datetime.fromtimestamp(old_hashes['timestamp']) = {datetime} 2018-03-16 03:34:16
datetime.fromtimestamp(new_hashes['timestamp']) = {datetime} 2018-03-14 15:34:15

Revision history for this message
Matt Young (halcyondude) wrote :

Root cause for how this happened:

- threw an exception while fetching old promtions
- SCRIPT CONTINUED INSTEAD OF FAIL && BAIL
- script found old job success records for old hash
- script created a *NEW* promotion record

details interleaved with logs:

http://paste.openstack.org/show/705438/1

```

There are 2 patches we need to land ASAP, one is already done (thanks trown)

- #1 https://review.rdoproject.org/r/#/c/12716 (log exceptions)
- #2 TODO: patch to exit script in event of an exception

We should not continue with dubious data.

Revision history for this message
Matt Young (halcyondude) wrote :

link from previous comment should be http://paste.openstack.org/show/705438

Revision history for this message
Matt Young (halcyondude) wrote :

Posted a patch to prevent this in the future.

https://review.rdoproject.org/r/12980

Changed in tripleo:
assignee: nobody → Matt Young (halcyondude)
status: Triaged → In Progress
Revision history for this message
Matt Young (halcyondude) wrote :

patch merged

Revision history for this message
Matt Young (halcyondude) wrote :
Revision history for this message
Arx Cruz (arxcruz) wrote :

Marking as fix released, let me know if need something else

Changed in tripleo:
status: In Progress → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.