backup_restore: cassandra temporary system tables needs special handling

Bug #1564141 reported by Ignatious Johnson Christopher
This bug affects 1 person
Affects            Status                   Importance  Assigned to   Milestone
Juniper Openstack  Status tracked in Trunk
  R2.20            Fix Committed            Undecided   aswani kumar
  R2.21.x          Fix Committed            Undecided   aswani kumar
  R2.22.x          Fix Committed            Undecided   aswani kumar
  R3.0             Fix Committed            Undecided   aswani kumar
  Trunk            Fix Committed            Undecided   aswani kumar

Bug Description

The Cassandra "system" keyspace contains some special tables, such as "hints", that store data only temporarily.

When a snapshot is taken while the hints table holds data, it creates hints/snapshots/<old-timestamp>/<data>.

Later, after the system has removed the data from the hints table, fab backup_cassandra_db is executed. This task creates new snapshots and relies on the presence of a snapshots directory (https://github.com/Juniper/contrail-fabric-utils/blob/93dc7e111015a413653d6ae885523e65a6d24294/fabfile/tasks/backup_restore.py#L212) to find the tables that have snapshot data.

Because the hints table still has a snapshots directory from the earlier snapshot (not from fab backup_cassandra_db), it is also treated as a table whose snapshot needs to be backed up.

However, hints/snapshots/ does not contain <new-timestamp>, so the rsync fails (
https://github.com/Juniper/contrail-fabric-utils/blob/93dc7e111015a413653d6ae885523e65a6d24294/fabfile/tasks/backup_restore.py#L235)
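
The failure mode described above can be reproduced on a local directory tree. The following is only a minimal sketch in Python 3, not the code from backup_restore.py: the helper names (tables_with_snapshots_dir, build_example) are hypothetical, and the tree is a stand-in for the Cassandra data directory. It shows that discovering tables by the mere presence of a snapshots directory also selects system/hints, even though the new snapshot name does not exist there.

```python
import os
import tempfile


def tables_with_snapshots_dir(data_root):
    """Return keyspace/table paths (relative to data_root) that contain snapshots/."""
    found = []
    for keyspace in sorted(os.listdir(data_root)):
        ks_path = os.path.join(data_root, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            rel = os.path.join(keyspace, table)
            if os.path.isdir(os.path.join(data_root, rel, "snapshots")):
                found.append(rel)
    return found


def build_example(data_root, new_name, old_name):
    """Create a tiny directory tree mirroring the state described in this bug."""
    # A regular table holds both the old and the new snapshot.
    for name in (old_name, new_name):
        os.makedirs(os.path.join(data_root, "config_db_uuid", "obj_uuid_table",
                                 "snapshots", name))
    # The hints table only has the snapshot taken earlier.
    os.makedirs(os.path.join(data_root, "system", "hints", "snapshots", old_name))


NEW_SNAPSHOT = "1459378293426"
OLD_SNAPSHOT = "1459200546548"

with tempfile.TemporaryDirectory() as root:
    build_example(root, NEW_SNAPSHOT, OLD_SNAPSHOT)
    # Discovery by directory presence picks up both tables ...
    candidates = tables_with_snapshots_dir(root)
    # ... but the new snapshot is absent for one of them, so copying it would fail.
    missing = [t for t in candidates
               if not os.path.isdir(os.path.join(root, t, "snapshots", NEW_SNAPSHOT))]
```

Here candidates contains both tables, while missing contains only system/hints, which is the path the rsync would be asked to copy without it existing.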

Tags: provisioning
Revision history for this message
Ignatious Johnson Christopher (ijohnson-x) wrote :

Logs:
-------

2016-03-30 15:51:46:864349: Fatal error: sudo() received nonzero return code 23 while executing!
2016-03-30 15:51:46:864349:
2016-03-30 15:51:46:864349: Requested: rsync -avzR -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" mydata/data/system/hints/snapshots/1459378293426 root@99.1.1.26:/root/sdkvse4/
2016-03-30 15:51:46:864349: Executed: sudo -S -p 'sudo password:' /bin/bash -l -c "cd /var/lib/cassandra/ && rsync -avzR -e \"ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null\" mydata/data/system/hints/snapshots/1459378293426 root@99.1.1.26:/root/sdkvse4/"
2016-03-30 15:51:46:864349:
2016-03-30 15:51:46:864389: Aborting.
2016-03-30 15:51:46:864389:
2016-03-30 15:51:47:092934: Backup cassandra DB Failed .... Aborting
2016-03-30 15:51:47:092985:

description: updated
Ignatious Johnson Christopher (ijohnson-x) wrote :

Latest snapshots timestamp comparison:
--------------------------------------------------------------------
root@sdkvse4:/opt/contrail/utils# ls -lrt /var/lib/cassandra/mydata/data/config_db_uuid/obj_uuid_table/snapshots/
total 36
drwxr-xr-x 2 root root 4096 Mar 27 21:19 1459138784206
drwxr-xr-x 2 root root 4096 Mar 28 14:29 1459200546548
drwxr-xr-x 2 root root 4096 Mar 28 14:40 1459201233034
drwxr-xr-x 2 root root 4096 Mar 28 14:42 1459201373821
drwxr-xr-x 2 root root 4096 Mar 28 14:49 1459201769443
drwxr-xr-x 2 root root 4096 Mar 28 22:52 1459230762869
drwxr-xr-x 2 root root 4096 Mar 28 22:53 1459230817235
drwxr-xr-x 2 root root 4096 Mar 28 22:55 1459230923123
drwxr-xr-x 2 root root 4096 Mar 30 15:51 1459378293426
root@sdkvse4:/opt/contrail/utils# ls -lrt /var/lib/cassandra/mydata/data/system/hints/snapshots/
total 4
drwxr-xr-x 2 root root 4096 Mar 28 14:29 1459200546548
root@sdkvse4:/opt/contrail/utils#
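
To make the comparison above easier to read, the snapshot directory names can be decoded as timestamps. This is a small sketch that assumes the names are epoch-millisecond values, which is how they appear in the listings; snapshot_time and newest_snapshot are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def snapshot_time(name):
    """Interpret a snapshot directory name as epoch milliseconds (UTC)."""
    millis = int(name)
    # Integer arithmetic avoids floating-point rounding of the millisecond part.
    return (datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
            + timedelta(milliseconds=millis % 1000))


def newest_snapshot(names):
    """Return the numerically largest (most recent) snapshot name."""
    return max(names, key=int)


obj_uuid_newest = newest_snapshot(["1459230923123", "1459378293426", "1459138784206"])
hints_newest = newest_snapshot(["1459200546548"])

# The hints table lags behind: its newest snapshot predates the backup run.
hints_is_stale = snapshot_time(hints_newest) < snapshot_time(obj_uuid_newest)
```

Under that assumption, the newest obj_uuid_table snapshot decodes to 2016-03-30 22:51:33 UTC and the only hints snapshot to 2016-03-28 21:29:06 UTC, so the hints directory holds nothing from the current backup run.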

description: updated
Ignatious Johnson Christopher (ijohnson-x) wrote :

The fix is to skip tables whose snapshots directory contains only older snapshots:

root@sdkvse4:/opt/contrail/utils# diff fabfile/tasks/backup_restore.py.old fabfile/tasks/backup_restore.py
235a236,239
>     if not exists(os.path.join(path_to_cassandra,
>                                snapshot, snapshot_name)):
>         # Older snapshot skipping
>         continue
root@sdkvse4:/opt/contrail/utils#
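
For illustration, the skip logic from the diff above can be written as a standalone function. This is a hedged sketch, not the code that was merged: snapshot_sources and snapshot_dirs are hypothetical names, and the exists parameter stands in for whatever path check the fab task uses (assumed here to be a simple path-existence test, defaulting to os.path.exists).

```python
import os
import posixpath


def snapshot_sources(path_to_cassandra, snapshot_dirs, snapshot_name,
                     exists=os.path.exists):
    """Yield the snapshot paths to back up, skipping tables without snapshot_name.

    snapshot_dirs holds paths relative to path_to_cassandra, for example
    'mydata/data/system/hints/snapshots'.
    """
    for snapshot in snapshot_dirs:
        if not exists(posixpath.join(path_to_cassandra, snapshot, snapshot_name)):
            # Only an older snapshot is present here; nothing new to copy.
            continue
        yield posixpath.join(snapshot, snapshot_name)


# Example with a fake existence check, so no real Cassandra directory is needed.
present = {
    "/var/lib/cassandra/mydata/data/config_db_uuid/obj_uuid_table/snapshots/1459378293426",
}
sources = list(snapshot_sources(
    "/var/lib/cassandra",
    ["mydata/data/config_db_uuid/obj_uuid_table/snapshots",
     "mydata/data/system/hints/snapshots"],
    "1459378293426",
    exists=lambda path: path in present,
))
```

Passing the existence check as a parameter keeps the sketch testable without a remote host; in this example only the obj_uuid_table path is yielded and the hints path is skipped.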

tags: added: provisioning
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/18890
Submitter: aswani kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/18894
Submitter: aswani kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/18891
Submitter: aswani kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/18893
Submitter: aswani kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/18889
Submitter: aswani kumar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18891
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/8b5b94cb9b09541e4274d71a6403c7a64d55c519
Submitter: Zuul
Branch: R2.21.x

commit 8b5b94cb9b09541e4274d71a6403c7a64d55c519
Author: Aswani Kumar Reddy G <email address hidden>
Date: Wed Mar 30 23:25:19 2016 +0530

fix for cassandra temporary system tables needs special handling and error in handling tsn nodes

1)added fix to skip backup and restore instances when there are tsn and tor nodes
2)skip start and stop of nova compute reboot for tsn and tor nodes

Change-Id: Iaaa63c89eb21e70356a68eb8d7d88140636f13f5
Closes-Bug: #1563640
Closes-Bug: #1564141

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18889
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/f6fe1964f1c5884c24b5ffcec9acbe18d0600457
Submitter: Zuul
Branch: master

commit f6fe1964f1c5884c24b5ffcec9acbe18d0600457
Author: Aswani Kumar Reddy G <email address hidden>
Date: Wed Mar 30 23:25:19 2016 +0530

fix for cassandra temporary system tables needs special handling and error in handling tsn nodes

1)added fix to skip backup and restore instances when there are tsn and tor nodes
2)skip start and stop of nova compute reboot for tsn and tor nodes

Change-Id: Iaaa63c89eb21e70356a68eb8d7d88140636f13f5
Closes-Bug: #1563640
Closes-Bug: #1564141

Changed in juniperopenstack:
milestone: none → r3.1.0.0-fcs
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18890
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/9db958ce05f80bde4a2d437befd5b2bf28c9bfbd
Submitter: Zuul
Branch: R3.0

commit 9db958ce05f80bde4a2d437befd5b2bf28c9bfbd
Author: Aswani Kumar Reddy G <email address hidden>
Date: Wed Mar 30 23:25:19 2016 +0530

fix for cassandra temporary system tables needs special handling and error in handling tsn nodes

1)added fix to skip backup and restore instances when there are tsn and tor nodes
2)skip start and stop of nova compute reboot for tsn and tor nodes

Change-Id: Iaaa63c89eb21e70356a68eb8d7d88140636f13f5
Closes-Bug: #1563640
Closes-Bug: #1564141

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18894
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/f7e3f51e9134c9162470988f8f4089a35abd4a7a
Submitter: Zuul
Branch: R2.20

commit f7e3f51e9134c9162470988f8f4089a35abd4a7a
Author: Aswani Kumar Reddy G <email address hidden>
Date: Wed Mar 30 23:25:19 2016 +0530

fix for cassandra temporary system tables needs special handling and error in handling tsn nodes

1)added fix to skip backup and restore instances when there are tsn and tor nodes
2)skip start and stop of nova compute reboot for tsn and tor nodes

Change-Id: Iaaa63c89eb21e70356a68eb8d7d88140636f13f5
Closes-Bug: #1563640
Closes-Bug: #1564141

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18893
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/c121487cd5d11936b8e5b76113bc30f408e2d2bf
Submitter: Zuul
Branch: R2.22.x

commit c121487cd5d11936b8e5b76113bc30f408e2d2bf
Author: Aswani Kumar Reddy G <email address hidden>
Date: Wed Mar 30 23:25:19 2016 +0530

fix for cassandra temporary system tables needs special handling and error in handling tsn nodes

1)added fix to skip backup and restore instances when there are tsn and tor nodes
2)skip start and stop of nova compute reboot for tsn and tor nodes

Change-Id: Iaaa63c89eb21e70356a68eb8d7d88140636f13f5
Closes-Bug: #1563640
Closes-Bug: #1564141

information type: Proprietary → Public