gnocchi-upgrade failed with 'table already exist' error

Bug #1755564 reported by Yossi Ovadia
This bug affects 2 people
Affects   Status         Importance   Assigned to          Milestone
tripleo   Fix Released   High         Michele Baldessari

Bug Description

3 controllers (and more than one compute node).

We often see the following error (it is not 100% reproducible) on our Queens deployment with containers.

"+ echo 'Running command: '\\''/usr/bin/gnocchi-upgrade --sacks-number=128'\\'''",
"+ exec /usr/bin/gnocchi-upgrade --sacks-number=128",
"2018-03-13 09:24:34,859 [7] WARNING oslo_config.cfg: Option \"coordination_url\" from group \"storage\" is deprecated. Use option \"coordination_url\" from group \"DEFAULT\".",
"2018-03-13 09:24:35,164 [7] INFO gnocchi.cli.manage: Upgrading indexer SQLAlchemyIndexer: mysql+pymysql://gnocchi:YqvWJ7bex7KZJG9WzCe2kHHFp@172.17.1.16/gnocchi?read_default_
tc/my.cnf.d/tripleo.cnf",
..
....

"2018-03-13 09:24:38,914 [7] CRITICAL root: Traceback (most recent call last):",
" File \"/usr/bin/gnocchi-upgrade\", line 10, in <module>",
" sys.exit(upgrade())",
" File \"/usr/lib/python2.7/site-packages/gnocchi/cli/manage.py\", line 81, in upgrade",
" index.create_archive_policy(ap)",
" File \"/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py\", line 616, in create_archive_policy",
" raise indexer.ArchivePolicyAlreadyExists(archive_policy.name)",
"ArchivePolicyAlreadyExists: Archive policy high already exists",

We also saw this error on a different deployment:

"+ exec /usr/bin/gnocchi-upgrade --sacks-number=128\"...
...
...
\" File \\"/usr/lib/python2.7/site-packages/pymysql/err.py\\", line 107, in raise_mysql_exception\",
 \" raise errorclass(errno, errval)\",
 \"InternalError: (1050, u\\"Table 'archive_policy_rule' already exists\\")\",
 \"2018-03-13 00:11:59,916 [7] CRITICAL root: Traceback (most recent call last):\",
 \" File \\"/usr/bin/gnocchi-upgrade\\", line 10, in <module>\",
 \" sys.exit(upgrade())\",

We suspect a race condition: all 3 controllers run gnocchi-upgrade against the same database at about the same time, one of them creates the table first, and the others then throw the exception.
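
To make the suspected failure mode concrete, here is a minimal sketch. It rests on assumptions: SQLite stands in for the MySQL backend, three sequential calls stand in for the three controllers, and `run_upgrade` is a hypothetical helper, not gnocchi code. It only shows that a plain CREATE TABLE succeeds for the first caller and fails with "already exists" for the rest, whereas an idempotent statement does not.

```python
import sqlite3

# Simplified schema; the real table has more columns.
CREATE_SQL = "CREATE TABLE archive_policy (name VARCHAR(255) NOT NULL PRIMARY KEY)"


def run_upgrade(conn, idempotent=False):
    """Try to create the table; return True on success, False if it already exists."""
    sql = CREATE_SQL
    if idempotent:
        sql = sql.replace("CREATE TABLE", "CREATE TABLE IF NOT EXISTS")
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError as exc:
        # SQLite reports "table archive_policy already exists".
        if "already exists" in str(exc):
            return False
        raise


conn = sqlite3.connect(":memory:")

# Three "controllers" running the same upgrade against one database.
results = [run_upgrade(conn) for _ in range(3)]

# The same three runs with an idempotent statement.
idempotent_results = [run_upgrade(conn, idempotent=True) for _ in range(3)]

print(results)             # [True, False, False]
print(idempotent_results)  # [True, True, True]
```

With real concurrent nodes the winner is not predictable, which would be consistent with the error not being 100% reproducible.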

Revision history for this message
Yossi Ovadia (jabadia) wrote :

This is the code; my question is in the comment marked "# Yossi":

    def create_archive_policy(self, archive_policy):
        ap = ArchivePolicy(
            name=archive_policy.name,
            back_window=archive_policy.back_window,
            definition=archive_policy.definition,
            aggregation_methods=list(archive_policy.aggregation_methods),
        )
        try:
            with self.facade.writer() as session:
                session.add(ap)
        except exception.DBDuplicateEntry:
            # Yossi: can the raise below be changed to a warning or similar?
            # If the table already exists, that's fine and we can skip it.
            # Or am I missing something?
            raise indexer.ArchivePolicyAlreadyExists(archive_policy.name)
        return ap
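
One way to read the question above is sketched below. Everything in it is hypothetical: `InMemoryIndexer`, `ensure_archive_policy` and the local `ArchivePolicyAlreadyExists` class are stand-ins written for this illustration, not gnocchi's API. The idea is to leave `create_archive_policy` raising as it does today and let the upgrade caller downgrade the duplicate to a warning.

```python
import logging

LOG = logging.getLogger(__name__)


class ArchivePolicyAlreadyExists(Exception):
    """Hypothetical stand-in for indexer.ArchivePolicyAlreadyExists."""

    def __init__(self, name):
        super().__init__("Archive policy %s already exists" % name)
        self.name = name


class InMemoryIndexer:
    """Toy indexer that mimics the raise-on-duplicate behaviour."""

    def __init__(self):
        self._policies = {}

    def create_archive_policy(self, name, definition):
        if name in self._policies:
            raise ArchivePolicyAlreadyExists(name)
        self._policies[name] = definition
        return name


def ensure_archive_policy(index, name, definition):
    """Create the policy if missing; return True if created, False if it was there."""
    try:
        index.create_archive_policy(name, definition)
        return True
    except ArchivePolicyAlreadyExists:
        # Another node got there first: log it and carry on.
        LOG.warning("Archive policy %s already exists, skipping", name)
        return False


index = InMemoryIndexer()
first = ensure_archive_policy(index, "high", [])   # created
second = ensure_archive_policy(index, "high", [])  # tolerated duplicate
```

Handling the duplicate in the caller rather than inside `create_archive_policy` keeps the exception available to any other caller that may depend on it. This only covers the ArchivePolicyAlreadyExists case; it would not help with the "Table ... already exists" error raised during schema creation.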

Changed in tripleo:
assignee: nobody → Yossi Ovadia (jabadia)
Changed in tripleo:
milestone: none → rocky-1
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Michele Baldessari (michele) wrote :

Rasca has seen this one as well on OSP12. It seems that at least parts(?) of gnocchi-upgrade are not resilient when run from multiple nodes at once:
(undercloud) [stack@undercloud ~]$ for i in 14 11 20; do ssh heat-admin@192.168.24.$i "sudo docker ps -a |grep gnocchi_db_sync"; done
a241fb77a544 192.168.24.1:8787/rhosp12/openstack-gnocchi-api:2018-03-10.1 "kolla_start" 7 hours ago Exited (0) 7 hours ago gnocchi_db_sync
2ddbbfd504ce 192.168.24.1:8787/rhosp12/openstack-gnocchi-api:2018-03-10.1 "kolla_start" 7 hours ago Exited (0) 7 hours ago gnocchi_db_sync
dc94de4a4147 192.168.24.1:8787/rhosp12/openstack-gnocchi-api:2018-03-10.1 "kolla_start" 7 hours ago Exited (1) 7 hours ago gnocchi_db_sync

I'll propose something to avoid it in tripleo-heat-templates (tht), but we'll need some telemetry eyes on it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/553028

Changed in tripleo:
assignee: Yossi Ovadia (jabadia) → Michele Baldessari (michele)
status: Triaged → In Progress
Changed in tripleo:
importance: Medium → High
tags: added: pike-backport-potential queens-backport-potential
Revision history for this message
Raoul Scarazzini (rasca) wrote :

I tested the patch over the same environment described by Michele above and it worked.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/553328

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/553028
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1d704d4733768dc7d502d6cccaf083c6382e1edd
Submitter: Zuul
Branch: master

commit 1d704d4733768dc7d502d6cccaf083c6382e1edd
Author: Michele Baldessari <email address hidden>
Date: Wed Mar 14 20:49:05 2018 +0100

    Fix gnocchi-upgrade Table <..> already exists errors

    Currently we are calling /usr/bin/gnocchi-upgrade
    --sacks-number=SACK_NUM from each node where gnocchi-api is part of the
    role. gnocchi-upgrade seems to be racy and we sometimes end up with the
    following error:

    2018-03-14 12:39:39,683 [1] ERROR oslo_db.sqlalchemy.exc_filters: DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, u"Table 'archive_policy' already exists") [SQL: u'\nCREATE TABLE archive_policy (\n\tname VARCHAR(255) NOT NULL, \n\tback_window INTEGER NOT NULL, \n\tdefinition TEXT NOT NULL, \n\taggregation_methods TEXT NOT NULL, \n\tPRIMARY KEY (name)\n)ENGINE=InnoDB CHARSET=utf8\n\n']
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
        context)
      File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
        cursor.execute(statement, parameters)
      File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
        result = self._query(query)
      File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
        conn.query(q)
      File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 856, in query
        self._affected_rows = self._read_query_result(unbuffered=unbuffered)
      File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1057, in _read_query_result
        result.read()
      File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1340, in read
        first_packet = self.connection._read_packet()
      File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1014, in _read_packet
        packet.check_error()
      File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
        err.raise_mysql_exception(self._data)
      File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
        raise errorclass(errno, errval)
    InternalError: (1050, u"Table 'archive_policy' already exists")

    Let's run it from the bootstrap node only by wrapping it into the
    bootstrap_host_exec magic.

    Change-Id: I106512eeffff3425608a543f9bc5e6a9508d15e5
    Closes-Bug: #1755564
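
The approach described in the commit message, running the upgrade from a single designated node, can be sketched as follows. This is an illustration under assumptions and does not reproduce the actual bootstrap_host_exec script: `run_on_bootstrap_only`, its parameters and the short-hostname comparison are all hypothetical.

```python
import socket
import subprocess


def run_on_bootstrap_only(bootstrap_host, command, this_host=None,
                          runner=subprocess.call):
    """Run `command` only when this host is the bootstrap node.

    Returns the runner's result on the bootstrap node and None elsewhere.
    Hosts are compared by short name, case-insensitively (an assumption).
    """
    this_host = this_host or socket.gethostname()
    short_local = this_host.split(".")[0].lower()
    short_bootstrap = bootstrap_host.split(".")[0].lower()
    if short_local != short_bootstrap:
        # Not the bootstrap node: skip, so only one node touches the schema.
        return None
    return runner(command)


# Example with a fake runner so that nothing is actually executed.
calls = []


def fake_runner(cmd):
    calls.append(cmd)
    return 0


cmd = ["/usr/bin/gnocchi-upgrade", "--sacks-number=128"]
on_bootstrap = run_on_bootstrap_only(
    "controller-0", cmd, this_host="controller-0.localdomain", runner=fake_runner)
on_other = run_on_bootstrap_only(
    "controller-0", cmd, this_host="controller-1.localdomain", runner=fake_runner)
```

Because only one node ever issues the schema changes, the race between controllers cannot occur, regardless of whether gnocchi-upgrade itself is safe to run concurrently.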

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/553402

Raoul Scarazzini (rasca)
Changed in tripleo:
status: Fix Released → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/553402
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=187dba22b9a9a7a6079beac9d0b1feaa09d61853
Submitter: Zuul
Branch: stable/pike

commit 187dba22b9a9a7a6079beac9d0b1feaa09d61853
Author: Michele Baldessari <email address hidden>
Date: Wed Mar 14 20:49:05 2018 +0100

    Fix gnocchi-upgrade Table <..> already exists errors

    Change-Id: I106512eeffff3425608a543f9bc5e6a9508d15e5
    Closes-Bug: #1755564
    (cherry picked from commit 1d704d4733768dc7d502d6cccaf083c6382e1edd)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/553328
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=7d50af58ea583e14ad2f0f42fad0e608bf428e53
Submitter: Zuul
Branch: stable/queens

commit 7d50af58ea583e14ad2f0f42fad0e608bf428e53
Author: Michele Baldessari <email address hidden>
Date: Wed Mar 14 20:49:05 2018 +0100

    Fix gnocchi-upgrade Table <..> already exists errors

    Change-Id: I106512eeffff3425608a543f9bc5e6a9508d15e5
    Closes-Bug: #1755564

tags: added: in-stable-queens
Revision history for this message
kobig (kobi.ginon) wrote :

It seems that the fix is doing the job with 3 controllers (we did not encounter this issue with 1 controller).
We are still validating to make sure this is a solid fix.

regards

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.11

This issue was fixed in the openstack/tripleo-heat-templates 7.0.11 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b1

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b2 development milestone.

Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
status: In Progress → Fix Released