ec2api.metadata ERROR 'NoResultFound: No row was found for one()'

Bug #1660888 reported by Jake Yip
Affects   Status         Importance   Assigned to   Milestone
ec2-api   Fix Released   Undecided    tikitavi

Bug Description

Getting metadata from ec2-api-metadata using:

  curl http://169.254.169.254/2009-04-04/meta-data/hostname

sometimes errors out with:

2017-02-01 07:05:34.734 1029 INFO ec2api.wsgi.server [req-1667a4a2-499a-4e42-9e62-46af3edd29d6 - - - - -] 172.26.82.57,172.26.8.4 "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 500 len: 229 time: 2.2862999
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata [req-cd694338-39bf-4b41-b239-0e2660dc3f4d - - - - -] Unexpected error.
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata Traceback (most recent call last):
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/metadata/__init__.py", line 94, in __call__
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata resp = self._get_metadata(path_tokens, requester)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/metadata/__init__.py", line 273, in _get_metadata
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata requester['private_ip'])
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/metadata/api.py", line 117, in get_metadata_item
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata _get_ec2_instance_and_reservation(context, os_instance_id))
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/metadata/api.py", line 143, in _get_ec2_instance_and_reservation
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata 'value': [instance_id]}])
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/instance.py", line 438, in describe_instances
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata max_results=max_results, next_token=next_token)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/instance.py", line 409, in describe
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata max_results=max_results, next_token=next_token)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/common.py", line 495, in describe
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata max_results=max_results, next_token=next_token)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/common.py", line 401, in describe
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata self.items = self.get_db_items()
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/instance.py", line 331, in get_db_items
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata self.groups_name_to_id = _get_groups_name_to_id(self.context)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/instance.py", line 1046, in _get_groups_name_to_id
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata for g in (security_group_api.describe_security_groups(context)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/security_group.py", line 201, in describe_security_groups
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata context, group_id, group_name, filter)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/common.py", line 495, in describe
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata max_results=max_results, next_token=next_token)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/common.py", line 419, in describe
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata item = self.auto_update_db(item, os_item)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/common.py", line 309, in auto_update_db
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata self.get_id(os_item))
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/api/ec2utils.py", line 294, in auto_create_db_item
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata return db_api.add_item(context, kind, item)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/db/api.py", line 83, in add_item
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata return IMPL.add_item(context, kind, data)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/db/sqlalchemy/api.py", line 72, in wrapper
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata return f(*args, **kwargs)
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/ec2api/db/sqlalchemy/api.py", line 111, in add_item
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata filter(models.Item.id.like('%s-%%' % kind)).
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2699, in one
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata raise orm_exc.NoResultFound("No row was found for one()")
2017-02-01 07:05:37.283 1029 ERROR ec2api.metadata NoResultFound: No row was found for one()

We have neutron proxying metadata to ec2-api-metadata, which further proxies to nova-api-metadata. Is this a correct configuration?

Jake Yip (waipengyip) wrote :

Found out a possible place where the bug is occurring.

In the items table, I found a row with the same os_id as the one the code is trying to add, but with a different project_id. The row looks like this:

MariaDB [ec2api]> select * from items where os_id = 'e2e8e955-71e2-402a-9f65-a9d757844a7c';
+-------------+----------------------------------+--------+--------------------------------------+------+
| id          | project_id                       | vpc_id | os_id                                | data |
+-------------+----------------------------------+--------+--------------------------------------+------+
| sg-ae9770a4 | 911a5b6898a04dac9f23cb6318541f02 | NULL   | e2e8e955-71e2-402a-9f65-a9d757844a7c | {}   |
+-------------+----------------------------------+--------+--------------------------------------+------+

pdb shows the code trying to create a row with os_id = 'e2e8e955-71e2-402a-9f65-a9d757844a7c' with project_id = '62997fe4cc86415799f5980f94513e05' (different from what is in DB).

(Pdb) vars(item_ref)
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x7f33cc774ed0>, 'os_id': 'e2e8e955-71e2-402a-9f65-a9d757844a7c', 'vpc_id': None, 'project_id': '62997fe4cc86415799f5980f94513e05', 'data': '{}', 'id': 'sg-84076db3'}

This raises db_exception.DBDuplicateEntry, but the code that handles it cannot retrieve an existing item_ref because its query filters are too restrictive to match the existing row. I will submit a patch for consideration.
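A minimal sketch of this failure mode, using the standard-library sqlite3 module rather than the SQLAlchemy/MariaDB stack that ec2api actually runs on. The three-column schema, the UNIQUE constraint on os_id, the project-scoped fallback query, and the helper names add_item and reproduce are assumptions made for illustration; only the ids and the error text are taken from this report.

```python
import sqlite3


def add_item(conn, kind, project_id, item_id, os_id):
    """Insert an item; on a duplicate os_id, fall back to a scoped lookup."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO items (id, project_id, os_id) VALUES (?, ?, ?)",
                (item_id, project_id, os_id),
            )
        return item_id
    except sqlite3.IntegrityError:
        # Assumed fallback: find the existing row, but only inside the
        # caller's project and only among ids of the same kind.
        row = conn.execute(
            "SELECT id FROM items "
            "WHERE os_id = ? AND project_id = ? AND id LIKE ?",
            (os_id, project_id, kind + "-%"),
        ).fetchone()
        if row is None:
            # Stands in for SQLAlchemy's NoResultFound raised by one().
            raise LookupError("No row was found for one()")
        return row[0]


def reproduce():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id TEXT PRIMARY KEY, project_id TEXT, os_id TEXT UNIQUE)"
    )
    os_id = "e2e8e955-71e2-402a-9f65-a9d757844a7c"
    # The row that already exists, owned by the other project.
    first = add_item(conn, "sg", "911a5b6898a04dac9f23cb6318541f02",
                     "sg-ae9770a4", os_id)
    try:
        # The request from the instance's project hits the duplicate os_id.
        add_item(conn, "sg", "62997fe4cc86415799f5980f94513e05",
                 "sg-84076db3", os_id)
    except LookupError as exc:
        return first, str(exc)
    return first, None


print(reproduce())
```

The second insert violates the uniqueness of os_id, and the fallback query then finds nothing because the existing row carries a different project_id.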

However, this raises some questions:

1) how did the row get created in the first place, and

2) is it safe to update project_id and id for a unique os_id ?

Jake Yip (waipengyip)
Changed in ec2-api:
assignee: nobody → Jake Yip (waipengyip)
Feodor Tersin (ftersin) wrote :

1) I have several ideas. Let's figure this out. Which project_id is related to which tenant?

2) No, items cannot be moved from one project to another. The only allowed project_id update is its initialisation: an item is first added with a project_id of None, and then we update it (this is used for public images).

Feodor Tersin (ftersin) wrote :

3) Could you also compare the os_id of the existing and the added items? Which neutron security groups do they relate to?

4) Since you've debugged this, is there a way to reproduce it? Could you provide the steps?

Jake Yip (waipengyip) wrote :

Hi Feodor,

1) I am not sure what you mean by this.

2) OK

3) The os_id of the existing and the added item is the same, hence the error.

4) I can't replicate this, in the sense that I do not know what process created the existing row. I do have the pdb dump from when it crashes.

Jake Yip (waipengyip) wrote :

1) I wonder if this answers your question:

The os_id e2e8e955-71e2-402a-9f65-a9d757844a7c corresponds to a secgroup. Looking it up using neutron API gives

$ neutron security-group-show e2e8e955-71e2-402a-9f65-a9d757844a7c
+----------------------+--------------------------------------------------------------------+
| Field | Value |
+----------------------+--------------------------------------------------------------------+
| description | Security Group for 026b0aef-78e1-4758-91b5-2991e9c1ef70 |
| id | e2e8e955-71e2-402a-9f65-a9d757844a7c |
| name | SecGroup_026b0aef-78e1-4758-91b5-2991e9c1ef70 |
| security_group_rules | { |
| | "remote_group_id": null, |
| | "direction": "egress", |
| | "protocol": null, |
| | "description": "", |
| | "ethertype": "IPv4", |
| | "remote_ip_prefix": null, |
| | "port_range_max": null, |
| | "security_group_id": "e2e8e955-71e2-402a-9f65-a9d757844a7c", |
| | "port_range_min": null, |
| | "tenant_id": "2f3e9e705b0b460b9de90d9844e88fd2", |
| | "id": "06833642-8fd2-461b-a857-fcfa5fefb0eb" |
| | } |
| | { |
| | "remote_group_id": null, |
| | "direction": "ingress", |
| | "protocol": "tcp", |
| | "description": "", |
| | "ethertype": "IPv4", |
| | "remote_ip_prefix": "0.0.0.0/0", |
| | "port_range_max": 3306, |
| | "security_group_id": "e2e8e955-71e2-402a-9f65-a9d757844a7c", |
| | "port_range_min": 3306, |
| | "tenant_id": "2f3e9e705b0b460b9de90d9844e88f...


Feodor Tersin (ftersin) wrote :

1) I mean I'd like to know which tenants are related to the problem data. 62997fe4cc86415799f5980f94513e05 is the tenant of the instance. 2f3e9e705b0b460b9de90d9844e88fd2 is the default tenant of the admin account used by ec2api. What is 911a5b6898a04dac9f23cb6318541f02?

Feodor Tersin (ftersin) wrote :

5) What has happened to SecGroup_026b0aef-78e1-4758-91b5-2991e9c1ef70? What is its history? Was it just generated and specified at instance launch? Or was it added to the running instance later? Or something else?

Jake Yip (waipengyip) wrote :

911a5b6898a04dac9f23cb6318541f02 is another tenant, one belonging to a user.

How do I find out who SecGroup_026b0aef-78e1-4758-91b5-2991e9c1ef70 belongs to? Running `neutron security-group-list` as tenant 62997fe4cc86415799f5980f94513e05 doesn't show any secgroup with that name or id.

Feodor Tersin (ftersin) wrote :

It looks like the cause of the problem (or at least one of them) is that describe_security_groups unexpectedly processes groups that do not belong to the instance's project. The groups are returned by the neutron.list_security_groups call (security_group.SecurityGroupEngineNeutron.get_os_groups). Could you please turn on debug-level logging and capture the logs for this call and its result?

Feodor Tersin (ftersin) wrote :

Jake, it looks like I missed your last comment and forgot to answer some of your questions. Let me explain my thoughts about the problem.

First, os_id is designed as a unique identifier of an OS object within a given OS deployment, generated by OS itself. This is a key assumption used to link items in the ec2api db with OS objects. If it did not hold, a lot of ec2api internals would have to be reworked. However, OS identifiers are indeed unique, and I do not believe that your OS has more than one security group with the same id.

According to the info you provided, there is a security group in the service tenant that is recorded in ec2api with the wrong project_id (one belonging to another tenant). This should never happen; it is an impossible state.

Another point is that ec2api isolates data per project. Regular code that processes a user request should neither get OS items that belong to another project nor manage such db items.

However, in your deployment ec2api fails while handling a security group of the service tenant when processing a request for the instance's tenant.

I suppose the wrong db state arose earlier: when a similar request for another instance, belonging to the other tenant, was processed, ec2api handled this security group and created the db item for it, assigning that request's own project id.

Finally, your instance comes to ec2api, which erroneously handles that security group too, and fails.

So I think the cause of the problem is that the ec2api metadata server handles this security group at all. It does so because it receives the group from Neutron despite the specified tenant filter.

Because of that, I suspect that this filtering does not work, and I want to look at the debug-level logs of the metadata server, which contain the request arguments and the received data.
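One way to check that suspicion from the logged data is sketched below: partition whatever the listing call returned by owner and see whether anything falls outside the requesting project. The function name split_by_project and the shape of the group dicts (an "id" and a "tenant_id" key) are assumptions for illustration, not ec2api code; the tenant ids are the ones quoted in this report, and the second group is invented.

```python
def split_by_project(os_groups, project_id):
    """Partition security groups into (owned by project_id, owned by others)."""
    own, foreign = [], []
    for group in os_groups:
        if group.get("tenant_id") == project_id:
            own.append(group)
        else:
            foreign.append(group)
    return own, foreign


INSTANCE_PROJECT = "62997fe4cc86415799f5980f94513e05"
SERVICE_TENANT = "2f3e9e705b0b460b9de90d9844e88fd2"

# Hypothetical listing result. The first id is from this report and is
# assumed to be owned by the service tenant, as the discussion suggests;
# the second entry is made up.
returned = [
    {"id": "e2e8e955-71e2-402a-9f65-a9d757844a7c", "tenant_id": SERVICE_TENANT},
    {"id": "00000000-0000-0000-0000-000000000001", "tenant_id": INSTANCE_PROJECT},
]

own, foreign = split_by_project(returned, INSTANCE_PROJECT)
for group in foreign:
    print("group %s belongs to tenant %s, not to %s"
          % (group["id"], group["tenant_id"], INSTANCE_PROJECT))
```

If the tenant filter worked, the foreign list would be empty for every request.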

Jake Yip (waipengyip) wrote :

Hi Feodor,

Thanks! I have been looking into it over the last few days. Your hunch was right: the filtering does not work.

We are using the nova secgroup engine, but it lists all the security groups of the 'service' tenant instead of the user tenant. I've also looked at the nova call; there seems to be no way of passing a project_id to filter on.

I've also looked into the nova code; there's a part marked as TODO to allow search_opts with params:

https://github.com/openstack/nova/blob/stable/mitaka/nova/compute/api.py#L4229

Right now a rough implementation would be to list all secgroups and filter them, but this is SLOW and seems to list only 1000 secgroups in our environment. So it probably needs more work?
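To illustrate why a capped listing makes client-side filtering unreliable, here is a rough sketch against an in-memory stand-in for the backend. The fetch_page callable, its marker/limit arguments, and the helper names are hypothetical; nothing here is a real nova or neutron call, and whether the real API offers such pagination is not established in this report.

```python
def make_fake_backend(groups):
    """Return a hypothetical paged lister over an in-memory list of groups."""
    def fetch_page(marker=None, limit=1000):
        start = 0
        if marker is not None:
            # Resume just after the group whose id equals the marker.
            start = next(i for i, g in enumerate(groups)
                         if g["id"] == marker) + 1
        return groups[start:start + limit]
    return fetch_page


def list_groups_for_tenant(fetch_page, tenant_id, page_size=1000):
    """Walk every page and keep only the groups owned by tenant_id."""
    matches = []
    marker = None
    while True:
        page = fetch_page(marker=marker, limit=page_size)
        if not page:
            break
        matches.extend(g for g in page if g["tenant_id"] == tenant_id)
        marker = page[-1]["id"]
    return matches


# 2500 made-up groups; only one belongs to the tenant we care about.
groups = [{"id": "g%d" % i, "tenant_id": "service"} for i in range(2500)]
groups[2400]["tenant_id"] = "user"
fetch = make_fake_backend(groups)

# Filtering a single capped page silently misses the group.
first_page_only = [g for g in fetch(limit=1000) if g["tenant_id"] == "user"]
print(len(first_page_only))
print([g["id"] for g in list_groups_for_tenant(fetch, "user")])
```

Walking every page finds the group but costs one call per thousand entries, which matches the concern that this approach is slow.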

Hope I'm on the right track... I might not be explaining it too well, so I've uploaded some code to https://review.openstack.org/#/c/427554/

Andrey Pavlov (apavlov-e) wrote :

Hi Jake,
But why do you use the nova engine for all network stuff if your deployment has neutron?

Jake Yip (waipengyip) wrote :

Hi Andrey,

It's due to `full_vpc_support = False`; apparently things don't work very well if we set that to True.

I could test out with True and see what fails. Not sure how that affects this bug though?

Jake Yip (waipengyip) wrote :

Sorry, I just saw your comments on my commit regarding this.

Yes, we use nova because we set full_vpc_support = false. I think we have a problem booting instances with ec2-api if this is true; I'll have to do some more testing for more specifics.

I have tried setting it to true, and the neutron engine works fine.

I don't understand how this relates to the bug. For example, what if we don't want to have VPC support? Is that a supported scenario?

Andrey Pavlov (apavlov-e) wrote :

Yeah, you are right that it's a valid scenario; this is a valid bug and it should be fixed :)

But... nova-network is deprecated and will be removed soon. So we can release this fix only for Ocata and previous versions.
BTW - what version do you use/need?

In fact, in your deployment ec2-api uses neutron via the nova proxy, and you found some TODOs in this proxy. So it's better to make ec2-api work with neutron directly.
We expected users to enable full_vpc_support on neutron deployments (to avoid proxying). I suspect that this proxy doesn't work well with other objects either (maybe addresses/floating ips).

Also, it would be very helpful if you filed bugs for the cases that fail with full_vpc_support enabled.

Feodor Tersin (ftersin) wrote :

Jake, thank you for the investigation. You're right: the Nova API cannot filter security groups by a project_id other than the user's own, but ec2api does not take this into account.

The configuration you use (full_vpc_support=False over Neutron) is somewhat of a side branch of mainstream usage. It's not tested in CI, although it is supported. It's designed for exactly the purpose you use it for: to disable vpc-related features on Neutron-based deployments for some reason (mostly for old deployments that previously used Nova's ec2api). This is why the configuration parameter name is not directly related to the nova-network/neutron engines, but has a functional meaning.

I think ec2api can support this feature as long as Nova supports the corresponding APIs (security groups, floating IPs), which are still supported despite the nova-network deprecation. But someday Nova will drop these APIs, and I do not think that the ec2api team will continue to support this feature then, because reworking it onto the pure Neutron API costs more than the feature is worth. So I agree with Andrey that it makes sense to convert your deployment to full vpc support.

As for your question ('what if I do not want to have vpc')... There are two separate pages in the AWS Console for EC2 and VPC, but AWS provides a single API for both (called EC2) and a single subset of the aws CLI (selected by the ec2 parameter). So we may treat VPC-related commands the same way as, for example, volume-related commands. I mean, what if a cloud admin says 'I don't want to have volumes supported via ec2api'? I think the right way to get such restrictions is AWS IAM. This is a separate, huge task, though.

Btw, was there any reason for you to switch off vpc support other than the bugs you found?

Also, I join Andrey in saying that we'd be very grateful if you reported bugs with full vpc support.

Feodor Tersin (ftersin)
Changed in ec2-api:
assignee: Jake Yip (waipengyip) → tikitavi (rtikitavi)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ec2-api (master)

Reviewed: https://review.openstack.org/427554
Committed: https://git.openstack.org/cgit/openstack/ec2-api/commit/?id=6b3c283894c9ab3edbac80008ea80e1c437ff1b3
Submitter: Jenkins
Branch: master

commit 6b3c283894c9ab3edbac80008ea80e1c437ff1b3
Author: Jake Yip <email address hidden>
Date: Wed Feb 1 16:07:24 2017 +1100

    Using neutron engine in security groups describe.

    Nova engine works incorrect in case when describe is using in metadata.
    Security-group-list in nova cannot be filtered by tenant,
    listing all secgroups in case of big amount of groups can be slow
    and may have limitations in number.

    Co-Author: tikitavi <email address hidden>

    Change-Id: I199b0f4f4febad4c23a0d8968f7858763bcbf00c
    Closes-Bug: #1660888

Changed in ec2-api:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ec2-api (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/494630

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ec2-api 7.0.0

This issue was fixed in the openstack/ec2-api 7.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ec2-api (stable/ocata)

Change abandoned by Andrey Pavlov (<email address hidden>) on branch: stable/ocata
Review: https://review.opendev.org/494630
