sql backend throwing sporadic error

Bug #1214257 reported by gordon chung
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ceilometer
Fix Released
High
gordon chung

Bug Description

2013-08-20 02:14:34.942 9348 ERROR ceilometer.collector.dispatcher.database [req-3846dc97-1188-401d-af2b-a8184f0b42dc None None] Failed to record metering data: Unterminated string starting at: line 1 column 349 (char 349)
2013-08-20 02:14:34.943 9348 ERROR ceilometer.collector.dispatcher.database [req-3846dc97-1188-401d-af2b-a8184f0b42dc None None] Unterminated string starting at: line 1 column 349 (char 349)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database Traceback (most recent call last):
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/opt/stack/ceilometer/ceilometer/collector/dispatcher/database.py", line 65, in record_metering_data
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database self.storage_conn.record_metering_data(meter)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/opt/stack/ceilometer/ceilometer/storage/impl_sqlalchemy.py", line 195, in record_metering_data
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database resource = session.merge(Resource(id=str(data['resource_id'])))
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1475, in merge
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database load=load, _recursive=_recursive)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1514, in _merge
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database merged = self.query(mapper.class_).get(key[1])
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 775, in get
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database return self._load_on_ident(key)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2512, in _load_on_ident
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database return q.one()
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2184, in one
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database ret = list(self)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2348, in instances
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database rows = [process[0](row, None) for row in fetch]
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/mapper.py", line 2120, in _instance
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database populate_state(state, dict_, row, isnew, only_load_props)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/mapper.py", line 1974, in populate_state
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database populator(state, dict_, row)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/strategies.py", line 150, in fetch_col
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database dict_[key] = row[col]
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/types.py", line 662, in process
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database return process_value(value, dialect)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/opt/stack/ceilometer/ceilometer/storage/sqlalchemy/models.py", line 66, in process_result_value
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database value = json.loads(value)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database return _default_decoder.decode(s)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database obj, end = self.scan_once(s, idx)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database ValueError: Unterminated string starting at: line 1 column 349 (char 349)
2013-08-20 02:14:34.943 9348 TRACE ceilometer.collector.dispatcher.database
2013-08-20 02:14:35.053 9348 ERROR ceilometer.pipeline [-] Pipeline meter_pipeline: Continue after error from publisher <ceilometer.publisher.rpc.RPCPublisher object at 0x381d4d0>
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline Traceback (most recent call last):
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline File "/opt/stack/ceilometer/ceilometer/pipeline.py", line 217, in _publish_samples
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline p.publish_samples(ctxt, transformed_samples)
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline File "/opt/stack/ceilometer/ceilometer/publisher/rpc.py", line 178, in publish_samples
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline self.flush()
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline File "/opt/stack/ceilometer/ceilometer/publisher/rpc.py", line 224, in flush
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline self.local_queue.pop(0)
2013-08-20 02:14:35.053 9348 TRACE ceilometer.pipeline IndexError: pop from empty list

Tags: sql-backend
gordon chung (chungg)
Changed in ceilometer:
assignee: nobody → gordon chung (chungg)
Revision history for this message
gordon chung (chungg) wrote :

this is thrown because we pass in None as resource_id in http request. the sql model requires resource to build meter. the question is do we add a resource_id, remove the resource dependency from meters, or wait until events are properly setup to record body and scrap this Sample.

tags: added: sql-backend
Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

The API should validate that a resource ID is provided. It's a required field.

Julien Danjou (jdanjou)
Changed in ceilometer:
status: New → Triaged
importance: Undecided → High
milestone: none → havana-3
Revision history for this message
gordon chung (chungg) wrote :

i'll look at what else is useful in request environ variables that might help give us resource id. one way is to take the service endpoint catalog that keystone auth_token middleware adds and try to figure out which service(resource) is being called. but the service catalog is optional according to middleware notes so i'm hoping there's another option.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

Gordon, when you say "API" do you mean "ceilometer's REST API" or something else? Because if it's the REST API, the caller should provide the resource explicitly and we should not guess.

Revision history for this message
gordon chung (chungg) wrote :

Doug, i mean the middleware notifier https://blueprints.launchpad.net/ceilometer/+spec/count-api-requests. i don't believe resource_id is available at middleware stage of request so the notification sent i don't believe has a resource_id... i was under the assumption that service_id in keystone was resource_id but now that i look at resource_ids collected by Ceilometer, that doesn't appear to be the case...

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

OK. Are you measuring API calls against individual instances, etc.?

Revision history for this message
gordon chung (chungg) wrote :

based on the blueprint, it isn't measuring against an instance specifically. it seems to just count the api requests each service receives (and captures a set of request.environ variables for metadata). see middleware here: https://review.openstack.org/#/c/41423/

seems to me if anything, the middleware could be the resource_id

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

I think I'd use the service name (keystone, nova, whatever). The URL is part of the CADF payload, right?

Revision history for this message
Julien Danjou (jdanjou) wrote :

How do we get the service name, that's the question.

Revision history for this message
gordon chung (chungg) wrote :
Download full text (7.7 KiB)

to get the service name we'd probably need to take HTTP_HOST value and find the corresponding entry in HTTP_X_SERVICE_CATALOG. (this is what we do in to generate CADF as well)

the full url path isn't part of CADF payload either -- it can be added (as its extensible) but it's not part of formal spec. i could get Matt to propose to add it to formal spec if you find value there (we've been adding/proposing a bunch of Openstack attr since we've started)

here is what the payload contains right now (the CADF_EVENT attr is there if and only if you use the middleware.audit, middleware.notifier does captures everything else):

RESPONSE

{
 "_id" : ObjectId("5213edc8eb2d733c8ed7d7cd"),
 "counter_name" : "http.response",
 "user_id" : "36448cb2ec454631a6a8cb664a29332f",
 "message_signature" : "ef69c38520c06891bf7a577d663ed41a74a74f6fcc1c3c998c5b558c2631f1c3",
 "timestamp" : ISODate("2013-08-20T22:29:28.482Z"),
 "resource_id" : null,
 "resource_metadata" : {
  "host" : "nova-api.plwdevstack",
  "request" : {
   "HTTP_X_TENANT_NAME" : "service",
   "SCRIPT_NAME" : "/v2",
   "REQUEST_METHOD" : "GET",
   "PATH_INFO" : "/2fb92e2139654e87bc03c42543eababf/os-floating-ips",
   "SERVER_PROTOCOL" : "HTTP/1.0",
   "HTTP_X_USER_ID" : "36448cb2ec454631a6a8cb664a29332f",
   "HTTP_X_AUTH_TOKEN" : "tokenid",
   "HTTP_USER_AGENT" : "python-novaclient",
   "HTTP_X_DOMAIN_NAME" : null,
   "HTTP_X_PROJECT_DOMAIN_ID" : null,
   "REMOTE_PORT" : "48729",
   "HTTP_X_ROLE" : "_member_,admin,ResellerAdmin",
   "HTTP_X_IDENTITY_STATUS" : "Confirmed",
   "HTTP_X_DOMAIN_ID" : null,
   "SERVER_PORT" : "8774",
   "CADF_EVENT" : {
    "typeURI" : "http://schemas.dmtf.org/cloud/audit/1.0/event",
    "eventTime" : "2013-08-20T22:28:18.122305+0000",
    "target" : {
     "typeURI" : "service/compute",
     "addresses" : [
      {
       "url" : "http://10.0.2.15:8774/v2/2fb92e2139654e87bc03c42543eababf",
       "name" : "admin"
      },
      {
       "url" : "http://10.0.2.15:8774/v2/2fb92e2139654e87bc03c42543eababf",
       "name" : "private"
      },
      {
       "url" : "http://10.0.2.15:8774/v2/2fb92e2139654e87bc03c42543eababf",
       "name" : "public"
      }
     ],
     "id" : "2f134369e44b4bb0b681551228474203",
     "name" : "nova"
    },
    "observer" : "target",
    "tags" : [
     "correlation_id?value=f36fff90-6b5e-54c2-9438-73dc60c57657"
    ],
    "eventType" : "activity",
    "initiator" : {
     "typeURI" : "service/security/account/user",
     "name" : "ceilometer",
     "credential" : {
      "token" : "tokenid",
      "identity_status" : "Confirmed"
     },
     "host" : {
      "agent" : "python-novaclient",
      "address" : "10.0.2.15"
     },
     "project_id" : "2fb92e2139654e87bc03c42543eababf",
     "id" : "36448cb2ec454631a6a8cb664a29332f"
    },
    "reason" : {
     "reasonCode" : "200",
     "reasonType" : "HTTP"
    },
    "reporterchain" : [
     {
      "reporterTime" : "2013-08-20T22:29:28.480996+0000",
      "role" : "modifier",
      "reporter" : "target"
     }
    ],
    "action" : "list",
    "outcome" : "success",
    "id" : "c5e63215-9afd-5911-8644-0210448efa5c"
   },
   "HTTP_X_SERVICE_CATALOG" : "[{\"endpoints_links\": [], \"endpoints\": ...

Read more...

Revision history for this message
Julien Danjou (jdanjou) wrote :

We don't need CADF at all, and requiring it would not be a good idea.
HTTP_HOST seems like a good start. It would be great to have the service type too somewhere.
Maybe we could enhance all API to export a HTTP_SERVICE_TYPE or the like.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

The middleware could use a configuration option to control the value and default to HTTP_HOST if it is not set.

Revision history for this message
gordon chung (chungg) wrote :

Julien, agreed, CADF is only there if using middleware.audit. using middleware.notifier gives the same payload minus CADF_EVENT.

in audit_middleware we take the request.url/HTTP_HOST, and find corresponding entry in HTTP_X_SERVICE_CATALOG to get service name and service_type. i can take the same logic and apply it to middleware.notifier. question is, should we calculate this at ceilometer collector or at middleware.notifier.

Revision history for this message
gordon chung (chungg) wrote :

ok, i'll add HTTP_SERVICE_TYPE and HTTP_SERVICE_NAME to req.environ in middleware.notifier by calculating using HTTP_X_SERVICE_CATALOG (if it exists). brainstorm, does this make sense for this to be done in auth_token if HTTP_X_SERVICE_CATALOG comes from there anyways?

 in notification handler we take HTTP_SERVICE_NAME as resource_id or HTTP_HOST if HTTP_SERVICE_NAME does not exist.

Revision history for this message
gordon chung (chungg) wrote :

whoops, didn't notice your comment Doug. i'm pretty indifferent to whether we set a config option and append as SERVICE_NAME or if we try to calculate SERVICE_NAME and SERVICE_TYPE from SERVICE_CATALOG. either or works for me -- i guess the latter has the overhead of calculating this each time but it would be offer a more accurate SERVICE_NAME. i guess forced to choose, the config option path would be simplest.

what do you guys think?

Revision history for this message
Julien Danjou (jdanjou) wrote :

I have the feeling that the service catalog lookup is likely to fail when using complicated setup with HTTP proxy in the middle, isn't it?

The config option is more a pain, but it's going to be more reliable.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

Yes, a runtime lookup is going to be more prone to variance or failure that a configuration option won't be.

Revision history for this message
gordon chung (chungg) wrote :

cool cool, just so we're aware, defaulting to HTTP_HOST would give a SERVICE_NAME something like 'http://10.0.2.15:8774/'. it seems odd but i guess that is the only consistent variable to identify the service.

Changed in ceilometer:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/43585
Committed: http://github.com/openstack/ceilometer/commit/c675197fa326319bb410b788595088918597dc39
Submitter: Jenkins
Branch: master

commit c675197fa326319bb410b788595088918597dc39
Author: Gordon Chung <email address hidden>
Date: Sat Aug 24 11:14:12 2013 -0400

    missing resource in middleware notification

    sql backend throws errors because resource is a required value
    set the resource to HTTP_X_SERVICE_NAME value.

    Change-Id: If2407881b8e3f3ce313ade5295258f0f72f6ea71
    Partial-Bug: #1214257

gordon chung (chungg)
Changed in ceilometer:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in ceilometer:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: havana-3 → 2013.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.