Bug because some pysnmp version implements GETBULK instead of BULKWALK
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceilometer | Invalid | Undecided | Unassigned |
Bug Description
Dear all,
BLOT:
pysnmp version 4.2.5 (allowed by the ceilometer requirements) implements bulkCmd using GETBULK. In pysnmp version 4.3.2, bulkCmd is implemented using BULKWALK. I believe the snmp inspector's OID caching assumes that bulkCmd is implemented using BULKWALK. Therefore, the requirement should be updated to pysnmp<
DETAILS:
I found an issue with the pysnmp version that is set in the requirements for ceilometer:
https:/
This bug mostly affects me because I am prototyping some changes to the snmp inspector. However, I think it is important enough to introduce serious issues if not addressed now. It is also, as I found out, easy to fix.
I am using snmp for collecting information about hardware. The heart of the code is the snmp inspector, which makes snmp calls and collects the results of previous calls in a cache.
https:/
When running the snmp inspector for type_prefix, the inspector uses bulkCmd from pysnmp to collect all the OIDs for a given OID prefix.
https:/
Each time inspect_generic is called for a new type_prefix, the inspector checks whether it is in the cache by looking for at least one metric sharing the same prefix. If this is the case, my understanding is that the inspector assumes that all the OID values of the subtrees with the common prefix (and only with the common prefix) are cached.
This can only be true if bulkCmd collects all the OIDs within the subtrees with the same prefix, i.e. it does not collect metrics with different prefixes. This is equivalent to saying that bulkCmd implements the equivalent of the BULKWALK command. However, I found out that this depends on the pysnmp version that is used.
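To make the caching assumption concrete, here is a minimal sketch of a prefix-based cache check. It is not the actual Ceilometer code: the function names are hypothetical and the OIDs are placeholders chosen for the example, not the ones from this report. It shows how a single spilled-over OID from a GETBULK-style response is enough to make an unrelated prefix look cached.

```python
def oid_to_tuple(oid):
    """Turn '1.3.6.1' into (1, 3, 6, 1) so prefixes compare by whole arcs."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def has_prefix(oid, prefix):
    """True if oid lies in the subtree rooted at prefix."""
    o, p = oid_to_tuple(oid), oid_to_tuple(prefix)
    return o[:len(p)] == p


def is_cached(cache, prefix):
    """Hypothetical cache check: one OID under the prefix counts as a hit."""
    return any(has_prefix(oid, prefix) for oid in cache)


# Placeholder OIDs, not the ones from the report.
PREFIX_A = "1.3.6.1.4.1.99999.1"
PREFIX_B = "1.3.6.1.4.1.99999.2"

# What a GETBULK-style call for PREFIX_A might leave in the cache: the two
# OIDs under PREFIX_A, plus whatever follows them in the next subtree.
cache = {
    "1.3.6.1.4.1.99999.1.1": 10,
    "1.3.6.1.4.1.99999.1.2": 20,
    "1.3.6.1.4.1.99999.2.1": 30,  # spill-over from the neighbouring subtree
}

# PREFIX_B was never walked, but the check reports a hit ...
hit = is_cached(cache, PREFIX_B)
# ... even though another OID under PREFIX_B (assumed to exist on the host)
# was never fetched.
missing = "1.3.6.1.4.1.99999.2.2" not in cache
```

With a BULKWALK-style response the spill-over entry would not be in the cache, so the check for PREFIX_B would miss and trigger a fresh query.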
The pysnmp requirement in master branch is pysnmp<
---
from pprint import pprint as pp
from pysnmp.
auth = cmdgen.
transport = cmdgen.
cmdrun = cmdgen.
errorIndication, errorStatus, errorIndex, varBindTable = cmdrun.bulkCmd(
auth, transport, 0, 100, '1.3.6.
)
pp(varBindTable)
---
Running this code with pysnmp==4.2.5 returns exactly 100 records, including OIDs that are not under the initial prefix, like 1.3.6.1.
If I run the same test code using pysnmp==4.3.2, I get ~32 records back (depending on the number of disks in the host), all of them with the same prefix as the prefix OID, or 1.3.6.1.
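The difference between the two behaviours can be sketched without pysnmp at all. The snippet below is only an illustration under stated assumptions: `get_bulk` is a hypothetical callable standing in for a raw GETBULK request, `make_fake_agent` is a made-up in-memory agent, and the OIDs are placeholders. A walk is then just repeated GETBULK calls that stop at the first OID outside the requested subtree.

```python
def oid_to_tuple(oid):
    """Turn '1.3.6.1' into (1, 3, 6, 1); tuples sort in OID order."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def in_subtree(oid, prefix):
    """True if oid lies in the subtree rooted at prefix."""
    o, p = oid_to_tuple(oid), oid_to_tuple(prefix)
    return o[:len(p)] == p


def bulk_walk(get_bulk, prefix, max_repetitions=100):
    """Walk one subtree using a GETBULK-like callable.

    get_bulk(start_oid, n) is assumed to return up to n (oid, value)
    pairs that follow start_oid in OID order, regardless of subtree.
    """
    results = []
    start = prefix
    while True:
        batch = get_bulk(start, max_repetitions)
        if not batch:
            return results  # end of the agent's MIB view
        for oid, value in batch:
            if not in_subtree(oid, prefix):
                return results  # left the subtree: stop, drop the rest
            results.append((oid, value))
        start = batch[-1][0]  # continue after the last OID seen


def make_fake_agent(table):
    """Hypothetical in-memory agent used only to exercise bulk_walk."""
    rows = sorted(table.items(), key=lambda kv: oid_to_tuple(kv[0]))

    def get_bulk(start_oid, n):
        s = oid_to_tuple(start_oid)
        return [kv for kv in rows if oid_to_tuple(kv[0]) > s][:n]

    return get_bulk


# Placeholder OIDs, not the ones from the report.
agent = make_fake_agent({
    "1.3.6.1.4.1.99999.1.1": "a",
    "1.3.6.1.4.1.99999.1.2": "b",
    "1.3.6.1.4.1.99999.1.3": "c",
    "1.3.6.1.4.1.99999.2.1": "x",
    "1.3.6.1.4.1.99999.2.2": "y",
})

raw = agent("1.3.6.1.4.1.99999.1", 100)             # GETBULK-like: spills over
walked = bulk_walk(agent, "1.3.6.1.4.1.99999.1")    # BULKWALK-like: subtree only
```

Here `raw` contains rows from both subtrees, while `walked` contains only the rows under the requested prefix, which is the behaviour the cache check relies on.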
In my case I have the following problem if bulkCmd == GETBULK. I have cases where a call for a prefix 1.3.6.1.
Sorry for the long explanation; it was a subtle issue that consumed a lot of my time to understand. I am new to ceilometer and also to pysnmp.
I hope you find this useful!
Victor
While this is a very interesting and detailed report, Ceilometer works correctly with 4.2.5, so the requirements are not wrong. There's nothing to change here.
It's pretty obvious that, yes, installing the latest version of any dependency might fix bugs. Updating the requirements each time a lib releases a new version that fixes some of its bugs is not the job of Ceilometer developers. The requirements list compatibility, not which versions have bugs.
I hope I made things clearer!