Comment 3 for bug 1918340

Henrique Marques (hmdmarques) wrote :

Thank you for your time analysing these issues.

My intention is solely to pass information to the OpenStack community so that the tests can be improved and the test suite made more effective (my apologies if any information was supplied in the wrong manner). The end goal is a test suite that is more capable of catching probable future bugs.

The fault injection on compute/api.py was performed on stable/ussuri because it was the most recent released version when we started.
I cannot repeat the process on the master branch in a timely manner, because the faults we injected (defined from closed and resolved bug reports) lead to too many faulty versions to test (11309, to be specific). With the setup I have available, testing all faulty versions takes over 200 days, and testing only the faulty versions that pass undetected through the tests takes nearly 50 days.

With this said, I must emphasize that I am reporting only part (72 cases) of what I found during the experiments: cases that mostly require trivial changes to the test cases but would make the unit tests more effective. In total we found 290 probable bugs that are not detected by any of the unit, functional, or integration tests (note that these are probable bugs, representative of what OpenStack has already experienced and fixed in the past).

Fixing these issues would improve test coverage and the overall effectiveness of the test suite.

I will highlight just some of the most relevant bugs detected:
-Removing @check_instance_lock allows operations to be executed on instances that are locked
-Changing condition expressions results in operations being performed when they should not be (e.g. cache reset)
-Removing exception handling results in unexpected behaviour
-Many other fault types result in incorrect values being passed to the called functions. These go unnoticed because the mock functions in the tests do not validate the received parameters and simply return a fixed expected value, which masks any fault that occurs before that function call.
None of the problems described above are detected by the test suite.
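
To make the first and last points concrete, here is a minimal sketch. It is not Nova code: resize, reboot, check_lock, InstanceLocked and the rpc object are hypothetical names chosen for illustration, and only unittest.mock from the standard library is assumed. It shows a test that checks only the mocked return value passing on a faulty version, while a test that validates the parameters received by the mock (or exercises the lock check) detects it.

```python
import functools
from unittest import mock


class InstanceLocked(Exception):
    """Hypothetical error raised when a locked instance is operated on."""


def check_lock(func):
    """Hypothetical stand-in for a lock-checking decorator."""
    @functools.wraps(func)
    def wrapper(instance, *args, **kwargs):
        if instance.get("locked"):
            raise InstanceLocked(instance["uuid"])
        return func(instance, *args, **kwargs)
    return wrapper


@check_lock
def reboot(instance):
    return "rebooted"


def reboot_faulty(instance):
    # Injected fault: the lock-checking decorator was removed.
    return "rebooted"


def resize(rpc, instance_id, flavor):
    # Correct version: forwards both arguments unchanged.
    return rpc.resize_instance(instance_id, flavor)


def resize_faulty(rpc, instance_id, flavor):
    # Injected fault: the flavor argument is replaced by None.
    return rpc.resize_instance(instance_id, None)


def weak_test(func):
    """Checks only the return value, which the mock fixes in advance."""
    rpc = mock.Mock()
    rpc.resize_instance.return_value = "ok"
    return func(rpc, "uuid-1", "m1.large") == "ok"


def strict_test(func):
    """Also validates the parameters the mock actually received."""
    rpc = mock.Mock()
    rpc.resize_instance.return_value = "ok"
    result = func(rpc, "uuid-1", "m1.large")
    try:
        rpc.resize_instance.assert_called_once_with("uuid-1", "m1.large")
    except AssertionError:
        return False
    return result == "ok"


def lock_test(func):
    """Passes only if operating on a locked instance is rejected."""
    try:
        func({"uuid": "uuid-1", "locked": True})
    except InstanceLocked:
        return True
    return False
```

In this sketch weak_test accepts both resize and resize_faulty, whereas strict_test and lock_test accept only the correct versions. The change to the test is small (one assert_called_once_with call, or one test case with a locked instance), which is the kind of trivial change referred to above.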

At the moment I am unable to propose a fix due to time constraints (I am working full-time while doing an MSc), but I would like to report these issues so that they benefit the community.

Regards
Henrique