Fault Injection #1 - improve unit test effectiveness

Bug #1918340 reported by Henrique Marques
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid  Undecided   Unassigned

Bug Description

Description
===========
I performed fault injection on OpenStack Nova by changing the code of compute/api.py (inserting a representative, probable bug), then ran the unit, functional and integration tests and discovered that some of the injected bugs were not detected by the test suite.
The fault type reported here, WIDS (Wrong String in Initial Data), is one where the string used to initialize a variable is set to an incorrect value.

Steps to reproduce
==================

Line of code:   102
Original code:  AGGREGATE_ACTION_UPDATE_META = 'UpdateMeta'
Incorrect code: AGGREGATE_ACTION_UPDATE_META = 'NHZWTCGB'

Change line 102 of compute/api.py from the original code to the incorrect code, then run the unit tests.
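The report does not say which tool applied the mutation. As a minimal sketch, assuming Python 3.9+ (for `ast.unparse`), a WIDS fault like the one above could be injected programmatically with the standard-library `ast` module. `WidsInjector` and `inject_wids` are hypothetical names, and because `ast.unparse` drops comments and may reformat the rest of the file, a real harness would more likely patch only the affected line.

```python
import ast


class WidsInjector(ast.NodeTransformer):
    """Replace the string assigned to one named variable (a WIDS fault)."""

    def __init__(self, target_name, wrong_value):
        self.target_name = target_name
        self.wrong_value = wrong_value
        self.hits = 0

    def visit_Assign(self, node):
        # Only touch simple `NAME = '<str>'` assignments to the target name.
        if (len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id == self.target_name
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            node.value = ast.copy_location(ast.Constant(self.wrong_value), node.value)
            self.hits += 1
        return node


def inject_wids(source, target_name, wrong_value):
    """Return `source` with the target's initial string replaced (hypothetical helper)."""
    tree = ast.parse(source)
    injector = WidsInjector(target_name, wrong_value)
    tree = ast.fix_missing_locations(injector.visit(tree))
    # Refuse ambiguous or missing targets so each faulty version has exactly one fault.
    if injector.hits != 1:
        raise ValueError(
            f"expected exactly one assignment to {target_name}, found {injector.hits}")
    return ast.unparse(tree)


if __name__ == "__main__":
    original = "AGGREGATE_ACTION_UPDATE_META = 'UpdateMeta'\n"
    print(inject_wids(original, "AGGREGATE_ACTION_UPDATE_META", "NHZWTCGB"))
    # prints: AGGREGATE_ACTION_UPDATE_META = 'NHZWTCGB'
```

The mutated source would then be written back in place of the original file before running the unit tests, one faulty version at a time.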

Expected result
===============
The unit tests should detect the fault.

Actual result
===============
The fault was not detected by the unit tests.

Environment
===========
The code tested is on the stable/ussuri branch.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

@Henrique: Thanks for the reports. Do you intend to propose fixes for these? I'm asking because it is hard to judge whether these are real faults in the system causing user-visible problems or just missing test coverage.

If you are planning to propose fixes, then I suggest opening only those bugs for which you are also ready to propose the fix. I also suggest doing the fault reproduction on master, as bugs, if they exist, need to be fixed on master first and then backported to stable branches.

If you are not planning to propose fixes, then please open only those bugs that cause user-facing faults.

I will mark this bug Incomplete until you answer my question above. Please set it back to New once you have answered.
I will also mark the rest of the fault injection bugs as Invalid until we clarify these questions.

Changed in nova:
status: New → Incomplete
Revision history for this message
sean mooney (sean-k-mooney) wrote :

Hi, when mass-opening bugs like this, it's generally considered polite to reach out to the project in question beforehand.

Nova does not typically consider any bug caused by fault injection to be valid.
If you cannot recreate the same condition with the existing code using the public API, then it's not a valid bug.

Setting this to Incomplete.
Please go through all the other fault injection bugs, check whether the same fault can be created without modifying code, and close those where it cannot, before opening any other fault injection bugs.

A better way to approach hardening the tests would be to compile a list of gaps in an etherpad,
then discuss it on the mailing list and likely group it into a single bug with multiple patches to resolve the errors.

Mass-filing bugs like this is strongly discouraged, as it makes tracking the real issues users are facing much, much harder.

Feel free to reach out to us on IRC in #openstack-nova or on the openstack-discuss mailing list to talk about this more and about what your goals are.
regards
sean

Revision history for this message
Henrique Marques (hmdmarques) wrote :

Thank you for your time analysing these issues.

My intention is solely to pass information to the OpenStack community so that the tests can be improved and the test suite made more effective (my apologies if any information was supplied in the wrong manner). In the end, the goal is a test suite that is more capable of capturing future (probable) bugs.

The fault injection on compute/api.py was done on stable/ussuri because, when we started, it was the most recent released version.
I cannot repeat the process on the master branch in a timely manner, because the faults we injected (defined based on closed and resolved bug reports) lead to too many faulty versions to test (11,309, to be specific). With the setup I have available, testing all faulty versions takes over 200 days, and testing only the faulty versions that pass through the tests undetected takes nearly 50 days.
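For a sense of scale, those figures imply roughly 25 minutes of test time per faulty version on that setup. This is a back-of-the-envelope estimate that assumes the 200 days are spread evenly across all 11,309 versions; since the total is "over 200 days", it is a lower bound.

```latex
\frac{200\ \text{days} \times 1440\ \text{min/day}}{11309\ \text{versions}}
  = \frac{288000}{11309}\ \text{min/version}
  \approx 25.5\ \text{min per version}
```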

That said, I must emphasize that I am reporting only part (72 cases) of what I found during the experiments; these mostly require trivial changes to the test cases but would make the unit tests more effective. In total, we found 290 probable bugs that are not detected by any of the unit, functional or integration tests (note that these are probable bugs, representative of what OpenStack has already experienced and fixed in the past).

Fixing these issues would improve the test coverage and the overall effectiveness of the test suite.

Here are some of the most relevant undetected bugs:
- Removing @check_instance_lock allows operations to be executed on instances that are locked.
- Changing condition expressions results in operations being performed when they should not be (e.g., a cache reset).
- Removing exception handling results in unexpected behaviour.
- Many other fault types result in incorrect values being returned by the called functions. This goes unnoticed because the mock functions in the tests do not validate the parameters they receive and simply return a fixed expected value, which masks any fault up to that function call.
None of the problems described above is detected by the test suite.
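To illustrate the first and last points, here is a minimal sketch of unit tests that would catch both faults. Everything in it is a hypothetical stand-in, not Nova's code: `ComputeAPI`, `reboot`, the `rpcapi` object, and the simplified `check_instance_lock` decorator and `InstanceIsLocked` exception only mimic the shape of the real ones. The point is that the tests assert on what the mock received (or that it was never called), not just on the value it returned.

```python
import functools
import unittest
from unittest import mock


class InstanceIsLocked(Exception):
    """Hypothetical stand-in for the exception raised on locked instances."""


def check_instance_lock(func):
    # Simplified stand-in guard: refuse to act on a locked instance.
    @functools.wraps(func)
    def wrapper(self, context, instance, *args, **kwargs):
        if instance['locked']:
            raise InstanceIsLocked(instance['uuid'])
        return func(self, context, instance, *args, **kwargs)
    return wrapper


class ComputeAPI:
    """Hypothetical API class that forwards work to an RPC client."""

    def __init__(self, rpcapi):
        self.rpcapi = rpcapi

    @check_instance_lock
    def reboot(self, context, instance, reboot_type):
        return self.rpcapi.reboot_instance(
            context, instance=instance, reboot_type=reboot_type)


class RebootTest(unittest.TestCase):
    def setUp(self):
        self.rpcapi = mock.Mock()
        self.rpcapi.reboot_instance.return_value = 'ok'
        self.api = ComputeAPI(self.rpcapi)

    def test_locked_instance_is_rejected(self):
        instance = {'uuid': 'fake-uuid', 'locked': True}
        # Fails if the decorator is removed: no exception would be raised.
        with self.assertRaises(InstanceIsLocked):
            self.api.reboot('ctxt', instance, 'SOFT')
        self.rpcapi.reboot_instance.assert_not_called()

    def test_unlocked_instance_forwards_exact_arguments(self):
        instance = {'uuid': 'fake-uuid', 'locked': False}
        # Checking only the fixed return value would miss a wrong argument...
        self.assertEqual('ok', self.api.reboot('ctxt', instance, 'SOFT'))
        # ...so also pin exactly what the mock received.
        self.rpcapi.reboot_instance.assert_called_once_with(
            'ctxt', instance=instance, reboot_type='SOFT')
```

With these tests, removing the decorator makes `test_locked_instance_is_rejected` fail, and a fault that passes a different `reboot_type` to the RPC call makes the second test fail even though the mock still returns `'ok'`.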

At the moment I am unable to propose a fix due to time constraints (I work full-time while doing an MSc), but I would like to report these issues so that the community can benefit from them.

Regards
Henrique

Changed in nova:
status: Incomplete → New
Revision history for this message
Lee Yarwood (lyarwood) wrote :

Apologies if I'm being overly simplistic here, but how is changing a _constant_ fault injection?

The value of the constant isn't something we assert, just the behaviour it causes when used.

Revision history for this message
Henrique Marques (hmdmarques) wrote :

Dear All,

Whether or not there is interest in covering fault injection cases, I must emphasize that the sole purpose of this message is to improve the current battery of tests, which clearly do not cover some situations. This is independent of the usefulness of the fault injection process itself (a technique applied to systems where reliability is of utmost concern), which usually involves different kinds of faults representing typical developer mistakes (e.g., setting a constant to an incorrect value is a well-known case, but just one example among many others, such as calling functions with wrong parameters or having extraneous code in an if statement). Depending on the goals, faults that are not directly related to programmers' actions are sometimes injected as well, e.g., bit flips in memory.

Anyway, I think the point is not to discuss the merits of fault injection. Let me try to summarise:
- If you are fine with the current tests not covering certain cases, they can be kept as they are.
- If you are interested in improving the current tests, they should preferably be augmented in the direction I am pointing out (the options are obviously immense, but the ones identified are based on the analysis of previous mistakes made by developers and reported on Launchpad).

Best regards,
Henrique

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Fixing unit tests or tech-debt concerns doesn't really require bug reports. That's also why we have Gerrit: for discussing whether a debt fix is good or not.
So, instead of discussing here what to do, please upload a new change fixing what you want and ask us on #openstack-nova to review it; we will.

Changed in nova:
status: New → Invalid