LLVM: Interactive mode segfaults after runtime error

Bug #781615 reported by Matt Giuca
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Mars | Fix Released | Critical | Matt Giuca |

Bug Description

Interactive mode segfaults (not every time, but reproducibly) after a Mars runtime error. After some fairly extensive analysis, I am becoming convinced that no "memory corruption" is happening; there must be some weird interaction between Mercury and setjmp/longjmp.

For example, the following sequence segfaults:
?> error("x")
?> 1

Note that the segfault occurs on the call to show (at the instruction level, it is inside the actual compiled show:Int function). It is sometimes possible to run a few more statements first, but eventually it segfaults. The longjmp puts it in a bad state.

This hypothesis could be tested by writing Mercury wrappers around setjmp/longjmp, although such a test might not manage to trigger the bug.
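The recovery mechanism under suspicion can be sketched in plain C. This is a hypothetical model, and none of the names come from the Mars runtime: the REPL calls setjmp before running a statement, and the error routine longjmps back to that point. In pure C this is well-defined as long as the function that called setjmp has not yet returned. What the sketch does not model is any state the Mercury runtime keeps in the frames that longjmp skips over, which is where the suspected interaction would lie.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of setjmp/longjmp error recovery in an interactive loop;
   none of these names come from the Mars runtime. */
static jmp_buf repl_recovery_point;   /* saved by repl_run before each statement */
static int repl_active = 0;           /* nonzero while a statement is running */

/* Stand-in for the runtime's error routine. Under the REPL it jumps straight
   back into repl_run, discarding every stack frame in between. */
static void throw_error_sketch(const char *message)
{
    fprintf(stderr, "Runtime error: %s\n", message);
    if (!repl_active)
        exit(1);
    longjmp(repl_recovery_point, 1);
}

/* Stand-ins for compiled statements such as error("x") and 1. */
static int failing_statement(void)
{
    throw_error_sketch("x");
    return 0;   /* never reached */
}

static int answer_statement(void)
{
    return 42;
}

/* Runs one statement. Returns 1 and stores its value on success, or 0 if it
   raised an error. repl_run must still be executing when longjmp is called,
   which holds here because the statement runs inside it. */
static int repl_run(int (*statement)(void), int *result)
{
    repl_active = 1;
    if (setjmp(repl_recovery_point) != 0) {
        /* Reached via longjmp: the statement was aborted part-way. */
        repl_active = 0;
        return 0;
    }
    *result = statement();
    repl_active = 0;
    return 1;
}
```

Calling repl_run(failing_statement, &r) prints the error and returns 0 without touching r; a following repl_run(answer_statement, &r) then succeeds normally.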

It might be prudent not to use setjmp/longjmp at all. Instead, do one of the following:
- Have @mars.throw_mars_error set a global flag, and have every function call check that flag when in interactive mode (this will generate very annoying LLVM code), or
- Use Mercury exceptions. Ensure that llvm.run_function is not declared will_not_call_mercury (and why does it promise pure? backend_llvm.exec_instr should promise pure, not run_function). Keep a global function pointer in the runtime which is called when an error occurs (again, only in interactive mode). Before executing the LLVM code, have Mercury store in it a pointer to a Mercury function that throws a Mercury exception.
- Try our luck with setcontext instead. Together with getcontext and makecontext, this more advanced relative of setjmp/longjmp lets you run code on a separate stack, so perhaps, unlike setjmp/longjmp, it won't mess with Mercury's stack.
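The first alternative (a global error flag) can be sketched in C. Everything here is hypothetical: mars_error_sketch stands in for @mars.throw_mars_error, and compiled_g and compiled_f stand in for compiled Mars functions, with the flag check written out by hand after each call as the code generator would have to emit it. Because no longjmp is involved, the stack unwinds only through ordinary returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of the "global flag" alternative; not Mars's actual code. */
static int mars_interactive = 1;     /* nonzero when running under the REPL */
static int mars_error_pending = 0;   /* set by the error routine instead of longjmp */

/* Stand-in for @mars.throw_mars_error. Outside interactive mode it simply
   exits; inside it records the error and returns normally. */
static int mars_error_sketch(const char *message)
{
    fprintf(stderr, "Runtime error: %s\n", message);
    if (!mars_interactive)
        exit(1);
    mars_error_pending = 1;
    return 0;   /* dummy result; callers must not use it */
}

/* Stand-ins for compiled Mars functions. */
static int compiled_g(int x)
{
    if (x < 0)
        return mars_error_sketch("negative argument");
    return x * 2;
}

static int compiled_f(int x)
{
    int t = compiled_g(x);
    /* Check emitted after every call when compiling for interactive mode. */
    if (mars_error_pending)
        return 0;   /* abandon the rest of compiled_f and propagate the error */
    return t + 1;
}

/* REPL side: run one statement, then inspect and clear the flag.
   Returns 1 and stores the result on success, 0 if an error occurred. */
static int repl_run_flag(int (*statement)(int), int arg, int *result)
{
    mars_error_pending = 0;
    int value = statement(arg);
    if (mars_error_pending) {
        mars_error_pending = 0;
        return 0;
    }
    *result = value;
    return 1;
}
```

The cost is visible in compiled_f: every call site needs its own check-and-return, which is the "very annoying LLVM code" the first option warns about.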

Matt Giuca (mgiuca) wrote:

There actually is a bug causing incorrect memory to be read. When a Mars statement fails in interactive mode, the environment (including the subscript map) is not updated, so the same variable names are reused by subsequent statements. The code in backend_llvm.store_global (which writes each local variable to an LLVM global) is then called with the name of an existing global, and LLVM automatically gives the new global a fresh, unique name. However, the corresponding load_global loads from the old name, and thus gets the value previously stored in the variable of the same name, usually with a different type.
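The name mismatch can be reproduced with a toy symbol table in C. This is a hypothetical model, not LLVM's or the Mars backend's code: store_global_buggy and load_global only mimic the behaviour just described, the suffix scheme is made up, and the stored values are plain integers rather than Mars data.

```c
#include <stdio.h>
#include <string.h>

/* Toy model (hypothetical): a fixed-size table standing in for the LLVM
   module's globals. As in LLVM, adding a global under a name that is already
   taken does not fail; the new global silently gets a fresh name. The suffix
   scheme used here ("v" -> "v1", "v2", ...) is made up for illustration. */
#define MAX_GLOBALS 16
#define NAME_LEN 32

struct global {
    char name[NAME_LEN];
    long value;
};

static struct global globals[MAX_GLOBALS];
static int n_globals = 0;

static struct global *find_global(const char *name)
{
    for (int i = 0; i < n_globals; i++)
        if (strcmp(globals[i].name, name) == 0)
            return &globals[i];
    return NULL;
}

/* Buggy behaviour: always adds a new global, uniquifying the name on a clash.
   Returns 0 if the table is full or the name is too long. */
static int store_global_buggy(const char *name, long value)
{
    char unique[NAME_LEN];
    if (n_globals == MAX_GLOBALS || strlen(name) + 4 >= NAME_LEN)
        return 0;
    strcpy(unique, name);
    for (int suffix = 1; find_global(unique) != NULL; suffix++)
        snprintf(unique, sizeof unique, "%s%d", name, suffix);
    strcpy(globals[n_globals].name, unique);
    globals[n_globals].value = value;
    n_globals++;
    return 1;
}

/* load_global looks up the *requested* name, so after a clash it finds the
   old global rather than the one just stored. Returns 0 if absent. */
static long load_global(const char *name)
{
    struct global *g = find_global(name);
    return g != NULL ? g->value : 0;
}
```

Storing 'X' and then 42 under "v" leaves the 42 in a global named "v1", while load_global("v") still returns 'X': the stale-read pattern in the example that follows.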

This explains some very weird behaviour. For example, if you throw an error whose message is exactly 1 character long:
?> a = error("X")
Runtime error: X
and then try to show an integer:
?> 42
Rather than segfaulting, it will print out the character in the error message:
X

This can easily be fixed: just make store_global delete any existing global with the same name before creating the new one.
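The fix can be sketched in C with a toy symbol table standing in for the module's globals. This is hypothetical and not the code committed in r1215: store_global_fixed deletes any existing entry with the requested name before adding the new one, so the name is never uniquified and a later load by that name sees the new value.

```c
#include <string.h>

/* Toy model (hypothetical): a fixed-size table standing in for the module's globals. */
#define MAX_GLOBALS 16
#define NAME_LEN 32

struct global {
    char name[NAME_LEN];
    long value;
};

static struct global globals[MAX_GLOBALS];
static int n_globals = 0;

static int find_index(const char *name)
{
    for (int i = 0; i < n_globals; i++)
        if (strcmp(globals[i].name, name) == 0)
            return i;
    return -1;
}

/* Fixed behaviour: delete any existing global with this name first, so the
   new global is created under exactly the requested name.
   Returns 0 if the table is full or the name is too long. */
static int store_global_fixed(const char *name, long value)
{
    int old = find_index(name);
    if (old >= 0) {
        /* Delete by moving the last entry into the freed slot. */
        n_globals--;
        globals[old] = globals[n_globals];
    }
    if (n_globals == MAX_GLOBALS || strlen(name) >= NAME_LEN)
        return 0;
    strcpy(globals[n_globals].name, name);
    globals[n_globals].value = value;
    n_globals++;
    return 1;
}

/* Looks up the requested name; returns 0 if absent. */
static long load_global(const char *name)
{
    int i = find_index(name);
    return i >= 0 ? globals[i].value : 0;
}
```

With this version, storing 'X' and then 42 under "v" leaves a single global "v" holding 42, so a reused name can no longer resurrect a stale value.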

Matt Giuca (mgiuca) wrote:

Fixed in llvm-backend r1215.

Changed in mars:
status: Triaged → Fix Committed
Matt Giuca (mgiuca)
Changed in mars:
status: Fix Committed → Fix Released