windows build sometimes fails with "[performing final GC..." "should not get access violation in dynamic space"

Bug #2097197 reported by 3b
This bug affects 1 person
Affects: SBCL
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

With a recent SBCL host on win32 (tested on 2.5.1 through roughly
2.5.1.6; probably anything since 2.4.7 or so), builds with `--fancy`
(and probably saving cores in general) sometimes fail in the host with

"[performing final GC...fatal error encountered in SBCL pid
1022033084: should not get access violation in dynamic space"

Some local changes happened to make it repeatable enough to debug. It
seems to be segfaulting when `zero_all_free_ranges` tries to clear the
page at `next_free_page`, which it does because `save_to_filehandle`
wants to write an even number of GC pages on Windows. If the final GC
happened to return pages to the OS, that page won't be committed and we
get a fault.

A simple fix, making sure the region to be zeroed is committed first,
seems to resolve the problem, but it would probably be better to
clean up / refactor the various places that currently handle
expanding, zeroing and committing the space to be written.
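
The simple fix described above can be sketched roughly as follows. This
is only an illustration, not the patch that was committed:
`commit_then_zero` is a hypothetical helper name, and `PAGE_READWRITE` is
an assumed protection that the real runtime may not want for dynamic
space. On non-Windows builds the sketch assumes the range is already
accessible and only zeroes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, not an SBCL function: make sure [addr, addr+len)
 * is backed by committed memory, then zero it.
 * Returns 0 on success, -1 if the commit failed. */
int commit_then_zero(void *addr, size_t len)
{
    assert(addr != NULL);
    if (len == 0)
        return 0;
#ifdef _WIN32
    /* Committing pages that are already committed is not an error.
     * PAGE_READWRITE is an assumption made for this sketch. */
    if (VirtualAlloc(addr, len, MEM_COMMIT, PAGE_READWRITE) == NULL)
        return -1;
#endif
    /* On other platforms the range is assumed to be accessible already. */
    memset(addr, 0, len);
    return 0;
}
```

The point of the ordering is that the `memset` never touches a page the
GC may have decommitted.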

On win32, `write_bytes` commits the range it is going to write, with a
comment that "I can't see how we'd ever attempt writing from
uncommitted memory". Presumably the case above, where GC returned the
page to the OS, was the cause there before `zero_all_free_ranges`
started expanding the range as well. This would fail if some other
platform started using a different backend page size, and it shouldn't
be needed if the pages were already properly zeroed and committed.

`output_space` is where it actually expands the requested write size
to a multiple of backend page size, with a comment that it is a bad
idea, suggesting security problems. The commit (c771265) that started
clearing the extra page suggests it also caused bugs.
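
For reference, the kind of rounding being discussed looks something like
the sketch below. `round_up_to_page` is a hypothetical name, and the
power-of-two restriction is an assumption of this sketch rather than
something taken from the SBCL source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of
 * `page_size`, which must be a nonzero power of two. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With an illustrative page size of 32768, a request for 40000 bytes grows
to 65536, so 25536 bytes past the requested end get read and written.
Those are the bytes that need to be both committed and zeroed.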

`save_to_filehandle` doesn't ensure any extra space that will be
written will be zeroed (or committed) for any of the other regions it
passes to `output_space`, leaving room for future bugs there as well.
Not sure how many of those might have uncommitted pages, though, or if
it is even safe to zero extra space beyond the requested size.
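
One possible way to sidestep that question, shown purely as a sketch and
not as what the SBCL runtime does, would be to write exactly the
requested bytes and then pad the file from a separate zero buffer, so
nothing past the requested size is ever read or modified.
`write_padded` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical alternative: write exactly `len` bytes from `data`, then
 * pad the output with zeros up to a multiple of `page_size` (a nonzero
 * power of two). No byte past data+len is read.
 * Returns 0 on success, -1 on I/O error. */
int write_padded(FILE *out, const void *data, size_t len, size_t page_size)
{
    static const unsigned char zeros[4096];   /* zero-initialized */
    size_t total, pad;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    total = (len + page_size - 1) & ~(page_size - 1);
    pad = total - len;

    if (len != 0 && fwrite(data, 1, len, out) != len)
        return -1;
    while (pad != 0) {
        /* Emit the padding in chunks no larger than the zero buffer. */
        size_t chunk = pad < sizeof zeros ? pad : sizeof zeros;
        if (fwrite(zeros, 1, chunk, out) != chunk)
            return -1;
        pad -= chunk;
    }
    return 0;
}
```

This trades one extra write call per region for not having to care
whether the tail of the region is committed or safe to zero.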

https://github.com/3b/sbcl/tree/dbghelp-2.5.1.6 has the local patch that
triggers the problem for me, but since it depends on the size of the heap
it probably also depends on things like the lengths of paths ending up in
debug info, other random environment info that ends up stored somewhere,
etc.

Tags: os-windows
Stas Boukarev (stassats)
Changed in sbcl:
status: New → Fix Committed