This is an archive of the discontinued LLVM Phabricator instance.

[scudo][standalone] Correct releaseToOS behavior
ClosedPublic

Authored by cryptoad on Oct 4 2019, 10:10 AM.

Details

Summary

There was an issue in releaseToOSMaybe: one of the criteria to
decide if we should proceed with the release was wrong. Namely:

const uptr N = Sci->Stats.PoppedBlocks - Sci->Stats.PushedBlocks;
if (N * BlockSize < PageSize)
  return; // No chance to release anything.

I meant to check if the amount of bytes in the free list was lower
than a page, but this actually checks if the amount of in use bytes
was lower than a page.

The correct code is:

const uptr BytesInFreeList =
  Region->AllocatedUser -
  (Region->Stats.PoppedBlocks - Region->Stats.PushedBlocks) * BlockSize;
if (BytesInFreeList < PageSize)
  return 0; // No chance to release anything.

Consequences of the bug:

  • if a class size has less than a page worth of in-use bytes (allocated or in a cache), reclaiming would not occur, whatever the amount of blocks in the free list; in real world scenarios this is unlikely to happen and be impactful;
  • if a class size had less than a page worth of free bytes (and enough in-use bytes, etc), then reclaiming would be attempted, with likely no result. This means the reclaiming was overzealous at times.

I didn't have a good way to test for this, so I changed the prototype
of the function to return the number of bytes released, allowing to
get the information needed. The test added fails with the initial
criteria.

Another issue is that ReleaseToOsInterval can actually be 0, meaning
we always try to release (side note: it's terrible for performances).
so change a > 0 check to >= 0.

Additionally, decrease the CanRelease threshold to PageSize / 32.
I still have to make that configurable but I will do it at another time.

Finally, rename some variables in printStats: I feel like "available"
was too ambiguous, so change it to "total".

Event Timeline

cryptoad created this revision.Oct 4 2019, 10:10 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 4 2019, 10:10 AM
Herald added subscribers: Restricted Project, delcypher. · View Herald Transcript
cryptoad updated this revision to Diff 223243.Oct 4 2019, 10:14 AM

Add a couple of EXPECT to the test.

cryptoad edited the summary of this revision. (Show Details)Oct 4 2019, 10:37 AM
cryptoad edited the summary of this revision. (Show Details)Oct 4 2019, 11:58 AM
morehouse added inline comments.Oct 4 2019, 3:27 PM
lib/scudo/standalone/tests/primary_test.cpp
195

Does this test the two cases mentioned in the description?

  • < 1 page in use, > 1 page in free list (should release)
  • < 1 page in free list (shouldn't release)
cryptoad marked an inline comment as done.Oct 4 2019, 3:44 PM
cryptoad added inline comments.
lib/scudo/standalone/tests/primary_test.cpp
195

Indeed, this tests the aforementioned first case, not the second one.
I don't think I have a way to test that from here. One of the indicators being LastReleaseAtNs being updated (without released bytes), and it's not accessible.
I'll see if I can toy with the prototype some more to bubble the information up the chain, unless you have an idea.

morehouse accepted this revision.Oct 4 2019, 4:26 PM
morehouse added inline comments.
lib/scudo/standalone/tests/primary_test.cpp
195

Is there a way to glean unmapped/released bytes from GlobalStats? If not, it seems like something worth adding to Scudo's telemetry.

I also see there's already a printStats method. Perhaps we could modify it to print into a buffer where we could extract the released metadata.

This revision is now accepted and ready to land.Oct 4 2019, 4:26 PM
cryptoad marked 3 inline comments as done.Oct 7 2019, 10:33 AM
cryptoad added inline comments.
lib/scudo/standalone/tests/primary_test.cpp
195

So the plan of record on my side:

  • Landing this as is, fixing the issue at hand takes precedence
  • I am going to have to revisit how to fit the release stats in the mix: the primary's releaseToOS doesn't grab a cache, so we can't fit the stats in there; We could put the total of released bytes in regions data, but that departs from the model of all the other stats being the caches. Maybe do that globally indeed, that will require more thought.
  • for the printStats suggestion: it is something I have to do as well, for mallocz purposes for example. Technically this could be leveraged through a lit test in it's current form I think, but right now I only have unit tests. So lit tests is also on my plate.
This revision was automatically updated to reflect the committed changes.
cryptoad marked an inline comment as done.