This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Update BlockRPONumber prior to use.
ClosedPublic

Authored by mattd on Dec 20 2018, 5:42 PM.

Details

Summary

The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below, to what we are trying to fix, by always ensuring a updated BlockRPONumber.

Backstory

I was investigating an ICE (segfault accessing a DenseMap item). This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).

After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in GVN::performScalarPRE(). The values in the map are unsigned integers, the keys are llvm::BasicBlock*. Our test case that triggered this warning and periodic crash is rather involved. But the problematic line looks to be:

GVN.cpp: Line 2197

if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&

To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.

My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above). In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.

  1. There is no sequence point for the >= operation.
  1. A call to a DenseMapBase::operator[] can have the side effect of the map reallocating a larger store (more Buckets, via a call to DenseMap::grow).
  1. It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to operator[] on the stack (that's what my host compile of GVN.cpp is doing) . A second call to operator[] might encourage the map to 'grow' thus making any pointers to the map's store invalid. The >= compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.

The assembly generated from the Window's host compiler does show the result of the first access to the map via operator[] produces a pointer to an unsigned int. And that pointer is being stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid.

Diff Detail

Repository
rL LLVM

Event Timeline

mattd created this revision.Dec 20 2018, 5:42 PM

If we're calling grow() on BlockRPONumber in this code, something has gone wrong. Both "P" and "CurrentBlock" should have an RPO number at this point in the code, so it shouldn't actually grow.

It looks like r225536 added a new caller to performScalarPRE(), and BlockRPONumber is not correctly initialized at that point. I'm surprised this hasn't led to any miscompiles.

mattd added a comment.Dec 21 2018, 9:18 AM

Thanks for the feedback @efriedma.

If we're calling grow() on BlockRPONumber in this code, something has gone wrong. Both "P" and "CurrentBlock" should have an RPO number at this point in the code, so it shouldn't actually grow.

In the body of GVN::performPRE() there is case where a block will split via a call to GVN::splitCriticalEdges. If the crit edge(s) are split, new basic blocks are introduced to the function but are not added to the BlockRPONumber map. In fact the test test/Transform/GVN/PRE/pre-basic-add.ll creates such a situation. This means that BlockRPONumber might be accessed for a basic block that has not been added to it.

It looks like r225536 added a new caller to performScalarPRE(), and BlockRPONumber is not correctly initialized at that point. I'm surprised this hasn't led to any miscompiles.

The message in r225536 does seem to suggest a periodic LTO related failure, and that is similar to what we were seeing.

If the crit edge(s) are split, new basic blocks are introduced to the function but are not added to the BlockRPONumber map.

That's a bug. The RPO numbers need to be up-to-date for this check to make sense.

mattd updated this revision to Diff 180168.Jan 3 2019, 4:18 PM
mattd retitled this revision from [GVN] Force a sequence point when comparing two DenseMap values. to [GVN] Initialize BlockRPONumber when prior to use..
mattd edited the summary of this revision. (Show Details)

This patch aims to clean-up two items in GVN.

1 - The DenseMap BlockRPONumber in GVN should be populated by the time
performScalarPRE is called. That map is initialized when GVN starts, and only
if the EnablePRE ("-enable-pre") flag is passed (defaults to true). It seems
that no new blocks should be introduced to the function while it is being
analyzed if EnablePRE is set. If so, the values in the aforementioned map
might no longer be correct.

It turns out that performPRE can introduce new blocks via splitCriticalEdges.
performPRE can be called multiple times (until a fixed-point is found). That
routine introduces new blocks into the function being analyzed. Thus the
BlockRPONumber map should be updated when new blocks are added, but the code
currently does not update that map.

The solution in the patch here lazily updates the map just prior to its use.

2 - The other item that I think needs to be corrected was introduced in a more
recent change. That change introduced a code path that calls performScalarPRE
without checking the EnablePRE ("-enable-pre") flag. EnablePRE is responsible
for building up the BlockRPONumber map. If that flag is set to false when the
GVN pass starts, then the BlockRPONumber map will be empty.

EnablePRE is defaulted to true so it's unlikely someone will ever set it false,
outside of testing/debugging. This is probably why we we never actually see
case #2 occur (no test exists which tests "-enable-pre=false"). However,
I do think it is reasonable to add an assertion and EnablePRE check.

At this point, I'm not convinced we need a test for #1 above, as it is NFCI,
although I have a hard time saying "NFCI," because the mechanics change, but the
intended semantics should remain. It does make sense to add a test for #2 setting
'-enable-pre=false' (which no tests do) and just checking that nothing asserts. I
found an existing test that does call performScalarPRE even with EnablePRE set
to false, and have added a RUN line to test the newly added assertion does not trigger.

mattd retitled this revision from [GVN] Initialize BlockRPONumber when prior to use. to [GVN] Update BlockRPONumber prior to use..Jan 3 2019, 4:18 PM

This generally makes sense, I think.

The assumption here that blocks won't get erased during a GVN iteration is a little fragile... but probably okay. Please make BlockRPONumber a DenseMap<AssertingVH<BasicBlock>, uint32_t> to make sure no future change erases a block at an unexpected point.

lib/Transforms/Scalar/GVN.cpp
1652 ↗(On Diff #180168)

This doesn't have the performance characteristics you want: F.size() is linear in the size of the function. Please use an explicit boolean to track invalidation.

mattd updated this revision to Diff 180721.Jan 8 2019, 1:07 PM
mattd marked an inline comment as done.
  • Add a bool to GVN that tracks the status (valid or invalid) of the BlockRPONumber map.
  • Use that flag to indicate that the BlockRPONumber is invalid when blocks are added to the function being analyzed.
  • Make the BlockRPONumber map: DenseMap<AssertingVH<BasicBlock>, uint32_t>

Thinking about it a bit more, I think I'd prefer to put the change to make EnablePRE control PRE for GEPs in a separate patch. That will make it clear what's happening, and it could affect performance for anyone using that setting.

It should be straightforward to find a testcase where the old code would have used incorrect RPO numbers, even if EnablePRE isn't overridden; just add an assertion like assert(BlockRPONumber.count(P) && BlockRPONumber.count(CurrentBlock); to GVN::performScalarPRE, and run that compiler over some code. Actually, maybe worth adding that assert anyway.

mattd updated this revision to Diff 180961.Jan 9 2019, 4:35 PM
  • Moved the logic for controlling access to performPRE/performScalarPRE functions into a separate patch.
  • Added an assertion that will trigger on 9 tests if the BlockRPONumber map is invalid. In particular, test/Transforms/GVN/nonescaping-malloc.ll, which also requires an assert build when testing.
efriedma accepted this revision.Jan 9 2019, 4:54 PM

LGTM.

In theory, it's probably possible to construct a test where the lack of BlockRPONumber actually affects the generated IR. But it's not worth spending the time on that... the assertions give enough coverage.

This revision is now accepted and ready to land.Jan 9 2019, 4:54 PM
This revision was automatically updated to reflect the committed changes.