This is an archive of the discontinued LLVM Phabricator instance.

Fix SourceManager::isBeforeInTranslationUnit bug with token-pasting
ClosedPublic

Authored by sammccall on Sep 26 2022, 4:59 PM.

Details

Summary

isBeforeInTranslationUnit compares SourceLocations across FileIDs by
mapping them onto a common ancestor file, following include/expansion edges.

It is possible to get a tie in the common ancestor, because multiple
"chunks" of a macro arg will expand to the same macro param token in the body:

#define ID(X) X
#define TWO 2
ID(1 TWO)

Here two FileIDs both expand into X in ID's expansion:

  • one containing 1 and spelled on line 3
  • one containing 2 and spelled by the macro expansion of TWO

isBeforeInTranslationUnit breaks this tie by comparing the two FileIDs:
the one "on the left" is always created first and is numerically smaller.
This seems correct so far.

Prior to this patch it also takes a shortcut (unclear if intentionally).
Instead of comparing the two FileIDs that directly expand to the same location,
it compares the original FileIDs being compared. These may not be the
same if there are multiple macro expansions in between.
This *almost* always yields the right answer, because macro expansion
yields "trees" of FileIDs allocated in a contiguous range: when comparing tree A
to tree B, it doesn't matter what representative you pick.

However, the splitting of >> tokens is modeled as macro expansion (as if
the first '>' was a macro that expands to a '>' spelled a scratch buffer).
This splitting occurs retroactively when parsing, so the FileID allocated is
larger than expected if it were a real macro expansion performed during lexing.
As a result, macro tree A can be on the left of tree B, and yet contain
a token-split FileID whose numeric value is *greator* than those in B.
In this case the tiebreak gives the wrong answer.

Concretely:

#define ID(X) X
template <typename> class S{};
ID(
  ID(S<S<int>> x);
  int y;
)

Given Greater = (typeloc of S<int>).getEndLoc();
      Y       = (decl of y).getLocation();
isBeforeInTranslationUnit(Greater, Y) should return true, but returns false.

Here the common FileID of (Greater, Y) is the body of the outer ID
expansion, and they both expand to X within it.
With the current tiebreak rules, we compare the FileID of Greater (a split)
to the FileID of Y (a macro arg expansion into X of the outer ID).
The former is larger because the token split occurred relatively late.

This patch fixes the issue by removing the shortcut. It tracks the immediate
FileIDs used to reach the common file, and uses these IDs to break ties.
In the example, we now compare the macro arg expansion of the inner ID()
to the macro arg expansion of Y, and find that it is smaller.

This requires some changes to the InBeforeInTUCacheEntry (sic).
We store a little more data so it's probably slightly slower.
It was difficult to resist more invasive changes:

  • performance: the sizing is very suspicious, and once the cache "fills up" we're thrashing a single entry
  • API: the class seems to be needlessly complicated

However I tried to avoid mixing these with subtle behavior changes, and
will send a followup instead.

Diff Detail

Event Timeline

sammccall created this revision.Sep 26 2022, 4:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2022, 4:59 PM
sammccall requested review of this revision.Sep 26 2022, 4:59 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 26 2022, 4:59 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

However I tried to avoid mixing these with subtle behavior changes, and will send a followup instead.

D134694 if you're interested. I guess I should try to get performance measures though...

hokein accepted this revision.Sep 29 2022, 12:53 AM

Thanks for digging into the rabbit role and the great analysis. It looks good from my side.

Re the test case ID(I TWO 3) we discussed yesterday, I verified that there is no issue. (we don't merge the 1 2 3 into a single SLOCEntry, each one have a dedicated FileID, and FileID(2) < FileID(3)).

However I tried to avoid mixing these with subtle behavior changes, and will send a followup instead.

D134694 if you're interested. I guess I should try to get performance measures though...

While doing the review, I agree that this part of code is unnecessary complicated, I think doing cleanup is probably a good idea.

clang/lib/Basic/SourceManager.cpp
2108–2111

nit: I found the first/second usage really hurts the code readability here (using the struct Entry int all places is probably better, but this requires some changes to the existing interface, no required action here).

2126

maybe add an assertion for "local and load FileIDs are never mixed".

This revision is now accepted and ready to land.Sep 29 2022, 12:53 AM
This revision was landed with ongoing or failed builds.Oct 5 2022, 9:29 AM
This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.