This is an archive of the discontinued LLVM Phabricator instance.

[IRSim] Finding Branch Similarity
ClosedPublic

Authored by AndrewLitteken on Jul 28 2021, 12:55 PM.

Details

Summary

The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.

This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.

We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.

Diff Detail

Event Timeline

AndrewLitteken requested review of this revision.Jul 28 2021, 12:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2021, 12:55 PM

Adding comments to the code, and adjusting tests.

yroux added a subscriber: yroux.Aug 25 2021, 2:42 AM

I found an issue with the IROutliner while testing this patch which I don't easily catch, when pruning icompatible regions there is an assert violation at llvm/include/llvm/ADT/ilist_iterator.h:138
The problem can be reproduce with this reduce testcase:

opt -S -iroutliner ~/test.ll -o -

test.ll:

@a = global i8* null

define void @foo() {
entry:
  br label %for.cond

for.cond:
  %c.0 = phi i32 [ undef, %entry ], [ %inc3, %for.end ]
  %d.0 = phi i32 [ undef, %entry ], [ %d.1, %for.end ]
  %conv = trunc i32 %c.0 to i8
  br label %for.cond1

for.cond1:
  %d.1 = phi i32 [ %d.0, %for.cond ], [ %inc, %for.body ]
  %cmp = icmp slt i32 %d.1, 0
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %0 = load i8*, i8** @a
  %arrayidx = getelementptr inbounds i8, i8* %0, i32 %d.1
  store i8 %conv, i8* %arrayidx
  %inc = add nsw i32 %d.1, 1
  br label %for.cond1

for.end:
  %inc3 = add nsw i32 %c.0, 1
  br label %for.cond
}
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp
754

Maybe use ASSERT_GT and ASSERT_LT here and in the rest of the testcase

Updating to fix bug pointed out by Yvan, and adding test to catch it.

paquette added inline comments.Aug 26 2021, 3:20 PM
llvm/include/llvm/Analysis/IRSimilarityIdentifier.h
130

Why int?

150

remove fluff

303
359
461

comment?

461

comment on this variable?

860

doxygen comment on this variable?

llvm/lib/Analysis/IRSimilarityIdentifier.cpp
26

Mention this is for debugging?

174

pull second if into this one?

175

pull if into the if above?

688

de morgan this and early return true to reduce indentation?

688

Cache ItA->Inst and ItB->Inst in variables?

also can you early return true or false using this condition? This if is pretty big.

703

this is a pretty dense statement. can you simplify this somehow? maybe it would be better as a helper function or something?

713

this could assign to AContained right?

bool AContained = BasicBloc A.find(...) != ...;
806

why is there a special case for 2? comment?

llvm/lib/Transforms/IPO/IROutliner.cpp
1392 ↗(On Diff #368736)

I think that you should check for the end of the list in a different way. This seems like a workaround for a different bug? Nowhere else in the codebase uses isKnownSentinel directly.

AndrewLitteken added inline comments.Aug 30 2021, 9:44 AM
llvm/include/llvm/Analysis/IRSimilarityIdentifier.h
130

I'll add a comment for this variable, but it's because we could be targeting blocks that are "in front of" the other blocks in the function, or behind, so it needs to be able to be positive or negative.

llvm/lib/Analysis/IRSimilarityIdentifier.cpp
806

This should probably get moved to the outliner side of things.

Updating based on comments

Missing context for the patch?

llvm/include/llvm/Analysis/IRSimilarityIdentifier.h
131

Can you add an example of "ahead of" and "behind" in the comments here?

145

typo: IRInstrucitonData

613

Can you add a comment explaining the role of this struct?

621

Should this be a doxygen comment?

651

I think an example would be useful here?

657

I think it would be good to document what a relative location is somewhere in a comment in this patch.

llvm/lib/Analysis/IRSimilarityIdentifier.cpp
187

De Morgan + return this value?

631

What does it mean if !AContained?

Maybe this function needs some comments?

713

run-on sentence is hard to read?

732

Merge this if in with the previous one?

741

use any_of and return !checkRelativeLocations?

llvm/lib/Transforms/IPO/IROutliner.cpp
1457 ↗(On Diff #369520)
  • Can you pull IDIt->Inst into variable?
  • simpler:
if (blah && blah != NextInst)
   return true;

Adding diff with context

AndrewLitteken added inline comments.Aug 30 2021, 3:36 PM
llvm/lib/Analysis/IRSimilarityIdentifier.cpp
187

Since it's possible this function will have more instruction types added to it, that may remain true, and for clarity I feel like it's better to keep it as is.

Updating based on comments.

paquette added inline comments.Aug 31 2021, 3:01 PM
llvm/include/llvm/Analysis/IRSimilarityIdentifier.h
141

This is a function, so I think it should be documented more like "Fills data structures for IRInstructionData..."

llvm/lib/Analysis/IRSimilarityIdentifier.cpp
59

Why would I ever be null?

llvm/lib/Transforms/IPO/IROutliner.cpp
1451 ↗(On Diff #369565)

This is duplicated with the code above. Can this be a helper?

AndrewLitteken added inline comments.Aug 31 2021, 3:03 PM
llvm/lib/Analysis/IRSimilarityIdentifier.cpp
59

I changed things up so that at the end of a function, there is a nullptr used for the instruction of the IRInstructionData so that the end of the list can be detected rather than having to use the isKnownSentinel code from before.

paquette added inline comments.Sep 2 2021, 2:21 PM
llvm/lib/Analysis/IRSimilarityIdentifier.cpp
692

ItA->RelativeBlockLocations and ItB->RelativeBlockLocations into variables? Then you can reuse them in the zip below.

697

can you use a typedef for this?

like ZippedRelativeLocationsT or something?

Updating based on comments

paquette added inline comments.Sep 3 2021, 10:21 AM
llvm/lib/Transforms/IPO/IROutliner.cpp
1361 ↗(On Diff #370424)

This name is kind of confusing considering the other Instruction is named NextInst?

1363 ↗(On Diff #370424)

Just use NextIDInst here?

1365 ↗(On Diff #370424)

Same here?

llvm/test/Transforms/IROutliner/region-end-of-module.ll
4 ↗(On Diff #370424)
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp
44

Add comment saying what the false is describing?

Identifier(/*blah = */false);
AndrewLitteken marked an inline comment as done.Sep 3 2021, 12:22 PM
AndrewLitteken added inline comments.
llvm/lib/Transforms/IPO/IROutliner.cpp
1363 ↗(On Diff #370424)

That would be using the next instruction in the list, this is checking what the current instruction that was passed in.

1365 ↗(On Diff #370424)

I'll change it for this one.

AndrewLitteken marked an inline comment as done.

Can you fix the pre-merge checks?

paquette added inline comments.Sep 3 2021, 3:56 PM
llvm/lib/Transforms/IPO/IROutliner.cpp
1361 ↗(On Diff #370647)

Can NextIDIt be the end/one past the end? Is this accessible in those cases?

llvm/test/Transforms/IROutliner/region-end-of-module.ll
3 ↗(On Diff #370647)

each of the cases here end in a br.

What about unreachable and ret?

AndrewLitteken added inline comments.Sep 3 2021, 4:04 PM
llvm/lib/Transforms/IPO/IROutliner.cpp
1361 ↗(On Diff #370647)

It shouldn’t be. This is looking from the start of the candidate to the end of the candidate, which should only ever include actual instructions. So it could be the end of the function, in which case there is a buffer IRInstructionData which is the nullptr, which is what we are checking here.

llvm/test/Transforms/IROutliner/region-end-of-module.ll
3 ↗(On Diff #370647)

Those are into checked for similarity right now, and are excluded from the candidates as a result.

But I can add a function that ends in both for potential future developments.

paquette accepted this revision.Sep 3 2021, 4:05 PM

LGTM

This revision is now accepted and ready to land.Sep 3 2021, 4:05 PM
This revision was landed with ongoing or failed builds.Sep 6 2021, 11:56 AM
This revision was automatically updated to reflect the committed changes.