This is an archive of the discontinued LLVM Phabricator instance.

[ScopBuilder]Skip getting leader when merging statements to close holes
ClosedPublic

Authored by bin.narwal on Aug 30 2019, 8:17 AM.

Details

Summary

Hi,

Function joinOrderedInstructions merges instructions when a leader is encountered twice.
It also notices that leaders in SeenLeaders may lose their leadership in previous merging,
and tries to handle the case using following code:

Instruction *PrevLeader = UnionFind.getLeaderValue(SeenLeaders.back());

However, this is wrong because it always gets leader for the last element of SeenLeaders,
and I believe it's wrong even we get leader for Prev here. As a result, Statements in cases
like the one in patch aren't merged as expected. After investigation, I believe it's
unnecessary to get leader instruction at all. This is based on fact: Although leaders in
SeenLeaders could lose leadership, they only lose to others in SeenLeaders, in other words,
one existing leader will be chosen as new leader of merged equivalent statements. We can
take advantage of this and simply check if current leader equals to Prev and break merging
if it does.
The patch also adds a new test. Any comments?

Thanks,

Diff Detail

Event Timeline

bin.narwal created this revision.Aug 30 2019, 8:17 AM

Thank you for the find. Indeed the name of PrevLeader suggests it to be the leader of Prev. It's leader should be identical to leader to the last element of SeenLeaders, but only before the first unionSet operation. Then the tail ("B") has been merged into Inst's leader "A", making the condition a tautology. That's definitely wrong!

However, I don't think we can save on the getLeaderValue call. Each of the instruction's leader may still be in SeenLeader, but the order in which it appears matter (it's a SetVector), and Inst's leader may change as well. The most explicit version I can think of is:

for (Instruction *Prev : reverse(SeenLeaders)) {
  // Items added to 'SeenLeaders' are leaders, but may have lost their
  // leadership status when merged into another statement.
  Instruction *PrevLeader = UnionFind.getLeaderValue(Prev);
  Instruction *InstLeader = UnionFind.getLeaderValue(Inst);
  if (PrevLeader == InstLeader)
    break;
  UnionFind.unionSets(Prev, Leader);
}

which also passes granularity_scalar-indep_ordered-2.ll. I don't think it's part of the API which of the leaders passed to UnionFind.unionSets becomes the new leader, so we have to check them both (unless you find an argument that one/both of the, are unnecessary).

The code above is the same as (which so far is my preferred version):

for (Instruction *Prev : reverse(SeenLeaders)) {
  if (UnionFind.isEquivalent(Prev, Inst))
    break;
  UnionFind.unionSets(Prev, Leader);
}

We need regressions tests with more (potential) statements to see the difference and check that not all of them are merged.

lib/Analysis/ScopBuilder.cpp
2044–2046

Your new comment refers to the code before your patch, which will not be seen in the checkout anymore.

Thank you for the find. Indeed the name of PrevLeader suggests it to be the leader of Prev. It's leader should be identical to leader to the last element of SeenLeaders, but only before the first unionSet operation. Then the tail ("B") has been merged into Inst's leader "A", making the condition a tautology. That's definitely wrong!

However, I don't think we can save on the getLeaderValue call. Each of the instruction's leader may still be in SeenLeader, but the order in which it appears matter (it's a SetVector), and Inst's leader may change as well. The most explicit version I can think of is:

Yes, the order is not guaranteed, and Inst's leader could change during merging, but IIUC, these do not matter.
For the check condition:

if (Prev != Leader)

"Leader" is computed when we try to insert "Inst" into SeenLeaders. Whether leader of the "Leader" changes doesn't have impact on the loop, as long as the first "Prev" in the merging group is actually "Leader". The cons is we may do unnecessary call to unionSets because we only break at the first Prev Inst of merging group.
So looks like we can either compare real leaders of the two Insts, or just compare the two Insts it-selves.

BTW, if SetVector supports truncate, we can truncate away Insts of merged group and push the new leader every time after the loop. This way we don't need to check leaders, don't need to do unnecessary unionSets, and always keep SeenLeaders minimal. But it's SetVector...
Thanks,

for (Instruction *Prev : reverse(SeenLeaders)) {
  // Items added to 'SeenLeaders' are leaders, but may have lost their
  // leadership status when merged into another statement.
  Instruction *PrevLeader = UnionFind.getLeaderValue(Prev);
  Instruction *InstLeader = UnionFind.getLeaderValue(Inst);
  if (PrevLeader == InstLeader)
    break;
  UnionFind.unionSets(Prev, Leader);
}

which also passes granularity_scalar-indep_ordered-2.ll. I don't think it's part of the API which of the leaders passed to UnionFind.unionSets becomes the new leader, so we have to check them both (unless you find an argument that one/both of the, are unnecessary).

The code above is the same as (which so far is my preferred version):

for (Instruction *Prev : reverse(SeenLeaders)) {
  if (UnionFind.isEquivalent(Prev, Inst))
    break;
  UnionFind.unionSets(Prev, Leader);
}

We need regressions tests with more (potential) statements to see the difference and check that not all of them are merged.

Ahh, I think I understand the argument. Thank you for your patience. But I strongly suggest to update the description. Suggestion:

// We are backtracking from the last element until we see Inst's leader in SeenLeaders and merge all into one set. Although which instructions are leaders changes during the execution of this loop, it is irrelevant as we are just searching for the element that we already confirmed is in the list.
 for (Instruction *Prev : reverse(SeenLeaders)) {

If SetVector.insert would have returned an iterator to the found element, we could also have forward iterated from that element to the end.

Maybe also add a comment to bool Inserted = SeenLeaders.insert(Leader) such as:

// Since previous iterations might have merged sets, some items in SeenLeaders are not leaders anymore. However, former leaders represent the same set as the actually leader also in SeenLeaders and by construction are all trailing the list.

Since apparently its even hard for myself (who wrote this algorithm) to re-understand what's going on.

Ahh, I think I understand the argument. Thank you for your patience. But I strongly suggest to update the description. Suggestion:

// We are backtracking from the last element until we see Inst's leader in SeenLeaders and merge all into one set. Although which instructions are leaders changes during the execution of this loop, it is irrelevant as we are just searching for the element that we already confirmed is in the list.
 for (Instruction *Prev : reverse(SeenLeaders)) {

If SetVector.insert would have returned an iterator to the found element, we could also have forward iterated from that element to the end.

Maybe also add a comment to bool Inserted = SeenLeaders.insert(Leader) such as:

// Since previous iterations might have merged sets, some items in SeenLeaders are not leaders anymore. However, former leaders represent the same set as the actually leader also in SeenLeaders and by construction are all trailing the list.

Since apparently its even hard for myself (who wrote this algorithm) to re-understand what's going on.

How about below comments? I made your suggested comment a bit easier for me to understand (my bad English!). Also removed the trailing words since previously merged groups could be not trailing in SeenLeaders.

Instruction *Leader = UnionFind.getLeaderValue(Inst);
// Since previous iterations might have merged sets, some items in
// SeenLeaders are not leaders anymore. However, The new leader of
// previously merged instructions must be one of the former leaders
// of these merged instructions.
bool Inserted = SeenLeaders.insert(Leader);
if (Inserted)
  continue;

// Merge statements to close holes. Say, we have already seen statements A
// and B, in this order. Then we see an instruction of A again and we would
// see the pattern "A B A". This function joins all statements until the
// only seen occurrence of A.
for (Instruction *Prev : reverse(SeenLeaders)) {
  // We are backtracking from the last element until we see Inst's leader
  // in SeenLeaders and merge all into one set. Although leaders of
  // instructions change during the execution of this loop, it's irrelevant
  // as we are just searching for the element that we already confirmed is
  // in the list.
  if (Prev == Leader)
    break;
  UnionFind.unionSets(Prev, Leader);
}

Does this look fine?

Thanks,

Also removed the trailing words since previously merged groups could be not trailing in SeenLeaders.

Right, they are just grouped.

Does this look fine?

Yes. Once you update the patch and give me the permission to, I can commit it.

This comment was removed by Meinersbur.

Sorry for delaying. Here is the updated patch as agreed. Also, please help me commit it.

Thanks,

This revision was not accepted when it landed; it landed in state Needs Review.Sep 12 2019, 6:02 PM
This revision was automatically updated to reflect the committed changes.