This is an archive of the discontinued LLVM Phabricator instance.

LoadCombine Load Aliasing Fix
AbandonedPublic

Authored by rriddle on Jun 18 2016, 3:12 PM.

Details

Summary

Removed the alias set tracker and added individual load aliasing checks for each individual chain.

Diff Detail

Event Timeline

rriddle updated this revision to Diff 61175.Jun 18 2016, 3:12 PM
rriddle retitled this revision from to LoadCombine BugFixes : Combine negative index GEPS and fix load aliasing.
rriddle updated this object.
rriddle added a reviewer: Bigcheese.
rriddle added a subscriber: llvm-commits.
majnemer requested changes to this revision.Jun 18 2016, 3:52 PM
majnemer added a reviewer: majnemer.
majnemer added a subscriber: majnemer.

Please add testcases to the tests directory. Also, please split out each independent change. Each change is easier to analyze in isolation.

This revision now requires changes to proceed.Jun 18 2016, 3:52 PM
rriddle updated this revision to Diff 61179.Jun 18 2016, 5:47 PM
rriddle retitled this revision from LoadCombine BugFixes : Combine negative index GEPS and fix load aliasing to LoadCombine Load Aliasing Fix.
rriddle updated this object.
rriddle edited edge metadata.
rriddle edited edge metadata.

Removed the other patch fixes and added a test to the test suite

eli.friedman added inline comments.
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13

What transform is this testcase checking for? I don't think you can prove any useful aliasing property between %i, %i1, and %str?

rriddle added inline comments.Jun 18 2016, 6:29 PM
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13

In the current implementation %1 is being aliased in that store operation which would cause every load chain to try and combine which would prevent the load at %0 from being able to combine with %2, %4, etc.

eli.friedman added inline comments.Jun 18 2016, 7:02 PM
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13

I'm not sure I follow... is the transform you're trying to perform something like this?

void f(int* a, int* b) {
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  // Combine to "memcpy(a, b, 16)"???
}
rriddle added inline comments.Jun 18 2016, 7:12 PM
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13

No, the point of transform is to combine each of the individual loads into a single large one. In this specific test, there are multiple GEP + load pairs coming from %i. Each of those will combine down to a single i128 load. It may be easier to see with this output

define i32 @Load_MultiChain(i32* %i) {
  %1 = getelementptr inbounds i32, i32* %i, i64 1
  %2 = bitcast i32* %i to i8*
  %3 = getelementptr i8, i8* %2, i64 0
  %4 = bitcast i8* %3 to i128*
  %.combined = load i128, i128* %4, align 4
  %combine.extract.shift = lshr i128 %.combined, 32
  %combine.extract.trunc1 = trunc i128 %combine.extract.shift to i32
  %5 = load i32, i32* %1, align 4
  %combine.extract.trunc = trunc i128 %.combined to i32
  %6 = load i32, i32* %i, align 4
  %7 = getelementptr inbounds i32, i32* %i, i64 2
  %combine.extract.shift2 = lshr i128 %.combined, 64
  %combine.extract.trunc3 = trunc i128 %combine.extract.shift2 to i32
  %8 = load i32, i32* %7, align 4
  %9 = getelementptr inbounds i32, i32* %i, i64 3
  %combine.extract.shift4 = lshr i128 %.combined, 96
  %combine.extract.trunc5 = trunc i128 %combine.extract.shift4 to i32
  %10 = load i32, i32* %9, align 4
  %11 = getelementptr inbounds i32, i32* %i, i64 4
  %12 = load i32, i32* %11, align 4
  %13 = getelementptr inbounds i32, i32* %i, i64 5
  %14 = load i32, i32* %13, align 4
  %15 = add nsw i32 %combine.extract.trunc, %14
  ret i32 %15
}

In the above every load that originated from i has been combined into a single i128 load. This test is not about transforming into the most optimal result it is testing to make sure that a referenced load chain does not affect all of the other load chains in the basic block. The problem with the current implementation is that it is too conservative when it comes to a write operation that aliases a load. When it reaches such an instruction it tries to combine all of the load chains at that point. This reduces the amount of loads that will be combined if it combines every chain when it reaches a store that references a load.

rriddle added inline comments.Jun 18 2016, 7:17 PM
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13

The line at 12 makes more sense when you visualize the load chains at that point. I will visualize the load chains in this format {base_ptr : referenced loads by instruction name }.
So at line 12 the load chains are as follows :

{ %i : %0 } 
{ %str : %1}

The store instruction at line 12 is storing into %i1 which aliases the base ptr %str. This causes every load chain to try and combine, but at this point each only has 1 load; so there is no combine operation performed. The problem with this is that the chain with baseptr %i wasn't referenced by that store operation so why are we trying to combine that chain at this point.

eli.friedman added inline comments.Jun 18 2016, 7:25 PM
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13

The problem isn't whether it's optimal... the problem is that the transformation doesn't preserve the semantics of the code. For my C code, -load-combine (plus instcombine to clean up the result) gives:

define void @f(i32* nocapture %a, i32* nocapture readonly %b) local_unnamed_addr #0 {
entry:
  %0 = bitcast i32* %b to i128*
  %.combined = load i128, i128* %0, align 4
  %combine.extract.trunc = trunc i128 %.combined to i32
  store i32 %combine.extract.trunc, i32* %a, align 4, !tbaa !1
  %combine.extract.shift = lshr i128 %.combined, 32
  %combine.extract.trunc1 = trunc i128 %combine.extract.shift to i32
  %arrayidx3 = getelementptr inbounds i32, i32* %a, i64 1
  store i32 %combine.extract.trunc1, i32* %arrayidx3, align 4, !tbaa !1
  %combine.extract.shift2 = lshr i128 %.combined, 64
  %combine.extract.trunc3 = trunc i128 %combine.extract.shift2 to i32
  %arrayidx5 = getelementptr inbounds i32, i32* %a, i64 2
  store i32 %combine.extract.trunc3, i32* %arrayidx5, align 4, !tbaa !1
  %combine.extract.shift4 = lshr i128 %.combined, 96
  %combine.extract.trunc5 = trunc i128 %combine.extract.shift4 to i32
  %arrayidx7 = getelementptr inbounds i32, i32* %a, i64 3
  store i32 %combine.extract.trunc5, i32* %arrayidx7, align 4, !tbaa !1
  ret void
}

This is completely, utterly wrong.

rriddle abandoned this revision.Jun 19 2016, 9:24 AM
rriddle updated this revision to Diff 61215.Jun 19 2016, 12:37 PM
rriddle added a reviewer: eli.friedman.

Fixed argument aliasing and updated test suite

rriddle updated this revision to Diff 61216.Jun 19 2016, 12:41 PM

Updated Memory location creation

eli.friedman added inline comments.Jun 19 2016, 1:39 PM
/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp
235

Useless comment.

248

Useless comment.

249

Convention is to write this as "if (RefInfo & MRI_Mod)".

251

Isn't this missing the step of sorting Loads->second?

261

What's the justification for the early exit here? Testcase?

265

Is updating the end iterator actually necessary?

269

Useless comment.

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
9

Maybe it would be better to use noalias argument pointers instead of allocas? This testcase is a little weird because you're loading undef.

33

The allocas seem unnecessary here.

rriddle updated this revision to Diff 61224.Jun 19 2016, 2:04 PM

Redundant checks and comments. Updated test arguments.

rriddle updated this revision to Diff 61225.Jun 19 2016, 2:07 PM

Fixed load sorting

rriddle updated this revision to Diff 61226.Jun 19 2016, 2:12 PM
eli.friedman added inline comments.Jun 19 2016, 2:19 PM
/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp
238

Weird indentation.

256

This block of code is very similar to the body of LoadCombine::combineLoads; please refactor.

rriddle updated this revision to Diff 61228.Jun 19 2016, 2:27 PM

Refactor CombineLoads

eli.friedman edited edge metadata.Jun 20 2016, 11:23 AM

LGTM. If you don't have commit access, I'll commit this for you.

eli.friedman accepted this revision.Jun 20 2016, 11:23 AM
eli.friedman edited edge metadata.
majnemer added inline comments.Jun 20 2016, 11:26 AM
/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp
140–141

Please appropriately format this.

249

Please format this appropriately.

eli.friedman requested changes to this revision.Jun 21 2016, 7:20 PM
eli.friedman edited edge metadata.

I was going over this one more time before committing it... and I realized you don't actually have any test coverage for this change. I'm pretty sure the given testcase passes without your patch.

This revision now requires changes to proceed.Jun 21 2016, 7:20 PM
rriddle updated this revision to Diff 62109.Jun 28 2016, 10:35 AM
rriddle edited edge metadata.

Updated tests

Sorry about that.

rriddle abandoned this revision.Dec 15 2016, 3:49 AM