Removed the alias set tracker and added individual load aliasing checks for each individual chain.
Details
Diff Detail
Event Timeline
Please add testcases to the tests directory. Also, please split out each independent change. Each change is easier to analyze in isolation.
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll | ||
---|---|---|
12 | What transform is this testcase checking for? I don't think you can prove any useful aliasing property between %i, %i1, and %str? |
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll | ||
---|---|---|
12 | In the current implementation %1 is being aliased in that store operation which would cause every load chain to try and combine which would prevent the load at %0 from being able to combine with %2, %4, etc. |
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll | ||
---|---|---|
12 | I'm not sure I follow... is the transform you're trying to perform something like this? void f(int* a, int* b) { a[0] = b[0]; a[1] = b[1]; a[2] = b[2]; a[3] = b[3]; // Combine to "memcpy(a, b, 16)"??? } |
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll | ||
---|---|---|
12 | No, the point of transform is to combine each of the individual loads into a single large one. In this specific test, there are multiple GEP + load pairs coming from %i. Each of those will combine down to a single i128 load. It may be easier to see with this output define i32 @Load_MultiChain(i32* %i) { %1 = getelementptr inbounds i32, i32* %i, i64 1 %2 = bitcast i32* %i to i8* %3 = getelementptr i8, i8* %2, i64 0 %4 = bitcast i8* %3 to i128* %.combined = load i128, i128* %4, align 4 %combine.extract.shift = lshr i128 %.combined, 32 %combine.extract.trunc1 = trunc i128 %combine.extract.shift to i32 %5 = load i32, i32* %1, align 4 %combine.extract.trunc = trunc i128 %.combined to i32 %6 = load i32, i32* %i, align 4 %7 = getelementptr inbounds i32, i32* %i, i64 2 %combine.extract.shift2 = lshr i128 %.combined, 64 %combine.extract.trunc3 = trunc i128 %combine.extract.shift2 to i32 %8 = load i32, i32* %7, align 4 %9 = getelementptr inbounds i32, i32* %i, i64 3 %combine.extract.shift4 = lshr i128 %.combined, 96 %combine.extract.trunc5 = trunc i128 %combine.extract.shift4 to i32 %10 = load i32, i32* %9, align 4 %11 = getelementptr inbounds i32, i32* %i, i64 4 %12 = load i32, i32* %11, align 4 %13 = getelementptr inbounds i32, i32* %i, i64 5 %14 = load i32, i32* %13, align 4 %15 = add nsw i32 %combine.extract.trunc, %14 ret i32 %15 } In the above every load that originated from i has been combined into a single i128 load. This test is not about transforming into the most optimal result it is testing to make sure that a referenced load chain does not affect all of the other load chains in the basic block. The problem with the current implementation is that it is too conservative when it comes to a write operation that aliases a load. When it reaches such an instruction it tries to combine all of the load chains at that point. This reduces the amount of loads that will be combined if it combines every chain when it reaches a store that references a load. |
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll | ||
---|---|---|
12 | The line at 12 makes more sense when you visualize the load chains at that point. I will visualize the load chains in this format {base_ptr : referenced loads by instruction name }. { %i : %0 } { %str : %1} The store instruction at line 12 is storing into %i1 which aliases the base ptr %str. This causes every load chain to try and combine, but at this point each only has 1 load; so there is no combine operation performed. The problem with this is that the chain with baseptr %i wasn't referenced by that store operation so why are we trying to combine that chain at this point. |
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll | ||
---|---|---|
12 | The problem isn't whether it's optimal... the problem is that the transformation doesn't preserve the semantics of the code. For my C code, -load-combine (plus instcombine to clean up the result) gives: define void @f(i32* nocapture %a, i32* nocapture readonly %b) local_unnamed_addr #0 { entry: %0 = bitcast i32* %b to i128* %.combined = load i128, i128* %0, align 4 %combine.extract.trunc = trunc i128 %.combined to i32 store i32 %combine.extract.trunc, i32* %a, align 4, !tbaa !1 %combine.extract.shift = lshr i128 %.combined, 32 %combine.extract.trunc1 = trunc i128 %combine.extract.shift to i32 %arrayidx3 = getelementptr inbounds i32, i32* %a, i64 1 store i32 %combine.extract.trunc1, i32* %arrayidx3, align 4, !tbaa !1 %combine.extract.shift2 = lshr i128 %.combined, 64 %combine.extract.trunc3 = trunc i128 %combine.extract.shift2 to i32 %arrayidx5 = getelementptr inbounds i32, i32* %a, i64 2 store i32 %combine.extract.trunc3, i32* %arrayidx5, align 4, !tbaa !1 %combine.extract.shift4 = lshr i128 %.combined, 96 %combine.extract.trunc5 = trunc i128 %combine.extract.shift4 to i32 %arrayidx7 = getelementptr inbounds i32, i32* %a, i64 3 store i32 %combine.extract.trunc5, i32* %arrayidx7, align 4, !tbaa !1 ret void } This is completely, utterly wrong. |
/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp | ||
---|---|---|
240 | Useless comment. | |
253 | Useless comment. | |
254 | Convention is to write this as "if (RefInfo & MRI_Mod)". | |
256 | Isn't this missing the step of sorting Loads->second? | |
266 | What's the justification for the early exit here? Testcase? | |
270 | Is updating the end iterator actually necessary? | |
274 | Useless comment. | |
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll | ||
10 | Maybe it would be better to use noalias argument pointers instead of allocas? This testcase is a little weird because you're loading undef. | |
34 | The allocas seem unnecessary here. |
I was going over this one more time before committing it... and I realized you don't actually have any test coverage for this change. I'm pretty sure the given testcase passes without your patch.
Please appropriately format this.