Given two store instructions with equivalent pointer operands,
they can be merged into their common successor basic block if
the value operand of one is bitcast to match the type of the
other. This patch only allows a bitcast between non-aggregate
primitive types with matching bit widths.
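A minimal sketch of that constraint, assuming a free-standing helper (the names here are illustrative, not the patch's actual code):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Illustrative sketch of this revision's mergeability rule: a cast is only
// allowed between non-aggregate first-class types of equal bit width.
static bool canCastStoredValue(Type *SrcTy, Type *DstTy,
                               const DataLayout &DL) {
  if (SrcTy == DstTy)
    return true; // identical types need no cast at all
  if (!SrcTy->isSingleValueType() || !DstTy->isSingleValueType())
    return false; // reject aggregates (structs, arrays)
  return DL.getTypeSizeInBits(SrcTy) == DL.getTypeSizeInBits(DstTy);
}
```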
Unit Tests

Time | Test
---|---
60,050 ms | x64 debian > MLIR.Examples/standalone::test.toy
Event Timeline
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Line | Comment
---|---
1624 | Check for isBitOrNoopPointerCastable() instead, and please add a test that requires inserting ptrtoint or inttoptr. Also, this is missing a check for hasSameSpecialState(); please add a test with, for example, different alignments.
1632 | You should only insert the bitcast when actually doing the transform. We should not insert it if the transform is aborted later.
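A sketch of the predicate the 1624 comment asks for; isBitOrNoopPointerCastable() and hasSameSpecialState() are real LLVM APIs, while the wrapper function itself is hypothetical:

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical wrapper: isBitOrNoopPointerCastable() subsumes the bitwidth
// check and also admits no-op ptrtoint/inttoptr casts, while
// hasSameSpecialState() rejects stores that differ in alignment,
// volatility, or atomic ordering.
static bool isMergeableWith(StoreInst &SI, StoreInst *OtherStore,
                            const DataLayout &DL) {
  if (!OtherStore ||
      OtherStore->getPointerOperand() != SI.getPointerOperand())
    return false;
  Type *SrcTy = OtherStore->getValueOperand()->getType();
  Type *DstTy = SI.getValueOperand()->getType();
  return CastInst::isBitOrNoopPointerCastable(SrcTy, DstTy, DL) &&
         SI.hasSameSpecialState(OtherStore);
}
```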
- Replaced the bitwidth equivalence check with isBitOrNoopPointerCastable()
- Added a hasSameSpecialState() check
- Added a test with different alignments
- Added a ptrtoint test
- Created an AMDGPU-specific test for inttoptr. This test appears to add an extra iteration before instruction combining reaches a fixed point: instcombine assumes it will run indefinitely when instcombine-infinite-loop-threshold is set to 3, whereas the test passes when I raise the threshold to 4.
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll

Line | Comment
---|---
221 | Can't these merge using the minimum of the alignments?
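For what that would look like, a hypothetical sketch of the idea (which is explicitly deferred later in the thread); `NewSI` is an assumed name for the merged store:

```cpp
#include "llvm/IR/Instructions.h"
#include <algorithm>
using namespace llvm;

// Deferred idea, sketched only: instead of requiring equal alignments, the
// merged store could conservatively take the smaller of the two.
static void takeMinimumAlignment(StoreInst &SI, StoreInst &OtherStore,
                                 StoreInst &NewSI) {
  NewSI.setAlignment(std::min(SI.getAlign(), OtherStore.getAlign()));
}
```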
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll

Line | Comment
---|---
243 | This one is undefined because the alloca has default align 2
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Line | Comment
---|---
1616 | These isSingleValueType / isVectorTy checks should not be necessary; isBitOrNoopPointerCastable will already do all the necessary type checks.
1678 | There is no need to rewrite to a new temporary store instruction. You can insert the bitcast directly at the phi node operand. You also need to use the IRBuilder to create the cast, otherwise it will not be queued for reprocessing. In that case you also don't need the InsertBitcast flag, because the IRBuilder will omit the bitcast if it is not needed. If you do everything correctly, then your AMDGPU problem should go away as well -- the instcombine-infinite-loop-threshold flag exists specifically to detect these worklist management bugs.
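A sketch of the shape the 1678 comment suggests (approximate, not the exact patch; within InstCombine it is the Builder's callback inserter that queues new instructions for reprocessing):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Approximate sketch: build the merged PHI and run the second incoming value
// through the IRBuilder. CreateBitOrPointerCast() returns the value unchanged
// when no cast is needed, so no separate InsertBitcast flag is required.
// The caller is expected to insert PN at the front of the successor block.
static PHINode *buildMergedValue(StoreInst &SI, StoreInst *OtherStore,
                                 IRBuilder<> &Builder) {
  PHINode *PN =
      PHINode::Create(SI.getValueOperand()->getType(), 2, "storemerge");
  PN->addIncoming(SI.getValueOperand(), SI.getParent());
  Builder.SetInsertPoint(OtherStore);
  PN->addIncoming(Builder.CreateBitOrPointerCast(
                      OtherStore->getValueOperand(), PN->getType()),
                  OtherStore->getParent());
  return PN;
}
```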
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll

Line | Comment
---|---
221 | It's possible, but please not in this patch.
- Used IRBuilder to create the cast
- Eliminated the InsertBitcast flag
- Fixed the alignment test
- Removed redundant checks
- Eliminated the temporary store
@nikic The instcombine-infinite-loop-threshold issue still persists.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Line | Comment
---|---
1605 | Doesn't look like returning StoreInst * is needed/useful anymore -- return bool instead?
1638 | This break got dropped. I think this is the reason for the test regressions and the crashes in pre-merge checks. Took me a while to spot because I usually only look at the right side of the diff...
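For context, the scan in question looks roughly like this (a sketch, not the exact patch code):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch of the backward scan through the other predecessor. Without the
// break, the matching store is found but scanning continues, and the store
// itself then trips the mayWriteToMemory() bail-out below.
static bool findMergeableStore(BasicBlock *OtherBB, StoreInst *&OtherStore,
                               function_ref<bool(StoreInst *)> IsMergeable) {
  for (BasicBlock::iterator BBI = OtherBB->getTerminator()->getIterator();;) {
    if (BBI == OtherBB->begin())
      return false; // ran out of instructions
    --BBI;
    OtherStore = dyn_cast<StoreInst>(&*BBI);
    if (IsMergeable(OtherStore))
      break; // the dropped break: stop once the matching store is found
    // Anything that may read or write the stored location blocks the merge.
    if (BBI->mayReadFromMemory() || BBI->mayThrow() ||
        BBI->mayWriteToMemory())
      return false;
  }
  return true;
}
```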
- Restored the break statement that was accidentally removed
- OtherStoreIsMergeable now returns a bool instead of a StoreInst *
- Corrected the tests
- Simplified OtherStoreIsMergeable as requested
The reason instcombine needs another iteration for @inttoptr_merge(..) is that mergeStoreIntoSuccessor() produces a case where a store/load pair is merged into an inttoptr, which is then merged with the PHI that mergeStoreIntoSuccessor() inserted in the first iteration. I am not sure handling these cases directly in mergeStoreIntoSuccessor() is viable, since it would duplicate logic.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Line | Comment
---|---
1649–1653 | This can be simplified as suggested, since if we find a non-mergeable store, the mayWriteToMemory test just below will do the return false. Also, I think moving the cast-to-StoreInst inside the helper function would make the patch simpler overall, i.e. something like: auto IsMergeableStore = [&](Instruction *OtherStore) -> bool ...
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Line | Comment
---|---
1649–1653 | We still need the StoreInst to assign to OtherStore, so I don't think moving the cast into the helper would work.
I've looked into this as well, and I think this would mostly be solved by D75362 (we visit instructions in the wrong order, in a way that is relevant here). It's okay to ignore this.
It took me a while to understand why AMDGPU is relevant here. The difference is that with the default data layout i64 has only 4-byte alignment, so we first need to promote the alignment to 8 before the transform can be performed. For AMDGPU the alignment is 8 to start with. You can make this test AMDGPU-independent by explicitly specifying the alignment on the stores instead of relying on natural alignment.
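That layout difference can be seen directly from DataLayout (a sketch; the AMDGPU layout string is abbreviated here to the relevant i64 component):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/Alignment.h"
#include <cassert>
using namespace llvm;

int main() {
  LLVMContext Ctx;
  Type *I64 = Type::getInt64Ty(Ctx);
  // Default (empty) layout: i64 has only 4-byte ABI alignment, so a
  // naturally aligned i64 store must first be promoted to align 8.
  assert(DataLayout("").getABITypeAlign(I64) == Align(4));
  // With i64:64, as in AMDGPU's layout, the store starts out 8-byte aligned.
  assert(DataLayout("i64:64").getABITypeAlign(I64) == Align(8));
  return 0;
}
```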
- Simplified the code
- Made inttoptr_merge() target independent by adding align 8 to the store instructions, with support from D75362 as mentioned