The code is doing the optimization:
((a | c1) << c2) ==> (a << c2) + (c1 << c2)
But this is only valid if a and c1 have no common bits being set.
Details
- Reviewers
arsenm foad - Commits
- rG60d9010aaf0f: AMDGPU: Fix issue in shl(or) combine
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The right way is to transform the pattern to (a << c2) | (c1 << c2)
But the right transformation does not do any help on folding the
constant offset into the memory instructions.
It should help because SelectionDAG::isBaseWithConstantOffset knows how to match OR (if the known bits do not overlap) as well as ADD.
It should help because SelectionDAG::isBaseWithConstantOffset knows how to match OR (if the known bits do not overlap) as well as ADD.
Do you mean we can optimize for shl(or) with the help of knownbits? I think I agree with you. But I am not sure whether it is really helpful in practical cases. The isBaseWithConstantOffset you mentioned specifically designed to work on stack slot access. I am not sure if such patterns can also be observed more broadly. And I think such kind of optimization should be done separately, maybe it should be added in the common LLVM code. And the lit-test should also be redesigned. I would rather fix the problematic transformation first. sounds ok to you?
Is there a negative test for the common bits case?
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9577 | Don’t know why this has to change |
I have added one: shl_or_ptr_not_combine_2use_lds
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9577 | I don't know what happened:( will fix it. |
I think this is OK, but wouldn't it be simpler to transform ((a | c1) << c2) ==> (a << c2) | (c1 << c2) and remove the knownbits check? Or do you think that would make the generated code worse overall?
I observed more assembly instructions in one typical IR with the transform ((a | c1) << c2) ==> (a << c2) | (c1 << c2). That's why I wanted to remove the code in the first version. But later I find common code tries hard to make (shl (or x, c1), c2) -> add (shl x, c2), (shl c1, c2) happen. So I just fix the issue to help possible cases.
Don’t know why this has to change