When trying to simplify instruction based on demanded vector elements,
we are able to look through bitcast that does not change number of
vector elements.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
576–578 | Is there not a more convenient way in PatternMatch to check for matching fixed elements? |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
576–578 | Sorry I don't have a good idea to simplify the code. Please let me know if you have a good suggestion. |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
369 | It might be neater to handle BitCast in findDemandedEltsBySingleUser instead. In either case, you're introducing recursion, so perhaps it should have a depth limit. | |
571 | It took me a long time to understand this part of the patch. It is handling cases like this: %var = call ... // has multiple uses %var1 = bitcast <4 x i32> %var to <4 x float> // has single use %var2 = extractelement <4 x float> %var1, i64 0 If the operand of extractelement itself had multiple uses, then we would handle it correctly here by calling findDemandedEltsByAllUsers. But because it is wrapped inside a single-use bitcast, we call SimplifyDemandedVectorElts directly, which recurses through the bitcast but does not handle multiple uses at the recursive step. Really all this code feels like a bit of a hack that we only call SimplifyDemandedVectorElts in certain places (when we think there's a good chance that not all elements are demanded) and we only handle multiple uses in certain places. @spatel do you have any thoughts about this? |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
576–578 | dyn_cast instead of isa+cast, but There probably should be a helper to check this somewhere |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
369 | Improving findDemandedEltsBySingleUser() sounds like a good idea. @RKSimon might have more to say about that. There may be a codegen equivalent that we can use as a guideline. | |
llvm/test/Transforms/InstCombine/AMDGPU/demanded-vector-elts-multi-user.ll | ||
4 | I can't tell what changes are expected from this patch. The patch is target-independent, so the tests should also be target-independent and minimized as much as possible. |
llvm/test/Transforms/InstCombine/AMDGPU/demanded-vector-elts-multi-user.ll | ||
---|---|---|
18 | Can you post patch against pre-commited base tests? |
It might be neater to handle BitCast in findDemandedEltsBySingleUser instead. In either case, you're introducing recursion, so perhaps it should have a depth limit.