The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones.
This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output.
Paths
| Differential D57398
SimplifyDemandedVectorElts for all intrinsics ClosedPublic Authored by reames on Jan 29 2019, 10:32 AM.
Details Summary The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones. This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output.
Diff Detail Event TimelineComment Actions
Thank you for pushing on this. While trying to find an answer to your question, I found a bug in the patch. The answer is that x86_avx512_mask_mul_sd_round and friends have the order of operation changed on two transforms. Previously, we did their dedicated transform, then fell through into the code I removed. Now, we simplify via demanded elements first, then process them via dedicated transforms. That fallthrough is where the bug is though. We're now falling through into the wrong handler. The fallthrough needs replaced w/a break. Comment Actions LGTM - future patches should hopefully be able to move more of the SimplifyDemandedVectorElts/SimplifyDemandedVectorEltsLow code over. This revision is now accepted and ready to land.Jan 30 2019, 1:01 AM Comment Actions
Not sure what you mean here. Over to where? from where? Closed by commit rL352653: SimplifyDemandedVectorElts for all intrinsics (authored by reames). · Explain WhyJan 30 2019, 11:21 AM This revision was automatically updated to reflect the committed changes. Comment Actions
I meant for cases like the SimplifyDemandedVectorEltsLow call in Intrinsic::x86_vcvtph2ps_128/256 which can now be removed and similar code moved into InstCombiner::SimplifyDemandedVectorElts so it is correctly handled within a recursive call. Not all intrinsics would work, but some are probably worth it. Comment Actions
Ah, yes. I don't plan to focus on the x86 specific intrinsics, but I agree that would be a worthwhile cleanup. At a high level, do you know why we need these intrinsics at all? What are they expressing which can be well represented/pattern matched from normal IR? Comment Actions
I'll take a look at some point.
The cvtph2ps intrinsics are still around because x86 is terrible at f16 lowering (scalar and vector) - see PR37554
Revision Contents
Diff 184181 lib/Transforms/InstCombine/InstCombineCalls.cpp
test/Transforms/InstCombine/X86/x86-avx512.ll
|