This is an archive of the discontinued LLVM Phabricator instance.

SimplifyDemandedVectorElts for all intrinsics
ClosedPublic

Authored by reames on Jan 29 2019, 10:32 AM.

Details

Summary

The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones.

This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output.

Diff Detail

Repository
rL LLVM

Event Timeline

reames created this revision.Jan 29 2019, 10:32 AM

Any insight into why those tests have changed?

reames planned changes to this revision.Jan 29 2019, 2:27 PM

Any insight into why those tests have changed?

Thank you for pushing on this. While trying to find an answer to your question, I found a bug in the patch.

The answer is that x86_avx512_mask_mul_sd_round and friends have the order of operation changed on two transforms. Previously, we did their dedicated transform, then fell through into the code I removed. Now, we simplify via demanded elements first, then process them via dedicated transforms.

That fallthrough is where the bug is though. We're now falling through into the wrong handler. The fallthrough needs replaced w/a break.

reames updated this revision to Diff 184181.Jan 29 2019, 2:32 PM

Fix bug noted in last comment.

RKSimon accepted this revision.Jan 30 2019, 1:01 AM

LGTM - future patches should hopefully be able to move more of the SimplifyDemandedVectorElts/SimplifyDemandedVectorEltsLow code over.

This revision is now accepted and ready to land.Jan 30 2019, 1:01 AM

future patches should hopefully be able to move more of the SimplifyDemandedVectorElts/SimplifyDemandedVectorEltsLow code over.

Not sure what you mean here. Over to where? from where?

This revision was automatically updated to reflect the committed changes.

future patches should hopefully be able to move more of the SimplifyDemandedVectorElts/SimplifyDemandedVectorEltsLow code over.

Not sure what you mean here. Over to where? from where?

I meant for cases like the SimplifyDemandedVectorEltsLow call in Intrinsic::x86_vcvtph2ps_128/256 which can now be removed and similar code moved into InstCombiner::SimplifyDemandedVectorElts so it is correctly handled within a recursive call. Not all intrinsics would work, but some are probably worth it.

future patches should hopefully be able to move more of the SimplifyDemandedVectorElts/SimplifyDemandedVectorEltsLow code over.

Not sure what you mean here. Over to where? from where?

I meant for cases like the SimplifyDemandedVectorEltsLow call in Intrinsic::x86_vcvtph2ps_128/256 which can now be removed and similar code moved into InstCombiner::SimplifyDemandedVectorElts so it is correctly handled within a recursive call. Not all intrinsics would work, but some are probably worth it.

Ah, yes. I don't plan to focus on the x86 specific intrinsics, but I agree that would be a worthwhile cleanup.

At a high level, do you know why we need these intrinsics at all? What are they expressing which can be well represented/pattern matched from normal IR?

future patches should hopefully be able to move more of the SimplifyDemandedVectorElts/SimplifyDemandedVectorEltsLow code over.

Not sure what you mean here. Over to where? from where?

I meant for cases like the SimplifyDemandedVectorEltsLow call in Intrinsic::x86_vcvtph2ps_128/256 which can now be removed and similar code moved into InstCombiner::SimplifyDemandedVectorElts so it is correctly handled within a recursive call. Not all intrinsics would work, but some are probably worth it.

Ah, yes. I don't plan to focus on the x86 specific intrinsics, but I agree that would be a worthwhile cleanup.

I'll take a look at some point.

At a high level, do you know why we need these intrinsics at all? What are they expressing which can be well represented/pattern matched from normal IR?

The cvtph2ps intrinsics are still around because x86 is terrible at f16 lowering (scalar and vector) - see PR37554