This is an archive of the discontinued LLVM Phabricator instance.

[X86] Update fast-isel tests for changes from D48346.
ClosedPublic

Authored by craig.topper on Jun 19 2018, 10:28 PM.

Details

Summary

The new IR fixes a mismatch in the final extractelement for the i32 intrinsics. Previously we extracted a 64-bit element even though we only wanted 32 bits.
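
As a rough illustration of that mismatch, here is a scalar C model of the two extraction patterns. It is only a sketch: the helper names are hypothetical, the 16-lane width is an assumption, and it assumes the old 64-bit element was narrowed to 32 bits afterwards. Lanes are packed as on little-endian x86.

```c
#include <stdint.h>

/* Hypothetical model of the old pattern: view lanes 0 and 1 as a single
   64-bit element (x86 little-endian layout), extract it, then keep only
   the low 32 bits. */
static uint32_t extract_via_i64(const uint32_t lanes[16]) {
    uint64_t wide = (uint64_t)lanes[0] | ((uint64_t)lanes[1] << 32);
    return (uint32_t)wide; /* discards lane 1 again */
}

/* Hypothetical model of the new pattern: extract 32-bit element 0 directly. */
static uint32_t extract_i32(const uint32_t lanes[16]) {
    return lanes[0];
}
```

Both helpers return the same value; the difference is only that the first one touches a 64-bit element it does not need.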

SimplifyDemandedElts isn't able to make FP elements undef now, and the shuffle mask I used prevents the use of the horizontal add we had before. I'm not sure we should have been using horizontal add anyway: on Intel it's implemented as two port 5 shuffles and an add. So we have one less shuffle now, but an additional instruction to decode.
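
To make the trade-off concrete, the following scalar C sketch models the lane semantics of a horizontal-add reduction next to a shuffle-plus-add reduction over four floats. The `vec4f` type, the helper names, the 4-lane width, and the shuffle indices are all assumptions chosen for illustration; they are not the masks used in the patch.

```c
/* Hypothetical stand-in for a 128-bit <4 x float> vector. */
typedef struct { float lane[4]; } vec4f;

/* Lane semantics of a horizontal add (as in HADDPS):
   adjacent pairs of a, then adjacent pairs of b. */
static vec4f hadd(vec4f a, vec4f b) {
    vec4f r = {{ a.lane[0] + a.lane[1], a.lane[2] + a.lane[3],
                 b.lane[0] + b.lane[1], b.lane[2] + b.lane[3] }};
    return r;
}

/* Single-source shuffle: result lane k takes a.lane[idx k]. */
static vec4f shuffle(vec4f a, int i0, int i1, int i2, int i3) {
    vec4f r = {{ a.lane[i0], a.lane[i1], a.lane[i2], a.lane[i3] }};
    return r;
}

/* Lane-wise add. */
static vec4f add(vec4f a, vec4f b) {
    vec4f r = {{ a.lane[0] + b.lane[0], a.lane[1] + b.lane[1],
                 a.lane[2] + b.lane[2], a.lane[3] + b.lane[3] }};
    return r;
}

/* Reduction with two horizontal adds: (v0+v1) + (v2+v3). */
static float reduce_hadd(vec4f v) {
    v = hadd(v, v);
    v = hadd(v, v);
    return v.lane[0];
}

/* Reduction with one shuffle and one add per step: (v0+v2) + (v1+v3). */
static float reduce_shuffle_add(vec4f v) {
    v = add(v, shuffle(v, 2, 3, 2, 3)); /* fold the upper half onto the lower half */
    v = add(v, shuffle(v, 1, 1, 1, 1)); /* fold lane 1 onto lane 0 */
    return v.lane[0];
}
```

Each step of the second form is two instructions (one shuffle, one add), while each horizontal add is one instruction that, per the summary above, costs two shuffles and an add on Intel. The two forms add the lanes in a different order, so results can differ in the last bits for general floating-point inputs.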

Event Timeline

craig.topper created this revision. Jun 19 2018, 10:28 PM
RKSimon accepted this revision. Jun 20 2018, 3:56 AM

LGTM. The avx512-reduceMinMaxIntrin.c codegen could be cleaned up a bit to match these better; we don't typically include all the alloca etc. from -O0 code.

This revision is now accepted and ready to land. Jun 20 2018, 3:56 AM
This revision was automatically updated to reflect the committed changes.