This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Vector integer/float conversion memory folding
ClosedPublic

Authored by RKSimon on Oct 24 2014, 10:44 AM.

Details

Summary

Added a missing memory folding relationship for the CVTDQ2PS instruction (and its AVX variants) - we can safely fold these (but not the CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works.

Also fixed an issue with the VCVTTPD2DQ / VCVTTPS2DQ instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand.

Finally, tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 15421.Oct 24 2014, 10:44 AM
RKSimon retitled this revision from to [X86][SSE] Vector integer/float conversion memory folding.
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon added reviewers: qcolombet, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
qcolombet edited edge metadata.Oct 24 2014, 11:07 AM

Hi Simon,

Thanks for fixing this.

Could you add a test case for the second issue you fixed?

Also, could you make three patches, one for each concern?

Thanks,
-Quentin

I'll split the cvttp*todq fix from the cvtdq2ps patch and add extra tests to the avx1-stack-reload-folding.ll

BTW I could have added AVX-512 equivalents as well but have no hardware to test it on....

Does the third part of the patch need review? Its pretty benign.

Does the third part of the patch need review? Its pretty benign.

No, go for it :).

Thanks,
-Quentin

RKSimon updated this revision to Diff 15488.Oct 27 2014, 6:29 AM
RKSimon edited edge metadata.

Removed the CVTTP*2DQ fixes - I will create a separate patch for this shortly.

spatel added inline comments.Oct 27 2014, 9:07 AM
test/CodeGen/X86/avx1-stack-reload-folding.ll
17–23 ↗(On Diff #15488)

Is there a guideline regarding IR test cases with types that aren't specified by the ABI?

Ie, is the lowering of 128-element vectors to legal x86 types stable? Would it be better to define a struct here?

qcolombet added inline comments.Oct 27 2014, 12:22 PM
test/CodeGen/X86/avx1-stack-reload-folding.ll
17–23 ↗(On Diff #15488)

Same remark as Sanjay: would it be possible to have a simpler test case, especially with legal types?

RKSimon updated this revision to Diff 15528.Oct 28 2014, 12:40 PM

Updated test so that we don't use non-native vector types as arguments - now uses pointers to load/store instead.

We need to keep to this basic pattern to ensure we spill a lot to stack and the 'big vectors' approach drastically helps makes this simple to understand. Using a more basic load/cvt/store pattern would mean that it wouldn't be the memory folding code thats being tested - the load-execute version of the instruction from the tables would be matched much earlier instead.

qcolombet accepted this revision.Oct 30 2014, 11:50 AM
qcolombet edited edge metadata.

Hi Simon,

We need to keep to this basic pattern to ensure we spill a lot to stack and the 'big vectors' approach drastically helps makes this simple to understand. Using a more basic load/cvt/store pattern would mean that it wouldn't be the memory folding code thats being tested - the load-execute version of the instruction from the tables would be matched much earlier instead.

I was afraid of that. Could you add a comment explaining that at the being of the tests?

Please commit with this change.

Thanks,
-Quentin

This revision is now accepted and ready to land.Oct 30 2014, 11:50 AM
RKSimon closed this revision.Nov 5 2014, 2:39 PM
RKSimon updated this revision to Diff 15833.

Closed by commit rL221407 (authored by @RKSimon).