This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
ClosedPublic

Authored by andreadb on Dec 9 2014, 11:14 AM.

Details

Summary

Hi David, Quentin (and all),

This patch teaches the Instruction Combiner how to fold a call to 'Intrinsic::x86_sse4a_insertqi' if the 'length field' (3rd operand) is set to zero, and if the sum between field 'length' and 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer’s Manual:

  1. "If the sum of the bit index + length field is greater than 64, the results are undefined."
  2. "A value of zero in the field length is defined as a length of 64."

As a consequence of 1. and 2., "If the length field is 0 and the bit index is 0, bits 63:0 of the source operand are inserted. For any other value of the bit index, the results are undefined."

This patch improves the existing combining logic for Intrinsic::x86_sse4a_insertqi adding extra checks to address both point 1. and point 2.

Added extra test cases to existing test 'vec_demanded_elts.ll'.

Please let me know if ok to submit.
Thanks!
Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb updated this revision to Diff 17085.Dec 9 2014, 11:14 AM
andreadb retitled this revision from to [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi..
andreadb updated this object.
andreadb edited the test plan for this revision. (Show Details)
andreadb added reviewers: qcolombet, nadav, majnemer.
andreadb added a subscriber: Unknown Object (MLST).
qcolombet accepted this revision.Dec 9 2014, 1:18 PM
qcolombet edited edge metadata.

Hi Andrea,

LGTM.

Thanks,
-Quentin

lib/Transforms/InstCombine/InstCombineCalls.cpp
779 ↗(On Diff #17085)

Maybe add: per AMD64 manual, or something like that.

This revision is now accepted and ready to land.Dec 9 2014, 1:18 PM
majnemer added inline comments.Dec 9 2014, 5:27 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
784 ↗(On Diff #17085)

Is there anything that prevents Index + Length from wrapping around? If not, what happens?

qcolombet added inline comments.Dec 9 2014, 6:08 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
784 ↗(On Diff #17085)

I think there is nothing that would prevent that, but in that case we fall into this condition:
"If the sum of the bit index + length field is greater than 64, the results are undefined.”

So I’d say nothing to be done here but maybe I misunderstood something :).

andreadb updated this revision to Diff 17165.Dec 11 2014, 5:13 AM
andreadb edited edge metadata.

Hi David and Quentin,

I uploaded a new version of the patch.

Differences with respect to the previous patch are:

  • I added the extra comments requested by Quentin.
  • I also added a comment that hopefully should answer the question raised by David about the potential wrap around in the sum Index + Length.

Please let me know if ok to submit.

Thanks!
Andrea.

Hi Andrea,

LGTM.

Thanks,
-Quentin

majnemer accepted this revision.Dec 11 2014, 12:27 PM
majnemer edited edge metadata.

LGTM

This revision was automatically updated to reflect the committed changes.