Hi David, Quentin (and all),
This patch teaches the Instruction Combiner how to fold a call to 'Intrinsic::x86_sse4a_insertqi' in two cases: when the 'length' field (3rd operand) is set to zero, and when the sum of the 'length' field and the 'bit index' field (4th operand) is greater than 64.
From the AMD64 Architecture Programmer’s Manual:
1. "If the sum of the bit index + length field is greater than 64, the results are undefined."
2. "A value of zero in the field length is defined as a length of 64."
As a consequence of 1. and 2., "If the length field is 0 and the bit index is 0, bits 63:0 of the source operand are inserted. For any other value of the bit index, the results are undefined."
This patch improves the existing combining logic for Intrinsic::x86_sse4a_insertqi by adding extra checks that address both point 1. and point 2.
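To make the reasoning above concrete, here is a small standalone sketch of how the two quoted rules combine. This is not the code from the patch: the helper names 'effectiveLength' and 'isUndefinedInsert' are made up for illustration, and the sketch assumes both fields have already been reduced to values in the range 0..63.

```cpp
// Illustrative sketch only; hypothetical helpers, not the InstCombine code.
// Assumption: lengthField and bitIndex are already in the range 0..63.

// Rule 2: a length field of zero is defined as a length of 64.
constexpr unsigned effectiveLength(unsigned lengthField) {
  return lengthField == 0 ? 64u : lengthField;
}

// Rule 1: the result is undefined if bit index + length exceeds 64.
constexpr bool isUndefinedInsert(unsigned lengthField, unsigned bitIndex) {
  return effectiveLength(lengthField) + bitIndex > 64u;
}

// Length 0 with index 0 inserts bits 63:0 of the source, so it is defined.
static_assert(!isUndefinedInsert(0, 0), "length 0, index 0 is defined");
// Length 0 with any other index gives 64 + index > 64, so it is undefined.
static_assert(isUndefinedInsert(0, 1), "length 0, index 1 is undefined");
// A non-zero length is fine as long as the field fits in 64 bits.
static_assert(!isUndefinedInsert(32, 32), "32 + 32 fits in 64 bits");
static_assert(isUndefinedInsert(32, 33), "32 + 33 exceeds 64 bits");
```

Normalizing the zero length to 64 first means a single comparison covers both points, including the "length 0, non-zero index" case.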
I also added extra test cases to the existing test 'vec_demanded_elts.ll'.
Please let me know if this is OK to submit.
Thanks!
Andrea