This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] Add support for commuting VPTERNLOG.
ClosedPublic

Authored by craig.topper on Sep 11 2016, 11:15 PM.

Details

Summary

VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors, the bits from the three sources are concatenated and the resulting 3-bit value is used as an index to select a bit in the 8-bit immediate. The selected bit is written to the same bit position in the result vector.
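The selection rule can be modeled in a few lines. This is only an illustrative sketch on a single 64-bit word, not code from the patch; the function name `ternlog64` is made up for this example, and the bit ordering (first source as the high index bit) follows the description further down in this summary.

```cpp
#include <cstdint>

// Illustrative software model of the VPTERNLOG selection rule on one 64-bit
// word. `ternlog64` is a hypothetical name, not an LLVM or intrinsic function.
// `a` is the first source (high index bit), `c` the third (low index bit).
uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm) {
  uint64_t result = 0;
  for (unsigned i = 0; i < 64; ++i) {
    // Concatenate the three source bits into a 3-bit index.
    unsigned idx = static_cast<unsigned>((((a >> i) & 1) << 2) |
                                         (((b >> i) & 1) << 1) |
                                         ((c >> i) & 1));
    // The indexed bit of the immediate becomes the result bit.
    result |= static_cast<uint64_t>((imm >> idx) & 1) << i;
  }
  return result;
}
```

With this model, an immediate of 0x96 yields the three-way XOR of the sources, and 0xCA yields a bitwise select (`a ? b : c`).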

We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands' bits have opposite values and the uncommuted operand's bit has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources have the same value.

The first operand is the highest of the 3 bits (bit A) and the third operand is the lowest (bit C). So to swap the first and second operands, we need to swap the rows in the following table where A and B have different values and C has the same value. So bit 2 and bit 4 swap, and bit 3 and bit 5 swap.

Bit A  Bit B  Bit C  Immediate
0      0      0      Imm[0]
0      0      1      Imm[1]
0      1      0      Imm[2]
0      1      1      Imm[3]
1      0      0      Imm[4]
1      0      1      Imm[5]
1      1      0      Imm[6]
1      1      1      Imm[7]
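As a rough sketch of the first/second operand case described above (not the code from the patch), the immediate can be rewritten by exchanging bits 2 and 4 and bits 3 and 5. The helper names `swapBits`, `commuteFirstSecond`, and `ternlogBit` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Exchange bits `i` and `j` of an 8-bit immediate (hypothetical helper).
uint8_t swapBits(uint8_t imm, unsigned i, unsigned j) {
  unsigned bi = (imm >> i) & 1;
  unsigned bj = (imm >> j) & 1;
  if (bi != bj)  // Only differing bits need flipping.
    imm ^= static_cast<uint8_t>((1u << i) | (1u << j));
  return imm;
}

// Immediate to use once the first and second sources have traded places:
// rows 010 <-> 100 (bits 2 and 4) and rows 011 <-> 101 (bits 3 and 5).
uint8_t commuteFirstSecond(uint8_t imm) {
  return swapBits(swapBits(imm, 2, 4), 3, 5);
}

// Single-bit model of the table above: A is the high index bit, C the low.
bool ternlogBit(bool a, bool b, bool c, uint8_t imm) {
  return (imm >> ((a << 2) | (b << 1) | c)) & 1;
}
```

For every immediate and every input combination, `ternlogBit(a, b, c, imm)` should equal `ternlogBit(b, a, c, commuteFirstSecond(imm))`, which is the property that makes the commute legal.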

This patch reuses some of the code from FMA3 commuting since it is also a three source instruction. Most of findFMA3CommutedOpIndices is split out into findThreeSrcCommutedOpIndices to be reused to determine which operands can be commuted. I also changed it to use TSFlags bits for determining masked instructions since the FMA3Group attribute bits would not work for VPTERNLOG, but the TSFlag bits work for both.

For VPTERNLOG we call the new findThreeSrcCommutedOpIndices from findCommutedOpIndices with no additional processing.

The code from getFMA3OpcodeToCommuteOperands that determines which of the 3 possible cases is being requested is split out into a helper function, getThreeSrcCommuteCase, that returns 0, 1, or 2 for the case number if it's a valid case, or -1 if it's not.
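The shape of such a helper might look like the sketch below. This is not the real getThreeSrcCommuteCase: the function name `threeSrcCommuteCase`, the 1-based source numbering, and in particular the assignment of case numbers to operand pairs are all assumptions made for illustration, since the summary does not say which pair each number denotes.

```cpp
#include <utility>

// Hypothetical stand-in for a three-source commute classifier.
// Sources are numbered 1..3 here. ASSUMED case numbering:
//   0: sources 1 and 2, 1: sources 1 and 3, 2: sources 2 and 3.
// Any other request is reported as invalid with -1.
int threeSrcCommuteCase(unsigned idx1, unsigned idx2) {
  if (idx1 > idx2)
    std::swap(idx1, idx2);  // The order of the two indices does not matter.
  if (idx1 == 1 && idx2 == 2) return 0;
  if (idx1 == 1 && idx2 == 3) return 1;
  if (idx1 == 2 && idx2 == 3) return 2;
  return -1;
}
```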

X86InstrInfo::commuteInstructionImpl for VPTERNLOG calls getThreeSrcCommuteCase and, if it's a valid case, uses the case number to look up which bits to swap in the immediate and makes the modifications.
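A table-driven version of that lookup could be sketched as follows. The bit pairs for each operand pair follow from the rule stated earlier (swap the rows where the two commuted operands differ and the third is unchanged), but the function name `commuteTernlogImm` and the mapping of case numbers to operand pairs are assumptions for this example and may not match the patch.

```cpp
#include <cstdint>

// Sketch only. ASSUMED case numbering (not confirmed by the summary):
//   0: swap sources 1 and 2, 1: swap sources 1 and 3, 2: swap sources 2 and 3.
uint8_t commuteTernlogImm(uint8_t imm, int commuteCase) {
  // Each row holds two pairs of single-bit masks to exchange.
  static const uint8_t SwapMasks[3][4] = {
      {0x04, 0x10, 0x08, 0x20}, // bits 2 <-> 4 and 3 <-> 5
      {0x02, 0x10, 0x08, 0x40}, // bits 1 <-> 4 and 3 <-> 6
      {0x02, 0x04, 0x20, 0x40}, // bits 1 <-> 2 and 5 <-> 6
  };
  if (commuteCase < 0 || commuteCase > 2)
    return imm;  // Not a valid case: leave the immediate untouched.

  const uint8_t *m = SwapMasks[commuteCase];
  // Clear the four affected bits, then set each from its partner.
  uint8_t out = imm & static_cast<uint8_t>(~(m[0] | m[1] | m[2] | m[3]));
  if (imm & m[0]) out |= m[1];
  if (imm & m[1]) out |= m[0];
  if (imm & m[2]) out |= m[3];
  if (imm & m[3]) out |= m[2];
  return out;
}
```

Bits 0 and 7 never appear in the table, which matches the observation that they are never swapped. Applying the same case twice returns the original immediate.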

There appears to be an issue with the two-address instruction pass where it stops searching for additional commutable operands once it decides to swap the first two operands. It could look harder for a better commute. This deficiency can be seen in test cases where some vmovdqa64 instructions remain that could have been removed if the first and third operands were commuted instead of the first and second. I hope to address this in a future commit.

Event Timeline

craig.topper retitled this revision to [AVX-512] Add support for commuting VPTERNLOG.
craig.topper updated this object.
craig.topper added a subscriber: llvm-commits.
v_klochkov accepted this revision. Sep 21 2016, 4:09 PM
v_klochkov edited edge metadata.

Hi Craig,

Excuse me for the delay. I did not realize that this patch was based on the existing FMA commute infrastructure,
and thus I thought I was not the primary reviewer of this change-set.

Ok, I reviewed this patch and I have only 2 minor comments.
Otherwise, this patch looks great to me.

Regarding the 'two address instruction pass' issue, I think it is another test case for this bug:
https://llvm.org/bugs/show_bug.cgi?id=17229
You may want to mention the problem with VPTERNLOG in that bug.

Thank you,
Slava

lib/Target/X86/X86InstrInfo.cpp
3324

It may be good to mention here what the returned values 0, 1, 2 mean. I.e. 0 means that it is possible to commute operands SrcOp1 and SrcOp2, etc.

3341

"result of FMA."
The new function is not FMA specific. The comment section should be updated.

This revision is now accepted and ready to land. Sep 21 2016, 4:09 PM
craig.topper closed this revision. Sep 21 2016, 8:14 PM

Committed in r282132.