Page MenuHomePhabricator

[x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons (PR33325)

Authored by spatel on Dec 28 2017, 2:24 PM.



This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion for x86 to use 2 pairs of loads per block.

The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern with oversized integer compares, so we want to transform these into x86-specific vector nodes before legalization splits things into scalar chunks.

See PR33325 for more details:

Diff Detail


Event Timeline

spatel created this revision.Dec 28 2017, 2:24 PM
RKSimon added inline comments.Dec 29 2017, 2:31 PM
3 ↗(On Diff #128305)

Please can you add AVX1/AVX512F/AVX512BW test cases to prove its not doing anything dodgy with those?

spatel updated this revision to Diff 128389.Jan 1 2018, 9:21 AM

Patch updated:

  1. Fixed the check for 'IsOrXorXor'; we must confirm that the 2nd operand ('Y') of the setcc is actually a zero.
  2. Added runs / checks for more ISA variations to show that we're behaving as expected in those cases. Note that we do not expect to produce 'i256' types via memcmp expansion for anything before AVX2, so I don't think we care about optimizing those cases (the last 2 tests just show the legal scalar code in that scenario).

cc @craig.topper and @hfinkel -- This might be an interesting case for the x86 preferred vector width effort (D41341) even for 128/256-bit vectors. Here, we're taking scalar code that is or would be produced by memcmp expansion and translating it to vectors based on the available ISA, but without accounting for the preferred vector width. I think we should add that predicate as a follow-up to prevent producing vector code if the user has requested we avoid it. There's no AVX512-specific codegen here or in memcmp expansion, so that's not a danger (yet).

courbet added inline comments.Jan 1 2018, 11:34 PM
36308 ↗(On Diff #128389)

I would add a comment to explain when this is typically generated, else this feels a bit magic. Something along the lines of:

"This pattern is typically generated by the memcmp expansion pass with oversized integer compares (see PR33325)."

36314 ↗(On Diff #128389)

The isNullConstant(Y) is duplicated here with the definition of IsOrXorXor. Let's keep it inside IsOrXorXor ( and maybe rename to IsOrXorXorCCZero ?

36332 ↗(On Diff #128389)

eq: this is a bit misleading on the first read. Change to eq|ne or cc ?

spatel marked 3 inline comments as done.Jan 2 2018, 7:14 AM
spatel added inline comments.
36314 ↗(On Diff #128389)

I agree with improving the variable name, but I don't see how we can simplify the logic unless we repeat the OrXorXor checks? We have:

A && !(A && B) --> A && (!A || !B) --> A && !B

where A is isNullConstant and B is OrXorXor

spatel updated this revision to Diff 128419.Jan 2 2018, 7:16 AM
spatel marked an inline comment as done.

Patch updated:
No functional difference from previous draft, but improved code comments and variable name.

courbet added inline comments.Jan 2 2018, 7:19 AM
36314 ↗(On Diff #128389)

Never mind; I misread the condition. Let's just fix the variable name.

courbet accepted this revision.Jan 2 2018, 8:20 AM
This revision is now accepted and ready to land.Jan 2 2018, 8:20 AM
This revision was automatically updated to reflect the committed changes.