This is an archive of the discontinued LLVM Phabricator instance.

[X86] Emit BZHI when mask is ~(-1 << nbits))
ClosedPublic

Authored by lebedev.ri on May 28 2018, 8:11 AM.

Details

Summary

In D47428, i propose to choose the ~(-(1 << nbits)) as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.

AArch64 currently better handles ~(-(1 << nbits)), but not the more traditional (1 << nbits) - 1 (sic!).
The other way around for X86.
It would be much better to canonicalize.

This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on // x & (-1 >> (32 - y)) pattern.

Also, when we only have BMI, i wonder if we could use BEXTR with start=0 ?

Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.May 28 2018, 8:11 AM

Rebased ontop of reworked tests,
Redo patternmatch ontop of the first matcher there, x & ((1 << y) - 1).

  • Rebased ontop of revisited tests.
  • Monkey-fix the rest of the patterns when the index is i8.
lebedev.ri updated this revision to Diff 149628.Jun 3 2018, 2:49 AM

Rebased ontop of updated tests.

I will follow-up with a similar patches for bextr, but after this lands.

RKSimon added inline comments.Jun 4 2018, 9:17 AM
lib/Target/X86/X86InstrInfo.td
2524 ↗(On Diff #149640)

We're duplicating all these patterns for 32/64 cases - worth wrapping these in a bzhi_patterns multiclass?

lebedev.ri marked an inline comment as done.

Apply multiclass to the cases where $lz is always GR8.
This is kinda messy, not too good with tablegen yet..

I'm not quite sure how to handle those last few special cases where $lz is always GR32.

craig.topper added inline comments.Jun 4 2018, 1:34 PM
lib/Target/X86/X86InstrInfo.td
2445 ↗(On Diff #149835)

I'm thorougly confused as to how this many arguments are "Intrinsic". I wouldn't have expected that to work unless Intrinsic is very loosely defined. DstMemInt and DstInt should be of type "Instruction". And they should probably be DstMemInst and DstInst. I think "Int" should be of type ValueType and renamed to VT.

craig.topper added inline comments.Jun 4 2018, 1:45 PM
lib/Target/X86/X86InstrInfo.td
2433 ↗(On Diff #149835)

bitwidth isn't used in this class.

lebedev.ri updated this revision to Diff 149853.Jun 4 2018, 2:22 PM
lebedev.ri marked an inline comment as done.

Cleanup multiclass a bit more.

lib/Target/X86/X86InstrInfo.td
2445 ↗(On Diff #149835)

I'm thorougly confused as to how this many arguments are "Intrinsic".

I'm super confused by this too.
I'm not quite familiar with this, and there does not seem to be any documentation,
so i'm writing this via the copy-paste method :/
Naturally, the results are rather suboptimal.

But even then, this is still backwards. We want to define input patterns,
and then define the output pattern, as sub-class. But i guess that can be done later.

lebedev.ri added inline comments.Jun 5 2018, 6:36 AM
lib/Target/X86/X86InstrInfo.td
2465–2469 ↗(On Diff #149853)

I understand that this still has some duplication.
I have tried reworking it further, but i do not yet understand how
to pass either RC:$src or (x86memop addr:$src) as multiclass param.

Any directional hints on this?
Is this sufficiently tablegen-y, or any suggestions on how to condense it further?

This revision is now accepted and ready to land.Jun 6 2018, 9:53 AM

LGTM

Thank you for the review!

This revision was automatically updated to reflect the committed changes.