Page MenuHomePhabricator

[SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types
ClosedPublic

Authored by RKSimon on Jun 21 2016, 5:37 PM.

Details

Summary

An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size.

After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts.

In doing so we can significantly reduce the number of operations required.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 61469.Jun 21 2016, 5:37 PM
RKSimon retitled this revision from to [SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types.
RKSimon updated this object.
RKSimon added reviewers: jmolloy, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
spatel edited edge metadata.Jun 22 2016, 11:14 AM

I haven't looked at bitreverse before this, so James or someone else should review too. A few nits inline, but otherwise LGTM.

Is there something that guarantees that an i2 or i4 type doesn't get in here? Should we assert that VT is i8 or larger?

BTW, there's a bug in the LangRef:
"The llvm.bitreverse.iN intrinsic returns an i16 value"

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2560 ↗(On Diff #61469)

Open code?

2568–2569 ↗(On Diff #61469)

If I'm reading this sentence correctly, it should be:
"If we can, perform"

test/CodeGen/X86/vector-bitreverse.ll
16 ↗(On Diff #61469)

I wish we printed hex asm comments for 8-bit immediates too. :)

RKSimon updated this revision to Diff 61723.Jun 23 2016, 2:21 PM
RKSimon edited edge metadata.

Updated based on Sanjay's comments.

James - any comments?

jmolloy edited edge metadata.Jul 4 2016, 6:27 AM

Hi,

Apologies, I was away on vacation.

Difficult to see in the x86 test noise, but do we have any tests that check the original, unoptimized behaviour? (like using i7 or i24)?

Cheers,

James

RKSimon updated this revision to Diff 63302.Jul 8 2016, 1:12 PM
RKSimon edited edge metadata.

Updated with examples of non-legal types (i4 and i24)

RKSimon updated this revision to Diff 63402.Jul 9 2016, 3:13 PM

Readded the aarch64 bitreverse tests

jmolloy accepted this revision.Jul 22 2016, 1:17 AM
jmolloy edited edge metadata.

Sorry Simon,

LGTM.

This revision is now accepted and ready to land.Jul 22 2016, 1:17 AM
This revision was automatically updated to reflect the committed changes.