This is an archive of the discontinued LLVM Phabricator instance.

[X86] Recognize constant arrays with special values and replace loads from it with subtract and shift instructions, which then will be replaced by X86 BZHI machine instruction.
ClosedPublic

Authored by aymanmus on Jun 13 2017, 5:07 AM.

Details

Summary

Recognize constant arrays with the following values:

0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, .... , 2^(size - 1) -1

where size is the size of the array.

the result of a load with index idx from this array is equivalent to the result of the following:

(0xFFFFFFFF >> (sub 32, idx))             (assuming the array of type 32-bit integer).

And the result of an 'AND' operation on the returned value of such a load and another input, is exactly equivalent to the X86 BZHI instruction behavior.

See test cases in the LIT test for better understanding.

Diff Detail

Event Timeline

aymanmus created this revision.Jun 13 2017, 5:07 AM
aymanmus updated this revision to Diff 102318.Jun 13 2017, 5:11 AM

Apply clang-format on changes

aymanmus edited the summary of this revision. (Show Details)Jun 13 2017, 5:12 AM
spatel added a subscriber: spatel.Jun 13 2017, 7:41 AM
delena edited edge metadata.Jun 15 2017, 1:50 AM

I suggest to put this transformation in a separate pass. The pass should be added to X86 IR passes ,see X86PassConfig::addIRPasses().
Other targets, interested in this pass, may add it as well.

aymanmus updated this revision to Diff 103348.Jun 21 2017, 4:29 AM

Moving the transformation to an independent pass, which currently runs only on X86, as suggested by @delena.

RKSimon edited edge metadata.Jun 21 2017, 5:49 AM
RKSimon added subscribers: filcab, davide.

While I don't want to end up forcing you to over-engineer this, it seems this pass is currently very specific and should be generalised to make it straightforward to work for constants arrays of other patterns that are easily re-materializable (upper/lower masks, single bit masks, etc.) - "SubstituteLoadWithRematerializationPass", possibly driven by the TTI cost models?

lib/CodeGen/SubstituteLoadWithSubShr.cpp
70 ↗(On Diff #103348)

(style) Don't nest these - do an early-out if they are null

146 ↗(On Diff #103348)

(style) Remove all the extra braces.

While I don't want to end up forcing you to over-engineer this, it seems this pass is currently very specific and should be generalised to make it straightforward to work for constants arrays of other patterns that are easily re-materializable (upper/lower masks, single bit masks, etc.) - "SubstituteLoadWithRematerializationPass", possibly driven by the TTI cost models?

I agree that the name of the pass should be more generic.
Do you mean renaming by "generalization" ? Or you propose to add this transformation for all targets and ask TTI about profitability of this transformation?

While I don't want to end up forcing you to over-engineer this, it seems this pass is currently very specific and should be generalised to make it straightforward to work for constants arrays of other patterns that are easily re-materializable (upper/lower masks, single bit masks, etc.) - "SubstituteLoadWithRematerializationPass", possibly driven by the TTI cost models?

I agree that the name of the pass should be more generic.
Do you mean renaming by "generalization" ? Or you propose to add this transformation for all targets and ask TTI about profitability of this transformation?

By generalization I mean setup the pass so that other patterns can be easily added in the future (e.g. 0xFF, 0xFE, ..., 0x80, 0x00 high 'sign' masks are common too).

I think its OK to keep this as X86 only initially, but it shouldn't be BMI2 only - checking the cost of a sub + shift compared to a constant pool load should work alright (even better if PR31274 ever gets addressed).

include/llvm/CodeGen/Passes.h
423 ↗(On Diff #103348)

Comment describing the pass.

lib/CodeGen/SubstituteLoadWithSubShr.cpp
86 ↗(On Diff #103348)

dyn_cast<> can fail - add a check for null:

if (!Elem || Elem->getZExtValue() != (((uint64_t)1 << i) - 1))
102 ↗(On Diff #103348)

Duplicate code - pull out

unsigned EltSizeInBits = ElemTy->getScalarSizeInBits();
APInt ElemSize = APInt(EltSizeInBits, EltSizeInBits);
test/CodeGen/X86/replace-load-with-sub-shr.ll
2 ↗(On Diff #103348)

Test without bmi2 as well - the pass currently runs on that too.

aymanmus updated this revision to Diff 115575.Sep 17 2017, 8:17 AM
aymanmus retitled this revision from [X86] Recognize constant arrays with special values and replace loads from it with subtract and shift instructions, which then may be replaced by BZHI machine instruction. to [X86] Recognize constant arrays with special values and replace loads from it with subtract and shift instructions, which then will be replaced by X86 BZHI machine instruction..
aymanmus edited the summary of this revision. (Show Details)

Total redesign to the previous solution.
moved the transformation to a more "natural" area.

aymanmus updated this revision to Diff 115626.Sep 18 2017, 5:31 AM

Kind ping #2

Please submit the same test cases before optimization.

aymanmus updated this revision to Diff 119825.Oct 23 2017, 3:30 AM
m_zuckerman added inline comments.Oct 29 2017, 10:12 AM
lib/Target/X86/X86ISelLowering.cpp
32204

align the line

32227

Can you explain why the element must to be a global variable?

32232
  1. please clarify this if (add comments)
  2. Why you need also the isa<ConstantDataArray>(Init) ,is it not the same as GV->isConstant()?
32254

What about 64 bit?

32258

You can write
SDValue Inp = Node->getOperand(1-i);

aymanmus marked an inline comment as done.Nov 13 2017, 3:50 AM
aymanmus added inline comments.
lib/Target/X86/X86ISelLowering.cpp
32227

Clang will always hoist a local constant array to be global in LLVM IR.

32232

The second condition makes sure that it is a constant data *array*. GV->isConstant does not.

32254

It's only an example for the 32-bit case, the same happens for 64 as well.

LGTM with no one has nothing to add !!!

m_zuckerman accepted this revision.Dec 12 2017, 3:37 AM
This revision is now accepted and ready to land.Dec 12 2017, 3:37 AM
This revision was automatically updated to reflect the committed changes.