This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting.
ClosedPublic

Authored by aymanmus on Nov 17 2016, 8:46 AM.

Details

Summary

Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern.
For example:
"build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>"

This would decrease the code size and the size of chunks loaded from constant pool (less cache line splits and forwarding problems).

Diff Detail

Repository
rL LLVM

Event Timeline

aymanmus updated this revision to Diff 78369.Nov 17 2016, 8:46 AM
aymanmus retitled this revision from to [X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting..
aymanmus updated this object.
aymanmus added reviewers: igorb, delena, zvi.
aymanmus added a subscriber: llvm-commits.
RKSimon added a subscriber: RKSimon.

Thanks for looking at this - there is a lot of potential here for constant pool savings - sign/zero extension also come to mind.

I'm worried about always enabling this - your tests break a lot of memory folds which could cause register pressure issues, so we'd trade a small constant pool saving for increased instruction counts/complexity and stack usage.

I tried to detail this here https://llvm.org/bugs/show_bug.cgi?id=27141#c5

test/CodeGen/X86/avx-logic.ll
3 ↗(On Diff #78369)

If you add --check-prefix=CHECK as the first FileCheck then many of these test should collapse.

aymanmus updated this revision to Diff 78718.Nov 21 2016, 7:23 AM

Adding "hasOneUse()" as an additional condition to NOT applying the replacement, to avoid register pressure.

A follow up to this work may be considered:

  • Mark broadcast instruction as "isReMaterializable" to reduce spills and fills

I ran internal benchmarks and got a nice performance gain (geomean 0.2%-1.5% with tests improving up to 2.5%).

zvi edited edge metadata.Nov 27 2016, 7:11 AM

Ayman, the tests don't check the constant pool entries, due to how the update-llc-check tool works, but can you please verify that the pool entries for the broadcasted values have the required alignment (and not more than needed)?

spatel edited edge metadata.Nov 27 2016, 3:37 PM
In D26802#606196, @zvi wrote:

Ayman, the tests don't check the constant pool entries, due to how the update-llc-check tool works, but can you please verify that the pool entries for the broadcasted values have the required alignment (and not more than needed)?

There's an example of extra constant checking in test/CodeGen/X86/copysign-constant-magnitude.ll that might be useful.

Also, please clang-format the patch; there's a lot of non-standard spacing in the new code.

aymanmus updated this revision to Diff 79394.Nov 28 2016, 5:27 AM
aymanmus edited edge metadata.
  • Added checks also to constant pool entries.
  • 'Clang-format'ed the added code.
spatel added inline comments.Nov 28 2016, 6:44 AM
lib/Target/X86/X86ISelLowering.cpp
6107–6108 ↗(On Diff #79394)

If you cast the SplatValue to an APFloat, can you expand the target check to 'hasAVX()' and remove the is64Bit() restriction?

aymanmus added inline comments.Nov 28 2016, 7:28 AM
lib/Target/X86/X86ISelLowering.cpp
6107–6108 ↗(On Diff #79394)

AVX have no 16bit broadcast (float or int), so we will miss the cases where splatValue is of size 16.

spatel added inline comments.Nov 28 2016, 9:53 AM
lib/Target/X86/X86ISelLowering.cpp
6107–6108 ↗(On Diff #79394)

Ok, but this check is artificially limiting the 32-bit and 64-bit cases from firing on an AVX target and the 64-bit case from firing on a 32-bit target, right? Can you allow those cases or add a FIXME comment to allow it in a follow-up patch?

RKSimon added inline comments.Dec 1 2016, 5:31 AM
lib/Target/X86/X86ISelLowering.cpp
6037 ↗(On Diff #79394)

This can be reduced to:
APInt Val = SplatValue.lshr(ScalarSize * i).trunc(ScalarSize);

RKSimon edited edge metadata.Dec 2 2016, 6:14 AM

Not asking you to do this, but it seems like it'd be useful to add support for BROADCAST / SUBV_BROADCAST of constant pool data to X86AsmPrinter::EmitInstruction assembly comments.

aymanmus updated this revision to Diff 80202.Dec 4 2016, 2:07 AM
aymanmus edited edge metadata.
  • Broadcast 32/64/128 bits also when having only AVX feature.
  • Broadcast 64 bit also in 32-bit systems.
spatel accepted this revision.Dec 5 2016, 9:12 AM
spatel edited edge metadata.

LGTM - see inline comments for some nits.

lib/Target/X86/X86ISelLowering.cpp
6336 ↗(On Diff #80202)

The function name confused me because the bitcast is optional, right?
Would it make more sense to just call this 'isUseOfShuffle()'?

6337 ↗(On Diff #80202)

Use range-for-loop?

for (auto *U : N->uses())
6383 ↗(On Diff #80202)

its an -> it's a

6389 ↗(On Diff #80202)

'LLVMContext' is more commonly abbreviated as 'Ctx' to make it more distinguishable from abbreviations of 'Constant'.

6413–6414 ↗(On Diff #80202)

I'd hoist these last 2 lines of the comment above to where you check if Subtarget.hasAVX().

This revision is now accepted and ready to land.Dec 5 2016, 9:12 AM
aymanmus marked 4 inline comments as done.Dec 6 2016, 12:57 AM
aymanmus added inline comments.
lib/Target/X86/X86ISelLowering.cpp
6413–6414 ↗(On Diff #80202)

This comment explains the specific "else block" it is contained in (explains why we use floats instead of integers in these specific cases).

This revision was automatically updated to reflect the committed changes.