This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
ClosedPublic

Authored by RKSimon on May 14 2023, 8:20 AM.

Details

Summary

This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine if it can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed by type-width mismatches or mulituse limitations which have been removed in later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This is patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.

Diff Detail

Event Timeline

RKSimon created this revision.May 14 2023, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2023, 8:20 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon requested review of this revision.May 14 2023, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2023, 8:20 AM
goldstein.w.n added inline comments.May 14 2023, 8:55 AM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
39

Since this is AVX512 only, maybe it should be "X86 Fixup Vector AVX512 Constants"? Likewise for the filename.

266

Is it only for logic ops? Why can't we do something like add, sub, etc... iff we can rebuildSplatableConstant with the correct BitWidth?

268

Can we convert rm ops that aren't in evex encoding? Or since is that dangerous in some way?

371

Why no attempt for OpBcst256/128/16/8 on any of these? If they are truly unused can we remove this to kill the deadcode?

373

do all the OpBcst\d+ opcodes not always match? Seems to be the same for all of them. If so I think it would be alot simpler to try all OpBcst sizes < Native_BitWidth. You could have an overload if you have want to support only checking a specific width later.

RKSimon added inline comments.May 14 2023, 9:19 AM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
39

No, as I said in the summary, the intention is to expand this to handle broadcast/extensions in future patches.

266

The broadcast width matches the element width, we can't replace a VPADDD with a VPADDQ - but bitwise ops don't care (as long as we're not using the predicate mask which we aren't for these particular cases).

268

This pass is before the X86EvexToVex pass so we shouldn't encounter such cases.

371

There are no OpBcst256/128/16/8 equivalent VPTERNLOG ops - we'd have to pull out the broadcast into a separate op, which isn't what we're after.

I can remove the deadcode if you think it important, but it will be readded in a follow up patch very soon.

373

I had wondered about trying to handle this inside lookupBroadcastFoldTable but I wasn't sure if it'd get confusing to have a memop for one element width returning a bcstop for another element width.

goldstein.w.n added inline comments.May 14 2023, 10:48 AM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
373

Think it would be clearer. Having 7 arguments to a function, where most of them are magic numbers to a degree isn't particularly clear either. Also seems less bug prone (could imagine easily accidentally do ConvertToBroadcast(0, X86::Opcode, ...) instead of ConvertToBroadcast(0, 0, X86::Opcode, ...).

RKSimon updated this revision to Diff 522021.May 14 2023, 1:39 PM

Move bit logic broadcasts into broadcast fold tables

Add additional custom broadcast fold tables with element size mismatch

Will it increase compile time much?

llvm/lib/Target/X86/X86.h
67

a pass

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
105

Should check isBFloatTy too?

201

SclTy->is16bitFPTy()

219

Can we use if (!SclTy->isIntegerTy()) so that we can unify and put them in one loop?

274

Do we have chance to pass one more OpBcstN at the same time? Otherwise, we can simply pass BitWidth and a single OpBcst.

296

Why not MachineInstr &MI : MBB?

llvm/lib/Target/X86/X86InstrFoldTables.cpp
37

Missing X86::VANDNPSZrr

RKSimon added inline comments.May 18 2023, 7:47 AM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
219

ConstantDataVector get/getFP helpers use the width of the raw bits to help determine the type so we wouldn't be able to merge the loops.

274

I'll see if I can refactor the loop so that we only have a single call to ConvertToBroadcast

We're going to need this layout of ConvertToBroadcast for followup patches which adds some of the basic load -> broadcast support from D150143, so I'd prefer not to change it unless you think it absolutely necessary?

llvm/lib/Target/X86/X86InstrFoldTables.cpp
37

Good catch!

RKSimon updated this revision to Diff 523398.May 18 2023, 8:59 AM

Address comments from @pengfei

Will it increase compile time much?

Iterating the the basic blocks is very cheap, the key is to avoid extracting the constant raw bits data too frequently - I think I've addressed this by pulling out the getConstantFromPool call, but only trying to find a splat (and generate the new constant) if we have a suitable opcode.

The cost of the lookupBroadcastFoldTable calls are reduced by sorting the tables on first use (as we already do for the other folding tables) and using a lower_bound search.

goldstein.w.n added inline comments.May 18 2023, 10:36 AM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
130

Do we need a check that Bits->getBitWidth() >= SplatBitWidth?

232

Is there a way to check that MI is VEC type? If so maybe check that too before going to the foldingtable lookup?

RKSimon added inline comments.May 18 2023, 10:45 AM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
130

Adding an assert at the top would probably be enough, checking C->getPrimitiveSizeInBits % SplatBitWidth == 0.

232

Yes, we can probably do a check using the MCInstrDesc flags

RKSimon updated this revision to Diff 523529.May 18 2023, 1:08 PM

Address @goldstein.w.n suggestsions

LuoYuanke added inline comments.May 18 2023, 7:17 PM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
78

I'm not quite familar with MachineConstantPoolValue, so I don't understand why we should bail for MachineConstantPoolEntry. What's the difference between MachineConstantPoolValue and Constant?

128

It looks to me isSplatableConstant should return a bool value, but it return APInt value. Does it make sense to rename it to getSplatableConstant?

RKSimon added inline comments.May 19 2023, 2:04 AM
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
78

MachineConstantPoolValue are an abstract class that are difficult to extract any data from as its not supposed to be manipulated. Constant entries however, which X86 nearly always uses, are trivial to work with.

class MachineConstantPoolEntry {
public:
  union {
    const Constant *ConstVal;
    MachineConstantPoolValue *MachineCPVal;
  } Val;
RKSimon updated this revision to Diff 523699.May 19 2023, 2:19 AM

Replace isSplatableConstant with getSplatableConstant

Thanks @LuoYuanke - any more comments?

Thanks @LuoYuanke - any more comments?

No more comments from me.

pengfei accepted this revision.May 19 2023, 8:33 AM

LGTM.

This revision is now accepted and ready to land.May 19 2023, 8:33 AM
Matt added a subscriber: Matt.May 22 2023, 2:31 PM
This revision was landed with ongoing or failed builds.May 23 2023, 3:01 AM
This revision was automatically updated to reflect the committed changes.