This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Reimplement LDS lowering
AbandonedPublic

Authored by JonChesterfield on Nov 16 2022, 9:26 AM.

Details

Summary

Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
JonChesterfield requested review of this revision.Nov 16 2022, 9:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2022, 9:26 AM
  • remove if (verbose) debugging hook

This could (and should) be prettier, and there should be lit tests for the (gated behind the default =module, so inactive here) table lowering pass. This has been through epsdb in the hybrid mode and set up to force all lowerings through the table lookup so there's a credible chance it works correctly.

One path forward is:

  • land this in the current somewhat unpolished state and possibly toggle the default in rocm to hybrid immediately to unblock the inlining experimentation
  • a follow up patch changes the default to hybrid along with a bunch of lit tests showing how the table lookup behaves
  • I refactor this implementation from behind more paranoid test coverage, in particular to walk the functions list less often

Alternatively I'll iterate on it in phab until the inessential complexity is reduced enough. It's a bigger patch than I'd have liked, but at least the complicated backend lowering landed separately in D125060. This is mostly an IR transform.

scchan added a subscriber: scchan.Nov 16 2022, 9:41 AM
scchan added a subscriber: piotr.Nov 16 2022, 9:46 AM

Structure lends itself relatively well to refactoring the current lowering as "module", then adding kernel and table as separate patches, then hybrid as a final one. That's significant work for the author and does little to de-risk this, but would make each review easier to do.

I think we need to get some documentation explaining the LDS lowering flow and the options. Not sure where the best place for it; I guess AMDGPUUsage?

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
87

I think you're supposed to use StringLiteral these days for some reason

98

Don't see the point of this string copy, GV.getName() gives you a StringRef to begin with

126

Capitalize

128

Why not just ==?

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
111–112

These aren't really gaining anything from being part of AMDGPUMachineFunction, but I guess it's where the other stuff already lives

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
6015

Could just early return on each of these instead of or'ing

arsenm added inline comments.Nov 16 2022, 10:48 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
14–49

I somehow missed this giant comment in the first pass. Perhaps this should move to AMDGPUUsage

161–190

You're re-inventing cl::opt with clEnumVal

419–420

Comment the bool operand meanings

612

Don't need llvm:: (not sure why this seems to be the most popular function for people to put it on)

642

Can you add an assert somewhere that the maximum LDS size fits in 16-bits? I'm also wondering if we should just make LDS pointers 16-bit with 4 byte alignment, or add a 16-bit address space for this

654

Early return and reduce indentation

680

Use new style member initializers

987–993

This whole thing is just return isa<Instruction>(U.getUser())

1037

Don't need llvm::

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
171

Is this actually reachable?

nlopes added inline comments.Nov 16 2022, 1:33 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
534

Please use PoisonValue here if possible. We are trying to remove undef values, and thus we want to avoid adding new uses that can be easily replaced with poison.
Thank you!

arsenm requested changes to this revision.Nov 16 2022, 3:55 PM
This revision now requires changes to proceed.Nov 16 2022, 3:55 PM
  • Initial tests
  • Add test for lowering via table
JonChesterfield marked 12 inline comments as done.
  • addressing easier comments from Matt
JonChesterfield marked an inline comment as done.Nov 19 2022, 10:15 AM

Was sidetracked by trying to work out why global-variable-relocs.ll and waitcnt-looptest.ll have started failing, but neither do anything with LDS (no addrspace(3) vars plus lowering pass makes no change to the IR), and rolling back to main doesn't fix them. Buildbot is green so that's weird, attributing to local quirk / non-determinism.

Comments responded to above, lit test coverage could still be better.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
161–190

So I was. Interesting. Changed to use clEnumValN (to preserve the namespacing at the use sites).

Sadly the test case lds-reject-unknown-lowering-kind.ll - updated with the new error string - is now refusing to run under check-llvm despite running fine in a shell. That coverage seems less important when using a ready-made parser so I've deleted the test.

534

Whether this can be easily replaced with poison depends on how the rest of the toolchain deals with poison. The string replacement is fine, it's whether it then blows up under testing due to interaction with the rest of the compiler that encourages caution.

The semantics of poison fit this behaviour - if this value is read, the compiler has a bug or runtime behaviour has diverged from that compiler's expectation. I would therefore like to change from undef to poison after this lands, so that if it does fail testing we have a much easier time triaging.

612

Had it in a few other places too, on various types and on sort. Dropped all of them, then put it back on sort to resolve ambiguity with std::sort (which doesn't make masses of sense to me but will stay on task)

642

The backend currently errors on that in user-facing fashion. A check here will be slightly less accurate than in the back end because this runs before promote alloca puts more things into LDS and I'm not sure we have more information to provide a better report either - what sort of error message do you have in mind?

I suppose the current code path would truncate the offsets to i16 before the back end crashes, but that seems harmless. We'd have slightly cleaner tests with a native sixteen bit type but there's not much in it.

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
171

Definitely yes. It's hit when lowering the lookup table for the table fallback.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
128

iirc that did the wrong thing with a string literal, as opposed to a StringLiteral

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
111–112

I'd guess they're here because the dynamic lds alignment is tracked through the instance, but with that now mostly factored out they could be moved under some utility header.

arsenm requested changes to this revision.Nov 22 2022, 7:22 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
161–190

I'm not seeing this change here?

387

Don't need std::string, but this is also unused?

393

Don't need std::string, but this is also unused?

520

Type *I16

534

Just change to poison. Nothing we're doing in the backend is aware enough of the difference (currently poison just gets selected to the same implicit_def anyway, although that's kind of broken)

575

Not sure if this is just copying something existing, but I think it would be preferably to consistently use . as the word separator instead of mixing in _

642

I don't mean here, I mean somewhere. Promote alloca also shouldn't be pushing the usage up into something unsupportable

657

New line

791

llvm_unreachable

819–822

Probably should promote this to a DiagnosticInfoUnsupported

This revision now requires changes to proceed.Nov 22 2022, 7:22 AM
JonChesterfield marked an inline comment as done.
  • Change lowering kind flag to use clEnumValN
JonChesterfield marked 8 inline comments as done.
  • dead variables removed, undef to poison, rename offset_table to offset.table
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
575

OK, so the rest of the similar things are dot delimited, but I_write_in_snake_case by default so that's how this happens. Will rename it to llvm.amdgcn.lds.offset.table

657

Ambiguous comment, let me know if you meant something other than a newline between the declaration and the early return

  • Use i32 for table instead of i16, still need to drop the inttoptr
  • minor cleanup
JonChesterfield added a comment.EditedNov 23 2022, 3:53 PM

Something of a speed bump. https://reviews.llvm.org/D131246 breaks the table and hybrid implementation here. Fails check-openmp reliably. "illegal SGPR to VGPR copy".

Stack trace from a debug build, which oddly printed it twice. This is from the check-openmp test offloading_success.c which is the simplest one. save-temps unfortunately generates IR which doesn't trigger the same failure.

 #0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) $HOME/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
 #1 PrintStackTraceSignalHandler(void*) $HOME/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
 #2 llvm::sys::RunSignalHandlers() $HOME/llvm-project/llvm/lib/Support/Signals.cpp:102:5
 #3 SignalHandler(int) $HOME/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13140)
 #5 raise ./signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #6 abort ./stdlib/abort.c:81:7
 #7 llvm::report_fatal_error(llvm::Twine const&, bool) $HOME/llvm-project/llvm/lib/Support/ErrorHandling.cpp:125:5
 #8 $HOME/llvm-project/llvm/lib/Support/ErrorHandling.cpp:83:3
 #9 llvm::AMDGPUMCCodeEmitter::getBinaryCodeForInstr(llvm::MCInst const&, llvm::SmallVectorImpl<llvm::MCFixup>&, llvm::APInt&, llvm::APInt&, llvm::MCSubtargetInfo const&) const $HOME/llvm-build-debug/llvm/lib/Target/AMDGPU/AMDGPUGenMCCodeEmitter.inc:63586:10
#10 (anonymous namespace)::SIMCCodeEmitter::encodeInstruction(llvm::MCInst const&, llvm::raw_ostream&, llvm::SmallVectorImpl<llvm::MCFixup>&, llvm::MCSubtargetInfo const&) const $HOME/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp:321:3
#11 llvm::MCELFStreamer::emitInstToData(llvm::MCInst const&, llvm::MCSubtargetInfo const&) $HOME/llvm-project/llvm/lib/MC/MCELFStreamer.cpp:554:22
#12 llvm::MCObjectStreamer::emitInstructionImpl(llvm::MCInst const&, llvm::MCSubtargetInfo const&) $HOME/llvm-project/llvm/lib/MC/MCObjectStreamer.cpp:440:5
#13 llvm::MCObjectStreamer::emitInstruction(llvm::MCInst const&, llvm::MCSubtargetInfo const&) $HOME/llvm-project/llvm/lib/MC/MCObjectStreamer.cpp:419:3
#14 llvm::AsmPrinter::EmitToStreamer(llvm::MCStreamer&, llvm::MCInst const&) $HOME/llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:404:1
#15 llvm::AMDGPUAsmPrinter::emitInstruction(llvm::MachineInstr const*) $HOME/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:275:5
#16 llvm::AsmPrinter::emitFunctionBody() $HOME/llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1660:13
#17 llvm::AMDGPUAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) $HOME/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:515:28
#18 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) $HOME/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:91:8
#19 llvm::FPPassManager::runOnFunction(llvm::Function&) $HOME/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1430:23
#20 llvm::FPPassManager::runOnModule(llvm::Module&) $HOME/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1476:16
#21 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) $HOME/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1545:23
#22 llvm::legacy::PassManagerImpl::run(llvm::Module&) $HOME/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:535:16
#23 llvm::legacy::PassManager::run(llvm::Module&) $HOME/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1672:3
#24 codegen(llvm::lto::Config const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) $HOME/llvm-project/llvm/lib/LTO/LTOBackend.cpp:413:7
#25 llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) $HOME/llvm-project/llvm/lib/LTO/LTOBackend.cpp:510:5
#26 llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int)>) $HOME/llvm-project/llvm/lib/LTO/LTO.cpp:1144:13
#27 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int)>> (unsigned int, llvm::StringRef)>) $HOME/llvm-project/llvm/lib/LTO/LTO.cpp:1040:18
#28 (anonymous namespace)::linkBitcodeFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::SmallVectorImpl<llvm::StringRef>&, llvm::opt::ArgList const&) $HOME/llvm-project/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:861:19
#29 (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::opt::InputArgList const&) $HOME/llvm-project/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1140:15
#30 main $HOME/llvm-project/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1369:10
#31 __libc_start_main ./csu/../csu/libc-start.c:308:16
#32 _start ($HOME/llvm-build-debug/llvm/./bin/clang-linker-wrapper+0x457d3a)

I'm unclear whether this is a latent problem exposed by D131246 or a novel failure mode. Since this patch works as written - a benefit from defaulting to module - I'm hoping we can land this, then fix that lowering bug, then change the default to hybrid. Sound reasonable?

jmmartinez added inline comments.Nov 28 2022, 2:01 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
497

I might be wrong, but I think the copy-elision is done for the newly created pair, but not for the arguments used for create it.

I'd rather do:

return {std::move(direct_map_kernel), std::move(indirect_map_kernel)};

Without move: https://godbolt.org/z/f5Ghr8M1o
With move: https://godbolt.org/z/WeG7Y8nY8

511

Is there a missing const& for the Variables argument ?

537

Could you use a const& for kernels ?

775–778

In the case HybridModuleRoot is null, the reference HybridModuleRootKernels will point to a temporary that has already been destroyed.

1233

What do you think about taking an r-value reference for LDSVarsToTransformArg. That would avoid the copy into the local variable LDSVarsToTransform.

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
176–177

Cosmetic: There is a shorter ConstantInt::get that takes directly an APInt and an LLVMContext.

foad added inline comments.Nov 28 2022, 2:33 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
511

Isn't std::vector<GlobalVariable *> const & is just a long way of saying ArrayRef<GlobalVariable *>?

jmmartinez added inline comments.Nov 28 2022, 4:31 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
511

+1

JonChesterfield marked 14 inline comments as done.
  • addressing comments
arsenm added inline comments.Nov 29 2022, 4:11 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
522

SmallVector

554

Use NumberKernels from above?

555

Extra line

592

getFirstNonPHIOrDbgOrAlloca

606

Extra line

609

The constant expression case matters?

In particular instruction uses of addrspacecasted globals come up a lot. The constant initializer case is harder

JonChesterfield marked 6 inline comments as done.Dec 1 2022, 6:01 AM
  • address Matt's comments

@arsenm thanks, all addressed. Constantexpr were a recurring pain for this pass because their uniqueness interferes with the callgraph walking so are unconditionally rewritten into instructions at the top of run().

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
14–49

reluctant to document this in user facing fashion - there's a credible risk of it being improved in the future, and none of the flags here are intended to be used on a per application basis

497

sure

511

arrayref is good, thanks

537

got some mileage out of ArrayRef for these arguments, thanks

609

Constant expressions involving LDS variables are unconditionally expanded before this point by eliminateConstantExprUsesOfLDSFromAllInstructions.

Primarily because one constantexpr can be reachable from multiple kernels through instructions which are only reachable from distinct kernels, so eliminating them up front makes the analysis more precise as well as the rest of this pass much simpler.

Constant initializer is indeed harder and is presently out of scope - LDS variables have undef as their initialiser, and non-LDS variables have program scope which is broadly resistant to per-kernel-execution initialisation with the address of LDS variables.

An exception for that can be carved out for invariant LDS addresses (e.g. if we want an attribute to say a given global is always at offset 42, whatever kernel is involved), but we don't have a concept for that implemented yet beyond the emergent feature of the back end calculating offsets into a per-kernel struct.

642

changed it to i32 for now, can always back it back down to i16 with more error reporting later if we want to. Stopped short of dropping inttoptr as it's difficult to test with D131246 applied

775–778

Today I learned temporary lifetime extension doesn't work through the ternary operator. Sad times. Also using nullptr as the key into the map wins me a segfault, so gone with ugly workaround pending a better idea.

819–822

i think this is OK as is - it's an undocumented command line flag, where "kernel" is commented as only applicable in narrow cases - and the report_fatal_error path is otherwise only reachable on a bug in this pass

1233

this was refactoring noise, inlined replaceLDSVariablesWithStructVec into replaceLDSVariablesWithStruct to remove

arsenm accepted this revision.Dec 6 2022, 7:17 AM

LGTM with some nits

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
979

Capitalize

1271

Junk comment?

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
140

Capitalize

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
6007

Newlines

6013

More early return?

This revision is now accepted and ready to land.Dec 6 2022, 7:17 AM
  • remove obsolete comment
Herald added a reviewer: andreadb. · View Herald Transcript
Herald added a reviewer: shafik. · View Herald Transcript
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: herhut. · View Herald Transcript
Herald added a reviewer: rriddle. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: rafauler. · View Herald Transcript
Herald added a reviewer: Amir. · View Herald Transcript
Herald added a reviewer: maksfb. · View Herald Transcript
Herald added a reviewer: NoQ. · View Herald Transcript
Herald added a reviewer: ributzka. · View Herald Transcript
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a reviewer: njames93. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
This revision now requires review to proceed.Dec 6 2022, 7:25 AM
JonChesterfield removed projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project.
This revision is now accepted and ready to land.Dec 6 2022, 7:31 AM
Herald added a reviewer: andreadb. · View Herald Transcript
Herald added a reviewer: shafik. · View Herald Transcript
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: herhut. · View Herald Transcript
Herald added a reviewer: rriddle. · View Herald Transcript
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: rafauler. · View Herald Transcript
Herald added a reviewer: Amir. · View Herald Transcript
Herald added a reviewer: maksfb. · View Herald Transcript
Herald added a reviewer: NoQ. · View Herald Transcript
Herald added a reviewer: ributzka. · View Herald Transcript
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a reviewer: njames93. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
This revision now requires review to proceed.Dec 6 2022, 7:31 AM
JonChesterfield abandoned this revision.Dec 6 2022, 7:31 AM

I can't undo what arc has done here. Apologies for the mass spam.

philnik added a subscriber: philnik.Dec 6 2022, 7:40 AM

I can't undo what arc has done here. Apologies for the mass spam.

You have to update the diff first and then remove the people. Otherwise Herald will just re-add them.