This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPULowerModuleLDSPass] Kernels do not always have entries in KernelToReplacement map
AbandonedPublic

Authored by jmmartinez on Jul 11 2023, 3:52 AM.

Details

Summary

The crash was a result of the accesses in lines 512 and 1176. If the kernel was not present in the map, a default initialized entry was created.

Fixes SWDEV-404491

Diff Detail

Event Timeline

jmmartinez created this revision.Jul 11 2023, 3:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2023, 3:52 AM
jmmartinez requested review of this revision.Jul 11 2023, 3:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2023, 3:52 AM
jmmartinez edited the summary of this revision. (Show Details)Jul 11 2023, 4:01 AM

Yes, seems like the same bug

Thanks for the patch! On reading through I think there's something more fundamentally broken here. Please could you hold fire on this for a moment while I dig through it?

There's some refactoring in line which seems worth having asap - the extra const, the spurious copy deleted - would you prefer I post a diff with those parts pulled out or you post one for me to accept?

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
486–487

I don't follow this chunk. Previously count != 0 followed by lookup. New find != end followed by using the previous value. Is there a functional change I'm missing here?

1178–1179

Nervous about this. There's a bunch of equality relationships between structures in this pass and the IR generated, probably not adequately documented.

Some number of kernels get identified as needing LDS related stuff and those get put in the OrderedKernels vector. There's an intrinsic which amounts to an index into that vector which persists beyond this pass. If KernelToReplacement is missing an entry for one of those kernels, I've botched the control flow elsewhere in this function.

Thanks for the patch! On reading through I think there's something more fundamentally broken here. Please could you hold fire on this for a moment while I dig through it?
There's some refactoring in line which seems worth having asap - the extra const, the spurious copy deleted - would you prefer I post a diff with those parts pulled out or you post one for me to accept?

No problem! I will move it into a separate patch.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
486–487

I'll move it out in another patch. It is a consequence of adding the const.

With const, we cannot use operator[] since it inserts a default initialized element if none exist (the count was guarding against it).

1178–1179

If I'm not wrong, this happens since lowerKernelScopeStructVariables doesn't add replacements for Dynamic LDS variables.

Kernels accesing only Dynamic LDS still need an id for the lookup table., even if they don't have any static LDS kernel variable.

(I hope I'm not mixing concepts)

OK, I think I've got the root cause.

The slow path lowering for normal variables is a table lookup indexed by kernel id, which is an integer assigned somewhat arbitrarily and later made available through the lds intrinsic (involves dedicating a sgpr to the i32 value).

The lowering for dynamic variables is a table lookup indexed by the same kernel id because I didn't think it was worth burning a second sgpr on it.

The pass successfully assigns integers to kernels that use either of these variables. assignLDSKernelIDToEachKernel takes two sets, combines them, sorts the result.

Where it then goes wrong is the lookup table construction assumes every kernel with an index will have a corresponding element, which was true before dynamic variable lowering was introduced. Now however, if a kernel uses dynamic variables and no table lookup static ones, it'll have an assigned index but the corresponding table entry needs to be poison.

Lowerdynamicldsvariables handles this correctly because it was written at the same time as breaking that invariant.

Right fix is going to involve poison at some positions in the lookup table, probably similarly to buildLookupTable. Likely to look like a hybrid of this patch and D154972.

arsenm requested changes to this revision.Jul 11 2023, 5:56 PM

I assume this is redundant now?

This revision now requires changes to proceed.Jul 11 2023, 5:56 PM
jmmartinez abandoned this revision.Jul 12 2023, 12:47 AM
jmmartinez marked an inline comment as done.