Page MenuHomePhabricator

[AMDGPU][WIP] Lower LDS Global Variables.
Needs RevisionPublic

Authored by hsmhsm on Nov 15 2020, 11:09 PM.

Diff Detail

Unit TestsFailed

TimeTest
550 msx64 debian > Clang.Misc::backend-resource-limit-diagnostics.cl
Script: -- : 'RUN: at line 2'; not /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/13.0.0/include -nostdsysteminc -emit-codegen-only -triple=amdgcn-- /mnt/disks/ssd0/agent/llvm-project/clang/test/Misc/backend-resource-limit-diagnostics.cl 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/Misc/backend-resource-limit-diagnostics.cl
300 msx64 debian > LLVM.CodeGen/AMDGPU::32-bit-local-address-space.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/32-bit-local-address-space.ll | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefixes=SI,FUNC,GFX7 /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/32-bit-local-address-space.ll
290 msx64 debian > LLVM.CodeGen/AMDGPU::amdpal-callable.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=amdgcn--amdpal -verify-machineinstrs < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefixes=GCN,SDAG -enable-var-scope /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
290 msx64 debian > LLVM.CodeGen/AMDGPU::atomic_optimizations_local_pointer.ll
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -enable-var-scope -check-prefixes=GFX7LESS /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
300 msx64 debian > LLVM.CodeGen/AMDGPU::divergence-at-use.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -march=amdgcn -mcpu=gfx900 -amdgpu-atomic-optimizations=true < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/divergence-at-use.ll | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/divergence-at-use.ll
View Full Test Results (120 Failed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
hsmhsm updated this revision to Diff 318985.Jan 25 2021, 6:19 AM

Make use of llvm append_range() api.

hsmhsm updated this revision to Diff 319026.Jan 25 2021, 8:39 AM

Improvements to code at few places.

hsmhsm edited the summary of this revision. (Show Details)Jan 25 2021, 9:01 AM
hsmhsm edited the summary of this revision. (Show Details)

All tests are now missing

llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLocalLDS.cpp
32 ↗(On Diff #319026)

This function is pointless, just directly use isModuleEntryFunctionCC

41–44 ↗(On Diff #319026)

This function is useless. Assert strings also don't need to end in \n

56 ↗(On Diff #319026)

cast<>, don't dyn_cast and assert

66 ↗(On Diff #319026)

Pointless comment

68 ↗(On Diff #319026)

Pointless comment

74–75 ↗(On Diff #319026)

Pointless comment

82 ↗(On Diff #319026)

Extra private

92 ↗(On Diff #319026)

Extra private

93 ↗(On Diff #319026)

Typo unhanlded

111 ↗(On Diff #319026)

.contains

142 ↗(On Diff #319026)

Extra private

233 ↗(On Diff #319026)

Copy of set unnecessary

320 ↗(On Diff #319026)

Don't need all these newlines in assert strings

376–379 ↗(On Diff #319026)

I think you're overcomplicating the CallGraph usage by ignoring most of what it gives you. You should be able to just iterate directly through the CallGraph to get functions reachable from the parent

410–411 ↗(On Diff #319026)

This concept doesn't quite work for the IR. The same global can appear in multiple functions

449 ↗(On Diff #319026)

isa<>, no \n

479 ↗(On Diff #319026)

Should not be checking the function name. Should just skip all declarations

492–495 ↗(On Diff #319026)

Return !Kernels.empty()

502 ↗(On Diff #319026)

Probably should skip declarations. Also not sure about the linkage check

506–509 ↗(On Diff #319026)

Return !empty()

555–558 ↗(On Diff #319026)

Return !empty()

561–571 ↗(On Diff #319026)

Don't understand the point of this stub function

hsmhsm updated this revision to Diff 319316.Jan 26 2021, 8:55 AM

Fixed review comments (by Matt).

hsmhsm marked 22 inline comments as done.Jan 26 2021, 9:12 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLocalLDS.cpp
111 ↗(On Diff #319026)

The data structures, ValueMap<>, SmallPtrSet<> do not have member function - .contains(). W.r.t std::set<>, this member function is supported in C++20.

376–379 ↗(On Diff #319026)

As far as I understand it, llvm CallGraph infrastructure does not provide any facility as such. Implementer needs to explicitly iterate the callees of the caller.

410–411 ↗(On Diff #319026)

My understanding is that - scope of the shared variable is function/statement block scope. It is not available to access outside this scope. It is just that we implement it as global, just like how the local static variables are implemented in C/C++?

Can you give an example of the use-case that you are claiming?

502 ↗(On Diff #319026)

The linkage test is required to ignore the dynamic shared variables like the one defined as extern __shared__ int dy_sm[]; where size of dy_sm is not known at compile time, but is passed as one of the kernel execution configuration parameters at run time.

561–571 ↗(On Diff #319026)

This is a driver function, it looks like a stub now, since implementation is not complete yet. Once this patch is accepted, next step is to (1) define kernel specific LDS layouts (2) create 2D offset table and (3) add new implicit argument.

arsenm added inline comments.Jan 26 2021, 7:34 PM
llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLocalLDS.cpp
125–128 ↗(On Diff #319316)

Should just inline this function

168 ↗(On Diff #319316)

Should just inline this function

182 ↗(On Diff #319316)

Should just inline this function

205 ↗(On Diff #319316)

Copy here, just directly use this in the for loop

111 ↗(On Diff #319026)

There is an llvm::is_contained. Also why use std::set? You randomly switch set types around here

410–411 ↗(On Diff #319026)

The IR has absolutely no concept of these scopes. The global variables have global scope and no restriction on where their uses can appear. Whether or not this directly corresponds to a direct language feature is unimportant. Some IPO transforms can push global variable references into other functions.

The example is just two functions that refer to the same variable:

@lds = ...

define void @func0() {
  store i32 0, i32* @lds
  ret void
}

define void @func1() {
  store i32 0, i32* @lds
  ret void
}
502 ↗(On Diff #319026)

That's a function of it having 0 size, not the linkage

561–571 ↗(On Diff #319026)

If it's going to be split, I'd rather see the full stack for the review

arsenm added inline comments.Jan 26 2021, 7:45 PM
llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLocalLDS.cpp
376–379 ↗(On Diff #319026)

The CallGraph as a whole gives you the functions reachable from each other. I don't think you need to do a stack walk to find the callees. You don't need to care about which functions specifically call which, just that they are all connected

hsmhsm updated this revision to Diff 320037.Jan 28 2021, 8:36 PM
hsmhsm marked 5 inline comments as done.

Pushing renamed code (not for review).

hsmhsm updated this revision to Diff 320090.Jan 29 2021, 3:10 AM

Fixed Matt's comments.

hsmhsm retitled this revision from [AMDGPU][WIP] Lower Function Local LDS Variables. to [AMDGPU][WIP] Lower LDS Global Variables..Jan 29 2021, 3:11 AM
hsmhsm edited the summary of this revision. (Show Details)Tue, Feb 2, 7:59 PM
hsmhsm updated this revision to Diff 322270.Mon, Feb 8, 7:32 PM

Save current work.

hsmhsm updated this revision to Diff 323247.Fri, Feb 12, 1:31 AM

Save current work.

Could we get some tests *and* a commit message that explains what this is supposed to do.

arsenm added inline comments.Fri, Feb 12, 8:36 AM
llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp
312

You should only need to do the use replacement, you aren't changing the types of the instructions so cloning/hacking on them shouldn't be needed

447–449

Don't need this, the IR would have failed the verifier to get here

468–469

Should not const_cast

518–519

Too much auto for me

554

Should use lowercase, period separator naming convention with an llvm.amdgcn prefix

566

Should use lowercase, period separator naming convention with an llvm.amdgcn prefix

609

This still needs to add in alignment padding

669

Braces, Also can use range loop

743

I don't see why you need to build your own stack. The call graph already found the reachable functions for you

777

I think trying to handle callees is left for a later patch. Additionally, I think this should be the CallGraph analysis's responsibility to deal with

855–858

The callgraph should already give this to you. Iterating the call graph should give you all of the functions you care about. You don't actually need to worry about which functions call which, since you need to touch every function in the SCC

987–988

I would swap the order of these checks

Could we get some tests *and* a commit message that explains what this is supposed to do.

This is WIP, will add test and commit messages at the end before final review.

hsmhsm updated this revision to Diff 323514.Fri, Feb 12, 8:34 PM

Address Matt's comments.

hsmhsm marked 7 inline comments as done.Fri, Feb 12, 8:38 PM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp
312

Added FIXME comment, will see how to fix it.

609

Added FIXME comment, will see how to fix it.

743

Added FIXME comment, will see how to fix it.

777

Added FIXME comment, will see how to fix it.

855–858

Added FIXME comment, will see how to fix it.

hsmhsm updated this revision to Diff 323518.Fri, Feb 12, 9:30 PM

Assert that both caller and callee appear in same translation unit.

hsmhsm updated this revision to Diff 323530.Sat, Feb 13, 12:17 AM

Address clang-tidy warnings.

hsmhsm updated this revision to Diff 323532.Sat, Feb 13, 12:31 AM

Remove over created auto variables.

jdoerfert requested changes to this revision.Sat, Feb 13, 5:10 PM

Could we get some tests *and* a commit message that explains what this is supposed to do.

This is WIP, will add test and commit messages at the end before final review.

I generally would recommend against that but I guess you can use phabricator this way.
However, once people can figure out what this is actually supposed to do, they might effectively restart the entire review process if the design is questioned.
I say this because I have a hunch, or maybe a hope, about the intent of this patch. If it would be that, I'd very much like this to be a generic, non-AMDGPU pass. I might be wrong about what this does, and that is what I'd like to figure out rather sooner than later.

This revision now requires changes to proceed.Sat, Feb 13, 5:10 PM

Could we get some tests *and* a commit message that explains what this is supposed to do.

This is WIP, will add test and commit messages at the end before final review.

I generally would recommend against that but I guess you can use phabricator this way.
However, once people can figure out what this is actually supposed to do, they might effectively restart the entire review process if the design is questioned.
I say this because I have a hunch, or maybe a hope, about the intent of this patch. If it would be that, I'd very much like this to be a generic, non-AMDGPU pass. I might be wrong about what this does, and that is what I'd like to figure out rather sooner than later.

OK let's wait for the complete patch then. I will only going to push complete patch next time. But, it may take some time since there are some major hurdles to overcome.