Today

rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Removed unrelated subtarget change.

Fri, Oct 23, 12:04 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Rebased.

Fri, Oct 23, 11:36 AM · Restricted Project
rampitec committed rG2e64ad949487: [AMDGPU] Fixed isLegalRegOperand() with physregs (authored by rampitec).
[AMDGPU] Fixed isLegalRegOperand() with physregs
Fri, Oct 23, 11:34 AM
rampitec closed D90064: [AMDGPU] Fixed isLegalRegOperand() with physregs.
Fri, Oct 23, 11:33 AM · Restricted Project
rampitec requested review of D90064: [AMDGPU] Fixed isLegalRegOperand() with physregs.
Fri, Oct 23, 11:13 AM · Restricted Project
rampitec added inline comments to D89170: [AMDGPU] Use flat scratch instructions where available.
Fri, Oct 23, 10:56 AM · Restricted Project
rampitec added a comment to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

LGTM in principle. We have wanted to split CodeGenPrepare for a long time. We should also drop widening from the early pass then.

Fri, Oct 23, 10:42 AM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Moved predicates from complex patterns into td files.

Fri, Oct 23, 9:17 AM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Corrected IsOffsetLegal to remove negation.

Fri, Oct 23, 8:51 AM · Restricted Project
rampitec added a reviewer for D89978: Fix SROA with a PHI mergig values from a same block: efriedma.
Fri, Oct 23, 1:59 AM · Restricted Project

Yesterday

rampitec updated the diff for D89978: Fix SROA with a PHI mergig values from a same block.

Changed check to only allow unique blocks.

Thu, Oct 22, 11:35 PM · Restricted Project
rampitec accepted D89997: AMDGPU: Increase branch size estimate with offset bug.
Thu, Oct 22, 11:26 PM · Restricted Project
rampitec added inline comments to D89978: Fix SROA with a PHI mergig values from a same block.
Thu, Oct 22, 4:56 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Fixed issue with flat scratch not always being initialized. It was not initialized if we had no stack objects or calls, but later did spilling.
It is too late to insert system SGPRs at frame lowering, so initialize it always if flat scratch is used.

Thu, Oct 22, 3:25 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Fixed a need of SGPR spill during VGPR spilling on targets w/o flat scratch ST mode, reused existing code adjusting offsets.

Thu, Oct 22, 12:58 PM · Restricted Project
rampitec added a comment to D79218: Process gep (phi ptr1, ptr2) in SROA.

Hi!

https://bugs.llvm.org/show_bug.cgi?id=47945 started happening with this patch.

Thu, Oct 22, 11:26 AM · Restricted Project
rampitec requested review of D89978: Fix SROA with a PHI mergig values from a same block.
Thu, Oct 22, 11:25 AM · Restricted Project
rampitec accepted D89973: [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions..

LGTM

Thu, Oct 22, 10:41 AM · Restricted Project
rampitec added a comment to D89170: [AMDGPU] Use flat scratch instructions where available.

I also came to the conclusion that the only robust way to have no failed scavenging during frame lowering is to always have an sp or fp. Otherwise it can fail regardless of the spilling method. The only other way is to have an instruction with a full 32-bit immediate offset. I.e. it can fail in a kernel with MUBUF as well.

I was considering requiring an FP if the stack size was starting to hit the offset limit, but was unable to come up with a testcase where it would really break

Thu, Oct 22, 10:37 AM · Restricted Project

Wed, Oct 21

rampitec added a comment to D89170: [AMDGPU] Use flat scratch instructions where available.

I also came to the conclusion that the only robust way to have no failed scavenging during frame lowering is to always have an sp or fp. Otherwise it can fail regardless of the spilling method. The only other way is to have an instruction with a full 32-bit immediate offset. I.e. it can fail in a kernel with MUBUF as well.

Wed, Oct 21, 4:29 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.
  • Integrated spilling from child revision, child is dropped;
  • Fixed situation when an SGPR has to be spilled while scavenging in frame elimination;
Wed, Oct 21, 3:31 PM · Restricted Project
rampitec abandoned D89424: [AMDGPU] Spilling using flat scratch.

I am integrating it into parent revision.

Wed, Oct 21, 3:27 PM · Restricted Project
rampitec committed rG611959f004d7: [AMDGPU] Fixed v_swap_b32 match (authored by rampitec).
[AMDGPU] Fixed v_swap_b32 match
Wed, Oct 21, 10:14 AM
rampitec closed D89599: [AMDGPU] Fixed v_swap_b32 match.
Wed, Oct 21, 10:14 AM · Restricted Project
rampitec updated the diff for D89599: [AMDGPU] Fixed v_swap_b32 match.

Do not special-case implicit exec.

Wed, Oct 21, 9:25 AM · Restricted Project
rampitec accepted D89880: [AMDGPU] Reorder SIMemoryLegalizer functions to be consistent.
Wed, Oct 21, 9:11 AM · Restricted Project
rampitec accepted D89386: [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse..

LGTM

Wed, Oct 21, 9:10 AM · Restricted Project

Tue, Oct 20

rampitec accepted D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

LGTM. Thank you!

Tue, Oct 20, 3:35 PM · Restricted Project
rampitec updated the diff for D89599: [AMDGPU] Fixed v_swap_b32 match.

Preserve implicit operands.

Tue, Oct 20, 3:24 PM · Restricted Project
rampitec added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Tue, Oct 20, 2:31 PM · Restricted Project
rampitec added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Tue, Oct 20, 2:24 PM · Restricted Project
rampitec updated the diff for D89599: [AMDGPU] Fixed v_swap_b32 match.

Preserve implicit operands on the mov at insertion point.

Tue, Oct 20, 2:12 PM · Restricted Project
rampitec accepted D89738: [AMDGPU] Refactor SOPC & SOPP .td for extension.

LGTM. PSDB run is still desirable.

Tue, Oct 20, 12:56 PM · Restricted Project
rampitec updated the diff for D89424: [AMDGPU] Spilling using flat scratch.

Rebased.

Tue, Oct 20, 12:39 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.
  • Ensure flat scratch initialization;
  • Added asserts around scavenger calls until there is a better handling of failed scavenging;
Tue, Oct 20, 12:25 PM · Restricted Project
rampitec accepted D89753: [HazardRec] Allow inserting multiple wait-states simultaneously.

LGTM

Tue, Oct 20, 11:58 AM · Restricted Project
rampitec added inline comments to D89738: [AMDGPU] Refactor SOPC & SOPP .td for extension.
Tue, Oct 20, 11:56 AM · Restricted Project
rampitec added a comment to D89753: [HazardRec] Allow inserting multiple wait-states simultaneously.

Do we properly count number of pre-existing wait states if we have s_nop > 0?
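For context on the question above: `s_nop N` accounts for N + 1 wait states, so a count of pre-existing wait states has to read the immediate rather than treat every `s_nop` as one. Below is a minimal sketch of that counting, assuming a simplified instruction record. `Inst`, `waitStatesOf`, and `countExistingWaitStates` are hypothetical names for illustration, not the hazard recognizer's actual code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction record; not an LLVM type.
struct Inst {
  bool IsSNop;  // true if the instruction is s_nop
  unsigned Imm; // s_nop immediate operand (ignored for other instructions)
};

// s_nop N provides N + 1 wait states; in this sketch every other
// instruction is assumed to provide exactly one.
unsigned waitStatesOf(const Inst &I) { return I.IsSNop ? I.Imm + 1 : 1; }

// Sum the wait states already present in a straight-line sequence.
unsigned countExistingWaitStates(const std::vector<Inst> &Seq) {
  unsigned Total = 0;
  for (const Inst &I : Seq)
    Total += waitStatesOf(I);
  return Total;
}
```

With this model, the sequence `s_nop 3` followed by one ordinary instruction counts as five wait states, not two.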

Tue, Oct 20, 11:24 AM · Restricted Project
rampitec added inline comments to D89170: [AMDGPU] Use flat scratch instructions where available.
Tue, Oct 20, 11:22 AM · Restricted Project
rampitec added inline comments to D89386: [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse..
Tue, Oct 20, 11:16 AM · Restricted Project
rampitec accepted D89805: AMDGPU: Lower the threshold reported for maximum stack size exceeded.

LGTM

Tue, Oct 20, 11:09 AM · Restricted Project
rampitec added inline comments to D89170: [AMDGPU] Use flat scratch instructions where available.
Tue, Oct 20, 10:51 AM · Restricted Project
rampitec accepted D89796: [AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans.

LGTM, thanks!

Tue, Oct 20, 10:44 AM · Restricted Project
rampitec added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Tue, Oct 20, 10:40 AM · Restricted Project
rampitec added a comment to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

JFYI how much it will help actual programs after it is fixed is unclear. It will likely change a lot of lit tests, but actual effect on real programs would depend on FE and language rules. And inlining of course, as usual.

It did change 46 lit tests. I agree it is unclear how much it will help. But the GLOBAL and SCRATCH flat operations seem like they may avoid the pessimistic waitcnt 0.

Right. Out of these 46 lit tests I was looking for a very specific one, wanting to ask to write one if it does not exist. This one does exist and it is failing.

Which test is failing? All the lit tests are passing on my machine. Or are you questioning the way the CHECK tests have been updated? The original test is marking the FLAT pointer as referencing the GLOBAL address space. I assume this is what the frontend did to match the CUDA language semantics that say kernel arguments can only reference global memory. So I believe the generated code is correct unless I am missing something.

Tue, Oct 20, 1:29 AM · Restricted Project
rampitec added inline comments to D89501: [AMDGPU] flat scratch ST addressing mode on gfx10.
Tue, Oct 20, 1:24 AM · Restricted Project
rampitec added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Tue, Oct 20, 1:17 AM · Restricted Project
rampitec added a comment to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

JFYI how much it will help actual programs after it is fixed is unclear. It will likely change a lot of lit tests, but actual effect on real programs would depend on FE and language rules. And inlining of course, as usual.

It did change 46 lit tests. I agree it is unclear how much it will help. But the GLOBAL and SCRATCH flat operations seem like they may avoid the pessimistic waitcnt 0.

Tue, Oct 20, 1:16 AM · Restricted Project
rampitec added a comment to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

JFYI how much it will help actual programs after it is fixed is unclear. It will likely change a lot of lit tests, but actual effect on real programs would depend on FE and language rules.

Tue, Oct 20, 12:02 AM · Restricted Project

Mon, Oct 19

rampitec accepted D89619: [AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility.

LGTM

Mon, Oct 19, 11:38 PM · Restricted Project
rampitec requested changes to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

The patch clearly ignores existence of flat pointers with the test failing.

Mon, Oct 19, 11:33 PM · Restricted Project
rampitec added inline comments to D89170: [AMDGPU] Use flat scratch instructions where available.
Mon, Oct 19, 3:55 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.
Mon, Oct 19, 3:55 PM · Restricted Project
rampitec committed rG6ddadf99018b: [AMDGPU] flat scratch ST addressing mode on gfx10 (authored by rampitec).
[AMDGPU] flat scratch ST addressing mode on gfx10
Mon, Oct 19, 3:44 PM
rampitec closed D89501: [AMDGPU] flat scratch ST addressing mode on gfx10.
Mon, Oct 19, 3:44 PM · Restricted Project
rampitec updated the diff for D89424: [AMDGPU] Spilling using flat scratch.

Rebased to parent.

Mon, Oct 19, 3:28 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Correct rebase patch.

Mon, Oct 19, 3:01 PM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Rebased to parent.

Mon, Oct 19, 2:56 PM · Restricted Project
rampitec added a comment to D89738: [AMDGPU] Refactor SOPC & SOPP .td for extension.

Are there hazards associated with SOP, where we have it lowered to real instructions?

I'm not sure exactly what you mean, but there is a workaround for branch instructions, that operates on the real instructions. Original fix at https://github.com/llvm/llvm-project/commit/9ab812d4752b2a1442426db2ccc17dc95d12eb04

Mon, Oct 19, 2:31 PM · Restricted Project
rampitec updated the diff for D89501: [AMDGPU] flat scratch ST addressing mode on gfx10.

Added comment.

Mon, Oct 19, 2:29 PM · Restricted Project
rampitec added a comment to D89738: [AMDGPU] Refactor SOPC & SOPP .td for extension.

Are there hazards associated with SOP, where we have it lowered to real instructions?

Mon, Oct 19, 2:14 PM · Restricted Project
rampitec accepted D89737: AMDGPU: Propagate amdgpu-flat-work-group-size attributes.

Cloning might be a good thing anyway. One call stack will use conservative slow version and some other a faster version.

Mon, Oct 19, 2:08 PM · Restricted Project
rampitec updated the diff for D89501: [AMDGPU] flat scratch ST addressing mode on gfx10.
  • Use !not in td file.
  • Only enable ST mode from gfx1030.
Mon, Oct 19, 2:03 PM · Restricted Project
rampitec added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Mon, Oct 19, 1:36 PM · Restricted Project
rampitec added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Mon, Oct 19, 11:55 AM · Restricted Project
rampitec added inline comments to D89386: [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse..
Mon, Oct 19, 11:25 AM · Restricted Project
rampitec added a reviewer for D89582: clang/AMDGPU: Apply workgroup related attributes to all functions: b-sumner.
Mon, Oct 19, 8:18 AM

Sat, Oct 17

rampitec added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

LDS and SCRATCH both behave more like TLS. The allocations come into existence when a thread (or group of threads) is created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restrictions are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that it is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No, I am not quite saying that, as some languages are not clear. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferencing when they reference the non-global address space are the same. There can be rules that allow a generic pointer to be known to only point to one address space, in which case it can be treated the same as if it were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

Sat, Oct 17, 3:00 PM · Restricted Project
rampitec accepted D89525: [amdgpu] Enhance AMDGPU AA..

LGTM, I think we have resolved my doubts.

Sat, Oct 17, 1:38 PM · Restricted Project
rampitec added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However, I am not sure whether a constant pointer can point to shared memory, i.e., whether the address of a shared variable is a compile-time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tend to agree with nvcc's treatment since this allows a more flexible way of implementing shared variable support in the backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about the semantic check in the language. Here it is the IR. In IR a kernel arg can point to a constant or global address due to promotion. Originally all kernel args of HIP point to the generic address space only.

But not in OpenCL.

For OpenCL, since it won't allow generic pointers as kernel function arguments, there will never be such a case as a generic pointer argument.
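The rule under discussion can be restated in isolation. The following is a minimal, self-contained sketch, assuming the quoted block is only reached when a flat pointer is compared against an object in the local (shared) or private address space. The types `AddrSpace`, `CallConv`, `Alias`, `FlatPointerOrigin` and the function `aliasFlatWithLocalOrPrivate` are hypothetical stand-ins, not LLVM's API and not the patch itself.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the IR concepts; not LLVM's API.
enum class AddrSpace { Flat, Global, Local, Constant, Private };
enum class CallConv { Kernel, Function };
enum class Alias { NoAlias, MayAlias };

// Describes the underlying object of a flat pointer.
struct FlatPointerOrigin {
  bool IsArgument;   // underlying object is a function argument
  CallConv ParentCC; // calling convention of the enclosing function
};

// Assumed rule: a flat pointer that is a kernel argument cannot alias an
// object in the local (shared) or private address space. Everything else
// gets the conservative answer.
Alias aliasFlatWithLocalOrPrivate(const FlatPointerOrigin &A,
                                  AddrSpace OtherAS) {
  const bool OtherIsThreadOwned =
      OtherAS == AddrSpace::Local || OtherAS == AddrSpace::Private;
  if (OtherIsThreadOwned && A.IsArgument && A.ParentCC == CallConv::Kernel)
    return Alias::NoAlias;
  return Alias::MayAlias;
}
```

Whether the kernel-argument branch is sound depends on the source language, which is exactly what the thread is debating. The sketch only makes the shape of the check explicit.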

Sat, Oct 17, 1:09 PM · Restricted Project
rampitec added a comment to D89599: [AMDGPU] Fixed v_swap_b32 match.

Couldn't this try to preserve the implicit operands?

Sat, Oct 17, 11:52 AM · Restricted Project

Fri, Oct 16

rampitec added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However, I am not sure whether a constant pointer can point to shared memory, i.e., whether the address of a shared variable is a compile-time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tend to agree with nvcc's treatment since this allows a more flexible way of implementing shared variable support in the backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about the semantic check in the language. Here it is the IR. In IR a kernel arg can point to a constant or global address due to promotion. Originally all kernel args of HIP point to the generic address space only.

But not in OpenCL.

For OpenCL, since it won't allow generic pointers as kernel function arguments, there will never be such a case as a generic pointer argument.

Fri, Oct 16, 9:13 PM · Restricted Project
rampitec requested review of D89599: [AMDGPU] Fixed v_swap_b32 match.
Fri, Oct 16, 3:28 PM · Restricted Project
rampitec added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However, I am not sure whether a constant pointer can point to shared memory, i.e., whether the address of a shared variable is a compile-time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tend to agree with nvcc's treatment since this allows a more flexible way of implementing shared variable support in the backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about the semantic check in the language. Here it is the IR. In IR a kernel arg can point to a constant or global address due to promotion. Originally all kernel args of HIP point to the generic address space only.

Fri, Oct 16, 2:09 PM · Restricted Project
rampitec added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However, I am not sure whether a constant pointer can point to shared memory, i.e., whether the address of a shared variable is a compile-time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tend to agree with nvcc's treatment since this allows a more flexible way of implementing shared variable support in the backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Fri, Oct 16, 2:01 PM · Restricted Project
rampitec added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However, I am not sure whether a constant pointer can point to shared memory, i.e., whether the address of a shared variable is a compile-time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tend to agree with nvcc's treatment since this allows a more flexible way of implementing shared variable support in the backend. @tra for advice

Fri, Oct 16, 1:55 PM · Restricted Project
rampitec committed rG173389e16d32: [AMDGPU] Fix gfx1032 description in AMDGPUUsage.rst. NFC. (authored by rampitec).
[AMDGPU] Fix gfx1032 description in AMDGPUUsage.rst. NFC.
Fri, Oct 16, 1:29 PM
rampitec closed D89565: [AMDGPU] Fix gfx1032 description in AMDGPUUsage.rst. NFC..
Fri, Oct 16, 1:29 PM · Restricted Project
rampitec added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Fri, Oct 16, 12:48 PM · Restricted Project
rampitec added a comment to D89582: clang/AMDGPU: Apply workgroup related attributes to all functions.

What if a device function is called by kernels with different work group sizes, will caller's work group size override callee's work group size?

It's user error to call a function with a larger range than the caller

The problem is that a user can override the default on a kernel with the attribute, but cannot do so on a function. So a module can be compiled with a default smaller than requested on one of the kernels.

If the default were the maximum of 1024 and could only be overridden with the --gpu-max-threads-per-block option, it would not be a problem, were it not for the description of the option:

LANGOPT(GPUMaxThreadsPerBlock, 32, 256, "default max threads per block for kernel launch bounds for HIP")

I.e. it talks about the "default", so it should be perfectly legal to set a higher limit on a specific kernel. If the option said that it restricts the maximum, it would be legal to apply it to functions as well.

The current backend default ends up greatly restricting the registers used in the functions, and increasing the spilling.

Fri, Oct 16, 12:38 PM
rampitec committed rG874524ab88a9: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs (authored by rampitec).
[AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs
Fri, Oct 16, 12:37 PM
rampitec closed D89568: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs.
Fri, Oct 16, 12:37 PM · Restricted Project
rampitec added a comment to D89582: clang/AMDGPU: Apply workgroup related attributes to all functions.

What if a device function is called by kernels with different work group sizes, will caller's work group size override callee's work group size?

It's user error to call a function with a larger range than the caller

Fri, Oct 16, 12:32 PM
rampitec updated the diff for D89568: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs.
Fri, Oct 16, 12:12 PM · Restricted Project
rampitec updated the diff for D89424: [AMDGPU] Spilling using flat scratch.

Switched to ST mode on GFX10 if no SOffset is used.

Fri, Oct 16, 12:05 PM · Restricted Project
rampitec added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Fri, Oct 16, 11:30 AM · Restricted Project
rampitec added inline comments to D89568: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs.
Fri, Oct 16, 11:11 AM · Restricted Project
rampitec added inline comments to D89487: [AMDGPU] gfx1032 target.
Fri, Oct 16, 10:45 AM · Restricted Project, Restricted Project
rampitec requested review of D89568: [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs.
Fri, Oct 16, 10:45 AM · Restricted Project
rampitec added inline comments to D89487: [AMDGPU] gfx1032 target.
Fri, Oct 16, 10:21 AM · Restricted Project, Restricted Project
rampitec requested review of D89565: [AMDGPU] Fix gfx1032 description in AMDGPUUsage.rst. NFC..
Fri, Oct 16, 10:20 AM · Restricted Project
rampitec added inline comments to D89170: [AMDGPU] Use flat scratch instructions where available.
Fri, Oct 16, 10:10 AM · Restricted Project
rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Renamed pattern.

Fri, Oct 16, 9:40 AM · Restricted Project
rampitec accepted D89510: AMDGPU: Fix not always reserving VGPRs used for SGPR spilling.
Fri, Oct 16, 9:11 AM · Restricted Project

Thu, Oct 15

rampitec updated the diff for D89170: [AMDGPU] Use flat scratch instructions where available.

Use ST mode on GFX10 instead of NULL register.

Thu, Oct 15, 4:26 PM · Restricted Project
rampitec added a comment to D89510: AMDGPU: Fix not always reserving VGPRs used for SGPR spilling.

Any tests?

Thu, Oct 15, 4:17 PM · Restricted Project
rampitec added inline comments to D89502: AMDGPU: Don't kill super-register with overlapping copy.
Thu, Oct 15, 3:20 PM · Restricted Project
rampitec accepted D89502: AMDGPU: Don't kill super-register with overlapping copy.

LGTM, although it might require checking liveness of all aliasing registers.

Thu, Oct 15, 3:15 PM · Restricted Project
rampitec added a comment to D89424: [AMDGPU] Spilling using flat scratch.

A fantastic result that it has passed PSDB on gfx9 with flat scratch enabled. I did not expect it to start working from the first attempt.

Should try with -O0 too

Thu, Oct 15, 3:06 PM · Restricted Project
rampitec added a comment to D89424: [AMDGPU] Spilling using flat scratch.

A fantastic result that it has passed PSDB on gfx9 with flat scratch enabled. I did not expect it to start working from the first attempt.

Thu, Oct 15, 3:04 PM · Restricted Project