This is a pretty huge patch, with no details in the commit log.
One hour between sending the patch out and landing it is not sufficient for anyone to meaningfully
review the patch and there are no mentions of the review done anywhere else.
While the code only changes the AMDGPU back-end, that does not mean the patch should just be rubber-stamped.
It's a year of work, necessarily done downstream. Every line there was reviewed and tested downstream. I understand no one can reasonably review something this big, but I cannot break it into small patches after a year of changes and fixes. Not that I had much choice.
This is a problem because I've removed forAllLanes.
This is a hack; we should be using a different register class for cases that don't support a given subregister index, not scanning for an example non-reserved register.
To be fair to @rampitec, it was not his desire to push this up in one big patch. We needed this upstreamed, and no time was given to him to break it up into reasonably sized pieces. If it appears to be his doing/his intent, well, it should not. There have been a couple of comments; I believe most have been addressed, and comments will continue to be addressed.
Who landed the change is not particularly important.
What does matter is to make sure that shortcutting the review does not become a regular occurrence.
Stuff happens, and sometimes the standard rules may need to be bent. However, it should not be a unilateral decision by the committing side.
A better way to handle that would be to send the patch for review a few days early (you presumably had most/all of these changes made by then), provide details describing the changes (a single subject line falls a bit short), and outline your situation, explaining why the patch can't be split and reviewed pre-commit. If the changes are indeed well reviewed downstream, it would probably not pose much of a challenge to get the patch approved. If the patch does need further cleanups, at least we would have a reasonable idea how invasive they would be and could make an informed call. "Commit now, ask for forgiveness later" is not among the LLVM contribution guidelines, not for large patches. At the very minimum it should have been publicly discussed before the fact.
We have BoolFOption to generate -fsomething and -fno-something
Nit: this looks odd. GFX90A does not need to be in the middle of the list; it makes it somewhat confusing to tell which ID is really the last. The _LAST enum says it's GFX90A, but it's not the last item of the list. There are already out-of-name-order GPUs at the end of the list, so putting GFX90A at the end would probably be a better choice; at least we'd still have the numeric values in order. Right now the list is ordered neither by name nor by value.
There's also a question of whether something needs to be done about the missing values 0x3c..0x3e. Presumably the _FIRST.._LAST enums specify the range we'll use to iterate over the GPU IDs. Do we handle the missing values correctly?
Looks like it's benign at the moment, as we're only using it to return the amdgcn triple in ELFObjectFile.h. I'd add placeholder enums for the reserved/unused values within the range.
"we needed this upstream" is a business issue on AMD's side, not an issue for the llvm project. In general the expectation is that code is reviewed according to the guidelines and a single reviewer with one (small) patch that wasn't a revert doesn't feel like sufficient review for something of this size. For something this size I'd have expected Matt to at least be on the reviewer line and that also wasn't done. This feels like an abuse of the review system and probably should be reverted.
The reason we use m_amdgpu_Features_Group is that it gets transformed to target features (-mtgsplit to +tgsplit, -mno-tgsplit to -tgsplit). For example, there is a tgsplit target feature in the AMDGPU backend.
Does BoolFOption get translated to target features as well?
But it will still need to be in m_amdgpu_Features_Group because of https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/AMDGPU.cpp#L403, unless we want to switch away from that. So maybe make the group an optional last template parameter to BoolMOption?
I'd appreciate it if you could find a solution that does not involve reverting and reapplying later, as this will triple the amount of churn we get downstream. (I realise LLVM policy is not to care about downstream but I thought I'd plead my case anyway!)
Ideally yes. We can still use register classes for this, with special care to make sure we never end up with the unaligned virtual registers in the wrong contexts.
The less that's tracked by the instruction definitions, the more special-case code we have to write. I've been thinking of swapping out the entire MCInstrDesc table per subtarget to make this easier, although that may be a painful change.
I do not see how it can be less code. You would need to duplicate all VALU pseudos, not just real instructions, which means every time you write something like AMDGPU::FLAT_LOAD_DWORDX2 in the source you would have to write an if. For every VALU instruction.
It's less code because the code that's already there is supposed to rely on the static operand definitions. Every time we want to deviate from those, we end up writing manual code in the verifier and fixup things here and there that differ.
The point of swapping out the table would be to eliminate all the VALU pseudos. We would just have the same enum values referring to different physical instruction definitions
This makes sense, although as you said it is also quite painful, and to me it also sounds like a hack. There is still a lot of legalization needed even with this approach: every time you hit an instruction not supported by a target, you will need to do something about it, in the worst case expanding it. Sounds like another year of work, especially when you look at highly specialized ASICs which can do this but cannot do that, and you have a lot of them.
BTW, it will not help anyway, not for this problem. You may create an operand of a different RC, or you may just reserve every other register like I did; the net result will be the same: you will end up using a prohibited register. Imagine you are using an RC where only even tuples are added, and then you use the sub1_sub2 subreg of it. RA will happily allocate a forbidden register, just like it does now. To me it is an RA bug in the first place to allocate a reserved register.
The only thing which could help is another register info without odd wide subregs, but you cannot achieve that just by duplicating instruction definitions; for that you would need to duplicate the register info as well. This is almost a new back-end.
It's not a hack; this is how operand classes are intended to work. You wouldn't be producing these instructions on targets that don't support them (ideally we would also have a verifier for this, which is another area where subtarget handling is weak).
The point is to not reserve them. References to unaligned registers can exist, they just can't be used in the context of a real machine operand. D97316 switches to using dedicated classes for alignment (the further cleanup would be to have this come directly from the instruction definitions instead of fixing them up after isel).
I have replied in D97316, but I do not believe it will help as is. It will run into exactly the same issue as the reserved-registers approach and their subregs.
I have checked the commit. This code is not even related to VGPR alignment; it was about SGPRs in fact:
If a stack is present, the 4 low SGPRs are reserved. The pass was
The testcase is mem_clause_sreg256_used_stack from memory_clause.mir.