Download Raw Diff

Details

Reviewers

nhaehnle
tpr
dstuttard
Flakebi
arsenm
foad
Petar.Avramovic

Commits

rG508329922dad: Removed Diff file.
rGfaeed774d157: Fixed includes.

Summary

This patch should find a pattern in GlobalISel / AMDGPUPostLegalizerCombiner which can appear when trying to clamp 64-bit values to short values without truncation, for example by using the 64-bit / 16-bit integer extensions in shaders. This pattern can be reduced to two VALU instructions:

v_cvt_pk_i16_i32_e64 v0, v0, v1 ; where v[0:1] is a 64-bit number
v_med3_i32 v0, LowBoundary, v0, HighBoundary ; where LowBoundary >= SHORT_MII, HighBoundary <= SHORT_MAX

I am open for suggestions and reviews as this is my first change on the LLVM compiler.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tsymalla created this revision.Dec 22 2020, 7:39 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 5 others. · View Herald TranscriptDec 22 2020, 7:39 AM

tsymalla requested review of this revision.Dec 22 2020, 7:39 AM

Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2020, 7:39 AM

Herald added subscribers: llvm-commits, wdng. · View Herald Transcript

Harbormaster completed remote builds in B83282: Diff 313345.Dec 22 2020, 8:44 AM

Looks good to me, I left some nits inline.
Someone who is more familiar with GlobalISel should review this.

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
1–2	Unintended change?
llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
3 ↗	(On Diff #313345)	GFX9 is also a GCN architecture, so calling this GFX10 might be a clearer choice.

foad added a reviewer: Petar.Avramovic.Dec 23 2020, 2:42 AM

JonChesterfield added a subscriber: JonChesterfield.Dec 23 2020, 2:48 AM

Petar.Avramovic added inline comments.Dec 23 2020, 5:04 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
11 ↗	(On Diff #313345)	This looks incorrect since some high bits are discarded (are you missing some known bits analysis?). Think of positive signed input value that has 1 somewhere in high 16 bits, e.g. 0x4000000000000000 'icmp select' sequence gives 32767 which is then truncated to i16 (no op in this case). But cvt_pk takes low 16 bits from 0x40000000 and 0x00000000 and packs them into 32 bit value 0. v_med3 with [[A]]=0 gives 0. Does sdag have a similar combine?
16–17 ↗	(On Diff #313345)	For awareness: generic llvm-ir max and min intrinsics @llvm.umax. and @llvm.umin. were added recently. It is expected to get llvm-ir combine that will transform cmp+select into intrinsic at some point and then there would be no need to match cmp+select in combines. You should test with min and max intrinsics for now. However since this is i64 I guess it is fine to use cmp + select since intrinsic min and max have to be lowered for s64.

arsenm added inline comments.Dec 23 2020, 7:15 AM

llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
475 ↗	(On Diff #313345)	This is doing too much, this should just match the G_UMIN/G_UMAX/G_SMIN/G_SMAX opcodes. This shouldn't be responsible for matching the compare+select pattern
llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
1 ↗	(On Diff #313345)	I would prefer if the test name had a more consistent name. How about combine-int-clamp.ll?
18–21 ↗	(On Diff #313345)	I dislike anonymous values in tests, can you add names here

nhaehnle added inline comments.Dec 23 2020, 11:27 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
11 ↗	(On Diff #313345)	cvt_pk performs a clamp before packing into 16 bits. It's not very well documented.

tsymalla marked 2 inline comments as done.Jan 4 2021, 2:21 AM

tsymalla added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
1–2	I think this happened because of using clang-format.
llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
11 ↗	(On Diff #313345)	The whole concept behind this, as Nicolai said, is, to have the two 32 bit-parts of the 64-bit number being clamped automatically against short boundaries by using v_cvt_pk_i16_i32. So the lower and higher 16 bits of the resulting 32 bit number should be clamped independently. If this fails for some case (very large numbers), the v_med3 instruction should make sure that the value actually gets clamped.
18–21 ↗	(On Diff #313345)	Hi, what do you mean by anonymous values in this context?

Mostly renamings.

Refactored the pattern matching algorithm. I removed the construct from MIPatternMatch and implemented it directly in the PostLegalizerCombiner.

pvellien added a subscriber: pvellien.Jan 4 2021, 6:37 AM

tsymalla added inline comments.Jan 4 2021, 6:56 AM

llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
475 ↗	(On Diff #313345)	I removed this construct. There area helpers already which could be extended for matching the G_*MAX/MIN opcodes. I tried to implement this straight forward directly in the matcher.

arsenm added inline comments.Jan 4 2021, 10:39 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
18–21 ↗	(On Diff #313345)	The numbered %0, %1. They should have fixed string names instead. opt -S -instnamer will do this for you or you can manually %name them

arsenm added inline comments.Jan 4 2021, 10:40 AM

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
2	This has too many -s then
285	This should still be trying to match against the min/max opcodes, not looking for compare and selects

pvellien removed a subscriber: pvellien.Jan 4 2021, 8:31 PM

Renamed the identifiers in the lit test.

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
2	I reverted this change.
285	So, if I understand this correctly: The select/compare pattern I'm trying to match here will be matched to G_MIN/G_MAX This seems to be not implemented yet or not working for my case (as far as I can see that) I'm not sure what should be achieved here. Should there either be a "future-proof" implementation for matching min and max which is not working yet, an additional case for matching min and max which falls back to my current implementation or a combination of a) extension of the pattern matching that combines select and icmp to max / min and b) an implementation which only matches max / min and combines it accordingly? Would you mind explaining this to me a little bit more?
llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
18–21 ↗	(On Diff #313345)	Thanks. I manually named them.

tsymalla marked an inline comment as not done.Jan 5 2021, 2:26 AM

foad added inline comments.Jan 5 2021, 4:34 AM

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
15–16	You can use the git-clang-format script to avoid reformatting the whole file. If you put it on your path then you can invoke it as e.g. `git clang-format @^` to only reformat the parts of the file that were changed by your most recent commit.
311–319	Use `//` even for block comments please.

arsenm added inline comments.Jan 5 2021, 7:15 AM

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
285	The compare+select pattern you are matching here should be matched separately into the min/max opcodes as a better, more canonical form. This is not implemented in GlobalISel yet. Matching min/max should be implemented in GlobalISel, however I don't think this is a priority. Theoretically the IR optimizers should be matching these in the vast majority of cases with the new min/max intrinsics. A GlobalISel combine would theoretically be useful if the compare/select pattern were to be emitted by independent legalizations, but I'm currently not aware of any implemented case where this could happen

tsymalla added inline comments.Jan 5 2021, 9:07 AM

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
285	So, what would be the right approach to handle this in your opinion? As far as I understand it doesn't make sense to have an implementation for this specific case where a generic implementation would be more suitable. Would you say it makes more sense to wait for the max / min matching in GlobalISel and implement my combine around it?

arsenm added inline comments.Jan 5 2021, 10:10 AM

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
285	You don't need to wait. You can just do it with the min/max opcodes now. You can either write MIR tests or change the IR tests to use the intrinsics already

tsymalla marked an inline comment as done.Jan 11 2021, 4:33 AM

tsymalla added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
311–319	Thanks, Jay. I will change this.

This moves the implementation to the PreLegalizer and is based on LLVM max/min intrinsics.

Harbormaster completed remote builds in B84678: Diff 315783.Jan 11 2021, 6:14 AM

I would expect this in the post legalizer combiner, and going directly from i64 is a bit weird. What does the post-legalized MIR look like for this?

llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
455 ↗	(On Diff #315783)	Remove leftover whitespace change
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
191–192	Unrelated formatting changes
248	Extra whitespace change
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
62 ↗	(On Diff #315783)	Capitalize
74 ↗	(On Diff #315783)	Shouldn't construct single use MachineIRBuilder, there should be one in the combiner state
104 ↗	(On Diff #315783)	Capitalize
119 ↗	(On Diff #315783)	This is just noise, remove it
121 ↗	(On Diff #315783)	Don't construct fresh builder
129 ↗	(On Diff #315783)	You can just do Unmerge.getReg(0)
134 ↗	(On Diff #315783)	It would be better to use a generic operation pre-regbank select rather than forcing register classes here
148–149 ↗	(On Diff #315783)	Bad shadowing name
151–153 ↗	(On Diff #315783)	You can combine this as auto MinBoundaryDst = B.buildConstant(S32, min)

Previously this was implemented in the PostLegalizer. The SMAX / SMIN MIR gets lowered to a pattern of G_SELECT / G_ICMP, so the state after PostLegalizing is equivalent to what I've implemented in the tests before matching this to the min/max instructions. Disabling the optimization in the implementation here yields a result similar to the following:

%62:_(s1) = G_ICMP intpred(slt), %53:_(s32), %25:_
%63:_(s64) = G_CONSTANT i64 0
%28:_(s64) = G_SELECT %62:_(s1), %63:_, %61:_
%29:_(s64) = G_CONSTANT i64 32767
%37:_(s1) = G_ICMP intpred(slt), %28:_(s64), %29:_
%30:_(s64) = G_SELECT %37:_(s1), %28:_, %29:_
%31:_(s64) = G_CONSTANT i64 -32768
%36:_(s1) = G_ICMP intpred(sgt), %30:_(s64), %31:_
%32:_(s64) = G_SELECT %36:_(s1), %30:_, %31:_
%34:_(s32) = G_TRUNC %32:_(s64)

So this is why I moved the implementation to the PreLegalize step where the G_SMIN / G_SMAX IR still exists:

%30:_(s64) = G_SMIN %28:_, %29:_
%32:_(s64) = G_SMAX %30:_, %31:_
%33:_(s16) = G_TRUNC %32:_(s64)

I don't exactly understand why the min / max instructions get lowered to this pattern so for the moment, the only way to get this producing the code I wish seems to match the pattern in the PreLegalize step. By the way, the goal here is to check explicitly if an long should get clamped to short boundaries (or inbetween). Instead of producing some comparisons, this should yield these two instructions as I've figured out with Nicolai.

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
134 ↗	(On Diff #315783)	How would I go for this?

Various changes according to the code review. I don't know why the formatting changes in the PostLegalizer keep showing up as they are already reverted to HEAD...

Do we do the equivalent combine in the DAG path? I don't recognize this one

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
71 ↗	(On Diff #316057)	Probably shouldn't print here especially when failure is likely. Ideally we would get a message from the generated combiner based on the defined combine name
132 ↗	(On Diff #316057)	I'd rather just repeat the VGPR_32RegClass (also shuoldn't use macro-like naming convention)
134 ↗	(On Diff #315783)	You can follow the example of the other G_AMDGPU_* pseudos (e.g. G_AMDGPU_CVT_UBYTE0). I'm a bit ambivalent on whether we should do this in this case, as there are still some unanswered questions about how regbankselect should work. However, so far we've been using pseudos like this
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-short-clamp.ll
113 ↗	(On Diff #316057)	Fix this whitespace error

Petar.Avramovic added inline comments.Jan 13 2021, 6:11 AM

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
137–141 ↗	(On Diff #316057)	You can replace this block with `B.buildInstr(CvtOpcode, {CvtDst}, {Hi32, Lo32}, MI.getFlags());`
201 ↗	(On Diff #316057)	Why do you need LegalizerInfo? CombinerHelper::isLegalOrBeforeLegalizer relies on LI being nullptr before legalizer.

No, I doubt there is support for this in the other path.

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
71 ↗	(On Diff #316057)	I removed this message. There aren't any other debug outputs in the other Legalize combines, too.
137–141 ↗	(On Diff #316057)	Thanks. I removed this for the other `buildInstr`, too.
201 ↗	(On Diff #316057)	Thanks. This is not needed here, so I removed it.
134 ↗	(On Diff #315783)	Thanks. With some help I was able to get this working and learned something new. This should now use the new pseudo and select it into the corresponding machine instruction.

Implemented a GMIR instruction for the V_CVT opcode and did some changes to the implementation according to the code reviews.

arsenm added inline comments.Jan 13 2021, 9:34 AM

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
149 ↗	(On Diff #316380)	A pseudo here would also apply for the same reasons

tsymalla added inline comments.Jan 18 2021, 4:03 AM

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
149 ↗	(On Diff #316380)	I will implement this as soon as I'll get access back to my machine. Is there a reason why in some cases pseudos exist, in some cases not?

This adds an additional G_MED3_S32 GIR opcode and uses it in the PreLegalizerCombiner.

Added the new MED3 pseudo.

arsenm added inline comments.Jan 20 2021, 1:39 PM

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
146 ↗	(On Diff #317348)	Don't need the register classes. You can also hide the createRegister calls with the builder as well
149 ↗	(On Diff #316380)	If we need to change how the operation is selected based on the register banks, it definitely should use a regbankselectable pseudo with generic register operands. If there isn't a meaningful handling change depending on the register banks, there's less reason to use a pseudo.

tsymalla marked 2 inline comments as done.Jan 25 2021, 6:20 AM

tsymalla added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
146 ↗	(On Diff #317348)	Thanks. I reworked this aspect of the implementation. Should be a bit more clean now.
149 ↗	(On Diff #316380)	Thanks. :)

Removes the generic createVirtualRegister invocations and the manual setting of the VGPR32 register class.

arsenm added inline comments.Jan 25 2021, 10:04 AM

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
142–143 ↗	(On Diff #318987)	You can truncate directly into the destination register, you don't need this copy
llvm/lib/Target/AMDGPU/SIInstructions.td
2596 ↗	(On Diff #318987)	No _S32, this is also available for s16 on some targets

Renamed the G_MED3 opcode and removed a superfluous copy.

arsenm accepted this revision.Jan 26 2021, 3:21 PM

This revision is now accepted and ready to land.Jan 26 2021, 3:21 PM

Hi, it seems, I am not permitted to commit this patch. Could somebody please do the commit for me? Thanks!

In D93708#2524681, @tsymalla wrote:

Hi, it seems, I am not permitted to commit this patch. Could somebody please do the commit for me? Thanks!

You have to request commit access:
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

Requested access. Thanks.

I've tried applying this but getting some build errors. You also have some remaining trailing whitespace and extra whitespace changes

In D93708#2525719, @arsenm wrote:

I've tried applying this but getting some build errors. You also have some remaining trailing whitespace and extra whitespace changes

I will revert those whitespace changes, thanks.
Where did you apply this? Locally? For me, this worked with a recent LLVM version.

I investigated the problem and it seems, there were some includes missing in PostLegalizerCombiner as there were changes in terms of how the instructions are made available to the TU.
I will fix this and upload a new patch.

Reverted whitespace changes, added missing includes.

arsenm accepted this revision.Feb 1 2021, 4:11 PM

Closed by commit rGfaeed774d157: Fixed includes. (authored by tsymalla). · Explain WhyFeb 2 2021, 12:17 AM

This revision was automatically updated to reflect the committed changes.

tsymalla added a commit: rGfaeed774d157: Fixed includes..

tsymalla added a commit: rG508329922dad: Removed Diff file..

Hi,

This change, or one of the related change has broken the msan bot.
http://lab.llvm.org:8011/#/builders/5/builds/4134

==91167==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x35d38b6 in AMDGPUPreLegalizerCombinerHelper::matchClampI64ToI16(llvm::MachineInstr&, llvm::MachineRegisterInfo&, llvm::MachineFunction&, AMDGPUPreLegalizerCombinerHelper::ClampI64ToI16MatchInfo&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:92:17
    #1 0x35e7a91 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1227:38
    #2 0x35e7a91 in (anonymous namespace)::AMDGPUGenPreLegalizerCombinerHelper::tryCombineAll(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&, llvm::CombinerHelper&) const /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1226:14
    #3 0x35d6aed in (anonymous namespace)::AMDGPUPreLegalizerCombinerInfo::combine(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:197:17
    #4 0xae8a3c4 in llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&, llvm::GISelCSEInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/GlobalISel/Combiner.cpp:150:24
    #5 0x35d629f in (anonymous namespace)::AMDGPUPreLegalizerCombiner::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:268:12
    #6 0x71117ca in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13
    #7 0x7f07777 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1435:27
    #8 0x63360b8 in RunPassOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178:25
    #9 0x63360b8 in RunAllPassesOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476:9
    #10 0x63360b8 in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541:18
    #11 0x7f08ee4 in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550:27
    #12 0x7f08ee4 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44
    #13 0x27ee02a in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:684:8
    #14 0x27e7c41 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:385:22
    #15 0x7f1bd530d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #16 0x2767739 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x2767739)

Also, there is something very wrong with the diff here. It just shows a bunch of merge conflict markers. Please make sure that your commits include the relevant headers like "Differential Revision", etc - it's very hard to track things otherwise.

In D93708#2537793, @eugenis wrote:

Hi,

This change, or one of the related change has broken the msan bot.
http://lab.llvm.org:8011/#/builders/5/builds/4134

==91167==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x35d38b6 in AMDGPUPreLegalizerCombinerHelper::matchClampI64ToI16(llvm::MachineInstr&, llvm::MachineRegisterInfo&, llvm::MachineFunction&, AMDGPUPreLegalizerCombinerHelper::ClampI64ToI16MatchInfo&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:92:17
    #1 0x35e7a91 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1227:38
    #2 0x35e7a91 in (anonymous namespace)::AMDGPUGenPreLegalizerCombinerHelper::tryCombineAll(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&, llvm::CombinerHelper&) const /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1226:14
    #3 0x35d6aed in (anonymous namespace)::AMDGPUPreLegalizerCombinerInfo::combine(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:197:17
    #4 0xae8a3c4 in llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&, llvm::GISelCSEInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/GlobalISel/Combiner.cpp:150:24
    #5 0x35d629f in (anonymous namespace)::AMDGPUPreLegalizerCombiner::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:268:12
    #6 0x71117ca in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13
    #7 0x7f07777 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1435:27
    #8 0x63360b8 in RunPassOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178:25
    #9 0x63360b8 in RunAllPassesOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476:9
    #10 0x63360b8 in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541:18
    #11 0x7f08ee4 in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550:27
    #12 0x7f08ee4 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44
    #13 0x27ee02a in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:684:8
    #14 0x27e7c41 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:385:22
    #15 0x7f1bd530d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #16 0x2767739 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x2767739)

I think that's what D95878 is for

sebastian-ne mentioned this in rGd49efdc9696a: Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.".Feb 3 2021, 2:07 AM

It may take some time until @tsymalla can look at the fix in D95878 again, so I reverted this for the time being.

tsymalla mentioned this in D95878: [AMDGPU]: Fixes an invalid clamp selection pattern..Feb 3 2021, 8:16 AM

Diff 320699

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp

//=== lib/CodeGen/GlobalISel/AMDGPUPostLegalizerCombiner.cpp ---------------===//		//=== lib/CodeGen/GlobalISel/AMDGPUPostLegalizerCombiner.cpp ---------------===//
//		//
		FlakebiUnsubmitted Done Reply Inline Actions Unintended change? Flakebi: Unintended change?
		tsymallaAuthorUnsubmitted Done Reply Inline Actions I think this happened because of using clang-format. tsymalla: I think this happened because of using clang-format.
		arsenmUnsubmitted Done Reply Inline Actions This has too many -s then arsenm: This has too many -s then
		tsymallaAuthorUnsubmitted Done Reply Inline Actions I reverted this change. tsymalla: I reverted this change.
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This pass does combining of machine instructions at the generic MI level,		// This pass does combining of machine instructions at the generic MI level,
// after the legalizer.		// after the legalizer.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

<<<<<<< HEAD
<<<<<<< HEAD
#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPULegalizerInfo.h"		#include "AMDGPULegalizerInfo.h"
#include "GCNSubtarget.h"		#include "GCNSubtarget.h"
		foadUnsubmitted Done Reply Inline Actions You can use the git-clang-format script to avoid reformatting the whole file. If you put it on your path then you can invoke it as e.g. `git clang-format @^` to only reformat the parts of the file that were changed by your most recent commit. foad: You can use the [[ https://reviews.llvm.org/source/llvm-github/browse/main/clang/tools/clang…
=======
#include "AMDGPULegalizerInfo.h"
#include "AMDGPUTargetMachine.h"
>>>>>>> clang-format
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
=======
#include "AMDGPUTargetMachine.h"
#include "AMDGPULegalizerInfo.h"


>>>>>>> Updating formatting changes.
#include "llvm/CodeGen/GlobalISel/Combiner.h"		#include "llvm/CodeGen/GlobalISel/Combiner.h"
#include "llvm/CodeGen/GlobalISel/CombinerHelper.h"		#include "llvm/CodeGen/GlobalISel/CombinerHelper.h"
#include "llvm/CodeGen/GlobalISel/CombinerInfo.h"		#include "llvm/CodeGen/GlobalISel/CombinerInfo.h"
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"		#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"		#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
<<<<<<< HEAD
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
=======
#include "llvm/Support/Debug.h"
<<<<<<< HEAD
>>>>>>> clang-format
=======
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
>>>>>>> Updating formatting changes.

#define DEBUG_TYPE "amdgpu-postlegalizer-combiner"		#define DEBUG_TYPE "amdgpu-postlegalizer-combiner"

using namespace llvm;		using namespace llvm;
using namespace MIPatternMatch;		using namespace MIPatternMatch;

class AMDGPUPostLegalizerCombinerHelper {		class AMDGPUPostLegalizerCombinerHelper {
protected:		protected:
▲ Show 20 Lines • Show All 149 Lines • ▼ Show 20 Lines	void AMDGPUPostLegalizerCombinerHelper::applyUCharToFloat(MachineInstr &MI) {
Register DstReg = MI.getOperand(0).getReg();		Register DstReg = MI.getOperand(0).getReg();
Register SrcReg = MI.getOperand(1).getReg();		Register SrcReg = MI.getOperand(1).getReg();
LLT Ty = MRI.getType(DstReg);		LLT Ty = MRI.getType(DstReg);
LLT SrcTy = MRI.getType(SrcReg);		LLT SrcTy = MRI.getType(SrcReg);
if (SrcTy != S32)		if (SrcTy != S32)
SrcReg = B.buildAnyExtOrTrunc(S32, SrcReg).getReg(0);		SrcReg = B.buildAnyExtOrTrunc(S32, SrcReg).getReg(0);

if (Ty == S32) {		if (Ty == S32) {
B.buildInstr(AMDGPU::G_AMDGPU_CVT_F32_UBYTE0, {DstReg},		B.buildInstr(AMDGPU::G_AMDGPU_CVT_F32_UBYTE0, {DstReg},
{SrcReg}, MI.getFlags());		{SrcReg}, MI.getFlags());
		arsenmUnsubmitted Not Done Reply Inline Actions Unrelated formatting changes arsenm: Unrelated formatting changes
} else {		} else {
auto Cvt0 = B.buildInstr(AMDGPU::G_AMDGPU_CVT_F32_UBYTE0, {S32},		auto Cvt0 = B.buildInstr(AMDGPU::G_AMDGPU_CVT_F32_UBYTE0, {S32},
{SrcReg}, MI.getFlags());		{SrcReg}, MI.getFlags());
B.buildFPTrunc(DstReg, Cvt0, MI.getFlags());		B.buildFPTrunc(DstReg, Cvt0, MI.getFlags());
}		}

MI.eraseFromParent();		MI.eraseFromParent();
}		}
Show All 39 Lines	if (SrcTy != S32) {
CvtSrc = B.buildAnyExt(S32, CvtSrc).getReg(0);		CvtSrc = B.buildAnyExt(S32, CvtSrc).getReg(0);
}		}

assert(MI.getOpcode() != NewOpc);		assert(MI.getOpcode() != NewOpc);
B.buildInstr(NewOpc, {MI.getOperand(0)}, {CvtSrc}, MI.getFlags());		B.buildInstr(NewOpc, {MI.getOperand(0)}, {CvtSrc}, MI.getFlags());
MI.eraseFromParent();		MI.eraseFromParent();
}		}

class AMDGPUPostLegalizerCombinerHelperState {		class AMDGPUPostLegalizerCombinerHelperState {
		arsenmUnsubmitted Not Done Reply Inline Actions Extra whitespace change arsenm: Extra whitespace change
protected:		protected:
CombinerHelper &Helper;		CombinerHelper &Helper;
AMDGPUPostLegalizerCombinerHelper &PostLegalizerHelper;		AMDGPUPostLegalizerCombinerHelper &PostLegalizerHelper;

public:		public:
AMDGPUPostLegalizerCombinerHelperState(		AMDGPUPostLegalizerCombinerHelperState(
CombinerHelper &Helper,		CombinerHelper &Helper,
AMDGPUPostLegalizerCombinerHelper &PostLegalizerHelper)		AMDGPUPostLegalizerCombinerHelper &PostLegalizerHelper)
Show All 20 Lines	AMDGPUPostLegalizerCombinerInfo(bool EnableOpt, bool OptSize, bool MinSize,
const AMDGPULegalizerInfo *LI,		const AMDGPULegalizerInfo *LI,
GISelKnownBits KB, MachineDominatorTree MDT)		GISelKnownBits KB, MachineDominatorTree MDT)
: CombinerInfo(/AllowIllegalOps/ false, /ShouldLegalizeIllegal/ true,		: CombinerInfo(/AllowIllegalOps/ false, /ShouldLegalizeIllegal/ true,
/LegalizerInfo/ LI, EnableOpt, OptSize, MinSize),		/LegalizerInfo/ LI, EnableOpt, OptSize, MinSize),
KB(KB), MDT(MDT) {		KB(KB), MDT(MDT) {
if (!GeneratedRuleCfg.parseCommandLineOption())		if (!GeneratedRuleCfg.parseCommandLineOption())
report_fatal_error("Invalid rule identifier");		report_fatal_error("Invalid rule identifier");
}		}

		arsenmUnsubmitted Not Done Reply Inline Actions This should still be trying to match against the min/max opcodes, not looking for compare and selects arsenm: This should still be trying to match against the min/max opcodes, not looking for compare and…
		tsymallaAuthorUnsubmitted Not Done Reply Inline Actions So, if I understand this correctly: The select/compare pattern I'm trying to match here will be matched to G_MIN/G_MAX This seems to be not implemented yet or not working for my case (as far as I can see that) I'm not sure what should be achieved here. Should there either be a "future-proof" implementation for matching min and max which is not working yet, an additional case for matching min and max which falls back to my current implementation or a combination of a) extension of the pattern matching that combines select and icmp to max / min and b) an implementation which only matches max / min and combines it accordingly? Would you mind explaining this to me a little bit more? tsymalla: So, if I understand this correctly: - The select/compare pattern I'm trying to match here will…
		arsenmUnsubmitted Not Done Reply Inline Actions The compare+select pattern you are matching here should be matched separately into the min/max opcodes as a better, more canonical form. This is not implemented in GlobalISel yet. Matching min/max should be implemented in GlobalISel, however I don't think this is a priority. Theoretically the IR optimizers should be matching these in the vast majority of cases with the new min/max intrinsics. A GlobalISel combine would theoretically be useful if the compare/select pattern were to be emitted by independent legalizations, but I'm currently not aware of any implemented case where this could happen arsenm: The compare+select pattern you are matching here should be matched separately into the min/max…
		tsymallaAuthorUnsubmitted Done Reply Inline Actions So, what would be the right approach to handle this in your opinion? As far as I understand it doesn't make sense to have an implementation for this specific case where a generic implementation would be more suitable. Would you say it makes more sense to wait for the max / min matching in GlobalISel and implement my combine around it? tsymalla: So, what would be the right approach to handle this in your opinion? As far as I understand it…
		arsenmUnsubmitted Not Done Reply Inline Actions You don't need to wait. You can just do it with the min/max opcodes now. You can either write MIR tests or change the IR tests to use the intrinsics already arsenm: You don't need to wait. You can just do it with the min/max opcodes now. You can either write…
bool combine(GISelChangeObserver &Observer, MachineInstr &MI,		bool combine(GISelChangeObserver &Observer, MachineInstr &MI,
MachineIRBuilder &B) const override;		MachineIRBuilder &B) const override;
};		};

bool AMDGPUPostLegalizerCombinerInfo::combine(GISelChangeObserver &Observer,		bool AMDGPUPostLegalizerCombinerInfo::combine(GISelChangeObserver &Observer,
MachineInstr &MI,		MachineInstr &MI,
MachineIRBuilder &B) const {		MachineIRBuilder &B) const {
CombinerHelper Helper(Observer, B, KB, MDT, LInfo);		CombinerHelper Helper(Observer, B, KB, MDT, LInfo);
Show All 9 Lines	bool AMDGPUPostLegalizerCombinerInfo::combine(GISelChangeObserver &Observer,
case TargetOpcode::G_LSHR:		case TargetOpcode::G_LSHR:
case TargetOpcode::G_ASHR:		case TargetOpcode::G_ASHR:
// On some subtargets, 64-bit shift is a quarter rate instruction. In the		// On some subtargets, 64-bit shift is a quarter rate instruction. In the
// common case, splitting this into a move and a 32-bit shift is faster and		// common case, splitting this into a move and a 32-bit shift is faster and
// the same code size.		// the same code size.
return Helper.tryCombineShiftToUnmerge(MI, 32);		return Helper.tryCombineShiftToUnmerge(MI, 32);
}		}

return false;		return false;
}		}

#define AMDGPUPOSTLEGALIZERCOMBINERHELPER_GENCOMBINERHELPER_CPP		#define AMDGPUPOSTLEGALIZERCOMBINERHELPER_GENCOMBINERHELPER_CPP
#include "AMDGPUGenPostLegalizeGICombiner.inc"		#include "AMDGPUGenPostLegalizeGICombiner.inc"
#undef AMDGPUPOSTLEGALIZERCOMBINERHELPER_GENCOMBINERHELPER_CPP		#undef AMDGPUPOSTLEGALIZERCOMBINERHELPER_GENCOMBINERHELPER_CPP

// Pass boilerplate		// Pass boilerplate
// ================		// ================
		foadUnsubmitted Done Reply Inline Actions Use `//` even for block comments please. foad: Use `//` even for block comments please.
		tsymallaAuthorUnsubmitted Done Reply Inline Actions Thanks, Jay. I will change this. tsymalla: Thanks, Jay. I will change this.

class AMDGPUPostLegalizerCombiner : public MachineFunctionPass {		class AMDGPUPostLegalizerCombiner : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;

AMDGPUPostLegalizerCombiner(bool IsOptNone = false);		AMDGPUPostLegalizerCombiner(bool IsOptNone = false);

StringRef getPassName() const override {		StringRef getPassName() const override {
▲ Show 20 Lines • Show All 66 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 320699

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 320699

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp

[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.
ClosedPublic