
Yesterday

Pierre-vh planned changes to D138651: [CUDA][HIP] Don't diagnose use for __bf16.

I'll take a look at handling bf16 as storage-only for AMDGPU. It looks like our backend already handles it and converts it to i16, so it may be straightforward.

Fri, Dec 2, 7:51 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D139000: [AMDGPU] Clear bodies of function with incompatible features.

Rebase

Fri, Dec 2, 4:40 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D139000: [AMDGPU] Clear bodies of function with incompatible features.

Comments

Fri, Dec 2, 4:32 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D139000: [AMDGPU] Clear bodies of function with incompatible features.

Reworked the feature compatibility checking logic to use TableGen data.
I think this is a lot more robust. I only check the features we're interested in, though. I tried checking all of them, but there are too many edge cases to handle, so it's better to do this on an "opt-in" basis IMO.

Fri, Dec 2, 4:29 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Fri, Dec 2, 12:57 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
  • Add new tests (rebased)
  • Fix the combine to check for the EXACT number of leading zeroes to ensure the transform is correct. Otherwise, if there are too few or too many leading zeroes, it could mean that the shift was checking something other than the OV bit, I think.
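As a rough illustration of why the number of leading zeroes has to match the shift amount, the C++ sketch below computes the largest value that (a + b) >> shift can take when both operands have a given number of known-zero high bits. `maxShiftedSum` is a hypothetical helper, not code from the patch, and the 16-bit width is an assumption chosen for readability.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from D138814): largest possible value of
// (a + b) >> shift in a 16-bit type, where both a and b are known to have
// at least `knownZeroTopBits` zero high bits.
unsigned maxShiftedSum(unsigned knownZeroTopBits, unsigned shift) {
  assert(knownZeroTopBits >= 1 && knownZeroTopBits <= 16 && shift < 16);
  // Largest operand that still has the requested number of leading zeroes.
  uint32_t maxOperand = (1u << (16 - knownZeroTopBits)) - 1;
  // With at least one leading zero per operand, the sum fits in 16 bits.
  uint16_t sum = static_cast<uint16_t>(maxOperand + maxOperand);
  return static_cast<unsigned>(sum >> shift);
}
```

With a shift of 8, exactly 8 leading zeroes give a result that is at most 1, so the shifted value behaves like a carry flag. With only 7 leading zeroes the result can reach 3, so it is no longer a single bit, and with 9 leading zeroes it is always 0.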
Fri, Dec 2, 12:33 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D139011: [InstCombine] Precommit D138814 tests.

Add more testcases

Fri, Dec 2, 12:31 AM · Restricted Project, Restricted Project

Thu, Dec 1

Pierre-vh added a comment to D139000: [AMDGPU] Clear bodies of function with incompatible features.

Overall this looks pretty good. As you say, the feature checking logic is quite limited, but that's not a problem.
I think after this patch lands https://reviews.llvm.org/D123693 can be reverted. Can you try to revert that with this patch and check if device libs can be built correctly at -O0?

Thu, Dec 1, 12:55 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D138651: [CUDA][HIP] Don't diagnose use for __bf16.
Thu, Dec 1, 12:16 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Thu, Dec 1, 12:02 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

Comments

Thu, Dec 1, 12:02 AM · Restricted Project, Restricted Project

Wed, Nov 30

Pierre-vh added a comment to D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

I don't think there is any requirement for the wider type to be exactly double the narrower type.

That's correct:
https://alive2.llvm.org/ce/z/iLVIgn

So this patch/tests are too narrow as-is. It should be checking something like "if we only demand the top N bits of an add, and the add operands are known zero in those top N bits, then fold the add into an overflow check."
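A small C++ sketch of the general form described above, assuming 8-bit values held in a 32-bit type (so the wider type is not double the narrower one). The function names are hypothetical and only model the arithmetic, not the InstCombine code.

```cpp
#include <cassert>
#include <cstdint>

// Carry-out of an 8-bit unsigned add, computed without widening.
bool narrowAddOverflows(uint8_t a, uint8_t b) {
  uint8_t sum = static_cast<uint8_t>(a + b);
  return sum < a; // the add wrapped iff the result is below an operand
}

// The matched pattern: an add in a wider type whose operands are known zero
// in the top 24 bits, followed by a shift that only demands those top bits.
bool wideAddShifted(uint8_t a, uint8_t b) {
  uint32_t wa = a; // zero-extended, top 24 bits are zero
  uint32_t wb = b;
  uint32_t top = (wa + wb) >> 8;
  assert(top <= 1); // only the carry can land in the demanded bits
  return top != 0;
}

// Exhaustively compare both forms over all 65536 operand pairs.
bool patternMatchesOverflowCheck() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if (narrowAddOverflows(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
          wideAddShifted(static_cast<uint8_t>(a), static_cast<uint8_t>(b)))
        return false;
  return true;
}
```

Calling `patternMatchesOverflowCheck()` returns true, which is consistent with the claim that the shifted wide add is just an overflow check on the narrow add.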

Also, canonicalizing to the add intrinsic if we're not using the add part of the result seems like the wrong direction. I can't tell from the larger test what we're expecting to happen. Please pre-commit the baseline tests, so we can see the diffs.

Will update the combine & add a base test diff.

Do you mean we shouldn't do the combine if the Add has only one use?

I think it's the inverse - if the add has only one use, then fold to "not+icmp+zext":
https://github.com/llvm/llvm-project/issues/59232
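One way to read "not+icmp+zext" is sketched below in C++ for 8-bit operands: the carry of a + b is set exactly when b is unsigned-greater than ~a. This is an assumption about the intended fold, not the IR from the linked issue, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Overflow bit via the widened add and shift (the original pattern).
uint8_t overflowViaWideAddShift(uint8_t a, uint8_t b) {
  uint16_t sum = static_cast<uint16_t>(static_cast<uint16_t>(a) + b);
  return static_cast<uint8_t>(sum >> 8);
}

// Overflow bit via not + unsigned compare + zero-extend, with no add at all.
uint8_t overflowViaNotIcmpZext(uint8_t a, uint8_t b) {
  uint8_t notA = static_cast<uint8_t>(~a); // not: equals 255 - a
  bool cmp = b > notA;                     // icmp ugt
  return cmp ? 1 : 0;                      // zext i1 to i8
}

// Exhaustively compare both forms over all 65536 operand pairs.
bool foldsAgree() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if (overflowViaWideAddShift(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
          overflowViaNotIcmpZext(static_cast<uint8_t>(a), static_cast<uint8_t>(b)))
        return false;
  return true;
}
```

The second form needs no add, which is why it only makes sense when the add has no other users.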

If the add has >1 use, then I'm not sure what we want to happen. In the general form, we have something like this:
https://alive2.llvm.org/ce/z/sW5BME
...so what other pieces of the pattern need to be there to justify creating the add intrinsic? We're in target-independent InstCombine here, so we don't usually want to end up with more instructions than we started with.

Wed, Nov 30, 6:31 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

Since I relaxed the rules on the combine, another test changed.
Not sure if the new conditions are correct, what do you think?

Wed, Nov 30, 6:24 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

Rebase on D139011

Wed, Nov 30, 6:21 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D139011: [InstCombine] Precommit D138814 tests.
Wed, Nov 30, 6:19 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

I don't think there is any requirement for the wider type to be exactly double the narrower type.

That's correct:
https://alive2.llvm.org/ce/z/iLVIgn

So this patch/tests are too narrow as-is. It should be checking something like "if we only demand the top N bits of an add, and the add operands are known zero in those top N bits, then fold the add into an overflow check."

Also, canonicalizing to the add intrinsic if we're not using the add part of the result seems like the wrong direction. I can't tell from the larger test what we're expecting to happen. Please pre-commit the baseline tests, so we can see the diffs.

Wed, Nov 30, 6:10 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

Comment

Wed, Nov 30, 5:26 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Wed, Nov 30, 5:14 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

Comments

Wed, Nov 30, 5:14 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D139000: [AMDGPU] Clear bodies of function with incompatible features.

Comments

Wed, Nov 30, 5:08 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Wed, Nov 30, 4:44 AM · Restricted Project, Restricted Project
Pierre-vh committed rGa88deb4b65f8: [AMDGPU] Use aperture registers instead of S_GETREG (authored by Pierre-vh).
[AMDGPU] Use aperture registers instead of S_GETREG
Wed, Nov 30, 4:25 AM · Restricted Project, Restricted Project
Pierre-vh closed D137542: [AMDGPU] Use aperture registers instead of S_GETREG.
Wed, Nov 30, 4:25 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Wed, Nov 30, 4:24 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D139000: [AMDGPU] Clear bodies of function with incompatible features.
Wed, Nov 30, 4:20 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D139000: [AMDGPU] Clear bodies of function with incompatible features.

Comments

Wed, Nov 30, 4:20 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D139000: [AMDGPU] Clear bodies of function with incompatible features.
Wed, Nov 30, 3:55 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

Support i2/i1 case with SExt

Wed, Nov 30, 3:02 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D138560: [lld][LTO] Add assembly output to LTO save-temps.

I am not sure I am convinced by the motivation. Say you have two bitcode files a.o and b.o, can you just use clang++ -fuse-ld=lld a.o b.o -Wl,--lto-emit-asm? This can be used with -Wl,--save-temps as well. The output is named in terms of the -o output file name, instead of *.s, but the slight difference doesn't warrant adding significantly more complexity to LTOBackend.cpp as this patch does.

We use lld to do device LTO in the HIP toolchain. The clang driver extracts device bitcode and passes it to lld. When -save-temps is used, the clang driver adds -save-temps to the options passed to lld. It is not convenient for users to rerun the lld command to get the assembly. It is better for lld to generate the assembly when -save-temps is used because:
  1. it avoids redundant work, especially since device LTO is usually the most time-consuming part
  2. it keeps the temporary file names consistent
  3. it is a useful feature for lld itself

clang --save-temps does not pass --save-temps to the linker. The user can use -Wl,--save-temps which is more orthogonal. I can use -Wl,--lto-emit-asm today to get assembly output.

I am not sure I am convinced by the motivation. Say you have two bitcode files a.o and b.o, can you just use clang++ -fuse-ld=lld a.o b.o -Wl,--lto-emit-asm? This can be used with -Wl,--save-temps as well. The output is named in terms of the -o output file name, instead of *.s, but the slight difference doesn't warrant adding significantly more complexity to LTOBackend.cpp as this patch does.

With your suggestion, does it also generate the final, linked output file? Or just the .S?
The idea is to be able to get both alongside each other. We'd also like to have comments in the assembly file about things like register usage - the ones you get when you codegen to assembly, not when you disassemble.

There is an empty output file to satisfy some build tools' requirement that an output is already emitted. You can find assembly files in ${output}1. As mentioned, they are just not named *.S.

Though, I am surprised that you see this as adding significant complexity, I tried to not make it too intrusive. Is it the whole change that's problematic, or just some part of it?
If yes, maybe I can re-do the parts you find problematic to simplify them. The way I see it, this kind of patch should just be a small feature addition that shouldn't increase complexity too much. If you think complexity has increased significantly then I definitely failed somewhere and I can try to correct it before discarding the idea entirely.

This is significant complexity because you do codegen twice // Doing codegen twice may seem inefficient, but this is designed as a debug , which you probably would agree as well that this is inelegant.

Well, the real question is when --lto-emit-asm exists, why bother with another mode and duplicating the existing functionality. So far I fail to see a convincing argument that this new mode is useful.

It does pass it for HIP code when using -fgpu-rdc, see HIPAMD.cpp:151. This would be the primary use case because when building the same HIP code without -fgpu-rdc, we get a .S temporary, but if we pass -fgpu-rdc, that temporary is no longer there.

It seems that the logic in HIPAMD.cpp:151 can simply be removed.

There is an empty output file to satisfy some build tools' requirement that an output is already emitted. You can find assembly files in ${output}1. As mentioned, they are just not named *.S.

This is not about build systems at all, it's for debuggability. An empty file is not useful to us, we need the assembly with the comments.

I think you misunderstand my comment. The assembly output of --lto-emit-asm is of course not empty. The object file output is empty which is fine for your case.
If you want the object file output as well, re-run LTO without --lto-emit-asm. It's similar to what your patch intends to do, but places less burden on libLTO.

Wed, Nov 30, 12:46 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.

Use known bits instead of zext

Wed, Nov 30, 12:42 AM · Restricted Project, Restricted Project
Pierre-vh updated the summary of D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Wed, Nov 30, 12:42 AM · Restricted Project, Restricted Project

Tue, Nov 29

Pierre-vh updated the diff for D138651: [CUDA][HIP] Don't diagnose use for __bf16.
  • Recentering the patch around HIP only.
    • I was using too much from D57369 and was involving OpenMP when there's no reason to. Just checking if HIP is being used should be enough.
Tue, Nov 29, 11:54 PM · Restricted Project, Restricted Project

Mon, Nov 28

Pierre-vh updated the summary of D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Mon, Nov 28, 7:20 AM · Restricted Project, Restricted Project
Pierre-vh updated the summary of D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Mon, Nov 28, 7:20 AM · Restricted Project, Restricted Project
Pierre-vh abandoned D137705: [AMDGPU] Add DAG Combine for right-shift carry add to uaddo.

D138814

Mon, Nov 28, 6:58 AM · Restricted Project, Restricted Project
Pierre-vh abandoned D138106: [AMDGPU][GISel] Add lshr/add -> uaddo combine.

D138814

Mon, Nov 28, 6:57 AM · Restricted Project, Restricted Project
Pierre-vh abandoned D138104: [AMDGPU] Precommit add_shr_carry test.

D138814

Mon, Nov 28, 6:57 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D138814: [InstCombine] Combine a/lshr of add -> uadd.with.overflow.
Mon, Nov 28, 6:57 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

Addressing comments following discussion

Mon, Nov 28, 5:59 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D138560: [lld][LTO] Add assembly output to LTO save-temps.

I am not sure I am convinced by the motivation. Say you have two bitcode files a.o and b.o, can you just use clang++ -fuse-ld=lld a.o b.o -Wl,--lto-emit-asm? This can be used with -Wl,--save-temps as well. The output is named in terms of the -o output file name, instead of *.s, but the slight difference doesn't warrant adding significantly more complexity to LTOBackend.cpp as this patch does.

We use lld to do device LTO in the HIP toolchain. The clang driver extracts device bitcode and passes it to lld. When -save-temps is used, the clang driver adds -save-temps to the options passed to lld. It is not convenient for users to rerun the lld command to get the assembly. It is better for lld to generate the assembly when -save-temps is used because:
  1. it avoids redundant work, especially since device LTO is usually the most time-consuming part
  2. it keeps the temporary file names consistent
  3. it is a useful feature for lld itself

clang --save-temps does not pass --save-temps to the linker. The user can use -Wl,--save-temps which is more orthogonal. I can use -Wl,--lto-emit-asm today to get assembly output.

I am not sure I am convinced by the motivation. Say you have two bitcode files a.o and b.o, can you just use clang++ -fuse-ld=lld a.o b.o -Wl,--lto-emit-asm? This can be used with -Wl,--save-temps as well. The output is named in terms of the -o output file name, instead of *.s, but the slight difference doesn't warrant adding significantly more complexity to LTOBackend.cpp as this patch does.

With your suggestion, does it also generate the final, linked output file? Or just the .S?
The idea is to be able to get both alongside each other. We'd also like to have comments in the assembly file about things like register usage - the ones you get when you codegen to assembly, not when you disassemble.

There is an empty output file to satisfy some build tools' requirement that an output is already emitted. You can find assembly files in ${output}1. As mentioned, they are just not named *.S.

Though, I am surprised that you see this as adding significant complexity, I tried to not make it too intrusive. Is it the whole change that's problematic, or just some part of it?
If yes, maybe I can re-do the parts you find problematic to simplify them. The way I see it, this kind of patch should just be a small feature addition that shouldn't increase complexity too much. If you think complexity has increased significantly then I definitely failed somewhere and I can try to correct it before discarding the idea entirely.

This is significant complexity because you do codegen twice // Doing codegen twice may seem inefficient, but this is designed as a debug , which you probably would agree as well that this is inelegant.

Well, the real question is when --lto-emit-asm exists, why bother with another mode and duplicating the existing functionality. So far I fail to see a convincing argument that this new mode is useful.

Mon, Nov 28, 1:37 AM · Restricted Project, Restricted Project

Fri, Nov 25

Pierre-vh updated the diff for D138651: [CUDA][HIP] Don't diagnose use for __bf16.

Fixing condition, adding new test case

Fri, Nov 25, 12:43 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D138560: [lld][LTO] Add assembly output to LTO save-temps.

I am not sure I am convinced by the motivation. Say you have two bitcode files a.o and b.o, can you just use clang++ -fuse-ld=lld a.o b.o -Wl,--lto-emit-asm? This can be used with -Wl,--save-temps as well. The output is named in terms of the -o output file name, instead of *.s, but the slight difference doesn't warrant adding significantly more complexity to LTOBackend.cpp as this patch does.

Fri, Nov 25, 12:19 AM · Restricted Project, Restricted Project

Thu, Nov 24

Pierre-vh updated the diff for D138651: [CUDA][HIP] Don't diagnose use for __bf16.

Not all targets have bf16 and AuxTarget may not be available all the time so I changed the condition slightly

Thu, Nov 24, 5:54 AM · Restricted Project, Restricted Project
Pierre-vh planned changes to D138651: [CUDA][HIP] Don't diagnose use for __bf16.

Need to fix a test crash

Thu, Nov 24, 5:20 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138651: [CUDA][HIP] Don't diagnose use for __bf16.

Add newline at end of file

Thu, Nov 24, 3:59 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D138651: [CUDA][HIP] Don't diagnose use for __bf16.
Thu, Nov 24, 3:55 AM · Restricted Project, Restricted Project
Pierre-vh retitled D138560: [lld][LTO] Add assembly output to LTO save-temps from [WIP] Add assembly output to LTO save-temps to [lld][LTO] Add assembly output to LTO save-temps.
Thu, Nov 24, 2:10 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D138560: [lld][LTO] Add assembly output to LTO save-temps.

Cleanup diff so it can be reviewed

Thu, Nov 24, 2:06 AM · Restricted Project, Restricted Project

Wed, Nov 23

Pierre-vh added inline comments to D137542: [AMDGPU] Use aperture registers instead of S_GETREG.
Wed, Nov 23, 4:06 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D138560: [lld][LTO] Add assembly output to LTO save-temps.
Wed, Nov 23, 3:52 AM · Restricted Project, Restricted Project

Tue, Nov 22

Pierre-vh added inline comments to D137542: [AMDGPU] Use aperture registers instead of S_GETREG.
Tue, Nov 22, 6:34 AM · Restricted Project, Restricted Project
Pierre-vh committed rG4d39552abea3: [AMDGPU][NFC] Remove isLegalVOP3PShuffleMask (authored by Pierre-vh).
[AMDGPU][NFC] Remove isLegalVOP3PShuffleMask
Tue, Nov 22, 6:32 AM · Restricted Project, Restricted Project
Pierre-vh closed D138493: [AMDGPU][NFC] Remove isLegalVOP3PShuffleMask.
Tue, Nov 22, 6:32 AM · Restricted Project, Restricted Project
Pierre-vh committed rG9e7febb4f73c: [AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics (authored by Pierre-vh).
[AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics
Tue, Nov 22, 6:19 AM · Restricted Project, Restricted Project
Pierre-vh closed D136592: [AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics.
Tue, Nov 22, 6:19 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D138493: [AMDGPU][NFC] Remove isLegalVOP3PShuffleMask.
Tue, Nov 22, 6:15 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

Note: the machine hosting my full dev setup is down at the moment, so I can't run OCLTst right now. I will do it as soon as possible (and definitely before landing).

Tue, Nov 22, 1:40 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

Rebase, constrain in select instead

Tue, Nov 22, 1:39 AM · Restricted Project, Restricted Project
Pierre-vh committed rG220147d536f3: [AMDGPU] Make aperture registers 64 bit (authored by Pierre-vh).
[AMDGPU] Make aperture registers 64 bit
Tue, Nov 22, 1:18 AM · Restricted Project, Restricted Project
Pierre-vh closed D137767: [AMDGPU] Make aperture registers 64 bit.
Tue, Nov 22, 1:18 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D136592: [AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics.

Please take another look as I had to make some minor changes due to the rebase

Tue, Nov 22, 1:01 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D136592: [AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics.

Rebase + had to make some changes D138044

Tue, Nov 22, 1:00 AM · Restricted Project, Restricted Project
Pierre-vh committed rGa751676f98e8: [AMDGPU][GISel] Add llvm.amdgcn.icmp selection (authored by Pierre-vh).
[AMDGPU][GISel] Add llvm.amdgcn.icmp selection
Tue, Nov 22, 12:27 AM · Restricted Project, Restricted Project
Pierre-vh closed D136448: [AMDGPU][GISel] Add llvm.amdgcn.icmp selection.
Tue, Nov 22, 12:26 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D137705: [AMDGPU] Add DAG Combine for right-shift carry add to uaddo.

Apparently there is already an in flight version of this at D106139

Right, I think the reviewers there asked to move it to InstCombine, but I think we also want it in the backend, right?

Combines in the backend are primarily for patterns that arise as the result of legalization. Looking at this again, I'm inclined to have it in instcombine primarily.

Perhaps it should be kept as a target-specific combine for now? It can always be moved later if needed

It definitely should be done generically

Tue, Nov 22, 12:15 AM · Restricted Project, Restricted Project

Thu, Nov 17

Pierre-vh planned changes to D136234: [WIP] Allow ignoring copies in GISel.
Thu, Nov 17, 2:51 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D136319: [GISel] Rework trunc/shl combine in a generic trunc/shift combine.

Not sure why you deleted the test

Thu, Nov 17, 2:50 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D136319: [GISel] Rework trunc/shl combine in a generic trunc/shift combine.

Comments

Thu, Nov 17, 2:50 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137767: [AMDGPU] Make aperture registers 64 bit.

Fix reserveRegisterTuples call

Thu, Nov 17, 12:13 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D137767: [AMDGPU] Make aperture registers 64 bit.

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

That either means we're missing a class restriction or need a reserve

Ah the class restriction was it! I fixed it. I got confused and was using SREG_64 (includes aperture register) for the COPY dst register, but if I use SGPR (doesn't include it) then copy coalescing won't mess it up

But naturally it should be SReg_64. It should be valid to copy to VCC, and used directly as a SSrc_b64/VSrc_b64 operand. You may want an SReg_64 variant that excludes these

Thu, Nov 17, 12:07 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D136448: [AMDGPU][GISel] Add llvm.amdgcn.icmp selection.

Don't support i1 inputs

Thu, Nov 17, 12:04 AM · Restricted Project, Restricted Project

Wed, Nov 16

Pierre-vh added a comment to D137705: [AMDGPU] Add DAG Combine for right-shift carry add to uaddo.

Apparently there is already an in flight version of this at D106139

Wed, Nov 16, 11:46 PM · Restricted Project, Restricted Project
Pierre-vh accepted D138050: AMDGPU/GlobalISel: Insert freeze when splitting vector G_SEXT_INREG.

What makes this combine specifically require a freeze? Could more combines need it too, or is it something in G_SEXT_INREG's semantics that makes it need the G_FREEZE?

It's introducing an expectation for potentially poisonous bits in two distinct uses since the low half is used in two different places. Both uses need to use the same value. There are plenty of places that are probably missing freezes

There have been a number of talks on freeze, e.g. https://www.youtube.com/watch?v=ZMaZH3YYJqY

Wed, Nov 16, 11:38 PM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D138044: AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection.
Wed, Nov 16, 4:33 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D138050: AMDGPU/GlobalISel: Insert freeze when splitting vector G_SEXT_INREG.

What makes this combine specifically require a freeze? Could more combines need it too, or is it something in G_SEXT_INREG's semantics that makes it need the G_FREEZE?

Wed, Nov 16, 4:29 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137705: [AMDGPU] Add DAG Combine for right-shift carry add to uaddo.

Fix comment

Wed, Nov 16, 2:12 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D138106: [AMDGPU][GISel] Add lshr/add -> uaddo combine.
Wed, Nov 16, 2:11 AM · Restricted Project, Restricted Project
Pierre-vh added inline comments to D137705: [AMDGPU] Add DAG Combine for right-shift carry add to uaddo.
Wed, Nov 16, 1:00 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137705: [AMDGPU] Add DAG Combine for right-shift carry add to uaddo.

Rebase on D138104

Wed, Nov 16, 1:00 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D138104: [AMDGPU] Precommit add_shr_carry test.
Wed, Nov 16, 1:00 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D137767: [AMDGPU] Make aperture registers 64 bit.

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

That either means we're missing a class restriction or need a reserve

Wed, Nov 16, 12:34 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

Use COPY (restricting to SGPR works, I got confused and was restricting to SREG)

Wed, Nov 16, 12:34 AM · Restricted Project, Restricted Project

Tue, Nov 15

Pierre-vh updated the diff for D137705: [AMDGPU] Add DAG Combine for right-shift carry add to uaddo.

Comments

Tue, Nov 15, 3:16 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D137767: [AMDGPU] Make aperture registers 64 bit.

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

Tue, Nov 15, 3:00 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

How have you tested this? OpenCL conformance flat tests with -O0 and -O2 should be good enough

Tue, Nov 15, 2:59 AM · Restricted Project, Restricted Project
Pierre-vh planned changes to D137767: [AMDGPU] Make aperture registers 64 bit.
Tue, Nov 15, 2:44 AM · Restricted Project, Restricted Project
Pierre-vh retitled D137767: [AMDGPU] Make aperture registers 64 bit from [AMDGPU] Add aperture register 64 bit variants to [AMDGPU] Make aperture registers 64 bit.
Tue, Nov 15, 2:14 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137767: [AMDGPU] Make aperture registers 64 bit.

clang-format

Tue, Nov 15, 2:14 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D137767: [AMDGPU] Make aperture registers 64 bit.

Why do we need the 32-bit variants?

Can't this register still be used as a 32-bit operand (even if it's bugged)? Then we need it for proper assembly/disassembly, I believe, no?

OK, seems reasonable. I see we already have some asm/dis tests for 64-bit uses of these sources, e.g. llvm/test/MC/AMDGPU/literals.s tests s_and_b64 s[0:1], s[0:1], src_shared_base. How did that work? Should it work differently now?

Tue, Nov 15, 2:13 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137767: [AMDGPU] Make aperture registers 64 bit.

I wasn't able to do this without creating the 16 bit subregisters.
I added a FIXME for it. If I don't add them and try to just define the
32 bit register directly using SIReg, it creates a lot of issues with register pressure sets and classes.
We end up with a bunch of autogenerated ones that are useless and just cause crashes everywhere.

Tue, Nov 15, 2:12 AM · Restricted Project, Restricted Project
Pierre-vh abandoned D136944: [AMDGPU] Enable `s_sendmsg_rtn` selection with `+gfx11-insts`.
Tue, Nov 15, 12:26 AM · Restricted Project, Restricted Project
Pierre-vh abandoned D136945: [AMDGPU] Enable `permlanex16` selection with `+16-bit-insts,+gfx10-insts`.
Tue, Nov 15, 12:26 AM · Restricted Project, Restricted Project
Pierre-vh abandoned D136946: [AMDGPU] Enable `update/mov.dpp` selection with `+dpp`.
Tue, Nov 15, 12:25 AM · Restricted Project, Restricted Project

Thu, Nov 10

Pierre-vh added a comment to D137767: [AMDGPU] Make aperture registers 64 bit.

Why do we need the 32-bit variants?

Thu, Nov 10, 1:34 AM · Restricted Project, Restricted Project
Pierre-vh updated the summary of D137542: [AMDGPU] Use aperture registers instead of S_GETREG.
Thu, Nov 10, 12:18 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

Rebase on D137767

Thu, Nov 10, 12:18 AM · Restricted Project, Restricted Project
Pierre-vh requested review of D137767: [AMDGPU] Make aperture registers 64 bit.
Thu, Nov 10, 12:17 AM · Restricted Project, Restricted Project

Wed, Nov 9

Pierre-vh planned changes to D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

Needs to be rebased on a future patch that will add the 64-bit variant of the aperture registers (almost done, just need to get tests to pass)

Wed, Nov 9, 7:26 AM · Restricted Project, Restricted Project
Pierre-vh added a comment to D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

do we have to change the RC of the src_shared/private_base register to 64 bit?

Yes that sounds right.

(ideally it should be available for both, no?)

That would be useful but apparently it's not how the hardware works. I guess it gives you the 32 low bits which is not very useful because it will always be 0 for src_*_base and -1 for src_*_limit.
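A minimal C++ model of the point above, assuming (as the comment guesses) that a base value has all-zero low 32 bits and a limit value has all-one low 32 bits. The helper names and the high-half value used in the usage note are hypothetical, not taken from the hardware documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 64-bit aperture base with the given high half and an
// all-zero low half.
constexpr uint64_t makeApertureBase(uint32_t hi) {
  return static_cast<uint64_t>(hi) << 32;
}

// Hypothetical model: a 64-bit aperture limit with the given high half and an
// all-one low half.
constexpr uint64_t makeApertureLimit(uint32_t hi) {
  return (static_cast<uint64_t>(hi) << 32) | 0xFFFFFFFFu;
}

// What a 32-bit read of the low half would observe.
constexpr uint32_t low32(uint64_t value) {
  return static_cast<uint32_t>(value & 0xFFFFFFFFu);
}

// The interesting part of the value lives in the high half.
constexpr uint32_t high32(uint64_t value) {
  return static_cast<uint32_t>(value >> 32);
}
```

Under this model, `low32(makeApertureBase(0x1234))` is 0 and `low32(makeApertureLimit(0x1234))` is 0xFFFFFFFF (that is, -1 as a signed 32-bit value) regardless of the high half, so a 32-bit read carries no information.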

Wed, Nov 9, 6:06 AM · Restricted Project, Restricted Project
Pierre-vh updated the diff for D137542: [AMDGPU] Use aperture registers instead of S_GETREG.

So I indeed checked ocltst and the previous version crashed.
Now it looks fine, but the verifier crashes in a lot of tests, especially GISel ones.
The issue is that we need to use this register as a 64 bit operand but it's a 32 bit register, so the verifier complains on S_MOV_B64.

Wed, Nov 9, 5:54 AM · Restricted Project, Restricted Project