- User Since
- Jun 13 2017, 6:06 PM (159 w, 5 d)
Mar 18 2020
Feb 27 2020
Jan 30 2020
Jan 29 2020
It seems I forgot to test this against the entire testsuite, which made a bunch
of buildbots unhappy. This version fixes the issues:
Can someone else please commit this? @nhaehnle? It's been almost a month, excluding Christmas, and my commit access situation still hasn't been resolved. This fix is necessary for radv to pass VK 1.2 conformance, and we'd like to cherry-pick it to LLVM 10 before it's released.
Dec 12 2019
I asked earlier this week to get my commit rights back, but I haven't heard back yet, so you can commit it if I don't get to it first.
Dec 10 2019
Suppress unused variable warnings in release builds.
Add diff for new test.
Add a test for multiple phis as suggested by @nikic.
- Update precommitted test in this commit.
- Make the test match the original bug better. It turns out that in my attempt to make it harder for other transforms to happen beforehand and break the test, I accidentally made another transform kick in which broke it anyways. Use this form with a loop which should hopefully defeat other optimizations better.
Dec 9 2019
- Fix wrong indentation and missing "immarg" on intrinsic declaration
- Make sure that we remove the done bit from existing exports, and test it
Dec 6 2019
Update comment to explain why this works even when only some threads are killed.
Dec 3 2019
Dec 2 2019
Nov 27 2019
Fix leftover extraneous change from an earlier version.
Sep 2 2019
FWIW, this LGTM. Maybe it would be a good idea to add more lines to global-constant.ll to test radeonsi (amdgcn-- at the moment) and radv (amdgcn-mesa-mesa3d) under the NOPAL label.
I just noticed that this already came up in D65813 and it does the right thing, it's just waiting for review.
Aug 30 2019
FYI, I think I don't have commit access anymore because of the whole licensing thing, so I'll need someone else to commit for me.
Mar 13 2019
Actually, now that I think about it, I believe we realized that SIFixWWMLiveness has a giant hole in that if any of the extra live ranges we insert are split, it'll fall over. I don't think anyone has come up with a way to express the constraints only with extra defs and uses in a way that always works, and I'm not sure it's possible. The issue is that we're lying to LLVM RA by pretending that vector instructions always fully clobber their destinations. Before, we were careful to never write to any inactive channels in order to keep up the charade, but WWM instructions force us to deal with it somehow. Fully informing LLVM of what's going on would involve marking every vector instruction as partially clobbering its destination, even the move instructions and load/store instructions LLVM emits during RA, which of course would tank performance unless LLVM is taught about predicated liveness -- but of course that's a whole lot of work that opens another can of worms (register pressure is suddenly not that meaningful anymore...).
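To make the hazard concrete, here is a toy model in Python of how I understand it. This is only a sketch under my own assumptions: a 4-lane wave, a register modeled as a list, and hypothetical helper names (`vector_write`, `simulate`) that don't correspond to anything in LLVM. A value X is needed only by the lanes that take the else side, so RA considers its register free in the then block and reuses it there.

```python
def vector_write(reg, values, exec_mask):
    """Model a vector write: only lanes enabled in exec_mask are updated."""
    return [
        new if (exec_mask >> lane) & 1 else old
        for lane, (old, new) in enumerate(zip(reg, values))
    ]


def simulate(use_wwm):
    """Return what the else-side lanes read back from the shared register."""
    v0 = [7, 7, 7, 7]        # X, written for all four lanes before the branch
    then_mask = 0b0011       # lanes 0 and 1 take the then side
    else_mask = 0b1100       # lanes 2 and 3 take the else side

    # Then block: RA reuses v0 because X is not used on this path.
    # A normal write is predicated on exec; a WWM write enables every lane.
    write_mask = 0b1111 if use_wwm else then_mask
    v0 = vector_write(v0, [99, 99, 99, 99], write_mask)

    # Else block: lanes 2 and 3 read X back out of v0.
    return [v0[lane] for lane in range(4) if (else_mask >> lane) & 1]


print(simulate(use_wwm=False))  # X survives in the inactive lanes
print(simulate(use_wwm=True))   # the WWM write has overwritten X
```

With the normal write the else lanes still see X, which is why the "full clobber" lie was harmless; with the WWM write they see the overwritten value.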
Feb 21 2019
Fixed a regression in the llvm.amdgcn.kill tests.
Feb 8 2019
I'm not going to read everything in detail, but the combining rules look correct to me and everything passes with this pass enabled. Feel free to re-enable it.
Adding a pattern for this wouldn't work for what I wanted to do, which was a ballot/inverseballot pair to operate directly on the bitmask representation of a boolean, since there's a bug where SelectionDAG forgets that ballot removes divergence, and it needs to be non-divergent for the pattern to fire. That being said, inserting two readlanes isn't that much better, so maybe I should just fix that instead...
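For anyone unfamiliar with the ballot/inverse-ballot idea, a rough Python model of the two directions might look like the following. This is only a sketch: the function names, the 4-lane wave, and the explicit `exec_mask` parameter are my own illustration and not the intrinsic signatures.

```python
def ballot(lane_bools, exec_mask):
    """Pack a per-lane boolean into one uniform bitmask (active lanes only)."""
    mask = 0
    for lane, value in enumerate(lane_bools):
        if value and (exec_mask >> lane) & 1:
            mask |= 1 << lane
    return mask


def inverse_ballot(mask, wave_size):
    """Unpack a uniform bitmask back into a per-lane boolean."""
    return [bool((mask >> lane) & 1) for lane in range(wave_size)]


per_lane = [True, False, True, True]
bits = ballot(per_lane, exec_mask=0b1111)

# The bitmask is a single scalar, so it can be edited with ordinary
# integer operations before being turned back into a per-lane boolean.
without_lane0 = bits & ~0b0001
print(inverse_ballot(without_lane0, wave_size=4))
```

The point of the pair is the middle step: the result of `ballot` is the same for every lane, so it should be treated as non-divergent.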
Feb 7 2019
- Remove spurious whitespace change.
- Lower S_INV_BALLOT with EmitInstrWithCustomInserter.
Feb 5 2019
- Added a pseudoinstruction lowered in SILowerI1Copies, and legalized it more
similarly to how other instructions are legalized.
- Added tests where the source is a uniform and non-uniform phi node. I had
to use the amdgpu_ps calling convention here to get the arguments into SGPRs.
Jan 28 2019
Jan 24 2019
Jan 16 2019
Jan 15 2019
Jan 14 2019
Jan 11 2019
I figured it would be a little easier if I looked at these cases by myself. It turns out there are more problems with isIdentityValue, including some correctness issues. After fixing these, everything works correctly now.
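As a reminder of what an identity value means for the operations in question, here is a small Python sketch, assuming 32-bit unsigned operands. The table and the helper names (`OPS`, `is_identity_value`, `fold_with_fill`) are my own illustration and not the code of isIdentityValue itself.

```python
MASK32 = 0xFFFFFFFF

# Operation -> (implementation, identity value) for 32-bit unsigned operands.
OPS = {
    "and": (lambda a, b: a & b, MASK32),
    "xor": (lambda a, b: a ^ b, 0),
    "umax": (max, 0),
    "umin": (min, MASK32),
}


def is_identity_value(op, value):
    """True if combining any x with `value` under `op` leaves x unchanged."""
    _, identity = OPS[op]
    return value == identity


def fold_with_fill(op, lane_value, fill):
    """Combine a lane's value with whatever was filled in for a missing lane."""
    fn, _ = OPS[op]
    return fn(lane_value, fill)


for name, (_, identity) in OPS.items():
    # Filling with the identity leaves the lane's value untouched.
    assert fold_with_fill(name, 0x1234, identity) == 0x1234
```

Getting this table wrong is a correctness problem and not just a missed optimization: filling with 0 for AND, for example, zeroes the result.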
Jan 10 2019
Sorry, I just got back from break this week. I've run CTS with the pass enabled, and it now passes, although it seems most of the patterns we use don't get folded. AND, XOR, unsigned max, and unsigned min are the most troubling, since the code that gets generated looks like it should be optimized:
Dec 12 2018
Dec 6 2018
Nov 16 2018
Nov 14 2018
I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that a readnone function always returns the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?
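A toy Python illustration of why that matters for something like a ballot, assuming a 4-lane wave where the set of active lanes is an implicit input that depends on where the call sits in the control flow. The function name and the explicit `active_lanes` parameter are made up for the sketch; in the real thing that input is not visible as an argument.

```python
def ballot_at(pred, active_lanes):
    """Bitmask of the active lanes whose predicate is true."""
    return sum(1 << lane for lane in active_lanes if pred[lane])


pred = [True, True, True, True]   # same explicit argument at both call sites

# Call placed before a divergent branch: every lane is active.
before_branch = ballot_at(pred, active_lanes=[0, 1, 2, 3])

# Call placed inside the branch: only lanes 0 and 2 got here.
inside_branch = ballot_at(pred, active_lanes=[0, 2])

print(bin(before_branch), bin(inside_branch))
```

The two calls have identical arguments and touch no memory, yet give different answers, so merging them or hoisting the inner one out of the branch would change the result.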
May 6 2018
May 4 2018
For the liveness issue, maybe a better way to solve it would be to add a new ENTER_WWM pseudoinstruction similar to EXIT_WWM, and add a matching implicit def to the matching ENTER_WWM whenever we insert an implicit use on EXIT_WWM, and mark both of them as kills. After all, any affected registers only need to interfere with the instructions run in WWM, so that should help with code quality too. I'm not sure why I didn't do that in the first place.
Oct 16 2017
Aug 8 2017
Aug 7 2017
Ping on this one. This is the last outstanding patch for implementing AMD_shader_ballot in Mesa.
Aug 4 2017
It looks like Sam has worked a lot on the assembler, including adding support for DPP instructions, so I'm adding him for the assembler bits. I'd like to get this in before I leave next week, though.
Remove spurious change to AMDGPUAsmParser.cpp
Aug 3 2017
Fix assembling DPP instructions. Also, adopt a more conservative version of
D34715. In particular, we ignore Constraints/DisableEncoding from the original
instruction for the DPP version. The only instruction with any special
constraints is MAC, because of its fake third source, and there it doesn't make
sense to keep the fake third source since it has to be the same as the normal
"old" source anyways. We can revisit this if something else comes up, but I
think this is a good plan for now.