
cwabbott (Connor Abbott)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 13 2017, 6:06 PM (92 w, 4 d)

Recent Activity

Wed, Mar 13

cwabbott added inline comments to D59295: [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure..
Wed, Mar 13, 12:19 PM · Restricted Project, Restricted Project
cwabbott added a comment to D59295: [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure..

Actually, now that I think about it, I believe we realized that SIFixWWMLiveness has a giant hole: if any of the extra live ranges we insert are split, it'll fall over. I don't think anyone has come up with a way to express the constraints only with extra defs and uses in a way that always works, and I'm not sure it's possible. The issue is that we're lying to LLVM RA by pretending that vector instructions always fully clobber their destinations. Before, we were careful to never write to any inactive channels in order to keep up the charade, but WWM instructions force us to deal with it somehow. Fully informing LLVM of what's going on would involve marking every vector instruction as partially clobbering its destination, even the move instructions and load/store instructions LLVM emits during RA, which of course would tank performance unless LLVM is taught about predicated liveness -- but of course that's a whole lot of work that opens another can of worms (register pressure is suddenly not that meaningful anymore...).
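
One way to picture the "partial clobber" problem is with a toy model of a 4-lane wave. This is only a sketch under stated assumptions: the helper names (`masked_write`, `wwm_write`, `else_view`) and the lane values are hypothetical and are not LLVM code. It shows a register being reused for a temporary in the "then" side of a branch. A normal vector write leaves the other lanes alone, while a whole-wavefront write destroys the value that the "else" lanes still need.

```python
# Toy model of a 4-lane wave. Illustrative only, not LLVM code.
LANES = 4

def masked_write(reg, values, exec_mask):
    """Model a vector instruction: only lanes enabled in exec_mask are written."""
    return [v if exec_mask[i] else reg[i] for i, v in enumerate(values)]

def wwm_write(reg, values):
    """Model a WWM instruction: every lane is written, regardless of EXEC."""
    return masked_write(reg, values, [True] * LANES)

then_lanes = [True, True, False, False]
else_lanes = [not t for t in then_lanes]

# v0 holds a value that the else-side lanes will read later.
v0 = [10, 11, 12, 13]

# Assume the allocator reuses v0's physical register for a temporary in the
# then block, because on the CFG v0 does not look live there.
normal = masked_write(v0, [0, 0, 0, 0], then_lanes)
wwm = wwm_write(v0, [0, 0, 0, 0])

def else_view(reg):
    """Return the lanes that the else side will actually read."""
    return [reg[i] for i in range(LANES) if else_lanes[i]]

print(else_view(normal))  # else-side lanes are untouched by the masked write
print(else_view(wwm))     # else-side lanes were overwritten by the WWM write
```

In this model the masked write is harmless even though the allocator's assumption of a full clobber is false, which is the "charade" described above. The WWM write is what breaks it.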

Wed, Mar 13, 12:02 PM · Restricted Project, Restricted Project
cwabbott added a comment to D59295: [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure..

If you can express the constraints with some series of uses and defs, that would be preferable.

Wed, Mar 13, 11:46 AM · Restricted Project, Restricted Project
cwabbott added a comment to D59295: [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure..

I really don't like introducing new, dynamically reserved registers for this. It's going to introduce hell for dealing with any kind of ABI, and reserved registers are generally a bad idea. There's also nothing guaranteeing there are any free registers available to reserve, since you are just grabbing totally unused ones. This is going to just hit some variant of the problem I've been working on solving for handling SGPR->VGPR spills. Can WWM code be moved into a bundle or something?

Wed, Mar 13, 11:16 AM · Restricted Project, Restricted Project

Feb 21 2019

cwabbott updated the diff for D57894: AMDGPU: Fix @llvm.amdgcn.wqm.vote implementation.

Fixed a regression in the llvm.amdgcn.kill tests.

Feb 21 2019, 5:33 AM · Restricted Project

Feb 8 2019

cwabbott accepted D55444: AMDGPU: Fix DPP combiner.

I'm not going to read everything in detail, but the combining rules look correct to me and everything passes with this pass enabled. Feel free to re-enable it.

Feb 8 2019, 2:54 AM · Restricted Project, Restricted Project
cwabbott added a comment to D57748: AMDGPU: Add inverse ballot intrinsic.

Adding a pattern for this wouldn't work for what I wanted to do, which was a ballot/inverseballot pair to operate directly on the bitmask representation of a boolean, since there's a bug where SelectionDAG forgets that ballot removes divergence, and it needs to be non-divergent for the pattern to fire. That being said, inserting two readlanes isn't that much better, so maybe I should just fix that instead...
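
As a rough illustration of the ballot/inverse-ballot pairing mentioned here, the following toy model packs a per-lane boolean into a bitmask, manipulates the bitmask directly, and unpacks it again. The function names and the 4-lane wave are assumptions made for the sketch. It does not reproduce the actual intrinsics or their lowering.

```python
# Toy model of ballot / inverse ballot on a small wave. Not the LLVM intrinsics.

def ballot(cond, exec_mask):
    """Pack a per-lane boolean into one bitmask; inactive lanes contribute 0."""
    mask = 0
    for lane, (c, active) in enumerate(zip(cond, exec_mask)):
        if c and active:
            mask |= 1 << lane
    return mask

def inverse_ballot(mask, num_lanes):
    """Unpack a wave-uniform bitmask back into a per-lane boolean."""
    return [bool((mask >> lane) & 1) for lane in range(num_lanes)]

cond = [True, False, True, True]
exec_mask = [True, True, True, False]

m = ballot(cond, exec_mask)
flipped = m ^ 0b0011          # operate directly on the bitmask representation
per_lane = inverse_ballot(flipped, 4)
print(bin(m), per_lane)
```

The result of `ballot` is the same for every lane, which is why the comment expects it to be treated as non-divergent.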

Feb 8 2019, 2:22 AM · Restricted Project

Feb 7 2019

cwabbott created D57894: AMDGPU: Fix @llvm.amdgcn.wqm.vote implementation.
Feb 7 2019, 7:31 AM · Restricted Project
cwabbott updated the diff for D57748: AMDGPU: Add inverse ballot intrinsic.
  • Remove spurious whitespace change.
  • Lower S_INV_BALLOT with EmitInstrWithCustomInserter.
Feb 7 2019, 6:48 AM · Restricted Project

Feb 5 2019

cwabbott updated the diff for D57748: AMDGPU: Add inverse ballot intrinsic.
  • Added a pseudoinstruction lowered in SILowerI1Copies, and legalized it more similarly to how other instructions are legalized.
  • Added tests where the source is a uniform and non-uniform phi node. I had to use the amdgpu_ps calling convention here to get the arguments into SGPRs.

Feb 5 2019, 8:30 AM · Restricted Project
cwabbott added inline comments to D57748: AMDGPU: Add inverse ballot intrinsic.
Feb 5 2019, 6:25 AM · Restricted Project
cwabbott added inline comments to D57748: AMDGPU: Add inverse ballot intrinsic.
Feb 5 2019, 6:19 AM · Restricted Project
cwabbott added a comment to D57748: AMDGPU: Add inverse ballot intrinsic.

The user can't actually guarantee the argument is uniform, since transforms are allowed to touch the argument. We're accumulating quite a few places that assume this, though.

Feb 5 2019, 6:16 AM · Restricted Project
cwabbott created D57748: AMDGPU: Add inverse ballot intrinsic.
Feb 5 2019, 5:57 AM · Restricted Project

Jan 28 2019

cwabbott added inline comments to D55897: Add constrained fptrunc and fpext intrinsics.
Jan 28 2019, 9:16 AM
cwabbott added inline comments to D55897: Add constrained fptrunc and fpext intrinsics.
Jan 28 2019, 7:31 AM

Jan 24 2019

cwabbott added inline comments to D55897: Add constrained fptrunc and fpext intrinsics.
Jan 24 2019, 3:57 AM

Jan 16 2019

cwabbott added inline comments to D55444: AMDGPU: Fix DPP combiner.
Jan 16 2019, 1:14 AM · Restricted Project, Restricted Project

Jan 15 2019

cwabbott added inline comments to D55444: AMDGPU: Fix DPP combiner.
Jan 15 2019, 7:44 AM · Restricted Project, Restricted Project

Jan 14 2019

cwabbott added inline comments to D55444: AMDGPU: Fix DPP combiner.
Jan 14 2019, 10:27 AM · Restricted Project, Restricted Project

Jan 11 2019

cwabbott added a comment to D55444: AMDGPU: Fix DPP combiner.

I figured it would be a little easier if I looked at these cases by myself. It turns out there are more problems with isIdentityValue, including some correctness issues. After fixing these, everything works correctly now.
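
For readers unfamiliar with the term, an identity value for a binary operation is one that leaves the other operand unchanged. The sketch below checks that property for a handful of 32-bit operations. The table and the `is_identity` helper are hypothetical illustrations, not the `isIdentityValue` function from the patch, and the set of operations is chosen here only as an example.

```python
# Illustrative identity table for 32-bit operations.
# Hypothetical helper, not the isIdentityValue function from the patch.
U32_MAX = 0xFFFFFFFF
I32_MIN, I32_MAX = -(2 ** 31), 2 ** 31 - 1

UNSIGNED_OPS = {
    "add":  (lambda a, b: (a + b) & U32_MAX, 0),
    "mul":  (lambda a, b: (a * b) & U32_MAX, 1),
    "and":  (lambda a, b: a & b, U32_MAX),
    "xor":  (lambda a, b: a ^ b, 0),
    "umax": (max, 0),
    "umin": (min, U32_MAX),
}
SIGNED_OPS = {
    "smax": (max, I32_MIN),
    "smin": (min, I32_MAX),
}

def is_identity(op, identity, samples):
    """Return True if op(x, identity) == x for every sample x."""
    return all(op(x, identity) == x for x in samples)

unsigned_samples = [0, 1, 7, U32_MAX]
signed_samples = [I32_MIN, -1, 0, I32_MAX]

for name, (op, ident) in UNSIGNED_OPS.items():
    print(name, is_identity(op, ident, unsigned_samples))
for name, (op, ident) in SIGNED_OPS.items():
    print(name, is_identity(op, ident, signed_samples))
```

Note that the identity differs between signed and unsigned min/max, which is one place where a correctness mistake is easy to make.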

Jan 11 2019, 2:22 AM · Restricted Project, Restricted Project

Jan 10 2019

cwabbott added a comment to D55444: AMDGPU: Fix DPP combiner.

Sorry, I just got back from break this week. I've run CTS with the pass enabled, and it now passes, although it seems most of the patterns we use don't get folded. Firstly AND, XOR, unsigned max, and unsigned min are most troubling, since the code that gets generated looks like it should be optimized:

Jan 10 2019, 5:34 AM · Restricted Project, Restricted Project

Dec 12 2018

cwabbott added inline comments to D55444: AMDGPU: Fix DPP combiner.
Dec 12 2018, 10:24 AM · Restricted Project, Restricted Project

Dec 6 2018

cwabbott added a comment to D55314: AMDGPU: Turn on the DPP combiner by default.

Ok, I'll disable it. I'm not sure about the 3rd point: are you saying the pass doesn't actually perform the optimization, or that it's fundamentally wrong? Because it is implemented to handle "identity" cases for add, mul and min/max.

Dec 6 2018, 6:29 AM
cwabbott added a comment to D55314: AMDGPU: Turn on the DPP combiner by default.

Hi,

This change breaks most of the subgroups tests with RADV (ie. dEQP-VK.subgroups.arithmetic.*).

Any reasons why you enabled it by default? Looks like it now triggers a new bug in the AMDGPU backend.

Thanks!

Hi,

This is an awaited feature, and the other reason is to detect situations such as yours. I think it's better to turn it off and reproduce the failures: how can I do this?

Dec 6 2018, 6:08 AM

Nov 16 2018

cwabbott added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that a readnone function always returns the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

Nov 16 2018, 4:50 AM

Nov 14 2018

cwabbott added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that a readnone function always returns the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?
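
The hoisting concern can be pictured with a toy lane-level model. The sketch assumes that a set.inactive-style operation copies its source in active lanes and substitutes a given value in inactive lanes. The `set_inactive` function below is a hypothetical stand-in, not the intrinsic itself. With identical arguments, the result still depends on which lanes are active at the point of the call.

```python
# Toy model: the same call with the same arguments gives different results
# under different sets of active lanes. Illustrative only.

def set_inactive(src, inactive_value, exec_mask):
    """Keep src in active lanes; use inactive_value in inactive lanes."""
    return [s if active else inactive_value for s, active in zip(src, exec_mask)]

src = [1, 2, 3, 4]

# Called where all lanes are active (for example, before a branch).
outer = set_inactive(src, 0, [True, True, True, True])

# Called inside a divergent branch where only some lanes are active.
inner = set_inactive(src, 0, [True, False, True, False])

print(outer)
print(inner)
```

Moving the call from the inner position to the outer one would therefore change what later whole-wave code observes in this model.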

Nov 14 2018, 2:04 AM

May 6 2018

cwabbott added a comment to D46470: [AMDGPU] Fixed a couple of SIFixWWMLiveness problems.
In D46470#1088984, @tpr wrote:

You made a comment in this code that "this is a workaround anyways until LLVM gains the notion of predicated uses and definitions of variables". How do you see that working? Are you aware of anyone else thinking along those lines in the LLVM community? Something to think about for the future.

May 6 2018, 10:37 AM

May 4 2018

cwabbott added a comment to D46470: [AMDGPU] Fixed a couple of SIFixWWMLiveness problems.

For the liveness issue, maybe a better way to solve it would be to add a new ENTER_WWM pseudoinstruction similar to EXIT_WWM, and add a matching implicit def to the matching ENTER_WWM whenever we insert an implicit use on EXIT_WWM, and mark both of them as kills. After all, any affected registers only need to interfere with the instructions run in WWM, so that should help with code quality too. I'm not sure why I didn't do that in the first place.

May 4 2018, 2:56 PM
cwabbott added inline comments to D46470: [AMDGPU] Fixed a couple of SIFixWWMLiveness problems.
May 4 2018, 2:37 PM

Oct 16 2017

cwabbott added inline comments to D38543: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic.
Oct 16 2017, 12:39 PM

Aug 8 2017

cwabbott committed rL310399: [AMDGPU] Add llvm.amdgpu.update.dpp intrinsic.
[AMDGPU] Add llvm.amdgpu.update.dpp intrinsic
Aug 8 2017, 11:54 AM
cwabbott closed D34718: [AMDGPU] Add llvm.amdgpu.update.dpp intrinsic by committing rL310399: [AMDGPU] Add llvm.amdgpu.update.dpp intrinsic.
Aug 8 2017, 11:54 AM

Aug 7 2017

cwabbott added a comment to D34718: [AMDGPU] Add llvm.amdgpu.update.dpp intrinsic.

Ping on this one. This is the last outstanding patch for implementing AMD_shader_ballot in Mesa.

Aug 7 2017, 4:25 PM
cwabbott committed rL310283: [AMDGPU] Add pseudo "old" source to all DPP instructions.
[AMDGPU] Add pseudo "old" source to all DPP instructions
Aug 7 2017, 12:12 PM
cwabbott closed D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions by committing rL310283: [AMDGPU] Add pseudo "old" source to all DPP instructions.
Aug 7 2017, 12:12 PM

Aug 4 2017

cwabbott committed rL310088: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.
[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Aug 4 2017, 11:38 AM
cwabbott closed D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic by committing rL310088: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.
Aug 4 2017, 11:37 AM
cwabbott committed rL310087: [AMDGPU] Add support for Whole Wavefront Mode.
[AMDGPU] Add support for Whole Wavefront Mode
Aug 4 2017, 11:37 AM
cwabbott committed rL310085: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.
[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
Aug 4 2017, 11:37 AM
cwabbott committed rL310086: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).
[AMDGPU] refactor WQM pass in preparation for WWM (NFCI)
Aug 4 2017, 11:37 AM
cwabbott closed D35524: [AMDGPU] Add support for Whole Wavefront Mode by committing rL310087: [AMDGPU] Add support for Whole Wavefront Mode.
Aug 4 2017, 11:37 AM
cwabbott closed D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM by committing rL310085: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.
Aug 4 2017, 11:37 AM
cwabbott closed D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI) by committing rL310086: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).
Aug 4 2017, 11:37 AM
cwabbott added a reviewer for D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions: SamWot.

It looks like Sam has worked a lot on the assembler, including adding support for DPP instructions, so I'm adding him for the assembler bits. I'd like to get this in before I leave next week, though.

Aug 4 2017, 10:58 AM
cwabbott updated the diff for D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions.

Remove spurious change to AMDGPUAsmParser.cpp

Aug 4 2017, 10:54 AM
cwabbott added inline comments to D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions.
Aug 4 2017, 8:48 AM

Aug 3 2017

cwabbott committed rL310013: [AMDGPU] Add missing hazard for DPP-after-EXEC-write.
[AMDGPU] Add missing hazard for DPP-after-EXEC-write
Aug 3 2017, 6:10 PM
cwabbott closed D34849: [AMDGPU] Add missing hazard for DPP-after-EXEC-write by committing rL310013: [AMDGPU] Add missing hazard for DPP-after-EXEC-write.
Aug 3 2017, 6:10 PM
cwabbott updated the diff for D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions.

Fix assembling DPP instructions. Also, adopt a more conservative version of
D34715. In particular, we ignore Constraints/DisableEncoding from the original
instruction for the DPP version. The only instruction with any special
constraints is MAC, because of its fake third source, and there it doesn't make
sense to keep the fake third source since it has to be the same as the normal
"old" source anyways. We can revisit this if something else comes up, but I
think this is a good plan for now.

Aug 3 2017, 5:48 PM
cwabbott abandoned D34715: [AMDGPU] don't set Constraints/DisableEncoding from the Profile.
Aug 3 2017, 5:39 PM
cwabbott added a comment to D34715: [AMDGPU] don't set Constraints/DisableEncoding from the Profile.

I realized today that this change is bogus, since these lines aren't copying the Constraints from the Profile. They're copying the constraints set via the pseudoinstruction, and removing these lines breaks a few things, in particular assembling MAC instructions, and probably liveness tracking for them too. Originally the changes in D34716 wouldn't work without this, but I'm going to update D34716 with a better solution that passes all the tests.

Aug 3 2017, 5:39 PM
cwabbott updated the diff for D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.

Rebase on master.

Aug 3 2017, 1:50 PM
cwabbott updated the diff for D35524: [AMDGPU] Add support for Whole Wavefront Mode.

Remove unnecessary stuff from fix-wwm-liveness.mir, based on Matt's comment for wqm.mir.

Aug 3 2017, 1:50 PM
cwabbott committed rL309981: test commit.
test commit
Aug 3 2017, 1:23 PM
cwabbott updated the diff for D35524: [AMDGPU] Add support for Whole Wavefront Mode.

Fix one last style issue.

Aug 3 2017, 1:06 PM

Aug 2 2017

cwabbott updated the diff for D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.

Clarify what the code that implements the WQM semantics is doing.

Aug 2 2017, 7:27 PM
cwabbott added inline comments to D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.
Aug 2 2017, 7:15 PM
cwabbott updated the diff for D35524: [AMDGPU] Add support for Whole Wavefront Mode.
  • rename dest of EXIT_WWM to be consistent
Aug 2 2017, 7:09 PM
cwabbott updated the diff for D35524: [AMDGPU] Add support for Whole Wavefront Mode.
  • fix style issues
  • add test for WWM with integers
  • remove unnecessary stuff from MIR test
Aug 2 2017, 7:06 PM
cwabbott added inline comments to D35524: [AMDGPU] Add support for Whole Wavefront Mode.
Aug 2 2017, 6:41 PM
cwabbott added inline comments to D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.
Aug 2 2017, 12:16 PM
cwabbott added inline comments to D35524: [AMDGPU] Add support for Whole Wavefront Mode.
Aug 2 2017, 11:28 AM

Aug 1 2017

cwabbott added a comment to D34849: [AMDGPU] Add missing hazard for DPP-after-EXEC-write.

Ping.

Aug 1 2017, 11:42 AM

Jul 31 2017

cwabbott updated the diff for D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.

Fixup lowerCopyInstrs() after using setDesc() by removing extra arguments.

Jul 31 2017, 11:49 AM

Jul 28 2017

cwabbott updated the diff for D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions.

Remove leftover $wqm_ctrl.

Jul 28 2017, 3:54 PM
cwabbott updated the diff for D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.

Rebase on using s_or_saveexec_b64 for WWM (fix test).

Jul 28 2017, 3:52 PM
cwabbott updated the diff for D35524: [AMDGPU] Add support for Whole Wavefront Mode.
  • Fix style issues.
  • Rebased on using setDesc() in lowerCopyInstrs()
  • Reworked logic for determining safe points to transition where SCC isn't live, to make using s_or_saveexec_b64 for entering WWM possible.
  • Added tests for WWM across multiple basic blocks and correct SCC handling. For the latter, I couldn't figure out how to make an IR test that exercised that path, so I made a MIR test for it.
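
The idea of "safe points where SCC isn't live" can be sketched as a backward scan over a basic block. This is a minimal illustration under assumptions: the `Inst` record, the `scc_safe_points` function, and the example block are hypothetical and do not come from the patch. The premise is that an instruction such as s_or_saveexec_b64 writes SCC, so it can only be inserted where no later instruction still needs the current SCC value.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    name: str
    reads_scc: bool = False
    writes_scc: bool = False

def scc_safe_points(block, live_out=False):
    """Return insertion indices where SCC is not live.

    Index i means "immediately before block[i]"; len(block) means the end
    of the block. live_out says whether SCC is live after the block.
    """
    live = live_out
    safe = []
    if not live:
        safe.append(len(block))
    for i in range(len(block) - 1, -1, -1):
        inst = block[i]
        if inst.writes_scc:
            live = False      # the old SCC value is dead above a redefinition
        if inst.reads_scc:
            live = True       # a read makes SCC live above this instruction
        if not live:
            safe.append(i)
    return sorted(safe)

block = [
    Inst("v_add_f32"),
    Inst("s_cmp_eq_u32", writes_scc=True),
    Inst("v_mov_b32"),
    Inst("s_cselect_b32", reads_scc=True),
    Inst("v_mul_f32"),
]

print(scc_safe_points(block))
```

In the example, the positions between the compare and the select are excluded, since clobbering SCC there would change the result of the select.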
Jul 28 2017, 3:50 PM
cwabbott updated the diff for D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).

Reset to previous version. Whoops.

Jul 28 2017, 3:50 PM
cwabbott added a comment to D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).

Whoops, I accidentally squashed this with D35524 while rebasing. Those changes were meant to be applied to that.

Jul 28 2017, 3:09 PM
cwabbott updated the diff for D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).

Fix WWM SCC test to actually test what it's supposed to, and fix a bug caught by it. Whoops.

Jul 28 2017, 3:01 PM
cwabbott updated the diff for D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).
  • Fixed style issues.
  • Rebased on using setDesc() in lowerCopyInstrs()
  • Reworked logic for determining safe points to transition where SCC isn't live, to make using s_or_saveexec_b64 for entering WWM possible.
  • Added tests for WWM across multiple basic blocks and correct SCC handling. For the latter, I couldn't figure out how to make an IR test that exercised that path, so I made a MIR test for it.
Jul 28 2017, 2:07 PM

Jul 27 2017

cwabbott added inline comments to D35524: [AMDGPU] Add support for Whole Wavefront Mode.
Jul 27 2017, 5:54 PM

Jul 26 2017

cwabbott added inline comments to D35524: [AMDGPU] Add support for Whole Wavefront Mode.
Jul 26 2017, 5:11 PM
cwabbott updated the diff for D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).

Mask out the flag first in markInstruction() to simplify the code a little and
avoid processing instructions unnecessarily once WWM is enabled.

Jul 26 2017, 3:47 PM
cwabbott updated the diff for D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.

Minor comment style fixes, simplify lowerCopyInstrs by using setDesc()

Jul 26 2017, 3:31 PM
cwabbott added inline comments to D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.
Jul 26 2017, 3:28 PM
cwabbott added inline comments to D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions.
Jul 26 2017, 1:26 PM

Jul 20 2017

cwabbott added inline comments to D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.
Jul 20 2017, 1:14 PM

Jul 17 2017

cwabbott added reviewers for D34718: [AMDGPU] Add llvm.amdgpu.update.dpp intrinsic: nhaehnle, tpr.
Jul 17 2017, 7:17 PM
cwabbott added reviewers for D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions: nhaehnle, tpr.
Jul 17 2017, 7:16 PM
cwabbott added reviewers for D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic: nhaehnle, tpr.
Jul 17 2017, 7:15 PM
cwabbott updated the diff for D34849: [AMDGPU] Add missing hazard for DPP-after-EXEC-write.
  • Check for correct S_NOP argument and use VI-NEXT
  • Only check for VALU EXEC write as per the docs, and update test
Jul 17 2017, 7:02 PM
cwabbott added inline comments to D34849: [AMDGPU] Add missing hazard for DPP-after-EXEC-write.
Jul 17 2017, 6:18 PM
cwabbott added a child revision for D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM: D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).
Jul 17 2017, 5:59 PM
cwabbott added a parent revision for D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI): D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.
Jul 17 2017, 5:59 PM
cwabbott abandoned D34717: [AMDGPU] Teach the WQM pass about Whole Wavefront Mode and wqm_ctrl.

Abandon in favor of D35524. While I based that change off of this one, things have changed so much that it's probably better to abandon this and do the review there.

Jul 17 2017, 5:58 PM
cwabbott updated the diff for D34718: [AMDGPU] Add llvm.amdgpu.update.dpp intrinsic.

Remove wqm_ctrl in favor of llvm.amdgcn.wqm and llvm.amdgcn.wwm
intrinsics, and remove tests for them.

Jul 17 2017, 5:55 PM
cwabbott updated the diff for D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions.

Remove wqm_mode in favor of llvm.amdgcn.wqm and llvm.amdgcn.wwm intrinsics.

Jul 17 2017, 5:54 PM
cwabbott added a parent revision for D35524: [AMDGPU] Add support for Whole Wavefront Mode: D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).
Jul 17 2017, 5:52 PM
cwabbott added a child revision for D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI): D35524: [AMDGPU] Add support for Whole Wavefront Mode.
Jul 17 2017, 5:52 PM
cwabbott updated the diff for D34719: [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic.

rebase on latest WWM implementation, tweak semantics and implementation
to force WQM whenever WQM is used.

Jul 17 2017, 5:51 PM
cwabbott created D35524: [AMDGPU] Add support for Whole Wavefront Mode.
Jul 17 2017, 5:50 PM
cwabbott created D35523: [AMDGPU] refactor WQM pass in preparation for WWM (NFCI).
Jul 17 2017, 5:48 PM
cwabbott updated the diff for D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.

Avoid illegal SGPR<->VGPR copies by treating WQM more like COPY, add test that
exposes the problem.

Jul 17 2017, 1:59 PM

Jul 10 2017

cwabbott added a comment to D34849: [AMDGPU] Add missing hazard for DPP-after-EXEC-write.

Ping.

Jul 10 2017, 3:03 PM

Jul 9 2017

cwabbott added a comment to D34677: [AMDGPU] Whole Quad Mode variant of mov.dpp intrinsic.

I just posted D35167, which seems closer to how we want to deal with WQM and WWM in the long term, and should provide the same functionality as this intrinsic.

Jul 9 2017, 6:13 AM
cwabbott created D35167: [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM.
Jul 9 2017, 6:13 AM

Jun 30 2017

cwabbott updated the diff for D34717: [AMDGPU] Teach the WQM pass about Whole Wavefront Mode and wqm_ctrl.

Actually disable WWM on exit of a block.

Jun 30 2017, 5:26 PM
cwabbott abandoned D34847: [AMDGPU] Mark all export instructions as DisableWQM.

After some more testing, it seems like my hypothesis about the issue was wrong. I think the real problem was that WWM wasn't being disabled in the face of control flow as I intended, and then stuff was getting messed up because inactive lanes were enabled when the code assumed they weren't. At least, that's my theory for now. I'm going to update D34717 to fix that.

Jun 30 2017, 5:20 PM