dstuttard (David Stuttard)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 25 2017, 7:29 AM (94 w, 5 d)

Recent Activity

Today

dstuttard updated the diff for D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Thanks for the review - made all the suggested changes

Mon, Nov 19, 7:35 AM
dstuttard added inline comments to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.
Mon, Nov 19, 7:35 AM

Tue, Nov 13

dstuttard added a comment to D54301: [AMDGPU] Derive GCNSubtarget from MF to get overridden target features.

I don't quite understand what distinguishes the two versions of MCSubtargetInfo (getSTI() vs STM), but it appears a patch from Konstantin earlier this year changed this specific instance from STM.getFeatureBits() to getSTI(). Changing this back LGTM but I don't know what the rationale was in the first place, if it wasn't a typo, so I don't want to sign off without asking.

Tue, Nov 13, 8:53 AM

Fri, Nov 9

dstuttard added a reviewer for D54301: [AMDGPU] Derive GCNSubtarget from MF to get overridden target features: scott.linder.

Scott - you made some changes here most recently so adding you as the reviewer.

Fri, Nov 9, 3:28 AM
dstuttard created D54301: [AMDGPU] Derive GCNSubtarget from MF to get overridden target features.
Fri, Nov 9, 3:27 AM

Wed, Nov 7

dstuttard added a reviewer for D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics: sheredom.

Added Neil as a reviewer as I've made some changes to some of his a16 tests. I'm pretty certain that the modifications are correct, but wanted to get feedback on that as well.

Wed, Nov 7, 6:24 AM
dstuttard updated the diff for D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Modified based on review feedback from Nicolai

Wed, Nov 7, 6:21 AM

Tue, Nov 6

dstuttard abandoned D35073: [RegisterCoalescer] Fix for subrange join unreachable.

This patch has been superseded by D49097 - so I'm abandoning this one.

Tue, Nov 6, 4:15 AM

Tue, Oct 30

dstuttard added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Minor changes made.

Tue, Oct 30, 7:57 AM
dstuttard updated the diff for D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Made minor code changes suggested in review

Tue, Oct 30, 7:51 AM

Wed, Oct 24

dstuttard added a comment to D51925: [AMDGPU] Fix issue for zext of f16 to i32.

@arsenm Matt, any more comments? Would you be happy with a clarification comment as per the last suggestion from me?

Wed, Oct 24, 1:08 AM
dstuttard added a comment to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

@arsenm Matt - good to go?

Wed, Oct 24, 1:06 AM
dstuttard added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Covered all the requested changes (I think). Also implemented a test to make sure that simplifyDemanded doesn't run when TFE/LWE is enabled.

Wed, Oct 24, 1:03 AM
dstuttard updated the diff for D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Changed the implementation of the intrinsic return type to be an aggregate type

Wed, Oct 24, 1:00 AM

Sep 24 2018

dstuttard added a comment to D51925: [AMDGPU] Fix issue for zext of f16 to i32.

Looking again at the code - you're correct that it attempts to only do this transformation if the high bits are zero.
However, the code that checks this has the following telling comment:

// (i32 zext (i16 (bitcast f16:$src))) -> fp16_zext $src
// FIXME: It is not universally true that the high bits are zeroed on gfx9.
if (Src.getOpcode() == ISD::BITCAST) {
  SDValue BCSrc = Src.getOperand(0);
  if (BCSrc.getValueType() == MVT::f16 &&
      fp16SrcZerosHighBits(BCSrc.getOpcode()))
    return DCI.DAG.getNode(AMDGPUISD::FP16_ZEXT, SDLoc(N), VT, BCSrc);
}

In this particular case the BCSrc operation was an fptrunc which passes the fp16SrcZerosHighBits test - but that eventually ends up as v_mad_mixlo_f16 which doesn't ensure that the high bits are zero.

Any suggestions on how to proceed? I agree that it seems a shame to have to insert the extra AND operation blindly.

I guess you could check the subtarget in fp16SrcZerosHighBits. However that's pretty risky since it's depending on things we can't guarantee. Something could transform any other instruction into something else that won't preserve this. Overall I'm very unhappy this hardware change happened and it's a lot of work to handle all of this properly. I think what we really need is to drop this combine/node, and a separate machine instruction for every operation that preserves the high bits (with a tied source operand) vs. zeros them, and then have a machine pass that tries to clean up the extra ands while dropping this combine. We'll have to do extra work because we will have missed out on combines that this was enabling.

Sep 24 2018, 7:38 AM
dstuttard added a comment to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

I don't actually understand why this code is where it is? Why is SIFixSGPRCopies doing this? To clarify is this just an optimization? My initial reaction was that it was a fix, but looking at it again it seems like an optimization to me

Sep 24 2018, 7:30 AM
dstuttard updated the diff for D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

Updated a mov to a copy as per review comment

Sep 24 2018, 7:26 AM

Sep 14 2018

dstuttard committed rL342222: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
[AMDGPU] Ensure trig range reduction only used for subtargets that require it
Sep 14 2018, 3:29 AM
dstuttard closed D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
Sep 14 2018, 3:29 AM

Sep 13 2018

dstuttard updated the diff for D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

Moved foldToImm into SIInstrInfo as suggested
Implemented check in verifyInstruction and checked that it worked when the fix was removed

Sep 13 2018, 6:51 AM
dstuttard added inline comments to D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
Sep 13 2018, 4:10 AM
dstuttard added inline comments to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.
Sep 13 2018, 1:35 AM
dstuttard updated the diff for D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

Added missing break

Sep 13 2018, 1:33 AM
dstuttard added inline comments to D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
Sep 13 2018, 1:19 AM
dstuttard updated the diff for D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.

De-duplicated 1/2PI constant

Sep 13 2018, 1:18 AM

Sep 12 2018

dstuttard updated the diff for D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.

Made suggested change

Sep 12 2018, 9:51 AM
dstuttard added inline comments to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.
Sep 12 2018, 9:38 AM
dstuttard updated the diff for D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.

Made suggested changes

Sep 12 2018, 9:38 AM

Sep 11 2018

dstuttard added inline comments to D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.
Sep 11 2018, 2:01 PM
dstuttard added inline comments to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.
Sep 11 2018, 10:38 AM
dstuttard updated the diff for D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Folded in most of the changes highlighted in the review

Sep 11 2018, 10:31 AM
dstuttard added reviewers for D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it: arsenm, tpr.
Sep 11 2018, 7:53 AM
dstuttard created D51933: [AMDGPU] Ensure trig range reduction only used for subtargets that require it.
Sep 11 2018, 7:52 AM
dstuttard added reviewers for D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands: rampitec, tpr.
Sep 11 2018, 7:41 AM
dstuttard created D51932: [AMDGPU] Fix-up cases where writelane has 2 SGPR operands.
Sep 11 2018, 7:40 AM
dstuttard added a comment to D51925: [AMDGPU] Fix issue for zext of f16 to i32.

Looking again at the code - you're correct that it attempts to only do this transformation if the high bits are zero.
However, the code that checks this has the following telling comment:

Sep 11 2018, 6:43 AM
dstuttard added inline comments to D51925: [AMDGPU] Fix issue for zext of f16 to i32.
Sep 11 2018, 6:13 AM
dstuttard added reviewers for D51925: [AMDGPU] Fix issue for zext of f16 to i32: arsenm, tpr.
Sep 11 2018, 4:52 AM
dstuttard created D51925: [AMDGPU] Fix issue for zext of f16 to i32.
Sep 11 2018, 4:47 AM

Jul 23 2018

dstuttard accepted D49026: [AMDGPU] New tbuffer intrinsics.

LGTM - but you might want further reviews from others not so involved in implementation.

Jul 23 2018, 7:39 AM

Jul 2 2018

dstuttard added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

@nhaehnle - just added you as reviewer at the moment.

Jul 2 2018, 4:35 AM
dstuttard added a reviewer for D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics: nhaehnle.
Jul 2 2018, 4:33 AM
dstuttard created D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.
Jul 2 2018, 4:32 AM

Jun 26 2018

dstuttard abandoned D48546: [CodeGen] Ensure split interval has valid ranges for all sub registers.

D48555 supersedes this change.

Jun 26 2018, 3:01 AM
dstuttard accepted D48555: Account for undef values from predecessors in extendSegmentsToUses.

Yes, this change looks better based on what you say about not requiring live-in's for undef sub-ranges.

Jun 26 2018, 3:00 AM

Jun 25 2018

dstuttard added reviewers for D48546: [CodeGen] Ensure split interval has valid ranges for all sub registers: tpr, MatzeB, kparzysz.
Jun 25 2018, 6:26 AM
dstuttard created D48546: [CodeGen] Ensure split interval has valid ranges for all sub registers.
Jun 25 2018, 6:21 AM

Apr 18 2018

dstuttard committed rL330257: [AMDGPU] Fix issues for backend divergence tracking.
[AMDGPU] Fix issues for backend divergence tracking
Apr 18 2018, 6:56 AM
dstuttard closed D45372: [AMDGPU] Fix issues for backend divergence tracking.
Apr 18 2018, 6:56 AM

Apr 16 2018

dstuttard updated the diff for D45372: [AMDGPU] Fix issues for backend divergence tracking.

Removed test as being overkill for the missed state clear being added (2 other
tests still remain)

Apr 16 2018, 8:46 AM
dstuttard added a comment to D45372: [AMDGPU] Fix issues for backend divergence tracking.
  1. Remove the test altogether since the VirtReg2Value.clear() is definitely required and was originally missed as an oversight - and the test might be overkill.

Sounds reasonable. It really was an oversight :) It's funny that the change was reviewed by lots of people for 2 months but nobody noticed not clearing state.

Apr 16 2018, 8:23 AM
dstuttard added a comment to D45372: [AMDGPU] Fix issues for backend divergence tracking.

In general LGTM.
I also concerned about the magic test that has no checks and has no visible side effects on shared data but is necessary to reproduce buggy behavior.
Could you please clarify what do you need it for?

Apr 16 2018, 7:12 AM

Apr 7 2018

dstuttard added inline comments to D45372: [AMDGPU] Fix issues for backend divergence tracking.
Apr 7 2018, 1:40 AM

Apr 6 2018

dstuttard updated the diff for D45372: [AMDGPU] Fix issues for backend divergence tracking.

Folded in triple to run line

Apr 6 2018, 9:44 AM
dstuttard added a reviewer for D45372: [AMDGPU] Fix issues for backend divergence tracking: alex-t.
Apr 6 2018, 8:34 AM
dstuttard added a reviewer for D45372: [AMDGPU] Fix issues for backend divergence tracking: nhaehnle.
Apr 6 2018, 8:32 AM
dstuttard created D45372: [AMDGPU] Fix issues for backend divergence tracking.
Apr 6 2018, 8:27 AM

Mar 14 2018

dstuttard accepted D44468: [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader.

LGTM

Mar 14 2018, 7:54 AM

Feb 2 2018

dstuttard added inline comments to D42838: [AMDGPU] added writelane intrinsic.
Feb 2 2018, 4:04 AM

Jan 17 2018

dstuttard added inline comments to D42079: AMDGPU: Add a function attribute that shrinks buggy s_buffer opcodes on GFX9.
Jan 17 2018, 1:05 AM

Jan 16 2018

dstuttard added a comment to D40308: [RegisterCoalescer] More fixes for subreg join failure in RegCoalescer.

Any comment on this change? If not I'll land in the next few days.

Jan 16 2018, 2:18 AM
dstuttard added a comment to D40300: [RegisterCoalescer] Fix for SubRegJoin failures.

Any comment on this change? If not I'll land in the next few days.

Jan 16 2018, 2:16 AM
dstuttard added a comment to D35073: [RegisterCoalescer] Fix for subrange join unreachable.

Any more comments on this change? If not I'll land this in the next few days.

Jan 16 2018, 2:15 AM

Jan 9 2018

dstuttard updated the diff for D35073: [RegisterCoalescer] Fix for subrange join unreachable.

Removing a debug llvm_unreachable

Jan 9 2018, 6:02 AM
dstuttard updated the diff for D35073: [RegisterCoalescer] Fix for subrange join unreachable.

A better fix for this problem.

Jan 9 2018, 3:58 AM

Nov 28 2017

dstuttard added a comment to D40297: [RegisterCoalescer] Add verification method to check LiveInterval Segments.

The intent of this code is really to allow anyone doing debug to insert a check anywhere in the code. In particular, for the problems I was looking at I needed to check the live intervals after each change made in order to pinpoint where an error was being introduced. As implemented here (with the check after coalescing) it looks redundant, so having that extra check is probably useless.
Running a verification phase after coalescing is sometimes too late, and indeed may well have asserted before this point.

Nov 28 2017, 2:18 AM

Nov 21 2017

dstuttard updated the diff for D40297: [RegisterCoalescer] Add verification method to check LiveInterval Segments.

Removing debug code

Nov 21 2017, 11:03 AM
dstuttard added a comment to D35073: [RegisterCoalescer] Fix for subrange join unreachable.

I'm going to take another look at this bug.
I've also uploaded a couple of other fixes for SubRange join failures. See D40300 and D40308. I've also uploaded a verifier method that is helpful in tracking these problems down. See D40297.

Nov 21 2017, 7:58 AM
dstuttard added reviewers for D40308: [RegisterCoalescer] More fixes for subreg join failure in RegCoalescer: qcolombet, MatzeB.
Nov 21 2017, 7:54 AM
dstuttard updated subscribers of D40308: [RegisterCoalescer] More fixes for subreg join failure in RegCoalescer.
Nov 21 2017, 7:54 AM
dstuttard created D40308: [RegisterCoalescer] More fixes for subreg join failure in RegCoalescer.
Nov 21 2017, 7:54 AM
dstuttard added a reviewer for D40300: [RegisterCoalescer] Fix for SubRegJoin failures: MatzeB.
Nov 21 2017, 5:53 AM
dstuttard updated subscribers of D40300: [RegisterCoalescer] Fix for SubRegJoin failures.
Nov 21 2017, 5:53 AM
dstuttard created D40300: [RegisterCoalescer] Fix for SubRegJoin failures.
Nov 21 2017, 5:53 AM
dstuttard updated the diff for D40297: [RegisterCoalescer] Add verification method to check LiveInterval Segments.

Removed a blank line causing unnecessary differences

Nov 21 2017, 5:50 AM
dstuttard added a comment to D40297: [RegisterCoalescer] Add verification method to check LiveInterval Segments.

I've got a couple of new SubRange join fixes that I'm about to upload for review. This routine was useful in tracking those down so I thought it would be useful to add as a generic aid for future problems.

Nov 21 2017, 5:08 AM
dstuttard added reviewers for D40297: [RegisterCoalescer] Add verification method to check LiveInterval Segments: qcolombet, MatzeB.
Nov 21 2017, 5:08 AM
dstuttard updated subscribers of D40297: [RegisterCoalescer] Add verification method to check LiveInterval Segments.
Nov 21 2017, 5:04 AM
dstuttard created D40297: [RegisterCoalescer] Add verification method to check LiveInterval Segments.
Nov 21 2017, 5:04 AM

Oct 10 2017

dstuttard committed rL315307: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.
[DAGCombine] Fix for shuffle to vector extend for non power 2 vectors
Oct 10 2017, 5:46 AM
dstuttard closed D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors by committing rL315307: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.
Oct 10 2017, 5:46 AM

Oct 9 2017

dstuttard added a comment to D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.

@dstuttard Are you ready to commit this?

Oct 9 2017, 1:43 AM

Sep 28 2017

dstuttard added a comment to D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.

Test now reduced further - thanks @RKSimon - confirmed that it still exhibits the problem and that the fix still fixes it.
I managed to remove a couple more lines from the cut-down version you provided.

Sep 28 2017, 3:04 AM
dstuttard updated the diff for D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.

Cutting down the test size and rebasing

Sep 28 2017, 3:01 AM

Sep 25 2017

dstuttard added a comment to D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.

@dstuttard Are you still looking at this?

Sep 25 2017, 5:03 AM

Aug 4 2017

dstuttard added a comment to D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.

Can you reduce the test any further?

Aug 4 2017, 6:44 AM
dstuttard abandoned D34677: [AMDGPU] Whole Quad Mode variant of mov.dpp intrinsic.

Agreed - @cwabbott your changes provide something much more complete than this change, plus remove the need for it. Thanks.

Aug 4 2017, 6:41 AM
dstuttard added a comment to D34889: [ScheduleDAG] Fix bug in check for use of dead defs.

Testcase? It seems that your change wants to allow the following:

vreg4:sub0<def, dead> = ...
...
  = use vreg4:sub1

This is not legal! The dead modifier is about the full vreg not just about the sub0 part defined. (-verify-machineinstrs should also lead to such inputs getting rejected).

Aug 4 2017, 6:39 AM

Jul 26 2017

dstuttard updated the diff for D35073: [RegisterCoalescer] Fix for subrange join unreachable.

Implemented a slightly different approach to resolving the issue.

Jul 26 2017, 7:24 AM

Jul 25 2017

dstuttard added a comment to D35073: [RegisterCoalescer] Fix for subrange join unreachable.

I'm confused by this change (and the testcase doesn't really help understanding what is going on). You are probably onto a real bug here where we cannot assume the same lanemasks work for the input/output register class of a copy. However that should be independent of the fact that a full or partial copy is present.

Jul 25 2017, 7:29 AM

Jul 17 2017

dstuttard added a comment to D34889: [ScheduleDAG] Fix bug in check for use of dead defs.

ping

Jul 17 2017, 9:04 AM
dstuttard added a comment to D35073: [RegisterCoalescer] Fix for subrange join unreachable.

ping

Jul 17 2017, 9:03 AM
dstuttard added a comment to D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.

@arsenm - any comments?

Jul 17 2017, 9:03 AM

Jul 11 2017

dstuttard added a comment to D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.

It's already bugpoint reduced - I tried trimming it further by hand but it hid the issue (unsurprisingly).

Jul 11 2017, 5:03 AM
dstuttard added a reviewer for D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors: RKSimon.

Not sure who is an appropriate reviewer for this, adding @RKSimon as he seems to have made several changes in the same area.

Jul 11 2017, 3:28 AM
dstuttard updated subscribers of D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.
Jul 11 2017, 3:27 AM
dstuttard created D35241: [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors.
Jul 11 2017, 3:26 AM

Jul 7 2017

dstuttard updated the diff for D35073: [RegisterCoalescer] Fix for subrange join unreachable.

Adding verify-machineinstr flag to llc invocation in the test

Jul 7 2017, 5:35 AM

Jul 6 2017

dstuttard updated subscribers of D35073: [RegisterCoalescer] Fix for subrange join unreachable.
Jul 6 2017, 10:52 AM
dstuttard added reviewers for D35073: [RegisterCoalescer] Fix for subrange join unreachable: MatzeB, arsenm.
Jul 6 2017, 10:51 AM