
nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (170 w, 4 d)

Recent Activity

Yesterday

nhaehnle added a comment to D55444: AMDGPU: Fix DPP combiner.

Hi Valery, I really like the way the different cases are listed in the explanatory comment at the top of the file, and I believe those cases are correct. Would it be possible to restructure the code in a way that follows those cases? I think that would make it much easier to follow.

Tue, Jan 15, 3:34 AM · Restricted Project

Wed, Jan 2

nhaehnle accepted D55179: AMDGPU: Remove v16i8 from register classes.

LGTM

Wed, Jan 2, 1:04 AM

Wed, Dec 26

nhaehnle added a comment to D56002: [AMDGPU] Fix a weird WWM intrinsic issue..

The only user of canReadVGPR is addUsersToMoveToVALUWorklist. Since the intended semantics of canReadVGPR aren't at all clear from the name, might I suggest folding it into its only user?

Wed, Dec 26, 8:36 AM · Restricted Project

Tue, Dec 18

nhaehnle accepted D55602: AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged.

Thanks, LGTM

Tue, Dec 18, 7:21 AM

Dec 12 2018

nhaehnle accepted D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..

LGTM

Dec 12 2018, 2:12 AM · Restricted Project

Dec 10 2018

nhaehnle accepted D55367: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..

LGTM

Dec 10 2018, 7:44 AM · Restricted Project

Dec 7 2018

nhaehnle added inline comments to D55435: [AMDGPU] Fix discarded result of addAttribute.
Dec 7 2018, 8:11 AM
nhaehnle added inline comments to D55402: [AMDGPU] Simplify negated condition.
Dec 7 2018, 8:08 AM
nhaehnle added a comment to D55369: AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1.

I thought mesa was moving to stop using the relocations at all for this?

Dec 7 2018, 7:56 AM
nhaehnle added a comment to D55367: [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D..

Please make the change apply to Mesa3D as well.

Dec 7 2018, 7:54 AM · Restricted Project
nhaehnle added a comment to D55181: AMDGPU: Convert tests away from llvm.SI.load.const.

Note, we still have one use of this in Mesa.

Dec 7 2018, 7:51 AM
nhaehnle accepted D55180: AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load.

LGTM

Dec 7 2018, 7:50 AM
nhaehnle added a comment to D55179: AMDGPU: Remove v16i8 from register classes.

Can we also remove the setTruncStoreAction mention of v16i8 from SIISelLowering?

Dec 7 2018, 7:49 AM
nhaehnle accepted D55176: AMDGPU: Remove llvm.SI.buffer.load.dword.

LGTM

Dec 7 2018, 7:48 AM
nhaehnle accepted D55177: AMDGPU: Remove llvm.SI.tbuffer.store.

LGTM

Dec 7 2018, 7:48 AM
nhaehnle accepted D55175: AMDGPU: Remove llvm.AMDGPU.kill.

LGTM

Dec 7 2018, 7:47 AM
nhaehnle accepted D50633: [AMDGPU] Add new Mode Register pass.

Oops, I thought I'd posted this earlier... LGTM.

Dec 7 2018, 7:44 AM

Dec 6 2018

nhaehnle created D55369: AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1.
Dec 6 2018, 7:01 AM

Dec 1 2018

nhaehnle added a comment to D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.

LGTM. Do you need me to commit this?

Yes, I cannot commit myself.

What are the current future plans for this whole series? I think we should enable this by default, but it would be nice to get rid of the old code, which requires addressing the irreducible test case. I vaguely recall that you mentioned a way to address that, but I can't find that now.

I will open a new patch series to address the irreducibility issue. That series should also involve deprecating the LegacyDivergenceAnalysis for good.

Dec 1 2018, 3:06 AM

Nov 30 2018

nhaehnle accepted D55093: [AMDGPU] Disable SReg Global LD/ST, perf regression.

Sad, but LGTM

Nov 30 2018, 6:09 AM
nhaehnle added a comment to D50633: [AMDGPU] Add new Mode Register pass.

One last thing. If I'm right about this, it'd be good to reduce the check complexity and indentation levels. Then I'm happy :)

Nov 30 2018, 6:08 AM
nhaehnle updated the diff for D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.
  • prefer to use !TRI.isSGPRReg
Nov 30 2018, 5:57 AM

Nov 29 2018

nhaehnle accepted D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

LGTM

Nov 29 2018, 4:04 AM
nhaehnle added a comment to D52416: Allow FP types for atomicrmw xchg.

Should there be a change to LangRef?

Nov 29 2018, 4:00 AM
nhaehnle added a comment to D52785: [PseudoSourceValue] New category to represent floating-point status.

FWIW, I also think it's important not to drop MachineMemOperands. As Hal points out, MMOs carry non-optional information such as volatile-ness. And the Size is non-optional information at least for the AMDGPU backend.

Nov 29 2018, 3:51 AM
nhaehnle added inline comments to D50633: [AMDGPU] Add new Mode Register pass.
Nov 29 2018, 3:34 AM
nhaehnle added a comment to D50633: [AMDGPU] Add new Mode Register pass.

Thank you. One question left though.

Nov 29 2018, 3:29 AM
nhaehnle added a comment to D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.

ping

Nov 29 2018, 3:04 AM
nhaehnle added a comment to D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.

ping^4

Nov 29 2018, 2:59 AM

Nov 25 2018

nhaehnle added a comment to D54855: [AMDGPU] An exp must be branched over if exec=0.

This should already be fixed in trunk. Trunk has a function TII->hasUnwantedEffectsWhenEXECEmpty to cover this, and this patch simply shouldn't apply.

Nov 25 2018, 2:34 PM

Nov 22 2018

nhaehnle updated the diff for D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.
  • add TODO comment about the opportunity of keeping per-event timelines
Nov 22 2018, 5:16 AM
nhaehnle updated the diff for D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.
  • Early-out in generateWaitcntInstBefore when no wait is needed. This helps keep the nesting complexity in check for later changes.
Nov 22 2018, 5:16 AM

Nov 19 2018

nhaehnle added a comment to D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.

ping^3

Nov 19 2018, 4:05 AM
nhaehnle added a comment to D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.

ping

Nov 19 2018, 4:03 AM
nhaehnle added a comment to D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.

I think the remarks by @t-tye point to a potentially useful optimization, but that should not be part of this patch.

Nov 19 2018, 4:03 AM
nhaehnle added a comment to D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.

ping

Nov 19 2018, 4:02 AM
nhaehnle added inline comments to D54649: [FPEnv] Rough out constrained FCmp intrinsics.
Nov 19 2018, 3:59 AM
nhaehnle accepted D54606: [AMDGPU] Convert insert_vector_elt into set of selects.

Yeah, the separate DAG combine for scalar select w/ undef is a better solution. LGTM.

Nov 19 2018, 3:54 AM
nhaehnle added inline comments to D54649: [FPEnv] Rough out constrained FCmp intrinsics.
Nov 19 2018, 3:51 AM

Nov 16 2018

nhaehnle added a comment to D54606: [AMDGPU] Convert insert_vector_elt into set of selects.

However, why does code with undef vectors look so bad? For example, in float4_inselt, the fact that the initial vector is undef should allow us to just store a splat of 1.0.

Yes, I noticed that too. That needs to be a separate optimization. As far as I understand "insert_vector_element undef, %var, %idx" should not even come to this point. It needs to be replaced by build_vector (n x %var) regardless of the thresholds and heuristics I am using, e.g. earlier (higher in the same function I think).

Nov 16 2018, 10:07 AM
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.
In D54516#1301072, @tpr wrote:

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

By "disable", do you mean modifying EarlyCSE to not touch convergent calls? What if you do that for GVN as well? GVN::ValueTable::lookupOrAddCall() seems to be the right place for that. Such spot fixes are a useful step in the right direction, to be preferred over repeating the asm hack. One already exists in GVNHoist. This may sound a bit whackamoley, but the "effort" that @nhaehnle was asking about is essentially a more organized way to audit all the places where convergent calls should be handled specially. That project has not gained sufficient motivation yet to commit to.

Nov 16 2018, 9:58 AM
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that a readnone function always returns the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

No, I meant readnone, since AFAIK that's what IntrNoMem here maps to. The bit about "caller-visible state" was taken directly from the langref entry for readnone. My point was that in something like:

WWM code
if (cc) {
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}

it wouldn't be allowed for LLVM to rewrite the use of the second WWM to point to the first, even now, since the semantics of readnone aren't strong enough to guarantee that the two calls return the same thing, at least as far as I understand (I think someone else, maybe Tom, explained this to me over IRC a while ago). Your issue is different, and indeed removing IntrNoMem isn't going to help with that at all. I agree that we need a better solution for that. But for the current patch, I can't think of a situation where removing NoMem/readonly would disallow a transform that shouldn't be allowed.

Nov 16 2018, 6:25 AM
nhaehnle added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Thank you, this looks much cleaner. I only have a small number of nitpicks left.

Nov 16 2018, 4:43 AM
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that a readnone function always returns the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

Nov 16 2018, 4:23 AM
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I agree this needs a test case.

Nov 16 2018, 4:18 AM
nhaehnle added a comment to D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.

Is there code for the lazy map that should be cleaned up now?

Nov 16 2018, 4:13 AM
nhaehnle updated the diff for D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.
  • add LLVM_ATTRIBUTE_UNUSED to prevent warning in optimized builds
  • cleanup isSDNodeSourceOfDivergence a bit more
Nov 16 2018, 4:12 AM
nhaehnle added a comment to D54606: [AMDGPU] Convert insert_vector_elt into set of selects.

Mostly looks good to me.

Nov 16 2018, 4:10 AM

Nov 15 2018

nhaehnle accepted D53840: Preprocessing support in tablegen.

LGTM

Nov 15 2018, 12:35 PM
nhaehnle added a comment to D50633: [AMDGPU] Add new Mode Register pass.

The update seems to have messed up the indentation of comments in a few places.

Nov 15 2018, 10:54 AM

Nov 9 2018

nhaehnle created D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.
Nov 9 2018, 11:51 AM
nhaehnle added a comment to D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.

Note, the new waitcnt-preexisting.mir test shows this change.

Nov 9 2018, 7:15 AM
nhaehnle updated the diff for D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.

Turns out I was a bit too quick in my analysis of the second point.
I thought the overly conservative waitcnt was due to the control flow
in the shader I was looking at, but it was actually due to a pre-existing
waitcnt.

Nov 9 2018, 7:14 AM
nhaehnle added a comment to D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.

This is sufficient, because whenever only one event of a count type is pending, its last time point is naturally the upper bound of all time points of this count type, and when multiple event types are pending, the count type has gone out of order and an s_waitcnt to 0 is required to clear any pending event type (and will then clear all pending event types for that count type).

Just wondered if we can do better than using 0. Instead, can the lowest count be used, as this should be sufficient to ensure all out-of-order events in this count type have happened? I had discussed this with Bob at one time.
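
The invariant in the quoted description can be sketched as a toy model. This is only an illustration under stated assumptions, not the code in the InsertWaitcnts pass: `CounterModel`, `EventType`, and all method names are hypothetical, and a wait value of N is assumed to mean "at most N operations of this count type still outstanding".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical event types that share one counter (one "count type").
enum EventType : unsigned { EventA = 0, EventB = 1, NumEvents = 2 };

// Toy model: one upper-bound time point plus a mask of pending event types,
// instead of a separate time point per event type.
struct CounterModel {
  uint32_t LowerBound = 0;  // everything at or before this point has completed
  uint32_t UpperBound = 0;  // time point of the most recently issued event
  unsigned PendingMask = 0; // event types that may still be outstanding

  void issue(EventType E) {
    assert(E < NumEvents && "unknown event type");
    ++UpperBound;
    PendingMask |= 1u << E;
  }

  // True when more than one event type is pending, i.e. completion order is
  // no longer guaranteed to match issue order.
  bool mixedPending() const {
    return (PendingMask & (PendingMask - 1)) != 0;
  }

  // Wait value needed so that the event issued at time point T has completed.
  uint32_t requiredWait(uint32_t T) const {
    assert(T > LowerBound && T <= UpperBound && "T is not pending");
    return mixedPending() ? 0 : UpperBound - T;
  }

  // Record the effect of waiting until at most N operations are outstanding.
  void applyWait(uint32_t N) {
    uint32_t Outstanding = UpperBound - LowerBound;
    if (N >= Outstanding)
      return; // the wait does not retire anything new
    LowerBound = UpperBound - N;
    if (N == 0)
      PendingMask = 0; // a wait to 0 clears every pending event type
  }
};
```

In this model the question raised above would amount to returning something larger than 0 from `requiredWait` in the mixed case; the sketch keeps the conservative behavior described in the patch summary.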

Nov 9 2018, 6:30 AM
nhaehnle accepted D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.
Nov 9 2018, 5:51 AM
nhaehnle added a comment to D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.

LGTM. Do you need me to commit this?

Nov 9 2018, 5:51 AM
nhaehnle accepted D54235: [AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx.

LGTM

Nov 9 2018, 5:40 AM
nhaehnle added a comment to D54128: Fix MachineInstr::findRegisterUseOperandIdx subreg checks.

The code change looks fine to me, but it should be possible to clean up the test case a bit.

Nov 9 2018, 5:38 AM
nhaehnle accepted D54164: [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z.

Two stylistic nitpicks. LGTM with those addressed.

Nov 9 2018, 5:33 AM

Nov 8 2018

nhaehnle added inline comments to D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..
Nov 8 2018, 10:18 AM · Restricted Project
nhaehnle added a comment to D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..

The huge switch statements are a poster child for the generic SearchableTables, somewhat analogous to what already exists for MIMGInstructions. Sketching it out:

class LoadStoreBaseOpcode {
  LoadStoreBaseOpcode BaseOpcode = !cast<LoadStoreBaseOpcode>(NAME);
  bit Srsrc;
  bit Sbase;
  ...
}
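
For illustration, the kind of lookup that would replace the switch statements might look roughly like the following on the C++ side. This is a hand-written sketch, not TableGen-generated output: the opcode names, their values, and the flag assignments are hypothetical, and `getLoadStoreInfo` is an assumed name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical opcodes; real values would come from generated instruction info.
enum ToyOpcode : unsigned { TOY_OP_A = 10, TOY_OP_B = 20, TOY_OP_C = 30 };

// One row per opcode, mirroring the fields sketched in the TableGen class.
struct LoadStoreInfo {
  unsigned Opcode;
  bool Srsrc;
  bool Sbase;
};

// The table must stay sorted by Opcode for the binary search below.
// The flag values are placeholders chosen only for the example.
static const LoadStoreInfo LoadStoreTable[] = {
    {TOY_OP_A, true, false},
    {TOY_OP_B, false, true},
    {TOY_OP_C, false, false},
};

// Returns the row for Opcode, or nullptr if the opcode is not in the table.
inline const LoadStoreInfo *getLoadStoreInfo(unsigned Opcode) {
  const LoadStoreInfo *Begin = std::begin(LoadStoreTable);
  const LoadStoreInfo *End = std::end(LoadStoreTable);
  assert(std::is_sorted(Begin, End,
                        [](const LoadStoreInfo &L, const LoadStoreInfo &R) {
                          return L.Opcode < R.Opcode;
                        }) &&
         "table must be sorted by opcode");
  const LoadStoreInfo *I = std::lower_bound(
      Begin, End, Opcode,
      [](const LoadStoreInfo &E, unsigned Op) { return E.Opcode < Op; });
  return (I != End && I->Opcode == Opcode) ? I : nullptr;
}
```

A caller would then query properties with something like `getLoadStoreInfo(Opc)->Sbase` instead of adding another case to a switch.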
Nov 8 2018, 9:12 AM · Restricted Project

Nov 7 2018

nhaehnle added a comment to D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics.

Hi Nicolai,

FYI, this introduced a regression with Mass Effect Andromeda with DXVK and RADV on Polaris10. See https://bugs.freedesktop.org/show_bug.cgi?id=108611

Nov 7 2018, 2:28 PM
nhaehnle added a parent revision for D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state: D54225: AMDGPU/InsertWaitcnts: Some more const-correctness.
Nov 7 2018, 2:19 PM
nhaehnle added a child revision for D54225: AMDGPU/InsertWaitcnts: Some more const-correctness: D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.
Nov 7 2018, 2:19 PM
nhaehnle added a parent revision for D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types: D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.
Nov 7 2018, 2:19 PM
nhaehnle added a child revision for D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state: D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types.
Nov 7 2018, 2:19 PM
nhaehnle added a parent revision for D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking: D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types.
Nov 7 2018, 2:19 PM
nhaehnle added a child revision for D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types: D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.
Nov 7 2018, 2:19 PM
nhaehnle added a child revision for D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking: D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning.
Nov 7 2018, 2:19 PM
nhaehnle added a parent revision for D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning: D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.
Nov 7 2018, 2:19 PM
nhaehnle added a parent revision for D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points: D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning.
Nov 7 2018, 2:17 PM
nhaehnle added a child revision for D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning: D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points.
Nov 7 2018, 2:17 PM
nhaehnle added a parent revision for D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo: D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points.
Nov 7 2018, 2:17 PM
nhaehnle added a child revision for D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points: D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.
Nov 7 2018, 2:17 PM
nhaehnle created D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.
Nov 7 2018, 2:17 PM
nhaehnle created D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points.
Nov 7 2018, 2:16 PM
nhaehnle created D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning.
Nov 7 2018, 2:16 PM
nhaehnle created D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.
Nov 7 2018, 2:16 PM
nhaehnle created D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types.
Nov 7 2018, 2:15 PM
nhaehnle created D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.
Nov 7 2018, 2:15 PM
nhaehnle created D54225: AMDGPU/InsertWaitcnts: Some more const-correctness.
Nov 7 2018, 2:15 PM

Nov 6 2018

nhaehnle added a comment to D54153: Fix compilation issue in VS2017 with Clang-tablegen and LLVM-tablegen.
In D54153#1288936, @rnk wrote:

The issue is that XXX-tablegen-host .vcxprojects are explicitly calling cmake --build, thus not going through MSBuild's dependency graph. It looks like the Ninja generator doesn't do that.

Really? I thought it did... The main reason I don't use the "optimized tablegen" build configuration is because it doesn't keep everything in a single ninja build file. I just always compile with optimization, asserts, and debug info enabled.

Nov 6 2018, 11:12 AM
nhaehnle added a comment to D54153: Fix compilation issue in VS2017 with Clang-tablegen and LLVM-tablegen.

Interesting. Thanks for digging into buildsystem stuff!

Nov 6 2018, 7:55 AM
nhaehnle added a comment to D53840: Preprocessing support in tablegen.

Thanks for making those changes. The EOF handling does look better.

Nov 6 2018, 7:47 AM

Nov 5 2018

nhaehnle added a comment to D50633: [AMDGPU] Add new Mode Register pass.

As an overall algorithmic remark: I like the organization of the pass into phases, because it provides a path forward to an additional optimization.

Nov 5 2018, 3:51 AM
nhaehnle accepted D53930: [AMDGPU] Fix the new atomic optimizer in pixel shaders..
Nov 5 2018, 3:02 AM · Restricted Project
nhaehnle added a comment to D53840: Preprocessing support in tablegen.

I have to say I'm feeling a bit ambivalent about this. I'd say it would be nicer to have a mechanism that integrates with the rest of the TableGen language, but that's admittedly non-trivial. So I guess this approach is okay.

Thank you for the prompt reply, Nikolai! Yes, I decided to keep the preprocessing aside to minimize changes in the actual lexing.

The handling of end-of-files is a bit wonky. Have you considered just returning EOF from getNextChar at end of file, even if there's a parent file, and changing the EOF case in LexToken to just loop (or tail-recurse, I suppose) if it was the EOF of an included file?

This is possible, though EOF handling will be required in SkipCComment() and LexBracket(). If you agree with me changing these routines, I can do this. Returning EOF from getNextChar() allows handling cross-file C-style comments and bracket constructs - I can disallow these use-cases or keep supporting them. What do you think I should do?
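
The EOF handling suggested above can be sketched with a toy lexer. This is a minimal illustration, not TableGen's TGLexer: `ToyLexer`, `enterInclude`, and the single-character "tokens" are hypothetical simplifications; only the names getNextChar and LexToken echo the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct ToyLexer {
  enum { Eof = -1 };

  struct Buffer {
    std::string Text;
    std::size_t Pos;
  };
  std::vector<Buffer> Stack; // Stack.back() is the file currently being lexed

  explicit ToyLexer(std::string Main) {
    Stack.push_back(Buffer{std::move(Main), 0});
  }

  // Hypothetical stand-in for processing an include directive.
  void enterInclude(std::string Text) {
    Stack.push_back(Buffer{std::move(Text), 0});
  }

  // Always reports Eof at the end of the current file, even when a parent
  // file is still waiting on the stack.
  int getNextChar() {
    assert(!Stack.empty() && "no file to lex");
    Buffer &B = Stack.back();
    if (B.Pos == B.Text.size())
      return Eof;
    return static_cast<unsigned char>(B.Text[B.Pos++]);
  }

  // Returns the next non-whitespace character as a "token". On Eof of an
  // included file it pops back to the parent and loops; only the end of the
  // main file is reported to the caller.
  int lexToken() {
    for (;;) {
      int C = getNextChar();
      if (C == Eof) {
        if (Stack.size() > 1) {
          Stack.pop_back();
          continue;
        }
        return Eof;
      }
      if (C == ' ' || C == '\n')
        continue;
      return C;
    }
  }
};
```

With this shape, any helper that scans via getNextChar (the counterpart of SkipCComment or LexBracket) sees Eof at the end of an included file, so it has to decide explicitly whether a construct may continue into the parent file.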

Nov 5 2018, 2:58 AM

Nov 4 2018

nhaehnle created D54086: AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI).
Nov 4 2018, 1:48 PM
nhaehnle created D54085: AMDGPU/InsertWaitcnts: Remove kill-related logic.
Nov 4 2018, 1:46 PM
nhaehnle added a comment to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

This regresses the following tests on RADV:

dEQP-VK.glsl.loops.special.for_uniform_iterations.select_iteration_count_fragment,Fail
dEQP-VK.glsl.loops.special.for_uniform_iterations.select_iteration_count_vertex,Fail
dEQP-VK.glsl.loops.special.while_uniform_iterations.select_iteration_count_fragment,Fail
dEQP-VK.glsl.loops.special.while_uniform_iterations.select_iteration_count_vertex,Fail

Nov 4 2018, 1:38 PM

Oct 31 2018

nhaehnle accepted D53888: [SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt.

LGTM

Oct 31 2018, 8:06 AM
nhaehnle added a comment to D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics.

Thanks for the heads up. I'll take a look.

Oct 31 2018, 7:38 AM
nhaehnle added inline comments to D53931: TableGen: Fix ASAN error.
Oct 31 2018, 7:24 AM
nhaehnle created D53931: TableGen: Fix ASAN error.
Oct 31 2018, 7:07 AM
nhaehnle added a comment to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

It seems like we have to further develop this approach to deal with the scalar comparison instructions.
For instance, S_CMP_* does not produce any result but implicitly defines SCC.
Thus, InstrEmitter will insert the copies all the time.
Since DAG operator SETCC produces i1 value there will be the SCC to VReg_1 copies.
I am now trying to invent a method to lower those copies.
First issue: in case all the uses are not divergent I don't need the V_CND_MASK -1,0 -> V_CMP_NE 0 pair
I need S_CSELECT -1, 0 immediately after the definition (to save SCC) and S_CMP_NE 0 just before use to rematerialize SCC
Second issue: I only need to save/restore if there are SCC defs in between.
So, we need to take into account not divergent flow as well.

Oct 31 2018, 5:37 AM
nhaehnle updated the diff for D51995: AMDGPU: Generate VALU ThreeOp Integer instructions.

Add missing V_XAD_U32 pattern.

Oct 31 2018, 5:17 AM
nhaehnle added inline comments to D53888: [SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt.
Oct 31 2018, 4:27 AM
nhaehnle added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Thanks for making the changes.

Oct 31 2018, 4:18 AM
nhaehnle added a comment to D53840: Preprocessing support in tablegen.

I have to say I'm feeling a bit ambivalent about this. I'd say it would be nicer to have a mechanism that integrates with the rest of the TableGen language, but that's admittedly non-trivial. So I guess this approach is okay.

Oct 31 2018, 4:13 AM
nhaehnle added a comment to D53815: [TableGen] Better error checking for TIED_TO constraints..

Yay for better error messages! One comment about unfortunate variable naming...

Oct 31 2018, 3:28 AM