nhaehnle (Nicolai Hähnle)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 9 2015, 4:06 AM (162 w, 3 d)

Recent Activity

Today

nhaehnle added a comment to D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.

ping^3

Mon, Nov 19, 4:05 AM
nhaehnle added a comment to D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.

ping

Mon, Nov 19, 4:03 AM
nhaehnle added a comment to D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.

I think the remarks by @t-tye point to a potentially useful optimization, but that should not be part of this patch.

Mon, Nov 19, 4:03 AM
nhaehnle added a comment to D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.

ping

Mon, Nov 19, 4:02 AM
nhaehnle added inline comments to D54649: [FPEnv] Rough out constrained FCmp intrinsics.
Mon, Nov 19, 3:59 AM
nhaehnle accepted D54606: [AMDGPU] Convert insert_vector_elt into set of selects.

Yeah, the separate DAG combine for scalar select w/ undef is a better solution. LGTM.

Mon, Nov 19, 3:54 AM
nhaehnle added inline comments to D54649: [FPEnv] Rough out constrained FCmp intrinsics.
Mon, Nov 19, 3:51 AM

Fri, Nov 16

nhaehnle added a comment to D54606: [AMDGPU] Convert insert_vector_elt into set of selects.

However, why does code with undef vectors look so bad? For example, in float4_inselt, the fact that the initial vector is undef should allow us to just store a splat of 1.0.

Yes, I noticed that too. That needs to be a separate optimization. As far as I understand, "insert_vector_elt undef, %var, %idx" should not even reach this point: it should be replaced by build_vector (n x %var) earlier (higher up in the same function, I think), regardless of the thresholds and heuristics I am using.

Fri, Nov 16, 10:07 AM
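As an illustration of both points in this exchange, the C++ sketch below models a 4 x float vector whose lanes may be undef (std::nullopt). `insertAsSelects` lowers an insert at a dynamic index into one compare-and-select per lane, the general shape of the transform under review; `insertIntoUndef` shows why an insert into an all-undef vector can become a splat of the inserted value regardless of the index. All names and types here are hypothetical and are not SelectionDAG API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model of a 4 x float vector; std::nullopt marks an undef lane.
using Lane = std::optional<float>;
using Vec4 = std::array<Lane, 4>;

// Lower "insert_vector_elt Vec, Val, Idx" with a dynamic Idx into one
// compare-and-select per lane: Out[I] = (I == Idx) ? Val : Vec[I].
Vec4 insertAsSelects(const Vec4 &Vec, float Val, std::size_t Idx) {
  Vec4 Out;
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = (I == Idx) ? Lane(Val) : Vec[I];
  return Out;
}

// Inserting into an all-undef vector: every other lane is undef and may take
// any value, so a splat (build_vector of 4 x Val) is valid for every Idx.
Vec4 insertIntoUndef(float Val) { return Vec4{Val, Val, Val, Val}; }

// True if every lane that is defined in Original has the same value in
// Refined, i.e. Refined is an acceptable replacement for Original.
bool refines(const Vec4 &Refined, const Vec4 &Original) {
  for (std::size_t I = 0; I < Original.size(); ++I)
    if (Original[I] && Refined[I] != Original[I])
      return false;
  return true;
}
```

The `refines` check captures the reasoning: the splat agrees with the select-based result on every defined lane for every index, which is why, in the float4_inselt case, storing a splat of 1.0 should be acceptable.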
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.
In D54516#1301072, @tpr wrote:

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

By "disable", do you mean modifying EarlyCSE to not touch convergent calls? What if you do that for GVN as well? GVN::ValueTable::lookupOrAddCall() seems to be the right place for that. Such spot fixes are a useful step in the right direction and are preferable to repeating the asm hack; one already exists in GVNHoist. This may sound a bit whack-a-mole, but the "effort" that @nhaehnle was asking about is essentially a more organized way to audit all the places where convergent calls should be handled specially. That project has not yet gained enough motivation for anyone to commit to it.

Fri, Nov 16, 9:58 AM
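The GVN spot fix suggested in this comment amounts to never value-numbering convergent calls. The C++ sketch below shows the idea with a hypothetical, heavily simplified value table (`ToyValueTable` and `CallSite` are made up and do not mirror the real `GVN::ValueTable` interface): identical readnone calls share a number and could be CSE'd, while convergent calls and calls with memory effects always get a fresh one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified summary of a call instruction.
struct CallSite {
  std::string Callee;
  std::vector<unsigned> ArgNumbers; // value numbers of the arguments
  bool ReadNone = false;            // no memory effects
  bool Convergent = false;
};

// Toy value table: calls that receive the same number may be CSE'd.
class ToyValueTable {
  using Key = std::pair<std::string, std::vector<unsigned>>;
  std::map<Key, unsigned> CallNumbers;
  unsigned NextNumber = 1;

public:
  unsigned lookupOrAddCall(const CallSite &Call) {
    // Calls with memory effects always get a fresh number. Convergent calls
    // do too: per the discussion, their result may depend on which threads
    // execute them together, so identical calls in different control flow
    // must not be merged.
    if (!Call.ReadNone || Call.Convergent)
      return NextNumber++;
    auto [It, Inserted] =
        CallNumbers.try_emplace(Key{Call.Callee, Call.ArgNumbers}, NextNumber);
    if (Inserted)
      ++NextNumber;
    return It->second;
  }
};
```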
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that a readnone function always returns the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

No, I meant readnone, since AFAIK that's what IntrNoMem here maps to. The bit about "caller-visible state" was taken directly from the langref entry for readnone. My point was that in something like:

WWM code
if (cc) {
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}

it wouldn't be allowed for LLVM to rewrite the use of the second WWM to point to the first, even now, since the semantics of readnone aren't strong enough to guarantee that the two calls return the same thing, at least as far as I understand (I think someone else, maybe Tom, explained this to me over IRC a while ago). Your issue is different, and indeed removing IntrNoMem isn't going to help with that at all. I agree that we need a better solution for that. But for the current patch, I can't think of a situation where removing NoMem/readnone would disallow a transform that shouldn't be allowed.

Fri, Nov 16, 6:25 AM
nhaehnle added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Thank you, this looks much cleaner. I only have a small number of nitpicks left.

Fri, Nov 16, 4:43 AM
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that a readnone function always returns the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

Fri, Nov 16, 4:23 AM
nhaehnle added a comment to D54516: [AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem.

I agree this needs a test case.

Fri, Nov 16, 4:18 AM
nhaehnle added a comment to D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.

Is there code for the lazy map that should be cleaned up now?

Fri, Nov 16, 4:13 AM
nhaehnle updated the diff for D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.
  • add LLVM_ATTRIBUTE_UNUSED to prevent warning in optimized builds
  • clean up isSDNodeSourceOfDivergence a bit more
Fri, Nov 16, 4:12 AM
nhaehnle added a comment to D54606: [AMDGPU] Convert insert_vector_elt into set of selects.

Mostly looks good to me.

Fri, Nov 16, 4:10 AM

Thu, Nov 15

nhaehnle accepted D53840: Preprocessing support in tablegen.

LGTM

Thu, Nov 15, 12:35 PM
nhaehnle added a comment to D50633: [AMDGPU] Add new Mode Register pass.

The update seems to have messed up the indentation of comments in a few places.

Thu, Nov 15, 10:54 AM

Fri, Nov 9

nhaehnle created D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.
Fri, Nov 9, 11:51 AM
nhaehnle added a comment to D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.

Note, the new waitcnt-preexisting.mir test shows this change.

Fri, Nov 9, 7:15 AM
nhaehnle updated the diff for D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.

Turns out I was a bit too quick in my analysis of the second point.
I thought the overly conservative waitcnt was due to the control flow
in the shader I was looking at, but it was actually due to a pre-existing
waitcnt.

Fri, Nov 9, 7:14 AM
nhaehnle added a comment to D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.

This is sufficient, because whenever only one event type of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).

Just wondered if we can do better than using 0. Instead, could the lowest count be used, as this should be sufficient to ensure that all out-of-order events in this count type have happened? I had discussed this with Bob at one time.

Fri, Nov 9, 6:30 AM
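The argument for tracking only which event types are pending can be made concrete with a simplified C++ model of a single counter. It assumes that each event receives a score one higher than the previous one and that events of a single type complete in order; `CounterBracket` and its methods are hypothetical and are not the data structures used by SIInsertWaitcnts.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of one wait counter shared by several event types.
// Every event gets a score (time point) one higher than the previous event.
class CounterBracket {
  uint32_t LowerBound = 0;   // events with score <= LowerBound are complete
  uint32_t UpperBound = 0;   // score of the most recent event
  uint32_t PendingTypes = 0; // bit mask of event types that may be pending

  bool singleTypePending() const {
    return (PendingTypes & (PendingTypes - 1)) == 0; // zero or one bit set
  }

public:
  // Record an event of type Type (0..31) and return its score.
  uint32_t recordEvent(unsigned Type) {
    assert(Type < 32 && "event type out of range");
    PendingTypes |= 1u << Type;
    return ++UpperBound;
  }

  // Counter value to wait for before the event with this score is complete,
  // or std::nullopt if no wait is needed.
  std::optional<uint32_t> neededWait(uint32_t Score) const {
    if (Score <= LowerBound)
      return std::nullopt;
    // One pending type: events complete in order, so allowing
    // (UpperBound - Score) younger events to remain outstanding is enough.
    if (singleTypePending())
      return UpperBound - Score;
    // Several pending types: the counter may complete out of order, so
    // only a wait to 0 is known to be safe in this model.
    return 0u;
  }

  // Update the model after "s_waitcnt Count" on this counter.
  void applyWait(uint32_t Count) {
    if (Count == 0) {
      LowerBound = UpperBound;
      PendingTypes = 0; // a wait to 0 clears all pending event types
    } else if (singleTypePending() && UpperBound - LowerBound > Count) {
      LowerBound = UpperBound - Count;
    }
  }
};
```

The alternative raised in the reply, waiting for the lowest count instead of 0 in the out-of-order case, would change only the final `return 0u;` in `neededWait`; whether that is sound depends on hardware ordering guarantees that this model does not capture.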
nhaehnle accepted D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.
Fri, Nov 9, 5:51 AM
nhaehnle added a comment to D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.

LGTM. Do you need me to commit this?

Fri, Nov 9, 5:51 AM
nhaehnle accepted D54235: [AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx.

LGTM

Fri, Nov 9, 5:40 AM
nhaehnle added a comment to D54128: Fix MachineInstr::findRegisterUseOperandIdx subreg checks.

The code change looks fine to me, but it should be possible to clean up the test case a bit.

Fri, Nov 9, 5:38 AM
nhaehnle accepted D54164: [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z.

Two stylistic nitpicks. LGTM with those addressed.

Fri, Nov 9, 5:33 AM

Thu, Nov 8

nhaehnle added inline comments to D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things.
Thu, Nov 8, 10:18 AM
nhaehnle added a comment to D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things.

The huge switch statements are a poster child for the generic SearchableTables, somewhat analogous to what already exists for MIMGInstructions. Sketching it out:

class LoadStoreBaseOpcode {
  LoadStoreBaseOpcode BaseOpcode = !cast<LoadStoreBaseOpcode>(NAME);
  bit Srsrc;
  bit Sbase;
  ...
}
Thu, Nov 8, 9:12 AM
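For context on the SearchableTables suggestion: that TableGen backend emits, roughly, a constant array of records sorted by a key plus a lookup function that binary-searches it, which can stand in for a large switch over opcodes. The C++ sketch below hand-writes that shape; the opcode numbers, `LoadStoreInfo`, and `getLoadStoreInfo` are hypothetical and are not the exact form of the generated code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical per-opcode properties, mirroring the Srsrc/Sbase bits above.
struct LoadStoreInfo {
  uint16_t Opcode;
  uint16_t BaseOpcode; // opcode of the family this instruction belongs to
  bool Srsrc;
  bool Sbase;
};

// Table sorted by Opcode, as a generated table would be.
constexpr LoadStoreInfo LoadStoreTable[] = {
    {10, 10, false, true},
    {11, 10, false, true},
    {20, 20, true, false},
    {21, 20, true, false},
};

// Binary search replacing a large switch; returns nullptr for unknown opcodes.
const LoadStoreInfo *getLoadStoreInfo(uint16_t Opcode) {
  const LoadStoreInfo *Begin = std::begin(LoadStoreTable);
  const LoadStoreInfo *End = std::end(LoadStoreTable);
  const LoadStoreInfo *It = std::lower_bound(
      Begin, End, Opcode, [](const LoadStoreInfo &Entry, uint16_t Key) {
        return Entry.Opcode < Key;
      });
  if (It == End || It->Opcode != Opcode)
    return nullptr;
  return It;
}
```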

Wed, Nov 7

nhaehnle added a comment to D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics.

Hi Nicolai,

FYI, this introduced a regression with Mass Effect Andromeda with DXVK and RADV on Polaris10. See https://bugs.freedesktop.org/show_bug.cgi?id=108611

Wed, Nov 7, 2:28 PM
nhaehnle added a dependency for D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state: D54225: AMDGPU/InsertWaitcnts: Some more const-correctness.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependent revision for D54225: AMDGPU/InsertWaitcnts: Some more const-correctness: D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependency for D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types: D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependent revision for D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state: D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependency for D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking: D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependent revision for D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types: D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependent revision for D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking: D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependency for D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning: D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.
Wed, Nov 7, 2:19 PM
nhaehnle added a dependency for D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points: D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning.
Wed, Nov 7, 2:17 PM
nhaehnle added a dependent revision for D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning: D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points.
Wed, Nov 7, 2:17 PM
nhaehnle added a dependency for D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo: D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points.
Wed, Nov 7, 2:17 PM
nhaehnle added a dependent revision for D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points: D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.
Wed, Nov 7, 2:17 PM
nhaehnle created D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.
Wed, Nov 7, 2:17 PM
nhaehnle created D54230: AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points.
Wed, Nov 7, 2:16 PM
nhaehnle created D54229: AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning.
Wed, Nov 7, 2:16 PM
nhaehnle created D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking.
Wed, Nov 7, 2:16 PM
nhaehnle created D54227: AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types.
Wed, Nov 7, 2:15 PM
nhaehnle created D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state.
Wed, Nov 7, 2:15 PM
nhaehnle created D54225: AMDGPU/InsertWaitcnts: Some more const-correctness.
Wed, Nov 7, 2:15 PM

Tue, Nov 6

nhaehnle added a comment to D54153: Fix compilation issue in VS2017 with Clang-tablegen and LLVM-tablegen.
In D54153#1288936, @rnk wrote:

The issue is that XXX-tablegen-host .vcxprojects are explicitly calling cmake --build, thus not going through MSBuild's dependency graph. It looks like the Ninja generator doesn't do that.

Really? I thought it did... The main reason I don't use the "optimized tablegen" build configuration is because it doesn't keep everything in a single ninja build file. I just always compile with optimization, asserts, and debug info enabled.

Tue, Nov 6, 11:12 AM
nhaehnle added a comment to D54153: Fix compilation issue in VS2017 with Clang-tablegen and LLVM-tablegen.

Interesting. Thanks for digging into buildsystem stuff!

Tue, Nov 6, 7:55 AM
nhaehnle added a comment to D53840: Preprocessing support in tablegen.

Thanks for making those changes. The EOF handling does look better.

Tue, Nov 6, 7:47 AM

Mon, Nov 5

nhaehnle added a comment to D50633: [AMDGPU] Add new Mode Register pass.

As an overall algorithmic remark: I like the organization of the pass into phases, because it provides a path forward to an additional optimization.

Mon, Nov 5, 3:51 AM
nhaehnle accepted D53930: [AMDGPU] Fix the new atomic optimizer in pixel shaders.
Mon, Nov 5, 3:02 AM
nhaehnle added a comment to D53840: Preprocessing support in tablegen.

I have to say I'm feeling a bit ambivalent about this. I'd say it would be nicer to have a mechanism that integrates with the rest of the TableGen language, but that's admittedly non-trivial. So I guess this approach is okay.

Thank you for the prompt reply, Nicolai! Yes, I decided to keep the preprocessing aside to minimize changes in the actual lexing.

The handling of end-of-files is a bit wonky. Have you considered just returning EOF from getNextChar at end of file, even if there's a parent file, and changing the EOF case in LexToken to just loop (or tail-recurse, I suppose) if it was the EOF of an included file?

This is possible, though EOF handling will then be required in SkipCComment() and LexBracket(). If you agree with me changing these routines, I can do this. Returning EOF from getNextChar() allows handling cross-file C-style comments and bracket constructs explicitly: I can either disallow these use cases or keep supporting them. What do you think I should do?

Mon, Nov 5, 2:58 AM
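The EOF handling suggested in this thread can be sketched in C++ with a lexer that keeps a stack of buffers for included files. `getNextChar` reports end-of-file for the current buffer only, and the token loop either stops (main file) or pops back to the parent (included file) and keeps going. `ToyLexer` is hypothetical, splits input on spaces and newlines only, and is not TGLexer's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lexer with an include stack; tokens are space-separated words.
class ToyLexer {
  struct Buffer {
    std::string Text;
    std::size_t Pos = 0;
  };
  std::vector<Buffer> Stack; // back() is the file currently being lexed

public:
  static constexpr int EndOfFile = -1;

  explicit ToyLexer(std::string Main) { Stack.push_back({std::move(Main)}); }

  // Simulates an include directive: lexing continues in Text until its end.
  void pushInclude(std::string Text) { Stack.push_back({std::move(Text)}); }

  // Next character of the current buffer, or EndOfFile at its end, even if
  // a parent file still has input.
  int getNextChar() {
    Buffer &B = Stack.back();
    if (B.Pos == B.Text.size())
      return EndOfFile;
    return static_cast<unsigned char>(B.Text[B.Pos++]);
  }

  // Next word, or an empty string at the end of the main file.
  std::string lexToken() {
    std::string Word;
    while (true) {
      int C = getNextChar();
      if (C == EndOfFile) {
        if (!Word.empty())
          return Word; // a token never spans a file boundary
        if (Stack.size() == 1)
          return ""; // EOF of the main file
        Stack.pop_back(); // EOF of an included file: resume the parent
        continue;
      }
      if (C == ' ' || C == '\n') {
        if (!Word.empty())
          return Word;
        continue;
      }
      Word.push_back(static_cast<char>(C));
    }
  }
};
```

In this sketch a token never spans a file boundary; whether C-style comments and bracket constructs should be allowed to do so is exactly the open question in the reply.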

Sun, Nov 4

nhaehnle created D54086: AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI).
Sun, Nov 4, 1:48 PM
nhaehnle created D54085: AMDGPU/InsertWaitcnts: Remove kill-related logic.
Sun, Nov 4, 1:46 PM
nhaehnle added a comment to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

This regresses the following tests on RADV:

dEQP-VK.glsl.loops.special.for_uniform_iterations.select_iteration_count_fragment,Fail
dEQP-VK.glsl.loops.special.for_uniform_iterations.select_iteration_count_vertex,Fail
dEQP-VK.glsl.loops.special.while_uniform_iterations.select_iteration_count_fragment,Fail
dEQP-VK.glsl.loops.special.while_uniform_iterations.select_iteration_count_vertex,Fail

Sun, Nov 4, 1:38 PM

Wed, Oct 31

nhaehnle accepted D53888: [SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt.

LGTM

Wed, Oct 31, 8:06 AM
nhaehnle added a comment to D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics.

Thanks for the heads up. I'll take a look.

Wed, Oct 31, 7:38 AM
nhaehnle added inline comments to D53931: TableGen: Fix ASAN error.
Wed, Oct 31, 7:24 AM
nhaehnle created D53931: TableGen: Fix ASAN error.
Wed, Oct 31, 7:07 AM
nhaehnle added a comment to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

It seems we have to develop this approach further to deal with the scalar comparison instructions.
For instance, S_CMP_* does not produce any result but implicitly defines SCC.
Thus, InstrEmitter will insert the copies all the time.
Since the DAG operator SETCC produces an i1 value, there will be SCC to VReg_1 copies.
I am now trying to invent a method to lower those copies.
First issue: if none of the uses are divergent, I don't need the V_CNDMASK -1, 0 -> V_CMP_NE 0 pair.
I need S_CSELECT -1, 0 immediately after the definition (to save SCC) and S_CMP_NE 0 just before the use to rematerialize SCC.
Second issue: I only need to save/restore if there are SCC defs in between.
So we need to take non-divergent control flow into account as well.

Wed, Oct 31, 5:37 AM
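The second issue, saving SCC only when it is clobbered between the def and the use, can be sketched in C++ for a straight-line block. The hypothetical `Inst` summary and `needsSCCSaveRestore` helper are illustrative only; a real implementation would also need to follow control flow between the def and the use, as the last line of the comment notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one instruction in a basic block.
struct Inst {
  bool DefinesSCC = false;
};

// True if some instruction strictly between DefIdx and UseIdx clobbers SCC,
// i.e. SCC must be saved after the def (S_CSELECT -1, 0) and rematerialized
// before the use (S_CMP_NE 0). Straight-line code only.
bool needsSCCSaveRestore(const std::vector<Inst> &Block, std::size_t DefIdx,
                         std::size_t UseIdx) {
  assert(DefIdx < UseIdx && UseIdx < Block.size() && "def must precede use");
  for (std::size_t I = DefIdx + 1; I < UseIdx; ++I)
    if (Block[I].DefinesSCC)
      return true;
  return false;
}
```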
nhaehnle updated the diff for D51995: AMDGPU: Generate VALU ThreeOp Integer instructions.

Add missing V_XAD_U32 pattern.

Wed, Oct 31, 5:17 AM
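V_XAD_U32 computes (S0 ^ S1) + S2 (as described in the GFX9 ISA documentation), so the missing pattern amounts to folding an add of a xor into one instruction. The C++ sketch below performs that match on a hypothetical expression tree, trying both operand orders because add is commutative; `Expr`, `selectXad`, and `eval` are illustrative and are neither TableGen pattern syntax nor SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical expression tree over 32-bit unsigned values.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr {
  enum Kind { Leaf, Xor, Add, Xad } K;
  uint32_t Value = 0; // only used by Leaf
  ExprPtr A, B, C;    // operands; C is only used by Xad
};

ExprPtr leaf(uint32_t V) { return std::make_shared<Expr>(Expr{Expr::Leaf, V}); }
ExprPtr node(Expr::Kind K, ExprPtr A, ExprPtr B, ExprPtr C = nullptr) {
  return std::make_shared<Expr>(Expr{K, 0, A, B, C});
}

// Evaluate with wrap-around (modulo 2^32) arithmetic.
uint32_t eval(const ExprPtr &E) {
  switch (E->K) {
  case Expr::Leaf: return E->Value;
  case Expr::Xor: return eval(E->A) ^ eval(E->B);
  case Expr::Add: return eval(E->A) + eval(E->B);
  case Expr::Xad: return (eval(E->A) ^ eval(E->B)) + eval(E->C);
  }
  return 0;
}

// Rewrite add(xor(a, b), c) or add(c, xor(a, b)) into xad(a, b, c);
// any other node is returned unchanged.
ExprPtr selectXad(const ExprPtr &E) {
  if (E->K != Expr::Add)
    return E;
  if (E->A->K == Expr::Xor)
    return node(Expr::Xad, E->A->A, E->A->B, E->B);
  if (E->B->K == Expr::Xor)
    return node(Expr::Xad, E->B->A, E->B->B, E->A);
  return E;
}
```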
nhaehnle added inline comments to D53888: [SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt.
Wed, Oct 31, 4:27 AM
nhaehnle added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Thanks for making the changes.

Wed, Oct 31, 4:18 AM
nhaehnle added a comment to D53840: Preprocessing support in tablegen.

I have to say I'm feeling a bit ambivalent about this. I'd say it would be nicer to have a mechanism that integrates with the rest of the TableGen language, but that's admittedly non-trivial. So I guess this approach is okay.

Wed, Oct 31, 4:13 AM
nhaehnle added a comment to D53815: [TableGen] Better error checking for TIED_TO constraints.

Yay for better error messages! One comment about unfortunate variable naming...

Wed, Oct 31, 3:28 AM
nhaehnle accepted D48144: [Support] Teach YAMLIO about polymorphic types.

One small nitpick, apart from that LGTM.

Wed, Oct 31, 3:20 AM

Tue, Oct 30

nhaehnle added a comment to D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.

ping

Tue, Oct 30, 5:41 AM

Mon, Oct 29

nhaehnle accepted D53750: [AMDGPU] support image load/store a16.

Thanks, LGTM

Mon, Oct 29, 5:09 AM

Sat, Oct 27

nhaehnle added a comment to D53760: [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes.

The common code changes as well as the AMDGPU bits look good to me.

Sat, Oct 27, 8:37 AM
nhaehnle accepted D53750: [AMDGPU] support image load/store a16.

The function names of the a16.d16 tests should probably use vNf16 instead of vNf32. Please change that before committing, and please add a test for 2darraymsaa, to cover the v4i16 case of coordinates.

Sat, Oct 27, 8:30 AM
nhaehnle added a comment to D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics.

Thank you for making these changes. I have some detail comments inline and some high-level remarks:

Sat, Oct 27, 7:58 AM

Wed, Oct 24

nhaehnle updated the diff for D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.
  • fix cases where SCC is clobbered (also add a corresponding test)
  • use getWavefrontSize()
Wed, Oct 24, 10:36 AM
nhaehnle added a comment to D53424: Enable thread specific cl::opt values for multi-threaded support.
  • Given that cl::opt’s are usually global it feels extremely weird that reading and writing to these could actually be a thread local operation.

That's purely aesthetic opinion and not an argument at all.

Wed, Oct 24, 4:19 AM
nhaehnle added a comment to D53424: Enable thread specific cl::opt values for multi-threaded support.

May I ask why you think it's important to move away from cl::opt in the first place? What purpose does it actually solve?

Wed, Oct 24, 4:13 AM
nhaehnle added a comment to D52677: [AMDGPU] Match v_swap_b32.

The code LGTM; don't know about the question on the tests.

Wed, Oct 24, 3:59 AM
nhaehnle added a comment to D53594: [GlobalISel] Introduce G_BUILD_VECTOR and G_CONCAT_VECTOR opcodes.

Are two different opcodes for this really needed? Why not a single opcode with uniformly typed operands, either scalar or vector, which concatenates the operands as an appropriately sized vector? I know there's precedent for the separation elsewhere in LLVM, but it has always seemed redundant to me personally. The code for handling the BUILD vs. CONCAT cases tends to be very similar.

Wed, Oct 24, 3:34 AM
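The single-opcode alternative can be modeled in a few lines of C++. The hypothetical `concatOperands` helper below takes operands of one uniform type, each represented as a vector of lanes (a scalar is a one-lane vector), and concatenates them; the all-scalar call corresponds to the BUILD case and the all-vector call to the CONCAT case. It models the proposed semantics only and is not GlobalISel code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Lanes = std::vector<uint32_t>; // a scalar is a one-lane vector

// Concatenate operands of one uniform type into a wider vector.
// All operands must have the same number of lanes.
Lanes concatOperands(const std::vector<Lanes> &Ops) {
  assert(!Ops.empty() && "need at least one operand");
  const std::size_t Width = Ops.front().size();
  Lanes Result;
  Result.reserve(Width * Ops.size());
  for (const Lanes &Op : Ops) {
    assert(Op.size() == Width && "operands must be uniformly typed");
    Result.insert(Result.end(), Op.begin(), Op.end());
  }
  return Result;
}
```

Because both cases share the same loop, handling code would not need to distinguish them, which is the redundancy the comment points out.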

Tue, Oct 23

nhaehnle added a comment to D52100: [tblgen] Allow FixedLenDecoderEmitter to use APInt-like objects as InsnType.

Apart from the issue with the assertion, this change looks good to me.

Tue, Oct 23, 7:55 AM
nhaehnle added a comment to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

Thanks for taking a look.

Tue, Oct 23, 3:02 AM
nhaehnle updated the diff for D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

Address review comments

Tue, Oct 23, 3:02 AM

Mon, Oct 22

nhaehnle updated the diff for D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.
  • fix recently added tests
  • formatting fixes
Mon, Oct 22, 7:40 AM
nhaehnle created D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.
Mon, Oct 22, 6:47 AM
nhaehnle added inline comments to D52369: [tblgen][disasm] Allow multiple encodings to disassemble to the same instruction.
Mon, Oct 22, 2:09 AM
nhaehnle added inline comments to D52100: [tblgen] Allow FixedLenDecoderEmitter to use APInt-like objects as InsnType.
Mon, Oct 22, 1:40 AM
nhaehnle added a comment to D53424: Enable thread specific cl::opt values for multi-threaded support.

Can you explain what you would use per-thread command line options for?
Intuitively I would not expect actual commandline users wanting to set options per thread. If you need it to tweak compiler behavior then it might be better to find ways to encode the information in TargetOptions.h, function attributes or similar, so we have a streamlined way of setting them independently of programmatically modifying commandline options.

Mon, Oct 22, 1:05 AM
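A minimal C++ sketch of the alternative suggested in this comment, under stated assumptions: a setting travels with the function (standing in for a string function attribute), and the process-wide command-line default applies only when the function carries none. `ToyFunction`, `ExampleThresholdDefault`, `getExampleThreshold`, and the "example-threshold" key are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a global cl::opt value shared by all threads.
unsigned ExampleThresholdDefault = 150;

// Hypothetical function carrying "key"="value" string attributes.
struct ToyFunction {
  std::map<std::string, std::string> Attrs;
};

// The per-function setting wins; otherwise fall back to the global default.
unsigned getExampleThreshold(const ToyFunction &F) {
  auto It = F.Attrs.find("example-threshold");
  if (It != F.Attrs.end())
    return static_cast<unsigned>(std::stoul(It->second));
  return ExampleThresholdDefault;
}
```

Since the value is read from the function being compiled, two threads working on different functions can use different settings without writing to shared global option state.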

Oct 20 2018

nhaehnle added a comment to D53424: Enable thread specific cl::opt values for multi-threaded support.

Haven't looked too much at the implementation details yet, but I definitely like the idea.

Oct 20 2018, 3:08 AM

Oct 18 2018

nhaehnle added a comment to D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

Sure, I'll take care of it.

Oct 18 2018, 1:36 AM

Oct 17 2018

nhaehnle added a comment to D53160: AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI.

LGTM. Is this only actually a problem with the UB because we don't bother trying to set m0 to the allocated size?

Oct 17 2018, 8:35 AM
nhaehnle abandoned D53162: [DataLayout] Add bit width of pointers to global values.

I'm dropping this change. We don't need it anymore, given changes to the parent revision.

Oct 17 2018, 5:18 AM
nhaehnle added a comment to D53161: Fix some cases where the index size was used instead of the pointer size.

ping

Oct 17 2018, 5:17 AM
nhaehnle added inline comments to D53268: [X86] Stop promoting and/or/xor/andn to vXi64.
Oct 17 2018, 3:32 AM
nhaehnle added inline comments to D53222: AMDGPU: Add sram-ecc feature.
Oct 17 2018, 3:15 AM
nhaehnle created D53359: AMDGPU: Remove PHI loop condition optimization.
Oct 17 2018, 2:11 AM
nhaehnle updated the diff for D53160: AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI.
  • actually disable the pattern to select ds_{read,write}2_b32 to catch cases where we'd regress this fix
Oct 17 2018, 1:00 AM

Oct 16 2018

nhaehnle added inline comments to D53160: AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI.
Oct 16 2018, 11:41 AM
nhaehnle added inline comments to D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics.
Oct 16 2018, 11:40 AM
nhaehnle added inline comments to D53160: AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI.
Oct 16 2018, 11:34 AM
nhaehnle updated the diff for D53160: AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI.

Right, I looked at SelectDS64Bit4ByteAligned, but doing the split there
requires more code and also seems wrong since it'd mean inserting
additional nodes fairly late without giving them a chance to be combined
(not that they'd be combined very often, but still).

Oct 16 2018, 11:33 AM