craig.topper (Craig Topper)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 30 2013, 7:58 PM (259 w, 1 d)

Recent Activity

Yesterday

craig.topper updated the diff for D49280: [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types..

Rebase again

Wed, Jul 18, 2:47 PM
craig.topper updated the diff for D49280: [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types..

Rebase

Wed, Jul 18, 12:39 PM
craig.topper created D49499: [X86] Prefer unpckhpd over movhlps in isel for fake unary cases.
Wed, Jul 18, 10:57 AM
craig.topper added a comment to D49243: [X86] Improved sched models for X86 BT*rr instructions.

There's a shift uop in the BTR/BTS/BTC memory flow so thats port06 restricted. And the bit test uop should also be port 06 restricted just like the register form.

Wed, Jul 18, 10:21 AM
craig.topper accepted D49474: [X86][SSE] Canonicalize scalar fp arithmetic shuffle patterns.

LGTM

Wed, Jul 18, 9:40 AM

Tue, Jul 17

craig.topper accepted D46179: [X86] Lowering adds/addus/subs/subus intrinsics to native IR (LLVM part).

LGTM

Tue, Jul 17, 10:03 PM
craig.topper added inline comments to D49436: [X86][BtVer2] correctly model the latency/throughput of LEA instructions..
Tue, Jul 17, 6:00 PM
craig.topper accepted D49413: [x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig..

LGTM

Tue, Jul 17, 10:11 AM

Mon, Jul 16

craig.topper added inline comments to D49413: [x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig..
Mon, Jul 16, 11:58 PM
craig.topper added inline comments to D46179: [X86] Lowering adds/addus/subs/subus intrinsics to native IR (LLVM part).
Mon, Jul 16, 3:47 PM
craig.topper added inline comments to D46179: [X86] Lowering adds/addus/subs/subus intrinsics to native IR (LLVM part).
Mon, Jul 16, 3:35 PM
craig.topper added inline comments to D49378: [x86/SLH] Completely rework how we sink post-load hardening past data invariant instructions to be both more correct and much more powerful..
Mon, Jul 16, 2:42 PM
craig.topper added inline comments to D49378: [x86/SLH] Completely rework how we sink post-load hardening past data invariant instructions to be both more correct and much more powerful..
Mon, Jul 16, 2:40 PM
craig.topper added inline comments to D49243: [X86] Improved sched models for X86 BT*rr instructions.
Mon, Jul 16, 9:56 AM

Sun, Jul 15

craig.topper accepted D49336: [x86/SLH] Teach speculative load hardening to correctly harden the indices used by AVX2 and AVX-512 gather instructions..

LGTM

Sun, Jul 15, 8:52 PM
craig.topper added inline comments to D49336: [x86/SLH] Teach speculative load hardening to correctly harden the indices used by AVX2 and AVX-512 gather instructions..
Sun, Jul 15, 5:56 PM
craig.topper added inline comments to D49336: [x86/SLH] Teach speculative load hardening to correctly harden the indices used by AVX2 and AVX-512 gather instructions..
Sun, Jul 15, 5:15 PM
craig.topper added a comment to D49280: [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types..

Which microarchitecture cares about switching PD/PS? To my knowledge, no Intel architecture cares. Do any of the AMD architectures care?

Sun, Jul 15, 12:24 PM

Fri, Jul 13

craig.topper added a comment to D45537: [CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions.

I think this may have caused PR38125

Fri, Jul 13, 6:00 PM
craig.topper added a comment to D49280: [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types..

It's 2 patterns repeated 3 times. And as of yesterday these patterns are all qualified with "optsize || !sse41) and we have new patterns to select blend under (sse41 & !optsize). This fixes a long standing issue where we turned movss/sd into blend later in the pipeline through an accidental double call to commuteInstruction. So with that fixed its now 12 patterns that can be removed. I need to rebase this patch.

Fri, Jul 13, 3:07 PM
craig.topper created D49315: [X86][SLH] Remove PDEP and PEXT from isDataInvariantLoad.
Fri, Jul 13, 12:11 PM
craig.topper created D49313: [X86][SLH] Add VEX and EVEX conversion instructions to isDataInvariantLoad.
Fri, Jul 13, 12:05 PM
craig.topper created D49312: [X86][SLH] Regroup the instructions in isDataInvariantLoad a little. NFC.
Fri, Jul 13, 12:01 PM
craig.topper added a comment to D49299: [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction.

Though JMP_1 has a one byte offset and will get relaxed to JMP_4 by the assembler if the offset isn't in the range [128,-127]. TAILJMPd64 always encodes to the same encoding as JMP_4 and doesn't go through relaxation in the assembler.

Fri, Jul 13, 9:03 AM
craig.topper added a comment to D49280: [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types..

The domain switching isn't really intentiional. The code I removed from X86ISelLowering was removing some bitcasts that made this optimize better.

Fri, Jul 13, 8:56 AM
craig.topper accepted D49299: [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction.

This seems right to me. I assume from the use of CALL64pcrel32 in the other branch that outlining is 64-bit only?

Fri, Jul 13, 8:38 AM

Thu, Jul 12

craig.topper created D49280: [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types..
Thu, Jul 12, 10:09 PM
craig.topper added inline comments to D44824: [Spectre] Introduce a new pass to do speculative load hardening to mitigate Spectre variant #1 for x86..
Thu, Jul 12, 9:48 PM
craig.topper accepted D49220: [x86] Teach the EFLAGS copy lowering to handle much more complex control flow patterns including forks, merges, and even cyles..

LGTM

Thu, Jul 12, 7:41 PM
craig.topper added inline comments to D44824: [Spectre] Introduce a new pass to do speculative load hardening to mitigate Spectre variant #1 for x86..
Thu, Jul 12, 2:08 PM
craig.topper added inline comments to D49220: [x86] Teach the EFLAGS copy lowering to handle much more complex control flow patterns including forks, merges, and even cyles..
Thu, Jul 12, 1:51 PM
craig.topper added a comment to D49243: [X86] Improved sched models for X86 BT*rr instructions.

Why does this need the 'r' at the end of WriteBTr? By existing convention the memory form would be called WriteBTLd. So WriteBT should be interpreted as non-memory anyway.

Thu, Jul 12, 9:24 AM

Wed, Jul 11

craig.topper accepted D49211: [x86] Fix EFLAGS copy lowering to correctly handle walking past uses in multiple successors where some of the uses end up killing the EFLAGS register..

LGTM

Wed, Jul 11, 5:55 PM
craig.topper added inline comments to D44824: [Spectre] Introduce a new pass to do speculative load hardening to mitigate Spectre variant #1 for x86..
Wed, Jul 11, 10:32 AM
craig.topper accepted D49182: [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions..

LGTM

Wed, Jul 11, 8:16 AM

Tue, Jul 10

craig.topper created D49162: [Inliner] Teach inliner to merge 'min-legal-vector-width' function attribute.
Tue, Jul 10, 5:08 PM

Mon, Jul 9

craig.topper accepted D48936: [X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3).

LGTM

Mon, Jul 9, 12:00 PM
craig.topper accepted D48765: [X86] The TEST instruction is eliminated when BSF/TZCNT is used.

LGTM

Mon, Jul 9, 11:27 AM
craig.topper added a comment to D48765: [X86] The TEST instruction is eliminated when BSF/TZCNT is used.

What happens with cttz when -mattr=bmi which enables the tzcnt instruction

Mon, Jul 9, 12:45 AM

Sun, Jul 8

craig.topper updated the diff for D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

Remove a change that snuck into my last update.

Sun, Jul 8, 3:04 PM
craig.topper updated the diff for D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

Remove bad comment

Sun, Jul 8, 3:03 PM
craig.topper added a comment to D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

Judging by the code in SystemZISelDAGToDAG.cpp that calls UpdateNodeOperands during select, I don't think it works the way I thought. The SystemZ code calls ReplaceNode if UpdateNodeOperands does CSE. I tried to change that to just assign over Node. Which would make it similar to what I'm trying to do here. But it ended up leaving a node unselected.

Sun, Jul 8, 2:42 PM
craig.topper added a comment to D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

@niravd, let me make sure I understand what will happen in the CSE case. We'll walk the isel matching table using the CSE node, but at the end of the match we'll call MorphNodeTo using the original node that is still held in NodeToMatch in the target independent isel code? Then later we'll revisit the CSE node when its turn in the topological order comes up?

Sun, Jul 8, 1:49 PM
craig.topper updated the diff for D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

Rebase and capture the value returned form UpdateNodeOperands.

Sun, Jul 8, 1:43 PM
craig.topper accepted D48712: [X86] Lowering integer truncation intrinsics to native IR.

LGTM

Sun, Jul 8, 10:38 AM
craig.topper added a comment to D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

Without an And the target independent node wouldn't allow this. I thought about trying to insert an And into the DAG during DAG combine which would make simplify demanded bits trigger. But it breaks recognizition of (x << (32 - n)) >> (32 - n)) for bzhi. It would become (x << (-n & 0x1f)) >> (-n & 0x1f) and now we don't know if 'n' is safe to feed directly to bzhi.

Sun, Jul 8, 10:37 AM

Sat, Jul 7

craig.topper updated the summary of D48354: [LoopIdiomRecognize] Support for loops that use LSHR instruction added..
Sat, Jul 7, 6:26 PM
craig.topper accepted D48963: [X86][SSE] Combine v16i8 SHL by constants to multiplies.

LGTM

Sat, Jul 7, 1:15 PM
craig.topper added a comment to D48765: [X86] The TEST instruction is eliminated when BSF/TZCNT is used.

We need tests cases that use tzcnt too.

Sat, Jul 7, 11:52 AM
craig.topper added reviewers for D48765: [X86] The TEST instruction is eliminated when BSF/TZCNT is used: spatel, RKSimon.
Sat, Jul 7, 8:59 AM

Fri, Jul 6

craig.topper accepted D49015: [X86][Nearly NFC] Split SHLD/SHRD into their own WriteShiftDouble class.

I definitely believe these instructions would be different than single shifts. LGTM

Fri, Jul 6, 1:38 PM

Thu, Jul 5

craig.topper accepted D49001: [X86][Disassembler] Fix LOCK prefix disassembler support.

LGTM

Thu, Jul 5, 3:17 PM
craig.topper accepted D48998: [X86][Basically NFC] Split WriteBitScan into WriteBitScanReverse and WriteBitScanForward..
Thu, Jul 5, 2:55 PM
craig.topper accepted D48825: [DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts.

LGTM

Thu, Jul 5, 11:23 AM
craig.topper added inline comments to D48970: [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X).
Thu, Jul 5, 10:59 AM
craig.topper added a comment to D48963: [X86][SSE] Combine v16i8 SHL by constants to multiplies.

Should v32i8 do the same or is there something different about it?

Thu, Jul 5, 10:45 AM
craig.topper added inline comments to D48825: [DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts.
Thu, Jul 5, 10:38 AM
craig.topper updated the diff for D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation.

Rebase and full context

Thu, Jul 5, 10:27 AM

Wed, Jul 4

craig.topper created D48954: [X86] Add SHUF128 to target shuffle decoding..
Wed, Jul 4, 2:39 PM
craig.topper added a comment to D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation.

Looks like X86ISD::SHUF128 isn't recognized as a target shuffle. That explains some of the regressions here. I'll submit a separate patch for that and we can rebase this one.

Wed, Jul 4, 12:55 PM
craig.topper added a comment to D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation.

I forget to mention, I tried to write the code in a way that it should be somewhat straightforward to extend to 512 bits with 128 bit lanes.

Wed, Jul 4, 12:47 PM
craig.topper updated the diff for D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation.

Replace the contents of lowerVectorShuffleByMerging128BitLanes. Now we try to form 2 lane correcting shuffles and a repeated shuffle mask.

Wed, Jul 4, 12:45 PM
craig.topper added a comment to D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask..

This specifically looks for (and (shl -1, X), Y) or (and (shr -1, X), Y). If Y in either of those is a constant, I think we would have rewritten the LHS side of the shift already and removed the 'and' entirely.

Wed, Jul 4, 11:17 AM

Tue, Jul 3

craig.topper added a comment to D48877: [X86][BtVer2][MCA] Recognize CMPEQ one-idioms.

On Intel hardware, in register renaming it basically rewrites the input register to an internal all zeros register. The instruction still needs to execute and now compares zero with zero and gets all ones. But by rewriting the input register to all zeros instead of the real input, the instruction doesn't need to wait for the previous writer of the input register to execute.

Tue, Jul 3, 11:05 AM

Mon, Jul 2

craig.topper added a comment to D48614: [SelectionDAG] Fix promotion of extracted FP vector element.

There are actually two selection DAGs for that test. I'm not sure why yet. The final schedule is printed separately for each DAG.

Mon, Jul 2, 10:10 PM
craig.topper added inline comments to D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.
Mon, Jul 2, 4:30 PM
craig.topper updated the diff for D48617: [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it..

-Added a negative test
-Hopefully fixed all the grammatical/spelling errors.
-Attempted to clarify some more about prefer-vector-width and builtins.

Mon, Jul 2, 2:06 PM
craig.topper added a comment to D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

@niravd would it be better to just do the full manual isel instead of trying to do it this way? Selecting the negate is easy. It just the variations of the shift that kind of ugly due to legacy vs BMI2 and BMI2 allowing load folding.

Mon, Jul 2, 12:09 PM
craig.topper added inline comments to D48617: [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it..
Mon, Jul 2, 12:04 PM
craig.topper updated the diff for D48617: [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it..

-Rebase the intrinsic headers and builtins file
-Add documentation for the attribute. Open to feedback on improvements here
-Add tests for wrong number of arguments to the attribute.

Mon, Jul 2, 12:00 PM
craig.topper added inline comments to D48617: [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it..
Mon, Jul 2, 11:59 AM

Sun, Jul 1

craig.topper added inline comments to D48809: [FPEnv] Split double width StrictFP vector operations as needed.
Sun, Jul 1, 10:55 PM
craig.topper added inline comments to D48809: [FPEnv] Split double width StrictFP vector operations as needed.
Sun, Jul 1, 10:48 PM

Sat, Jun 30

craig.topper added a comment to D43353: [X86] Add phony registers for high halves of E[A-D]X, E[SD]I, E[BS]P and EIP.

Sorry for the archaelogy here.

Sat, Jun 30, 1:12 PM
craig.topper added inline comments to D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask..
Sat, Jun 30, 11:03 AM
craig.topper added inline comments to D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask..
Sat, Jun 30, 10:33 AM

Fri, Jun 29

craig.topper updated the diff for D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.

Place the new nodes before the original shift amount in the topological order of the DAG. This ensures that all shifts that had the same shift amount will be selected and run this transform before any of the new nodes are selected. The selection DAG should CSE all the new nodes together for multiple shifts.

Fri, Jun 29, 3:32 PM
craig.topper added inline comments to D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.
Fri, Jun 29, 3:11 PM
craig.topper created D48789: [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware.
Fri, Jun 29, 2:33 PM
craig.topper planned changes to D48784: [X86] Prefer to form negate instructions instead of folding a load.
Fri, Jun 29, 2:17 PM
craig.topper added a comment to D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask..

That's what i meant. I think I didn't understand this statement "It will still need to produce two instructions (which is safe, since there is one-use check on the mask)."

Fri, Jun 29, 2:09 PM
craig.topper added a comment to D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask..

If you do the one use check in the DAG combine to qualify the reverse, then we can use existing isel patterns to select each shift individually right?

Fri, Jun 29, 1:48 PM
craig.topper added a comment to D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask..

Is it possible to reverse this as a DAG combine? I'd rather not produce 2 instructions from one pattern if we can avoid it.

Fri, Jun 29, 1:40 PM
craig.topper added a comment to D48784: [X86] Prefer to form negate instructions instead of folding a load.

I'll run our benchmark list. This was more of an observation that we were different than icc, gcc, and msvc.

Fri, Jun 29, 1:34 PM
craig.topper created D48784: [X86] Prefer to form negate instructions instead of folding a load.
Fri, Jun 29, 11:35 AM
craig.topper added a comment to D48765: [X86] The TEST instruction is eliminated when BSF/TZCNT is used.

I need to look at this in more detail, but does this handle the case where the flags are still live past the cmov? Moving the add down would break that.

Fri, Jun 29, 10:07 AM

Thu, Jun 28

craig.topper added a comment to D48585: [X86] Use a std::vector for the memory unfolding vector..

@greened, did you forget to Accept when you wrote LGTM or should I wait for another reviewer?

Thu, Jun 28, 11:05 PM
craig.topper added inline comments to D48754: [InstCombine] canonicalize abs pattern.
Thu, Jun 28, 10:56 PM
craig.topper updated the summary of D48754: [InstCombine] canonicalize abs pattern.
Thu, Jun 28, 10:50 PM
craig.topper added a reviewer for D48754: [InstCombine] canonicalize abs pattern: spatel.
Thu, Jun 28, 10:49 PM
craig.topper accepted D48655: [X86][SSE] Support v16i8/v32i8 vector rotations.

LGTM

Thu, Jun 28, 8:58 PM
craig.topper abandoned D48530: [DAGCombiner] In foldSelectOfConstants, use 'sub C1+1, (zext Cond)' instead of 'add (sext Cond), C1+1'.
Thu, Jun 28, 8:52 PM
craig.topper added a comment to D48530: [DAGCombiner] In foldSelectOfConstants, use 'sub C1+1, (zext Cond)' instead of 'add (sext Cond), C1+1'.

Yeah I think the other patches took care of my perf regression. I'm fine to abandon this if you don't think the PowerPC changes are worthwhile.

Thu, Jun 28, 12:51 PM
craig.topper added inline comments to D48619: [X86] Limit the number of target specific nodes emitted in LowerShiftParts.
Thu, Jun 28, 12:23 PM
craig.topper added inline comments to D48712: [X86] Lowering integer truncation intrinsics to native IR.
Thu, Jun 28, 12:12 PM
craig.topper added reviewers for D48712: [X86] Lowering integer truncation intrinsics to native IR: RKSimon, spatel.
Thu, Jun 28, 12:07 PM
craig.topper added a comment to D48530: [DAGCombiner] In foldSelectOfConstants, use 'sub C1+1, (zext Cond)' instead of 'add (sext Cond), C1+1'.

@spatel, what are you thoughts on this?

Thu, Jun 28, 12:07 PM
craig.topper closed D48606: [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position.
Thu, Jun 28, 10:43 AM
craig.topper added 1 commit(s) for D48606: [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position: rL335754: [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit….
Thu, Jun 28, 10:43 AM
craig.topper added an edge to rL335754: [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit…: D48606: [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position.
Thu, Jun 28, 10:43 AM