craig.topper (Craig Topper)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 30 2013, 7:58 PM (216 w, 5 d)

Recent Activity

Fri, Sep 22

craig.topper accepted D38181: [x86] swap order of srl (and X, C1), C2 when it saves size.

LGTM

Fri, Sep 22, 10:46 AM
craig.topper added inline comments to D38181: [x86] swap order of srl (and X, C1), C2 when it saves size.
Fri, Sep 22, 10:16 AM
craig.topper added inline comments to D38181: [x86] swap order of srl (and X, C1), C2 when it saves size.
Fri, Sep 22, 9:45 AM

Thu, Sep 21

craig.topper added a comment to D38088: Fix out-of-order stepping behavior in programs with hoisted constants..

I'm not sure I know enough about debug info to approve this myself. I just knew enough to know that the comment didn't match up after the change.

Thu, Sep 21, 4:46 PM
craig.topper added a comment to D38120: [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem.

Looks like objdump also prints the memory on the right in at&t syntax. So this makes our disassembler printing consistent

Thu, Sep 21, 10:22 AM
craig.topper created D38122: [DAGCombiner] Remove duplicate code from visitZERO_EXTEND.
Thu, Sep 21, 12:07 AM

Wed, Sep 20

craig.topper created D38120: [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem.
Wed, Sep 20, 10:09 PM
craig.topper added a comment to D38025: [X86] Change the Format attribute for TEST*rr instruction from the default MRMDestReg to MRMSrcReg.

Did this come out of the folding table work? The real bug is that TEST8rm is wrong. But we don't notice because we have aliases to accept the memory operand in either location.

Wed, Sep 20, 7:38 PM
craig.topper added a comment to D38025: [X86] Change the Format attribute for TEST*rr instruction from the default MRMDestReg to MRMSrcReg.

This is apparently a revert of the fix for https://bugs.llvm.org/show_bug.cgi?id=22995

Wed, Sep 20, 7:28 PM
craig.topper accepted D38117: [X86] [MC] fixed non optimal encoding of instruction memory operand (PR24038) .

LGTM

Wed, Sep 20, 7:18 PM
craig.topper added a comment to D37562: [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR.

Was the code not using emmintrin.h and instead copied code from it that used the builtins?

Wed, Sep 20, 5:40 PM
craig.topper added inline comments to D38088: Fix out-of-order stepping behavior in programs with hoisted constants..
Wed, Sep 20, 4:34 PM
craig.topper created D38100: [InstCombine] Teach getDemandedBitsLHSMask to handle constant splat vectors.
Wed, Sep 20, 2:31 PM

Tue, Sep 19

craig.topper added a comment to D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..

I don't think we should shrinking operations based on undef inputs. I think we should be shrinking them based on what elements are consumed by their users. There's no reason the shuffles in these reductions have to have undef elements. For integers we may rewrite the shuffle mask to undef in InstCombine if the elements aren't used down stream. But we don't do that for FP.

Tue, Sep 19, 10:01 PM
craig.topper created D38065: [instCombine] Handle (X & C2) < C1 --> (X & C2) == 0.
Tue, Sep 19, 5:47 PM
craig.topper accepted D37292: 'into' instruction should not be decoded as valid in 64-bit mode.

LGTM

Tue, Sep 19, 5:44 PM
craig.topper added inline comments to D38027: [X86] Add new attribute to X86 instructions to enable marking them as "not memory foldable".
Tue, Sep 19, 11:05 AM
craig.topper added a comment to D38028: [X86][TableGen] Recommitting the X86 memory folding tables TableGen backend while disabling it by default..

Why can't we just generate the table by default and not include it yet? Then anyone can look at the table at any time without having to do something extra?

Tue, Sep 19, 10:56 AM
craig.topper added a comment to D38028: [X86][TableGen] Recommitting the X86 memory folding tables TableGen backend while disabling it by default..

Why can't we just generate the table by default and not include it yet? Then anyone can look at the table at any time without having to do something extra?

Tue, Sep 19, 10:55 AM
craig.topper added a comment to D38001: [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead..

I havent' found any piece of code that analyzes SUB_REG_TO_REG that closely. Everything I found pretty much treats it much like a copy. Does anyone know of anything that cares today?

Tue, Sep 19, 9:58 AM
craig.topper updated the diff for D38022: [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns.

Remove from X86InstrFragments.td

Tue, Sep 19, 12:47 AM
craig.topper added a comment to D38022: [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns.

I did remove the usages from the td patterns. It wasn't used in very many places.

Tue, Sep 19, 12:37 AM
craig.topper added a comment to D36454: [X86] Changes to extract Horizontal addition operation for AVX-512..

Why this be solved by just combining extract_subvectors through things like binops and onto their inputs.

Tue, Sep 19, 12:33 AM

Mon, Sep 18

craig.topper added a comment to D38022: [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns.

Here's the comment from getVectorMaskingNode

Mon, Sep 18, 11:56 PM
craig.topper added a comment to D38022: [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns.

What sort of test are you looking for? The truncate intrinsic tests should already cover this.

Mon, Sep 18, 11:34 PM
craig.topper created D38023: [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns.
Mon, Sep 18, 11:32 PM
craig.topper created D38022: [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns.
Mon, Sep 18, 10:30 PM
craig.topper retitled D38001: [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead. from [X86] Don't select any extend of anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead. to [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead..
Mon, Sep 18, 10:27 PM
craig.topper created D38001: [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead..
Mon, Sep 18, 3:02 PM
craig.topper accepted D37017: [DAGCombiner] fold assertzexts separated by trunc.

LGTM

Mon, Sep 18, 2:25 PM
craig.topper added a comment to D37017: [DAGCombiner] fold assertzexts separated by trunc.

PR28540 is fixed now.

Mon, Sep 18, 1:53 PM
craig.topper added a comment to D37729: [X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext.

Ping

Mon, Sep 18, 12:21 PM
craig.topper added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

So I think we all agree that doing sinking in InstCombine isn't a good idea in the long term. But I suspect doing something to fix that is going to take some more work.

Mon, Sep 18, 12:21 PM
craig.topper accepted D37256: [Nios2] adding subtarget, basic infrastructure for frame, instructions and registers.

LGTM

Mon, Sep 18, 12:09 PM
craig.topper added a comment to D37890: [X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit.

Ran with ENABLE_EXPENSIVE_CHECKS and llvm.X86TargetMachine.isMachineVerifierClean and got the same failures as PR27481

Mon, Sep 18, 10:08 AM
craig.topper accepted D37668: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR .

LGTM

Mon, Sep 18, 9:30 AM
craig.topper accepted D37669: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR.

LGTM

Mon, Sep 18, 9:27 AM

Sun, Sep 17

craig.topper created D37966: [TableGen] Generate formatted DAGISelEmitter without relying on formatted_raw_ostream..
Sun, Sep 17, 11:47 PM
craig.topper added inline comments to D37949: [X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI..
Sun, Sep 17, 9:50 PM
craig.topper updated the diff for D37947: [X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled.

Not sure what happened there. That was clearly the wrong patch

Sun, Sep 17, 4:00 PM
craig.topper updated the diff for D37947: [X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled.

Use getVectorShuffle

Sun, Sep 17, 3:58 PM
craig.topper added a comment to D37957: [TableGen] Some simple optimizations to TableGen execution time.

Is it the PadToColumn calls that are slowing down the formatting?

Sun, Sep 17, 12:04 PM
craig.topper added inline comments to D37957: [TableGen] Some simple optimizations to TableGen execution time.
Sun, Sep 17, 11:59 AM

Sat, Sep 16

craig.topper added inline comments to D37957: [TableGen] Some simple optimizations to TableGen execution time.
Sat, Sep 16, 11:54 PM
craig.topper added inline comments to D37957: [TableGen] Some simple optimizations to TableGen execution time.
Sat, Sep 16, 11:41 PM
craig.topper added a reviewer for D37957: [TableGen] Some simple optimizations to TableGen execution time: kparzysz.
Sat, Sep 16, 11:39 PM
craig.topper added inline comments to D37957: [TableGen] Some simple optimizations to TableGen execution time.
Sat, Sep 16, 11:39 PM
craig.topper created D37950: [X86] Rewrite the zero vector checks in lowerV2X128VectorShuffle to use the Zeroable APInt.
Sat, Sep 16, 1:40 PM
craig.topper created D37947: [X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled.
Sat, Sep 16, 10:46 AM

Fri, Sep 15

craig.topper added a dependency for D37941: [X86] Move even more of our CPU to feature mapping switch to use fallthroughs: D37938: [X86] Remove unnecessary extra encodings from the CPU name enum in clang.
Fri, Sep 15, 3:57 PM
craig.topper added a dependent revision for D37938: [X86] Remove unnecessary extra encodings from the CPU name enum in clang: D37941: [X86] Move even more of our CPU to feature mapping switch to use fallthroughs.
Fri, Sep 15, 3:57 PM
craig.topper created D37941: [X86] Move even more of our CPU to feature mapping switch to use fallthroughs.
Fri, Sep 15, 3:56 PM
craig.topper created D37938: [X86] Remove unnecessary extra encodings from the CPU name enum in clang.
Fri, Sep 15, 3:37 PM
craig.topper added a comment to D37668: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR .

I'm going to go ahead and remove __builtin_ia32_pbroadcastq512_mem_mask from clang and change _mm512_maskz_set1_epi64 to be disabled in 32-bit mode. I want to nominate this for 5.0.1 because using it in 32-bit mode causes the compile to throw a cannot select error. So disabling it in the header at least gives a better user experience.

Fri, Sep 15, 12:36 PM
craig.topper updated the diff for D37892: [X86] Use native shuffle vector for the perm2f128 intrinsics.

Convert the AVX2 integer intrinsic as well.

Fri, Sep 15, 11:10 AM
craig.topper updated the diff for D37893: [X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with a insertf128.

Move AVX2 unary shuffle check to the top of the function.

Fri, Sep 15, 10:43 AM
craig.topper added inline comments to D37858: [X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors.
Fri, Sep 15, 9:58 AM
craig.topper created D37893: [X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with a insertf128.
Fri, Sep 15, 12:48 AM

Thu, Sep 14

craig.topper updated subscribers of D37892: [X86] Use native shuffle vector for the perm2f128 intrinsics.
Thu, Sep 14, 11:39 PM
craig.topper created D37892: [X86] Use native shuffle vector for the perm2f128 intrinsics.
Thu, Sep 14, 11:36 PM
craig.topper created D37890: [X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit.
Thu, Sep 14, 9:16 PM
craig.topper created D37885: [x86] Bring back the MOVZX64rr* pseudo instructions so that they can be coalesced using X86InstrInfo::isCoalescableExtInstr.
Thu, Sep 14, 7:19 PM
craig.topper added inline comments to D37874: [TwoAddressInstructionPass] When prepending COPYs when processing tied operands, take the register class constraint from the instruction operands not the register we're copying..
Thu, Sep 14, 5:37 PM
craig.topper added inline comments to D37874: [TwoAddressInstructionPass] When prepending COPYs when processing tied operands, take the register class constraint from the instruction operands not the register we're copying..
Thu, Sep 14, 5:06 PM
craig.topper created D37874: [TwoAddressInstructionPass] When prepending COPYs when processing tied operands, take the register class constraint from the instruction operands not the register we're copying..
Thu, Sep 14, 4:20 PM
craig.topper added inline comments to D37669: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR.
Thu, Sep 14, 3:15 PM
craig.topper updated the summary of D37453: [X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting..
Thu, Sep 14, 3:04 PM
craig.topper updated the summary of D37453: [X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting..
Thu, Sep 14, 3:04 PM
craig.topper accepted D37559: [X86FixupBWInsts] More presise register liveness if no <imp-use> on MOVs..

LGTM

Thu, Sep 14, 2:42 PM
craig.topper added a comment to D37453: [X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting..

Ping

Thu, Sep 14, 2:41 PM
craig.topper added a comment to D37653: [X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction as already produced the zeros..

Ping

Thu, Sep 14, 2:41 PM
craig.topper created D37858: [X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors.
Thu, Sep 14, 12:21 PM

Wed, Sep 13

craig.topper created D37843: [X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit_hi.
Wed, Sep 13, 10:35 PM
craig.topper added inline comments to D37559: [X86FixupBWInsts] More presise register liveness if no <imp-use> on MOVs..
Wed, Sep 13, 9:47 PM
craig.topper added inline comments to D37559: [X86FixupBWInsts] More presise register liveness if no <imp-use> on MOVs..
Wed, Sep 13, 8:33 PM
craig.topper added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

Is the code gcc generates at O3 really "better"? According to godbolt the resulting assembly looks quite large with lots of spills and reloads.

Wed, Sep 13, 5:11 PM
craig.topper added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

Here's something even closer to the real benchmark https://godbolt.org/g/ePkjTu

Wed, Sep 13, 3:57 PM
craig.topper added inline comments to D37017: [DAGCombiner] fold assertzexts separated by trunc.
Wed, Sep 13, 3:49 PM
craig.topper added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

Looks like machine sinking fails because it can't handle the cmovs reading eflags. And it only considers one instruction at a time, but to sink the cmov you have to sink the eflags producer and the cmov together.

Wed, Sep 13, 3:43 PM
craig.topper added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

Ok so maybe I'm chasing a deficiency in machine sinking? Here's a related example https://godbolt.org/g/YnEvgg Most of the code should have been able to sink below the 'je' at line 11.

Wed, Sep 13, 2:46 PM
craig.topper added inline comments to D37801: [x86] fix pr29061.
Wed, Sep 13, 11:35 AM
craig.topper added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

Here's a longer version of what I saw in the benchmark i was looking at

Wed, Sep 13, 9:05 AM

Tue, Sep 12

craig.topper updated the diff for D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

Remove a stray change that snuck in

Tue, Sep 12, 1:26 PM
craig.topper created D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..
Tue, Sep 12, 1:25 PM
craig.topper accepted D37693: [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm).

LGTM

Tue, Sep 12, 10:27 AM
craig.topper added a comment to D37592: [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel.

Normal loads are usually folded by isel patterns
Stack reloads created by the register allocator are folded using the load folding tables. Since these loads don't exist during isel we can't fold them there.
The peephole pass can also fold loads using the loading fold tables. At one point in time this was definitely weaker than the isel mechanism even ignoring the missing instructions in the isel table. I'm not sure what the status is now.

Tue, Sep 12, 8:21 AM
craig.topper accepted D37694: [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang).

Oops you are correct. Sorry.

Tue, Sep 12, 12:30 AM
craig.topper added inline comments to D37668: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR .
Tue, Sep 12, 12:30 AM
craig.topper added inline comments to D37669: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR.
Tue, Sep 12, 12:01 AM

Mon, Sep 11

craig.topper updated the diff for D37729: [X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext.

Forgot to add full context to the diff the first time.

Mon, Sep 11, 8:33 PM
craig.topper created D37729: [X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext.
Mon, Sep 11, 8:28 PM
craig.topper requested changes to D37694: [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang).
Mon, Sep 11, 9:38 AM
craig.topper added a comment to D37694: [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang).

Remove the builtins from BuiltinsX86.def

Mon, Sep 11, 9:38 AM
craig.topper requested changes to D37693: [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm).

/Can you update the avx2-intrinsics-fast-isel.ll test too? It should use the native IR implementation now instead of the intrinsic. It's supposed to match the IR clang will generate.

Mon, Sep 11, 9:36 AM
craig.topper accepted D37693: [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm).

LGTM

Mon, Sep 11, 9:34 AM
craig.topper accepted D37688: [X86] Add explicit mc-encoding checks to X86/viabs.ll. NFC..

LGTM

Mon, Sep 11, 9:28 AM
craig.topper added a comment to D37669: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR.

This also seems to be only a partial diff now.

Mon, Sep 11, 9:28 AM
craig.topper added a comment to D37669: [X86][intrinsics] lower _mm[256|512]_mask[z]_set1_epi[8|16|32|64] intrinsic to IR.

This also seems to be only a partial diff now.

Mon, Sep 11, 9:28 AM
craig.topper accepted D37562: [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR.

LGTM

Mon, Sep 11, 9:28 AM
craig.topper accepted D37560: [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR.

LGTM

Mon, Sep 11, 9:25 AM