craig.topper (Craig Topper)
User

Projects

User does not belong to any projects.
User Since
Jul 30 2013, 7:58 PM (190 w, 4 d)

Recent Activity

Today

craig.topper added inline comments to D30968: [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers.
Sun, Mar 26, 12:08 AM

Yesterday

craig.topper added a comment to D30968: [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers.

Ping

Sat, Mar 25, 9:11 PM
craig.topper accepted D31200: [X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk.

LGTM

Sat, Mar 25, 9:11 PM
craig.topper abandoned D31348: [ValueTracking] Compute known bits for add/sub with less temporary APInts.
Sat, Mar 25, 9:09 PM
craig.topper abandoned D31115: [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits.
Sat, Mar 25, 9:09 PM

Fri, Mar 24

craig.topper added a comment to D31239: [WIP] Add Caching of Known Bits in InstCombine.

I'm seeing more problems than just nsw/nuw flags here.

Fri, Mar 24, 3:33 PM
craig.topper created D31348: [ValueTracking] Compute known bits for add/sub with less temporary APInts.
Fri, Mar 24, 12:10 PM

Thu, Mar 23

craig.topper added a comment to D31200: [X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk.

Aren't there now broken patterns using COPY_TO_REGCLASS from FR32X/FR64X to VR128X below this? For example the avx512_move_scalar_lowering multiclass.

Thu, Mar 23, 9:40 AM
craig.topper updated the diff for D31115: [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits.

Fix comment

Thu, Mar 23, 9:33 AM

Tue, Mar 21

craig.topper created D31232: [IR] Use a binary search in DataLayout::getAlignmentInfo.
Tue, Mar 21, 8:21 PM
craig.topper added a comment to D31120: [InstCombine] Teach SimplifyDemandedUseBits that adding or subtractings 0s from every bit below the highest demanded bit can be simplified.

Yeah I saw that. Should we add a debug counter around the top level worklist loop? The semantics of the "skip" part of the counter wouldn't make sense though.

Tue, Mar 21, 4:52 PM
craig.topper added inline comments to D31115: [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits.
Tue, Mar 21, 4:04 PM

Mon, Mar 20

craig.topper accepted D31155: [X86][AVX512] Add _mm512_cvtsd_f64 and _mm512_cvtss_f32 intrinsics (PR32305).

LGTM

Mon, Mar 20, 4:01 PM

Sun, Mar 19

craig.topper added a comment to D30968: [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers.

That's great news! But this problem isn't unique to VK1. This test fails in 32-bit mode with avx512vl

Sun, Mar 19, 9:35 AM
craig.topper created D31120: [InstCombine] Teach SimplifyDemandedUseBits that adding or subtractings 0s from every bit below the highest demanded bit can be simplified.
Sun, Mar 19, 1:12 AM
craig.topper accepted D31034: [X86][AVX512][Clang][Intrinsics] Adding missing intrinsics to Clang ..

LGTM

Sun, Mar 19, 12:59 AM
craig.topper created D31119: [InstCombine] Teach SimplifyDemandedUseBits to shrink Constants on the left side of subtracts.
Sun, Mar 19, 12:07 AM

Sat, Mar 18

craig.topper added a comment to D31094: [BuildLibCalls] emitPutChar should infer its function attributes.

I'm not sure how to do that. We eventually inferred the attributes. It just took a second trip through the InstCombine worklist.

Sat, Mar 18, 6:13 PM
craig.topper updated the diff for D30968: [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers.

Update all of the tests.

Sat, Mar 18, 12:49 PM
craig.topper updated the diff for D31084: [GVN] Fix accidental double storage of the function BasicBlock list in iterateOnFunction.

Add comment to PostOrderIterator.h

Sat, Mar 18, 12:35 AM
craig.topper created D31115: [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits.
Sat, Mar 18, 12:17 AM

Fri, Mar 17

craig.topper added a comment to D31094: [BuildLibCalls] emitPutChar should infer its function attributes.

fputc and fputs check the type of one of the arguments before calling inferLibFuncAttributes. So it's not always unconditional.

Fri, Mar 17, 2:05 PM
craig.topper created D31094: [BuildLibCalls] emitPutChar should infer its function attributes.
Fri, Mar 17, 12:22 PM
craig.topper created D31091: [InstCombine] Print a debug message when we constant fold an operand during worklist creation.
Fri, Mar 17, 11:36 AM
craig.topper updated the diff for D31084: [GVN] Fix accidental double storage of the function BasicBlock list in iterateOnFunction.

Just use RPOT and kill off the local vector

Fri, Mar 17, 11:22 AM
craig.topper added a comment to D31084: [GVN] Fix accidental double storage of the function BasicBlock list in iterateOnFunction.

I was afraid of making the assumption about how RPOT works in case someone made it behave differently in the future. But I can change it to use RPOT if that's what you'd prefer.

Fri, Mar 17, 10:12 AM
craig.topper created D31084: [GVN] Fix accidental double storage of the function BasicBlock list in iterateOnFunction.
Fri, Mar 17, 9:54 AM

Thu, Mar 16

craig.topper added a reviewer for D31034: [X86][AVX512][Clang][Intrinsics] Adding missing intrinsics to Clang .: craig.topper.
Thu, Mar 16, 3:10 PM
craig.topper created D31056: [AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel.
Thu, Mar 16, 2:53 PM
craig.topper added inline comments to D31034: [X86][AVX512][Clang][Intrinsics] Adding missing intrinsics to Clang ..
Thu, Mar 16, 9:52 AM

Wed, Mar 15

craig.topper accepted D30654: [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction.

LGTM

Wed, Mar 15, 10:31 AM
craig.topper updated the diff for D30968: [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers.

Fix a couple additional fast isel bugs.

Wed, Mar 15, 12:28 AM

Tue, Mar 14

craig.topper created D30968: [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers.
Tue, Mar 14, 11:59 PM
craig.topper created D30965: [CodeGen] Use APInt::setLowBits/setHighBits/setBitsFrom in more places.
Tue, Mar 14, 6:37 PM

Mon, Mar 13

craig.topper created D30922: [Builtins] Synchronize the definition of fma/fmaf/fmal in Builtins.def with the implementation in CGBuiltins.cpp.
Mon, Mar 13, 4:51 PM
craig.topper closed D30866: [X86] Recognize AVX2 gather instructions during lowering so we can modify the source input when the mask is all ones.

I've modified this to force the input to zero when the mask is all ones to break the execution dependency. I'll file a bug to look at using ExeDepsFix.

Mon, Mar 13, 11:48 AM
craig.topper abandoned D30865: [AVX-512] If gather mask is all ones, use UNDEF for the source.

I've submitted r297651 to force the input to zero if the mask is all ones.

Mon, Mar 13, 11:30 AM

Sun, Mar 12

craig.topper created D30878: [SelectionDAG] Enhance SDTCisSameNumEltsAs to work with scalar types and use it on extend/trunc/round operations.
Sun, Mar 12, 4:59 PM
craig.topper updated subscribers of D30875: [X86] Add checking of the scale argument to scatter/gather builtins.
Sun, Mar 12, 3:31 PM
craig.topper created D30875: [X86] Add checking of the scale argument to scatter/gather builtins.
Sun, Mar 12, 3:29 PM
craig.topper added a comment to D30865: [AVX-512] If gather mask is all ones, use UNDEF for the source.

I wonder if we shouldn't keep zeroes to break execution dependency. I don't think the existing undef handling in ExeDependencyFix can handle the early clobber.

Sun, Mar 12, 11:06 AM
craig.topper added a comment to D30866: [X86] Recognize AVX2 gather instructions during lowering so we can modify the source input when the mask is all ones.

Looks like we only add xor for partial dependency breaking if the instruction is listed in hasUndefRegUpdate in X86InstrInfo.cpp, and our first response is to use the same register as one of the other input operands, but that would be illegal for gather.

Sun, Mar 12, 10:44 AM

Sat, Mar 11

craig.topper added a dependency for D30866: [X86] Recognize AVX2 gather instructions during lowering so we can modify the source input when the mask is all ones: D30865: [AVX-512] If gather mask is all ones, use UNDEF for the source.
Sat, Mar 11, 10:07 AM
craig.topper added a dependent revision for D30865: [AVX-512] If gather mask is all ones, use UNDEF for the source: D30866: [X86] Recognize AVX2 gather instructions during lowering so we can modify the source input when the mask is all ones.
Sat, Mar 11, 10:07 AM
craig.topper created D30866: [X86] Recognize AVX2 gather instructions during lowering so we can modify the source input when the mask is all ones.
Sat, Mar 11, 10:06 AM
craig.topper created D30865: [AVX-512] If gather mask is all ones, use UNDEF for the source.
Sat, Mar 11, 9:32 AM

Fri, Mar 10

craig.topper accepted D30836: Use setBits in SelectionDAG.

LGTM

Fri, Mar 10, 5:11 PM
craig.topper added a comment to D30834: [x86] these aren't the undefs you're looking for (PR32176).

Have you run the tests all the way through to assembly and made sure we don't regress? If we do regress, I wouldn't hold up fixing this, but we should at least have bugs for what breaks.

Fri, Mar 10, 3:42 PM
craig.topper added inline comments to D30836: Use setBits in SelectionDAG.
Fri, Mar 10, 12:39 PM
craig.topper added inline comments to D30836: Use setBits in SelectionDAG.
Fri, Mar 10, 10:29 AM

Wed, Mar 8

craig.topper added inline comments to D30654: [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction.
Wed, Mar 8, 8:28 AM

Mon, Mar 6

craig.topper added inline comments to D30654: [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction.
Mon, Mar 6, 9:49 AM
craig.topper accepted D30391: [X86] Add option to specify preferable loop alignment.

LGTM

Mon, Mar 6, 9:45 AM
craig.topper accepted D30451: [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables..

LGTM

Mon, Mar 6, 9:44 AM
craig.topper accepted D30501: [X86][AVX512] Add missing entries to EVEX2VEX tables.

LGTM

Mon, Mar 6, 9:36 AM

Sun, Mar 5

craig.topper created D30629: [APInt] Give the value union a name so we can remove assumptions on VAL being the larger member.
Sun, Mar 5, 7:52 PM
craig.topper updated the diff for D30612: [APInt] Add rvalue reference support to and, or, xor operations to allow their memory allocation to be reused when possible.

Add unit tests. Rebase for the removal of And/Or/Xor methods in r296993.

Sun, Mar 5, 5:16 PM
craig.topper added a comment to D30614: [APInt] Move operator~ out of line to make it better able to reused memory allocation from temporary objects.

It is used in other places for example

Sun, Mar 5, 8:33 AM
craig.topper added a comment to D30612: [APInt] Add rvalue reference support to and, or, xor operations to allow their memory allocation to be reused when possible.

The And, Or, Xor functions could be removed separately.

Sun, Mar 5, 8:29 AM
craig.topper created D30614: [APInt] Move operator~ out of line to make it better able to reused memory allocation from temporary objects.
Sun, Mar 5, 12:03 AM

Sat, Mar 4

craig.topper created D30613: [APInt] Remove the And/Or/Xor/Not functions from the APIntOps namespace..
Sat, Mar 4, 11:11 PM
craig.topper created D30612: [APInt] Add rvalue reference support to and, or, xor operations to allow their memory allocation to be reused when possible.
Sat, Mar 4, 5:29 PM
craig.topper updated the diff for D30602: [APInt] Add getBitsSetFrom and setBitsFrom to set upper bits starting at a bit.

Fix comment pointed out by Simon.

Sat, Mar 4, 4:05 PM

Fri, Mar 3

craig.topper added a dependent revision for D30563: [APInt] Implement getLowBitsSet/getHighBitsSet/getBitsSet using setLowBits/setHighBits/setBits: D30602: [APInt] Add getBitsSetFrom and setBitsFrom to set upper bits starting at a bit.
Fri, Mar 3, 11:13 PM
craig.topper added a dependency for D30602: [APInt] Add getBitsSetFrom and setBitsFrom to set upper bits starting at a bit: D30563: [APInt] Implement getLowBitsSet/getHighBitsSet/getBitsSet using setLowBits/setHighBits/setBits.
Fri, Mar 3, 11:13 PM
craig.topper created D30602: [APInt] Add getBitsSetFrom and setBitsFrom to set upper bits starting at a bit.
Fri, Mar 3, 11:12 PM
craig.topper updated the diff for D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

Clean up the comments in setBitsSlowCase. Modify the implementation to just AND the low and high masks together when loWord and hiWord are the same.

Fri, Mar 3, 7:05 PM
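
The single-word case mentioned in the D30525 update above (loWord and hiWord being the same) can be illustrated with a minimal standalone sketch. This is not LLVM's APInt code: the helper name setBitsInWord and the half-open range [loBit, hiBit) are assumptions made only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): set bits [loBit, hiBit) of a single
// 64-bit word by ANDing a "from loBit upward" mask with a "below hiBit" mask.
inline uint64_t setBitsInWord(uint64_t word, unsigned loBit, unsigned hiBit) {
  assert(loBit <= hiBit && hiBit <= 64 && "bit range out of bounds");
  if (loBit == hiBit)
    return word; // Empty range: nothing to set, and avoids a shift by 64.
  uint64_t loMask = ~uint64_t(0) << loBit;        // Ones at loBit and above.
  uint64_t hiMask = ~uint64_t(0) >> (64 - hiBit); // Ones strictly below hiBit.
  return word | (loMask & hiMask);                // Intersection is the range.
}
```

Under these assumptions, setBitsInWord(0, 4, 8) sets bits 4 through 7. When the range spans several words, the two masks would instead be applied to different words.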
craig.topper resigned from D12399: Microsoft compatibility – add support for “relaxation” of memory operands in inline assembly. .
Fri, Mar 3, 2:20 PM
craig.topper resigned from D3068: [AVX512] Implemented masking for integer arithmetic & logic instructions..
Fri, Mar 3, 2:19 PM
craig.topper resigned from D3005: [AVX512] Implemented conversions up/down instructions with masking.
Fri, Mar 3, 2:18 PM
craig.topper resigned from D10668: [X86] Add Jcc branch hint (2E, 3E) MC support..
Fri, Mar 3, 2:17 PM
craig.topper resigned from D2687: Fix for PR 18573: AVX gather intrinsics access only memory based on argument.
Fri, Mar 3, 2:17 PM
craig.topper added a reviewer for D30397: [X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave in AVX2: RKSimon.
Fri, Mar 3, 2:16 PM
craig.topper accepted D30213: [X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets.
Fri, Mar 3, 2:11 PM
craig.topper added inline comments to D30525: [APInt] Add setLowBits/setHighBits methods to APInt..
Fri, Mar 3, 11:14 AM
craig.topper accepted D30549: [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops..

LGTM with that one comment.

Fri, Mar 3, 11:08 AM

Thu, Mar 2

craig.topper added a dependent revision for D30525: [APInt] Add setLowBits/setHighBits methods to APInt.: D30563: [APInt] Implement getLowBitsSet/getHighBitsSet/getBitsSet using setLowBits/setHighBits/setBits.
Thu, Mar 2, 11:45 PM
craig.topper added a dependency for D30563: [APInt] Implement getLowBitsSet/getHighBitsSet/getBitsSet using setLowBits/setHighBits/setBits: D30525: [APInt] Add setLowBits/setHighBits methods to APInt..
Thu, Mar 2, 11:45 PM
craig.topper created D30563: [APInt] Implement getLowBitsSet/getHighBitsSet/getBitsSet using setLowBits/setHighBits/setBits.
Thu, Mar 2, 11:45 PM
craig.topper abandoned D25222: [AVX-512] Use AVX512 feature instead of VLX to determine whether to use extended 128/256-bit register classes for addRegisterClass..
Thu, Mar 2, 11:44 PM
craig.topper added inline comments to D30549: [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops..
Thu, Mar 2, 10:45 PM
craig.topper updated the diff for D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

Make some portions of setBits inline. Use it to implement setLowBits and setHighBits.

Thu, Mar 2, 9:09 PM
craig.topper added a comment to D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

I'm definitely going to look at moving some of setBits inline. I'm also planning to fix getBitsSet to use setBits. I might even fix setBits to support loBit > hiBit like getBitsSet supports.

Thu, Mar 2, 4:02 PM
craig.topper added a comment to D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

@RKSimon, yes I think getLowBitsSet/getHighBitsSet would also benefit. I think they are mallocing twice due to the getAllOnes and then shifting. So I think we should be able to just create an all 0s APInt then call setLowBits/setHighBits instead and save one of the mallocs.

Thu, Mar 2, 1:21 PM
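
The idea in the comment above (start from an all-zeros value and set the low bits in place, rather than building all-ones and shifting into a second temporary) can be sketched with a toy type. WideInt is a hypothetical stand-in, not LLVM's APInt; its layout (little-endian 64-bit words in a std::vector) and its method names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy wide integer (hypothetical, not LLVM's APInt): little-endian 64-bit
// words, allocated once in the constructor and zero-initialized.
struct WideInt {
  unsigned numBits;
  std::vector<uint64_t> words;

  explicit WideInt(unsigned bits)
      : numBits(bits), words((bits + 63) / 64, 0) {
    assert(bits > 0 && "bit width must be non-zero");
  }

  // Set the low n bits in place; no second buffer is created.
  void setLowBits(unsigned n) {
    assert(n <= numBits && "too many bits requested");
    unsigned full = n / 64; // Words that become all ones.
    for (unsigned i = 0; i < full; ++i)
      words[i] = ~uint64_t(0);
    if (unsigned rem = n % 64) // Partial word, if any.
      words[full] |= (uint64_t(1) << rem) - 1;
  }
};

// Build the result from zero and mutate it, so only the constructor allocates.
inline WideInt getLowBitsSet(unsigned numBits, unsigned loBitsSet) {
  WideInt result(numBits);
  result.setLowBits(loBitsSet);
  return result;
}
```

In this sketch, getLowBitsSet(128, 70) fills the first word completely and the low 6 bits of the second word, with a single allocation for the word storage.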
craig.topper added inline comments to D30451: [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables..
Thu, Mar 2, 9:38 AM
craig.topper added inline comments to D30501: [X86][AVX512] Add missing entries to EVEX2VEX tables.
Thu, Mar 2, 9:26 AM
craig.topper added a comment to D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

I haven't tried to measure any compile time differences. I was just inspired by Simon's patch for setBits and observing that ORing getLowBits/getHighBits was a common idiom in some places like ValueTracking. I assume we aren't often dealing with numbers larger than 64 bits there on most types of compiled code.

Thu, Mar 2, 12:06 AM
craig.topper updated the diff for D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

Add periods at the end of comments.

Thu, Mar 2, 12:03 AM

Wed, Mar 1

craig.topper updated the diff for D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

Add a clearUpperBits call to the fast part of setHighBits along with a test case.

Wed, Mar 1, 11:36 PM
craig.topper updated the diff for D30525: [APInt] Add setLowBits/setHighBits methods to APInt..

Fixed a copy/paste mistake

Wed, Mar 1, 11:11 PM
craig.topper created D30525: [APInt] Add setLowBits/setHighBits methods to APInt..
Wed, Mar 1, 11:10 PM
craig.topper accepted D29874: [X86] Generate VZEROUPPER for Skylake-avx512.

LGTM

Wed, Mar 1, 10:11 AM
craig.topper added inline comments to D30451: [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables..
Wed, Mar 1, 9:04 AM

Tue, Feb 28

craig.topper created D30486: [APInt] Optimize APInt creation from uint64_t.
Tue, Feb 28, 11:51 PM
craig.topper added inline comments to D30451: [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables..
Tue, Feb 28, 9:44 PM
craig.topper added a comment to D30451: [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables..

Should we fix the deficiencies in the current tables first so that there are no test changes?

Tue, Feb 28, 8:59 PM

Sun, Feb 26

craig.topper created D30392: [X86] Use APInt instead of SmallBitVector tracking undef elements from getTargetConstantBitsFromNode and getConstVector..
Sun, Feb 26, 9:57 PM
craig.topper added inline comments to D30391: [X86] Add option to specify preferable loop alignment.
Sun, Feb 26, 9:46 PM
craig.topper created D30390: [X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering.
Sun, Feb 26, 9:20 PM
craig.topper created D30387: [X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation.
Sun, Feb 26, 6:28 PM