craig.topper (Craig Topper)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 30 2013, 7:58 PM (251 w, 4 d)

Recent Activity

Yesterday

craig.topper added inline comments to D47401: [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible..
Sat, May 26, 6:03 PM
craig.topper reopened D47401: [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible..
Sat, May 26, 12:04 PM

Fri, May 25

craig.topper created D47401: [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible..
Fri, May 25, 5:25 PM
craig.topper updated the diff for D47348: [LoopIdiomRecognize] Only convert loops to ctlz if we can prove that the input is non-negative..

Mention in infinite loop in comments. Update C code in comments in ARM test. Change ashr to lshr in tests when its directly after the abs pattern. InstCombine is able to convert those. The ashr inside the loop dont' get changed because value tracking doesn't understand the recurrence I guess.

Fri, May 25, 1:29 PM
craig.topper added inline comments to D47012: [X86] Scalar mask and scalar move optimizations.
Fri, May 25, 10:55 AM
craig.topper added inline comments to D47145: [X86][ELF][CET] Adding the .note.gnu.property ELF section in X86.
Fri, May 25, 10:37 AM

Thu, May 24

craig.topper added a comment to D47348: [LoopIdiomRecognize] Only convert loops to ctlz if we can prove that the input is non-negative..

The original motivating code is something like the loop in the htest_one_block function here https://github.com/mozilla/mozjpeg/blob/master/jchuff.c

Thu, May 24, 9:11 PM
craig.topper updated the diff for D47348: [LoopIdiomRecognize] Only convert loops to ctlz if we can prove that the input is non-negative..

Add the code changes.

Thu, May 24, 4:09 PM
craig.topper created D47348: [LoopIdiomRecognize] Only convert loops to ctlz if we can prove that the input is non-negative..
Thu, May 24, 4:00 PM
craig.topper added reviewers for D47346: [X86] Add a new pass to fold extension into load instructions in previous BB: RKSimon, spatel, chandlerc.
Thu, May 24, 2:22 PM
craig.topper added a comment to D47311: [X86][CET] Shadow stack fix for setjmp/longjmp.

Can you say more about the bug and what the fix was so I know where to focus?

Thu, May 24, 1:05 PM
craig.topper accepted D47142: [x86] invpcid intrinsic.
Thu, May 24, 1:01 PM
craig.topper added a comment to D47142: [x86] invpcid intrinsic.

LGTM, if you fix the ordering in cpuid.h.

Thu, May 24, 1:00 PM
craig.topper added inline comments to D47330: [DAGCombiner] match vector compare and select sizes with extload operand (PR37427).
Thu, May 24, 12:34 PM
craig.topper added inline comments to D47330: [DAGCombiner] match vector compare and select sizes with extload operand (PR37427).
Thu, May 24, 12:31 PM
craig.topper added inline comments to D47012: [X86] Scalar mask and scalar move optimizations.
Thu, May 24, 12:10 PM
craig.topper added a comment to D47012: [X86] Scalar mask and scalar move optimizations.

It's not really a question of whether it will be lowered correctly. Target independent DAG combines call getBooleanContents and expect that the condition for a target independent ISD:;SELECT node obey that.

Thu, May 24, 9:54 AM

Wed, May 23

craig.topper accepted D47277: [modules] Mark __wmmintrin_pclmul.h/__wmmintrin_aes.h as textual.

LGTM

Wed, May 23, 1:31 PM
craig.topper added a comment to D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics..

Hi @aemerson, I'm not opposed to adding it back. But the clang policy for vector builtins has always been that we won't support all the builtins that gcc does and to encourage the use of the _mm_* wrappers which are guaranteed to work in both compilers. It possible to change your source code to use the portable intrinsic name?

Wed, May 23, 1:22 PM
craig.topper added inline comments to D46957: [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w].
Wed, May 23, 11:45 AM
craig.topper accepted D47129: Testcase for PR31593.

LGTM

Wed, May 23, 11:42 AM
craig.topper updated the diff for D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h.

Add back popcntintrin.h

Wed, May 23, 10:59 AM
craig.topper updated the diff for D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..

Rebase to remvoe the InstCombine changes that were split off to another patch and have now been committed

Wed, May 23, 10:43 AM
craig.topper added inline comments to D47012: [X86] Scalar mask and scalar move optimizations.
Wed, May 23, 10:23 AM

Tue, May 22

craig.topper created D47236: [InstCombine] Negate ABS/NABS patterns by swapping the select operands to remove the negation.
Tue, May 22, 4:47 PM
craig.topper requested review of D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h.
Tue, May 22, 3:35 PM
craig.topper updated the diff for D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h.

Leave the message still saying x86intrin.h. Change the error checks to look for either x86intrin.h or immintrin.h to have been included. Really only the immintrin.h check is necessary since that's the header that does the include, but I put both so the error message saying x86intrin.h would be less confusing.

Tue, May 22, 3:35 PM
craig.topper added a comment to D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h.

Implemented @DavidKreitzer's suggestion in r333033

Tue, May 22, 3:23 PM
craig.topper added a comment to D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..

That should have said abs(x) >= 0 is always true.

Tue, May 22, 3:12 PM
craig.topper updated the diff for D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..

Add a simple test case to show (abs(x)) > 0 is always true.

Tue, May 22, 3:07 PM
craig.topper updated the diff for D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..

Add a combine to negate ABS/NABS to recover the broken tests.

Tue, May 22, 2:59 PM
craig.topper accepted D47141: [x86] invpcid LLVM intrinsic.

It looks like neither icc or MSVC doe a perfect job of ensuring zero extension here. For example: https://godbolt.org/g/UqnXh8

Tue, May 22, 11:49 AM
craig.topper added a comment to D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h.

It is odd, but they really are split in the icc include files. So they got split a while back in clang to match the Intel Intrinsic Guide documentation.

Tue, May 22, 11:38 AM
craig.topper accepted D47202: [CodeGen] use nsw negation for abs.

This seems right to me. GCC believes believes that __bultin_abs always returns a positive number.

Tue, May 22, 11:13 AM
craig.topper added a reviewer for D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h: rnk.
Tue, May 22, 11:07 AM
craig.topper added inline comments to D47188: Intel SVML calling conventions.
Tue, May 22, 8:43 AM
craig.topper added a comment to D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h.

First there was mmintrin.h which covered MMX instructions. Then xmmintrin.h came along to support SSE1 and implicitly included mmintrin.h. The emmintrin.h to support SSE2 and implicitly included xmmintrin.h. This repeated for each new version of SSE. With each header file including the previous header file. I think most of the later SSE headers start with the first letter of the code name of the CPU that added that SSE level including cancelled projects. So nmmintrin.h refers to Nehalem. tmmintrin.h came from the cancelled Tejas CPU. wmmintrin.h referes to Westmere. pmmintrin.h refers to Penryn. Eventually this was determined to not be very scalable to remember which header file contained what intrinsics and you have to change it with each generation to get the latest.. So immintrin.h was created to just include everything. I assume the 'i' here just stands for Intel. The 'mm' is just historic legacy due to the earlier file names. x86intrin.h was created by gcc to hold the intrinsics not specified by Intel. I think icc has an x86intrin.h that just includes immintrin.h.

Tue, May 22, 8:25 AM
craig.topper added inline comments to D47141: [x86] invpcid LLVM intrinsic.
Tue, May 22, 8:16 AM

Mon, May 21

craig.topper created D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h.
Mon, May 21, 11:08 PM
craig.topper added inline comments to D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h.
Mon, May 21, 6:43 PM
craig.topper created D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h.
Mon, May 21, 6:40 PM
craig.topper updated the diff for D47116: [DAGCombiner] Use computeKnownBits to match rotate patterns that have had their amount masking modified by simplifyDemandedBits..

Add vector tests too.

Mon, May 21, 1:17 PM
craig.topper added inline comments to D47129: Testcase for PR31593.
Mon, May 21, 12:41 PM
craig.topper added a comment to D46711: [private] Add min_vector_width function attribute. Use it to annotate all of the x86 intrinsic header files. Emit a attribute in IR.

Sorry @rsmith, this patch was created only to go to Chandler for a preliminary review. I didn't expect it to get a wider audience.

Mon, May 21, 12:04 PM
craig.topper added a comment to D46711: [private] Add min_vector_width function attribute. Use it to annotate all of the x86 intrinsic header files. Emit a attribute in IR.

My main focus was on preventing 512 bit instructions on Skylake X. Do you have measurements for that?

Mon, May 21, 11:20 AM
craig.topper added a comment to D46711: [private] Add min_vector_width function attribute. Use it to annotate all of the x86 intrinsic header files. Emit a attribute in IR.

@atdt How can I access that document?

Mon, May 21, 11:17 AM
craig.topper added a comment to D47012: [X86] Scalar mask and scalar move optimizations.

Just commit the test files to trunk with the current codegen. No need for a phabricator review. Then rebase this patch to show the diff with the new codegen

Mon, May 21, 11:11 AM
craig.topper added a comment to D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics..

So I think we've covered the whether this is ok to do questions. If someone can double check signed/unsigned and vector element sizes are all correct and approve this that would be great.

Mon, May 21, 11:09 AM
craig.topper updated the diff for D47124: [X86] Remove masking from vpternlog intrinsics. Use a select in IR instead..

Add fast-isel tests

Mon, May 21, 11:00 AM
craig.topper added a comment to D46541: [CodeGen] Improve diagnostics related to target attributes.

I think you can pass StringRef(F).substr(1). That won't create a temporary string. It will just create a StringRef pointing into the middle of an existing std::string stored in the parsed attributes.

Mon, May 21, 10:37 AM
craig.topper added a comment to D47124: [X86] Remove masking from vpternlog intrinsics. Use a select in IR instead..

I don't know that the fast-isel tests would show anything more than the intrinsics tests do since I ran those test cases through the autoupgrade code to get IR with selects. Should I move the intrinsic+select to fast-isel and leave just intrinsic by itself in the *intrinsic.ll test?

Mon, May 21, 10:16 AM
craig.topper added a comment to D47117: [CVP][NFCI] Refactor truncated binary operator creation out of processUDivOrURem().

IRBuilder constant folding can be disable by declaring it as IRBuilder<NoFolder>

Mon, May 21, 10:11 AM
craig.topper added inline comments to D47141: [x86] invpcid LLVM intrinsic.
Mon, May 21, 10:00 AM
craig.topper added inline comments to D47142: [x86] invpcid intrinsic.
Mon, May 21, 9:40 AM
craig.topper added a reviewer for D47142: [x86] invpcid intrinsic: rnk.
Mon, May 21, 9:40 AM
craig.topper added a comment to D47125: [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead..

Because the builtins take one of the arguments as an immediate, they must be implemented as macros. This was the frontend can verify that it's an imediate or a constant expression.

Mon, May 21, 9:29 AM

Sun, May 20

craig.topper created D47125: [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead..
Sun, May 20, 10:36 PM
craig.topper created D47124: [X86] Remove masking from vpternlog intrinsics. Use a select in IR instead..
Sun, May 20, 10:35 PM
craig.topper accepted D46954: [X86][SSE] Support v4i32 rotations (PR37426).

LGTM

Sun, May 20, 6:24 PM
craig.topper added inline comments to D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.
Sun, May 20, 6:19 PM
craig.topper updated the diff for D47037: [EarlyCSE] Improve EarlyCSE of some absolute value cases..

Update comment. Fix dead code.

Sun, May 20, 4:55 PM

Sat, May 19

craig.topper created D47116: [DAGCombiner] Use computeKnownBits to match rotate patterns that have had their amount masking modified by simplifyDemandedBits..
Sat, May 19, 8:09 PM

Fri, May 18

craig.topper retitled D47037: [EarlyCSE] Improve EarlyCSE of some absolute value cases. from [ValueTracking][EarlyCSE][InstCombine] Improve EarlyCSE of some absolute value cases. to [EarlyCSE] Improve EarlyCSE of some absolute value cases..
Fri, May 18, 3:43 PM
craig.topper updated the diff for D47037: [EarlyCSE] Improve EarlyCSE of some absolute value cases..

Remove the InstCombine change.

Fri, May 18, 3:43 PM
craig.topper added a comment to D46946: [X86] Lowering shift intrinsics to native IR.

I fear that LICM is going to separate this pattern if its used in a loop and then isel won't be able to see the whole pattern.

Fri, May 18, 1:33 PM
craig.topper added a comment to D47019: [X86] Lowering rotation intrinsics to native IR.

Why are we going through shift intrinsics to do this? Why can't we just emit shl and lshr instructions directly?

Fri, May 18, 12:11 PM
craig.topper added inline comments to D47037: [EarlyCSE] Improve EarlyCSE of some absolute value cases..
Fri, May 18, 11:36 AM
craig.topper added inline comments to D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..
Fri, May 18, 11:28 AM
craig.topper added a dependency for D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive.: D47037: [EarlyCSE] Improve EarlyCSE of some absolute value cases..
Fri, May 18, 11:26 AM
craig.topper added a dependent revision for D47037: [EarlyCSE] Improve EarlyCSE of some absolute value cases.: D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..
Fri, May 18, 11:26 AM

Thu, May 17

craig.topper added inline comments to D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..
Thu, May 17, 6:34 PM
craig.topper created D47041: [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive..
Thu, May 17, 5:04 PM
craig.topper created D47037: [EarlyCSE] Improve EarlyCSE of some absolute value cases..
Thu, May 17, 3:55 PM
craig.topper added a reviewer for D47029: [X86] Remove some preprocessor feature checks from intrinsic headers: DavidKreitzer.
Thu, May 17, 3:33 PM
craig.topper added inline comments to D47025: [X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB.
Thu, May 17, 3:30 PM
craig.topper added a comment to D47012: [X86] Scalar mask and scalar move optimizations.

Is it possible to add tests for any patterns that are new? It's not good to add untested code.

Thu, May 17, 2:57 PM
craig.topper requested changes to D47012: [X86] Scalar mask and scalar move optimizations.
Thu, May 17, 2:55 PM
craig.topper added inline comments to D47012: [X86] Scalar mask and scalar move optimizations.
Thu, May 17, 1:51 PM
craig.topper created D47029: [X86] Remove some preprocessor feature checks from intrinsic headers.
Thu, May 17, 12:59 PM
craig.topper created D47025: [X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB.
Thu, May 17, 12:26 PM
craig.topper added a comment to D46881: [X86][CET] Changing -fcf-protection behavior to comply with gcc (clang part).

LGTM

Thu, May 17, 9:30 AM

Wed, May 16

craig.topper accepted D46797: [X86DomainReassignment] Don't delete IMPLICIT_DEF nodes.

LGTM

Wed, May 16, 9:47 PM
craig.topper accepted D46800: [X86DomainReassignment] Don't compare stack-allocated values by address.

LGTM

Wed, May 16, 9:47 PM
craig.topper added a comment to D46797: [X86DomainReassignment] Don't delete IMPLICIT_DEF nodes.

You can just remove the comment.

Wed, May 16, 5:50 PM
craig.topper added a comment to D46560: [SelectionDAG] Don't crash on inline assembly errors when the inline assembly return type is a struct..

Ping

Wed, May 16, 5:22 PM
craig.topper accepted D46962: make GlobalValueSummary::getOriginalName() a const function.

LGTM

Wed, May 16, 5:21 PM
craig.topper added inline comments to D46954: [X86][SSE] Support v4i32 rotations (PR37426).
Wed, May 16, 5:19 PM
craig.topper created D46988: [InstCombine] Propagate the nsw/nuw flags from the add in the 'shifty' abs pattern to the sub in the select version..
Wed, May 16, 4:14 PM
craig.topper accepted D46976: [STLExtras] Add size() for ranges, and remove distance().

LGTM

Wed, May 16, 4:06 PM
craig.topper added a comment to D46957: [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w].

What about _mm512_cvtepi64_epi8?

Wed, May 16, 1:44 PM
craig.topper added a comment to D46957: [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w].

I haven't tried but you might be able to do

Wed, May 16, 1:39 PM
craig.topper added a comment to D46957: [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w].

What about this. This is the most obvious representation for this.

Wed, May 16, 12:44 PM
craig.topper accepted D46959: [X86][SSE] Reduce instruction/register usages for v4i32 vector shifts (PR37441).

LGTM

Wed, May 16, 12:16 PM
craig.topper added inline comments to D46959: [X86][SSE] Reduce instruction/register usages for v4i32 vector shifts (PR37441).
Wed, May 16, 11:15 AM
craig.topper added a comment to D46957: [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w].

Can you precommit the test cases?

Wed, May 16, 10:52 AM
craig.topper added a comment to D46882: [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part).

LGTM if you remove that one FIXME that's still in there.

Wed, May 16, 10:48 AM
craig.topper accepted D46882: [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part).
Wed, May 16, 10:48 AM

Tue, May 15

craig.topper added a comment to D46882: [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part).

Do we need to do something for the FIXMEs?

Tue, May 15, 7:31 PM
craig.topper accepted D46881: [X86][CET] Changing -fcf-protection behavior to comply with gcc (clang part).

LGTM

Tue, May 15, 7:29 PM
craig.topper requested review of D43441: [X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets (PR31630).
Tue, May 15, 5:43 PM