tkrupa (Tomasz Krupa)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 22 2018, 8:44 AM (136 w, 1 d)

Recent Activity

Jan 2 2019

tkrupa abandoned D47019: [X86] Lowering rotation intrinsics to native IR.
Jan 2 2019, 12:19 AM

Dec 17 2018

tkrupa updated subscribers of D47019: [X86] Lowering rotation intrinsics to native IR.

I'm no longer working on this. AFAIK this task and D46946 have been reassigned to @Jianping or @LuoYuanke.

Dec 17 2018, 12:26 AM

Aug 14 2018

tkrupa committed rL339659: [X86] Constant folding of adds/subs intrinsics.
[X86] Constant folding of adds/subs intrinsics
Aug 14 2018, 2:04 AM
tkrupa closed D50499: [X86] Constant folding of adds/subs intrinsics.
Aug 14 2018, 2:04 AM
tkrupa committed rL339651: [X86] Lowering addus/subus intrinsics to native IR.
[X86] Lowering addus/subus intrinsics to native IR
Aug 14 2018, 1:02 AM
tkrupa committed rC339651: [X86] Lowering addus/subus intrinsics to native IR.
[X86] Lowering addus/subus intrinsics to native IR
Aug 14 2018, 1:02 AM
tkrupa closed D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part).
Aug 14 2018, 1:02 AM
tkrupa committed rL339650: [X86] Lowering addus/subus intrinsics to native IR.
[X86] Lowering addus/subus intrinsics to native IR
Aug 14 2018, 1:01 AM
tkrupa closed D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).
Aug 14 2018, 1:01 AM

Aug 13 2018

tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Is the Clang part good to go too?

Aug 13 2018, 11:39 PM
tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).
Aug 13 2018, 1:14 AM
tkrupa updated the diff for D50499: [X86] Constant folding of adds/subs intrinsics.

Implemented suggested changes.
Regarding masking with select - do you mean creating new avx512_padds/avx512_psubs intrinsics without masks and replacing the old calls with new intrinsic+select?

Aug 13 2018, 12:04 AM
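
The "intrinsic + select" shape asked about in the update above can be illustrated per lane. This is only a sketch of the general idea, assuming the unmasked operation is a signed saturating add on 8-bit elements; the names adds_lane and masked_adds_lane are hypothetical and are not the actual avx512_padds intrinsics or the code in D50499.

```cpp
#include <cstdint>

// Unmasked signed saturating add on one 8-bit lane (hypothetical helper).
// The sum is computed in int and clamped to the int8_t range.
constexpr std::int8_t adds_lane(std::int8_t a, std::int8_t b) {
    return static_cast<std::int8_t>(
        a + b > 127 ? 127 : (a + b < -128 ? -128 : a + b));
}

// Masking expressed as a select: the mask bit picks either the result of the
// unmasked operation or the pass-through value for this lane.
constexpr std::int8_t masked_adds_lane(bool mask_bit, std::int8_t a,
                                       std::int8_t b, std::int8_t passthru) {
    return mask_bit ? adds_lane(a, b) : passthru;
}
```

With both operands constant, adds_lane is also the value a constant folder would have to produce for that lane.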

Aug 9 2018

tkrupa added a comment to D50499: [X86] Constant folding of adds/subs intrinsics.

I added it to InstCombineCalls.cpp at @craig.topper's suggestion, so that more optimizations besides constant folding can be added in the same place.

Aug 9 2018, 5:33 AM
tkrupa updated the summary of D50499: [X86] Constant folding of adds/subs intrinsics.
Aug 9 2018, 2:39 AM
tkrupa created D50499: [X86] Constant folding of adds/subs intrinsics.
Aug 9 2018, 2:38 AM
tkrupa retitled D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part) from [X86] Lowering adds/addus/subs/subus intrinsics to native IR (Clang part) to [X86] Lowering addus/subus intrinsics to native IR (Clang part).
Aug 9 2018, 2:16 AM
tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Split into two separate patches for signed and unsigned intrinsics.
Corrected version number.

Aug 9 2018, 2:15 AM

Aug 7 2018

tkrupa updated the diff for D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part).

Removed signed intrinsics lowering due to the pattern being too complicated - instead some minor optimizations were introduced on LLVM side.

Aug 7 2018, 12:46 AM
tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Brought back signed intrinsics and added constant folding in InstCombine.

Aug 7 2018, 12:37 AM

Jul 24 2018

tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Now the pattern is not detected if there are any undef elements in the constant operand.
I removed the tests added after fixing the bug that caused the patch to be reverted - in the meantime the pattern conditions changed and the sequence used in those tests is now perfectly valid.
Should FIXME comments be removed?

Jul 24 2018, 4:00 AM

Jul 17 2018

tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Removed unnecessary comment.

Jul 17 2018, 4:05 AM
tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Rebased and implemented suggested changes.

Jul 17 2018, 3:56 AM

Jul 16 2018

tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Ping.

Jul 16 2018, 7:50 AM

Jul 9 2018

tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Ping.

Jul 9 2018, 5:49 AM

Jun 28 2018

tkrupa added a comment to D48705: [CodeGenPrepare] Reverse LICM pass for shift and rotate patterns..

I'm not sure D46946 and D47019 are good ideas in the first place, particularly D47019. Expanding an intrinsic to an 8-instruction sequence is getting past the point where we're actually getting any benefit from transforming intrinsic to native IR. Emitting a complicated lowering like that, and trying to recover it in isel seems very tricky to get right, and as far as I can tell we don't get much benefit.

If we are going to expand out x86 shift and rotate intrinsics, we should probably consider pattern-matching on IR, rather than waiting for SelectionDAG. Trying to work around isel limitations in this fashion is fragile, and will probably have a wider effect than you want.

For rotates, there was a proposal to add a generic IR intrinsic for variable rotates on llvmdev, due to the complications involved in late pattern-matching.

Jun 28 2018, 2:30 PM
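
For context on the rotate discussion above, a variable rotate written with plain shifts looks roughly like the sketch below. The name rotl32 is hypothetical, and this is not claimed to be the sequence D47019 emits; it only shows why the pattern takes several operations and has to keep both shift amounts in range.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by n bits using two shifts and an OR.
// Both shift amounts are masked to 0..31 so that n == 0 and n >= 32
// never produce an out-of-range shift.
constexpr std::uint32_t rotl32(std::uint32_t x, std::uint32_t n) {
    return (x << (n & 31u)) | (x >> ((32u - n) & 31u));
}
```

A pattern matcher has to recognize all of these pieces together, which is the fragility the comment describes.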

Jun 27 2018

tkrupa created D48705: [CodeGenPrepare] Reverse LICM pass for shift and rotate patterns..
Jun 27 2018, 11:40 PM

Jun 18 2018

tkrupa committed rL334964: Fix a bug introduced by rL334850.
Fix a bug introduced by rL334850
Jun 18 2018, 11:01 AM
tkrupa committed rC334964: Fix a bug introduced by rL334850.
Fix a bug introduced by rL334850
Jun 18 2018, 11:01 AM
tkrupa closed D48288: Fix a bug introduced by rL334850.
Jun 18 2018, 11:01 AM
tkrupa created D48288: Fix a bug introduced by rL334850.
Jun 18 2018, 10:23 AM
tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Ping.

Jun 18 2018, 2:44 AM

Jun 15 2018

tkrupa committed rC334850: [X86] Lowering sqrt intrinsics to native IR.
[X86] Lowering sqrt intrinsics to native IR
Jun 15 2018, 11:12 AM
tkrupa committed rL334849: [X86] Lowering sqrt intrinsics to native IR.
[X86] Lowering sqrt intrinsics to native IR
Jun 15 2018, 11:12 AM
tkrupa committed rL334850: [X86] Lowering sqrt intrinsics to native IR.
[X86] Lowering sqrt intrinsics to native IR
Jun 15 2018, 11:12 AM
tkrupa closed D41168: [X86] Lowering X86 avx512 sqrt intrinsics to IR.
Jun 15 2018, 11:12 AM
tkrupa closed D41599: [X86] Lowering X86 avx512 sqrt intrinsics to IR - LLVM.
Jun 15 2018, 11:12 AM

Jun 14 2018

tkrupa updated the diff for D41168: [X86] Lowering X86 avx512 sqrt intrinsics to IR.

Fixed rounding mode calls.

Jun 14 2018, 11:16 AM
tkrupa committed rL334741: [X86] Lowering Mask Scalar intrinsics to native IR (Clang part).
[X86] Lowering Mask Scalar intrinsics to native IR (Clang part)
Jun 14 2018, 10:41 AM
tkrupa committed rC334741: [X86] Lowering Mask Scalar intrinsics to native IR (Clang part).
[X86] Lowering Mask Scalar intrinsics to native IR (Clang part)
Jun 14 2018, 10:41 AM
tkrupa closed D47979: [X86] Lowering Mask Scalar add/sub/mul/div intrinsics to native IR (Clang part).
Jun 14 2018, 10:41 AM
tkrupa committed rL334740: [X86] Lowering Mask Scalar intrinsics to native IR (LLVM part).
[X86] Lowering Mask Scalar intrinsics to native IR (LLVM part)
Jun 14 2018, 10:37 AM
tkrupa closed D47978: [X86] Lowering Mask Scalar add/sub/mul/div intrinsics to native IR (LLVM part).
Jun 14 2018, 10:37 AM

Jun 10 2018

tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Ping

Jun 10 2018, 12:57 PM
tkrupa added inline comments to D47979: [X86] Lowering Mask Scalar add/sub/mul/div intrinsics to native IR (Clang part).
Jun 10 2018, 7:38 AM

Jun 9 2018

tkrupa updated the summary of D47978: [X86] Lowering Mask Scalar add/sub/mul/div intrinsics to native IR (LLVM part).
Jun 9 2018, 3:21 AM
tkrupa created D47979: [X86] Lowering Mask Scalar add/sub/mul/div intrinsics to native IR (Clang part).
Jun 9 2018, 3:21 AM
tkrupa created D47978: [X86] Lowering Mask Scalar add/sub/mul/div intrinsics to native IR (LLVM part).
Jun 9 2018, 3:18 AM

Jun 7 2018

tkrupa committed rL334175: [X86] Block UndefRegUpdate.
[X86] Block UndefRegUpdate
Jun 7 2018, 1:53 AM
tkrupa closed D47621: [X86] Block UndefRegUpdate.
Jun 7 2018, 1:53 AM
tkrupa committed rL334171: Test commit access..
Test commit access.
Jun 7 2018, 1:24 AM

Jun 6 2018

tkrupa accepted D47724: [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins..

LGTM

Jun 6 2018, 10:38 AM

Jun 5 2018

tkrupa added inline comments to D47724: [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins..
Jun 5 2018, 3:52 AM
tkrupa updated the diff for D41599: [X86] Lowering X86 avx512 sqrt intrinsics to IR - LLVM.
Jun 5 2018, 3:38 AM

Jun 4 2018

tkrupa updated the diff for D41168: [X86] Lowering X86 avx512 sqrt intrinsics to IR.

Removed CHECK-NOTs for consistency.

Jun 4 2018, 2:27 AM

Jun 1 2018

tkrupa updated the diff for D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part).
Jun 1 2018, 2:57 AM
tkrupa added a comment to D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part).

Whoops, that's a wrong revision. I'll revert it shortly.

Jun 1 2018, 2:54 AM
tkrupa updated the diff for D41168: [X86] Lowering X86 avx512 sqrt intrinsics to IR.

Added missing scalar intrinsics without rounding.

Jun 1 2018, 2:52 AM
tkrupa updated the diff for D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part).

Added missing scalar intrinsics without rounding.

Jun 1 2018, 2:49 AM
tkrupa updated the diff for D41599: [X86] Lowering X86 avx512 sqrt intrinsics to IR - LLVM.

Added lowering of scalar sqrt intrinsics without rounding (relies on D47621).

Jun 1 2018, 2:48 AM
tkrupa requested review of D41599: [X86] Lowering X86 avx512 sqrt intrinsics to IR - LLVM.
Jun 1 2018, 2:46 AM
tkrupa added a comment to D41168: [X86] Lowering X86 avx512 sqrt intrinsics to IR.

Mask scalar case is closed and doesn't have any effects on this revision. Besides, I resolved issues connected to lowering scalar sqrt intrinsics without rounding (that is, if D47621 is accepted). Should I add them here to have everything sqrt in one place or upstream this and add them to a new revision?

Jun 1 2018, 1:57 AM
tkrupa created D47621: [X86] Block UndefRegUpdate.
Jun 1 2018, 1:25 AM

May 29 2018

tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Added a check for two constant operands. I'm still waiting for an answer to my last question.

May 29 2018, 5:47 AM
tkrupa updated the diff for D47443: [X86] Lowering FMA intrinsics to native IR (LLVM part).

Added fast-isel tests - there are some XORs/MOVs which aren't combined but it doesn't look terrible overall.

May 29 2018, 4:35 AM

May 28 2018

tkrupa added a comment to D47443: [X86] Lowering FMA intrinsics to native IR (LLVM part).

You're going to hate me for this, but we could do with -fast-isel test files covering the builtin test files codegen on the clang side

May 28 2018, 1:27 PM
tkrupa updated the diff for D47443: [X86] Lowering FMA intrinsics to native IR (LLVM part).
May 28 2018, 1:15 PM
tkrupa updated the summary of D47443: [X86] Lowering FMA intrinsics to native IR (LLVM part).
May 28 2018, 1:32 AM
tkrupa created D47444: [X86] Lowering FMA intrinsics to native IR (Clang part).
May 28 2018, 1:31 AM
tkrupa created D47443: [X86] Lowering FMA intrinsics to native IR (LLVM part).
May 28 2018, 1:23 AM
tkrupa updated the diff for D47012: [X86] Scalar mask and scalar move optimizations.
May 28 2018, 1:08 AM

May 25 2018

tkrupa updated the diff for D47012: [X86] Scalar mask and scalar move optimizations.

Fixed fma mask3 pattern.

May 25 2018, 3:39 AM
tkrupa updated the summary of D47012: [X86] Scalar mask and scalar move optimizations.
May 25 2018, 12:15 AM
tkrupa updated the diff for D47012: [X86] Scalar mask and scalar move optimizations.
May 25 2018, 12:14 AM

May 24 2018

tkrupa added inline comments to D47012: [X86] Scalar mask and scalar move optimizations.
May 24 2018, 12:21 AM

May 21 2018

tkrupa updated the diff for D47012: [X86] Scalar mask and scalar move optimizations.

Test diffs are now visible.
Note: there is no test for div in combine-select.ll - the div pattern gets split into 3 basic blocks with a branch condition, so this optimization doesn't apply to it.

May 21 2018, 1:05 PM
tkrupa added a comment to D47012: [X86] Scalar mask and scalar move optimizations.

Add these tests with the current codegen so the patch shows the diff improvement?

May 21 2018, 11:09 AM
tkrupa added a comment to D46946: [X86] Lowering shift intrinsics to native IR.

Never mind, I got it wrong - after compiling something similar to https://bugs.llvm.org/show_bug.cgi?id=37417, it does split the pattern. Moreover, it splits it in a different manner for different intrinsics and hoists more than one instruction, so it might not be as simple as moving back the one subtraction in the bug reproducer. I don't think there is any other way though - replicating simplifyX86immShift and simplifyX86varShift directly in IR would be a nightmare.

May 21 2018, 3:43 AM
tkrupa added a comment to D47019: [X86] Lowering rotation intrinsics to native IR.

Because emitting shifts in IR is more complicated than just adding an shl/lshr node due to those poison values (see D46946) and would create some redundant code. I guess I can use simplifyX86immShift directly instead of emitting a call here.
As for the bug - much more than one instruction gets thrown out of the loop after applying the shift lowering patch - I'm leaning towards leaving only the non-variable intrinsics in this patch and implementing the variable ones after the generic intrinsic is introduced.

May 21 2018, 3:32 AM
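
On the poison values mentioned in the comment above: an IR shl or lshr whose shift amount is not smaller than the bit width yields poison, whereas the x86 packed logical shifts return zero for such counts. A minimal per-lane sketch of the guard this implies is shown below. The function names are hypothetical, and whether D46946 emits exactly this compare-and-select is an assumption.

```cpp
#include <cstdint>

// Logical left shift of one 32-bit lane with x86-style semantics:
// counts of 32 or more give 0 instead of an out-of-range shift.
constexpr std::uint32_t shl_lane(std::uint32_t x, std::uint32_t amt) {
    return amt < 32u ? x << amt : 0u;
}

// Logical right shift of one 32-bit lane with the same guard.
constexpr std::uint32_t lshr_lane(std::uint32_t x, std::uint32_t amt) {
    return amt < 32u ? x >> amt : 0u;
}
```

The extra compare and select per shift is the redundant code the comment refers to.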
tkrupa updated the diff for D46946: [X86] Lowering shift intrinsics to native IR.

Added missing test checks.

May 21 2018, 2:37 AM
tkrupa added a comment to D46946: [X86] Lowering shift intrinsics to native IR.

Can you provide an example of such behavior? I tried various (rather simple) tests like this one:

May 21 2018, 2:11 AM
tkrupa updated the diff for D47012: [X86] Scalar mask and scalar move optimizations.

Added tests for ISelLowering combining and FMA patterns.

May 21 2018, 1:29 AM

May 17 2018

tkrupa added a comment to D47012: [X86] Scalar mask and scalar move optimizations.

Yes we could, but only for X86ISelLowering folding and FMA patterns - the other ones are multiclasses without any definitions, so they never alter anything at this point. They are intended to be a basis for several upcoming patches which will add defms for these patterns with the tests alongside them. I'm OOO tomorrow so I'll be able to add the tests for first two cases on Monday.

May 17 2018, 3:20 PM
tkrupa added reviewers for D47019: [X86] Lowering rotation intrinsics to native IR: craig.topper, spatel, sroland.
May 17 2018, 11:18 AM
tkrupa created D47019: [X86] Lowering rotation intrinsics to native IR.
May 17 2018, 10:07 AM
tkrupa updated the summary of D47012: [X86] Scalar mask and scalar move optimizations.
May 17 2018, 5:12 AM
tkrupa created D47012: [X86] Scalar mask and scalar move optimizations.
May 17 2018, 5:10 AM
tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

About lib/Target/X86/X86ISelLowering.cpp:36183 - does it make sense to emit normal ADD/SUB without saturation when element type after truncation is larger than before extension?

May 17 2018, 3:05 AM

May 16 2018

tkrupa added inline comments to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).
May 16 2018, 9:16 AM
tkrupa created D46946: [X86] Lowering shift intrinsics to native IR.
May 16 2018, 7:47 AM
tkrupa updated the summary of D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).
May 16 2018, 1:20 AM
tkrupa added inline comments to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).
May 16 2018, 12:57 AM
tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

I brought back lowering in AutoUpgrade for unsigned intrinsics, although I'm not sure if there's a universal agreement about it. In email correspondence @DavidKreitzer suggested that it would be better to retain the intrinsics and get rid of them at another time or in another patch; Craig disagreed and argued that we can lower non-complex ones (like addus/subus) without much trouble. Is there a consensus about it?

May 16 2018, 12:57 AM

May 15 2018

tkrupa created D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part).
May 15 2018, 10:09 AM
tkrupa added reviewers for D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part): craig.topper, RKSimon, spatel.
May 15 2018, 10:09 AM
tkrupa updated the diff for D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

As was decided, I removed lowering in AutoUpgrade.cpp. I also moved the tests into corresponding *target*-intrinsics-canonical.ll files as discussed with @mike.dvoretsky. I added tests for PR37260 in test/CodeGen/X86/avx2-intrinsics-canonical.ll.
What's your opinion on fast-isel subus emission and my proposal in the detectAddSubSatPattern function?

May 15 2018, 10:02 AM

May 11 2018

tkrupa added a comment to D46742: [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins..

Never mind - these four are not strictly truncating. Sorry for the confusion.

May 11 2018, 3:06 AM
tkrupa added a comment to D46742: [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins..

There are four other similar intrinsics which convert to 128/256-bit vectors:

May 11 2018, 12:31 AM

May 10 2018

tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

During internal review I proposed something like this: (~0 - a) < b ? ~0 : a + b.
It gets rid of zext/trunc in addus pattern but introduces additional subtraction which is presumably more costly. Your solution seems much better, I'll change it to that form.

May 10 2018, 11:11 PM
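
The two addus forms contrasted in the comment above can be sketched on a single 8-bit lane as follows. The function names are hypothetical, the widening version is only one plausible reading of the "zext/trunc" pattern, and the reviewer's preferred form is not shown because it does not appear in this feed.

```cpp
#include <cstdint>

// Widening form: zero-extend both operands, add, clamp to 0xFF, truncate.
constexpr std::uint8_t addus_widen(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(
        static_cast<unsigned>(a) + b > 0xFFu ? 0xFFu
                                             : static_cast<unsigned>(a) + b);
}

// Form from the comment: (~0 - a) < b ? ~0 : a + b.
// (~0 - a) is the headroom left before a overflows; no widening is needed,
// but an extra subtraction is.
constexpr std::uint8_t addus_sub(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(0xFFu - a) < b
               ? std::uint8_t{0xFF}
               : static_cast<std::uint8_t>(a + b);
}
```

Both functions compute the same unsigned saturating add; they differ only in which operations a backend would have to recognize.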
tkrupa added a comment to D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

About test/CodeGen/X86/avx2-intrinsics-fast-isel.ll change - there is a canonical form for subus pattern I'm using here - it's different from adds/addus/subs patterns. While those three use ext/trunc pattern and fold correctly, subus has only max+sub. If fast-isel is enabled, sub is not put into SelectionDAG - that's what prevents it from combining. Instead, it's appended after isel as a lowered node. Is there a machine instruction pass for combining?

May 10 2018, 1:27 AM
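
The "max+sub" form of subus described in the comment above can be written per lane as in this sketch. The helper names are hypothetical; the reference version is included only to show that the two agree.

```cpp
#include <cstdint>

// Unsigned maximum of two 8-bit lanes (hypothetical helper).
constexpr std::uint8_t umax_lane(std::uint8_t a, std::uint8_t b) {
    return a > b ? a : b;
}

// subus as max + sub: max(a, b) - b is a - b when a > b and 0 otherwise,
// so the subtraction can never wrap.
constexpr std::uint8_t subus_max_sub(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(umax_lane(a, b) - b);
}

// Reference definition of unsigned saturating subtraction.
constexpr std::uint8_t subus_ref(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}
```

Because the form is two separate operations, recognizing it as a single saturating subtract requires both the max and the sub to be visible to the same combine, which is the fast-isel problem the comment raises.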

May 9 2018

tkrupa updated subscribers of D46179: [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

We're currently discussing the possible solutions for the JIT pipeline issue with @DavidKreitzer.

May 9 2018, 9:51 AM