dtemirbulatov (Dinar Temirbulatov)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 17 2015, 10:06 AM (100 w, 5 d)

Recent Activity

Sun, Aug 20

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Rebase

Sun, Aug 20, 10:45 PM

Fri, Aug 18

dtemirbulatov updated the diff for D36766: Reorder operands with provided Opcode.

Remove getDefaultConstantForOpcode function for now.

Fri, Aug 18, 5:49 AM

Thu, Aug 17

dtemirbulatov updated the diff for D36766: Reorder operands with provided Opcode.

Simplify the test.

Thu, Aug 17, 7:01 PM

Wed, Aug 16

dtemirbulatov updated the diff for D36766: Reorder operands with provided Opcode.

Remove the extra -slp-vectorizer flag from the test.

Wed, Aug 16, 7:57 PM
dtemirbulatov updated the diff for D36766: Reorder operands with provided Opcode.

update testcase.

Wed, Aug 16, 7:53 PM

Tue, Aug 15

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

update after RKSimon's remarks.

Tue, Aug 15, 9:08 PM
dtemirbulatov created D36766: Reorder operands with provided Opcode.
Tue, Aug 15, 1:51 PM

Mon, Aug 14

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Rebase

Mon, Aug 14, 7:45 PM
dtemirbulatov updated the diff for D36518: [SLPVectorizer] Schedule bundle with different opcodes..

correct test filename spelling.

Mon, Aug 14, 2:40 AM

Sun, Aug 13

dtemirbulatov updated the diff for D36518: [SLPVectorizer] Schedule bundle with different opcodes..

update test.

Sun, Aug 13, 9:53 PM
dtemirbulatov added inline comments to D36518: [SLPVectorizer] Schedule bundle with different opcodes..
Sun, Aug 13, 9:40 PM

Sat, Aug 12

dtemirbulatov updated the diff for D36518: [SLPVectorizer] Schedule bundle with different opcodes..

Updated after RKSimon's remarks and replaced the test case with a more appropriate one.

Sat, Aug 12, 12:09 PM

Thu, Aug 10

dtemirbulatov updated the diff for D36518: [SLPVectorizer] Schedule bundle with different opcodes..

Update after RKSimon's remarks and add a test.

Thu, Aug 10, 8:05 PM
dtemirbulatov updated the diff for D36518: [SLPVectorizer] Schedule bundle with different opcodes..

Test cases?

Well, I could add tests here, but this is the main flow of the algorithm, so it should already be covered by existing tests.

Thu, Aug 10, 4:55 AM
dtemirbulatov added inline comments to D36518: [SLPVectorizer] Schedule bundle with different opcodes..
Thu, Aug 10, 4:52 AM

Wed, Aug 9

dtemirbulatov created D36518: [SLPVectorizer] Schedule bundle with different opcodes..
Wed, Aug 9, 6:45 AM

Tue, Aug 8

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Remove technical comments

Tue, Aug 8, 12:03 PM
dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Fixed issue with horizontal-list.ll.

Tue, Aug 8, 11:58 AM

Mon, Aug 7

dtemirbulatov added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Oops, I just noticed that we have lost a flag in horizontal-list.ll. I will look into the issue.

Mon, Aug 7, 7:23 PM
dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Rebase

Mon, Aug 7, 7:20 PM

Wed, Aug 2

dtemirbulatov updated the diff for D35965: [X86] SET0 to use XMM registers where possible PR26018 PR32862 2/2.

Rebase, remove all "End function" lines.

Wed, Aug 2, 10:54 AM

Tue, Aug 1

dtemirbulatov updated the diff for D35965: [X86] SET0 to use XMM registers where possible PR26018 PR32862 2/2.

Reverted the last change.

Tue, Aug 1, 12:28 AM

Mon, Jul 31

dtemirbulatov updated the diff for D35965: [X86] SET0 to use XMM registers where possible PR26018 PR32862 2/2.

update after Craig's remark.

Mon, Jul 31, 5:01 AM

Sun, Jul 30

dtemirbulatov updated the diff for D35965: [X86] SET0 to use XMM registers where possible PR26018 PR32862 2/2.

Fix the "change to a subregister" issue pointed out by Craig.

Sun, Jul 30, 2:18 AM

Thu, Jul 27

dtemirbulatov created D35965: [X86] SET0 to use XMM registers where possible PR26018 PR32862 2/2.
Thu, Jul 27, 4:27 PM
dtemirbulatov updated the diff for D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

Update to vector-shuffle-combining-avx.ll

Thu, Jul 27, 7:27 AM
dtemirbulatov updated the diff for D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

update with X86::AVX512_256_SET0 included, please ignore my last comment

Thu, Jul 27, 6:08 AM
dtemirbulatov added a comment to D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

we don't need to change X86::AVX512_256_SET0, it is already using X86::sub_ymm there, I have not seen any ZMM to YMM change in the diff.

Thu, Jul 27, 5:27 AM
dtemirbulatov added a comment to D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

ok, thanks, I will redo the change.

Thu, Jul 27, 3:01 AM

Wed, Jul 26

dtemirbulatov updated the diff for D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

Update after RKSimon's remarks.

Wed, Jul 26, 3:57 PM
dtemirbulatov updated the diff for D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

We decided to split this change into two parts, AVX and AVX512. This is part one: AVX and AVX2.

Wed, Jul 26, 12:21 PM
dtemirbulatov updated the diff for D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

updated manually vec_uint_to_fp-fastmath.ll, memset.ll

Wed, Jul 26, 9:29 AM
dtemirbulatov added a comment to D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

and memset.ll, sorry.

Wed, Jul 26, 8:56 AM
dtemirbulatov added a comment to D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

oh, I missed that vec_uint_to_fp-fastmath.ll should be updated manually. I will redo my change.

Wed, Jul 26, 8:55 AM
dtemirbulatov updated the diff for D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

add X86::AVX512_256_SET0 handling, format, rebase

Wed, Jul 26, 8:38 AM

Tue, Jul 25

dtemirbulatov added a comment to D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .

Oh, I missed the "X86::AVX512_256_SET0:" case. I have to check some test cases; I think I saw something incorrect there. I will redo my change.

Tue, Jul 25, 7:44 AM
dtemirbulatov created D35839: [X86] SET0 to use XMM registers where possible PR26018 PR32862 .
Tue, Jul 25, 7:36 AM
dtemirbulatov added a comment to D35769: Allow setInsertPointAfterBundle to handle vectors with different opcodes.

Ping.

Tue, Jul 25, 6:23 AM

Jul 22 2017

dtemirbulatov created D35769: Allow setInsertPointAfterBundle to handle vectors with different opcodes.
Jul 22 2017, 3:05 PM

Jul 21 2017

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

rebase after recent changes.

Jul 21 2017, 9:15 AM

Jul 19 2017

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

rebase

Jul 19 2017, 5:49 AM

Jul 14 2017

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

rebase to the current tree.

Jul 14 2017, 3:48 AM
dtemirbulatov updated the diff for D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.

update after rksimon's remarks.

Jul 14 2017, 3:43 AM

Jul 13 2017

dtemirbulatov updated the diff for D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.

Fixed issue with test/Transforms/SLPVectorizer/X86/horizontal-list.ll, Merged two versions of propagateIRFlags into one.

Jul 13 2017, 12:57 PM
dtemirbulatov added inline comments to D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.
Jul 13 2017, 12:54 AM
dtemirbulatov added a comment to D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.

Resulting in:
VL[0]: %r1 = add i32 %arg, undef
VL[1]: %r2 = add nsw i32 %r1, undef
VL[2]: %r3 = add nsw i32 %r2, undef
VL[3]: %r4 = add nsw i32 %r3, undef
VL[4]: %r5 = add nsw i32 %r4, undef

Jul 13 2017, 12:30 AM
dtemirbulatov added a comment to D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.

I'm not sure if Z having those flags is on purpose, though. And I don't see what we get from that.

Please tell me if I misunderstood something. I'm not the most familiar with the SLP vectorizer.

Usually the instructions have flags in common, and the code collects those flags; see "Intersection->andIRFlags(V);" in the loop across all of VL.
But I have one example from test/Transforms/SLPVectorizer/X86/horizontal-list.ll where the NUW flag was canceled by NSW:
VL[0]: %r1 = add nuw i32 %arg, undef
VL[1]: %r2 = add nsw i32 %r1, undef
VL[2]: %r3 = add nsw i32 %r2, undef
VL[3]: %r4 = add nsw i32 %r3, undef
VL[4]: %r5 = add nsw i32 %r4, undef
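
The cancellation described above can be sketched outside LLVM. This is a minimal illustration, assuming a hypothetical bit encoding for the nuw and nsw flags and a plain vector standing in for VL; it is not the SLP vectorizer's code, only the and-style intersection that the comment attributes to andIRFlags.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bit encoding of the two wrapping flags (not LLVM's own types).
constexpr std::uint8_t NoFlags = 0;
constexpr std::uint8_t NUW = 1; // no unsigned wrap
constexpr std::uint8_t NSW = 2; // no signed wrap

// Keep only the flags shared by every instruction in the bundle, which is
// the net effect of and-ing the flags of each VL element in turn.
std::uint8_t intersectFlags(const std::vector<std::uint8_t> &VL) {
  if (VL.empty())
    return NoFlags;
  std::uint8_t Common = VL.front();
  for (std::uint8_t Flags : VL)
    Common = static_cast<std::uint8_t>(Common & Flags);
  return Common;
}

// The bundle from the comment above: one nuw add followed by four nsw adds.
void checkHorizontalListExample() {
  assert(intersectFlags({NUW, NSW, NSW, NSW, NSW}) == NoFlags);
}
```

Because no single flag appears on all five adds, the intersection is empty, so neither nuw nor nsw survives on the combined operation.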

Jul 13 2017, 12:18 AM

Jul 12 2017

dtemirbulatov added a comment to D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.

I don't get what you want to do here, sorry.
As I see it, in all the uses of propagateIRFlagsWithOp(X, Y, Z), we have Z == Y[0]. Which will end up being *almost* the same as the propagateIRFlags but with the side-effect that Z will also have those flags.

propagateIRFlagsWithOp only collects flags from Y if Y.Code == Z.Code while propagateIRFlags copies any flag from Y.
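
The difference between the two behaviours can be sketched with a toy model. The Opcode enum, the Inst struct, and both function names below are hypothetical stand-ins, and the bit encoding of the flags is assumed; the sketch only contrasts "intersect over everything" with "intersect over instructions whose opcode matches", as the reply describes it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for LLVM's opcodes and wrapping flags.
enum class Opcode { Add, Shl };
constexpr std::uint8_t NUW = 1; // no unsigned wrap
constexpr std::uint8_t NSW = 2; // no signed wrap

struct Inst {
  Opcode Op;
  std::uint8_t Flags;
};

// Intersect the flags of every instruction, regardless of opcode.
std::uint8_t intersectAll(const std::vector<Inst> &VL) {
  std::uint8_t Common = VL.empty() ? 0 : 0xFF;
  for (const Inst &I : VL)
    Common = static_cast<std::uint8_t>(Common & I.Flags);
  return Common;
}

// Intersect only the flags of instructions whose opcode equals Op.
// Returns no flags when nothing in VL matches.
std::uint8_t intersectMatching(const std::vector<Inst> &VL, Opcode Op) {
  bool Seen = false;
  std::uint8_t Common = 0xFF;
  for (const Inst &I : VL) {
    if (I.Op != Op)
      continue;
    Seen = true;
    Common = static_cast<std::uint8_t>(Common & I.Flags);
  }
  return Seen ? Common : 0;
}

// A mixed bundle: the shl would wipe out nsw in the unfiltered version.
void checkMixedBundle() {
  const std::vector<Inst> VL = {
      {Opcode::Add, NSW},
      {Opcode::Shl, NUW},
      {Opcode::Add, static_cast<std::uint8_t>(NUW | NSW)}};
  assert(intersectAll(VL) == 0);
  assert(intersectMatching(VL, Opcode::Add) == NSW);
}
```

In this model the filtered version keeps nsw for the add lanes, because the shl with a different flag set no longer takes part in the intersection.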

Jul 12 2017, 9:52 AM
dtemirbulatov added inline comments to D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.
Jul 12 2017, 8:33 AM

Jul 11 2017

dtemirbulatov updated subscribers of D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.
Jul 11 2017, 10:51 PM
dtemirbulatov created D35292: [SLPVectorizer] Add propagateIRFlagsWithOp() function to propagate IRFlags for specific Operation.
Jul 11 2017, 10:50 PM

Jul 9 2017

dtemirbulatov abandoned D35139: [SLPVectorizer][InstCombine] Fix PR21780 Expansion of 256 bit vector loads fails to fold into shuffles.

Abandoning this review because there should be separate reviews, one for the InstCombine pass and another for SLP, and because dereferenceable_or_null metadata can only be applied to loads of a pointer type.

Jul 9 2017, 10:03 PM
dtemirbulatov created D35139: [SLPVectorizer][InstCombine] Fix PR21780 Expansion of 256 bit vector loads fails to fold into shuffles.
Jul 9 2017, 6:13 AM

Jul 4 2017

dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Jul 4 2017, 5:58 PM

Jun 30 2017

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

update after D34756 commit.

Jun 30 2017, 4:04 PM

Jun 28 2017

dtemirbulatov created D34756: [SLPVectorizer] Introducing getTreeEntry() [NFC].
Jun 28 2017, 8:20 AM

Jun 19 2017

dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Jun 19 2017, 5:56 PM
dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

update after rksimon's remarks.

Jun 19 2017, 5:45 PM

Jun 15 2017

dtemirbulatov updated the diff for D33406: PR28129 expand vector oparation to an IR constant..

Update formatting, comments

Jun 15 2017, 1:34 PM
dtemirbulatov updated the diff for D33406: PR28129 expand vector oparation to an IR constant..

Update after http://lists.llvm.org/pipermail/llvm-dev/2017-June/114120.html. Added 0x1b(_CMP_FALSE_OS), 0x1f(_CMP_TRUE_US) handling.

Jun 15 2017, 9:00 AM

Jun 5 2017

dtemirbulatov added a comment to D33406: PR28129 expand vector oparation to an IR constant..

Ping. [andrew.w.kaylor, scanon] Is it OK to assume that FP exceptions are off by default and to allow this transformation to constants in the IR, given that we know we would get an exception with "1.00 -nan" for _mm256_cmp_ps(a, b, 15) if FP exceptions were enabled?

Jun 5 2017, 6:02 AM

May 31 2017

dtemirbulatov added a comment to D33406: PR28129 expand vector oparation to an IR constant..

We should've asked this first: is that fold allowed in the default FPENV state that we assume that clang is operating in?

I suppose it is FE_ALL_EXCEPT.

May 31 2017, 3:03 AM

May 30 2017

dtemirbulatov abandoned D33682: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Abandoning this revision because https://reviews.llvm.org/D28907 is already open for the issue.

May 30 2017, 10:22 AM
dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

rebased the change against trunk with no new regressions

May 30 2017, 10:21 AM
dtemirbulatov commandeered D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

We agreed with Alexey that I would continue on this issue:
<dinar_> Hi Alexey, I rebased your change for PR30787, do you mind if I update the change in https://reviews.llvm.org/D28907?
<ABataev> Ok, go ahead

May 30 2017, 10:19 AM
dtemirbulatov added reviewers for D33682: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.: spatel, mzolotukhin, ABataev, mkuper, filcab, RKSimon, hfinkel.
May 30 2017, 9:22 AM
dtemirbulatov created D33682: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
May 30 2017, 9:17 AM

May 24 2017

dtemirbulatov added a comment to D33406: PR28129 expand vector oparation to an IR constant..

Should we handle the 'pd256' version the same way?
How about the 0xb ('false') constant? It should produce a zero here?
Can or should we deal with the signalling versions (0x1b, 0x1f) too?

Hm, it looks like 0xb (_CMP_FALSE_OQ) is ordered, so it is not possible, and 0x1b or 0x1f might emit a signal.
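
The predicate immediates in this exchange can be laid out as a small classification sketch. The constants carry the immediate values quoted in the thread (the k-prefixed names are local to this example), the Fold enum and foldCompare are hypothetical, and the AllowSignalling parameter is an assumed knob: whether the signalling forms may be folded is exactly the open question here. The sketch follows the reviewer's reading that the 'false' predicate yields zero; it is not the logic of the patch.

```cpp
#include <cassert>

// Immediate values as quoted in the thread; k-prefixed so they do not
// clash with the macros of the same names in Intel's headers.
constexpr unsigned kCmpFalseOQ = 0x0b; // always false, quiet
constexpr unsigned kCmpTrueUQ = 0x0f;  // always true, quiet
constexpr unsigned kCmpFalseOS = 0x1b; // always false, signalling
constexpr unsigned kCmpTrueUS = 0x1f;  // always true, signalling

enum class Fold { None, AllZeros, AllOnes };

// Decide whether a compare with this immediate could become a constant.
// Signalling predicates are folded only if the caller opts in, since
// folding them drops a possible FP exception.
Fold foldCompare(unsigned Imm, bool AllowSignalling) {
  switch (Imm) {
  case kCmpTrueUQ:
    return Fold::AllOnes;
  case kCmpFalseOQ:
    return Fold::AllZeros;
  case kCmpTrueUS:
    return AllowSignalling ? Fold::AllOnes : Fold::None;
  case kCmpFalseOS:
    return AllowSignalling ? Fold::AllZeros : Fold::None;
  default:
    return Fold::None; // result depends on the operands
  }
}

void checkPredicates() {
  assert(foldCompare(kCmpTrueUQ, false) == Fold::AllOnes);
  assert(foldCompare(kCmpTrueUS, false) == Fold::None);
}
```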

May 24 2017, 8:02 AM
dtemirbulatov updated the diff for D33406: PR28129 expand vector oparation to an IR constant..

add _mm256_cmp_pd double version
add comments in lib/CodeGen/CGBuiltin.cpp
replaced 0xf to _CMP_TRUE_UQ in avx-builtins.c

May 24 2017, 7:57 AM

May 22 2017

dtemirbulatov abandoned D33407: PR28129 Avoid producing vxorps to clear the fake inputs..

canceling because x86 is not currently set up to look at breaking a dependency for anything but the 2nd operand.

May 22 2017, 6:30 PM
dtemirbulatov added a reviewer for D33406: PR28129 expand vector oparation to an IR constant.: craig.topper.
May 22 2017, 5:18 AM
dtemirbulatov created D33407: PR28129 Avoid producing vxorps to clear the fake inputs..
May 22 2017, 5:15 AM
dtemirbulatov updated subscribers of D33406: PR28129 expand vector oparation to an IR constant..
May 22 2017, 5:06 AM
dtemirbulatov created D33406: PR28129 expand vector oparation to an IR constant..
May 22 2017, 5:01 AM

May 12 2017

dtemirbulatov updated the diff for D32416: [x86, SSE] AVX1 PR28129 .

updated tests after review with proper update_llc_test_checks.py.

May 12 2017, 2:33 PM
dtemirbulatov updated the diff for D32416: [x86, SSE] AVX1 PR28129 .

Rebased and changed OptForSize to OptForMinSize.

May 12 2017, 10:36 AM

May 11 2017

dtemirbulatov updated the diff for D33055: [LoopOptimizer][Fix]PR32859, PR24738.

Fixed some typos.

May 11 2017, 2:31 PM
dtemirbulatov updated the diff for D33055: [LoopOptimizer][Fix]PR32859, PR24738.

Another update after review. Erased attributes from the test and added an explanation of the test. Added an assert that LCSSAPhi->getIncomingValue(0) is loop-invariant in the original loop.

May 11 2017, 2:24 PM
dtemirbulatov updated the diff for D33055: [LoopOptimizer][Fix]PR32859, PR24738.

update after review.

May 11 2017, 10:54 AM
dtemirbulatov added a comment to D33055: [LoopOptimizer][Fix]PR32859, PR24738.

yes, both examples from PR32935, PR32859 are fixed by this change

May 11 2017, 10:52 AM

May 10 2017

dtemirbulatov added a comment to D33055: [LoopOptimizer][Fix]PR32859, PR24738.

Also, can you please make sure we don't miscompile any of the dups of the original bug with this patch?

You mean PR14725's test case? Yes, this one looks correct to me after this change.

May 10 2017, 1:43 PM
dtemirbulatov added a comment to D33055: [LoopOptimizer][Fix]PR32859, PR24738.

I think in the particular case we are limited here to just one predecessor, there is "LCSSAPhi->getNumIncomingValues() == 1" condition above.

May 10 2017, 11:16 AM
dtemirbulatov added a comment to D33055: [LoopOptimizer][Fix]PR32859, PR24738.

Well, out of the LCSSA form we could have "phi i32 [ 0, %for.inc.2.i ]" or "phi i32 [ undef, %for.inc.2.i ]" like in PR14725, but the IR Verifier requires a PHI to have one entry for each predecessor of its parent basic block. The original PR14725 fix just added 'undef' for the other predecessor BB, which is not correct. We copy the real value for the other predecessor instead of bringing in 'undef'.
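
The verifier rule mentioned above can be sketched as a standalone check. The Phi alias and the function name are hypothetical, and blocks and values are modelled as plain strings; this only illustrates the "one entry per predecessor" requirement and is not the IR Verifier's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified PHI: a list of (incoming value, incoming block).
using Phi = std::vector<std::pair<std::string, std::string>>;

// True when the PHI's incoming blocks are exactly the predecessors of its
// parent block, compared as multisets (order does not matter).
bool hasOneEntryPerPredecessor(const Phi &P, std::vector<std::string> Preds) {
  std::vector<std::string> Blocks;
  for (const auto &Entry : P)
    Blocks.push_back(Entry.second);
  std::sort(Blocks.begin(), Blocks.end());
  std::sort(Preds.begin(), Preds.end());
  return Blocks == Preds;
}

void checkSinglePredecessor() {
  const Phi P = {{"0", "for.inc.2.i"}};
  const std::vector<std::string> Preds = {"for.inc.2.i"};
  assert(hasOneEntryPerPredecessor(P, Preds));
}
```

Once the parent block gains a second predecessor, a single-entry PHI fails this check, which is why the fix has to supply a value for the new edge; the comment argues that value should be the real one, not 'undef'.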

May 10 2017, 10:54 AM
dtemirbulatov created D33055: [LoopOptimizer][Fix]PR32859, PR24738.
May 10 2017, 10:36 AM

Apr 28 2017

dtemirbulatov updated the diff for D32416: [x86, SSE] AVX1 PR28129 .

Update after the -Os issue was cleared up.

Apr 28 2017, 7:07 AM
dtemirbulatov updated the diff for D32416: [x86, SSE] AVX1 PR28129 .

another update after issue with -Os.

Apr 28 2017, 6:06 AM

Apr 27 2017

dtemirbulatov updated the diff for D32416: [x86, SSE] AVX1 PR28129 .

update after review process.

Apr 27 2017, 11:22 PM

Apr 25 2017

dtemirbulatov updated the diff for D32039: PR31357 fix.

Further changes after rebase.

Apr 25 2017, 11:36 PM
dtemirbulatov updated the diff for D32039: PR31357 fix.

Some variable name changes.

Apr 25 2017, 9:58 AM

Apr 24 2017

dtemirbulatov updated the diff for D32039: PR31357 fix.

update after Simon's comments.

Apr 24 2017, 12:24 AM

Apr 23 2017

dtemirbulatov added a comment to D32416: [x86, SSE] AVX1 PR28129 .

Here is the test case:
define <4 x double> @cmp256_domain(<4 x double> %a) {
  %cmp = fcmp oeq <4 x double> zeroinitializer, zeroinitializer
  %sext = sext <4 x i1> %cmp to <4 x i64>
  %mask = bitcast <4 x i64> %sext to <4 x double>
  %add = fadd <4 x double> %a, %mask
  ret <4 x double> %add
}
So, we are changing "immAllOnesV" for 256-bit vectors only on AVX1 machines.

Apr 23 2017, 11:58 PM
dtemirbulatov created D32416: [x86, SSE] AVX1 PR28129 .
Apr 23 2017, 11:33 PM

Apr 19 2017

dtemirbulatov updated the diff for D32039: PR31357 fix.

update test again with utils/update_llc_test_checks.py

Apr 19 2017, 10:19 AM
dtemirbulatov updated the diff for D32039: PR31357 fix.

added update to tests.

Apr 19 2017, 8:51 AM
dtemirbulatov updated the diff for D32039: PR31357 fix.

update again, something went wrong during the last update.

Apr 19 2017, 8:18 AM
dtemirbulatov updated the diff for D32039: PR31357 fix.

updated the change after rebase.

Apr 19 2017, 7:58 AM

Apr 16 2017

dtemirbulatov updated the diff for D32039: PR31357 fix.

update after Simon's comments.

Apr 16 2017, 9:50 PM

Apr 13 2017

dtemirbulatov created D32039: PR31357 fix.
Apr 13 2017, 12:35 PM

Apr 6 2017

dtemirbulatov updated the diff for D31668: Fix PR30562.

here we are checking that the store depends on the extractelement operation.

Apr 6 2017, 10:51 PM

Apr 5 2017

dtemirbulatov updated the diff for D31668: Fix PR30562.
Apr 5 2017, 7:34 AM