spatel (Sanjay Patel)
User

Projects

User does not belong to any projects.

User Details

User Since
May 22 2014, 1:24 PM (204 w, 3 d)

Recent Activity

Today

spatel added inline comments to D45733: [DAGCombiner] Unfold scalar masked merge if profitable.
Mon, Apr 23, 9:10 AM
spatel added inline comments to D45733: [DAGCombiner] Unfold scalar masked merge if profitable.
Mon, Apr 23, 9:00 AM
spatel added a comment to D45862: [InstCombine] Relax restriction in foldSelectInstWithICmp for sake of smaller code size.

Also see D24480 and the list of proposals and commits mentioned there.

Mon, Apr 23, 7:09 AM
spatel added a comment to D45862: [InstCombine] Relax restriction in foldSelectInstWithICmp for sake of smaller code size.

For now, my point is the following: yes, we are *already* doing all this bit magic. Yes, it looks like miscanonicalization, and we can discuss getting rid of all such transforms at all. But since we are already doing it and haven't yet decided to clear it off, let's at least do it well. Once we decide that all this stuff is counter-canonical, we can happily remove all this. What do you think of that?

Mon, Apr 23, 7:02 AM

Yesterday

spatel committed rL330550: [PatternMatch] allow undef elements when matching a vector zero.
[PatternMatch] allow undef elements when matching a vector zero
Sun, Apr 22, 10:11 AM
spatel committed rL330547: [InstCombine] add vector test with undef elts; NFC.
[InstCombine] add vector test with undef elts; NFC
Sun, Apr 22, 9:02 AM
spatel committed rL330543: [InstSimplify, InstCombine] add vector tests with undef elts; NFC.
[InstSimplify, InstCombine] add vector tests with undef elts; NFC
Sun, Apr 22, 7:22 AM

Sat, Apr 21

spatel committed rL330516: [InstSimplify] move tests for shifts; NFC.
[InstSimplify] move tests for shifts; NFC
Sat, Apr 21, 10:01 AM
spatel committed rL330515: [InstSimplify] move/add/regenerate checks for tests; NFC.
[InstSimplify] move/add/regenerate checks for tests; NFC
Sat, Apr 21, 9:29 AM
spatel added inline comments to D45651: [X86] Add DAG combine to turn (trunc (srl (mul ext, ext), 16) into PMULHW/PMULHUW..
Sat, Apr 21, 7:23 AM
spatel accepted D45651: [X86] Add DAG combine to turn (trunc (srl (mul ext, ext), 16) into PMULHW/PMULHUW..

LGTM, but please do check in the tests with baseline output first (just in case the code change is reverted, we don't want to lose the tests).

Sat, Apr 21, 7:19 AM
spatel added a comment to D45563: [X86][AArch64][NFC] Add tests for masked merge unfolding.

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

But since we don't do anything (not fold, not unfold) with these patterns
with constant masks, is there any point in adding even more tests?

Sat, Apr 21, 7:07 AM
spatel added a comment to D45862: [InstCombine] Relax restriction in foldSelectInstWithICmp for sake of smaller code size.

I don't think we should be doing any of these select-of-constant transforms in instcombine.

It's worse for code analysis, more IR instructions, and may be detrimental to perf. Think about the cases where a conditional move executes at the same speed as a simple add (Ryzen?) or we have profile data for the compare, so branch prediction is perfect.

There's lots of code that does this kind of thing in the DAG, and that's where I think it belongs (using target hooks as needed). There was some discussion about this on llvm-dev here:
https://groups.google.com/forum/#!topic/llvm-dev/pid_thv2X-A

So I think we should be removing some of these transforms from instcombine rather than adding to them.

There is a second side to this though.

Even if all such "performance-degrading" transforms are removed from instcombine
(yes, i think the pass is rather huge and monolithic, this is a problem), it won't solve the problem.
The same 'bad' patterns surely could be produced via some other way.

I don't see how this is any different than lowering intrinsics.
Of course, they are no longer intrinsics, but IR, so optimizations may
break their canonical form, and backend will need to be adjusted
to recognize the IR patters (and potentially recognize the patterns
that did not originate form intrinsics, which is great).

So i'd say this really is backend's (DAGCombine?) problem.
This is just a question of whether or not there is interest to have a high quality codegen,
regardless of the input. It should recognize the 'bad' patterns, and if profitable,
transform to improve 'performance'. In this case, back to select.

And yes, absolutely, this is more complicated than just "stop dealing with it in instcombine" :)

Sat, Apr 21, 7:02 AM

Fri, Apr 20

spatel accepted D45563: [X86][AArch64][NFC] Add tests for masked merge unfolding.

LGTM. You could increase diversity in the constant mask tests by not using a single-string-of-set-bits constant (eg 0x0f0f0f0f instead of 0x0000ffff).

Fri, Apr 20, 1:37 PM
spatel added reviewers for D45862: [InstCombine] Relax restriction in foldSelectInstWithICmp for sake of smaller code size: majnemer, efriedma, hfinkel, sanjoy.

I don't think we should be doing any of these select-of-constant transforms in instcombine.

Fri, Apr 20, 10:50 AM
Herald added a reviewer for D45563: [X86][AArch64][NFC] Add tests for masked merge unfolding: javed.absar.
Fri, Apr 20, 10:24 AM
spatel committed rL330453: [utils] improve AArch64 asm parser.
[utils] improve AArch64 asm parser
Fri, Apr 20, 10:19 AM
spatel added a comment to D45653: Enable sibling-call optimization for functions returning structs.

$python2 update_llc_test_checks.py --llc-binary <directory-of-llc>/llc llvm/test/CodeGen/X86/sibcall.ll

and it modified the sibcall.ll. I'm not familiar with with the utility. I'm afraid I don't know how to use it properly and what is the meaning of the diff. I attached 2.patch. It is the changes this tool does to sibcall.ll. {F5984860}
Fri, Apr 20, 9:55 AM
spatel committed rL330445: [x86] auto-generate checks; NFC.
[x86] auto-generate checks; NFC
Fri, Apr 20, 9:50 AM
spatel added a comment to rL330437: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).

I don't think we have what you're describing (looking through clang source
because I don't think there's any actual documentation...).

Fri, Apr 20, 9:08 AM
spatel committed rL330437: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).
[DAGCombine] (float)((int) f) --> ftrunc (PR36617)
Fri, Apr 20, 8:11 AM
spatel closed D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).
Fri, Apr 20, 8:11 AM
spatel added a comment to D45858: [DivRemPairs] Fix non-determinism in use list order..

That looks ok, but I'm curious;

  1. How do you discover this problem?
  2. Is it possible to make the problem visible in a regression test (if so, please add a test)?
  3. The DenseMap usage is based on the code in BypassSlowDivision - does it have the same problem/fix?
Fri, Apr 20, 7:04 AM
Herald added a reviewer for D45710: More Fast Math Flag propagation: javed.absar.
Fri, Apr 20, 6:46 AM

Thu, Apr 19

spatel added inline comments to D45710: More Fast Math Flag propagation.
Thu, Apr 19, 3:56 PM
spatel added a comment to D45418: [SimplifyLibcalls] Atoi, strtol replacements.

Now you can merge this patch, @spatel

Thu, Apr 19, 3:53 PM
spatel created D45842: [Reassociate] swap binop operands to increase factoring potential.
Thu, Apr 19, 3:30 PM
spatel committed rL330368: [Reassociate] add baseline tests for binop swapping; NFC.
[Reassociate] add baseline tests for binop swapping; NFC
Thu, Apr 19, 2:59 PM
spatel committed rL330348: [Reassociate] fix formatting; NFC.
[Reassociate] fix formatting; NFC
Thu, Apr 19, 11:01 AM
spatel added a comment to D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).

I see that patches were applied to the Chromium sources:
https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c59

Thu, Apr 19, 8:37 AM
spatel accepted D45809: Add SPARC support to update_llc_test_checks.py.

LGTM.
I've never touched a sparc test myself, but this looks like a reasonable copy/paste. :)
Feel free to make any sparc regex changes needed to improve those CHECK lines.

Thu, Apr 19, 7:36 AM

Wed, Apr 18

spatel added inline comments to D45664: [InstCombine] Canonicalize variable mask in masked merge .
Wed, Apr 18, 10:41 AM
spatel added a comment to D45733: [DAGCombiner] Unfold scalar masked merge if profitable.

I don't think it will be possible to check that until after the instcombine part has landed, so ok, at least for now i will stop unfolding [constant mask] in dagcombine.

While there, any hint re pattern matchers for this code?

Wed, Apr 18, 9:43 AM
spatel updated subscribers of D45733: [DAGCombiner] Unfold scalar masked merge if profitable.

Yeah, that is the question, i'm having. I did look at mca output.

Here is what MCA says about that for -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75


Or is this a scheduling info problem?

Wed, Apr 18, 9:10 AM
spatel added a comment to D45655: [InstCombine][RFC] Canonicalize constant mask in masked merge mattern.

See my comment in D45733. Sorry that I didn't get a chance to review/discuss this sooner.

Wed, Apr 18, 8:09 AM
spatel added a comment to D45733: [DAGCombiner] Unfold scalar masked merge if profitable.

If the mask is constant, right now i always unfold it.

Wed, Apr 18, 7:51 AM
spatel committed rL330259: [SimplifyLibcalls] Realloc(null, N) -> Malloc(N).
[SimplifyLibcalls] Realloc(null, N) -> Malloc(N)
Wed, Apr 18, 7:25 AM
spatel closed D45413: [SimplifyLibcalls] Realloc(null, N) -> Malloc(N).
Wed, Apr 18, 7:24 AM
spatel added a reviewer for D45710: More Fast Math Flag propagation: wristow.
Wed, Apr 18, 7:07 AM
spatel added inline comments to D45710: More Fast Math Flag propagation.
Wed, Apr 18, 7:07 AM

Tue, Apr 17

spatel committed rL330237: [InstCombine] peek through bitcasted vector/array pointer GEP operand.
[InstCombine] peek through bitcasted vector/array pointer GEP operand
Tue, Apr 17, 5:40 PM
spatel closed D44833: [InstCombine] peek through bitcasted vector/array pointer GEP operand.
Tue, Apr 17, 5:40 PM
spatel added a comment to D45413: [SimplifyLibcalls] Realloc(null, N) -> Malloc(N).

The patch doesn't compile as-is. Please post a corrected patch.

Tue, Apr 17, 5:26 PM
spatel added reviewers for D45731: [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case: efriedma, kparzysz, lebedev.ri.

Thanks for coming back to this. I think this patch is fine as one more match of a potential IR variant, but...
Since the time we started the discussion in D41353, there was a proposal to improve matching for ctpop and ctlz in D45173. I'm curious what others think about the fact that we match bswap/bitreverse in regular instcombine. Would it be better to match all of those intrinsics in the same place? Do we distinguish these based on how often we expect them to occur?

Tue, Apr 17, 4:44 PM
spatel added inline comments to D45563: [X86][AArch64][NFC] Add tests for masked merge unfolding.
Tue, Apr 17, 4:27 PM
spatel added a comment to D45563: [X86][AArch64][NFC] Add tests for masked merge unfolding.

Not sure this answer is pertinent any longer, but on PPC we can generate the VSX version of the vector select (xxsel) for rather specific situations. See test/CodeGen/PowerPC/vselect-constants.ll and test/CodeGen/PowerPC/vsx.ll.

Tue, Apr 17, 10:31 AM
spatel added a comment to D45563: [X86][AArch64][NFC] Add tests for masked merge unfolding.

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

I'm working on that right now, got it working, maybe will update this + post the dagcombiner part (that is where i should have put it, right?) in a few hours.

Tue, Apr 17, 9:34 AM
spatel updated subscribers of D45563: [X86][AArch64][NFC] Add tests for masked merge unfolding.

Don't know what changes are planned here, but this is on the right track. We want to have coverage of the possible canonical IR variations for various targets.

Tue, Apr 17, 9:18 AM
spatel added a reviewer for D45343: [InstCombine] Always remove null check before free: qcolombet.

I don't understand what's going on here. Why is this patch marked 'Accepted'? In general, you shouldn't approve your own patches that you've posted for review.

Tue, Apr 17, 8:57 AM
spatel added reviewers for D45653: Enable sibling-call optimization for functions returning structs: qcolombet, Gerolf, escha, alexr.

Unfortunately when this if was added, no explanation was given why it is needed here. The corresponding test case is marked rdar://7726868, but I can not see what it is about.

Tue, Apr 17, 8:09 AM
spatel added reviewers for D45710: More Fast Math Flag propagation: jbhateja, hfinkel, escha, qcolombet, echristo.

Thanks for working on this. Adding some more potential reviewers...

Tue, Apr 17, 7:56 AM
spatel added inline comments to D45572: [X86] Replace action Promote with Custom for operation ISD::SINT_TO_FP.
Tue, Apr 17, 7:35 AM

Mon, Apr 16

spatel committed rL330137: [InstCombine] simplify code in SimplifyAssociativeOrCommutative; NFCI.
[InstCombine] simplify code in SimplifyAssociativeOrCommutative; NFCI
Mon, Apr 16, 10:18 AM
spatel committed rL330129: [InstCombine] simplify getBinOpsForFactorization(); NFC.
[InstCombine] simplify getBinOpsForFactorization(); NFC
Mon, Apr 16, 8:22 AM
spatel committed rL330126: [InstCombine] simplify fneg+fadd folds; NFC.
[InstCombine] simplify fneg+fadd folds; NFC
Mon, Apr 16, 7:17 AM
spatel committed rL330124: [InstCombine] fix formatting; NFC.
[InstCombine] fix formatting; NFC
Mon, Apr 16, 6:24 AM

Sun, Apr 15

spatel committed rL330098: [DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses.
[DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses
Sun, Apr 15, 9:49 AM
spatel closed D45598: [DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses.
Sun, Apr 15, 9:49 AM
spatel committed rL330097: [InstCombine] simplify more code for distributive property; NFCI.
[InstCombine] simplify more code for distributive property; NFCI
Sun, Apr 15, 9:24 AM
spatel accepted D45631: [InstCombine] Simplify 'xor'/'add' to 'or' if no common bits are set..

Note that for the value tracking improvement, there are a couple of other passes (that I've never looked at) that use haveNoCommonBitsSet(). If it's possible to show a test improvement from this change for those passes, that would be nice.

Sun, Apr 15, 9:23 AM
spatel committed rL330096: [InstCombine] simplify code for distributive property; NFCI.
[InstCombine] simplify code for distributive property; NFCI
Sun, Apr 15, 8:43 AM
spatel added inline comments to D45631: [InstCombine] Simplify 'xor'/'add' to 'or' if no common bits are set..
Sun, Apr 15, 7:54 AM

Sat, Apr 14

spatel added inline comments to D45453: [InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF.
Sat, Apr 14, 3:10 PM
spatel accepted D45453: [InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF.

LGTM - see inline for a couple of potential follow-ups.

Sat, Apr 14, 10:05 AM
spatel committed rL330086: [InstCombine] add shift+logic tests (PR37098); NFC.
[InstCombine] add shift+logic tests (PR37098); NFC
Sat, Apr 14, 6:44 AM
spatel added a comment to D45598: [DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses.

From the PPC perspective, this is fine since the sequences are equivalent. It also seems perfectly reasonable to look through a free operation here.

Sat, Apr 14, 6:30 AM

Fri, Apr 13

spatel added a comment to D44833: [InstCombine] peek through bitcasted vector/array pointer GEP operand.

Ping * 2.

Fri, Apr 13, 1:07 PM
spatel updated subscribers of D45453: [InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF.

I've updated the code-change in "InstCombineAddSub.cpp" to require both 'reassoc' and 'nsz' to do the transformations. (I haven't changed the name from FAddCombine to FAddSubCombine or ReassociateFAddFSub. I could do that if we move forward with this.)

There could be some kind of symbiotic behavior here, but this is really saying that we don't have a well-defined division of labor between these passes, right?

Yes, a lack of a well-defined division of labor between these passes is exactly my concern. I'll look into adding or modifying tests. Depending on possible interaction, that may not be fruitful. Ultimately, that should be addressed.

In looking this over, I was pleasantly surprised that I didn't run into the troublesome interaction I was concerned about. So I've added new tests to "InstCombine/fast-math.ll" to check for the transformations happening when 'reassoc' + 'nsz' are on (with only '-instcombine' used). I've also added tests that verify that 'reassoc' alone doesn't enable the transformation in the appropriate places. Lastly, I updated the tests I had previously modified to use both 'reassoc' and 'nsz' where appropriate when testing for the minimal set of needed flags to transform things.

Fri, Apr 13, 7:55 AM

Thu, Apr 12

spatel accepted D45539: [InstCombine]: foldSelectICmpAndAnd(): and is commutative.

LGTM.

Thu, Apr 12, 3:51 PM
spatel created D45598: [DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses.
Thu, Apr 12, 3:42 PM
spatel committed rL329963: [PowerPC] add fsub-fneg test; NFC.
[PowerPC] add fsub-fneg test; NFC
Thu, Apr 12, 3:20 PM
spatel committed rL329964: [DAGCombiner] simplify code; NFC.
[DAGCombiner] simplify code; NFC
Thu, Apr 12, 3:20 PM
spatel added a comment to D45539: [InstCombine]: foldSelectICmpAndAnd(): and is commutative.

Side note: this patch / commit should not be labeled 'NFC'.

Thu, Apr 12, 1:40 PM
spatel added a comment to D45539: [InstCombine]: foldSelectICmpAndAnd(): and is commutative.

I think the patch is correct, but it's hard to keep all of the values straight (because there are a lot!). What do you think about rearranging it a bit like this and using m_c_And():

Thu, Apr 12, 11:47 AM
spatel added a comment to D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).

After this change, is there some way to express this approach in C?

You can use platform-specific intrinsics, e.g. _mm_cvtsd_si32 is defined to return INT_MIN for out-of-range values. Or, in portable C, you can do a range check: if (x <= INT_MAX && x >= INT_MIN && (double)(int)x == x).

Thu, Apr 12, 9:00 AM
spatel added a comment to D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).

We think there's an UB check for this already, but v8 isn't UB clean.

The fast way to do the "is this double an int" check on an assembly level is to convert to int and back and then compare if the roundtripped value is equal. After this change, is there some way to express this approach in C?

Thu, Apr 12, 8:36 AM
spatel reopened D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).

Reopening - reverted at rL329920.

Thu, Apr 12, 8:35 AM
spatel committed rL329920: revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617).
revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
Thu, Apr 12, 8:30 AM
spatel added a comment to D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).

This broke some large integration test on arm64 in Chromium (https://crbug.com/831145 – doesn't yet have any information other than "something is broken" and that we bisected it to this revision). It sounds like the patch got rewritten during review; does "formally allowed by the standard, but extremely likely to break things and surprise people" still apply? It sounds like this is supposed to be behavior-preserving for values that aren't NaN and Inf, but it does change behavior for those two? (If so, in what way?)

The patch started off as an IR transform, and the quoted comment applied only to cases (I hope) where the target doesn’t have native support for an fptrunc instruction. NaN/inf inputs are undefined with these ops, so if something was relying on that undefined output, that could be the source of the problem.

I’m not an arm64 expert, but the tests show that we should codegen ‘frintz’ instructions with this patch. So that would be the place to look for diffs.

We found it: https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c48

It's exactly the "formally allowed by the standard, but extremely likely to break things and surprise people" situation :-) The question is just what the appropriate fix is...

Thu, Apr 12, 7:40 AM

Wed, Apr 11

spatel added a comment to D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617).

This broke some large integration test on arm64 in Chromium (https://crbug.com/831145 – doesn't yet have any information other than "something is broken" and that we bisected it to this revision). It sounds like the patch got rewritten during review; does "formally allowed by the standard, but extremely likely to break things and surprise people" still apply? It sounds like this is supposed to be behavior-preserving for values that aren't NaN and Inf, but it does change behavior for those two? (If so, in what way?)

Wed, Apr 11, 7:12 PM
spatel accepted D45538: [InstCombine][NFC]: Add tests: foldSelectICmpAndAnd(): and is commutative.

LGTM. I think you have a good understanding of how to write/check instcombine tests, so feel free to add those as needed without pre-commit review unless you're not sure about something.

Wed, Apr 11, 4:02 PM
spatel added a comment to D45378: [InstCombine] Propagate null values from conditions to other basic blocks.

This is pushing instcombine beyond local simplifications. Propagating values across blocks should probably be handled in CVP (-correlated-propagation).

Maybe, but here it also fits, every info required for this transformation is available here.

Wed, Apr 11, 9:19 AM
spatel committed rL329821: [InstCombine] limit X - (cast(-Y) --> X + cast(Y) with hasOneUse().
[InstCombine] limit X - (cast(-Y) --> X + cast(Y) with hasOneUse()
Wed, Apr 11, 9:00 AM
spatel added a comment to D45453: [InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF.

A couple of short comments/questions, and one longer one:

... the remaining big blob is FAddCombine() - which is poorly named because as shown in this patch, it's also called from visitFSub()...

Should we rename it to FAddSubCombine() in this work (assuming this patch moves forward, given the other questions here)?

Wed, Apr 11, 7:31 AM

Tue, Apr 10

spatel committed rL329755: [CVP] simplify phi with constant incoming values that match common variable….
[CVP] simplify phi with constant incoming values that match common variable…
Tue, Apr 10, 1:45 PM
spatel closed D45448: [CVP] simplify phi with constant incoming values that match common variable edge values .
Tue, Apr 10, 1:45 PM
spatel added reviewers for D45022: [X86] Mark all byval parameters as aliased: andreadb, wristow.
Tue, Apr 10, 11:48 AM
spatel committed rL329736: [InstSimplify] fix formatting; NFC.
[InstSimplify] fix formatting; NFC
Tue, Apr 10, 11:41 AM
spatel updated the diff for D45448: [CVP] simplify phi with constant incoming values that match common variable edge values .

Patch updated:
Fixed the call to dominates() to use the phi's block, not the phi itself.

Tue, Apr 10, 11:22 AM
spatel committed rL329732: [llvm-mca] reorder text.
[llvm-mca] reorder text
Tue, Apr 10, 11:13 AM
spatel committed rL329729: [llvm-mca] fix formatting.
[llvm-mca] fix formatting
Tue, Apr 10, 10:59 AM
spatel committed rL329726: [llvm-mca] add example workflow for source code.
[llvm-mca] add example workflow for source code
Tue, Apr 10, 10:55 AM
spatel added a comment to D45453: [InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF.

The tests here demonstrate the overlap between instcombine and reassociate, and based on feedback I've gotten, I'd say that rL170471 was a mistake for instcombine. It created a mini-reassociation pass within instcombine.

Tue, Apr 10, 9:26 AM
spatel added a comment to D45393: [InstCombine] Scalarize binary ops following shuffles..

On the other hand, this transformation can be implemented in a cleaner way with 1) (inside InstCombine) under the same consideration(target checks) with a simple extension. 1) also allows us to reuse the existing implementation. Another benefit of 1) is it will contain all the relevant scalarization code in one place. The only con is it requires additional support for certain targets; X86 at least and may be more.

Tue, Apr 10, 8:58 AM
spatel accepted D45317: Eliminate a bitwise 'not' op of 'not' min/max by inverting the min/max..

LGTM - see inline for a nit.

Tue, Apr 10, 8:36 AM
spatel updated the diff for D45448: [CVP] simplify phi with constant incoming values that match common variable edge values .

Patch updated:

  1. Move the dominance check, so it only happens once. This matches the logic in InstSimplify that handles the simpler case of a phi where all incoming values are identical except for undefs.
  2. Added a debug statistic to count how often this fires.
Tue, Apr 10, 7:44 AM

Mon, Apr 9

spatel added a comment to D45448: [CVP] simplify phi with constant incoming values that match common variable edge values .

Any idea how often this triggers in practice?

Mon, Apr 9, 5:26 PM
spatel accepted D45421: [X86] Emit native IR for pmuldq/pmuludq builtins..

LGTM (assuming the backend gets this right in all cases and we have tests for that).

Mon, Apr 9, 11:38 AM
spatel created D45448: [CVP] simplify phi with constant incoming values that match common variable edge values .
Mon, Apr 9, 11:26 AM
spatel added a comment to D45421: [X86] Emit native IR for pmuldq/pmuludq builtins..

Yes. I'll make the llvm changes before committing this. Just wanted to make sure this direction was ok first.

Mon, Apr 9, 10:24 AM