Today

lebedev.ri updated the diff for D104597: [SimplifyCFG] Tail-merging all blocks with `ret` terminator.

Autogenerate more, but still not all, affected codegen tests.

Sun, Jun 20, 4:19 AM · Restricted Project
lebedev.ri committed rGe497b12a6960: [NFC][AArch64][ARM][Thumb][Hexagon] Autogenerate some tests (authored by lebedev.ri).
Sun, Jun 20, 4:13 AM
lebedev.ri committed rGb1f55c33d435: [UpdateTestUtils] Print test filename when complaining about conflicting prefix (authored by lebedev.ri).
Sun, Jun 20, 4:13 AM
lebedev.ri updated the summary of D104598: [NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging.
Sun, Jun 20, 2:47 AM · Restricted Project
lebedev.ri updated the diff for D104598: [NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging.

Split off the address-taken fixes.

Sun, Jun 20, 2:46 AM · Restricted Project
lebedev.ri committed rGc5b7335dc8eb: [SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has it's… (authored by lebedev.ri).
Sun, Jun 20, 2:38 AM
lebedev.ri committed rGad87761925c2: [SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has it's… (authored by lebedev.ri).
Sun, Jun 20, 2:23 AM

Yesterday

lebedev.ri updated the summary of D104597: [SimplifyCFG] Tail-merging all blocks with `ret` terminator.
Sat, Jun 19, 2:28 PM · Restricted Project
lebedev.ri planned changes to D104445: [SimplifyCFGPass] Tail-merging function-terminating blocks.

That being said, let's try this in smaller steps, with D104598/D104597 being the first one, handling only `ret`.

Sat, Jun 19, 2:28 PM · Restricted Project
lebedev.ri updated the diff for D104597: [SimplifyCFG] Tail-merging all blocks with `ret` terminator.

Split off the mostly-NFC refactor into a separate change.

Sat, Jun 19, 2:26 PM · Restricted Project
lebedev.ri requested review of D104598: [NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging.
Sat, Jun 19, 2:17 PM · Restricted Project
lebedev.ri added inline comments to D94395: [X86] AMD Zen 3 Scheduler Model.
Sat, Jun 19, 12:07 PM · Restricted Project
lebedev.ri committed rG834aafa55bd1: [NFC] AMD Zen 3: fix typo in a comment (authored by lebedev.ri).
Sat, Jun 19, 12:05 PM
lebedev.ri requested review of D104597: [SimplifyCFG] Tail-merging all blocks with `ret` terminator.
Sat, Jun 19, 11:57 AM · Restricted Project
lebedev.ri accepted D104585: [NFC] Add getUnderlyingObjects test.

Thank you for adding the test.
I think you can freely commit this now.

Sat, Jun 19, 3:01 AM · Restricted Project
lebedev.ri added a reviewer for D86669: [ValueTracking] Remove MaxLookup from getUnderlyingObjects: fhahn.

Oh, wait, so we even need this for correctness?
That doesn't look good on LAA's side.

Sat, Jun 19, 3:00 AM · Restricted Project

Fri, Jun 18

lebedev.ri added a comment to D104445: [SimplifyCFGPass] Tail-merging function-terminating blocks.

I'm asking because I would like to see this change happen, and I don't believe it to be a bad change in general,
but the comment so far has seemed rather dismissive, and it looks like this might end up following in the footsteps
of rG13ec913, where a single platform penalizes/dictates behavior for all other platforms.

Fri, Jun 18, 4:17 PM · Restricted Project
lebedev.ri accepted D104567: [InstCombine] Don't transform code if DoTransform is false.

LG
I think that function is misdesigned, but I can't suggest an easy, good alternative.

Fri, Jun 18, 1:04 PM · Restricted Project
lebedev.ri added a comment to D103959: [LoopDeletion] Handle Phis with similar inputs from different block.

This seems fine to me.
I think there may be some room for undef/poison incoming value handling.
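One possible reading of the undef/poison suggestion, as a minimal sketch: when checking whether every exiting block feeds the same value into an exit-block PHI, undef and poison incoming values could be treated as wildcards, since replacing them with a concrete value is a refinement. The string-based representation and `commonIncomingValue()` are hypothetical and not the LoopDeletion code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy model (hypothetical): each incoming value of an exit-block PHI is
// spelled as a string, with "undef" and "poison" as the special constants.
// Returns the one value the PHI receives from every exiting block, or
// std::nullopt if two different concrete values meet.
std::optional<std::string>
commonIncomingValue(const std::vector<std::string> &Incoming) {
  assert(!Incoming.empty() && "a PHI has at least one incoming value");
  std::optional<std::string> Common;
  for (const std::string &V : Incoming) {
    // undef/poison may be refined to any value, so they match anything.
    if (V == "undef" || V == "poison")
      continue;
    if (Common && *Common != V)
      return std::nullopt;
    Common = V;
  }
  if (Common)
    return Common;
  // Only undef/poison were seen. Poison may be refined to undef but not the
  // other way around, so prefer undef when it is present.
  for (const std::string &V : Incoming)
    if (V == "undef")
      return V;
  return std::string("poison");
}
```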

Fri, Jun 18, 12:10 PM · Restricted Project
lebedev.ri added a comment to D104500: [clang] Apply P1825 as Defect Report from C++11 up to C++20..

The patch is missing a description.

Fri, Jun 18, 11:02 AM · Restricted Project
lebedev.ri added a comment to D104445: [SimplifyCFGPass] Tail-merging function-terminating blocks.

Thank you for taking a look.
Thinking about it, this change should be split up into an NFC refactor,
the removal of the "block must be empty" check,
and several patches, each enabling one terminator opcode.
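The end state of this series (per the titles, tail-merging every block with a `ret` terminator, not only empty ones) can be illustrated on a toy CFG model: each `ret` block keeps its body, its `ret` becomes a branch to one shared return block, and a PHI in that block merges the returned values. This is a minimal sketch, not SimplifyCFGPass code; the `Block` struct, `mergeReturnBlocks()`, the `common.ret` block name and the i32 return type are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy CFG model (hypothetical, not LLVM's API).
struct Block {
  std::string Name;
  std::vector<std::string> Body;  // non-terminator instructions, kept as-is
  bool IsRet = false;             // terminator is `ret <RetVal>`
  std::string RetVal;             // returned value when IsRet is set
  std::string BrTarget;           // target of `br label %BrTarget` otherwise
};

// Turns every `ret V` terminator into `br label %common.ret` and appends a
// shared return block whose PHI merges the returned values. Assumes an
// i32-returning function. Returns the PHI's (value, predecessor) pairs.
std::vector<std::pair<std::string, std::string>>
mergeReturnBlocks(std::vector<Block> &F) {
  unsigned NumRets = 0;
  for (const Block &B : F)
    NumRets += B.IsRet ? 1 : 0;
  if (NumRets < 2)
    return {};  // nothing to merge

  std::vector<std::pair<std::string, std::string>> Incoming;
  for (Block &B : F) {
    if (!B.IsRet)
      continue;
    assert(!B.Name.empty() && "PHI incoming blocks need names");
    Incoming.emplace_back(B.RetVal, B.Name);
    // The body is untouched (no "block must be empty" check); only the
    // terminator changes.
    B.IsRet = false;
    B.RetVal.clear();
    B.BrTarget = "common.ret";
  }

  std::string Phi = "%common.ret.op = phi i32";
  for (std::size_t I = 0; I < Incoming.size(); ++I)
    Phi += (I ? ", [ " : " [ ") + Incoming[I].first + ", %" +
           Incoming[I].second + " ]";

  Block Common;
  Common.Name = "common.ret";
  Common.Body.push_back(Phi);
  Common.IsRet = true;
  Common.RetVal = "%common.ret.op";
  F.push_back(Common);
  return Incoming;
}
```

For two returning blocks `a` (returning `%x`) and `b` (returning `0`), the new block's PHI reads `%common.ret.op = phi i32 [ %x, %a ], [ 0, %b ]`.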

Fri, Jun 18, 11:01 AM · Restricted Project
lebedev.ri added a comment to D104547: [langref] attempt to clarify semantics of inttoptr/ptrtoint for non-integral types.

Thank you.

Fri, Jun 18, 9:38 AM · Restricted Project
lebedev.ri added reviewers for D104445: [SimplifyCFGPass] Tail-merging function-terminating blocks: rnk, hans.

Oh, I forgot to CC the people who tried this last :D
@rnk @hans - ping. This is working toward the very same end goal as your D29428, but in a more generic way.

Fri, Jun 18, 9:09 AM · Restricted Project
lebedev.ri added a comment to D104496: [GlobalDCE] Support of conditionally used global variables.

Tests are missing.

Fri, Jun 18, 5:08 AM · Restricted Project
lebedev.ri retitled D104496: [GlobalDCE] Support of conditionally used global variables from Support of conditionally used global variables to [GlobalDCE] Support of conditionally used global variables.
Fri, Jun 18, 5:08 AM · Restricted Project

Thu, Jun 17

lebedev.ri committed rG84eeb82888a0: [NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining… (authored by lebedev.ri).
Thu, Jun 17, 2:46 PM
lebedev.ri committed rG69caacc626f7: [X86] AMD Zen 3: don't confuse shift and shuffle, NFC (authored by lebedev.ri).
Thu, Jun 17, 11:08 AM
lebedev.ri committed rG37dfc467ac80: [NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg (authored by lebedev.ri).
Thu, Jun 17, 11:08 AM
lebedev.ri added a comment to D104472: [ValueTracking] look through bitcast of vector in computeKnownBits.

Would it be possible to either improve the comments, or port more comments from SelectionDAG::computeKnownBits()?
It appears to match that original implementation and seems correct, but it is hard to read through.
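For anyone reading along, the core rule for a vector of narrow elements bitcast to a wider integer is that each element's known bits land at that element's bit offset in the result. A minimal sketch, assuming a little-endian target and a result of at most 64 bits (on big-endian targets element 0 ends up in the most significant bits instead). `Known` and `knownBitsOfBitcast()` are hypothetical simplifications, not LLVM's `KnownBits` API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified known-bits pair: bits set in Zero are known to
// be 0, bits set in One are known to be 1, all other bits are unknown.
struct Known {
  uint64_t Zero = 0;
  uint64_t One = 0;
};

// Known bits of `bitcast <N x iEltBits> V to i(N * EltBits)`, given the
// known bits of each element, assuming a little-endian layout in which
// element I occupies bits [I * EltBits, (I + 1) * EltBits) of the result.
Known knownBitsOfBitcast(const std::vector<Known> &Elts, unsigned EltBits) {
  assert(EltBits > 0 && Elts.size() * EltBits <= 64 &&
         "this sketch only models results of at most 64 bits");
  const uint64_t EltMask =
      EltBits == 64 ? ~uint64_t(0) : (uint64_t(1) << EltBits) - 1;
  Known Result;
  for (unsigned I = 0; I < Elts.size(); ++I) {
    const unsigned Shift = I * EltBits;
    Result.Zero |= (Elts[I].Zero & EltMask) << Shift;
    Result.One |= (Elts[I].One & EltMask) << Shift;
  }
  return Result;
}
```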

Thu, Jun 17, 10:26 AM · Restricted Project
lebedev.ri requested review of D104445: [SimplifyCFGPass] Tail-merging function-terminating blocks.
Thu, Jun 17, 3:07 AM · Restricted Project

Wed, Jun 16

lebedev.ri added a comment to D86669: [ValueTracking] Remove MaxLookup from getUnderlyingObjects.

I think this would benefit from adding test case[s] that showcase the improvements here, for the pass that uses this utility.

Wed, Jun 16, 2:45 PM · Restricted Project
lebedev.ri added a comment to D104403: [SCEV] Avoid pointer subtraction of non-integral pointers [WIP].

This is only going to work if getMinusSCEV() is called directly though.

Wed, Jun 16, 11:08 AM · Restricted Project
lebedev.ri added a comment to D104322: [SCEV] PtrToInt on non-integral pointers is allowed.

As a drive-by note, it would be great if you could expand LangRef on non-integral pointers a bit. It made sense to me when it specified that you can't use ptrtoint on a non-integral pointer, but without that limitation, it's not really clear to me what the actual difference between a non-integral pointer and a normal one is. What transforms are you not allowed to perform on a non-integral pointer that you can perform on a normal one?

Yes, I would also like to see such documentation, especially if non-integral pointers
are going to be used as an "arbitrary" roadblock for SCEV changes. I would have posted
that in the review for the commit mentioned, but there was none, which also highlights
the problem around the status of non-integral pointers in LLVM :)

On the documentation side, I'd love to, but I'm honestly not sure *how* to. There are two interrelated problems here. The first is that the semantics of inttoptr for non-integral pointers are highly dependent on the target. At the moment, it can basically only be used in any practical way to implement non-inlined built-in routines. The second issue is that our definition of the integral pointer types themselves appears to be in flux and is very vague about certain key details. As a result, I'm unsure how to specify them formally, which is why I used the implementation-defined wording I did.

On the SCEV side, I understand the frustration, but I think you're also mischaracterizing slightly. SCEV has long had the notion of subtracting two pointers, which has been "questionable" the whole time, as the semantics of subtracting two unrelated pointers are unclear from the underlying IR. (IR doesn't have pointer subtraction, but it does have icmp, which is more or less the same.) Eli's recent changes (which are making progress, by the way, even if slowly) are the first I've seen to really raise the question of whether subtraction is a primitive that should be representable for pointers in SCEV. (I'll also add that the confusion around whether we could still have sizeless pointers, which I admittedly contributed to, has only recently been cleared up.)

In terms of forward progress, I am willing to accept crippling SCEV for NI pointers provided that all the changes are otherwise well structured and make sense. I'm not thrilled, and I will be a skeptical reviewer, but I won't block changes which are well justified.

Glad that we have established this.

Wed, Jun 16, 9:12 AM · Restricted Project
lebedev.ri added a comment to D104322: [SCEV] PtrToInt on non-integral pointers is allowed.

Wed, Jun 16, 8:29 AM · Restricted Project
lebedev.ri added a comment to D104322: [SCEV] PtrToInt on non-integral pointers is allowed.

Roughly, we can either end up with SCEV effectively being inoperable for non-integral pointers,
or with the users of non-integral pointers having issues lowering the ptrtoints produced by SCEV.
I don't know which is worse, and I'm presently not sure I care, but I do believe that we cannot
leave SCEV's current casualness about treating pointers as integers as-is.
I actually had some of the patches @efriedma posted locally, and I agree with them.

Wed, Jun 16, 8:27 AM · Restricted Project
lebedev.ri requested changes to D104363: [llvm] Mark more internal command line optins as cl::Hidden.

https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-mca/llvm-mca.cpp#L310

Wed, Jun 16, 4:56 AM · Restricted Project
lebedev.ri added a comment to D104363: [llvm] Mark more internal command line optins as cl::Hidden.

I don't believe this is correct.
This is either not a bug, or it should be dealt with by adding option grouping and hiding non-tool cl::opts.

Wed, Jun 16, 1:36 AM · Restricted Project
lebedev.ri resigned from D100486: [COST]Improve cost model for shuffles in SLP..

No, I didn't.
I think that while this may be somewhat correct,
it is not really correct. For example, before AVX,
there are no sub-32-bit shuffles, only unpacks.

Wed, Jun 16, 12:55 AM · Restricted Project
lebedev.ri added a comment to D104232: [WIP][DAGCombiner] createBuildVecShuffle(): more vector concatenation.

@RKSimon there are two changes here:

  1. InVT2Size * 2 == VTSize && InVT1Size == VTSize -> (VTSize % InVT2Size == 0) && InVT1Size == VTSize
  2. merging two blocks together.
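The first change relaxes a size check on the two shuffle inputs. A minimal sketch of the old and new predicates on element counts (the function names are hypothetical, not DAGCombiner code): the old form accepted a second input only when it was exactly half as wide as the result, while the new form accepts any second input whose width evenly divides the result width, presumably so that it can be widened by concatenating it with enough undef vectors.

```cpp
#include <cassert>

// Old condition: the second input must be exactly half the result width.
bool canWidenSecondInputOld(unsigned VTSize, unsigned InVT1Size,
                            unsigned InVT2Size) {
  return InVT2Size * 2 == VTSize && InVT1Size == VTSize;
}

// New condition: any second input whose width evenly divides the result.
bool canWidenSecondInputNew(unsigned VTSize, unsigned InVT1Size,
                            unsigned InVT2Size) {
  assert(InVT2Size != 0 && "vector types have at least one element");
  return VTSize % InVT2Size == 0 && InVT1Size == VTSize;
}

// Number of CONCAT_VECTORS operands needed to widen the second input:
// the input itself, followed by (VTSize / InVT2Size - 1) undef vectors.
unsigned numConcatOperands(unsigned VTSize, unsigned InVT2Size) {
  assert(InVT2Size != 0 && VTSize % InVT2Size == 0);
  return VTSize / InVT2Size;
}
```

For example, an 8-element result with a 2-element second input was rejected before and now needs 4 concatenation operands.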
Wed, Jun 16, 12:30 AM · Restricted Project
lebedev.ri committed rG308f6a5245a2: [NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half)… (authored by lebedev.ri).
Wed, Jun 16, 12:25 AM
lebedev.ri committed rGa3113df21994: [SCEV] PtrToInt on non-integral pointers is allowed (authored by lebedev.ri).
Wed, Jun 16, 12:25 AM
lebedev.ri closed D103818: [NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half) element type.
Wed, Jun 16, 12:25 AM · Restricted Project
lebedev.ri closed D104322: [SCEV] PtrToInt on non-integral pointers is allowed.
Wed, Jun 16, 12:25 AM · Restricted Project

Tue, Jun 15

lebedev.ri requested review of D104322: [SCEV] PtrToInt on non-integral pointers is allowed.
Tue, Jun 15, 2:29 PM · Restricted Project
lebedev.ri added a comment to D101074: [X86] Canonicalize SGT/UGT compares with constants to use SGE/UGE to reduce the number of EFLAGs reads. (PR48760).

Add SETLT/SETULT handling

Tue, Jun 15, 1:37 PM · Restricted Project
lebedev.ri updated the summary of D101074: [X86] Canonicalize SGT/UGT compares with constants to use SGE/UGE to reduce the number of EFLAGs reads. (PR48760).
Tue, Jun 15, 1:10 PM · Restricted Project
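The canonicalization in D101074's title relies on `X > C` being equivalent to `X >= C + 1` whenever C is not the maximum value of its type. On X86, SETG reads ZF, SF and OF while SETGE reads only SF and OF, and SETA reads CF and ZF while SETAE reads only CF, which is where the reduction in EFLAGS reads comes from. A minimal sketch of the constant adjustment (the helper names are hypothetical, not the X86 lowering code):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Rewrites `X s> C` as `X s>= C + 1`. Fails for C == INT32_MAX, where
// `X s> C` is always false and C + 1 would overflow.
bool sgtToSge(int32_t C, int32_t &NewC) {
  if (C == std::numeric_limits<int32_t>::max())
    return false;
  NewC = C + 1;
  return true;
}

// Rewrites `X u> C` as `X u>= C + 1`. Fails for C == UINT32_MAX.
bool ugtToUge(uint32_t C, uint32_t &NewC) {
  if (C == std::numeric_limits<uint32_t>::max())
    return false;
  NewC = C + 1;
  return true;
}
```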
lebedev.ri added a comment to D95789: [SpeculateAroundPHIs] Avoid speculation on loop back edges.

The pass has now been reverted by rGe52364532afb2748c324f360bc1cc12605d314f3.
Abandon this?

Tue, Jun 15, 1:08 PM · Restricted Project
lebedev.ri added a comment to D37467: Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable..

FYI this has now been reverted by rGe52364532afb2748c324f360bc1cc12605d314f3.

Tue, Jun 15, 1:08 PM · Restricted Project
lebedev.ri added a comment to D102107: [OpenMP] Codegen aggregate for outlined function captures.

(This is not offload-specific, right?)
Does this bring any compatibility issues?

Tue, Jun 15, 10:47 AM · Restricted Project, Restricted Project
lebedev.ri committed rGe52364532afb: [NewPM] Remove SpeculateAroundPHIs pass (authored by lebedev.ri).
Tue, Jun 15, 10:36 AM
lebedev.ri closed D104099: [NewPM] Remove SpeculateAroundPHIs pass.
Tue, Jun 15, 10:36 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D104099: [NewPM] Remove SpeculateAroundPHIs pass.

I see. Going to land this now. Thanks for looking!

Tue, Jun 15, 10:32 AM · Restricted Project, Restricted Project
lebedev.ri added a reviewer for D104099: [NewPM] Remove SpeculateAroundPHIs pass: MatzeB.
Tue, Jun 15, 10:28 AM · Restricted Project, Restricted Project
lebedev.ri updated the diff for D104099: [NewPM] Remove SpeculateAroundPHIs pass.

On Tue, Jun 15, 2021 at 8:14 PM Wei Mi <wmi@google.com> wrote:

On Mon, Jun 14, 2021 at 4:52 PM Wei Mi <wmi@google.com> wrote:

On Mon, Jun 14, 2021 at 4:04 PM Xinliang David Li <davidxl@google.com> wrote:

On Mon, Jun 14, 2021 at 3:59 PM Roman Lebedev via Phabricator <reviews@reviews.llvm.org> wrote:

lebedev.ri added a subscriber: MaskRay.
lebedev.ri added a comment.

In D104099#2815531 https://reviews.llvm.org/D104099#2815531, @wenlei wrote:

In D104099#2814167 https://reviews.llvm.org/D104099#2814167, @davidxl wrote:

Adding Wei to help measure performance impact on our internal workloads. Also add Wenlei to help measure impact with FB's workloads.

Measured perf using FB internal workload w/ and w/o this pass, result is neutral.

Thank you for checking!

So far, it seems the reaction to this proposal has been overwhelmingly positive.
Does anyone else wish to chime in? Should I land this? @asbirlea @MaskRay?

Wei is doing more measurement @google. Please wait for the response.

David

Start doing the test. Will report back.

Wei.

No performance change found in google internal benchmarks.

Wei.

Thanks for checking.
So, so far no one can point out why this pass is beneficial? :]

Tue, Jun 15, 10:28 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

So I'm not sure the rest can be dealt with by SimplifyMultipleUseDemandedBits().

What if the ZERO_EXTEND_VECTOR_INREG case is extended to just bitcast the source if we only need the 0th element and the upper source elements aliasing that 0th element are known to be zero?
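The condition in that suggestion can be checked on a scalar model of one 128-bit register, assuming little-endian lane order (as on X86). Lane 0 of a v4i32 to v2i64 `zero_extend_vector_inreg` is element 0 zero-extended, while lane 0 of a plain bitcast to v2i64 is element 0 in the low half and element 1 in the high half; the two agree exactly when element 1, the upper element aliasing lane 0, is zero. All names below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 128-bit register viewed as v4i32; index 0 is the lowest lane.
using V4i32 = std::array<uint32_t, 4>;

// Lane 0 of (v2i64 zero_extend_vector_inreg (v4i32 Src)).
uint64_t zextInRegLane0(const V4i32 &Src) { return Src[0]; }

// Lane 0 of (v2i64 bitcast (v4i32 Src)) on a little-endian target:
// source element 1 supplies the upper 32 bits.
uint64_t bitcastLane0(const V4i32 &Src) {
  return uint64_t(Src[0]) | (uint64_t(Src[1]) << 32);
}

// The proposed fold: when only lane 0 is demanded, use the bitcast instead.
// Only valid if the aliased upper element is known to be zero.
uint64_t foldedLane0(const V4i32 &Src) {
  assert(Src[1] == 0 && "upper element aliasing lane 0 must be known zero");
  return bitcastLane0(Src);
}
```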

Tue, Jun 15, 7:37 AM · Restricted Project
lebedev.ri added a comment to D104193: [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X).

Time to ask for one then?

Tue, Jun 15, 2:58 AM · Restricted Project
lebedev.ri added a comment to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Many remaining cases are rotates, with the pattern like:

Combining: t0: ch = EntryToken
Optimized vector-legalized selection DAG: %bb.0 'splatvar_funnnel_v8i32:'
SelectionDAG has 26 nodes:
  t0: ch = EntryToken
  t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %0
          t25: v2i64 = zero_extend_vector_inreg t45
        t26: v4i32 = bitcast t25
      t27: v8i32 = X86ISD::VSHL t2, t26
              t40: v4i32 = BUILD_VECTOR Constant:i32<32>, Constant:i32<32>, Constant:i32<32>, Constant:i32<32>
            t38: v4i32 = sub t40, t45
          t30: v2i64 = zero_extend_vector_inreg t38
        t31: v4i32 = bitcast t30
      t32: v8i32 = X86ISD::VSRL t2, t31
    t21: v8i32 = or t27, t32
  t10: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t21
        t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
      t6: v8i32 = vector_shuffle<0,0,0,0,0,0,0,0> t4, undef:v8i32
    t43: v4i32 = extract_subvector t6, Constant:i64<0>
    t47: v4i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
  t45: v4i32 = and t43, t47
  t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:v8i32 $ymm0, t10:1

Let's suppose we start at t27: v8i32 = X86ISD::VSHL t2, t26, and demand the 0th element of the shift amount t26.
I think the problem is that t45 has another use, t38, which feeds the other shift.
The same happens if we start from the other shift.
I'm not sure whether this can be solved within the existing demanded-elements infrastructure.

It looks like SimplifyMultipleUseDemandedBits() could theoretically deal with it,
but then it would need to learn not only about scalar_to_vector-of-extract_vector_elt;
the `and` and `vector_shuffle` nodes would also need to be handled so that it can recurse through them.
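To see why both shifts consume t45, here is a scalar model of one lane of the DAG above (a sketch; the helper names are hypothetical). Both shifts take t2 as input, so the `or` computes a rotate-left by `amt & 31`. The shift amount sits in an XMM register whose low 64 bits (element 0, zero-extended) act as the count for every lane, which is why only element 0 of t26/t31 is demanded. Unlike C++ shifts, X86 vector shifts by a count of 32 or more produce zero, which keeps the `32 - (amt & 31)` right shift correct when the masked amount is zero.

```cpp
#include <cassert>
#include <cstdint>

// One lane of X86ISD::VSHL / X86ISD::VSRL on i32 elements. The count comes
// from the low 64 bits of the shift-amount register, and unlike C++ shifts
// a count of 32 or more yields 0 instead of being undefined.
uint32_t vshlLane(uint32_t X, uint64_t Count) { return Count >= 32 ? 0 : X << Count; }
uint32_t vsrlLane(uint32_t X, uint64_t Count) { return Count >= 32 ? 0 : X >> Count; }

// One lane of the DAG above, with t2 = X and the splatted amount = Amt:
//   t45 = Amt & 31, t38 = 32 - t45, t21 = (X VSHL t45) | (X VSRL t38)
uint32_t rotlLaneAsInDAG(uint32_t X, uint32_t Amt) {
  uint32_t T45 = Amt & 31;
  uint32_t T38 = 32 - T45;  // 32 when T45 == 0, so the VSRL produces 0
  // zero_extend_vector_inreg + bitcast: the count is element 0, zero-extended.
  return vshlLane(X, uint64_t(T45)) | vsrlLane(X, uint64_t(T38));
}

// Reference rotate-left, written so it never shifts by 32.
uint32_t rotlRef(uint32_t X, uint32_t Amt) {
  uint32_t N = Amt & 31;
  return N == 0 ? X : (X << N) | (X >> (32 - N));
}
```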

Tue, Jun 15, 2:57 AM · Restricted Project
lebedev.ri accepted D104193: [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X).

Seems fine to me. Please precommit the tests before committing this.
Thanks.

Tue, Jun 15, 2:17 AM · Restricted Project
lebedev.ri added a comment to D101868: [clang-format] Adds a formatter for aligning arrays of structs.

The test is still intermittently failing on bots.
Is there some non-determinism in the change?
It may be a good idea to revert this.

Tue, Jun 15, 2:14 AM · Restricted Project, Restricted Project
lebedev.ri committed rG88da6c1ead3f: [X86] Schedule-model second (mask) output of GATHER instruction (authored by lebedev.ri).
Tue, Jun 15, 2:05 AM
lebedev.ri closed D104205: [X86] Schedule-model second (mask) output of GATHER instruction.
Tue, Jun 15, 2:04 AM · Restricted Project
lebedev.ri added a comment to D104205: [X86] Schedule-model second (mask) output of GATHER instruction.

@RKSimon thank you for the review!

Tue, Jun 15, 1:49 AM · Restricted Project

Mon, Jun 14

lebedev.ri updated subscribers of D104099: [NewPM] Remove SpeculateAroundPHIs pass.

Mon, Jun 14, 3:59 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Mon, Jun 14, 3:44 PM · Restricted Project
lebedev.ri added a comment to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Mon, Jun 14, 2:45 PM · Restricted Project
lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Rebasing now that D104250 landed.
Some changes remain, and some of them still look like they're missing SimplifyDemandedVectorElts() folds.

Mon, Jun 14, 2:09 PM · Restricted Project
lebedev.ri committed rG585e65d3307f: [TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(? (authored by lebedev.ri).
Mon, Jun 14, 2:05 PM
lebedev.ri closed D104250: [TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0)).
Mon, Jun 14, 2:05 PM · Restricted Project
lebedev.ri added a comment to D104250: [TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0)).

LGTM - cheers

Mon, Jun 14, 1:52 PM · Restricted Project
lebedev.ri updated the diff for D104250: [TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0)).

Use isNullConstant()

Mon, Jun 14, 1:25 PM · Restricted Project
lebedev.ri added a comment to D104258: [OPENMP]Fix PR50699: capture locals in combine directrives for aligned clause..

Thank you!

Mon, Jun 14, 12:38 PM · Restricted Project
lebedev.ri added inline comments to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').
Mon, Jun 14, 11:25 AM · Restricted Project
lebedev.ri requested review of D104250: [TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0)).
Mon, Jun 14, 11:25 AM · Restricted Project
lebedev.ri requested review of D104232: [WIP][DAGCombiner] createBuildVecShuffle(): more vector concatenation.
Mon, Jun 14, 8:20 AM · Restricted Project
lebedev.ri added inline comments to D104187: [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size.
Mon, Jun 14, 6:40 AM · Restricted Project
lebedev.ri committed rG0f94c3c80dde: [NFC][DAGCombine] Extract getFirstIndexOf() lambda back into a function (authored by lebedev.ri).
Mon, Jun 14, 6:26 AM
lebedev.ri committed rG6e5628354e22: [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size (authored by lebedev.ri).
Mon, Jun 14, 6:19 AM
lebedev.ri closed D104187: [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size.
Mon, Jun 14, 6:19 AM · Restricted Project
lebedev.ri added inline comments to D104205: [X86] Schedule-model second (mask) output of GATHER instruction.
Mon, Jun 14, 5:53 AM · Restricted Project
lebedev.ri added a comment to D104187: [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size.

Thank you for the review!
I should hopefully have a few more follow-ups for createBuildVecShuffle() soon.
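As a rough sketch of what sorting the inputs by decreasing size involves (a hypothetical model, not the DAGCombiner code; `BuildVecInputs` and `sortInputsByDecreasingSize()` are made-up names): the input vectors are stable-sorted by decreasing element count, so equal-sized inputs keep their relative order and the output stays deterministic, and the per-element mask recording which input each BUILD_VECTOR element comes from is remapped to the new order.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical model: Sizes[I] is the element count of input vector I, and
// Mask[E] names the input vector that BUILD_VECTOR element E is taken from
// (-1 for elements that come from no input).
struct BuildVecInputs {
  std::vector<unsigned> Sizes;
  std::vector<int> Mask;
};

// Stable-sorts the inputs by decreasing size and remaps Mask so every
// element still refers to the same input vector.
void sortInputsByDecreasingSize(BuildVecInputs &BV) {
  const unsigned N = BV.Sizes.size();
  std::vector<unsigned> Order(N);
  std::iota(Order.begin(), Order.end(), 0u);
  std::stable_sort(Order.begin(), Order.end(), [&](unsigned A, unsigned B) {
    return BV.Sizes[A] > BV.Sizes[B];
  });

  std::vector<int> OldToNew(N);
  std::vector<unsigned> SortedSizes(N);
  for (unsigned NewIdx = 0; NewIdx < N; ++NewIdx) {
    OldToNew[Order[NewIdx]] = static_cast<int>(NewIdx);
    SortedSizes[NewIdx] = BV.Sizes[Order[NewIdx]];
  }
  BV.Sizes = std::move(SortedSizes);
  for (int &M : BV.Mask) {
    assert(M < static_cast<int>(N) && "mask refers to a missing input");
    if (M >= 0)
      M = OldToNew[M];
  }
}
```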

Mon, Jun 14, 5:45 AM · Restricted Project
lebedev.ri updated the diff for D104205: [X86] Schedule-model second (mask) output of GATHER instruction.

Fix AVX512 gather too.

Mon, Jun 14, 3:12 AM · Restricted Project
lebedev.ri added a comment to D104205: [X86] Schedule-model second (mask) output of GATHER instruction.

@RKSimon sanity check: I'm not adding a sched class to simplify modelling of GATHER instruction scheduling.
I'm only trying to fix a sched model correctness check.

Mon, Jun 14, 2:38 AM · Restricted Project

Sun, Jun 13

lebedev.ri requested review of D104205: [X86] Schedule-model second (mask) output of GATHER instruction.
Sun, Jun 13, 2:20 PM · Restricted Project
lebedev.ri updated the diff for D104187: [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size.

@RKSimon thank you for taking a look!
Addressing review notes.

Sun, Jun 13, 9:29 AM · Restricted Project
lebedev.ri added a comment to D104193: [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X).

Correction: what I meant to say is that there are two cases:

  1. X == Y, in which case we do not care about the use count. This implicitly handles the case where both operands are the same instruction.
  2. Either operand is one-use.

I'd add one more case, where one operand has one use and the other has more than one. We check this for zext in mul_bools_use2 and for sext in mul_bools_sext_use1 and mul_bools_sext_use2 (they seem to differ only in the order of operands; should it be the same for zext?).

Regarding your cases...

I may be wrong, but we do care about the use count when X == Y, because how we optimize depends on it. Look at the example below:

declare void @use(i32)
define i32 @src(i1 %x, i1 %y) {
  %zx = zext i1 %x to i32
  %r = mul i32 %zx, %zx
  call void @use(i32 %zx)
  ret i32 %r
}

If %zx has more than one user (in other words, more than two uses, since the mul uses it twice), we cannot replace

%zx = zext i1 %x to i32
%r = mul i32 %zx, %zx

by

%r = and i1 %x, %y
%zy = zext i1 %r to i32

because %zx has another user (the call to @use(i32)). If we transformed anyway, we would need to leave %zx as-is. I'm not sure whether we should do this (would it be beneficial?). I didn't intend to handle this particular case as part of this patch.

I'm not sure I follow.
%r = and i1 %x, %x is just %x, is it not?
So you end up with a single zext.
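The arithmetic behind the fold is easy to check exhaustively, since an i1 has only two values: sext turns true into -1 and zext turns it into 1, and (-1) * (-1) == 1 * 1 == 1. With X == Y, `and X, X` is just X, so both forms collapse to a single zext of X. A small C++ model (the helper names are hypothetical; use counts are not modelled):

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// i1 -> i32 extensions: sext maps true to -1 (all ones), zext maps it to 1.
int32_t sextBool(bool B) { return B ? -1 : 0; }
int32_t zextBool(bool B) { return B ? 1 : 0; }

// Exhaustively checks all four (X, Y) pairs:
//   (sext X) * (sext Y) == zext (and X, Y)
//   (zext X) * (zext Y) == zext (and X, Y)
bool mulOfBoolExtsIsZextOfAnd() {
  for (bool X : {false, true})
    for (bool Y : {false, true}) {
      const int32_t Expected = zextBool(X && Y);
      if (sextBool(X) * sextBool(Y) != Expected ||
          zextBool(X) * zextBool(Y) != Expected)
        return false;
    }
  return true;
}
```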

Sun, Jun 13, 9:19 AM · Restricted Project
lebedev.ri added a comment to D104193: [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X).

Sun, Jun 13, 7:15 AM · Restricted Project
lebedev.ri added a comment to D104193: [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X).

There are two cases here:

  1. X == Y, where we should produce a zext regardless of the use count
  2. and this one. But a number of tests are missing: can this not happen when the operands of mul are different instructions?
Sun, Jun 13, 6:56 AM · Restricted Project
lebedev.ri added a comment to D104191: [LVI] Remove recursion from getValueForCondition (NFC).

Nice!

Sun, Jun 13, 6:20 AM · Restricted Project
lebedev.ri added a comment to D104187: [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size.

A few nits, but the premise seems sound.

Sun, Jun 13, 6:18 AM · Restricted Project

Sat, Jun 12

lebedev.ri requested review of D104187: [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size.
Sat, Jun 12, 2:22 PM · Restricted Project
lebedev.ri committed rG2db64e199aa3: [NFC][X86][Codegen] Add shuffle test that would benefit from sorting in… (authored by lebedev.ri).
Sat, Jun 12, 2:08 PM
lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Add more context to the diff, NFC.

Sat, Jun 12, 4:59 AM · Restricted Project
lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Aha, it looks like this is salvageable: adding even more restrictions seems to have gotten rid of the obvious regressions.

Sat, Jun 12, 4:57 AM · Restricted Project

Fri, Jun 11

lebedev.ri requested review of D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').
Fri, Jun 11, 3:31 PM · Restricted Project
lebedev.ri committed rG0aef747b8465: [NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were… (authored by lebedev.ri).
Fri, Jun 11, 1:57 PM
lebedev.ri added a comment to D104099: [NewPM] Remove SpeculateAroundPHIs pass.

Which pass that comes after SpeculateAroundPHIs in the X86 pipeline (either in the optimization or codegen) would undo its effects?

Fri, Jun 11, 10:36 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D104099: [NewPM] Remove SpeculateAroundPHIs pass.

Some backends don't run SimplifyCFG, e.g. X86. I believe the pass was originally created specifically for X86 (the header has some X86 examples) and may or may not extend to other targets (I'm not very familiar with the pass itself).

I'm not opposed to landing this and seeing who complains, but if somebody does, we can make this pass X86-specific by adding it to X86TargetMachine::registerPassBuilderCallbacks() (which doesn't exist yet).

Fri, Jun 11, 10:20 AM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D104099: [NewPM] Remove SpeculateAroundPHIs pass.
Fri, Jun 11, 10:00 AM · Restricted Project, Restricted Project
lebedev.ri abandoned D104107: [NFCI][X86] Drop "atom"/"slm" target tuning "features", derive them from CPU string.

Oh i see. Thanks.

Fri, Jun 11, 9:29 AM · Restricted Project
lebedev.ri updated the summary of D104099: [NewPM] Remove SpeculateAroundPHIs pass.
Fri, Jun 11, 7:39 AM · Restricted Project, Restricted Project