
lebedev.ri (Roman Lebedev)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 27 2012, 6:35 AM (505 w, 3 d)

Recent Activity

May 3 2022

lebedev.ri added a comment to D123408: [InstCombine] Limit folding of cast into PHI.

The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form <op>-then-<lop> (e.g. RecurKind::AddThenAnd, RecurKind::MulThenAnd, RecurKind::OrThenAnd, and so on)... and that's just for combining a logical `and` with the known integer reduction ops, so if we want to support e.g. `or` as well, we'd need to double the number of additional recurrence kinds (and the extra logic that comes with them) again. The identity value would be determined from the <op>, and the <lop> would have to be applied when reducing the final vector into a single scalar upon loop exit.

@lebedev.ri is this what you had in mind or is there a better way to do it?

@lebedev.ri Can you please advise whether the approach described above is how we would implement this within the LoopVectorizer?

May 3 2022, 12:33 PM · Restricted Project, Restricted Project
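To make the <op>-then-<lop> proposal above concrete, here is a small Python model of an "AddThenAnd" reduction. It is not LoopVectorizer code, and the name comes from the comment rather than any existing API. It uses the loop quoted under D123408 on Apr 8. Each vector lane accumulates with the add identity 0, and the `and` is applied only once, to the final scalar. This is only sound because 65535 is a low-bit mask (2^16 - 1) and i32 addition wraps modulo 2^32. For an arbitrary `and` constant, deferring the mask would change the result.

```python
# Hypothetical model of an "AddThenAnd" composite reduction. The name comes
# from the proposal above; nothing here is a real LoopVectorizer API.
U32 = 0xFFFFFFFF   # i32 arithmetic wraps modulo 2**32
MASK = 0xFFFF      # the low-bit mask from `and i32 %add, 65535`


def scalar_loop(xs):
    """Reference semantics: the `and` runs on every iteration."""
    acc = 0
    for x in xs:
        acc = ((acc + x) & U32) & MASK
    return acc


def vectorized_add_then_and(xs, vf=4):
    """Lane-wise add reduction; the `and` is deferred to loop exit."""
    lanes = [0] * vf  # identity value taken from <op> = add
    for i, x in enumerate(xs):
        lanes[i % vf] = (lanes[i % vf] + x) & U32
    total = 0
    for lane in lanes:  # horizontal reduction of the final vector
        total = (total + lane) & U32
    return total & MASK  # <lop> = and, applied once to the final scalar
```

For example, `scalar_loop([0xFFFF, 1])` and `vectorized_add_then_and([0xFFFF, 1])` both give 0. With a mask such as 5, the two would disagree, so a real implementation would have to restrict which <lop> constants qualify.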

Apr 28 2022

lebedev.ri requested review of D124646: [SCEV] Support modelling of same base pointer `select`s in more complex than most trivial cases (when there is a base variable offset).
Apr 28 2022, 4:49 PM · Restricted Project, Restricted Project
lebedev.ri requested review of D124645: [SCEV] Model simple same base pointer `select`s via `umin_seq`.
Apr 28 2022, 4:49 PM · Restricted Project, Restricted Project
lebedev.ri committed rG981ed72a17e4: [NFC][SCEV] Refactor `createNodeForSelectViaUMinSeq()` out of… (authored by lebedev.ri).
Apr 28 2022, 4:38 PM · Restricted Project, Restricted Project
lebedev.ri committed rGfd20eb55f1b6: [NFC][SCEV] Tests with modellable pointer `select`s (authored by lebedev.ri).
Apr 28 2022, 4:38 PM · Restricted Project, Restricted Project

Apr 27 2022

lebedev.ri committed rGffafa71f6425: [InstCombine] 'round up integer': if bias is just right, just reuse instructions (authored by lebedev.ri).
Apr 27 2022, 7:29 AM · Restricted Project, Restricted Project
lebedev.ri committed rGd4563bfeb940: [NFC][InstCombine] Add some tests for open-coded round-up of an integer w/… (authored by lebedev.ri).
Apr 27 2022, 7:29 AM · Restricted Project, Restricted Project
lebedev.ri committed rGaac0afd1dd99: [InstCombine] Fold 'round up integer' pattern (when alignment is a power of two) (authored by lebedev.ri).
Apr 27 2022, 7:29 AM · Restricted Project, Restricted Project
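The "round up integer" commits above concern code that rounds `x` up to a multiple of a power-of-two alignment. The Python below is a hedged illustration. It assumes one plausible open-coded shape: keep `x` when its low bits are already zero, otherwise add the alignment and clear the low bits. It then checks exhaustively over 8-bit values, with wraparound, that this shape agrees with the compact `(x + (align - 1)) & -align` form. The exact IR pattern matched by the commits may differ from this assumed shape.

```python
def round_up_open_coded(x, align, bits=8):
    """Assumed open-coded form: select on whether the low bits are zero."""
    m = (1 << bits) - 1
    low_bits_are_zero = (x & (align - 1)) == 0
    biased_high_bits = ((x + align) & m) & (-align & m)
    return x if low_bits_are_zero else biased_high_bits


def round_up_folded(x, align, bits=8):
    """Folded form: (x + (align - 1)) & -align, modulo 2**bits."""
    m = (1 << bits) - 1
    return ((x + align - 1) & m) & (-align & m)


# Exhaustive check for every 8-bit x and every power-of-two alignment.
for k in range(8):
    align = 1 << k
    for x in range(256):
        assert round_up_open_coded(x, align) == round_up_folded(x, align)
```

For instance, `round_up_folded(13, 8)` is 16 and `round_up_folded(16, 8)` stays 16. The equivalence relies on the alignment being a power of two, so that it divides 2^bits and wraparound cannot disturb the low bits.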

Apr 22 2022

lebedev.ri added a comment to D123991: [LangRef] Clarify load/store of non-byte-sized types.

As for i1, you have seen enum BooleanContent, correct?

Apr 22 2022, 3:12 AM · Restricted Project, Restricted Project

Apr 21 2022

lebedev.ri added a comment to D124183: [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1).

One other reason I insist on the one-use check is that it gives the programmer a choice. If we remove the one-use check, the programmer has no choice.
https://godbolt.org/z/Wrzh5aczf
https://godbolt.org/z/1nGf88E87

Apr 21 2022, 11:23 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D124183: [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1).

We may be going in circles here.
Neither of these transforms requires a one-use check, either from the correctness perspective
or from the profitability perspective (at least as InstCombine views it).
I'm not sure it will be productive if I simply repeat everything I have already stated,
so I'm not sure how to approach the disagreement differently.

Apr 21 2022, 11:12 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D124183: [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1).

This is not consistent with our general folding rules.

Can you help explain the general folding rules? I'm a beginner, so I'd be grateful if you could tell me the detailed rules.
At least tell me in which cases a one-use check can be added and in which it can't. There are many one-use checks in InstCombine, so this is important to me.

What does "may" mean? We'd trade a sequential mul-shl for two independent muls.
The former would be sequential anyway, while the two muls may or may not be sequential, depending on how bad the microarchitecture is.

I think a sequential mul-shl is also very easy to turn into two independent muls in the backend, but not the other way around. If the microarchitecture has enough multiply ports and power is not much of a concern, the backend can do this.

Is there an actual performance bug report?

No performance report. But I think it is hard to get a large set of performance data for a general IR change.

Does this regress e.g. the following case: https://godbolt.org/z/aWsvzW751 ?
What about the case where the backend isn't even meant to recover, like https://godbolt.org/z/4Tbb3YG3s ?

Thanks for the cases you mentioned. Maybe I can add some more conditions to avoid the regression, but for now I think a rule for one-use is important to me.

Apr 21 2022, 10:37 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D124183: [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1).

This is not consistent with our general folding rules.
What does "may" mean? We'd trade a sequential mul-shl for two independent muls.
The former would be sequential anyway, while the two muls may or may not be sequential, depending on how bad the microarchitecture is.
Is there an actual performance bug report?

Apr 21 2022, 9:44 AM · Restricted Project, Restricted Project
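For reference, the fold discussed in D124183, `(X * C2) << C1 --> X * (C2 << C1)`, is valid for plain wrapping arithmetic. Shifting left by C1 is multiplication by 2^C1 modulo 2^n. The Python sketch below checks this exhaustively for 6-bit values. It ignores `nsw`/`nuw` flags. It also says nothing about the profitability question the thread is actually debating: one sequential mul-shl pair versus two independent muls when `X * C2` has other uses.

```python
def shl_of_mul(x, c2, c1, bits):
    """(x * c2) << c1 with n-bit wraparound; requires c1 < bits."""
    m = (1 << bits) - 1
    return (((x * c2) & m) << c1) & m


def mul_by_shifted_const(x, c2, c1, bits):
    """x * (c2 << c1), where the new constant is itself truncated to n bits."""
    m = (1 << bits) - 1
    return (x * ((c2 << c1) & m)) & m


def check_exhaustively(bits=6):
    for x in range(1 << bits):
        for c2 in range(1 << bits):
            for c1 in range(bits):  # shift amounts >= bits would be poison
                if shl_of_mul(x, c2, c1, bits) != mul_by_shifted_const(x, c2, c1, bits):
                    return False
    return True
```

Even with overflow the two agree: `shl_of_mul(100, 3, 2, 8)` and `mul_by_shifted_const(100, 3, 2, 8)` are both 176.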

Apr 18 2022

lebedev.ri added a comment to D123962: [InstCombine] fold freeze of partial undef/poison vector constants.

Given that we are replacing a freeze of a constant with a constant, and selecting that constant based on all of the users,
can we instead go over all the users and perform the replacement for each one separately, always picking the best choice?

Apr 18 2022, 2:50 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

I will try to be more specific:

Paragraph 1:

  • This contains a non-technical criticism of someone else's work and provides no useful information about the commit. Please remove.

Paragraph 2:

  • LGTM

Paragraph 3:

  • I don't understand exactly what is being said here, but the words 'destroyed' and 'damaging' seem overly aggressive. Please drop this paragraph too. The 2nd paragraph on its own is a good enough commit message.
Apr 18 2022, 9:44 AM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D123897: [X86] Unbreak LIT/FileCheck.
Apr 18 2022, 9:25 AM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D123897: [X86] Unbreak LIT/FileCheck.
Apr 18 2022, 8:18 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

@lebedev.ri This commit message is not appropriate. Please change it before committing to the repo, and also update this review with something else.

Apr 18 2022, 7:37 AM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D123897: [X86] Unbreak LIT/FileCheck.
Apr 18 2022, 7:11 AM · Restricted Project, Restricted Project
lebedev.ri added a reviewer for D122963: [X86] Extend the integer parameter if the function isn't local linked: rnk.
Apr 18 2022, 3:23 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D122963: [X86] Extend the integer parameter if the function isn't local linked.

I'm afraid I'm not really seeing consensus, either here or in the RFC.
I suspect the fix will require keeping the existing attributes intact,
introducing new ones that mean what you want, and using them.

Apr 18 2022, 2:29 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123908: [HWAsan] Add hwasan_sys_page_size command option to support 4K/8K/16K/64K page size system.

If this is meant to specify the value for the target system, then this is certainly still hardcoding a value,
one that will still not always be right.
I would guess the solution should be to read it from the IR metadata.

Apr 18 2022, 2:23 AM · Restricted Project, Restricted Project
lebedev.ri removed a reviewer for D123848: [RS4GC] Don't clone BDV if its inputs are not derived: lebedev.ri.
Apr 18 2022, 2:19 AM · Restricted Project, Restricted Project

Apr 17 2022

lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

Thanks everyone, I do believe we now have a rough understanding/consensus.
Does anyone feel like stamping the patch?

Apr 17 2022, 2:41 AM · Restricted Project, Restricted Project

Apr 16 2022

lebedev.ri added a comment to D123901: [LLVM][Casting.h] Update dyn_cast machinery to provide more control over how the casting is performed..

It may be helpful to also show the patch with the intended use-case.

Apr 16 2022, 11:55 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

Added David back - he had some concerns and a good (I think) suggestion. Could this patch do his `UTC_ARGS: --allow-unused-prefixes` suggestion instead? Seems it wouldn't be any bigger of a change, and I think everybody would have their concerns addressed.

Everybody except me? :)

My understanding is that his proposal will make any test generated with one of the auto-generators automatically enrolled into --allow-unused-prefixes. Wouldn't that address your scenario?

  1. It's a workaround for the case of an unchanged default. I'm interested in fixing the default, not working around it.

I think there are 2 scenarios, please correct me if I misunderstand them (esp. the first):

  • I use autogen-ing script, and rely on 'unused prefixes', so I want to just write/update my .ll, run the autogenerator, and be done.
  • I write tests manually, and I can make mistakes (like mean to use a prefix but actually mistakenly not), which I would like to have auto-detected

There are 29164 .ll files under llvm/test, out of which 12449 have the word 'autogenerated' in them, so less than half. If the default were flipped, the onus would be on the manually written ones to flip it on, which is likely to fail us - people forget, etc. The autogenerated ones appear to be both the main users of unused prefixes and in a position to opt in to their desired behavior, transparently to the user.

This sounds correct to me.

Apr 16 2022, 11:05 AM · Restricted Project, Restricted Project
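The comment above does not say how the 12449 / 29164 figures were obtained. Below is a grep-like Python sketch that would produce comparable counts. The directory layout and the 'autogenerated' marker are its only assumptions, and it is not an official LLVM classification.

```python
from pathlib import Path


def count_autogenerated(root, marker="autogenerated"):
    """Return (autogenerated, total) counts of .ll files under `root`.

    A file counts as autogenerated if `marker` appears anywhere in it,
    mirroring a plain grep rather than any official classification.
    """
    auto = total = 0
    for path in Path(root).rglob("*.ll"):
        if not path.is_file():
            continue
        total += 1
        text = path.read_text(encoding="utf-8", errors="replace")
        if marker in text:
            auto += 1
    return auto, total
```

Run from an llvm-project checkout, `count_autogenerated("llvm/test")` returns the pair. The figures quoted above correspond to a share of about 42.7%.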
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

Once again, let's cut on the bikeshedding here, please.
This only affects X86 codegen tests, all of which should be autogenerated in the first place, so the check prefixes cannot be wrong anyway.

Apr 16 2022, 10:52 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

Added David back - he had some concerns and a good (I think) suggestion. Could this patch do his `UTC_ARGS: --allow-unused-prefixes` suggestion instead? Seems it wouldn't be any bigger of a change, and I think everybody would have their concerns addressed.

Everybody except me? :)

My understanding is that his proposal will make any test generated with one of the auto-generators automatically enrolled into --allow-unused-prefixes. Wouldn't that address your scenario?

Apr 16 2022, 10:36 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

And https://reviews.llvm.org/D94744 used explicit approach.

Use same approach here?

Apr 16 2022, 10:27 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

Added David back - he had some concerns and a good (I think) suggestion. Could this patch do his `UTC_ARGS: --allow-unused-prefixes` suggestion instead? Seems it wouldn't be any bigger of a change, and I think everybody would have their concerns addressed.

Apr 16 2022, 10:17 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

You need evidence to back up your statement.

Let's unflip the table. Why do you believe the current default is right for X86 tests, or for FileCheck in general?
If you like having a broken default, that's up to you, but I'm confident that not many agree with you on this.
I'm only fixing the X86 story that affects me.

Apr 16 2022, 10:06 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

If a new test needs it, it can specify `--allow-unused-prefixes` explicitly, no? If we apply a patch like this one to more and more folders, then the original work was wasted effort.

In these cases - where whole directories are predominantly auto-generated tests - the idea was to use the approach in this patch, for example Transforms/Attributor/lit.local.cfg (where basically all are autogenerated). For CodeGen/X86, more than half of the tests have the word 'autogenerated' in them.

Maybe we should just use the word autogenerated in the header of the .ll file and make FileCheck automatically switch to --allow-unused-prefixes in that case?

I'm only asking other X86 contributors whether they have had enough and want to go back to the pre-`--allow-unused-prefixes` FileCheck behavior.
As for the global scope of things - I don't know, the damage has already been done. I guess I'd personally recommend starting an RFC to at least flip the default back to the sensible one.

Apr 16 2022, 9:18 AM · Restricted Project, Restricted Project
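Some background for the `--allow-unused-prefixes` debate. FileCheck rejects a run whose `--check-prefixes` list contains a prefix that no check line uses, unless `--allow-unused-prefixes` is passed. The Python model below shows that check in simplified form. It is not FileCheck's implementation, and its error text is made up. It also models the header-based opt-in suggested in the quoted comment: a file whose first line mentions "autogenerated" is treated as if the flag were passed. The directive suffix list is an approximation.

```python
import re

# Directive suffixes recognized by this simplified model; real FileCheck
# supports more (e.g. configurable comment prefixes).
_SUFFIX = r"(?:-(?:NEXT|SAME|NOT|DAG|LABEL|EMPTY|COUNT-\d+))?"


def unused_prefixes(check_text, prefixes):
    """Prefixes that never appear as `PREFIX:` or `PREFIX-<suffix>:`."""
    unused = []
    for prefix in prefixes:
        pattern = re.compile(r"(?<![\w-])" + re.escape(prefix) + _SUFFIX + ":")
        if not pattern.search(check_text):
            unused.append(prefix)
    return unused


def prefix_errors(check_text, prefixes, allow_unused=False, header_opt_in=False):
    """Errors this model would report for the given prefixes.

    `header_opt_in` models the idea floated above: a file whose first line
    mentions "autogenerated" behaves as if --allow-unused-prefixes was given.
    """
    lines = check_text.splitlines()
    if header_opt_in and lines and "autogenerated" in lines[0]:
        allow_unused = True
    if allow_unused:
        return []
    return ["unused prefix: " + p for p in unused_prefixes(check_text, prefixes)]
```

With the usual `; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py` header, `prefix_errors(text, ["CHECK", "SSE"], header_opt_in=True)` returns no errors even if `SSE` is never used.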
lebedev.ri removed a reviewer for D123897: [X86] Unbreak LIT/FileCheck: xbolva00.
Apr 16 2022, 9:12 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123897: [X86] Unbreak LIT/FileCheck.

<...>

Hello. I'm not falling for this trap again.

Apr 16 2022, 7:47 AM · Restricted Project, Restricted Project
lebedev.ri updated the diff for D123897: [X86] Unbreak LIT/FileCheck.

Drop python syntax hint.

Apr 16 2022, 7:29 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123897: [X86] Unbreak LIT/FileCheck.
Apr 16 2022, 7:23 AM · Restricted Project, Restricted Project
lebedev.ri requested review of D123897: [X86] Unbreak LIT/FileCheck.
Apr 16 2022, 6:02 AM · Restricted Project, Restricted Project

Apr 15 2022

lebedev.ri committed rGbe5c15c7aee1: [NFC][Costmodel][LV][X86] Refresh one or two interleaved load/store tests (authored by lebedev.ri).
Apr 15 2022, 7:43 AM · Restricted Project, Restricted Project
lebedev.ri committed rG5865a74755ac: Require asserts in newly added test (authored by lebedev.ri).
Apr 15 2022, 5:57 AM · Restricted Project, Restricted Project
lebedev.ri committed rG8fbed6870bb2: [UpdateTestChecks] Prevent rapid onset insanity when forced to write… (authored by lebedev.ri).
Apr 15 2022, 5:44 AM · Restricted Project, Restricted Project
lebedev.ri closed D121133: [UpdateTestChecks] Prevent rapid onset insanity when forced to write LoopVectorize-driven costmodel tests.
Apr 15 2022, 5:43 AM · Restricted Project, Restricted Project
lebedev.ri removed a reviewer for D123846: [RS4GC] Prune inputs of BDV if they are BDV themselves: lebedev.ri.
Apr 15 2022, 4:50 AM · Restricted Project, Restricted Project

Apr 12 2022

lebedev.ri abandoned D123234: [X86] `lowerBuildVectorAsBroadcast()`: with AVX2, allow i64->XMM broadcasts from constant pool.
Apr 12 2022, 12:38 PM · Restricted Project, Restricted Project
lebedev.ri abandoned D123207: [WIP][X86][AVX2] More broadcasts from constant pool.
Apr 12 2022, 12:38 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D121133: [UpdateTestChecks] Prevent rapid onset insanity when forced to write LoopVectorize-driven costmodel tests.

I take it we don't want this.

Apr 12 2022, 12:38 PM · Restricted Project, Restricted Project
lebedev.ri abandoned D99121: [IR][InstCombine] IntToPtr Produces Typeless Pointer To Byte.
Apr 12 2022, 12:37 PM · Restricted Project, Restricted Project, Restricted Project
lebedev.ri added a comment to D123544: [randstruct] Automatically randomize a structure of function pointers.

Does this not lead to non-deterministic/non-reproducible builds?
I do not understand why this feature must be inflicted onto everyone.

Apr 12 2022, 11:12 AM · Restricted Project, Restricted Project

Apr 10 2022

lebedev.ri added a comment to D122569: [lit] Support %if ... %else syntax for RUN lines.

Please don't forget to also update the llvm/utils/update_*_checks.py.

Apr 10 2022, 2:54 PM · Restricted Project, Restricted Project

Apr 9 2022

lebedev.ri added a comment to D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.

For example, what about this:
https://alive2.llvm.org/ce/z/aov-GB
https://alive2.llvm.org/ce/z/ox4wAt (no need for exact, only nuw)

Apr 9 2022, 12:59 PM · Restricted Project, Restricted Project
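As I read D123453's title, the fold rewrites `lshr (mul nuw X, C), S` into a single `mul X, (C >> S)` when 2^S divides C. The Python sketch below checks this exhaustively for 8-bit values and illustrates the "no need for exact, only nuw" remark. With `nuw`, the product is the exact integer X * C, which is divisible by 2^S, so the shift only discards zero bits. The new multiply cannot wrap either, since X * (C >> S) <= X * C. Whether the patch keeps `nuw` on the new multiply is not modeled.

```python
def lshr_of_mul_nuw(x, c, s, bits=8):
    """lshr (mul nuw x, c), s; None models poison when the mul would wrap."""
    product = x * c
    if product >= (1 << bits):
        return None  # nuw violated: poison, so any replacement is a refinement
    return product >> s


def folded_mul(x, c, s, bits=8):
    """mul x, (c >> s): the single multiplication produced by the fold."""
    return (x * (c >> s)) & ((1 << bits) - 1)


def check_fold(bits=8):
    for s in range(bits):
        for c in range(0, 1 << bits, 1 << s):  # c must be a multiple of 2**s
            for x in range(1 << bits):
                before = lshr_of_mul_nuw(x, c, s, bits)
                if before is not None and before != folded_mul(x, c, s, bits):
                    return False
    return True
```

Without `nuw` the fold is wrong. In 8 bits, `3 * 128` wraps to 128, and shifting right by 7 gives 1, while `folded_mul(3, 128, 7)` gives 3.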
lebedev.ri added a comment to D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.

I see. Then the proof is wrong.

I'm sorry, I really don't know how to prove it with Alive2. Can you teach me how to prove it?

Apr 9 2022, 12:20 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.

I see. Then the proof is wrong.

Apr 9 2022, 11:26 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.
Apr 9 2022, 11:16 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.
Apr 9 2022, 10:50 AM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.
Apr 9 2022, 10:50 AM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.
Apr 9 2022, 10:49 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.

Please can you post the link to the general proof?

I'm sorry, but could you explain what the general proof is?

Apr 9 2022, 10:14 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123453: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor.

Please can you post the link to the general proof?

Apr 9 2022, 9:17 AM · Restricted Project, Restricted Project

Apr 8 2022

lebedev.ri added a comment to D123408: [InstCombine] Limit folding of cast into PHI.

This seems like a vectorizer bug.

The reason it doesn't get vectorized is that the loop vectorizer sees a truncating `and` instruction in the def-use chain of the reduction PHI. The IR that the vectorizer sees looks like this:

define internal fastcc void @do_one() unnamed_addr #1 {
entry:
  tail call void (i8*, ...) @obfuscate(i8* noundef bitcast ([8192 x i16]* @x to i8*)) #3
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %a.010 = phi i32 [ 0, %entry ], [ %phi.cast, %for.body ]
  %i.09 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds [8192 x i16], [8192 x i16]* @x, i64 0, i64 %i.09
  %0 = load i16, i16* %arrayidx, align 2, !tbaa !6
  %conv = zext i16 %0 to i32
  %add = add nuw nsw i32 %a.010, %conv
  %inc = add nuw nsw i64 %i.09, 1
  %phi.cast = and i32 %add, 65535
  %exitcond.not = icmp eq i64 %inc, 8192
  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !10

for.end:                                          ; preds = %for.body
  %phi.cast.lcssa = phi i32 [ %phi.cast, %for.body ]
  tail call void (i8*, ...) @obfuscate(i8* noundef bitcast ([8192 x i16]* @x to i8*), i32 noundef signext %phi.cast.lcssa) #3
  ret void
}

The `%phi.cast = and i32 %add, 65535` instruction is not an add, so the `%a.010` PHI doesn't qualify as an add reduction.

LV: Not vectorizing: Found an unidentified PHI %a.010 = phi i32 [ 0, %entry ], [ %phi.cast, %for.body ]
Apr 8 2022, 10:30 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123408: [InstCombine] Limit folding of cast into PHI.

This seems like a vectorizer bug.

Apr 8 2022, 10:19 AM · Restricted Project, Restricted Project

Apr 7 2022

lebedev.ri accepted D123306: [X86] Enable fast variable per-lane shuffle tuning on all Ryzen targets.

Okay.

Apr 7 2022, 6:58 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123234: [X86] `lowerBuildVectorAsBroadcast()`: with AVX2, allow i64->XMM broadcasts from constant pool.

This is why I don't think we want to perform too much of this in the DAG - we quickly get to cases where the decision between broadcast vs vector load of constants can't be easily determined - value tracking, multiple uses, hoisting, lost folds, spilling etc. all get affected.

A while ago I was investigating the use of VPMOVSX/ZX to reduce the size of the constant pool, and hit many of the same problems. And constant rematerialization would be the same if we ever get to that point.

There's probably some minor further tweaks we can do (more hasOneUse checks?), but really we need to think about performing less in the DAG, and more in later passes.

Apr 7 2022, 5:55 AM · Restricted Project, Restricted Project
lebedev.ri accepted D123118: [InstCombine] SimplifyDemandedUseBits - remove lshr node if we only demand known sign bit.
Apr 7 2022, 5:43 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123118: [InstCombine] SimplifyDemandedUseBits - remove lshr node if we only demand known sign bit.

LGTM, thanks.

Apr 7 2022, 5:43 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D122963: [X86] Extend the integer parameter if the function isn't local linked.

Still not okay with it. The fix is to fix whatever is emitting the attribute, not completely ignore it.

I don't quite understand what you mean.
If there is no such attribute, it should not do any extension itself, regardless of whether this function has local linkage. The optimization here only applies to parameters with zeroext/signext attributes.

Apr 7 2022, 3:11 AM · Restricted Project, Restricted Project
lebedev.ri requested changes to D122963: [X86] Extend the integer parameter if the function isn't local linked.

Still not okay with it. The fix is to fix whatever is emitting the attribute, not completely ignore it.

Apr 7 2022, 2:50 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123273: [utils] Avoid hardcoding metadata ids in update_cc_test_checks.

Can you follow the pattern of how the tests for the other update utils are written: autogenerate the check lines, commit such a file, and check that newly generated check lines don't differ?

Apr 7 2022, 2:48 AM · Restricted Project, Restricted Project, Restricted Project

Apr 6 2022

lebedev.ri updated the diff for D123234: [X86] `lowerBuildVectorAsBroadcast()`: with AVX2, allow i64->XMM broadcasts from constant pool.

Fix check prefixes in a (single) problematic test.

Apr 6 2022, 1:03 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123234: [X86] `lowerBuildVectorAsBroadcast()`: with AVX2, allow i64->XMM broadcasts from constant pool.

Hm, so we already have cases where we fail to undo broadcast load into a folded load (https://godbolt.org/z/3jzEd91ca), so i'm still unsure if that is a blocker?

Apr 6 2022, 12:56 PM · Restricted Project, Restricted Project
lebedev.ri accepted D123109: [x86] Improve select lowering for smin(x, 0) & smax(x, 0).

LG. I'm not really sure on which CPUs this improves things,
but I don't think it would regress them,
and avoiding EFLAGS is a win.

Apr 6 2022, 10:43 AM · Restricted Project, Restricted Project
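A common branch-free way to compute smax(x, 0) and smin(x, 0) without a compare, and hence without touching EFLAGS, is to splat the sign bit with an arithmetic shift and mask with it. The Python model below checks that identity over all 8-bit values. It illustrates the general trick and is not necessarily the exact sequence D123109 emits.

```python
def to_signed(v, bits):
    """Interpret an n-bit pattern as a two's complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def sign_splat(x, bits):
    """ashr x, bits-1: all-ones if x is negative, else zero (as n-bit pattern)."""
    return (to_signed(x, bits) >> (bits - 1)) & ((1 << bits) - 1)


def smax_zero(x, bits=8):
    """smax(x, 0) == x & ~(x >>s (bits - 1))."""
    return x & ~sign_splat(x, bits) & ((1 << bits) - 1)


def smin_zero(x, bits=8):
    """smin(x, 0) == x & (x >>s (bits - 1))."""
    return x & sign_splat(x, bits)
```

Python's `>>` on negative integers is an arithmetic shift, which is why `sign_splat` converts to a signed value first.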
lebedev.ri updated the diff for D123234: [X86] `lowerBuildVectorAsBroadcast()`: with AVX2, allow i64->XMM broadcasts from constant pool.

Add more context.

Apr 6 2022, 9:49 AM · Restricted Project, Restricted Project
lebedev.ri requested review of D123234: [X86] `lowerBuildVectorAsBroadcast()`: with AVX2, allow i64->XMM broadcasts from constant pool.
Apr 6 2022, 9:42 AM · Restricted Project, Restricted Project
lebedev.ri committed rG9be6e7b0f249: [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64->XMM broadcasts… (authored by lebedev.ri).
Apr 6 2022, 8:34 AM · Restricted Project, Restricted Project
lebedev.ri closed D123221: [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64 broadcasts from constant pool.
Apr 6 2022, 8:34 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123221: [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64 broadcasts from constant pool.
Apr 6 2022, 8:13 AM · Restricted Project, Restricted Project
lebedev.ri updated the diff for D123221: [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64 broadcasts from constant pool.

Require AVX512VL, not just plain AVX512F.

Apr 6 2022, 8:13 AM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D123109: [x86] Improve select lowering for smin(x, 0) & smax(x, 0).
Apr 6 2022, 7:57 AM · Restricted Project, Restricted Project
lebedev.ri updated the diff for D123221: [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64 broadcasts from constant pool.

Adjust check prefixes in one test to restore check lines.

Apr 6 2022, 7:44 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123221: [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64 broadcasts from constant pool.

There are a number of failures to fold the broadcast load into a folded load.
Is that a blocker? I'm not really sure what is going on with that.

Apr 6 2022, 7:40 AM · Restricted Project, Restricted Project
lebedev.ri requested review of D123221: [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64 broadcasts from constant pool.
Apr 6 2022, 7:35 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D115274: [IR][RFC] Memory region declaration intrinsic.

Ping. Do you plan to commit the change? Any blockage?

I don't know. I think I'm waiting for the dust to settle on the GitHub PR debacle.

Any context on that? I feel like I missed something here...

The TLDR is that I don't feel like contributing to projects that continuously shit on their fellow contributors.
Migration from phab to github pull requests will likely be my tipping point.

Apr 6 2022, 7:15 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D115274: [IR][RFC] Memory region declaration intrinsic.

Ping. Do you plan to commit the change? Any blockage?

Apr 6 2022, 6:34 AM · Restricted Project, Restricted Project
lebedev.ri requested review of D123207: [WIP][X86][AVX2] More broadcasts from constant pool.
Apr 6 2022, 4:59 AM · Restricted Project, Restricted Project
lebedev.ri committed rG34ce9fd864b5: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill… (authored by lebedev.ri).
Apr 6 2022, 4:20 AM · Restricted Project, Restricted Project
lebedev.ri closed D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.
Apr 6 2022, 4:19 AM · Restricted Project, Restricted Project
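Here is the idea in D123163's title, as I read it. When a vector is bitcast to one with more, narrower elements, each narrow element covers a slice of a wide source element. If the source's known bits show that a slice is entirely zero, the corresponding narrow element is known zero. The Python sketch below models this for a little-endian target such as X86, with known-zero information given as one bitmask per wide element. It is a model of the idea, not the `TargetLowering` code.

```python
def known_zero_narrow_elts(wide_known_zero, wide_bits, narrow_bits):
    """Per narrow element, is it entirely known zero after the bitcast?

    `wide_known_zero[i]` is a bitmask of the bits known to be zero in wide
    element i. Little-endian layout is assumed: narrow element j of a wide
    element holds bits [j * narrow_bits, (j + 1) * narrow_bits).
    """
    assert wide_bits % narrow_bits == 0
    ratio = wide_bits // narrow_bits
    slice_mask = (1 << narrow_bits) - 1
    result = []
    for known_zero in wide_known_zero:
        for j in range(ratio):
            chunk = (known_zero >> (j * narrow_bits)) & slice_mask
            result.append(chunk == slice_mask)  # every bit of the slice known zero
    return result
```

As an example, take a v2i64 whose element 0 is zero-extended from i32 (upper 32 bits known zero) and whose element 1 is the constant 0. Bitcast to v4i32, `known_zero_narrow_elts([0xFFFFFFFF00000000, 0xFFFFFFFFFFFFFFFF], 64, 32)` marks elements 1, 2 and 3 as known zero.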
lebedev.ri added a comment to D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.

LGTM - cheers

Apr 6 2022, 3:50 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.
Apr 6 2022, 3:45 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.
Apr 6 2022, 3:32 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.
Apr 6 2022, 3:29 AM · Restricted Project, Restricted Project
lebedev.ri accepted D122754: [DAG] Allow XOR(X,MIN_SIGNED_VALUE) to perform AddLike folds.

Seems fine to me.

Apr 6 2022, 2:07 AM · Restricted Project, Restricted Project
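This is why `xor X, MIN_SIGNED_VALUE` can take part in AddLike folds (D122754). Adding the sign-bit constant only ever flips the top bit, because any carry out of it is discarded, so it is the same as xor-ing it in. Subtracting it is the same again, since 2^n - 2^(n-1) = 2^(n-1). A quick exhaustive check in Python:

```python
def xor_sign_bit(x, bits):
    return x ^ (1 << (bits - 1))


def add_sign_bit(x, bits):
    return (x + (1 << (bits - 1))) & ((1 << bits) - 1)


def sub_sign_bit(x, bits):
    return (x - (1 << (bits - 1))) & ((1 << bits) - 1)


# Every value of several bit widths: xor, add and sub of the sign bit agree.
for bits in (4, 8, 12):
    for x in range(1 << bits):
        assert xor_sign_bit(x, bits) == add_sign_bit(x, bits) == sub_sign_bit(x, bits)
```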
lebedev.ri added inline comments to D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.
Apr 6 2022, 2:04 AM · Restricted Project, Restricted Project

Apr 5 2022

lebedev.ri updated the summary of D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.
Apr 5 2022, 4:47 PM · Restricted Project, Restricted Project
lebedev.ri requested review of D123163: [TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits.
Apr 5 2022, 4:38 PM · Restricted Project, Restricted Project
lebedev.ri committed rG019e7b7f6ed2: [PartiallyInlineLibCalls] Don't partially inline a musttail libcall. (authored by babrath).
Apr 5 2022, 12:31 PM · Restricted Project, Restricted Project
lebedev.ri closed D123116: Don't partially inline a musttail libcall..
Apr 5 2022, 12:31 PM · Restricted Project, Restricted Project
lebedev.ri accepted D123116: Don't partially inline a musttail libcall..

LG.
Please specify the "Name <e@ma.il>" to be used for commit.

Apr 5 2022, 7:33 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D123116: Don't partially inline a musttail libcall..
Apr 5 2022, 6:58 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D123116: Don't partially inline a musttail libcall..

Test?

Apr 5 2022, 4:49 AM · Restricted Project, Restricted Project

Apr 4 2022

lebedev.ri added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

There are two ways to view this:

  1. If all of the incoming values of the PHI are fully identical instructions with fully identical operands, then we don't need to PHI together the operands and can replace the PHI with said instruction.
  2. The one-user check is there to ensure that the instruction count does not increase. So in principle, if we need to PHI together the operands, we need at least as many of the instructions to be one-user as the number of PHIs we need.

I'm thinking of the following change, so that we can remove the PHI instruction without introducing any extra instructions on any path.

BB1:
  br i1 %cond, label %BB2, label %BB3

BB2:
  ...
  %161 = or i64 %24, 1
  ...
  br label %BBX

BB3:
  ...
  %179 = or i64 %24, 1
  ...
  br label %BBX

BBX:
  ; our interesting bb
  %245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

==>

BB1:
  ...
  ; New instruction is inserted here
  %245 = or i64 %24, 1
  br i1 %cond, label %BB2, label %BB3

BB2:
  ...
  br label %BBX

BB3:
  ...
  ; Use of %245.
  ...
  br label %BBX

BBX:
  ; our interesting bb
  ; PHI instruction is deleted.
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...
Apr 4 2022, 8:35 AM · Restricted Project, Restricted Project
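The instruction-count accounting behind the two views above can be written down as a small Python model. `Inst` and `can_fold_phi` are hypothetical names for illustration, not InstCombine APIs. The model also ignores where the replacement instruction is placed (dominance), which the hoisting proposal handles by moving it into BB1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical stand-in for an incoming value of a PHI."""
    opcode: str
    operands: tuple
    num_uses: int = 1  # 1 means the PHI is its only user


def can_fold_phi(incoming):
    """Can phi(incoming...) become one instruction without growing the IR?"""
    first = incoming[0]
    if any(i.opcode != first.opcode or len(i.operands) != len(first.operands)
           for i in incoming):
        return False
    # View 1: identical instructions with identical operands need no new PHIs.
    if all(i.operands == first.operands for i in incoming):
        return True
    # View 2: one new PHI per operand position whose values differ ...
    new_phis = sum(1 for k in range(len(first.operands))
                   if len({i.operands[k] for i in incoming}) > 1)
    # ... paid for by incoming instructions that die once the PHI is gone.
    # (Dropping the old PHI and adding the combined instruction cancel out.)
    one_use = sum(1 for i in incoming if i.num_uses == 1)
    return one_use >= new_phis
```

In the example above, `%161` and `%179` are both `or i64 %24, 1`, so view 1 applies even if they have other users. Two `or`s with different first operands need one new PHI, so at least one of them must be one-use.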

Apr 2 2022

lebedev.ri added a comment to D122909: [InstCombine] limit icmp fold with sub if other sub user is a phi.

It seems like something for a late undo pass to deal with.

Agreed, but the comments in https://github.com/llvm/llvm-project/issues/54558 suggested there wasn't a quick fix, and there are already one-use limitations here, which is why I proposed a quick fix for the regression. Do you object to this change?

Apr 2 2022, 3:26 PM · Restricted Project, Restricted Project
lebedev.ri updated the summary of D122909: [InstCombine] limit icmp fold with sub if other sub user is a phi.
Apr 2 2022, 3:23 PM · Restricted Project, Restricted Project