lebedev.ri (Roman Lebedev)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 27 2012, 6:35 AM (475 w, 4 d)

Recent Activity

Today

lebedev.ri added inline comments to D115392: [SLP] Don't vectorize div/rem with undef denominators.
Wed, Dec 8, 1:30 PM · Restricted Project
lebedev.ri requested changes to D115392: [SLP] Don't vectorize div/rem with undef denominators.
Wed, Dec 8, 1:30 PM · Restricted Project
lebedev.ri added inline comments to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').
Wed, Dec 8, 12:03 PM · Restricted Project
lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

@RKSimon sorry for so much back-and-forth.

Wed, Dec 8, 12:02 PM · Restricted Project
lebedev.ri added a comment to D115342: [benchmarks] Unbreak third-party/benchmark build on Solaris.

Can you please submit this upstream first?

Wed, Dec 8, 6:49 AM · Restricted Project
lebedev.ri added a comment to D114272: [InstCombine] Add two optimizations for mul-and-icmp patterns.

@fwolff reverse ping?
This doesn't need any big fundamental changes, just a little rebasing.

Wed, Dec 8, 5:48 AM · Restricted Project
lebedev.ri removed a reviewer for D115323: [MetaRenamer] Add command line options to disable renaming specified prefixes: lebedev.ri.
Wed, Dec 8, 5:47 AM · Restricted Project
lebedev.ri updated the diff for D115274: [IR][RFC] Memory region declaration intrinsic.

Misc wording improvements.

Wed, Dec 8, 4:29 AM · Restricted Project
lebedev.ri added a comment to D114832: [SROA] Improve SROA to prevent generating redundant coalescing operations..

The bad code pattern generated is probably not general enough to justify introducing a cleanup pass or enhancing an existing pass to handle it; that would probably just shift the complexity from one pass to another. Fixing this at the source (SROA) is reasonable (unlike other canonicalization passes, this code pattern from SROA actually makes the IR much worse).

Wed, Dec 8, 3:08 AM · Restricted Project
lebedev.ri added inline comments to D115274: [IR][RFC] Memory region declaration intrinsic.
Wed, Dec 8, 2:57 AM · Restricted Project

Yesterday

lebedev.ri updated the diff for D115274: [IR][RFC] Memory region declaration intrinsic.

Some more blurb.

Tue, Dec 7, 3:50 PM · Restricted Project
lebedev.ri updated the diff for D115274: [IR][RFC] Memory region declaration intrinsic.

Drop nocapture as per @efriedma / https://alive2.llvm.org/ce/z/EmvTq9.

Tue, Dec 7, 2:42 PM · Restricted Project
lebedev.ri added a comment to D115261: [LV] Disable runtime unrolling for vectorized loops..

Even if both of the unrollers are right as per their model
(LU duplicates the whole loop body, while LV duplicates each instruction,
increasing live ranges, i believe), i'm mainly just worried
that the two unroll strategies disagree in the end.

Tue, Dec 7, 12:41 PM · Restricted Project
lebedev.ri accepted D115173: [InstCombine] try to fold rem with constant dividend and select-of-constants divisor.

Could you please add this test: https://godbolt.org/z/WvYxnejnc (it should not optimize to unreachable/undef/poison)

Tue, Dec 7, 11:44 AM · Restricted Project
lebedev.ri accepted D114962: [Support] improve known bits analysis for multiply by power-of-2 (1 set bit).

LGTM, thank you.

Tue, Dec 7, 11:26 AM · Restricted Project
lebedev.ri added a comment to D115274: [IR][RFC] Memory region declaration intrinsic.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154249.html

Tue, Dec 7, 11:25 AM · Restricted Project
lebedev.ri updated the summary of D115274: [IR][RFC] Memory region declaration intrinsic.
Tue, Dec 7, 11:25 AM · Restricted Project
lebedev.ri requested review of D115274: [IR][RFC] Memory region declaration intrinsic.
Tue, Dec 7, 11:23 AM · Restricted Project
lebedev.ri added a comment to D114832: [SROA] Improve SROA to prevent generating redundant coalescing operations..

Can the issues be viewed as missed transformations that should be implemented, instead of trying to prevent creating those "bad" patterns?

Tue, Dec 7, 10:22 AM · Restricted Project
lebedev.ri added a comment to D114962: [Support] improve known bits analysis for multiply by power-of-2 (1 set bit).

Is there an exhaustive test for this method?

Tue, Dec 7, 10:16 AM · Restricted Project
lebedev.ri added a comment to D115179: [NFC] Clarify comment about LoopDeletionPass in the optimization pipeline.

FTR i will be rather surprised if that alone would fully alleviate the need for this late loopdelete.

Changing the EarlyCSEPass in PassBuilder::buildFunctionSimplificationPipeline() to GVNPass does make llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll pass if we delete the -O1 RUN line

Tue, Dec 7, 9:59 AM · Restricted Project
lebedev.ri added a comment to D115261: [LV] Disable runtime unrolling for vectorized loops..

Compile time is misleading.
What about run time impact?

Tue, Dec 7, 9:49 AM · Restricted Project

Mon, Dec 6

lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

And back to restrictive profitability check.

Mon, Dec 6, 1:28 PM · Restricted Project
lebedev.ri accepted D115179: [NFC] Clarify comment about LoopDeletionPass in the optimization pipeline.

FTR i will be rather surprised if that alone would fully alleviate the need for this late loopdelete.

Mon, Dec 6, 1:10 PM · Restricted Project
lebedev.ri added inline comments to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').
Mon, Dec 6, 1:04 PM · Restricted Project
lebedev.ri added inline comments to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').
Mon, Dec 6, 10:29 AM · Restricted Project
lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Relaxed the profitability check - i'm not sure this is better, and that is why it was there.

Mon, Dec 6, 10:29 AM · Restricted Project
lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Forgo the undef constant folding, rebased.

Mon, Dec 6, 7:25 AM · Restricted Project

Sun, Dec 5

lebedev.ri added a comment to D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

ping

Sun, Dec 5, 11:23 PM · Restricted Project

Fri, Dec 3

lebedev.ri added a comment to D115052: [Passes] Only run extra vector passes if loops have been vectorized..

It would be good to mimic what the module equivalence check does
(when no changes were reported, the module should not have been changed),
and e.g. when in EXPENSIVE_CHECKS mode, if we weren't going to run those
extra vectorization passes because we predicted they wouldn't do anything,
to still run them, and check that they actually didn't do anything.

Fri, Dec 3, 12:37 PM · Restricted Project
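
The cross-check suggested in the comment above could be sketched roughly as follows. This is a toy model in Python, not LLVM code: the "module" is a plain list of strings, the "passes" are ordinary callables, and the names `run_extra_passes`, `predicted_noop` and `expensive_checks` are hypothetical stand-ins for the pass-pipeline decision and the EXPENSIVE_CHECKS build mode.

```python
import copy


def run_extra_passes(module, passes, predicted_noop, expensive_checks=False):
    """Run `passes` over `module`, skipping them when they are predicted to be no-ops.

    In expensive-checks mode the skipped passes are run anyway and the result
    is compared against a snapshot, so a wrong prediction is caught instead of
    silently costing optimizations. All names here are illustrative assumptions.
    """
    if not predicted_noop:
        # Normal path: the passes are expected to do something, so just run them.
        for run_pass in passes:
            run_pass(module)
        return module

    if expensive_checks:
        # Verification path: run the passes despite the prediction and make
        # sure they really left the module untouched.
        snapshot = copy.deepcopy(module)
        for run_pass in passes:
            run_pass(module)
        if module != snapshot:
            raise AssertionError("passes predicted to be no-ops changed the module")

    # Cheap path: trust the prediction and skip the passes entirely.
    return module
```

The design point is that the prediction stays an optimization in release builds while being continuously validated in checking builds, much like the "no changes reported implies no changes made" module check mentioned in the comment.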
lebedev.ri planned changes to D114988: [IR] `GetElementPtrInst`: per-index `inrange` support.

Ok, i guess i have a rather more general and radical, invasive solution, let's see if that works out first.

Fri, Dec 3, 8:53 AM · Restricted Project
lebedev.ri retitled D114996: [InstSimplify] Add logic `or` fold from Add logic `or` fold to [InstSimplify] Add logic `or` fold.
Fri, Dec 3, 8:46 AM · Restricted Project
lebedev.ri added a comment to D115016: [CostModel][X86] Add i64 mul cost for avx512 as 1cy.

which is not true for most recent cpus.

It seems Zen1's TP is still 2 in some cases. See Agner Fog's table and https://uops.info/table.html?search=mul%2064&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_measurements=on&cb_doc=on&cb_base=on

Ack, this is the tragedy of these cost tables, they have to represent the worst-case, even if it is bogusly higher than the average.

Fri, Dec 3, 12:27 AM · Restricted Project

Thu, Dec 2

lebedev.ri added a comment to D114988: [IR] `GetElementPtrInst`: per-index `inrange` support.

I think you are missing the whole point there. It is explicitly NOT the point of this patch
to be able to encode that some index must take values in the range [x, y). So if that is your proposal,
while it may be interesting, it's explicitly inferior, and does not solve the motivating case.

Right, sorry. I thought you wanted to solve the same issue discussed in the llvm-dev thread, but I see that you're more interested in SROA than AA here.

Yup.

Thu, Dec 2, 2:40 PM · Restricted Project
lebedev.ri added a reviewer for D114988: [IR] `GetElementPtrInst`: per-index `inrange` support: dblaikie.
Thu, Dec 2, 2:00 PM · Restricted Project
lebedev.ri requested review of D114988: [IR] `GetElementPtrInst`: per-index `inrange` support.

While I support the general goal of exposing GEP offset restrictions to IR,
I am quite strongly opposed to the implementation approach of extending inrange.
The core issue is that this is strongly tied to LLVM struct types and structural GEP indexing.
This will be a blow to opaque pointer usefulness and future offset canonicalization for GEPs.

While i'm certainly sympathetic to the opaque pointer future,
i'd also like to remind that they are just a tool.

Thu, Dec 2, 1:17 PM · Restricted Project
lebedev.ri requested review of D114988: [IR] `GetElementPtrInst`: per-index `inrange` support.
Thu, Dec 2, 12:24 PM · Restricted Project
lebedev.ri added inline comments to D114951: [Analysis][AArch64] Add on the address computational cost for gathers/scatters.
Thu, Dec 2, 6:13 AM · Restricted Project
lebedev.ri added inline comments to D114589: [DAG] Enable ISD::EXTRACT_ELEMENT SimplifyDemandedBits handling.
Thu, Dec 2, 12:26 AM · Restricted Project

Wed, Dec 1

lebedev.ri updated the diff for D114779: [LV][X86] Sink `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` further into TTI, disable for X86/AVX2+.

Hide the problem by defaulting TargetTransformInfoImplBase::useEmulatedMaskMemRefHack() to true.
If no triple is specifically requested, that is the TTI that is used.
This highlights that those tests are somewhat of a lie,
and raises questions about the implementation of those features.

Wed, Dec 1, 1:45 PM · Restricted Project
lebedev.ri updated subscribers of D114779: [LV][X86] Sink `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` further into TTI, disable for X86/AVX2+.

Ok, so i'm looking at those optsize.ll/tripcount.ll tests, and i'm not sure what exactly they are testing.
They don't specify the triple/attributes, so what costs do they expect to get?

Wed, Dec 1, 1:16 PM · Restricted Project
lebedev.ri resigned from D113179: [Passes] Move AggressiveInstCombine after InstCombine.
Wed, Dec 1, 10:37 AM · Restricted Project, Restricted Project
lebedev.ri added a comment to D113520: [SROA] Spill alloca's around non-capturing escapes via calls to allow alloca partitioning/promotion.

@lebedev.ri Your suggested approach makes sense to me at a basic level. Forming the alloca is starting to seem more and more like a hack, but we can come back to implementing partial mem2reg as a follow on code improvement.

Wed, Dec 1, 10:30 AM · Restricted Project
lebedev.ri added a comment to D113520: [SROA] Spill alloca's around non-capturing escapes via calls to allow alloca partitioning/promotion.

I think the obvious solution is:

  1. Keep all stores. (in terms of this patch, iterate over all slices and duplicate all store instructions to also store into the cloned alloca)
  2. Keep all loads (in terms of this patch, iterate over all slices and before the load from the original alloca, load from the cloned alloca and store into the original alloca) UNLESS we can omit a particular load because we can tell that there was no taint (may-write calls) on every path from every previous store.
Wed, Dec 1, 9:44 AM · Restricted Project
lebedev.ri accepted D114882: [PatternMatch] create and use matcher for 'not' that excludes undef elements.

LG

Wed, Dec 1, 8:54 AM · Restricted Project
lebedev.ri added inline comments to D114882: [PatternMatch] create and use matcher for 'not' that excludes undef elements.
Wed, Dec 1, 8:40 AM · Restricted Project
lebedev.ri added inline comments to D113289: LICM: Hoist LOAD without STORE.
Wed, Dec 1, 6:52 AM · Restricted Project
lebedev.ri added a reviewer for D114676: [DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling: qiucf.

Not really familiar with PPC, but the TargetLowering.cpp part makes sense to me.

Wed, Dec 1, 3:03 AM · Restricted Project
lebedev.ri retitled D114676: [DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling from [DAG] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling to [DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling.
Wed, Dec 1, 3:01 AM · Restricted Project
lebedev.ri added a comment to D114766: If constrained intrinsic is replaced, remove its side effect.

This is starting to sound like the isnan threads all over again. I don't envision making much progress with this approach...

Wed, Dec 1, 12:59 AM · Restricted Project
lebedev.ri added a comment to D114832: [SROA] Improve SROA to prevent generating redundant coalescing operations..

That's a lot of complexity. It sounds like we are missing some cleanup transforms in some other passes?
Can you please post a standalone example of the "bad" codegen (via a godbolt link) that you are trying to avoid?

Wed, Dec 1, 12:27 AM · Restricted Project

Tue, Nov 30

lebedev.ri added a comment to D113520: [SROA] Spill alloca's around non-capturing escapes via calls to allow alloca partitioning/promotion.

Spent a decent amount of time looking at this, but haven't yet fully formed a recommendation. I'm going to summarize my findings to date in the hopes this is useful.

Thank you for taking a look!

Tue, Nov 30, 1:19 PM · Restricted Project
lebedev.ri accepted D113371: [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT.

LGTM, not sure if someone else should take a look.
Thank you.

Tue, Nov 30, 10:12 AM · Restricted Project
lebedev.ri added a comment to D113371: [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT.

Hm, i'm not sure i'm following, why is the truncation free if the dropped bits are zeros / why it won't be free otherwise?

It's not - IsTruncateFree is only true if we've extended both inputs, in which case truncating the inputs should be free, and we're better off doing that so we can perform the MULH on a smaller vector.

Tue, Nov 30, 9:44 AM · Restricted Project
lebedev.ri added a comment to D114766: If constrained intrinsic is replaced, remove its side effect.

Moving this logic to wouldInstructionBeTriviallyDead() does not seem right, because the decision about removing side effect comes from constant evaluation and not from the state of call object.

I'm sympathetic to this argument. If a transformation pass determines that a constrained intrinsic can be removed then it would be helpful if that pass could note that fact. This doesn't help us with analysis passes, though. And I'm a little worried about having to explicitly mark constrained intrinsics as removable since I don't know that we do that for any other instruction. If constrained intrinsics are the only time we do this then it seems error prone. Maybe I'm wrong?

The ability to remove the side effect is vital for the implementation of floating point arithmetic. There are several possible optimization techniques which depend on this possibility. For example, if FP operations in a function have attributes "round.dynamic" and "fpexcept.ignore" and we know that the rounding mode is not changed in this function, we could remove the side effect of all the operations, which can give the performance of non-constrained operations. As we cannot mix ordinary and constrained operations in the same function, the only way is to control the side effect.

Tue, Nov 30, 8:22 AM · Restricted Project
lebedev.ri added a comment to D113371: [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT.

Hm, i'm not sure i'm following, why is the truncation free if the dropped bits are zeros / why it won't be free otherwise?

Tue, Nov 30, 8:14 AM · Restricted Project
lebedev.ri updated the diff for D104156: [DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask').

Now that the costmodel patch spam has wound down,
rebase&ping.

Tue, Nov 30, 2:04 AM · Restricted Project
lebedev.ri requested review of D114779: [LV][X86] Sink `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` further into TTI, disable for X86/AVX2+.
Tue, Nov 30, 1:11 AM · Restricted Project
lebedev.ri abandoned D111220: [X86][LV][TTI][Costmodel] LoopVectorizer: don't use `TTI::isLegalMaskedGather()` hook, introduce `TTI::shouldUseMaskedGatherForVectorization()`.

D111460 landed.

Tue, Nov 30, 1:08 AM · Restricted Project
lebedev.ri abandoned D111363: [VectorCombine] Scalarize vector GEP if that isn't more costly.

D111460 landed.

Tue, Nov 30, 1:08 AM · Restricted Project

Mon, Nov 29

lebedev.ri committed rG8cd782487fe6: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()` (authored by lebedev.ri).
[X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`
Mon, Nov 29, 11:48 PM
lebedev.ri closed D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`.
Mon, Nov 29, 11:48 PM · Restricted Project
lebedev.ri accepted D114729: [InstCombine] try to fold 'or' into 'mul' operand.

LG, thank you.

Mon, Nov 29, 12:41 PM · Restricted Project
lebedev.ri added inline comments to D114729: [InstCombine] try to fold 'or' into 'mul' operand.
Mon, Nov 29, 12:20 PM · Restricted Project
lebedev.ri added inline comments to D114729: [InstCombine] try to fold 'or' into 'mul' operand.
Mon, Nov 29, 11:30 AM · Restricted Project
lebedev.ri added a comment to D114704: [llvm][Inline] Add FunctionSimplificationPipeline to module inliner pipeline.

Tests?
Presumably this is default-off?

Mon, Nov 29, 11:18 AM · Restricted Project
lebedev.ri added a comment to D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`.

I can't predict if the new shuffle patterns will be better or worse on each affected platform, least of all looking at IR, not ASM.

But if those changes have positive changes in code generation (translating to better benchmark numbers), then this looks good to me.

Mon, Nov 29, 10:53 AM · Restricted Project
lebedev.ri updated the diff for D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`.

Rebased again.

Mon, Nov 29, 7:43 AM · Restricted Project
lebedev.ri committed rG7e73c2a66a8b: [X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be… (authored by lebedev.ri).
[X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be…
Mon, Nov 29, 7:37 AM
lebedev.ri closed D114697: [X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be folded into a shuffle.
Mon, Nov 29, 7:37 AM · Restricted Project
lebedev.ri added a comment to D114697: [X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be folded into a shuffle.

LGTM - at least we know the code path is touched :)

Mon, Nov 29, 7:30 AM · Restricted Project
lebedev.ri updated the diff for D114697: [X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be folded into a shuffle.

How easy is it to add test coverage for this?

Mon, Nov 29, 5:49 AM · Restricted Project
lebedev.ri committed rG5e96553608a1: [NFC][X86][LV][Costmodel] Add most basic test for masked interleaved load (authored by lebedev.ri).
[NFC][X86][LV][Costmodel] Add most basic test for masked interleaved load
Mon, Nov 29, 5:48 AM
lebedev.ri updated the diff for D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`.

Rebased onto D114697.

Mon, Nov 29, 4:16 AM · Restricted Project
lebedev.ri requested review of D114697: [X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be folded into a shuffle.
Mon, Nov 29, 4:15 AM · Restricted Project
lebedev.ri added inline comments to D114316: [X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update `getInterleavedMemoryOpCostAVX512()`.
Mon, Nov 29, 4:04 AM · Restricted Project
lebedev.ri updated the diff for D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`.

Okay!
It took some time, but the costmodel for AVX512's masked interleaved has been completed.
I believe this addresses all the review notes here.
Does anyone have any further closing remarks?
If not, i will be landing this shortly.

Mon, Nov 29, 3:56 AM · Restricted Project
lebedev.ri committed rGcffe3a084f87: [X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update… (authored by lebedev.ri).
[X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update…
Mon, Nov 29, 3:45 AM
lebedev.ri closed D114316: [X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update `getInterleavedMemoryOpCostAVX512()`.
Mon, Nov 29, 3:45 AM · Restricted Project
lebedev.ri added a comment to D114316: [X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update `getInterleavedMemoryOpCostAVX512()`.

LGTM

Mon, Nov 29, 3:41 AM · Restricted Project
lebedev.ri added a comment to D114316: [X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update `getInterleavedMemoryOpCostAVX512()`.

ping

Mon, Nov 29, 1:35 AM · Restricted Project
lebedev.ri added a comment to D113520: [SROA] Spill alloca's around non-capturing escapes via calls to allow alloca partitioning/promotion.

ping

Mon, Nov 29, 1:35 AM · Restricted Project

Sun, Nov 28

lebedev.ri added a comment to D114650: [SCEV] Construct SCEV iteratively (WIP)..

To ask the obvious question: Can we get away with simply adding a getSCEV recursion limit instead? Is there reason to believe that a sensible recursion cutoff (similar to the many SCEV already has) would adversely affect analysis quality in practice? I'd rather avoid the additional complexity if we can.

Sun, Nov 28, 1:33 PM · Restricted Project
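
The trade-off raised in the comment above (a recursion cutoff versus iterative construction) can be illustrated with a small toy analysis. This is only a sketch in Python under stated assumptions, not SCEV: expressions are either integer leaves or `('add', lhs, rhs)` tuples, and `analyze_recursive` and `analyze_iterative` are hypothetical names.

```python
def analyze_recursive(node, depth=0, limit=32):
    """Recursive analysis with a depth cutoff; returns None ("unknown") past the limit."""
    if isinstance(node, int):
        return node
    if depth >= limit:
        # Cutoff reached: give up on this subexpression rather than recurse further.
        return None
    _, lhs, rhs = node
    left = analyze_recursive(lhs, depth + 1, limit)
    right = analyze_recursive(rhs, depth + 1, limit)
    if left is None or right is None:
        return None
    return left + right


def analyze_iterative(root):
    """Same analysis driven by an explicit worklist, so depth is not bounded by the call stack."""
    results = {}  # id(node) -> computed value
    stack = [root]
    while stack:
        node = stack[-1]
        if isinstance(node, int):
            results[id(node)] = node
            stack.pop()
            continue
        _, lhs, rhs = node
        pending = [child for child in (lhs, rhs) if id(child) not in results]
        if pending:
            # Operands not analyzed yet: visit them first, then come back to this node.
            stack.extend(pending)
        else:
            results[id(node)] = results[id(lhs)] + results[id(rhs)]
            stack.pop()
    return results[id(root)]
```

On a deeply nested chain the cutoff version is simple but loses precision (it reports "unknown"), while the worklist version keeps full precision at the price of the extra bookkeeping the comment is wary of.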
lebedev.ri added a comment to D114386: [InstCombine] use decomposeBitTestICmp to make icmp (trunc X), C more consistent.

LGTM, sorry for the delay, i like this.

Thanks for the suggestion and review! Let's see if this causes any fallout...

Sun, Nov 28, 7:37 AM · Restricted Project

Fri, Nov 26

lebedev.ri added a comment to D114650: [SCEV] Construct SCEV iteratively (WIP)..

Thanks for looking into this!
I previously looked into this: https://github.com/LebedevRI/llvm-project/commit/6b006aa21caf018aa0f280828899d510274c8444
... but as it is apparent, never posted the diff. Nonetheless, it may be interesting to look at.

Fri, Nov 26, 10:25 AM · Restricted Project
lebedev.ri added inline comments to D110292: Use a deterministic order when updating the DominatorTree.
Fri, Nov 26, 7:16 AM · Restricted Project
lebedev.ri accepted D113497: [IPSCCP] Support unfeasible default dests for switch..

If SCCP runs again, will it then again create a new unreachable block
for the default that is already pointing at the unreachable block?
Should it not do such redundant work?

Fri, Nov 26, 5:32 AM · Restricted Project
lebedev.ri accepted D114386: [InstCombine] use decomposeBitTestICmp to make icmp (trunc X), C more consistent.

LGTM, sorry for the delay, i like this.

Fri, Nov 26, 3:03 AM · Restricted Project
lebedev.ri added a comment to D114272: [InstCombine] Add two optimizations for mul-and-icmp patterns.

Thank you for looking into this.

Fri, Nov 26, 2:42 AM · Restricted Project
lebedev.ri updated the summary of D114272: [InstCombine] Add two optimizations for mul-and-icmp patterns.
Fri, Nov 26, 2:38 AM · Restricted Project
lebedev.ri resigned from D114579: [clang-tidy] Exempt _MSVC_EXECUTION_CHARACTER_SET from cppcoreguidelines-macro-usage.
Fri, Nov 26, 2:03 AM · Restricted Project
lebedev.ri resigned from D114361: [MachineCSE] Add an option to enable global CSE.
Fri, Nov 26, 2:01 AM · Restricted Project

Thu, Nov 25

lebedev.ri added a comment to D113497: [IPSCCP] Support unfeasible default dests for switch..

Not really familiar with this pass, but the transform sounds legal, although i would have guessed you'd want to instead point it to a new block with unreachable?

Thu, Nov 25, 9:12 AM · Restricted Project

Wed, Nov 24

lebedev.ri added a comment to D114533: LLVM IR should allow bitcast between address spaces with the same size..

Patch description should mention that this avoids the need to introduce ptrtoint/inttoptr pairs

Wed, Nov 24, 7:06 AM · Restricted Project, Restricted Project
lebedev.ri updated the diff for D114316: [X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update `getInterleavedMemoryOpCostAVX512()`.

Rebased, NFC.
As far as i currently know, this is the last prerequisite for D114316.

Wed, Nov 24, 6:45 AM · Restricted Project
lebedev.ri committed rGcd8d21953691: [X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to… (authored by lebedev.ri).
[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to…
Wed, Nov 24, 6:41 AM
lebedev.ri closed D114315: [X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 32 bit when have AVX512DQ.
Wed, Nov 24, 6:41 AM · Restricted Project
lebedev.ri added a comment to D114533: LLVM IR should allow bitcast between address spaces with the same size..

Is there an RFC? This seems quite like asking for problems.

Wed, Nov 24, 6:01 AM · Restricted Project, Restricted Project
lebedev.ri added inline comments to D114357: [CodeGen] Change getAnyExtOrTrunc to use SIGN_EXTEND for some constants.
Wed, Nov 24, 2:01 AM · Restricted Project
lebedev.ri added a comment to D114315: [X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 32 bit when have AVX512DQ.

Poke, i think these two patches (this&D114316) are the last bits needed for D111460.

Wed, Nov 24, 1:32 AM · Restricted Project