lebedev.ri (Roman Lebedev)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 27 2012, 6:35 AM (307 w, 6 d)

Recent Activity

Today

lebedev.ri added a comment to D51929: [DAGCombiner] use UADDO to optimize saturated unsigned add.

I'm not following the 'adc' suggestion though. That uses the carry from a previous op in the addition. What would that code sequence look like?

Uhm, after thinking a bit more, i think we'd only care about ADD? I haven't really thought about it though.
But in ADC case, i guess the same as with ADD, but prefixed with CLC -> no point in using ADC in the first place.

Fri, Sep 21, 11:35 AM
lebedev.ri accepted D51929: [DAGCombiner] use UADDO to optimize saturated unsigned add.

Do we somehow enforce that in %r = select i1 %c, i32 -1, i32 %a, -1 is in the middle?
If not, we miss at least one case i think:

That's correct. I think there are also 4 swapped variants for the pattern with a variable and a 'not' op (online version of Alive looks dead? cc'ing @nlopes):
...

My plan is to canonicalize all of the patterns in IR. If I'm seeing it correctly, there shouldn't be anything blocking those canonicalizations because we only try to form the uaddo here when the cmp (setcc) has one use (the select). That should let us get by with just the basic matching here in the backend.

Ok, sounds good. At worst, this can be extended.
I'm not sure why there is no x86 test coverage.
Won't ADC work for this purpose?
ADD evaluates the result for both signed and unsigned integer operands and sets the CF and OF flags to indicate a carry (overflow) in the signed or unsigned result, respectively, too.
Some plumbing missing?

Fri, Sep 21, 10:41 AM
lebedev.ri added a comment to D52324: [ValueTracking] Allow select patterns to work on vectors in more places.

You can always write an unit-test, too.

Fri, Sep 21, 8:03 AM
lebedev.ri added a reviewer for D52360: [clang-tidy] Fix for performance-unnecessary-value-param, performance-unnecessary-copy-initialization and performance-for-range-copy: lebedev.ri.

Better, but these blacklists need to be config options, not random hardcoded globs.
See e.g. google-runtime-references.WhiteListTypes.

Fri, Sep 21, 7:25 AM · Restricted Project
lebedev.ri added inline comments to D52047: [clangd] Add building benchmark and memory consumption tracking.
Fri, Sep 21, 6:42 AM · Restricted Project
lebedev.ri added a comment to D52300: [clangd] Implement VByte PostingList compression.

Also, I'll refine D52047 a little bit and I believe that is should be way easier to understand performance + memory consumption once we have these benchmarks in. Both @ioeric and @ilya-biryukov expressed their concern with regard to the memory consumption "benchmark" and suggested a separate binary. While this seems fine to me, I think it's important to keep performance + memory tracking infrastructure easy to use (in this sense scattering different metrics across multiple binaries makes it less accessible and probably introduce some code duplication) and therefore using this "trick" is OK to me, but I don't have a strong opinion about this. What do you think, @sammccall?

FWIW, I think the "trick" for memory benchmark is fine. I just think we should add proper output to make the trick clear to users, as suggested in the patch comment.

Fri, Sep 21, 5:42 AM · Restricted Project
lebedev.ri added a comment to D45179: [libc++] Add _LIBCPP_ENABLE_NODISCARD and _LIBCPP_NODISCARD_EXT to allow pre-C++2a [[nodiscard]].

I don't think it ever landed. I'll try to get it in this week.

Fri, Sep 21, 4:38 AM
lebedev.ri added a comment to D52348: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern.

I believe i will have to move BZHI selection from tablegen into here, due to https://reviews.llvm.org/D48768#inline-461084,
else we will have test coverage issues, and code duplication.

Fri, Sep 21, 4:35 AM
lebedev.ri added inline comments to D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask..
Fri, Sep 21, 4:26 AM
lebedev.ri added inline comments to D52347: [X86][BtVer2] Fix latency and resource cycles of AVX 256-bit zero-idioms..
Fri, Sep 21, 3:03 AM
lebedev.ri added a dependency for D52348: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern: D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.
Fri, Sep 21, 2:56 AM
lebedev.ri added a dependent revision for D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern: D52348: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern.
Fri, Sep 21, 2:56 AM
lebedev.ri created D52348: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern.
Fri, Sep 21, 2:56 AM
lebedev.ri added a comment to D51929: [DAGCombiner] use UADDO to optimize saturated unsigned add.

Do we somehow enforce that in %r = select i1 %c, i32 -1, i32 %a, -1 is in the middle?
If not, we miss at least one case i think:

$ ~/src/alive/alive.py /tmp/test.txt 
----------------------------------------
Optimization: 1
Precondition: true
  %a = add i32 %x, 42
  %c = icmp ugt i32 %x, -43
  %r = select i1 %c, i32 -1, i32 %a
=>
  %c2 = icmp ult i32 %x, -43
  %r = select i1 %c2, i32 %a, i32 -1
Fri, Sep 21, 1:15 AM
lebedev.ri added inline comments to D52294: [InstCombine] Fix incongruous GEP type addrspace .
Fri, Sep 21, 1:01 AM
lebedev.ri added a comment to D52315: [clang-tidy] Fix for performance-unnecessary-value-param, performance-unnecessary-copy-initialization and performance-for-range-copy.

I still wonder what true positives could we get without checking the size? No real-life types come to my mind with size of a pointer but really expensive copy operations. What could be so really expensive when copying a single machine world?

Fri, Sep 21, 12:51 AM · Restricted Project
lebedev.ri added inline comments to D52281: [clang-tidy] Add modernize check to use std::invoke in generic code.
Fri, Sep 21, 12:42 AM · Restricted Project

Yesterday

lebedev.ri retitled D52324: [ValueTracking] Allow select patterns to work on vectors in more places from Allow select patterns to work on vectors in more places to [ValueTracking] Allow select patterns to work on vectors in more places.
Thu, Sep 20, 2:45 PM
lebedev.ri added a comment to D52324: [ValueTracking] Allow select patterns to work on vectors in more places.

I'm somewhat sure this kind of change needs it's own tests, outside of $backend.

Thu, Sep 20, 2:40 PM
lebedev.ri added inline comments to D52248: [SEMA] ignore duplicate declaration specifiers from typeof exprs.
Thu, Sep 20, 1:15 PM
lebedev.ri accepted D52313: [clang-doc] Avoid parsing undefined base classes.

If it no longer crashes, i guess you can test for that?

Thu, Sep 20, 12:24 PM · Restricted Project
lebedev.ri added a reviewer for D52315: [clang-tidy] Fix for performance-unnecessary-value-param, performance-unnecessary-copy-initialization and performance-for-range-copy: lebedev.ri.
Thu, Sep 20, 12:24 PM · Restricted Project
lebedev.ri updated the diff for D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.

I'm gonna stop here, i think this is the state for the review.

Thu, Sep 20, 12:05 PM
lebedev.ri updated the diff for D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.

And get rid of a last few unneeded movzbl.

Thu, Sep 20, 11:51 AM
lebedev.ri added a comment to D52315: [clang-tidy] Fix for performance-unnecessary-value-param, performance-unnecessary-copy-initialization and performance-for-range-copy.

This looks questionable to me.
I don't disagree with the reasons stated about llvm types.
But is that *always* true?
I would personally be very surprized, and consider this a false-positive.

Thu, Sep 20, 10:29 AM · Restricted Project
lebedev.ri updated the diff for D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.

Use ISD::ANY_EXTEND. This gets rid of the extra movzbl of the NBits.
At least that is what D48768 already produced.
If NBits is larger than 64 (in 64-bit case), the shift was already undefined, i think?

Thu, Sep 20, 8:59 AM
lebedev.ri added a comment to D52285: [SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI.

Unless there is a need to conditionally use the one-use variant,
two separate functions seem cleaner. But no strong opinion here.

Thu, Sep 20, 8:13 AM
lebedev.ri added inline comments to D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.
Thu, Sep 20, 8:00 AM
lebedev.ri updated the diff for D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.

TBM BEXTRI is of no interest to us here.

Thu, Sep 20, 7:47 AM
lebedev.ri added a dependent revision for D48490: [NFC][x86][AArch64] Add BEXTR-like test patterns.: D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.
Thu, Sep 20, 7:35 AM
lebedev.ri added a dependency for D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern: D48490: [NFC][x86][AArch64] Add BEXTR-like test patterns..
Thu, Sep 20, 7:35 AM
lebedev.ri created D52304: [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern.
Thu, Sep 20, 7:34 AM
lebedev.ri added inline comments to D52293: [X86][BMI] BEXTR: handle (X >> C1) & (C2 << C3) as ((X >> (C1 + C3)) & C2) << C3 (PR38938).
Thu, Sep 20, 5:22 AM
lebedev.ri created D52293: [X86][BMI] BEXTR: handle (X >> C1) & (C2 << C3) as ((X >> (C1 + C3)) & C2) << C3 (PR38938).
Thu, Sep 20, 3:03 AM
lebedev.ri updated the diff for D48490: [NFC][x86][AArch64] Add BEXTR-like test patterns..

One last rebase

Thu, Sep 20, 12:55 AM

Wed, Sep 19

lebedev.ri added inline comments to D52281: [clang-tidy] Add modernize check to use std::invoke in generic code.
Wed, Sep 19, 11:09 PM · Restricted Project
lebedev.ri added a comment to D51216: Fix IRBuilder.CreateFCmp(X, X) misfolding.

Clearly, since this patch is here, the previous choice was wrong (i guess?).
So some more blurb with explanation as to why it is so would be good.
(i'm guessing nan / inf etc?)

First, I'll note that usually when trying to fold a FCmp with constant operands, the operands will be ConstantFP instead of ConstantExpr, so they'll hit the code path at ConstantFold.cpp:1789. This code path is only used when one of the operands is constant, but not a literal floating-point value, or I think this bug would be breaking lots of basic stuff. I couldn't find a way to hit this path with Clang, for example.

If V1 == V2 in evaluateFCmpRelation, you can infer that the value of V1 must be the same as the value of V2.

Wed, Sep 19, 1:23 PM
lebedev.ri added inline comments to D51215: Fix misfolding of IRBuilder.CreateICmp(int_ty X, bitcast (float_ty Y) to int_ty).
Wed, Sep 19, 7:23 AM
lebedev.ri added a comment to D51216: Fix IRBuilder.CreateFCmp(X, X) misfolding.

What i think is missing here, is that

The code incorrectly inferred that the relationship of a constant expression X to itself is FCMP_OEQ (ordered and equal), when it's actually FCMP_UEQ (unordered or equal).

simply states it as the matter of a fact.
Clearly, since this patch is here, the previous choice was wrong (i guess?).
So some more blurb with explanation as to why it is so would be good.
(i'm guessing nan / inf etc?)

Wed, Sep 19, 7:23 AM
lebedev.ri accepted D52263: [InstCombine] Handle vector compares in foldGEPIcmp().

LG with a nit.

Wed, Sep 19, 6:54 AM
lebedev.ri added inline comments to D52263: [InstCombine] Handle vector compares in foldGEPIcmp().
Wed, Sep 19, 6:14 AM
lebedev.ri accepted D52262: [benchmark] Cherrypick fix for MinGW/ARM from upstream.

LG

Wed, Sep 19, 5:46 AM

Tue, Sep 18

lebedev.ri added a comment to D50901: [clang][ubsan] Split Implicit Integer Truncation Sanitizer into unsigned and signed checks.

@vsk thanks for taking a look!

Tue, Sep 18, 1:46 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D50901: [clang][ubsan] Split Implicit Integer Truncation Sanitizer into unsigned and signed checks.

@vsk thanks for taking a look!

Tue, Sep 18, 1:46 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D50902: [compiler-rt][ubsan] Split Implicit Integer Truncation Sanitizer into unsigned and signed checks.

@vsk / @filcab - ping.
Assuming the clang part gets accepted, is this OK?

Tue, Sep 18, 12:57 PM · Restricted Project
lebedev.ri added a comment to D50901: [clang][ubsan] Split Implicit Integer Truncation Sanitizer into unsigned and signed checks.

@rsmith - ping.
This one should be rather uncontroversial i think? Is this moving in the direction you suggested? :)

Tue, Sep 18, 12:57 PM · Restricted Project, Restricted Project
lebedev.ri added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.

I still don't understand the (LargestInt == 0) hack. I think everyone agrees that the existing code is wrong, so why preserve the existing behavior if we don't have a valid datalayout? Ie, there's no risk for real targets because they always have a non-zero getLargestLegalIntTypeSizeInBits()?

Agreed, I'll remove that part then. That was only to make other testcases happy by preserving existing behavior. Thanks

Tue, Sep 18, 9:42 AM
lebedev.ri accepted D52181: [benchmark] Lowercase windows specific includes.

LG

Tue, Sep 18, 1:43 AM

Mon, Sep 17

lebedev.ri added a comment to D52146: [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask.

Is this a generalization of D52001 (and if so, can we remove that code as a special-case of the more general pattern)?

No, i don't think so, why?

Like i wrote in the description:

Two folds are happening here:

  1. https://rise4fun.com/Alive/oaFX
  2. And then foldICmpWithHighBitMask() (D52001) fires on newly formed IR: https://rise4fun.com/Alive/wsP4

I.e. this fold actually allows D52001 to happen.

Sorry, I misread what was happening in this patch series.

Great. I was worrying i was missing something obvious here.

Mon, Sep 17, 1:13 PM
lebedev.ri added a reviewer for D52181: [benchmark] Lowercase windows specific includes: lebedev.ri.

Would be great if this could go upstream first, and then backported.

Mon, Sep 17, 11:41 AM
lebedev.ri added a comment to D52146: [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask.

Is this a generalization of D52001 (and if so, can we remove that code as a special-case of the more general pattern)?

Mon, Sep 17, 7:13 AM
lebedev.ri added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

One more thought then on this:

If we want to distinguish optimizing for size from optimizing for speed, that belongs in the backend

that means we shouldn't be doing any memcpy lowering at all in InstCombiner::SimplifyAnyMemTransfer. In other words, if I rewrite this patch to actually strip out any memcpy lowering here in InstCombine, would that be an acceptable approach?

Mon, Sep 17, 5:22 AM
lebedev.ri added a comment to D52164: [InstSimplify] Fold ne/eq comparison of a pointer produced by a noalias function.

Are there tests with comparisons with nullptr? (Also mind the nullptr is valid thingy)

Mon, Sep 17, 1:41 AM
lebedev.ri added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

Thanks for the feedback and suggestions! Summarising where we are:

  • I think this WIP patch generates the code that we want

we ?
Do keep in mind that there is more than one backend, more than one target architecture.

Mon, Sep 17, 1:22 AM
lebedev.ri added a comment to D50250: [clang][ubsan] Implicit Conversion Sanitizer - integer sign change - clang part.

It might help if you're more specific about whose review you're asking for.

Mon, Sep 17, 12:12 AM · Restricted Project, Restricted Project

Sun, Sep 16

lebedev.ri updated the diff for D48490: [NFC][x86][AArch64] Add BEXTR-like test patterns..

Rebased.

Sun, Sep 16, 11:47 PM
lebedev.ri added a comment to D48491: [X86] Select BEXTR when there is only BMI1..

Can we DAG combine to X86ISD::BEXTR for these cases? Then we can properly check one use of the inner parts of the pattern.

Do we care whether the *entire* old patter goes away, or just one more instruction?
But yes, should work, let's see.

Sun, Sep 16, 11:11 PM
lebedev.ri added a comment to D48491: [X86] Select BEXTR when there is only BMI1..

Just to reiterate, i don't see what else we can do here.
We do need to emit two instructions.
And i do think we should indeed be emitting bextr in these situations.

Sun, Sep 16, 12:07 PM
lebedev.ri added a comment to D48490: [NFC][x86][AArch64] Add BEXTR-like test patterns..

@lebedev.ri What happened with this patch? I can see x86/extract-lowbits.ll in trunk

Sun, Sep 16, 11:41 AM
lebedev.ri added a dependency for D52148: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask: D52147: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask.
Sun, Sep 16, 6:07 AM
lebedev.ri added a dependent revision for D52147: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask: D52148: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask.
Sun, Sep 16, 6:07 AM
lebedev.ri created D52148: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask.
Sun, Sep 16, 6:06 AM
lebedev.ri updated the summary of D52146: [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask.
Sun, Sep 16, 4:12 AM
lebedev.ri added a dependent revision for D52146: [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask: D52147: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask.
Sun, Sep 16, 1:37 AM
lebedev.ri added a dependency for D52147: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask: D52146: [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask.
Sun, Sep 16, 1:37 AM
lebedev.ri created D52147: [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask.
Sun, Sep 16, 1:37 AM
lebedev.ri updated the summary of D52146: [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask.
Sun, Sep 16, 1:33 AM
lebedev.ri created D52146: [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask.
Sun, Sep 16, 1:27 AM

Sat, Sep 15

lebedev.ri added a reviewer for D52137: Added warning for unary minus used with unsigned type: thakis.
Sat, Sep 15, 11:25 PM
lebedev.ri added a comment to D52137: Added warning for unary minus used with unsigned type.

found some occurrences of this expression on our code base.

Sat, Sep 15, 11:35 AM
lebedev.ri added a comment to D52137: Added warning for unary minus used with unsigned type.

/home/xbolva00/LLVM/build/lib/clang/8.0.0/include/bmiintrin.h:312:16: error: unary minus operator applied to type 'unsigned long long', result value is still unsigned

return __X & -__X;

@RKSimon what do you think? valid? Can we fix that header to return X & -(int)X;?
@craig.topper

Sat, Sep 15, 11:23 AM
lebedev.ri added a comment to D52137: Added warning for unary minus used with unsigned type.

I don't think this makes much sense.

Sat, Sep 15, 11:16 AM
lebedev.ri added inline comments to D52136: [clang-tidy] Add modernize-concat-nested-namespaces check.
Sat, Sep 15, 7:50 AM · Restricted Project
lebedev.ri added inline comments to D52136: [clang-tidy] Add modernize-concat-nested-namespaces check.
Sat, Sep 15, 7:38 AM · Restricted Project
lebedev.ri added a comment to D52135: [Clang-Tidy: modernize] Fix for modernize-redundant-void-arg: complains about variable cast to void.

This looks weird.
It will now completely skip all the explicitly instantiated functions, no?
I'd think the problem that needs to be fixed is that it considers the local variable int a_template; to be in the function argument list.

Sat, Sep 15, 7:28 AM · Restricted Project
lebedev.ri added inline comments to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.
Sat, Sep 15, 7:19 AM
lebedev.ri added a comment to D52062: [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708).

LGTM

Thank you for the review!

Sat, Sep 15, 5:04 AM
lebedev.ri added a comment to rL342092: [TSan] Update test values.

Hm, so it wasn't a sanitizer buildbot being flaky as usual.
Sorry for the trouble and thanks for the fix :)

Sat, Sep 15, 4:44 AM
lebedev.ri added a comment to D52120: [analyzer] Treat std::{move,forward} as casts in ExprMutationAnalyzer..

Though some of that is still false-positives (pointers, va_arg, callback lambdas passed as templated function param), but i'll file further bugs i guess.

Filed the ones that i can pinpoint, marked them as blockers of PR38890

Sat, Sep 15, 3:10 AM

Fri, Sep 14

lebedev.ri added a dependent revision for D52120: [analyzer] Treat std::{move,forward} as casts in ExprMutationAnalyzer.: D51870: [private][clang-tidy] New bugprone-loop-variable-mutations check.
Fri, Sep 14, 11:55 PM
lebedev.ri added a dependency for D51870: [private][clang-tidy] New bugprone-loop-variable-mutations check: D52120: [analyzer] Treat std::{move,forward} as casts in ExprMutationAnalyzer..
Fri, Sep 14, 11:55 PM · Restricted Project
lebedev.ri added inline comments to D52093: [ToolChains] Linux.cpp: Use the same style across all all multi-arch includes. NFC..
Fri, Sep 14, 11:04 PM · Restricted Project
lebedev.ri accepted D52120: [analyzer] Treat std::{move,forward} as casts in ExprMutationAnalyzer..

Thank you for working on this!

Fri, Sep 14, 2:00 PM
lebedev.ri added a comment to D52120: [analyzer] Treat std::{move,forward} as casts in ExprMutationAnalyzer..

@lebedev.ri could you help test whether this fully resolves PR38891? Thanks!

Fri, Sep 14, 1:34 PM
lebedev.ri added inline comments to D52121: [X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)).
Fri, Sep 14, 1:23 PM
lebedev.ri added inline comments to D52066: [Driver] Fix missing MultiArch include dir on powerpcspe.
Fri, Sep 14, 6:07 AM
lebedev.ri added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

Thanks both, those are fair points.

For a bit more context, this is the problem that I am trying to solve:

void foo (char *A, char *B) {
  memcpy(A, B, 8);
}

compiled with -Oz -mno-unaligned-access this results in this disaster:

ldrb	r3, [r1]
ldrb	r4, [r1, #1]
ldrb	r5, [r1, #2]
ldrb	r6, [r1, #3]
ldrb	r1, [r1, #5]
ldrb	lr, [r2, #4]!
ldrb.w	r12, [r2, #2]
ldrb	r2, [r2, #3]
strb	r1, [r0, #5]
strb	r6, [r0, #3]
strb	r5, [r0, #2]
strb	r4, [r0, #1]
strb	r3, [r0]
strb	lr, [r0, #4]!
strb	r2, [r0, #3]
strb.w	r12, [r0, #2]
ldr	r11, [sp], #4

but forgetting about this no-unaligned case, we see that with alignment support the code bloat is already there:

ldr	r2, [r1]
ldr	r1, [r1, #4]
str	r1, [r0, #4]
str	r2, [r0]
bx	lr

So, for the decision making here in InstCombine, which is mostly target independent at the moment, I would like to ignore the whole aligned/unaligned business. And what I want to generate is of course just this:

movs	r2, #8
b	__aeabi_memcpy

Are you sure you shouldn't be fixing this in the backend?

Fri, Sep 14, 4:51 AM
lebedev.ri added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

Two questions:

  1. Should this be somehow more parametrized, rather than hardcoding magical Size > 4? Some target info?
  2. Is there some reverse transform, that tries to actually for memcpy from such expanded load+store pairs? Shoukd it e disabled too, to avoid looping?
Fri, Sep 14, 2:58 AM

Thu, Sep 13

lebedev.ri added a dependent revision for D52001: [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708): D52062: [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708).
Thu, Sep 13, 2:43 PM
lebedev.ri added a dependency for D52062: [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708): D52001: [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708).
Thu, Sep 13, 2:43 PM
lebedev.ri created D52062: [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708).
Thu, Sep 13, 2:43 PM
lebedev.ri added a comment to D52054: Add missing MC dependency to llvm-exegesis/CMakeLists.txt.

And a follow-up in rL342182.

Thu, Sep 13, 2:27 PM
lebedev.ri added a comment to D51984: Fixes for `LLVM_LINK_LLVM_DYLIB` && Polly..

@lebedev.ri, I told him he could commit the changes without review because I saw them in this patch, and in my mind, as an experienced developer in this area of LLVM, I saw those changes as obvious and good. That is not in violation of the developer policy, further that discussion is right here in this review, so if you have a problem with it you should point it at me, not @DiamondLovesYou.

Thu, Sep 13, 2:14 PM
lebedev.ri added a comment to D51949: [WIP][clang-tidy] initial ideas to isolate variable declarations.

Yes, do you think it should be included in the diag?

Yes, please :) Else, the message seems a bit too empty.
I don't think it should point (via NOTE:) at the each decl though.

Thu, Sep 13, 1:26 PM · Restricted Project
lebedev.ri added inline comments to D51949: [WIP][clang-tidy] initial ideas to isolate variable declarations.
Thu, Sep 13, 1:09 PM · Restricted Project
lebedev.ri added a comment to D52001: [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708).

I was wondering if it would be easier to read if we swapped everything as a first step, so something like this in the existing code:

// We want X to be the icmp's second operand, so swap if not.
Value *Cmp0 = Cmp.getOperand(0), *X = Cmp.getOperand(1);
ICmpInst::Predicate Pred = Cmp.getPredicate(), NewPred;
if (match(X, m_OneUse(m_Shl(m_One(), m_Value())))) {
  std::swap(Cmp0, X);
  Pred = Cmp.getSwappedPredicate();
}
Value *Y;
if (!match(Cmp0, m_OneUse(m_Shl(m_One(), m_Value(Y)))))
  return nullptr;
if (Pred == ICmpInst::ICMP_ULE) NewPred = ICmpInst::ICMP_NE;
else if (Pred == ICmpInst::ICMP_UGT) NewPred = ICmpInst::ICMP_EQ;
else return nullptr;

I'm not sure.
I think we really should have two separate matches, and two switch()es.
Else i think we may use the wrong predicate..
Also, i *think* i will add one/two patterns to this new matcher (less canonical variants with extra uses),
so specifying the pattern twice seems sub-par.

Ok - just wanted to throw it out as a possibility. I agree that the switch version is less likely to go buggy.

define i1 @p0(i8 %shamt0, i8 %shamt1) {
  %t0 = shl i8 1, %shamt0
  %t1 = shl i8 1, %shamt1
  %r = icmp ugt i8 %t0, %t1
  ret i1 %r
}

With the current code, it gets transformed to:

define i1 @p0(i8 %shamt0, i8 %shamt1) {
  %t1 = shl i8 1, %shamt1
  %t1.highbits = lshr i8 %t1, %shamt0
  %r = icmp eq i8 %t1.highbits, 0
  ret i1 %r
}

It should be reduced:

%t0 = shl i8 1, %shamt0
%t1 = shl i8 1, %shamt1
%r = icmp ugt i8 %t0, %t1
=>
%r = icmp ugt i8 %shamt0, %shamt1

I don't think we need to hold up this patch for that, but maybe it changes the way we want to implement it?

I acknowledge that there is some problem when we have the same/similar pattern on the both sides,
i have thought about it a bit (rL342076), but i don't have anything concrete on that.

Sounds good.

Thu, Sep 13, 1:03 PM
lebedev.ri added a comment to D50858: [M680x0] Add ELF and Triple info.

I think i lost track, was there some consensus on this 'support for m680x0' somewhere on the mailing list?
I don't dis-welcome this, just asking.

Thu, Sep 13, 12:38 PM
lebedev.ri added a comment to D51333: Diagnose likely typos in include statements.

The tests seem to have disappeared form the diff.

Thu, Sep 13, 12:34 PM
lebedev.ri added inline comments to D51550: Use alias analysis to check for real interference in cascade comparison.
Thu, Sep 13, 11:59 AM
lebedev.ri added a comment to D52001: [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708).

I was wondering if it would be easier to read if we swapped everything as a first step, so something like this in the existing code:

// We want X to be the icmp's second operand, so swap if not.
Value *Cmp0 = Cmp.getOperand(0), *X = Cmp.getOperand(1);
ICmpInst::Predicate Pred = Cmp.getPredicate(), NewPred;
if (match(X, m_OneUse(m_Shl(m_One(), m_Value())))) {
  std::swap(Cmp0, X);
  Pred = Cmp.getSwappedPredicate();
}
Value *Y;
if (!match(Cmp0, m_OneUse(m_Shl(m_One(), m_Value(Y)))))
  return nullptr;
if (Pred == ICmpInst::ICMP_ULE) NewPred = ICmpInst::ICMP_NE;
else if (Pred == ICmpInst::ICMP_UGT) NewPred = ICmpInst::ICMP_EQ;
else return nullptr;

I'm not sure.
I think we really should have two separate matches, and two switch()es.
Else i think we may use the wrong predicate..
Also, i *think* i will add one/two patterns to this new matcher (less canonical variants with extra uses),
so specifying the pattern twice seems sub-par.

Thu, Sep 13, 11:52 AM