May 24 2019

ramred01 added a comment to D61947: [AArch64] Merge globals when optimising for size.

This probably makes sense (the change basically just makes this code match 32-bit ARM), but I'd like to see codesize and performance numbers, since this will substantially change code generation in a lot of cases.

May 24 2019, 8:42 AM · Restricted Project

May 23 2019

ramred01 created D62301: Fold Address Computations into Load/Store instructions for AArch64.
May 23 2019, 3:54 AM

May 15 2019

ramred01 created D61947: [AArch64] Merge globals when optimising for size.
May 15 2019, 7:37 AM · Restricted Project

May 13 2019

ramred01 updated the diff for D59080: Merge of global constants does not happen when constants have common linkage.

Updated the patch with a fix that does not try to merge globals with common linkage as the earlier patch did.

May 13 2019, 7:35 AM

May 2 2019

ramred01 created D61433: -Oz: Reuse constants in registers instead of canonicalizing operations that use a different constant.
May 2 2019, 5:10 AM

Apr 11 2019

ramred01 created D60586: [Clang] Conversion of a switch table to a bitmap is not profitable for -Os and -Oz compilation.
Apr 11 2019, 2:24 PM · Restricted Project
ramred01 created D60584: Conversion of a switch table to a bitmap is not profitable for -Os and -Oz compilation.
Apr 11 2019, 2:15 PM

Apr 5 2019

ramred01 updated the diff for D59926: Constant folding sometimes creates large constants even though a smaller constant is used multiple times.

Updated the patch with checks for optimization level to be either -Os or -Oz.

Apr 5 2019, 7:30 AM

Apr 3 2019

ramred01 created D60209: If conversion to conditional select for -Oz or -Os unprofitable for single operation blocks.
Apr 3 2019, 9:11 AM

Mar 29 2019

ramred01 added a comment to D59926: Constant folding sometimes creates large constants even though a smaller constant is used multiple times.

This sounds like a back-end problem.
Have you considered/tried solving it there, instead of limiting the middle-end?

This folding is happening in the middle-end itself. The LLVM IR already has the folded value.

That is precisely my point. Is that folding not optimal for the IR?

That folding would have been optimal for the IR if the constant were not reused. If the constant is reused, most architectures do better by materializing one constant in a register and reusing it than by materializing two constants. Most architectures require two instructions to materialize a 32-bit constant. Even counting the additional shift that must now be performed, a single reuse saves one instruction. With more reuses, each of them folded, the saving could grow.
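
The instruction-count argument above can be made concrete with a toy cost model. This is only a sketch under stated assumptions: the function names are hypothetical, the flat cost of two instructions per 32-bit constant and one per shift is taken from the comment's own estimate rather than from any real backend, and the compare itself is left out because it costs the same either way.

```python
def unfolded_cost(uses: int, materialize: int = 2, shift: int = 1) -> int:
    """Cost when one constant is kept in a register and reused.

    The shared constant is materialized once; each use that was not
    folded still has to pay for its own shift.
    """
    return materialize + uses * shift


def folded_cost(uses: int, materialize: int = 2) -> int:
    """Cost when every use has been folded into its own new constant.

    The original constant is still materialized once, and each folded
    use needs a separate constant of its own.
    """
    return materialize + uses * materialize


if __name__ == "__main__":
    for n in (1, 2, 3):
        saved = folded_cost(n) - unfolded_cost(n)
        print(f"{n} reuse(s): unfolded form saves {saved} instruction(s)")
```

Under these assumptions the saving grows linearly with the number of folded reuses. A target that can materialize the folded constant in one instruction would show no saving at all, which is why the model takes the costs as parameters.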

I think you're mixing up canonicalization with optimization (which I've also done a bunch). The goal here in instcombine is to produce canonical code. That is, to create a reduced set of easier-to-optimize forms of the infinite patterns that might have existed in the source. Canonical code often is identical to optimal code, but sometimes they diverge. That's when things get harder (and we might end up reversing an earlier transform). But we have to deal with it in the backend because -- as Roman noted -- any solution at this level would be incomplete simply because the input might already be in the form that you don't want.

I get your point and fully agree with it.

But if a certain canonicalization results in a form that always requires reversing that transform at a later stage, wouldn't we be better off not performing that transform in the first place? If a canonicalization exposed better optimization opportunities across architectures, then it would make more sense, wouldn't it?

Is it *always*, though?
If I understand correctly, you want to block this transform, right?
https://godbolt.org/z/IvWYKl

We have essentially replaced

%4 = ashr i32 %3, 8
%5 = icmp slt i32 %4, 4096

with

%5 = icmp slt i32 %3, 1048832

so the icmp no longer depends on the ashr. That shortens the data dependency chain,
and the icmp can execute without waiting for the ashr.

Which, as can be seen from the lowest view there, unsurprisingly improves performance (assuming I fixed up the asm for mca correctly).
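
The arithmetic behind this kind of fold can be checked by brute force. The sketch below is illustrative only: the helper names `fold_threshold` and `check` are hypothetical, Python's `>>` on integers (which floors, also for negative values) stands in for `ashr`, and the shifted threshold is assumed not to overflow 32 bits. For a shift of 8, `slt 4096` folds to `4096 << 8 = 1048576`; the constant 1048832 equals `4097 << 8`, which is what a comparison such as `sle 4096` would fold to.

```python
def fold_threshold(c: int, k: int) -> int:
    """Return t such that (x >> k) < c  <=>  x < t.

    Holds for an arithmetic (flooring) shift, assuming c << k does not
    overflow the integer width in question.
    """
    return c << k


def check(c: int, k: int, lo: int, hi: int) -> bool:
    """Brute-force the equivalence for every x in [lo, hi)."""
    t = fold_threshold(c, k)
    return all(((x >> k) < c) == (x < t) for x in range(lo, hi))


if __name__ == "__main__":
    t = fold_threshold(4096, 8)
    print(t)
    # Check around the threshold and around zero (negative values included).
    print(check(4096, 8, t - 2000, t + 2000))
    print(check(4096, 8, -2000, 2000))
```

The folded compare reads only the unshifted value, which is the shorter dependency chain described above; the cost is a second, larger constant.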

So I'm not convinced that this hammer approach is the right solution to the problem.

Mar 29 2019, 5:48 AM
ramred01 added a comment to D59926: Constant folding sometimes creates large constants even though a smaller constant is used multiple times.

This sounds like a back-end problem.
Have you considered/tried solving it there, instead of limiting the middle-end?

This folding is happening in the middle-end itself. The LLVM IR already has the folded value.

That is precisely my point. Is that folding not optimal for the IR?

That folding would have been optimal for the IR if the constant were not reused. If the constant is reused, most architectures do better by materializing one constant in a register and reusing it than by materializing two constants. Most architectures require two instructions to materialize a 32-bit constant. Even counting the additional shift that must now be performed, a single reuse saves one instruction. With more reuses, each of them folded, the saving could grow.

I think you're mixing up canonicalization with optimization (which I've also done a bunch). The goal here in instcombine is to produce canonical code. That is, to create a reduced set of easier-to-optimize forms of the infinite patterns that might have existed in the source. Canonical code often is identical to optimal code, but sometimes they diverge. That's when things get harder (and we might end up reversing an earlier transform). But we have to deal with it in the backend because -- as Roman noted -- any solution at this level would be incomplete simply because the input might already be in the form that you don't want.

Mar 29 2019, 3:17 AM

Mar 28 2019

ramred01 added a comment to D59926: Constant folding sometimes creates large constants even though a smaller constant is used multiple times.

This sounds like a back-end problem.
Have you considered/tried solving it there, instead of limiting the middle-end?

This folding is happening in the middle-end itself. The LLVM IR already has the folded value.

That is precisely my point. Is that folding not optimal for the IR?

Mar 28 2019, 5:31 AM
ramred01 added a comment to D59926: Constant folding sometimes creates large constants even though a smaller constant is used multiple times.

This sounds like a back-end problem.
Have you considered/tried solving it there, instead of limiting the middle-end?

Mar 28 2019, 5:09 AM
ramred01 created D59926: Constant folding sometimes creates large constants even though a smaller constant is used multiple times.
Mar 28 2019, 4:53 AM
ramred01 updated the diff for D59080: Merge of global constants does not happen when constants have common linkage.

Updated the summary.

Mar 28 2019, 3:42 AM
ramred01 updated the diff for D59078: memcpy is not tailcalled.

Updated the summary to properly reflect the issue and the solution.

Mar 28 2019, 3:00 AM

Mar 7 2019

ramred01 created D59080: Merge of global constants does not happen when constants have common linkage.
Mar 7 2019, 1:56 AM
ramred01 created D59078: memcpy is not tailcalled.
Mar 7 2019, 1:45 AM