
zatrazz (Adhemerval Zanella)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 10 2015, 12:58 PM (218 w, 6 d)

Recent Activity

Today

zatrazz updated the diff for D62018: [AArch64] Handle ISD::LRINT and ISD::LLRINT.

Updated patch based on previous comments. It adds a pattern to handle the i32 return type and a testcase for Windows.

Tue, May 21, 12:01 PM · Restricted Project
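For context on the two families: lround/llround round halfway cases away from zero, while lrint/llrint follow the current rounding mode, which by default is round-to-nearest with ties to even. On Windows `long` is 32 bits (LLP64), which is presumably why lrint needs an i32-return pattern there. The sketch below models only the default-mode behaviour in plain C without libm. The helper names are hypothetical, and it assumes the integer part of the input fits in int64_t.

```c
#include <stdint.h>

/* Hypothetical helper: ties away from zero, as C99 lround() does.
   Assumes the integer part of x fits in int64_t. */
int64_t round_ties_away(double x) {
    int64_t t = (int64_t)x;          /* truncate toward zero */
    double frac = x - (double)t;     /* computed exactly for in-range x */
    if (frac >= 0.5)
        return t + 1;
    if (frac <= -0.5)
        return t - 1;
    return t;
}

/* Hypothetical helper: ties to even, as lrint() does under the default
   FE_TONEAREST rounding mode. Same range assumption as above. */
int64_t round_ties_even(double x) {
    int64_t t = (int64_t)x;
    double frac = x - (double)t;
    int t_is_odd = (t % 2 != 0);
    if (frac > 0.5 || (frac == 0.5 && t_is_odd))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && t_is_odd))
        return t - 1;
    return t;
}
```

With 2.5 the first helper returns 3 and the second returns 2; with -2.5 they return -3 and -2.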
zatrazz added inline comments to D62018: [AArch64] Handle ISD::LRINT and ISD::LLRINT.
Tue, May 21, 11:58 AM · Restricted Project
zatrazz added a comment to D62018: [AArch64] Handle ISD::LRINT and ISD::LLRINT.

Ping now that both D62017 and D62019 have been approved.

Tue, May 21, 5:23 AM · Restricted Project

Yesterday

zatrazz updated the diff for D62019: [clang] Handle lrint/llrint builtins.

Updated patch based on D62026.

Mon, May 20, 8:15 AM · Restricted Project
zatrazz updated the diff for D62017: [CodeGen] Add lrint/llrint builtins.

Updated patch based on D62026.

Mon, May 20, 8:15 AM · Restricted Project
zatrazz added a comment to D62017: [CodeGen] Add lrint/llrint builtins.

Can we do this with 2 intrinsics with overloaded result types as I've done for lround/llround in D62026?

Mon, May 20, 5:55 AM · Restricted Project
zatrazz accepted D62026: [Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llround overloaded instead of fixing to i64.

LGTM, thanks for the follow-up patch.

Mon, May 20, 5:53 AM · Restricted Project

Thu, May 16

zatrazz created D62019: [clang] Handle lrint/llrint builtins.
Thu, May 16, 11:28 AM · Restricted Project
zatrazz created D62018: [AArch64] Handle ISD::LRINT and ISD::LLRINT.
Thu, May 16, 11:26 AM · Restricted Project
zatrazz created D62017: [CodeGen] Add lrint/llrint builtins.
Thu, May 16, 11:26 AM · Restricted Project
zatrazz committed rG0d9dcd7bf01f: [clang] Handle lround/llround builtins (authored by zatrazz).
Thu, May 16, 6:43 AM
zatrazz closed D61392: [clang] Handle lround/llround builtins.
Thu, May 16, 6:41 AM · Restricted Project
zatrazz added inline comments to D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.
Thu, May 16, 6:29 AM · Restricted Project
zatrazz committed rG2d28db6b9f40: [AArch64] Handle ISD::LROUND and ISD::LLROUND (authored by zatrazz).
Thu, May 16, 6:28 AM
zatrazz closed D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.
Thu, May 16, 6:27 AM · Restricted Project
zatrazz closed D61390: [CodeGen] Add lround/llround builtins.
Thu, May 16, 6:24 AM · Restricted Project
zatrazz committed rG73643b5041bb: [CodeGen] Add lround/llround builtins (authored by zatrazz).
Thu, May 16, 6:15 AM

Tue, May 14

zatrazz added a comment to D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.

So should I handle f16 in a separate patch, or should I also adapt it in this one?

Tue, May 14, 10:36 AM · Restricted Project

Mon, May 13

zatrazz added inline comments to D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.
Mon, May 13, 10:16 AM · Restricted Project
zatrazz added a comment to D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.

Ping again now that D61390 has been approved.

Mon, May 13, 6:02 AM · Restricted Project

Fri, May 10

zatrazz added a comment to D61390: [CodeGen] Add lround/llround builtins.

Ping.

Fri, May 10, 4:32 AM · Restricted Project

Wed, May 8

zatrazz updated the diff for D61390: [CodeGen] Add lround/llround builtins.

Updated patch based on the previous comment. The main change is the IBM long double tests, which also required implementing both SoftenFloatOp_LROUND and SoftenFloatOp_LLROUND to handle the type correctly.

Wed, May 8, 2:30 PM · Restricted Project
zatrazz added a comment to D61390: [CodeGen] Add lround/llround builtins.

Should we have a PowerPC test for ppcf128?

Wed, May 8, 12:26 PM · Restricted Project
zatrazz updated the diff for D61390: [CodeGen] Add lround/llround builtins.

Updated patch based on previous review.

Wed, May 8, 11:39 AM · Restricted Project

Tue, May 7

zatrazz updated the diff for D61390: [CodeGen] Add lround/llround builtins.

Updated patch based on previous comments. The changes are:

Tue, May 7, 2:37 PM · Restricted Project
zatrazz added a comment to D61390: [CodeGen] Add lround/llround builtins.

Can you test lround.i32 with -mtriple=i686-unknown, with and without -mattr=sse2?

Use utils/update_llc_test_checks.py to generate the checks for at least X86. Try it on the other targets too if it works for them.

Tue, May 7, 2:34 PM · Restricted Project

Mon, May 6

zatrazz updated the diff for D61390: [CodeGen] Add lround/llround builtins.

Updated patch based on the previous comment. The main changes are:

Mon, May 6, 11:50 AM · Restricted Project
zatrazz updated the diff for D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.

Updated patch based on D61390 changes. The main change is that it allows
indexing both lround/llround by return type.

Mon, May 6, 11:50 AM · Restricted Project

Fri, May 3

zatrazz updated the diff for D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.

Updated patch based on the previous comment. The main change adapts the testcases to the D61390
update and handles lroundl/llroundl correctly.

Fri, May 3, 12:45 PM · Restricted Project
zatrazz updated the diff for D61390: [CodeGen] Add lround/llround builtins.

In fact, it turned out that indexing the new lround/llround by the
input argument type did not make them correctly handled as expanded
operations. I will dig into exactly why the backend is not selecting
correctly based on the input argument; for now I have changed it back
to the previous indexing by return type.

Fri, May 3, 12:43 PM · Restricted Project
zatrazz updated the diff for D61390: [CodeGen] Add lround/llround builtins.

Updated patch based on previous comments. The changes from the previous version are:

Fri, May 3, 8:49 AM · Restricted Project

Wed, May 1

zatrazz created D61391: [AArc64] Handle ISD::LROUND and ISD::LLROUND.
Wed, May 1, 11:25 AM · Restricted Project
zatrazz created D61392: [clang] Handle lround/llround builtins.
Wed, May 1, 11:25 AM · Restricted Project
zatrazz created D61390: [CodeGen] Add lround/llround builtins.
Wed, May 1, 11:20 AM · Restricted Project

Wed, Apr 24

zatrazz committed rG91cee68e1f0f: [fuzzer] Fix reload.test on Linux/aarch64 (authored by zatrazz).
Wed, Apr 24, 12:01 PM
zatrazz closed D61066: [fuzzer] Fix reload.test on Linux/aarch64.
Wed, Apr 24, 12:01 PM · Restricted Project, Restricted Project
zatrazz created D61066: [fuzzer] Fix reload.test on Linux/aarch64.
Wed, Apr 24, 7:08 AM · Restricted Project, Restricted Project

Mar 18 2019

zatrazz committed rG270249de2bb3: [AArch64] Small fix for getIntImmCost (authored by zatrazz).
Mar 18 2019, 11:52 AM
zatrazz closed D58461: [AArch64] Small fix for getIntImmCost.
Mar 18 2019, 11:52 AM · Restricted Project
zatrazz committed rGa3cefa5d6492: [AArch64] Optimize floating point materialization (authored by zatrazz).
Mar 18 2019, 11:47 AM
zatrazz closed D58460: [AArch64] Optimize floating point materialization.
Mar 18 2019, 11:46 AM · Restricted Project
zatrazz committed rG664c1ef52849: [TargetLowering] Add code size information on isFPImmLegal. NFC (authored by zatrazz).
Mar 18 2019, 11:41 AM
zatrazz closed D58690: [AArch64] Add code size information on isFPImmLegal.
Mar 18 2019, 11:40 AM · Restricted Project
zatrazz committed rG8a595b1d2edf: [AArch64] Refactor floating point materialization. NFC (authored by zatrazz).
Mar 18 2019, 11:24 AM
zatrazz closed D58915: [AArch64] Refactor floating point materialization. NFC.
Mar 18 2019, 11:24 AM · Restricted Project

Mar 15 2019

zatrazz updated the diff for D58915: [AArch64] Refactor floating point materialization. NFC.

Updated patch based on previous comments.

Mar 15 2019, 6:07 AM · Restricted Project
zatrazz added inline comments to D58915: [AArch64] Refactor floating point materialization. NFC.
Mar 15 2019, 6:07 AM · Restricted Project

Mar 13 2019

zatrazz added a comment to D58915: [AArch64] Refactor floating point materialization. NFC.

Ping; this refactor is the only part of my AArch64 FP materialization optimization still missing review.

Mar 13 2019, 12:52 PM · Restricted Project

Mar 11 2019

zatrazz added inline comments to D58690: [AArch64] Add code size information on isFPImmLegal.
Mar 11 2019, 1:42 PM · Restricted Project
zatrazz added a comment to D58915: [AArch64] Refactor floating point materialization. NFC.

Ping.

Mar 11 2019, 4:20 AM · Restricted Project
zatrazz added a comment to D58690: [AArch64] Add code size information on isFPImmLegal.

Ping.

Mar 11 2019, 4:20 AM · Restricted Project

Mar 7 2019

zatrazz updated the diff for D58690: [AArch64] Add code size information on isFPImmLegal.

Fixed CamelCase (sorry, I missed the comment).

Mar 7 2019, 10:36 AM · Restricted Project

Mar 6 2019

zatrazz updated the diff for D58461: [AArch64] Small fix for getIntImmCost.

I added a testcase based on test/CodeGen/ARM/immcost.ll.

Mar 6 2019, 6:11 AM · Restricted Project

Mar 5 2019

zatrazz updated the diff for D58915: [AArch64] Refactor floating point materialization. NFC.

Updated patch based on previous comments.

Mar 5 2019, 10:03 AM · Restricted Project

Mar 4 2019

zatrazz updated the diff for D58461: [AArch64] Small fix for getIntImmCost.

Updated patch based on previous comments. It depends on https://reviews.llvm.org/D58915. I am trying to come up with a testcase to actually test it. The main difference in the cost analysis is for immediates that contain some all-zero or all-one 16-bit chunks (their size is adjusted from 3 to 2). Since neither value corresponds to TCC_Free, TCC_Basic, or TCC_Expensive, the change does not actually interfere with the subsequent cost analysis.

Mar 4 2019, 11:45 AM · Restricted Project
zatrazz updated the diff for D58460: [AArch64] Optimize floating point materialization.

Updated patch based on previous comments. It depends on https://reviews.llvm.org/D58915 and https://reviews.llvm.org/D58690

Mar 4 2019, 11:37 AM · Restricted Project
zatrazz updated the diff for D58690: [AArch64] Add code size information on isFPImmLegal.

Updated patch based on previous comments.

Mar 4 2019, 11:37 AM · Restricted Project
zatrazz created D58915: [AArch64] Refactor floating point materialization. NFC.
Mar 4 2019, 11:37 AM · Restricted Project

Feb 27 2019

zatrazz added a comment to D58460: [AArch64] Optimize floating point materialization.

Not sure I like the duplicated logic in getExpandImmCost; it doesn't have good test coverage, and it could fall out of sync in the future. That said, I've been considering refactoring the code in expandMOVImm anyway, to split the actual instruction emission away from the logic that figures out the appropriate sequence. Basically, the idea would be that instead of returning a number from getExpandImmCost, you return an abstraction of the instruction sequence: an array that contains, for each instruction, the appropriate opcode and immediate. isFPImmLegal just uses the number of elements in the array, while expandMOVImm actually emits instructions based on the array. I think this would shrink the code overall because the logic for building instructions is currently duplicated multiple times. (I was considering it more in the context of adding more possible sequences, but it works here as well.)

Feb 27 2019, 10:20 AM · Restricted Project
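The suggestion above (plan the instruction sequence once, then let the cost query count it and the expander emit it) can be sketched in plain C. This is a hypothetical, simplified model, not LLVM's code: it only considers MOVZ/MOVN followed by MOVK, ignoring ORR with logical immediates and the other sequences a real expander tries, and `apply_plan` interprets the plan instead of building machine instructions.

```c
#include <stdint.h>

/* Hypothetical opcodes for the plan; only the MOVZ/MOVN/MOVK family. */
enum imm_opcode { OP_MOVZ, OP_MOVN, OP_MOVK };

struct imm_insn {
    enum imm_opcode opcode;
    uint16_t imm16;   /* 16-bit payload */
    unsigned shift;   /* 0, 16, 32 or 48 */
};

/* Plans the sequence for a 64-bit immediate into seq[0..3]; returns its length. */
unsigned plan_mov_imm(uint64_t imm, struct imm_insn seq[4]) {
    unsigned zeros = 0, ones = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint16_t c = (uint16_t)(imm >> (16 * i));
        zeros += (c == 0x0000);
        ones += (c == 0xFFFF);
    }
    /* Start from all-ones (MOVN) if that leaves fewer chunks to patch. */
    int use_movn = ones > zeros;
    uint16_t skip = use_movn ? 0xFFFF : 0x0000;
    unsigned n = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint16_t c = (uint16_t)(imm >> (16 * i));
        if (c == skip)
            continue;
        if (n == 0) {
            seq[n].opcode = use_movn ? OP_MOVN : OP_MOVZ;
            seq[n].imm16 = use_movn ? (uint16_t)~c : c;
        } else {
            seq[n].opcode = OP_MOVK;
            seq[n].imm16 = c;
        }
        seq[n].shift = 16 * i;
        ++n;
    }
    if (n == 0) { /* imm is 0 or all-ones: a single MOVZ #0 or MOVN #0 */
        seq[0].opcode = use_movn ? OP_MOVN : OP_MOVZ;
        seq[0].imm16 = 0;
        seq[0].shift = 0;
        n = 1;
    }
    return n;
}

/* Cost query (the isFPImmLegal side): only the length of the plan matters. */
unsigned mov_imm_cost(uint64_t imm) {
    struct imm_insn seq[4];
    return plan_mov_imm(imm, seq);
}

/* Stand-in for the emitter (the expandMOVImm side): interprets the same plan. */
uint64_t apply_plan(const struct imm_insn *seq, unsigned n) {
    uint64_t reg = 0;
    for (unsigned i = 0; i < n; ++i) {
        uint64_t field = (uint64_t)seq[i].imm16 << seq[i].shift;
        switch (seq[i].opcode) {
        case OP_MOVZ: reg = field; break;
        case OP_MOVN: reg = ~field; break;
        case OP_MOVK:
            reg = (reg & ~(UINT64_C(0xFFFF) << seq[i].shift)) | field;
            break;
        }
    }
    return reg;
}
```

For 0xFFFF1234FFFF5678 the plan is one MOVN followed by one MOVK, so the cost query reports 2 without emitting anything, and both consumers stay in sync because they read the same array.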

Feb 26 2019

zatrazz created D58690: [AArch64] Add code size information on isFPImmLegal.
Feb 26 2019, 11:11 AM · Restricted Project
zatrazz added inline comments to D58460: [AArch64] Optimize floating point materialization.
Feb 26 2019, 10:52 AM · Restricted Project
zatrazz updated the diff for D58460: [AArch64] Optimize floating point materialization.

New revision based on previous comments. I refactored the logic used in isFPImmLegal to decide whether to materialize the FP constant by adding a new function to the common AArch64 code, AArch64_AM::getExpandImmCost. To avoid code duplication, I moved some definitions out of AArch64ExpandPseudoInsts.cpp.

Feb 26 2019, 10:52 AM · Restricted Project

Feb 20 2019

zatrazz created D58461: [AArch64] Small fix for getIntImmCost.
Feb 20 2019, 10:07 AM · Restricted Project
zatrazz created D58460: [AArch64] Optimize floating point materialization.
Feb 20 2019, 10:05 AM · Restricted Project

Jan 31 2019

zatrazz updated the diff for D57044: [AArch64] OOptimize floating point materialization.

Updated patch based on previous comment:

Jan 31 2019, 9:48 AM · Restricted Project

Jan 29 2019

zatrazz added a comment to D57044: [AArch64] OOptimize floating point materialization.

Ping.

Jan 29 2019, 3:14 AM · Restricted Project

Jan 25 2019

zatrazz updated the diff for D57044: [AArch64] OOptimize floating point materialization.

Patch updated based on previous comments. isAnyMOVWMovAlias
can catch more cases where we can materialize the floating-point constant
than isLogicalImmediate (128.0, for instance). Positive zero still needs to
be handled as a special case because the resulting fmov will use the zero
register instead of an immediate. The isAnyMOVWMovAlias path is not used
for fp16 (I will need to check further whether it is safe for all cases).

Jan 25 2019, 4:55 AM · Restricted Project
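A rough model of the decision described above: positive zero can come from the zero register, values of the form ±(16+m)/16 × 2^n (m in 0..15, n in -3..4) fit the 8-bit FMOV immediate, and a bit pattern with a single non-zero 16-bit chunk, such as 128.0 (0x4060000000000000), needs only one MOV before an FMOV from the integer register. The enum and function names are hypothetical, and a real implementation would also consider logical-immediate and longer MOVZ/MOVK forms.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the materialization choices discussed above. */
enum fp_strategy { FP_FMOV_XZR, FP_FMOV_IMM8, FP_MOV_THEN_FMOV, FP_OTHER };

/* 1 if x == +/-(16+m)/16 * 2^n for m in [0,15], n in [-3,4]. */
int is_fmov_imm8(double x) {
    for (int n = -3; n <= 4; ++n) {
        double scale = 1.0;
        for (int k = 0; k < (n < 0 ? -n : n); ++k)
            scale = (n < 0) ? scale / 2.0 : scale * 2.0;
        for (int m = 0; m < 16; ++m) {
            double v = (16.0 + m) / 16.0 * scale;   /* exact in binary64 */
            if (x == v || x == -v)
                return 1;
        }
    }
    return 0;
}

/* Number of 16-bit chunks of the bit pattern that are not zero. */
static unsigned nonzero_chunks(uint64_t bits) {
    unsigned n = 0;
    for (unsigned i = 0; i < 4; ++i)
        n += ((bits >> (16 * i)) & 0xFFFF) != 0;
    return n;
}

enum fp_strategy choose_fp_strategy(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    if (bits == 0)                   /* +0.0 only; -0.0 has the sign bit set */
        return FP_FMOV_XZR;
    if (is_fmov_imm8(x))
        return FP_FMOV_IMM8;
    if (nonzero_chunks(bits) == 1)   /* one MOV, then FMOV from the GPR */
        return FP_MOV_THEN_FMOV;
    return FP_OTHER;                 /* e.g. a literal load or a longer sequence */
}
```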

Jan 24 2019

zatrazz added inline comments to D57044: [AArch64] OOptimize floating point materialization.
Jan 24 2019, 5:51 AM · Restricted Project

Jan 23 2019

zatrazz updated the diff for D57044: [AArch64] OOptimize floating point materialization.

Updated patch based on previous comments. I added an extra check using AArch64_AM::isLogicalImmediate for constants that can be materialized as the immediate operand of a logical instruction. We still need to handle two cases separately: the 8-bit fmov immediates (via getFPXXImm) and positive 0.0.

Jan 23 2019, 12:37 PM · Restricted Project
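For reference, the AArch64 logical (bitmask) immediate rule that isLogicalImmediate tests can be sketched as follows: the value must be a repetition of a 2, 4, 8, 16, 32, or 64-bit element, and that element must be a rotated run of contiguous ones (all-zeros and all-ones are not encodable). The function below is a hypothetical stand-alone check for 64-bit values, not LLVM's AArch64_AM::isLogicalImmediate.

```c
#include <stdint.h>

static unsigned popcount64(uint64_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Hypothetical check: is v encodable as a 64-bit logical immediate? */
int is_bitmask_imm64(uint64_t v) {
    if (v == 0 || v == UINT64_MAX)
        return 0;   /* neither pattern is encodable */
    /* Find the smallest power-of-two element size that v replicates. */
    unsigned size = 64;
    while (size > 2) {
        unsigned half = size / 2;
        uint64_t m = (UINT64_C(1) << half) - 1;
        if ((v & m) != ((v >> half) & m))
            break;
        size = half;
    }
    uint64_t mask = (size == 64) ? UINT64_MAX : (UINT64_C(1) << size) - 1;
    uint64_t elt = v & mask;
    /* Rotate the element left by one bit within `size` bits. */
    uint64_t rot = ((elt << 1) | (elt >> (size - 1))) & mask;
    /* A rotated run of ones has exactly two 0/1 boundaries around the ring. */
    return popcount64(elt ^ rot) == 2;
}
```

For example, the bit pattern of 2.0 (0x4000000000000000) passes, while that of 0.1 (0x3FB999999999999A) does not.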
zatrazz added a comment to D57044: [AArch64] OOptimize floating point materialization.

Is there some reason we're checking for specific floating-point constants here, as opposed to just calling AArch64_AM::isLogicalImmediate or AArch64_AM::isAnyMOVWMovAlias?

Jan 23 2019, 10:05 AM · Restricted Project
zatrazz added inline comments to D57044: [AArch64] OOptimize floating point materialization.
Jan 23 2019, 9:54 AM · Restricted Project
zatrazz updated the diff for D57044: [AArch64] OOptimize floating point materialization.

Updated patch with additional tests. I added tests for __builtin_isinf expanded builtin for float, double, and long double (the latter still requires loading the constant).

Jan 23 2019, 8:56 AM · Restricted Project
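Why these builtins exercise materialization: an expanded isinf typically compares the magnitude of x against the infinity pattern, so that constant has to be produced somehow. Below is a bit-level sketch of the test (hypothetical helper names). The float pattern 0x7F800000 and the double pattern 0x7FF0000000000000 each have a single non-zero 16-bit chunk, whereas a 128-bit long double pattern does not fit in one general-purpose register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: |x| == +inf for binary32, via the bit pattern. */
int isinf_f32_bits(float x) {
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return (b & UINT32_C(0x7FFFFFFF)) == UINT32_C(0x7F800000);
}

/* Hypothetical helper: |x| == +inf for binary64, via the bit pattern. */
int isinf_f64_bits(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return (b & UINT64_C(0x7FFFFFFFFFFFFFFF)) == UINT64_C(0x7FF0000000000000);
}
```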

Jan 22 2019

zatrazz created D57044: [AArch64] OOptimize floating point materialization.
Jan 22 2019, 3:12 AM · Restricted Project
zatrazz abandoned D46283: [AArch64] Set vectorizer-maximize-bandwidth as default true.
Jan 22 2019, 3:09 AM

Jan 11 2019

zatrazz closed D55427: [libcxx] Call __count_bool_true for bitset count.
Jan 11 2019, 9:35 AM
zatrazz added inline comments to D55427: [libcxx] Call __count_bool_true for bitset count.
Jan 11 2019, 2:40 AM

Dec 19 2018

zatrazz added a comment to D55427: [libcxx] Call __count_bool_true for bitset count.

Ping.

Dec 19 2018, 2:15 PM

Dec 17 2018

zatrazz added a comment to D55427: [libcxx] Call __count_bool_true for bitset count.

Getting a bit late in this discussion, as we had an internal one just recently.

The change to remove always_inline in a number of libc++ template functions is a good one, especially when the inliner can guess and does a good job already.

In this case, however, because the type is a reference, the inliner would require a lot more effort to inspect the uses (and side-effects).

Improving the inliner here would be a huge hammer, probably increasing compile time for all code for the minimal benefit of this very special case.

Then perhaps it would be beneficial, and pragmatic, to revert that removal in this special case.

The issue I have with defining it per symbol is the hackery it would require to handle _LIBCPP_INTERNAL_LINKAGE and its implications,
or at least adding *another* macro to inline some symbols depending on the configuration, etc.

Dec 17 2018, 3:58 AM

Dec 14 2018

zatrazz added a comment to D55427: [libcxx] Call __count_bool_true for bitset count.

Getting a bit late in this discussion, as we had an internal one just recently.

The change to remove always_inline in a number of libc++ template functions is a good one, especially when the inliner can guess and does a good job already.

In this case, however, because the type is a reference, the inliner would require a lot more effort to inspect the uses (and side-effects).

Improving the inliner here would be a huge hammer, probably increasing compile time for all code for the minimal benefit of this very special case.

Then perhaps it would be beneficial, and pragmatic, to revert that removal in this special case.

Dec 14 2018, 8:20 AM

Dec 13 2018

zatrazz added a comment to D55427: [libcxx] Call __count_bool_true for bitset count.

This patch aims to give clang better information so it can inline
the __bit_reference count function used by std::bitset. The current clang
inliner cannot infer that the passed type will be used only to select
the optimized variant; it evaluates the type argument and type check as
a load plus compare (although later optimization phases correctly
optimize this out).

I'm unclear on the magnitude of the improvement here.
Are we talking a single load + compare instruction in the call to std::count ?
Or something inside the loop?

[ I'm pretty sure that the patch is correct now - but I don't understand how important it is ]

Dec 13 2018, 10:23 AM
zatrazz added a comment to D55427: [libcxx] Call __count_bool_true for bitset count.

Ping.

Dec 13 2018, 3:34 AM

Dec 10 2018

zatrazz updated the diff for D55427: [libcxx] Call __count_bool_true for bitset count.

This patch aims to give clang better information so it can inline
the __bit_reference count function used by std::bitset. The current clang
inliner cannot infer that the passed type will be used only to select
the optimized variant; it evaluates the type argument and type check as
a load plus compare (although later optimization phases correctly
optimize this out).

Dec 10 2018, 5:29 AM
zatrazz added a comment to D55427: [libcxx] Call __count_bool_true for bitset count.

This looks like a behavior change to me.
The old code calls __count_bool_true if the _Tp can be static_cast to bool, and __count_bool_false otherwise.
The new code calls __count_bool_true if the _Tp is exactly bool, and __count_bool_false otherwise.

Sorry; the old code calls __count_bool_true if the static_cast<bool>(value) is true, and __count_bool_false otherwise.

Dec 10 2018, 5:28 AM
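The behaviour being discussed is: count the bits equal to `value`, where only `static_cast<bool>(value)` matters. Assuming a bitset stored as 64-bit words, a C sketch of that dispatch (the function name is hypothetical, not libc++'s __count_bool_true/__count_bool_false):

```c
#include <stddef.h>
#include <stdint.h>

static size_t popcount_u64(uint64_t v) {
    size_t n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Counts bits among the first nbits of words[] that equal `value`.
   Only the truth value of `value` matters, mirroring static_cast<bool>. */
size_t count_bits_equal(const uint64_t *words, size_t nbits, int value) {
    size_t full = nbits / 64, rest = nbits % 64, ones = 0;
    for (size_t i = 0; i < full; ++i)
        ones += popcount_u64(words[i]);
    if (rest != 0)   /* ignore bits past nbits in the last word */
        ones += popcount_u64(words[full] & ((UINT64_C(1) << rest) - 1));
    return value ? ones : nbits - ones;
}
```

Passing 2 as `value` gives the same answer as passing 1, which is the point of dispatching on the converted value rather than on the type.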

Dec 7 2018

zatrazz created D55427: [libcxx] Call __count_bool_true for bitset count.
Dec 7 2018, 5:08 AM

Jun 27 2018

zatrazz closed D48332: [AArch64] Add custom lowering for v4i8 trunc store.
Jun 27 2018, 7:04 AM

Jun 26 2018

zatrazz added a comment to D48332: [AArch64] Add custom lowering for v4i8 trunc store.

Ping?

Jun 26 2018, 11:00 AM

Jun 22 2018

zatrazz updated the diff for D48332: [AArch64] Add custom lowering for v4i8 trunc store.

Updated patch based on previous comments.

Jun 22 2018, 11:43 AM
zatrazz updated the diff for D48332: [AArch64] Add custom lowering for v4i8 trunc store.

Updated patch based on previous comments. Indeed, changing AArch64TTIImpl::getMemoryOpCost for both load and store was wrong, since <4 x i8> loads are still scalarized. I have changed to just adjust cost for stores.

Jun 22 2018, 5:25 AM

Jun 21 2018

zatrazz added inline comments to D48332: [AArch64] Add custom lowering for v4i8 trunc store.
Jun 21 2018, 2:01 PM
zatrazz updated the diff for D48332: [AArch64] Add custom lowering for v4i8 trunc store.

Updated patch based on the previous comment.

Jun 21 2018, 5:49 AM

Jun 20 2018

zatrazz updated the diff for D48332: [AArch64] Add custom lowering for v4i8 trunc store.

Updated patch based on previous comments.

Jun 20 2018, 12:52 PM
zatrazz added inline comments to D48332: [AArch64] Add custom lowering for v4i8 trunc store.
Jun 20 2018, 11:54 AM
zatrazz added a comment to D48332: [AArch64] Add custom lowering for v4i8 trunc store.

I wonder if we should prefer to widen <2 x i8> and <4 x i8> to <8 x i8> instead of promoting to <4 x i16>. It would make stores like this a bit cheaper. Maybe an interesting experiment at some point (mostly just modifying AArch64TargetLowering::getPreferredVectorAction, I think, and seeing what happens to the generated code).

Jun 20 2018, 11:50 AM
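To picture the two layouts mentioned above: if <4 x i8> is promoted to <4 x i16>, a truncating store must first narrow each 16-bit lane to its low byte before a single 4-byte store; if it is widened to <8 x i8>, the bytes are already packed and the low 4 bytes can be stored directly. The scalar C below only models the data movement. The function names are hypothetical, and the real lowering works on NEON registers.

```c
#include <stdint.h>
#include <string.h>

/* Promoted layout: each i8 lives in a 16-bit lane, so narrow first. */
void store_v4i8_from_promoted(const uint16_t lanes[4], uint8_t *dst) {
    uint8_t packed[4];
    for (int i = 0; i < 4; ++i)
        packed[i] = (uint8_t)lanes[i];   /* truncate: keep the low byte */
    memcpy(dst, packed, sizeof packed);  /* models one 4-byte store */
}

/* Widened layout: bytes are already packed in the low half of an 8-byte vector. */
void store_v4i8_from_widened(const uint8_t lanes[8], uint8_t *dst) {
    memcpy(dst, lanes, 4);
}
```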

Jun 19 2018

zatrazz created D48332: [AArch64] Add custom lowering for v4i8 trunc store.
Jun 19 2018, 12:38 PM

Jun 1 2018

zatrazz added a comment to D46283: [AArch64] Set vectorizer-maximize-bandwidth as default true.

Indeed, the machine I was using for SPEC CPU2006 was not the most suitable. I have now used a different one (tx1, A57), with extra care to lower variance (CPU binding, services disabled, etc.), and it indeed showed a better result:

Jun 1 2018, 6:25 AM

May 21 2018

zatrazz added a comment to D46283: [AArch64] Set vectorizer-maximize-bandwidth as default true.

For some reason I did not attach the intended comments to this update. This is an update of the previous patch with an extended analysis. I checked a bootstrap build with TargetTransformation::shouldMaximizeVectorBandwidth enabled for both armhf (r332595) and powerpc64le (r332840). On armhf I did not see any regression; however, on powerpc64le I found an issue related to how the current code handles the MaximizeBandwidth option. The testcase 'Transforms/LoopVectorize/PowerPC/pr30990.ll' explicitly sets vectorizer-maximize-bandwidth to 0, however the code checks for:

May 21 2018, 2:33 PM
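In simplified terms, the option changes the upper bound on the vectorization factor: by default it is derived from the widest element type in the loop, and with maximize-bandwidth from the smallest one, after which the cost model picks among the candidates. A hypothetical one-function model of that bound, assuming a fixed vector register width in bits:

```c
/* Hypothetical model of the maximum vectorization factor considered.
   reg_bits: vector register width; smallest/widest: element widths in bits. */
unsigned max_vf(unsigned reg_bits, unsigned smallest_bits, unsigned widest_bits,
                int maximize_bandwidth) {
    unsigned elt_bits = maximize_bandwidth ? smallest_bits : widest_bits;
    return reg_bits / elt_bits;
}
```

For a 128-bit NEON loop that loads i8 and computes in i32, the bound goes from 4 to 16.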
zatrazz updated the diff for D46283: [AArch64] Set vectorizer-maximize-bandwidth as default true.
May 21 2018, 10:41 AM

May 9 2018

zatrazz closed D46010: [AArch64] Improve cost of vector division by constant.
May 9 2018, 6:21 AM
zatrazz added a comment to D46010: [AArch64] Improve cost of vector division by constant.

In fact, I couldn't find any difference in generated code with and without this patch on SPEC CPU2000. Most likely the performance difference is expected run-to-run variance in SPEC CPU2000 (I do see a lot of it in some components).

May 9 2018, 4:42 AM

May 7 2018

zatrazz added a comment to D46010: [AArch64] Improve cost of vector division by constant.

I tried this patch on Exynos and the performance on SPEC CPU2000 was virtually neutral, even if slightly negative overall in the integer score. Perhaps the default costs are too optimistic about the relative difference between multiply and add?

May 7 2018, 8:31 AM
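Background on why the multiply/add cost ratio matters here: division by a constant is usually lowered to a multiply-high and a shift instead of a divide. A scalar C sketch for unsigned 32-bit division by 10, using the standard magic constant 0xCCCCCCCD = ceil(2^35 / 10); the function name is hypothetical. On AArch64 AdvSIMD, which lacks an unsigned 32-bit lane multiply-high, the vector form typically needs widening multiplies, so the estimated cost leans heavily on the multiply cost.

```c
#include <stdint.h>

/* Hypothetical helper: x / 10 via multiply-high and shift, valid for all uint32_t x. */
uint32_t udiv10(uint32_t x) {
    /* 64-bit product, then drop 35 bits: 32 for the "high half", 3 for the extra scale. */
    return (uint32_t)(((uint64_t)x * UINT64_C(0xCCCCCCCD)) >> 35);
}
```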