evandro (Evandro Menezes)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 5 2016, 9:21 AM (128 w, 1 d)

Currently working at the Samsung Austin R&D Center on the Samsung next generation ARM cores.

Recent Activity

Yesterday

evandro added a comment to D48172: [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc.

It looks reasonable to me. If no one else objects in a couple of days or so, it LGTM.

Wed, Jun 20, 10:50 AM
evandro added inline comments to D48172: [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc.
Wed, Jun 20, 8:33 AM
evandro added a comment to D48172: [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc.

Next time, when one patch causes a regression that another patch fixes, please make the former a parent of the latter.

Wed, Jun 20, 8:32 AM

Tue, Jun 19

evandro added inline comments to D48172: [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc.
Tue, Jun 19, 10:39 AM

Thu, Jun 14

evandro added inline comments to D48172: [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc.
Thu, Jun 14, 12:28 PM
evandro added inline comments to D48172: [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc.
Thu, Jun 14, 9:29 AM

Mon, Jun 11

evandro added a comment to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..

Thank you, @courbet.

Mon, Jun 11, 8:18 AM

Fri, Jun 8

evandro added a comment to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..

@evandro Do AArch64SchedExynosM1.td and AArch64SchedThunderX2T99.td look correct now, please?

Fri, Jun 8, 12:09 PM
evandro added a comment to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..

Update ExynosM1.

Fri, Jun 8, 8:37 AM

Wed, Jun 6

evandro committed rC334116: [PATCH 2/2] [test] Add support for Samsung Exynos M4 (NFC).
[PATCH 2/2] [test] Add support for Samsung Exynos M4 (NFC)
Wed, Jun 6, 12:05 PM
evandro committed rL334116: [PATCH 2/2] [test] Add support for Samsung Exynos M4 (NFC).
[PATCH 2/2] [test] Add support for Samsung Exynos M4 (NFC)
Wed, Jun 6, 12:05 PM
evandro committed rL334115: [AArch64, ARM] Add support for Samsung Exynos M4.
[AArch64, ARM] Add support for Samsung Exynos M4
Wed, Jun 6, 12:00 PM

Tue, Jun 5

evandro added a comment to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..

A review of ExynosM1/ThunderX2, as I'm not familiar with these CPUs.

Tue, Jun 5, 2:41 PM
Herald added a reviewer for D44794: [AArch64] Don't reduce the width of loads if it prevents combining a shift: javed.absar.
Tue, Jun 5, 1:21 PM

Tue, May 29

evandro committed rL333429: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
Tue, May 29, 9:03 AM
evandro closed D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Tue, May 29, 9:02 AM

Fri, May 25

evandro added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Thank you.

Fri, May 25, 2:38 PM
evandro updated the diff for D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Fri, May 25, 2:35 PM
evandro updated the diff for D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Include the solution found in D47349.

Fri, May 25, 2:22 PM
evandro removed a dependency for D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy: D47349: [AArch64] Limit inlining string functions with strict alignment.
Fri, May 25, 2:14 PM
evandro removed a dependent revision for D47349: [AArch64] Limit inlining string functions with strict alignment: D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Fri, May 25, 2:14 PM
evandro abandoned D47349: [AArch64] Limit inlining string functions with strict alignment.

There's no need to make these methods virtual and override them in the sub target. Rather, modify the limits when the sub target is initialized, as @eli.friedman suggested.

Fri, May 25, 2:14 PM
evandro added a comment to D47349: [AArch64] Limit inlining string functions with strict alignment.

MaxStoresPerMemcpy = STI.requiresStrictAlign() ? 4 : 16;?

Fri, May 25, 1:04 PM
evandro added a comment to D47349: [AArch64] Limit inlining string functions with strict alignment.

I meant that you can change the value of MaxStoresPerMemcpy in AArch64TargetLowering::AArch64TargetLowering, instead of overriding getMaxStoresPerMemcpy.
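
The suggestion above can be sketched in isolation. This is a minimal illustration, not the LLVM source: `Subtarget` and `TargetLowering` are hypothetical stand-ins for `AArch64Subtarget` and `AArch64TargetLowering`, and the values 4 and 16 are simply the ones floated in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for AArch64Subtarget; only one query is modeled.
struct Subtarget {
  bool StrictAlign;
  constexpr bool requiresStrictAlign() const { return StrictAlign; }
};

// Hypothetical stand-in for the lowering class. The limit is a plain data
// member chosen once in the constructor, so the getter needs no virtual
// override in a subtarget-specific subclass.
class TargetLowering {
public:
  constexpr explicit TargetLowering(const Subtarget &STI)
      // 4 and 16 are the values proposed in the thread, not verified defaults.
      : MaxStoresPerMemcpy(STI.requiresStrictAlign() ? 4 : 16) {}

  constexpr unsigned getMaxStoresPerMemcpy() const {
    return MaxStoresPerMemcpy;
  }

private:
  unsigned MaxStoresPerMemcpy;
};

// Small self-check that exercises both configurations.
inline void checkLimits() {
  assert(TargetLowering(Subtarget{true}).getMaxStoresPerMemcpy() == 4);
  assert(TargetLowering(Subtarget{false}).getMaxStoresPerMemcpy() == 16);
}
```

Because the subtarget is already passed to the constructor, the choice is made once at initialization and every later query reads the stored value.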

Fri, May 25, 12:01 PM
evandro added inline comments to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Fri, May 25, 11:58 AM
evandro added inline comments to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Fri, May 25, 11:28 AM
evandro added a comment to D47349: [AArch64] Limit inlining string functions with strict alignment.

The AArch64Subtarget is explicitly passed as a parameter to AArch64TargetLowering::AArch64TargetLowering.

Fri, May 25, 11:25 AM
evandro added a comment to D47349: [AArch64] Limit inlining string functions with strict alignment.

If I'm following correctly, the problem here is that we don't have paired load/store operations for byte operations, so the threshold is roughly double what it should be. StrictAlign isn't really a good proxy for that; it doesn't have anything to do with whether a particular memcpy will lower to paired operations. Ideally, we should probably come up with some callback to give a bonus to paired operations rather than penalize all operations in StrictAlign mode.
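
The "roughly double" remark can be illustrated with some simple arithmetic. The helper below is hypothetical and deliberately simplified: it assumes the limit counts individual stores before any pairing, and that wider stores can be fused two at a time (as `stp` does) while byte stores cannot.

```cpp
#include <cassert>

// Hypothetical helper: number of store instructions needed to write `Bytes`
// bytes using accesses of `Width` bytes each. When `CanPair` is true, two
// adjacent stores are assumed to fuse into one paired instruction.
constexpr unsigned storeInstrs(unsigned Bytes, unsigned Width, bool CanPair) {
  return CanPair ? ((Bytes + Width - 1) / Width + 1) / 2 // ceil(stores / 2)
                 : (Bytes + Width - 1) / Width;          // ceil(Bytes / Width)
}

// Self-check: the same budget of 16 stores costs twice as many instructions
// when the stores are single bytes that cannot be paired.
inline void checkStoreCounts() {
  assert(storeInstrs(16, 1, false) == 16); // 16 byte stores, no pairing
  assert(storeInstrs(128, 8, true) == 8);  // 16 eight-byte stores, paired
}
```

Under these assumptions a fixed store limit is about twice as expensive in instructions for byte-wise copies, which is the imbalance the comment describes.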

Fri, May 25, 9:15 AM
evandro added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Those numbers look very different from the ones before. Is r332482 making this less profitable somehow? Or is the change all noise?

I'll have to take a closer look at these CPU2000 results, but in proprietary benchmarks this change is still beneficial.

Fri, May 25, 9:05 AM

Thu, May 24

evandro added a dependent revision for D47349: [AArch64] Limit inlining string functions with strict alignment: D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Thu, May 24, 4:42 PM
evandro added a dependency for D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy: D47349: [AArch64] Limit inlining string functions with strict alignment.
Thu, May 24, 4:42 PM
evandro created D47349: [AArch64] Limit inlining string functions with strict alignment.
Thu, May 24, 4:39 PM
evandro updated the diff for D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Update test case to keep it from failing due to this change.

Thu, May 24, 2:25 PM
evandro added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Since @sebpop has just left for a deserved vacation, he asked me to babysit his pending patches.

Thu, May 24, 2:24 PM
evandro commandeered D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.
Thu, May 24, 2:22 PM
evandro set the repository for D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy to rL LLVM.
Thu, May 24, 2:17 PM
evandro added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Those numbers look very different from the ones before. Is r332482 making this less profitable somehow? Or is the change all noise?

Thu, May 24, 2:11 PM

May 15 2018

evandro committed rL332394: [AArch64] Improve single vector lane unscaled stores.
[AArch64] Improve single vector lane unscaled stores
May 15 2018, 1:45 PM
evandro closed D46762: [AArch64] Improve single vector lane unscaled stores.
May 15 2018, 1:45 PM
evandro accepted D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..

@evandro Do you have benchmarks to evaluate the impact of this change?

Yes. They're not done yet. Will let you know when I get the results.

Any news?

May 15 2018, 8:19 AM

May 14 2018

evandro added a comment to D46851: [WIP] [AArch64] Pattern-match byte store from a vector register..

i8 isn't considered a legal type by the AArch64ISelLowering, so it will never show up in the input to the instruction selector. Given that, I'm not sure it's really meaningful to say i8 maps to any specific register class.

But I guess it wouldn't do any harm to list i8 as part of FPR8? At least, I can't think of any other effects.

May 14 2018, 3:34 PM
evandro added a comment to D46851: [WIP] [AArch64] Pattern-match byte store from a vector register..

Apropos, why is FPR8 defined as untyped and not i8 below?

May 14 2018, 2:51 PM
evandro committed rL332251: [AArch64] Improve single vector lane stores.
[AArch64] Improve single vector lane stores
May 14 2018, 8:32 AM
evandro closed D46655: [AArch64] Improve single vector lane stores.
May 14 2018, 8:32 AM
evandro added a comment to D46655: [AArch64] Improve single vector lane stores.

LGTM. (We should also add the i8 patterns at some point, but we can do that as a followup.)

May 14 2018, 8:08 AM

May 11 2018

evandro updated the diff for D46655: [AArch64] Improve single vector lane stores.
May 11 2018, 1:44 PM
evandro added inline comments to D46655: [AArch64] Improve single vector lane stores.
May 11 2018, 1:43 PM
evandro added a dependency for D46762: [AArch64] Improve single vector lane unscaled stores: D46655: [AArch64] Improve single vector lane stores.
May 11 2018, 10:58 AM
evandro added a dependent revision for D46655: [AArch64] Improve single vector lane stores: D46762: [AArch64] Improve single vector lane unscaled stores.
May 11 2018, 10:58 AM
evandro created D46762: [AArch64] Improve single vector lane unscaled stores.
May 11 2018, 10:58 AM

May 10 2018

evandro added inline comments to D46655: [AArch64] Improve single vector lane stores.
May 10 2018, 12:05 PM
evandro added inline comments to D46655: [AArch64] Improve single vector lane stores.
May 10 2018, 11:16 AM
evandro updated the diff for D46655: [AArch64] Improve single vector lane stores.
May 10 2018, 7:51 AM

May 9 2018

evandro updated the diff for D46655: [AArch64] Improve single vector lane stores.
May 9 2018, 4:03 PM
evandro updated the diff for D46655: [AArch64] Improve single vector lane stores.
May 9 2018, 4:02 PM
evandro added inline comments to D46655: [AArch64] Improve single vector lane stores.
May 9 2018, 2:50 PM
evandro updated the diff for D46655: [AArch64] Improve single vector lane stores.
May 9 2018, 11:50 AM
evandro created D46655: [AArch64] Improve single vector lane stores.
May 9 2018, 11:48 AM

May 4 2018

evandro added a comment to D46010: [AArch64] Improve cost of vector division by constant.

I tried this patch on Exynos and the performance of SPEC CPU2000 was virtually neutral, even if slightly negative overall in the integer score. Perhaps the default costs are too optimistic about the relative difference between multiply and add?

May 4 2018, 12:58 PM
evandro added a comment to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..

@evandro Do you have benchmarks to evaluate the impact of this change?

May 4 2018, 8:44 AM

May 3 2018

evandro added inline comments to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..
May 3 2018, 9:39 AM
evandro added a comment to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..

Does FDIV use both the resources for 8 cycles? Writing [8,8] would imply that.

May 3 2018, 9:30 AM
evandro added inline comments to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..
May 3 2018, 8:10 AM

May 2 2018

evandro added inline comments to D46356: [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles..
May 2 2018, 2:20 PM

Apr 10 2018

evandro added a comment to D39976: [AArch64] Query the target when folding loads and stores.

Ping! 🔔

Apr 10 2018, 2:18 PM

Apr 3 2018

evandro committed rL329130: [AArch64] Adjust the cost model for Exynos M3.
[AArch64] Adjust the cost model for Exynos M3
Apr 3 2018, 4:00 PM

Apr 2 2018

evandro added a comment to D39976: [AArch64] Query the target when folding loads and stores.

This change is more generic and flexible than FeatureSlowPaired128. This change controls not only when loads and stores are paired, but also other foldings that this pass performs, including the pre or post indexing of the offset register.

That's not what the code looks like it is doing. isReplacementProfitable() is only being called from mergeUpdateInsn(), which is only called when folding to form base register incrementing load/stores.

Apr 2 2018, 11:56 AM

Mar 30 2018

evandro added a comment to D45098: [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy.

Should we check for hasNEON() here? The generic code doesn't know AArch64 has ldp/stp, so we might want to be a little more aggressive to compensate.

Mar 30 2018, 11:43 AM

Mar 27 2018

evandro added a comment to D39976: [AArch64] Query the target when folding loads and stores.

FeatureSlowPaired128 was just too coarse. The alternative would be to change it to something more specific, like FeatureSlowSomePaired128Sometimes, and then create yet another one for the next generation to specialize it further. Instead, querying the scheduling model seems to be a much more reasonable approach.

I'm more confused now. 'FeatureSlowPaired128' controls whether certain load/store opcodes are combined to form paired load/stores. But this change prevents some load/store opcodes from having their base register increment folded in. The two seem unrelated.

Mar 27 2018, 3:10 PM

Mar 20 2018

evandro committed rL328027: [AArch64] Adjust the cost model for Exynos M3.
[AArch64] Adjust the cost model for Exynos M3
Mar 20 2018, 1:03 PM

Mar 15 2018

evandro committed rL327663: [AArch64] Adjust the cost model for Exynos M3.
[AArch64] Adjust the cost model for Exynos M3
Mar 15 2018, 1:40 PM
evandro committed rL327662: [AArch64] Adjust the cost model for Exynos M3.
[AArch64] Adjust the cost model for Exynos M3
Mar 15 2018, 1:34 PM
evandro committed rL327661: [AArch64] Adjust the cost model for Exynos M3.
[AArch64] Adjust the cost model for Exynos M3
Mar 15 2018, 1:33 PM

Mar 14 2018

evandro added a comment to D44490: [AArch64] Implement getArithmeticReductionCost.

Do you intend to consider AArch64TTIImpl::getMinMaxRdxCost() too?

Mar 14 2018, 1:06 PM

Mar 13 2018

evandro added a comment to D39976: [AArch64] Query the target when folding loads and stores.

Ping! 🔔

Mar 13 2018, 12:32 PM

Mar 8 2018

evandro added a comment to D39976: [AArch64] Query the target when folding loads and stores.

I've thought about this some more and tested it out on Falkor. As currently written this change causes SIMD store instructions to not have pre/post increments folded into them, causing minor performance regressions.

Mar 8 2018, 8:32 AM

Mar 7 2018

evandro committed rL326955: [AArch64] Adjust the cost of integer vector division.
[AArch64] Adjust the cost of integer vector division
Mar 7 2018, 2:38 PM
evandro closed D43974: [AArch64] Adjust the cost of integer vector division.
Mar 7 2018, 2:38 PM
evandro updated the diff for D43974: [AArch64] Adjust the cost of integer vector division.
Mar 7 2018, 2:10 PM
evandro added inline comments to D43974: [AArch64] Adjust the cost of integer vector division.
Mar 7 2018, 2:10 PM
evandro updated the diff for D43974: [AArch64] Adjust the cost of integer vector division.

Updated the test case.

Mar 7 2018, 1:46 PM
evandro added a comment to D43974: [AArch64] Adjust the cost of integer vector division.

That's a sensible alternative.

Mar 7 2018, 1:28 PM
evandro added a comment to D43974: [AArch64] Adjust the cost of integer vector division.

The test case has scalar types and it seems more interesting to see the cost rising proportionally with the vector factor.

Mar 7 2018, 1:06 PM
evandro accepted D44222: [AArch64] Add vmulxh_lane FP16 intrinsics.

Looks pretty straightforward to me.

Mar 7 2018, 12:57 PM
evandro updated the diff for D43974: [AArch64] Adjust the cost of integer vector division.

Added a test case.

Mar 7 2018, 12:39 PM
Herald updated subscribers of D30225: [LIR] re-enable generation of memmove with runtime checks.
Mar 7 2018, 9:03 AM

Mar 6 2018

evandro updated subscribers of D44118: [x86][AArch64] ask the target whether it has a vector blend instruction.

It LGTM, but I wonder how it affects other major targets, like PPC. It's probably a good idea to give them some time to ponder this change.

Mar 6 2018, 8:37 AM

Mar 5 2018

evandro added a comment to D43973: [AArch64] define isExtractSubvectorCheap.

It LGTM, but I'd wait a while to give others a chance to chime in.

Mar 5 2018, 12:05 PM
evandro added a comment to D39976: [AArch64] Query the target when folding loads and stores.

Methinks that the gist is to move away from features and to rely more on the cost model. In the case of this patch, it also removes the feature FeatureSlowPaired128 in D40107.

That seems like a worthwhile goal, but this change doesn't really seem to be accomplishing that. If the sched model is being used by a subtarget-specific heuristic, that seems like just a more roundabout way of achieving the same result for your subtarget. Is there any net effect of this change combined with D40107?

Mar 5 2018, 11:17 AM
evandro committed rL326724: [AArch64] Harden test case.
[AArch64] Harden test case
Mar 5 2018, 9:44 AM
evandro committed rL326718: [AArch64] Improve code generation of constant vectors.
[AArch64] Improve code generation of constant vectors
Mar 5 2018, 9:05 AM
This revision was not accepted when it landed; it landed in state Needs Review.
Mar 5 2018, 9:05 AM
evandro added a comment to D40831: [AArch64] Only use writeback in the load/store optimizer when needed.

I'm not sure what is gained by not performing this optimization, if I understood the gist of it from the new test case below. For even if the register is killed, the pointer adjustment is folded into the load or store and an instruction is eliminated. What other optimizations are expected to happen should this patch be applied?

Mar 5 2018, 8:50 AM

Mar 1 2018

evandro added a reviewer for D43974: [AArch64] Adjust the cost of integer vector division: magabari.
Mar 1 2018, 2:35 PM
evandro added a comment to D43974: [AArch64] Adjust the cost of integer vector division.

This patch improves some image processing benchmarks by over 1% on big Cortex and Exynos cores.

Mar 1 2018, 2:32 PM
evandro created D43974: [AArch64] Adjust the cost of integer vector division.
Mar 1 2018, 2:30 PM
evandro committed rL326486: [AArch64] Clean up code (NFC).
[AArch64] Clean up code (NFC)
Mar 1 2018, 1:19 PM
evandro updated subscribers of D42133: [AArch64] Improve code generation of constant vectors.
Mar 1 2018, 8:10 AM

Feb 28 2018

evandro updated subscribers of D42133: [AArch64] Improve code generation of constant vectors.
Feb 28 2018, 2:34 PM

Feb 26 2018

evandro committed rL326147: [AArch64] Harden test cases.
[AArch64] Harden test cases
Feb 26 2018, 3:21 PM