Page MenuHomePhabricator

andreadb (Andrea Di Biagio)
User

Projects

User does not belong to any projects.

User Details

User Since
May 9 2013, 11:10 AM (302 w, 1 d)

Recent Activity

Wed, Feb 20

andreadb committed rG51c1cc0757ff: [MCA][Scheduler] Correctly initialize field NumDispatchedToThePendingSet. (authored by andreadb).
[MCA][Scheduler] Correctly initialize field NumDispatchedToThePendingSet.
Wed, Feb 20, 10:23 AM
andreadb committed rG3316eb5bb802: [MCA][Scheduler] Collect resource pressure and memory dependency bottlenecks. (authored by andreadb).
[MCA][Scheduler] Collect resource pressure and memory dependency bottlenecks.
Wed, Feb 20, 10:02 AM
andreadb committed rGd882ad5e6efb: [MCA][ResourceManager] Add a table that maps processor resource indices to… (authored by andreadb).
[MCA][ResourceManager] Add a table that maps processor resource indices to…
Wed, Feb 20, 6:53 AM

Mon, Feb 18

andreadb committed rGc102e2a2275b: [MCA] Correctly update register definitions in the PRF after move elimination. (authored by andreadb).
[MCA] Correctly update register definitions in the PRF after move elimination.
Mon, Feb 18, 6:16 AM
andreadb committed rG7a950ed587ba: [MCA] Slightly refactor method writeStartEvent in WriteState and ReadState. NFCI (authored by andreadb).
[MCA] Slightly refactor method writeStartEvent in WriteState and ReadState. NFCI
Mon, Feb 18, 3:27 AM

Fri, Feb 15

andreadb committed rG2187a4fa6aaa: [MCA] Improved code comment. NFC (authored by andreadb).
[MCA] Improved code comment. NFC
Fri, Feb 15, 10:28 AM
andreadb committed rG5ad52e35a8e4: [MCA][LSUnit] Return the ID of the dependent memory operation from method… (authored by andreadb).
[MCA][LSUnit] Return the ID of the dependent memory operation from method…
Fri, Feb 15, 10:08 AM

Wed, Feb 13

andreadb committed rG245163ffd0ef: [MCA] Store a bitmask of used groups in the instruction descriptor. (authored by andreadb).
[MCA] Store a bitmask of used groups in the instruction descriptor.
Wed, Feb 13, 6:56 AM
andreadb committed rG318f990aee79: [MCA][Scheduler] Use latency information to further classify busy instructions. (authored by andreadb).
[MCA][Scheduler] Use latency information to further classify busy instructions.
Wed, Feb 13, 3:03 AM

Tue, Feb 12

andreadb committed rGd30fff9a9049: [MCA] Improved debug prints. NFC (authored by andreadb).
[MCA] Improved debug prints. NFC
Tue, Feb 12, 8:19 AM
andreadb updated the diff for D58066: [MCA][Scheduler] Use latency information to further classify busy instructions..

Addressed review comments.

Tue, Feb 12, 4:51 AM · Restricted Project
andreadb added inline comments to D58066: [MCA][Scheduler] Use latency information to further classify busy instructions..
Tue, Feb 12, 3:03 AM · Restricted Project

Mon, Feb 11

andreadb created D58066: [MCA][Scheduler] Use latency information to further classify busy instructions..
Mon, Feb 11, 10:52 AM · Restricted Project
andreadb committed rG23ff2aa47cfe: [MCA][Scheduler] Track resources that were found busy when issuing an… (authored by andreadb).
[MCA][Scheduler] Track resources that were found busy when issuing an…
Mon, Feb 11, 9:56 AM
andreadb committed rG83e68854d545: [MCA] Return a mask of busy resources from method ResourceManager… (authored by andreadb).
[MCA] Return a mask of busy resources from method ResourceManager…
Mon, Feb 11, 6:53 AM

Wed, Feb 6

andreadb committed rG02974728dc45: [MCA] Speedup ResourceManager queries. NFCI (authored by andreadb).
[MCA] Speedup ResourceManager queries. NFCI
Wed, Feb 6, 6:58 AM

Tue, Feb 5

andreadb committed rG4bce783ee332: [MCA] Moved the logic that updates register dependencies from DispatchStage to… (authored by andreadb).
[MCA] Moved the logic that updates register dependencies from DispatchStage to…
Tue, Feb 5, 6:12 AM
andreadb committed rG998a925e0e47: [MCA] Simplify the logic in method WriteState::addUser. NFCI (authored by andreadb).
[MCA] Simplify the logic in method WriteState::addUser. NFCI
Tue, Feb 5, 3:37 AM

Mon, Feb 4

andreadb committed rGedbf06a76771: [AsmPrinter] Remove hidden flag -print-schedule. (authored by andreadb).
[AsmPrinter] Remove hidden flag -print-schedule.
Mon, Feb 4, 4:53 AM

Fri, Feb 1

andreadb accepted D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit..

Thanks for running that experiment. There is clearly an 8-10cy delay.

Fri, Feb 1, 2:52 AM · Restricted Project

Tue, Jan 29

andreadb added a comment to D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit..

(I edited my previous comment. However, the system didn't send another email.)

Tue, Jan 29, 9:38 AM · Restricted Project
andreadb added a comment to D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit..
      2e:       41 bf 00 00 00 00       mov    $0x0,%r15d
      34:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
      3a:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
....
    ea88:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
Tue, Jan 29, 9:28 AM · Restricted Project
andreadb updated the diff for D57148: [X86][Btver2] Improved latency/throughput model for scalar int-to-float conversions..

Patch updated.
Addressed review comment.

Tue, Jan 29, 7:52 AM
andreadb updated the diff for D57148: [X86][Btver2] Improved latency/throughput model for scalar int-to-float conversions..

Patch updated.

Tue, Jan 29, 5:30 AM
andreadb added reviewers for D57375: X86AsmParser AVX-512: Return error instead of hitting assert: craig.topper, RKSimon.
Tue, Jan 29, 3:37 AM · Restricted Project
andreadb added a comment to D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit..

I'm unable to find this number in the "AMD SOG for family 15h".
llvm-exegesis measures the latencies of these instructions as 2,
which matches the latencies specified in "AMD SOG for family 15h".

Tue, Jan 29, 2:39 AM · Restricted Project

Mon, Jan 28

andreadb updated the diff for D57244: [AsmPrinter] Remove hidden flag -print-schedule..

Patch updated.

Mon, Jan 28, 5:47 AM · Restricted Project

Fri, Jan 25

andreadb created D57244: [AsmPrinter] Remove hidden flag -print-schedule..
Fri, Jan 25, 9:02 AM · Restricted Project

Jan 24 2019

andreadb created D57148: [X86][Btver2] Improved latency/throughput model for scalar int-to-float conversions..
Jan 24 2019, 5:07 AM

Jan 23 2019

andreadb updated the diff for D57056: [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit..

Thanks for the feedback.

Jan 23 2019, 4:52 AM
andreadb accepted D57090: [IR] Match intrinsic paramater by scalar/vectorwidth.

The patch looks good to me. However, I don't claim to be an expert of this area. So getting another pair of eyes on this could be useful.

Jan 23 2019, 4:45 AM

Jan 22 2019

andreadb added inline comments to D57056: [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit..
Jan 22 2019, 10:54 AM
andreadb added a reviewer for D57056: [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit.: mattd.
Jan 22 2019, 9:51 AM
andreadb created D57056: [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit..
Jan 22 2019, 7:41 AM

Jan 18 2019

andreadb updated the diff for D56922: [X86][BtVer2] Update the WriteLoad latency..

Removed FIXME.

Jan 18 2019, 10:42 AM
andreadb created D56922: [X86][BtVer2] Update the WriteLoad latency..
Jan 18 2019, 10:08 AM

Jan 16 2019

andreadb updated the diff for D56777: [X86][BtVer2] Update latency of horizontal operations..

Patch updated.

Jan 16 2019, 5:50 AM
andreadb updated the diff for D56777: [X86][BtVer2] Update latency of horizontal operations..

Added comments.

Jan 16 2019, 5:28 AM
andreadb created D56777: [X86][BtVer2] Update latency of horizontal operations..
Jan 16 2019, 3:53 AM

Jan 9 2019

andreadb accepted D56450: [x86] fix horizontal binop matching for 256-bit vectors (PR40243).

Looks good to me.

Jan 9 2019, 8:41 AM

Jan 3 2019

andreadb accepted D55722: [DAGCombiner] scalarize binop followed by extractelement.
Jan 3 2019, 6:43 AM
andreadb added a comment to D55722: [DAGCombiner] scalarize binop followed by extractelement.

The X86 changes look good to me.

Jan 3 2019, 6:43 AM

Dec 19 2018

andreadb accepted D55803: [llvm-mca] Add an error handler for error from parseCodeRegions.

LGTM

Dec 19 2018, 9:33 AM
andreadb added inline comments to D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing.
Dec 19 2018, 9:22 AM
andreadb added inline comments to D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing.
Dec 19 2018, 8:55 AM
andreadb added inline comments to D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing.
Dec 19 2018, 8:51 AM
andreadb added inline comments to D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing.
Dec 19 2018, 8:39 AM
andreadb accepted D55866: [DAGCombiner] allow narrowing of add followed by truncate.
Dec 19 2018, 7:01 AM
andreadb added a comment to D55866: [DAGCombiner] allow narrowing of add followed by truncate.

From an x86 point of view, this change looks good.

At some point, we should address the poor codegen from test/CodeGen/X86/pr32345.ll. Maybe you could raise a bug for it.

I don't think it's worth the effort if we look at the background for this test; the intent was to check for a crash with *unoptimized* code:
https://bugs.llvm.org/show_bug.cgi?id=32345
rL298923
...so the test includes CSE values that the DAG never expects to encounter. I think the optimized RUN was just added for completeness/sanity. The whole thing is folded to 'unreachable' in IR.

Dec 19 2018, 7:00 AM
andreadb accepted D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing.

The change LGTM.

Dec 19 2018, 6:06 AM
andreadb added inline comments to D55870: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing.
Dec 19 2018, 5:43 AM
andreadb accepted D55883: [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091).

LGTM

Dec 19 2018, 3:59 AM
andreadb added a comment to D55866: [DAGCombiner] allow narrowing of add followed by truncate.

From an x86 point of view, this change looks good.

Dec 19 2018, 3:26 AM
andreadb added inline comments to D55866: [DAGCombiner] allow narrowing of add followed by truncate.
Dec 19 2018, 3:24 AM

Dec 18 2018

andreadb added a comment to D55780: [X86] Create PSUBUS from (add (umax X, C), -C).

This change handles psubus cases with no undefs, so LGTM.
Depending on how the patches land, we could add the undef capability and tests for it directly here or make that an enhancement.
But as I mentioned, I don't think this patch alone is enough to close PR40053 because that's at least 2 independent bugs in 1 report.

Don't get me wrong: I am not suggesting that we shouldn't commit this patch.

I just wanted to point out that it is not true that all the psubus cases are fixed as you wrote. To fully fix psubus we need to make sure that we match UMAX even in the presence of undefs.
As I wrote in my previous comment, if you slightly change the example from the bugzilla, then we no longer get a UMAX in the DAG, and we lose this optimization (see below):

unsigned long long test_sub_2(__m128i x) {
    __m128i c = _mm_set1_epi8(70);
    return _mm_subs_epu8(x, c)[0];
}

Ah, sorry I forgot to address that example. I filed it here:
https://bugs.llvm.org/show_bug.cgi?id=40083

Dec 18 2018, 7:51 AM
andreadb added a comment to D55780: [X86] Create PSUBUS from (add (umax X, C), -C).

This change handles psubus cases with no undefs, so LGTM.
Depending on how the patches land, we could add the undef capability and tests for it directly here or make that an enhancement.
But as I mentioned, I don't think this patch alone is enough to close PR40053 because that's at least 2 independent bugs in 1 report.

Dec 18 2018, 7:26 AM
andreadb added a comment to D55822: [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate.

As I wrote to D55819, I think this is a nice improvement and makes the matching logic resilient to the presence of undefs in the input vectors. It also fixes the regression introduced by D55600 in @knownbits_mask_or_shuffle_uitofp. So, overall, I think this is a nice improvement.

Dec 18 2018, 6:51 AM
andreadb added a comment to D55819: [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate.

I think this is the right way to go.

Dec 18 2018, 6:33 AM
andreadb added a comment to D55780: [X86] Create PSUBUS from (add (umax X, C), -C).

Hi Craig,

Dec 18 2018, 3:50 AM

Dec 17 2018

andreadb accepted D55768: [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000).

LGTM. Thanks!

Dec 17 2018, 10:00 AM

Dec 14 2018

andreadb accepted D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.

LGTM

Dec 14 2018, 3:20 AM

Dec 13 2018

andreadb accepted D55655: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract.

LGTM

Dec 13 2018, 7:33 AM
andreadb added inline comments to D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.
Dec 13 2018, 6:44 AM
andreadb added inline comments to D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.
Dec 13 2018, 5:43 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

Let's not lose sight of the big picture here. If uarch problems exist, are they *worse* than the cost of calling memcmp()?

Almost certainly no, even for memcpy where potential store-forwarding stalls or 4k aliasing are a pretty minor concern most of the time.

I pointed those things out so the new unaligned load/store code-gen can be the best it can be while people are working on that code anyway, *not* because I think there's a risk of overall regressions.

Dec 13 2018, 3:35 AM
andreadb accepted D55557: [llvm-mca] Move llvm-mca library to llvm/lib/MCA..

This patch LGTM.

Dec 13 2018, 3:27 AM

Dec 12 2018

andreadb added inline comments to D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.
Dec 12 2018, 10:03 AM
andreadb accepted D55426: [SelectionDAG] Add a generic isSplatValue function..

Thanks Simon.

Dec 12 2018, 9:55 AM
andreadb added inline comments to D55426: [SelectionDAG] Add a generic isSplatValue function..
Dec 12 2018, 8:51 AM
andreadb added a comment to D55565: [X86] Don't emit MULX by default with BMI2.

I don't have a strong opinion on this.

Dec 12 2018, 4:58 AM
andreadb accepted D55558: [TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts.

Looks good to me.

Dec 12 2018, 4:34 AM
andreadb accepted D55414: [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input..

LGTM too.

Dec 12 2018, 4:10 AM

Dec 11 2018

andreadb added a comment to D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.

This patch looks good to me.

Dec 11 2018, 4:21 AM
andreadb added a comment to rL348114: [ARM][MC] Move information about variadic register defs into tablegen.

In the ARM backend, we only use variable_ops for the LDM and STM instructions, where the variadic operands are either all uses or all defs. I'm less familiar with the other backends, but all of the other cases where variable_ops is used look similar.

If a target does need both variable uses and defs on the same instruction, I assume that we'd need to add a way to represent that to the MCInst or MCOperand, as the sequence of uses and defs would vary between different instances of the instruction.simillar

Dec 11 2018, 2:57 AM

Dec 10 2018

andreadb accepted D55345: [AArch64] Refactor the Exynos scheduling predicates.

Thanks Evandro.

Dec 10 2018, 9:10 AM
andreadb accepted D55507: [DAGCombiner] Use the result value type in visitCONCAT_VECTORS.

I only have one comment. Otherwise, this LGTM. Thanks.

Dec 10 2018, 5:13 AM

Dec 7 2018

andreadb added a comment to D55414: [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input..

Hi Craig,

Dec 7 2018, 11:48 PM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

I went through the various changes, and I didn’t find anything that looks obviously wrong. The new predicates look fine to me.

Dec 7 2018, 8:31 AM

Dec 6 2018

andreadb added inline comments to D55375: [AArch64] Refactor the scheduling predicates.
Dec 6 2018, 3:52 PM
andreadb accepted D55375: [AArch64] Refactor the scheduling predicates.

I only have one question. Otherwise, the new predicates LGTM (their logic seems to match what you wrote in the code comments).

Dec 6 2018, 12:43 PM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

Sorry, I was juggling too many patches out of order.

I'd appreciate your reviewing the new predicates and their application while I work on test cases.

Thank you.

Dec 6 2018, 11:18 AM
andreadb added a comment to D55375: [AArch64] Refactor the scheduling predicates.

Note that I made this patch a predecessor for D55345.

Dec 6 2018, 10:38 AM
andreadb added inline comments to D55375: [AArch64] Refactor the scheduling predicates.
Dec 6 2018, 9:42 AM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

Rebase the patch.

Dec 6 2018, 9:42 AM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

Hi Evandro,

Dec 6 2018, 6:18 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

I just looked over the codegen changes so far, but I want to add some more knowledgeable x86 hackers to have a look too. There are 2 concerns:
<snip>

  1. Are there any known uarch problems with unaligned accesses (either scalar or SSE)?

*Unaligned* loads are a potential minor slowdown if they cross cache-line boundaries. (Or on AMD, maybe even 32-byte or even 16-byte boundaries). There is literally zero penalty when they don't cross any relevant boundary on modern CPUs (on Intel, that's 64-byte cache lines).

Dec 6 2018, 4:54 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

One of my coworkers did an informal test last year and saw that newer Intel CPUs optimization of REP-string-op-instruction was faster than using SSE2 (he used large data sizes, not anything in the shorter ranges this patch deals with). Is that something that should be looked at? (or has somebody done that examination already)

Yes, I'm planning to work on this next :) It should go in SelectionDAGTargetInfo::EmitTargetCodeForMemcmp(), similar to what we did for memcpy and memset though.

Dec 6 2018, 3:35 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

Here's a basic benchmark for memcmp(a, b, N) where N is a compile-time constant, and a and b differ first at character M:

The change makes the impacted values 2.5 - 3x as fast.

Dec 6 2018, 3:15 AM

Dec 5 2018

andreadb updated the diff for D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..

Patch updated.

Dec 5 2018, 11:15 AM
andreadb added inline comments to D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..
Dec 5 2018, 7:58 AM
andreadb updated the diff for D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..

Thanks Simon for the feedback.
My new combine logic was indeed very similar to the existing logic in visitCONCAT_VECTORS.
This patch reuses that logic and introduces a new rule for the case where the first operand of the concat_vector is a scalar_to_vector.

Dec 5 2018, 7:37 AM

Dec 4 2018

andreadb created D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..
Dec 4 2018, 8:12 AM

Nov 30 2018

andreadb accepted D55089: [TableGen] Fix negation of simple predicates.

I'm all for reusing code, but this is not any sophisticated algorithm, but mere streaming of data. I rather keep the additional methods and simplify the exiting ones to eliminate their special cases for the sake of maintenance. After all, complicating them is what resulted in this issue.

Nov 30 2018, 12:07 PM
andreadb added a comment to D55089: [TableGen] Fix negation of simple predicates.

Hi Evandro.

Nov 30 2018, 3:33 AM

Nov 29 2018

andreadb added a comment to D54640: [DAGCombiner] narrow truncated binops.

LGTM.

At the beginning, I was a bit concerned by the increase in the number partial register writes.
So I started experimenting with your patch fo a bit. In practice, I couldn't come up with an obvious case where your patch introduces extra partial register stalls. Also, none of the tests modified by your patch really suffers from partial register stalls.
The checks perfomed by your transform are quite conservative (the new combine rule is limited to pre-legalization, and it only affects binary opcodes where one of the operands is a constant). So, I am happy with it.

Thanks everyone for the reviews and comments!
I haven't thought about partial reg stalls in a while, but IIRC, AMD doesn't care in cases like these because it doesn't rename 8/16-bit regs, and Intel should be protected because we have a machine pass (-x86-fixup-bw-insts) to prevent the problem.

Nov 29 2018, 12:47 PM
andreadb added inline comments to D54640: [DAGCombiner] narrow truncated binops.
Nov 29 2018, 11:02 AM
andreadb added inline comments to D54640: [DAGCombiner] narrow truncated binops.
Nov 29 2018, 10:38 AM
andreadb accepted D54640: [DAGCombiner] narrow truncated binops.
Nov 29 2018, 10:08 AM
andreadb added a comment to D54640: [DAGCombiner] narrow truncated binops.

At the beginning, I was a bit concerned by the increase in the number partial register writes.
So I started experimenting with your patch fo a bit. In practice, I couldn't come up with an obvious case where your patch introduces extra partial register stalls. Also, none of the tests modified by your patch really suffers from partial register stalls.
The checks perfomed by your transform are quite conservative (the new combine rule is limited to pre-legalization, and it only affects binary opcodes where one of the operands is a constant). So, I am happy with it.

Nov 29 2018, 10:08 AM

Nov 28 2018

andreadb updated the diff for D54957: [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666)..

Address review comments.

Nov 28 2018, 11:14 AM