aemerson (Amara Emerson)
Asian George Costanza

Projects

User does not belong to any projects.

User Details

User Since
Sep 9 2013, 3:45 AM (210 w, 6 d)

Compilers at a fruit company

Recent Activity

Thu, Sep 21

aemerson created D38161: [X86] Improve codegen for inverted overflow checking intrinsics.
Thu, Sep 21, 5:19 PM
aemerson created D38160: [AArch64] Improve codegen for inverted overflow checking intrinsics.
Thu, Sep 21, 5:11 PM

Wed, Sep 13

aemerson resigned from D31724: [SelectionDAG] Remove special call to LHS computeKnownBits for ANDs with constant RHS.
Wed, Sep 13, 11:49 AM

Aug 4 2017

aemerson committed rL310117: [SCEV] Preserve NSW information for sext(subtract).
Aug 4 2017, 1:20 PM
aemerson closed D35256: [SCEV] Try harder to preserve NSW information for sext(sub) expressions by committing rL310117: [SCEV] Preserve NSW information for sext(subtract).
Aug 4 2017, 1:20 PM
aemerson resigned from D31239: [WIP] Add Caching of Known Bits in InstCombine.
Aug 4 2017, 9:48 AM
aemerson resigned from D35307: [AArch64] Initial SVE register definitions.
Aug 4 2017, 9:44 AM
aemerson closed D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.
Aug 4 2017, 9:43 AM
aemerson updated the diff for D35256: [SCEV] Try harder to preserve NSW information for sext(sub) expressions.

Added support for handling the nsw/nuw ssub.with.overflow intrinsic.

Aug 4 2017, 7:24 AM
aemerson added a comment to D36059: [memops] Add a new pass to inject fast-path code for specific library function calls.

IMO, there is no need for doing this in this place. If we're just leaving a marker here for the target to expand, we don't need to do anything. We already get a chance to custom expand the libcall in the target. Adding the versioning doesn't make that any simpler given that it still needs to introduce a loop.

The difference is that at the target codegen level we can't as easily do the predicate analysis as we can at the IR level.

If, for a particular target, it is worth emitting a versioned, carefully target-crafted loop or instruction sequence, I would expect them to not use this pass but to custom lower the calls in the backend much like x86 does for constant-size calls.

But in the patch description you say that one of the challenges is constructing *just* the right IR to get efficient codegen from the backend. I understand this is for x86 right now, but if you don't have plans to allow other targets to work well with it, why not put it into the Target/X86 directory and make it a backend-specific IR pass to avoid confusion?

Aug 4 2017, 2:45 AM

Aug 2 2017

aemerson added a comment to D35307: [AArch64] Initial SVE register definitions.

I'm resigning from SVE upstreaming related activities, so Graham will be taking over this patch and others from here.

Aug 2 2017, 6:24 AM

Aug 1 2017

aemerson added inline comments to D35256: [SCEV] Try harder to preserve NSW information for sext(sub) expressions.
Aug 1 2017, 5:38 AM
aemerson added a comment to D36059: [memops] Add a new pass to inject fast-path code for specific library function calls.

Instead of generating loop IR for the fast path, how about creating a versioned memcpy/memset with the constrained parameters guarded under the condition test? That way, in the back-end the exact preferred optimal code can be generated, allowing for unrolled loop bodies specific to individual targets.
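A minimal source-level sketch of the versioning idea, assuming a size threshold as the guarding condition. The names `kSmallCopyLimit`, `copyConstrained`, and `versionedMemcpy` are hypothetical and do not come from the patch; the byte loop only stands in for whatever sequence a target would actually choose for the constrained call.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical threshold below which the constrained version is used.
constexpr std::size_t kSmallCopyLimit = 16;

// Stand-in for the versioned call that a backend could expand into its
// preferred sequence, knowing n <= kSmallCopyLimit. Here it is a plain
// byte loop for illustration only.
void *copyConstrained(void *dst, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}

// The condition test guards the constrained version; everything else
// falls through to the ordinary library call.
void *versionedMemcpy(void *dst, const void *src, std::size_t n) {
  if (n <= kSmallCopyLimit)
    return copyConstrained(dst, src, n);
  return std::memcpy(dst, src, n);
}
```

Both paths must produce the same bytes; only the code chosen for the guarded call differs per target.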

Aug 1 2017, 2:01 AM

Jul 31 2017

aemerson updated the diff for D35256: [SCEV] Try harder to preserve NSW information for sext(sub) expressions.

Updated to use MatchBinaryOp.

Jul 31 2017, 4:05 AM

Jul 24 2017

aemerson accepted D35777: [LoopInterchange] Update code to use range-based for loops (NFC).

LGTM.

Jul 24 2017, 3:20 AM

Jul 23 2017

aemerson added inline comments to D35777: [LoopInterchange] Update code to use range-based for loops (NFC).
Jul 23 2017, 11:13 AM

Jul 17 2017

aemerson added a comment to D35256: [SCEV] Try harder to preserve NSW information for sext(sub) expressions.

Ping.

Jul 17 2017, 3:21 AM

Jul 13 2017

aemerson committed rL307919: [AArch64] Add support for handling the +sve target feature.
Jul 13 2017, 8:36 AM
aemerson closed D35118: [AArch64] Add support for handling the +sve target feature by committing rL307919: [AArch64] Add support for handling the +sve target feature.
Jul 13 2017, 8:36 AM · Restricted Project
aemerson committed rL307917: [AArch64] Add an SVE target feature to the backend and TargetParser.
Jul 13 2017, 8:20 AM
aemerson updated the diff for D35118: [AArch64] Add support for handling the +sve target feature.

It was removed because it isn't actually used anywhere, only as a default value. I'm not going to debate it further, though, so I've put it back in.

Jul 13 2017, 3:44 AM · Restricted Project
aemerson added a comment to D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.

Ignore previous comment, was supposed to be added to D35118.

Jul 13 2017, 2:19 AM
aemerson added a comment to D35118: [AArch64] Add support for handling the +sve target feature.

@jmolloy Can you check this change, please?

Jul 13 2017, 2:19 AM · Restricted Project
aemerson added a comment to D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.
Jul 13 2017, 2:18 AM
aemerson added a comment to D35307: [AArch64] Initial SVE register definitions.

Hi Amara,

This seems a very raw change, without any further description, comments or proper usage, other than a few changes in random places.

Jul 13 2017, 2:11 AM

Jul 12 2017

aemerson updated the diff for D35307: [AArch64] Initial SVE register definitions.

Removing comment.

Jul 12 2017, 8:51 AM
aemerson created D35307: [AArch64] Initial SVE register definitions.
Jul 12 2017, 8:09 AM
aemerson abandoned D35264: [LICM] Teach LICM to hoist conditional loads.

This looks wrong.
In particular, it would break for something like:

a = *p;
free(p);
while (k) { // k is always false
  ... = *p;
}

We have llvm::isSafeToLoadUnconditionally() that does the domination check safely, but just checking that it's safe to load unconditionally in the pre-header may not be enough.

Consider something like:

a = *p;
while (k) {
  if (m) {
    free(p);
  }
  ... = *p;
}

Now, in those cases we shouldn't be hoisting regardless, because if p escapes, the value may not be loop-invariant.
But we need to make sure the patch doesn't break that, so additional tests may be needed.
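As a rough illustration of why the hoisted load has to stay under the loop's own guard, here is a small sketch. It assumes a null pointer as a safe stand-in for the freed pointer in the first example above; `sumOriginal` and `sumGuardedHoist` are hypothetical names, not code from the patch.

```cpp
#include <cstddef>

// Original form: the load of *p only executes if the loop body runs.
// With n == 0 the pointer is never dereferenced, so it may be null or
// dangling without causing a fault.
int sumOriginal(const int *p, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += *p; // loop-invariant load, but conditional on n > 0
  return total;
}

// Guarded hoist: the load is moved out of the loop body but kept under
// the same condition that guards the first iteration, so it never runs
// when the original load would not have run. This is only valid when
// nothing inside the loop frees or writes through p.
int sumGuardedHoist(const int *p, std::size_t n) {
  if (n == 0)
    return 0;
  const int v = *p; // executed once, and only when the loop would run
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += v;
  return total;
}
```

Hoisting the load above the `n == 0` check instead would dereference `p` in a case where the original program never touches it, which is the failure mode described in the first example.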

Jul 12 2017, 1:25 AM

Jul 11 2017

aemerson updated the diff for D35264: [LICM] Teach LICM to hoist conditional loads.

Whitespace fixes.

Jul 11 2017, 9:19 AM
aemerson created D35264: [LICM] Teach LICM to hoist conditional loads.
Jul 11 2017, 9:15 AM
aemerson created D35256: [SCEV] Try harder to preserve NSW information for sext(sub) expressions.
Jul 11 2017, 7:00 AM
aemerson added a comment to D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.

Ping.

Jul 11 2017, 2:02 AM

Jul 7 2017

aemerson added inline comments to D35118: [AArch64] Add support for handling the +sve target feature.
Jul 7 2017, 5:16 AM · Restricted Project
aemerson added a comment to D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.

Sure, up for review at D35118.

Jul 7 2017, 4:30 AM
aemerson created D35118: [AArch64] Add support for handling the +sve target feature.
Jul 7 2017, 4:28 AM · Restricted Project
aemerson added a dependent revision for D35076: [AArch64] Add an SVE target feature to the backend and TargetParser: D35118: [AArch64] Add support for handling the +sve target feature.
Jul 7 2017, 4:28 AM

Jul 6 2017

aemerson added a comment to D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.

Hi Amara,

Thanks for the patch, looks trivial to me. I guess we don't have any targets with SVE by default, so we don't need most tests.

I'm guessing you have a Clang counterpart, too?

cheers,
--renato

Jul 6 2017, 3:05 PM
aemerson updated the summary of D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.
Jul 6 2017, 11:11 AM
aemerson created D35076: [AArch64] Add an SVE target feature to the backend and TargetParser.
Jul 6 2017, 11:11 AM
aemerson resigned from D12178: [TSAN/AArch64/Android] Changes for AArch64/Android.
Jul 6 2017, 11:04 AM
aemerson resigned from D12177: [TSAN/AArch64/Android] Set up initial structs for TSAN.
Jul 6 2017, 11:03 AM
aemerson removed a reviewer for D8492: Add support for AArch64 and ARM backends for v8.1 architecture extension.: aemerson.
Jul 6 2017, 11:03 AM
aemerson abandoned D2736: [ARM] Fix NEON being enabled with soft-float.
Jul 6 2017, 11:02 AM
aemerson removed a reviewer for D2736: [ARM] Fix NEON being enabled with soft-float: t.p.northover.
Jul 6 2017, 11:01 AM
aemerson abandoned D1900: [ARM] Fix FP ABI attributes with no VFP.
Jul 6 2017, 10:59 AM
aemerson removed a reviewer for D1900: [ARM] Fix FP ABI attributes with no VFP: richard.barton.arm.
Jul 6 2017, 10:58 AM
aemerson abandoned D2110: [AArch64] Remove NEON from "generic" CPU target.
Jul 6 2017, 10:58 AM
aemerson removed a reviewer for D2110: [AArch64] Remove NEON from "generic" CPU target: t.p.northover.
Jul 6 2017, 10:58 AM
aemerson abandoned D2586: [AArch64] Add -mgeneral_regs_only option.
Jul 6 2017, 10:57 AM
aemerson updated subscribers of D2586: [AArch64] Add -mgeneral_regs_only option.
Jul 6 2017, 10:57 AM
aemerson abandoned D4207: Fix crash in LICM due to unreachable uses after LCSSA.
Jul 6 2017, 10:55 AM · deleted
aemerson updated subscribers of D4207: Fix crash in LICM due to unreachable uses after LCSSA.
Jul 6 2017, 10:54 AM · deleted

Jun 26 2017

aemerson accepted D32730: LV: Don't insert runtime ptr checks on divergent targets.

LGTM.

Jun 26 2017, 3:13 AM
aemerson accepted D33058: [LV] Sink casts to unravel first order recurrence.

Sorry, this fell off my radar. LGTM.

Jun 26 2017, 3:09 AM

May 22 2017

aemerson added a comment to D32737: [Constants][SVE] Represent the runtime length of a scalable vector.

Hi all,

May 22 2017, 2:17 AM

May 19 2017

aemerson committed rL303416: Fix vector pass-through value being unused in IRBuilder::CreateMaskedGather.
May 19 2017, 3:53 AM

May 16 2017

aemerson committed rL303211: Re-commit r302678, fixing PR33053.
May 16 2017, 2:43 PM

May 11 2017

aemerson added a comment to D33058: [LV] Sink casts to unravel first order recurrence.
In D33058#752230, @Ayal wrote:

The specific a[i]+a[i+1] case could indeed be handled before vectorization, namely by (LoopSimplify?) hoisting the cast along with the load. This applies in general to other instructions that operate symmetrically on both a[i] and a[i+1]. Instructions that only apply to a[i] need to be carefully placed inside the loop in order for it to be vectorized. It's probably better to have the vectorizer handle such motion; nothing keeps these instructions from returning back to their original positions.

Sounds reasonable?
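For readers unfamiliar with the pattern, the a[i]+a[i+1] case can be sketched at source level as follows. This is an assumed reading of the discussion, not code from the patch: `pairSumsDirect` and `pairSumsRecurrence` are hypothetical names, and the second function shows the form where a[i] is carried across iterations as a first-order recurrence, with its widening cast placed after the load of the next element.

```cpp
#include <cstddef>
#include <vector>

// Direct form: both elements are loaded and widened in each iteration.
std::vector<int> pairSumsDirect(const std::vector<short> &a) {
  std::vector<int> out;
  for (std::size_t i = 0; i + 1 < a.size(); ++i)
    out.push_back(static_cast<int>(a[i]) + static_cast<int>(a[i + 1]));
  return out;
}

// Recurrence form: a[i] is reused from the previous iteration via `prev`.
// The cast of `prev` sits after the load of `cur`, i.e. it has been sunk
// below the instruction that produces the next value of the recurrence.
std::vector<int> pairSumsRecurrence(const std::vector<short> &a) {
  std::vector<int> out;
  if (a.size() < 2)
    return out;
  short prev = a[0];
  for (std::size_t i = 0; i + 1 < a.size(); ++i) {
    const short cur = a[i + 1];
    out.push_back(static_cast<int>(prev) + static_cast<int>(cur));
    prev = cur; // value carried into the next iteration
  }
  return out;
}
```

The two functions compute the same results; they differ only in where the loads and casts are placed.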

May 11 2017, 2:21 PM
aemerson added a comment to D33058: [LV] Sink casts to unravel first order recurrence.

In general I wonder if this is really the best place to do this. It would be nice if the loop was canonicalised to be in this form given how cheap it is to do. Perhaps LoopSimplify? Not blocking this change, but something to think about.

May 11 2017, 2:28 AM

May 10 2017

aemerson committed rL302678: [AArch64] Enable use of reduction intrinsics.
May 10 2017, 8:29 AM
aemerson closed D32247: Switch AArch64 to use reduction intrinsics by committing rL302678: [AArch64] Enable use of reduction intrinsics.
May 10 2017, 8:28 AM
aemerson accepted D32247: Switch AArch64 to use reduction intrinsics.
May 10 2017, 8:01 AM
aemerson updated the diff for D32247: Switch AArch64 to use reduction intrinsics.

New patch, rebased on latest ToT and using the different API implemented in the previous patch in D30086.

May 10 2017, 8:01 AM
aemerson committed rL302631: Add a late IR expansion pass for the experimental reduction intrinsics.
May 10 2017, 2:56 AM
aemerson closed D32245: Add an IR expansion pass for the experimental reductions by committing rL302631: Add a late IR expansion pass for the experimental reduction intrinsics.
May 10 2017, 2:56 AM
aemerson added a comment to D32245: Add an IR expansion pass for the experimental reductions.

Thanks, I'll make that change and commit.

May 10 2017, 2:21 AM

May 9 2017

aemerson updated the diff for D32245: Add an IR expansion pass for the experimental reductions.

Addressed review comments and rewrote the pass a bit to be somewhat neater. D30086 is now committed, so this is ready to go if it looks ok.

May 9 2017, 7:56 AM
aemerson committed rL302514: Introduce experimental generic intrinsics for horizontal vector reductions.
May 9 2017, 3:57 AM
aemerson closed D30086: Add generic IR vector reductions by committing rL302514: Introduce experimental generic intrinsics for horizontal vector reductions.
May 9 2017, 3:56 AM
aemerson accepted D30086: Add generic IR vector reductions.

Thanks. I'll make the last few changes requested and commit.

May 9 2017, 1:55 AM

May 8 2017

aemerson added a comment to D32964: [Doc] Document "Splat" in the lexicon.

Splat is a synonym for broadcast as well, probably worth adding a mention.

May 8 2017, 7:37 AM
aemerson added a comment to D30086: Add generic IR vector reductions.

When people add new intrinsics they don't really add tests to check the IR since you get assertion failures during the FunctionType creation if something goes wrong.

But without IR functions that actually use them, there's no way to get the assertion, right?

If you have a look at prior art for adding intrinsics you'll see that actual verifier tests are only done for illegal combinations of constant value parameters. There are no illegal constant parameter combinations with this patch. E.g. Dan Berlin's r294341 doesn't come with a test, likewise with others.

May 8 2017, 6:11 AM
aemerson added a comment to D30086: Add generic IR vector reductions.

Right, this is looking much better. Now, what about tests?

We'd probably need a bunch of tests to make sure that the intrinsics are accepted in the syntax in which they're documented and rejected otherwise.

I'm not sure what the best way forward is, but probably just having IR in the right/wrong format and passing -validate or something expecting it to pass/fail would be a start.

cheers,
--renato

May 8 2017, 5:55 AM
aemerson added inline comments to D30086: Add generic IR vector reductions.
May 8 2017, 5:37 AM
aemerson added a comment to D30086: Add generic IR vector reductions.

Ping. Ok to go?

May 8 2017, 1:31 AM

May 4 2017

aemerson updated the diff for D30086: Add generic IR vector reductions.

Renato and I discussed this offline because we had got our wires crossed earlier. We agreed to simplify this code further by extending createSimpleTargetReduction() to handle min/max by passing it the ReductionFlags. This essentially moves code out of createTargetReduction(), which now just unwraps information from a RecurrenceDescriptor. Some other API changes were made as a result.

May 4 2017, 3:59 PM

May 2 2017

aemerson added inline comments to D32737: [Constants][SVE] Represent the runtime length of a scalable vector.
May 2 2017, 4:06 PM
aemerson updated the diff for D30086: Add generic IR vector reductions.

Ok, so I've restructured the two functions a bit so that the simple (non minmax) reductions are generated from the createSimpleTargetReduction() function and the recurrence descriptor uses that for the simple cases, passing in the opcode.

May 2 2017, 7:13 AM
aemerson added inline comments to D30086: Add generic IR vector reductions.
May 2 2017, 4:31 AM
aemerson updated the diff for D30086: Add generic IR vector reductions.
  • Split out SDNode changes into D32527 which is now committed.
  • Added comments to the ISDNodes definitions.
May 2 2017, 3:56 AM
aemerson added inline comments to D32730: LV: Don't insert runtime ptr checks on divergent targets.
May 2 2017, 2:46 AM

May 1 2017

aemerson committed rL301803: Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
May 1 2017, 8:31 AM
aemerson closed D32527: Generalize flag carrying SDNodes beyond binary ops. NFC. by committing rL301803: Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
May 1 2017, 8:31 AM
aemerson added inline comments to D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.
May 1 2017, 7:59 AM

Apr 28 2017

aemerson added inline comments to D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.
Apr 28 2017, 3:51 PM
aemerson added a comment to D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.

Seems we haven't seen Justin active for a few weeks. @spatel are you ok for this to go in?

Apr 28 2017, 3:15 PM

Apr 27 2017

aemerson updated the diff for D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.

Rebased and updated with requested changes. Flags are now in SDNode, with an additional "defined" state bit to preserve semantics when intersecting flags.

Apr 27 2017, 6:19 AM

Apr 26 2017

aemerson added a comment to D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.

Is there an advantage to this vs. just putting the flags in the base SDNode class? We're going to eventually want to use the flags for more than unary and binary nodes (e.g., FMA), so I'd prefer to just go directly to that step.

I'm not the original author of the code but I think this is due to the extra 2 bytes of storage needed for the flags in each SDNode. With the current solution we only incur this cost if we have flags to store.

It's been a long time since D8900, but IIRC, the size argument was actually moot because we have "free" bytes based on the struct alignment (at least on common 64-bit systems). As mentioned there, the fact that the flags were only on binops was a limitation that we wanted to lift even back then, but it just hasn't been done yet. If you can do that in this patch, it would be great. :)
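The point about "free" bytes can be illustrated with a toy layout. This is only a sketch of how alignment padding can absorb a small extra field; `NodeWithoutFlags` and `NodeWithFlags` are hypothetical structs and do not reflect the real SDNode layout.

```cpp
#include <cstdint>

// Toy node: a pointer followed by a 16-bit field. On common ABIs the
// struct is padded up to a multiple of the pointer's alignment, which
// leaves unused tail bytes after Opcode.
struct NodeWithoutFlags {
  void *Operands;
  std::uint16_t Opcode;
};

// Same node with a 16-bit flags field added. If the field fits in the
// tail padding, the struct does not grow.
struct NodeWithFlags {
  void *Operands;
  std::uint16_t Opcode;
  std::uint16_t Flags;
};

// True when the extra field is absorbed by padding. This is expected on
// typical 32-bit and 64-bit targets but is ABI-dependent, so it is
// computed rather than assumed.
constexpr bool kFlagsFitInPadding =
    sizeof(NodeWithFlags) == sizeof(NodeWithoutFlags);
```

Whether the same holds for the actual class depends on its existing members, so the real check would need to be made against the real definition.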

Apr 26 2017, 6:57 AM
aemerson added inline comments to D32245: Add an IR expansion pass for the experimental reductions.
Apr 26 2017, 6:39 AM
aemerson added a comment to D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.

Is there an advantage to this vs. just putting the flags in the base SDNode class? We're going to eventually want to use the flags for more than unary and binary nodes (e.g., FMA), so I'd prefer to just go directly to that step.

Apr 26 2017, 6:33 AM
aemerson updated the diff for D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.

Added more patch context.

Apr 26 2017, 6:23 AM
aemerson updated the diff for D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.

Done.

Apr 26 2017, 6:22 AM
aemerson added inline comments to D32245: Add an IR expansion pass for the experimental reductions.
Apr 26 2017, 5:47 AM
aemerson added a comment to D32247: Switch AArch64 to use reduction intrinsics.

No changes are needed on the MC side. The same target-specific reduction DAG nodes (e.g. AArch64ISD::UADDV) should be created and from then on everything should work as before.

Apr 26 2017, 3:31 AM
aemerson updated subscribers of D32245: Add an IR expansion pass for the experimental reductions.
Apr 26 2017, 3:18 AM
aemerson added a dependent revision for D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.: D30086: Add generic IR vector reductions.
Apr 26 2017, 3:02 AM
aemerson added a dependency for D30086: Add generic IR vector reductions: D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.
Apr 26 2017, 3:02 AM
aemerson created D32527: Generalize flag carrying SDNodes beyond binary ops. NFC.
Apr 26 2017, 3:02 AM

Apr 25 2017

aemerson added a comment to D30086: Add generic IR vector reductions.

At the moment nothing is emitting strict float reductions as no target supports it. We have it implemented for SVE but the IR type and vectorizer changes aren't upstream yet. The reason I've had to include it in this patch is that we want to agree on an intrinsics spec first, without changing it later when SVE support lands.

Apr 25 2017, 9:30 AM
aemerson updated the diff for D30086: Add generic IR vector reductions.

Addressed review comments. Renamed the simpler create function to createSimpleTargetReduction() and added comments to both header and definition.

Apr 25 2017, 8:37 AM