Compilers at a fruit company
- User Since
- Sep 9 2013, 3:45 AM (210 w, 6 d)
Aug 4 2017
Added support for handling the nsw/nuw ssub.with.overflow intrinsic.
The difference is that at the target codegen level we can't as easily do the predicate analysis as we can at the IR level.
If, for a particular target, it is worth emitting a versioned, carefully target-crafted loop or instruction sequence, I would expect them to not use this pass but to custom lower the calls in the backend much like x86 does for constant-size calls.
But in the patch description you say that one of the challenges is constructing *just* the right IR to get efficient codegen from the backend. I understand this is for x86 right now, but if you don't have plans to allow other targets to work well with it, why not put it into the Target/X86 directory and make it a backend-specific IR pass to avoid confusion?
Aug 2 2017
I'm resigning from SVE upstreaming related activities, so Graham will be taking over this patch and others from here.
Aug 1 2017
Instead of generating loop IR for the fast path, how about creating a versioned memcpy/memset with the constrained parameters guarded under the condition test? That way, in the back-end the exact preferred optimal code can be generated, allowing for unrolled loop bodies specific to individual targets.
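To make the suggestion concrete, here is a minimal sketch of the versioning idea in plain C++ rather than IR. The names `copyConstrained` and `copyVersioned`, the 64-byte limit, and the multiple-of-8 constraint are all hypothetical, chosen only for illustration; nothing here is taken from the patch under review.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical constraint: the fast path only handles short sizes
// that are a multiple of 8 bytes.
constexpr std::size_t kMaxFastBytes = 64;

// Constrained version: callers guarantee n <= kMaxFastBytes and n % 8 == 0,
// so a backend would be free to emit a fully unrolled, target-specific sequence.
inline void copyConstrained(void *dst, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; i += 8)
    std::memcpy(d + i, s + i, 8); // fixed-size 8-byte copy
}

// Versioned entry point: the guard selects the constrained call or the
// general one. Returns true when the fast path was taken.
inline bool copyVersioned(void *dst, const void *src, std::size_t n) {
  if (n <= kMaxFastBytes && n % 8 == 0) {
    copyConstrained(dst, src, n);
    return true;
  }
  std::memcpy(dst, src, n); // general fallback, no constraints known
  return false;
}
```

The point of this shape is that the condition test stays visible at the IR level, where predicate analysis is easier, while the body of the constrained call is left for each target to lower as it prefers.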
Jul 31 2017
Updated to use MatchBinaryOp.
Jul 13 2017
The reason it's removed is that it's not actually used anywhere, only as a default value. I'm not going to debate it further, though, so I've put it back in.
Ignore previous comment, was supposed to be added to D35118.
Jul 7 2017
Sure, up for review at D35118.
Jul 6 2017
Jun 26 2017
Sorry, this fell off my radar. LGTM.
May 11 2017
In general I wonder if this is really the best place to do this. It would be nice if the loop was canonicalised to be in this form given how cheap it is to do. Perhaps LoopSimplify? Not blocking this change, but something to think about.
May 10 2017
New patch, rebased on latest ToT and using the different API implemented in the previous patch in D30086.
Thanks, I'll make that change and commit.
May 9 2017
Addressed review comments and rewrote the pass a bit to be somewhat neater. D30086 is now committed, so this is ready to go if it looks OK.
Thanks. I'll make the last few changes requested and commit.
May 8 2017
Splat is a synonym for broadcast as well, probably worth adding a mention.
If you have a look at prior art for adding intrinsics you'll see that actual verifier tests are only done for illegal combinations of constant value parameters. There are no illegal constant parameter combinations with this patch. E.g. Dan Berlin's r294341 doesn't come with a test, likewise with others.
Ping. Ok to go?
May 4 2017
Renato and I discussed this offline because we had got our wires crossed earlier. We agreed to simplify this code further by extending createSimpleTargetReduction() to handle min/max by passing it the ReductionFlags. This essentially moves code out of createTargetReduction(), so that it now just unwraps information from a RecurrenceDescriptor. Some other API changes were made as a result.
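The resulting split can be sketched with a small self-contained model. This is not the LLVM API: the functions below compute values instead of emitting IR, and `Opcode`, `RecurrenceKind`, `createSimpleReduction`, `createReduction`, and the stand-in `ReductionFlags` and `RecurrenceDescriptor` structs are simplified, hypothetical versions used only to show the layering.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-ins; these are not LLVM's real types.
enum class Opcode { Add, Mul, ICmp };
struct ReductionFlags {
  bool IsMaxOp = false; // only meaningful when the opcode is a compare
};

// "Simple" entry point: needs only an opcode, the flags, and the lanes.
// Assumes Lanes is non-empty.
int createSimpleReduction(Opcode Op, ReductionFlags Flags,
                          const std::vector<int> &Lanes) {
  switch (Op) {
  case Opcode::Add:
    return std::accumulate(Lanes.begin(), Lanes.end(), 0);
  case Opcode::Mul:
    return std::accumulate(Lanes.begin(), Lanes.end(), 1,
                           std::multiplies<int>());
  case Opcode::ICmp: // min/max is selected through the flags
    return Flags.IsMaxOp ? *std::max_element(Lanes.begin(), Lanes.end())
                         : *std::min_element(Lanes.begin(), Lanes.end());
  }
  return 0;
}

// Hypothetical descriptor carrying what the vectorizer knows.
enum class RecurrenceKind { IntegerAdd, IntegerMult, IntegerMinMax };
struct RecurrenceDescriptor {
  RecurrenceKind Kind;
  bool IsMax;
};

// Descriptor entry point: only unwraps the descriptor and forwards.
int createReduction(const RecurrenceDescriptor &Desc,
                    const std::vector<int> &Lanes) {
  ReductionFlags Flags;
  switch (Desc.Kind) {
  case RecurrenceKind::IntegerAdd:
    return createSimpleReduction(Opcode::Add, Flags, Lanes);
  case RecurrenceKind::IntegerMult:
    return createSimpleReduction(Opcode::Mul, Flags, Lanes);
  case RecurrenceKind::IntegerMinMax:
    Flags.IsMaxOp = Desc.IsMax;
    return createSimpleReduction(Opcode::ICmp, Flags, Lanes);
  }
  return 0;
}
```

With this layering, callers that have no recurrence descriptor can still use the simple function directly, which is presumably the motivation for moving the min/max handling into it.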
May 2 2017
Ok, so I've restructured the two functions a bit so that the simple (non minmax) reductions are generated from the createSimpleTargetReduction() function and the recurrence descriptor uses that for the simple cases, passing in the opcode.
- Split out SDNode changes into D32527 which is now committed.
- Added comments to the ISDNodes definitions.
Apr 28 2017
Seems we haven't seen Justin active for a few weeks. @spatel are you ok for this to go in?
Apr 27 2017
Rebased and updated with requested changes. Flags are now in SDNode, with an additional "defined" state bit to preserve semantics when intersecting flags.
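One plausible reading of the "defined" bit, sketched as a simplified standalone model: a node whose flags were never set should not wipe out the flags of the node it is merged with. The struct name `NodeFlags` and its exact members are assumptions for illustration and are not copied from the patch.

```cpp
// Simplified, hypothetical model of per-node flags with a "defined" bit.
struct NodeFlags {
  bool Defined = false; // becomes true once any flag is explicitly set
  bool NoUnsignedWrap = false;
  bool NoSignedWrap = false;
  bool Exact = false;

  void setNoUnsignedWrap(bool B) { Defined = true; NoUnsignedWrap = B; }
  void setNoSignedWrap(bool B) { Defined = true; NoSignedWrap = B; }
  void setExact(bool B) { Defined = true; Exact = B; }

  // Keep only the flags both sides agree on. If the other side never had
  // its flags set, its all-false state means "unknown", not "absent", so
  // our flags are left untouched.
  void intersectWith(const NodeFlags &Other) {
    if (!Other.Defined)
      return;
    NoUnsignedWrap &= Other.NoUnsignedWrap;
    NoSignedWrap &= Other.NoSignedWrap;
    Exact &= Other.Exact;
  }
};
```

Without the extra bit, a default-constructed flag set is indistinguishable from one where every flag was deliberately cleared, and intersecting with it would always drop everything.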
Apr 26 2017
Added more patch context.
No changes are needed on the MC side. The same target-specific reduction DAG nodes (e.g. AArch64ISD::UADDV) should be created and from then on everything should work as before.
Apr 25 2017
At the moment nothing is emitting strict float reductions, as no target supports them. We have them implemented for SVE, but the IR type and vectorizer changes aren't upstream yet. The reason I've had to include this in the patch is that we want to agree on an intrinsics spec first, so that it doesn't have to change later when SVE support lands.
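For readers unfamiliar with why a strict (ordered) float reduction needs its own form, the sketch below contrasts an in-order sum with a tree-style sum in plain C++. The function names `reduceOrdered` and `reduceTree` are hypothetical, and the code models only the arithmetic, not the intrinsics themselves.

```cpp
#include <cstddef>
#include <vector>

// Strict reduction: lanes are added one at a time, in order, starting
// from an accumulator, exactly as a scalar loop would.
float reduceOrdered(const std::vector<float> &Lanes, float Start) {
  float Acc = Start;
  for (float L : Lanes)
    Acc += L;
  return Acc;
}

// Relaxed reduction: repeatedly add the upper half of the vector onto the
// lower half, as a log2-step shuffle reduction would.
// Assumes a non-empty vector whose length is a power of two.
float reduceTree(std::vector<float> Lanes) {
  for (std::size_t Half = Lanes.size() / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      Lanes[I] += Lanes[I + Half];
  return Lanes[0];
}
```

Because float addition is not associative, the two orders can disagree. For the lanes `{1e8f, 1.0f, -1e8f, 1.0f}` in IEEE single precision, the ordered sum loses the first `1.0f` when it is added to `1e8f`, while the tree sum cancels the two large values first and keeps both small ones. A target can only use the tree form when reassociation is permitted.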