Thu, Oct 22

SjoerdMeijer accepted D72939: [Schedule] Add a MultiHazardRecognizer.

LGTM, but please wait a day in case @arsenm has more comments.

Thu, Oct 22, 12:30 PM · Restricted Project
SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

Those numbers don't look too bad, but like you say it's probably worth looking into what x264_r is doing, just to see what is going on. Sanne ran some other numbers from the burst compiler and they were about the same - some small improvements, a couple of small losses but overall OK. That gives us confidence that big out of order cores are not going to hate this.

The original tests were on an in-order core I believe? Which from the optimization guide looks like it should be sensible to use. And the option doesn't seem to be messing anything up especially.

Thu, Oct 22, 12:26 PM · Restricted Project
SjoerdMeijer added a comment to D72939: [Schedule] Add a MultiHazardRecognizer.

Hazard recognisers are not really my area, but this looks like a straightforward refactoring to me. Two nits: one is a style issue, commented inline; the other is just a quick question: do you plan to use this extra flexibility any time soon?

Thu, Oct 22, 5:16 AM · Restricted Project
SjoerdMeijer committed rG51d7df3fa1c3: [InstructionSimplify] icmp (X+Y), (X+Z) simplification (authored by SjoerdMeijer).
[InstructionSimplify] icmp (X+Y), (X+Z) simplification
Thu, Oct 22, 1:06 AM
SjoerdMeijer closed D89317: [InstructionSimplify] icmp simplification.
Thu, Oct 22, 1:05 AM · Restricted Project
SjoerdMeijer added a comment to D89317: [InstructionSimplify] icmp simplification.

Many thanks for reviewing!

Thu, Oct 22, 1:04 AM · Restricted Project
SjoerdMeijer accepted D89896: Add loop distribution to the LTO pipeline.

This looks consistent with the other pipeline, it's opt-in, and gives a nice uplift, so LGTM.

Thu, Oct 22, 12:53 AM · Restricted Project

Wed, Oct 21

SjoerdMeijer added inline comments to D88496: [ARM] Fix IT block generation after Thumb2SizeReduce with -Oz.
Wed, Oct 21, 12:21 PM · Restricted Project
SjoerdMeijer added a comment to D89896: Add loop distribution to the LTO pipeline.

FWIW, I really dislike these pipeline tests because some of them are actually very tricky to update and I doubt they provide any useful information. But agreed with Dave that for consistency such a test would probably be best (in case someone finds it useful).

Wed, Oct 21, 12:09 PM · Restricted Project
SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

SPECInt numbers:

Wed, Oct 21, 11:53 AM · Restricted Project
SjoerdMeijer updated the diff for D89317: [InstructionSimplify] icmp simplification.

Cheers, typo fixed: C2->isNonPositive() -> C1->isNonPositive(). And the test case added.

Wed, Oct 21, 11:35 AM · Restricted Project
SjoerdMeijer updated the diff for D89317: [InstructionSimplify] icmp simplification.

Thanks!
I have added support for the other SLT case and have added tests, which I have not precommitted this time (hopefully easy to spot these cases), just to completely cover the SLT case. After this, I can quickly follow up to address other predicates, if that is okay.
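
For readers following the SLT case, here is a minimal sketch of the arithmetic idea behind comparing (X+C1) with (X+C2). It is not the patch itself, and the exact preconditions the patch checks are not shown in this feed. The sketch assumes a small 6-bit signed integer type so that a brute-force check stays fast, and the helper names (wrap, add_nsw, holds_without_overflow) are made up for illustration. It checks that when neither add overflows, the signed comparison of the sums agrees with the comparison of the constants, and it shows that this breaks once an add is allowed to wrap.

```python
BITS = 6  # assumption: a tiny integer width so exhaustive checking is cheap
LO = -(1 << (BITS - 1))
HI = (1 << (BITS - 1)) - 1


def wrap(v):
    """Wrap an integer to a signed BITS-bit two's complement value."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > HI else v


def add_nsw(x, c):
    """Return x + c, or None if the signed add would overflow."""
    s = x + c
    return s if LO <= s <= HI else None


def holds_without_overflow():
    """Check (x + c1 < x + c2) == (c1 < c2) whenever neither add overflows."""
    for x in range(LO, HI + 1):
        for c1 in range(LO, HI + 1):
            for c2 in range(LO, HI + 1):
                a, b = add_nsw(x, c1), add_nsw(x, c2)
                if a is None or b is None:
                    continue  # overflowing adds are outside the claim
                if (a < b) != (c1 < c2):
                    return False
    return True


def wrapping_breaks_it():
    """With wrapping adds, x = HI, c1 = 1, c2 = 0 gives a different answer."""
    return (wrap(HI + 1) < wrap(HI + 0)) != (1 < 0)
```

This is why negative tests with the no-overflow flag on the 'wrong' add matter: without the guarantee on both adds, the comparison of the constants no longer decides the result.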

Wed, Oct 21, 9:41 AM · Restricted Project
SjoerdMeijer added inline comments to D88494: Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate.
Wed, Oct 21, 4:10 AM · Restricted Project
SjoerdMeijer added inline comments to D89317: [InstructionSimplify] icmp simplification.
Wed, Oct 21, 3:11 AM · Restricted Project
SjoerdMeijer updated the diff for D89317: [InstructionSimplify] icmp simplification.
  • Precommitted 3 tests for testing commuting the operands in rGe86a70ce3def,
  • And rebased this on that again.
Wed, Oct 21, 3:09 AM · Restricted Project
SjoerdMeijer committed rGe86a70ce3def: [InstructionSimplify] And precommit more tests for D89317. NFC. (authored by SjoerdMeijer).
[InstructionSimplify] And precommit more tests for D89317. NFC.
Wed, Oct 21, 3:04 AM
SjoerdMeijer updated the diff for D89317: [InstructionSimplify] icmp simplification.
  • Precommitted the extra tests in rG782b8f0d38c9.
  • Rebased this on that.
Wed, Oct 21, 2:37 AM · Restricted Project
SjoerdMeijer committed rG782b8f0d38c9: [InstructionSimplify] Precommit more tests for D89317. NFC. (authored by SjoerdMeijer).
[InstructionSimplify] Precommit more tests for D89317. NFC.
Wed, Oct 21, 2:25 AM

Tue, Oct 20

SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

Yep, cheers, hopefully SPEC is better and more conclusive. The 1.2% uplift in one benchmark was on baremetal aarch64, will check if I can run some more things on that too.

Tue, Oct 20, 12:21 PM · Restricted Project
SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

Okay, so my last results basically show noise, as I had forgotten to do a rebuild between the runs. Now the results look less convincing... looking into it.

Tue, Oct 20, 11:43 AM · Restricted Project
SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

Ah wait a minute, I am doing the last experiment again, just double-checking that I haven't made a mistake running them.

Tue, Oct 20, 10:57 AM · Restricted Project
SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

Just curious if there's something in particular you are concerned about? I am just asking because then I can focus on that.

Tue, Oct 20, 10:48 AM · Restricted Project
SjoerdMeijer added inline comments to D89378: [LoopFlatten] Loop limit invariant checks.
Tue, Oct 20, 7:37 AM · Restricted Project
SjoerdMeijer updated the diff for D89317: [InstructionSimplify] icmp simplification.

Thanks for your help.

  • this is using your function, the pattern matching, which indeed is much nicer,
  • and I think all tests are there, positive and negative. For example, for the positive tests, I have added all the combos, the different data types, including a vector test, and for the negative tests, there are indeed tests with different predicate and the nsw is on the 'wrong' add, etc.
Tue, Oct 20, 3:54 AM · Restricted Project
SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

I've run CTMark, filtering out the tests that run only very briefly with --filter-short, and see the same trend, i.e. no regressions and some okay improvements:

Tue, Oct 20, 12:41 AM · Restricted Project

Mon, Oct 19

SjoerdMeijer added a comment to D89693: [AArch64] Favor post-increments.

Cheers guys, that's fair, will give the llvm test suite a try too.

Mon, Oct 19, 7:45 AM · Restricted Project
SjoerdMeijer updated the diff for D89317: [InstructionSimplify] icmp simplification.

Fixed precondition, and added a test case for that.

Mon, Oct 19, 7:44 AM · Restricted Project
SjoerdMeijer added a comment to D89317: [InstructionSimplify] icmp simplification.

This seems to be missing some other preconditions on the constant values:
https://rise4fun.com/Alive/EvB

Mon, Oct 19, 7:05 AM · Restricted Project
SjoerdMeijer requested review of D89693: [AArch64] Favor post-increments.
Mon, Oct 19, 6:06 AM · Restricted Project
SjoerdMeijer added a comment to D89317: [InstructionSimplify] icmp simplification.

friendly ping

Mon, Oct 19, 1:51 AM · Restricted Project

Sat, Oct 17

SjoerdMeijer accepted D88980: [ARM] Basic getArithmeticReductionCost reduction costs.

Okidoki, cheers.

Sat, Oct 17, 1:16 AM · Restricted Project

Fri, Oct 16

SjoerdMeijer added inline comments to D88980: [ARM] Basic getArithmeticReductionCost reduction costs.
Fri, Oct 16, 7:17 AM · Restricted Project
SjoerdMeijer accepted D88989: [ARM] Add a very basic active_lane_mask cost.

Forgot about this one, but agree with this.

Fri, Oct 16, 7:15 AM · Restricted Project
SjoerdMeijer added a comment to D89549: [ARM][LowOverheadLoops] Check live-out for InsertPt instead of Start.

Can we add tests for this?

Fri, Oct 16, 6:34 AM · Restricted Project
SjoerdMeijer updated the diff for D88880: [IndVarSimplify] Add loop-flattening.

I've got a slightly different proposal. This moves the loop flatten pass into IndVarSimplify for several reasons:

  • loop-flatten is best run just before IndVarSimplify because IndVarSimplify can promote induction variables. For overflow analysis to see if loop flattening is legal, it's best if induction variables haven't been promoted yet.
  • When induction variables of a loop nest don't use the maximum legal integer type, we promote them to the widest type so we know loop flattening is safe, thus avoiding overflow analysis. Promoting induction variables is what IndVarSimplify was already doing, so this is reusing that.
  • Last but not least, with the loops that we support with loop-flattening, induction variable simplification is exactly the point of this transform, so this looks like a good home for it. Thus, this also avoids quite some churn making modifications to LoopUtils where refactored/shared code could live, and in both of the passes.
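
The relation between flattening, overflow, and widening described above can be sketched as follows. This is only an illustration under assumptions, not the pass's implementation: the function names are hypothetical, the loop nest is the simple two-level form that indexes an array with i*M+j, and the 16-bit and 32-bit widths are chosen arbitrarily to show a narrow type versus a widened one.

```python
def nested_indices(n, m):
    """Indices visited by: for i in [0, n): for j in [0, m): use A[i*m + j]."""
    return [i * m + j for i in range(n) for j in range(m)]


def flattened_indices(n, m):
    """Indices visited by the flattened form: for k in [0, n*m): use A[k]."""
    return list(range(n * m))


def trip_count_fits(n, m, bits):
    """True if the flattened trip count n*m fits in an unsigned `bits`-bit type."""
    return n * m < (1 << bits)
```

For example, nested_indices(3, 4) and flattened_indices(3, 4) visit the same indices in the same order, which is what makes the transform attractive. With n = m = 300 the product does not fit in 16 bits but does fit in 32 bits, so widening the induction variable first removes the need to prove that the narrow multiplication cannot overflow.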
Fri, Oct 16, 6:22 AM · Restricted Project

Thu, Oct 15

SjoerdMeijer added a comment to D88819: [LV] Support for Remainder loop vectorization.

I think Florian answered those questions, that looks indeed the most sensible way forward then.

Thu, Oct 15, 4:50 AM · Restricted Project
SjoerdMeijer added a comment to D88819: [LV] Support for Remainder loop vectorization.

I do see the elegance of just feeding the epilogue to the vectoriser again, but also have sympathy for not pushing the responsibility of the clean up down the line to something else especially if this is non-trivial. But to progress this discussion, I was wondering if we can say something more about this:

Thu, Oct 15, 3:35 AM · Restricted Project

Wed, Oct 14

SjoerdMeijer requested review of D89378: [LoopFlatten] Loop limit invariant checks.
Wed, Oct 14, 2:34 AM · Restricted Project
SjoerdMeijer committed rG20c7ab87a78c: [LoopFlatten] Precommit new test cases. NFC. (authored by SjoerdMeijer).
[LoopFlatten] Precommit new test cases. NFC.
Wed, Oct 14, 2:11 AM
SjoerdMeijer added a comment to D88880: [IndVarSimplify] Add loop-flattening.

I guess this is similar to the widening that IndVarSimplify does? Can we just re-use the stuff from there or have IndVarSimplify just do it for us?

Wed, Oct 14, 1:43 AM · Restricted Project
SjoerdMeijer added a comment to D88880: [IndVarSimplify] Add loop-flattening.

We might be able to salvage this by adding a check that the GEP dominates the loop latch?

Wed, Oct 14, 1:19 AM · Restricted Project

Tue, Oct 13

SjoerdMeijer added inline comments to D89317: [InstructionSimplify] icmp simplification.
Tue, Oct 13, 8:17 AM · Restricted Project
SjoerdMeijer updated the diff for D89317: [InstructionSimplify] icmp simplification.

Thanks for reviewing.

  • I have precommitted the tests in rG66f22411e1bb, and
  • use APInts in the comparisons.
Tue, Oct 13, 8:12 AM · Restricted Project
SjoerdMeijer committed rG66f22411e1bb: [InstructionSimplify] Precommit tests for D89317. NFC. (authored by SjoerdMeijer).
[InstructionSimplify] Precommit tests for D89317. NFC.
Tue, Oct 13, 7:41 AM
SjoerdMeijer requested review of D89317: [InstructionSimplify] icmp simplification.
Tue, Oct 13, 6:56 AM · Restricted Project

Fri, Oct 9

SjoerdMeijer added a comment to D89048: [ARM][LowOverheadLoops] Insert loop start at end of block in more cases.

Insert loop start at end of block in more cases

Hmm. Just a quick check - do we want that? I can see it improves some tail predication cases, that's good. But do we want that in general? The DLS instructions have a latency like any other, and earlier is better from that perspective. Or are we assuming that that latency will never matter into the LE instruction?

Fri, Oct 9, 1:23 AM · Restricted Project

Thu, Oct 8

SjoerdMeijer added a comment to D88880: [IndVarSimplify] Add loop-flattening.

I think it might be really good if it would be possible to not implement all this overflow detection from scratch.
Is there nothing in SCEV already that does this?

Thu, Oct 8, 7:08 AM · Restricted Project
SjoerdMeijer accepted D84451: [LV] Tail folded inloop reductions..

Cheers, looks like a good change to me. Perhaps wait a day with committing just in case there are more/other opinions on this.

Thu, Oct 8, 3:40 AM · Restricted Project

Wed, Oct 7

SjoerdMeijer added inline comments to D88926: [ARM] Attempt to make Tail predication / RDA more resilient to empty blocks.
Wed, Oct 7, 4:10 AM · Restricted Project
SjoerdMeijer added a comment to D84451: [LV] Tail folded inloop reductions..

Just some minor questions inline.

Wed, Oct 7, 2:25 AM · Restricted Project

Tue, Oct 6

SjoerdMeijer added a comment to D87836: [ARM] Fold select_cc(vecreduce_[u|s][min|max], x) into VMINV or VMAXV.

Hi Amara.

Apologies for this. The fix is just a matter of updating the two failing
tests. What is the process for getting the change back in?

If it's just a trivial update of a test, you can just go ahead and recommit it. If you're in doubt, you can always put it for review again.

Tue, Oct 6, 4:57 AM · Restricted Project
SjoerdMeijer added inline comments to D88880: [IndVarSimplify] Add loop-flattening.
Tue, Oct 6, 2:26 AM · Restricted Project
SjoerdMeijer requested review of D88880: [IndVarSimplify] Add loop-flattening.
Tue, Oct 6, 2:23 AM · Restricted Project

Mon, Oct 5

SjoerdMeijer added reviewers for D88819: [LV] Support for Remainder loop vectorization: Ayal, fhahn, dmgreen, SjoerdMeijer, gilr.
Mon, Oct 5, 4:57 AM · Restricted Project

Fri, Oct 2

SjoerdMeijer committed rG8825fec37e73: [AArch64] Add CPU Cortex-R82 (authored by SjoerdMeijer).
[AArch64] Add CPU Cortex-R82
Fri, Oct 2, 4:48 AM
SjoerdMeijer closed D88660: [AArch64] Add CPU Cortex-R82.
Fri, Oct 2, 4:48 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D88660: [AArch64] Add CPU Cortex-R82.

Thanks for reviewing!

Fri, Oct 2, 3:53 AM · Restricted Project, Restricted Project

Thu, Oct 1

SjoerdMeijer added inline comments to D88660: [AArch64] Add CPU Cortex-R82.
Thu, Oct 1, 8:01 AM · Restricted Project, Restricted Project
SjoerdMeijer updated the summary of D88660: [AArch64] Add CPU Cortex-R82.
Thu, Oct 1, 7:54 AM · Restricted Project, Restricted Project
SjoerdMeijer added a reviewer for D88660: [AArch64] Add CPU Cortex-R82: ostannard.
Thu, Oct 1, 7:52 AM · Restricted Project, Restricted Project
SjoerdMeijer requested review of D88660: [AArch64] Add CPU Cortex-R82.
Thu, Oct 1, 7:52 AM · Restricted Project, Restricted Project
SjoerdMeijer committed rGd53b4bee0ccd: [LoopFlatten] Add a loop-flattening pass (authored by SjoerdMeijer).
[LoopFlatten] Add a loop-flattening pass
Thu, Oct 1, 5:56 AM
SjoerdMeijer closed D42365: [LoopFlatten] Add a loop-flattening pass.
Thu, Oct 1, 5:55 AM · Restricted Project
SjoerdMeijer added a comment to D42365: [LoopFlatten] Add a loop-flattening pass.

Thanks Sam. And in addition to what you wrote, this is not enabled by default.
I plan to iterate on this in-tree, and then we can start thinking about the default, but that's TBD I guess.

Thu, Oct 1, 4:04 AM · Restricted Project
SjoerdMeijer added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

I haven't looked in much detail at this patch, but this looks like some straightforward lowering of llvm.experimental.vector.reduce.add. Absolutely nothing wrong with that, but I am curious who's going to produce this intrinsic? The vectoriser, the matrix pass? In other words, any ideas on the bigger picture?

Thu, Oct 1, 2:29 AM · Restricted Project
SjoerdMeijer accepted D88638: [ARM][LowOverheadLoops] Adjust Start insertion..

Nice simplification.

Thu, Oct 1, 2:19 AM · Restricted Project
SjoerdMeijer accepted D88549: [ARM][LowOverheadLoops] Iteration count liveness.

LGTM

Thu, Oct 1, 1:27 AM · Restricted Project
SjoerdMeijer accepted D88542: [ARM][LowOverheadLoops] Start insertion point.
Thu, Oct 1, 1:24 AM · Restricted Project
SjoerdMeijer added inline comments to D88542: [ARM][LowOverheadLoops] Start insertion point.
Thu, Oct 1, 1:07 AM · Restricted Project

Wed, Sep 30

SjoerdMeijer updated the diff for D42365: [LoopFlatten] Add a loop-flattening pass.

I've added the test cases from PR40581. Test v0 does not trigger yet, test v1 triggers. I propose adding support for v0 once we've got something in-tree.

Wed, Sep 30, 7:56 AM · Restricted Project
SjoerdMeijer accepted D88554: [RDA] isSafeToDefRegAt: Look at global uses.

Looks like a good fix to me.

Wed, Sep 30, 6:03 AM · Restricted Project
SjoerdMeijer updated the diff for D42365: [LoopFlatten] Add a loop-flattening pass.

This addresses minor issues from @samparker.
@dmgreen had already answered the other question.

Wed, Sep 30, 3:38 AM · Restricted Project
SjoerdMeijer updated the diff for D42365: [LoopFlatten] Add a loop-flattening pass.

Arg, silly! Thanks for letting me know.

Wed, Sep 30, 3:19 AM · Restricted Project
SjoerdMeijer updated the diff for D42365: [LoopFlatten] Add a loop-flattening pass.

Rebased

Wed, Sep 30, 2:13 AM · Restricted Project
SjoerdMeijer added reviewers for D42365: [LoopFlatten] Add a loop-flattening pass: ostannard, samparker, alanphipps.
Wed, Sep 30, 1:46 AM · Restricted Project
SjoerdMeijer commandeered D42365: [LoopFlatten] Add a loop-flattening pass.

With Dave's and Oliver's permission I am commandeering this because I really would like to see this getting committed soonish and I have some bandwidth to progress this.

Wed, Sep 30, 1:45 AM · Restricted Project

Tue, Sep 29

SjoerdMeijer added inline comments to D88419: [RDA] Switch isSafeToMove iterators.
Tue, Sep 29, 1:19 AM · Restricted Project
SjoerdMeijer accepted D88419: [RDA] Switch isSafeToMove iterators.

If Sam has no further questions, this looks good to me.

Tue, Sep 29, 1:07 AM · Restricted Project

Mon, Sep 28

SjoerdMeijer committed rG1696dd27fb61: [ARM][MVE] Enable tail-predication by default (authored by SjoerdMeijer).
[ARM][MVE] Enable tail-predication by default
Mon, Sep 28, 6:08 AM
SjoerdMeijer closed D88093: [ARM][MVE] Enable tail-predication by default.
Mon, Sep 28, 6:08 AM · Restricted Project
SjoerdMeijer added a comment to D88093: [ARM][MVE] Enable tail-predication by default.

Thanks Dave. With D88086 committed now, I don't think there's anything in our way anymore.

Mon, Sep 28, 5:57 AM · Restricted Project
SjoerdMeijer committed rGf39f92c1f610: [ARM][MVE] tail-predication: overflow checks for elementcount, cont'd (authored by SjoerdMeijer).
[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd
Mon, Sep 28, 1:22 AM
SjoerdMeijer closed D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.
Mon, Sep 28, 1:22 AM · Restricted Project
SjoerdMeijer added a comment to D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.

Many thanks @efriedma and @samparker for your help with this work.

Mon, Sep 28, 1:10 AM · Restricted Project

Fri, Sep 25

SjoerdMeijer added a reviewer for D88307: [DON'T MERGE] Jump-threading for finite state automata: alanphipps.
Fri, Sep 25, 8:01 AM · Restricted Project
SjoerdMeijer updated the diff for D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.

I am so happy that this approach works! I.e., this determines equality of TC and ElementCount by calculating two SCEV expressions, subtracting them, and testing the result for 0. Also a check for the base of the AddRec has been added now, so I think this addresses all comments.

Fri, Sep 25, 3:39 AM · Restricted Project

Thu, Sep 24

SjoerdMeijer added inline comments to D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.
Thu, Sep 24, 12:12 PM · Restricted Project
SjoerdMeijer updated the diff for D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.

I wanted to write the new checks in a separate patch as I thought it would be a new lump of code, and wanted to get this clean-up out of the way first, but given our last idea it is probably best to continue here. I.e., the TC == (ElemCount+VW-1) / VW check is hopefully just a minor addition.

Thu, Sep 24, 8:23 AM · Restricted Project

Sep 24 2020

SjoerdMeijer committed rG2fc690ac904c: [ARM] LowoverheadLoops: add an option to disable tail-predication (authored by SjoerdMeijer).
[ARM] LowoverheadLoops: add an option to disable tail-predication
Sep 24 2020, 5:36 AM
SjoerdMeijer closed D88212: [ARM] LowoverheadLoops: add an option to disable tail-predication.
Sep 24 2020, 5:36 AM · Restricted Project
SjoerdMeijer added a comment to D88209: [ARM] Check for LSTP side-effects..

Okidoki, nice one

Sep 24 2020, 5:22 AM · Restricted Project
SjoerdMeijer accepted D88209: [ARM] Check for LSTP side-effects..

Thanks, perfectly clear, LGTM.

Sep 24 2020, 5:13 AM · Restricted Project
SjoerdMeijer added a comment to D88209: [ARM] Check for LSTP side-effects..

Looks good. Ignoring the nits, I have one inline question asking for an explanation of why we are doing this, and I would be interested to read that first.

Sep 24 2020, 4:17 AM · Restricted Project
SjoerdMeijer updated the diff for D88212: [ARM] LowoverheadLoops: add an option to disable tail-predication.
Sep 24 2020, 4:02 AM · Restricted Project
SjoerdMeijer requested review of D88212: [ARM] LowoverheadLoops: add an option to disable tail-predication.
Sep 24 2020, 3:59 AM · Restricted Project
SjoerdMeijer added a comment to D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.

Actually, I guess if you could prove that the tripcount is precisely equal to (ElementCount + VectorWidth - 1)/VectorWidth, you could also use that to prove the subtraction doesn't overflow.

This sounds like the same suggestion that I made many moons ago... I suggested taking these values and substituting them into the expected SCEV expression, and then perform some SCEV algebra on it and the vector TC expression, until hopefully they both just equal ElementCount == ElementCount. My quick prototype 'worked', but I don't know if that says much.
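
As a small illustration of the trip count relation discussed here, the sketch below computes (ElementCount + VectorWidth - 1)/VectorWidth with plain integers and lists the number of elements still to be processed at the start of each vector iteration. It is a sketch under assumptions, not the SCEV-based check from the patch: the function names are hypothetical, and the 32-bit width used in the wrap check is an assumed default.

```python
def ceil_div(a, b):
    """Round-up division, i.e. (a + b - 1) / b for non-negative a and positive b."""
    return (a + b - 1) // b


def remaining_per_iteration(element_count, vector_width):
    """Elements left at the start of each iteration when the trip count is
    exactly ceil_div(element_count, vector_width)."""
    trip_count = ceil_div(element_count, vector_width)
    return [element_count - i * vector_width for i in range(trip_count)]


def rounding_add_wraps(element_count, vector_width, bits=32):
    """True if element_count + vector_width - 1 does not fit in `bits` bits,
    in which case the round-up formula itself would wrap."""
    return element_count + vector_width - 1 >= (1 << bits)
```

With element_count = 10 and vector_width = 4 the trip count is 3 and the remaining counts are [10, 6, 2]: every iteration starts with a positive number of elements, which is the property that an exact trip count buys. The last helper shows the separate concern that the rounding add itself can wrap for element counts close to the maximum value.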

Sep 24 2020, 12:44 AM · Restricted Project

Sep 23 2020

SjoerdMeijer updated the diff for D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.

Thanks for looking Eli.

Sep 23 2020, 8:51 AM · Restricted Project

Sep 22 2020

SjoerdMeijer requested review of D88093: [ARM][MVE] Enable tail-predication by default.
Sep 22 2020, 6:39 AM · Restricted Project
SjoerdMeijer requested review of D88086: [ARM][MVE] tail-predication: checks for the elementcount, cont'd.
Sep 22 2020, 3:56 AM · Restricted Project
SjoerdMeijer added a comment to D86074: [ARM][MVE] Tail-predication: check get.active.lane.mask's TC value.

Sorry, I wrote a reply end of last week, but apparently forgot to push submit. So please see my reply inline, but I will open a new review soon, where it's probably best to continue this discussion and my reply.

Sep 22 2020, 3:35 AM · Restricted Project