- User Since
- May 24 2016, 8:35 AM (147 w, 2 d)
Sun, Mar 17
Thu, Mar 14
Removed isARMLowRegister check, reversed the condition of DestOffset - BrOffset, and added some Kill flag handling and tests.
Now with some mir tests.
Wed, Mar 13
Tue, Mar 12
It may be possibly to create ldrd's, but I don't think that they will be any quicker. An ldrd will take the same time as an ldm (1+N). It could be smaller, depending on whether T1 ldr's are used.
Looks good to me then.
Mon, Mar 11
Looks OK to me.
Fri, Mar 8
This looks generally useful. Please add context
Hello. I think the idea of this sounds sensible, to limit things if the instruction are too far apart, I'm just not sure of using domtree's to do that.
Thu, Mar 7
Hello. This was increasing codesize by more that I'd like, which was why I never committed it. The performance results would probably make it worthwhile, but at -Oz, it's not something people should be paying for.
Thanks Sam. And thanks Brendon!
Tue, Mar 5
Mon, Mar 4
Switchup the position of EnableRecursiveSetupCost.
The Hexagon related test changes look good to me. Neither of these test changes indicate that your patch causes any problems. In swp-carried-1.ll, the hardware loop is no longer generated. In this case, I should just change the test to use a real value for the initial loop induction variable, %v4, instead of undef, say 0 (I can do that later). In swp-epilog-phi5.ll, the compiler is now generating an extra, innermost, hardware loop, so that's why the test changes from loop0 to loop1. The loop1/endloop1 instructions represent an outer hardware loop. This is another test that I should fix since subtle changes in the generated code cause the compiler to create/not create a hardware loop.
Hello. This one looks OK from the tests I just ran. A few things moved, but only by a small amount, and usually in the right direction.
Sun, Mar 3
Altered comment, and just use indvars for the tests (not instcombine/loop-delete).
Is it possible that the tests were checking the correctness for certain expressions that we no longer check because these happen to be expensive?
Thu, Feb 28
This is another one from cmsis dsp (https://github.com/ARM-software/CMSIS_5/blob/develop/CMSIS/DSP/Source/MatrixFunctions/arm_mat_trans_q31.c). Apparently the original was 20% slower on a thumb1 target, but something daft is likely going on there, as it can sometimes make bad decisions.
There does look like there's some extra uxth's in there, I've not looked into why. I was compiling with something like "-target arm-arm-none-eabi -mcpu=cortex-m33 -O3" (or maybe m23 for the thumb1 case).
Yep. I think that makes sense to me. Would be good to remove the hack. And LSR should handle most of these cases. I think there's times when it doesn't do what it should, though.
Oh, and I presume most loop transforms will use SCEV's, which won't have changed? So we don't need to update the tests for them all, to make sure they are still doing what they should?
It looks like this was added on purpose way back in https://reviews.llvm.org/rL16473. That was so long ago that everything will have changed since then, but have you run any benchmarks for this?
From what I can tell, t.p.northover was added as a blocking reviewer, he didn't block this patch.
Wed, Feb 27
This is targeted at codesize, but I ran the standard set of benchmarks and didn't see any changes that didn't look like noise.
I have recently taken another looks at this, and came to the conclusion that, especially for some of the signed loops, we probably shouldn't be expanding these out in IndVars. They won't simplify even if we try to clear them up in instcombine. There is a check in there to bail if they are high cost SCEVs, but the logic seems a bit wrong and is treating them as low cost just because they are divided by a power of 2. (Which is SCEV's way of doing the mod).
Tue, Feb 26
Other than adding the two optimisation features, this LGTM.
Mon, Feb 25
Fri, Feb 22
Thu, Feb 21
Wed, Feb 20
Feb 18 2019
Feb 16 2019
Feb 15 2019
Feb 14 2019
Feb 13 2019
Updates for IsThumb1
Feb 12 2019
Oh, nice. We must have found the same thing at the same time. Thanks!
Thanks. We will try and figure out some of the changes, see if we can get this enabled.
Feb 11 2019
Hello. I ran some benchmarks and they were kind of all over the place, but on average a little down. The Arm backend seems to be quite opinionated about the uaddo's, using it's own nodes for a lot of things, so there might be some missing parts. There were enough ups in there to make us think that this can be good (and can show areas of improvement like in D57833), but we may need a few fixes first. Putting it behind a target hook in the meantime whilst we take a look does sound sensible.
Feb 9 2019
It looks like there are already a couple of patterns in ARMInstrThumb2.td for thumb1 instructions. I think with a comment it would be fine to put it in there. HasV8MBaseline tends to cross Thumb1/Thumb2 already.
Feb 8 2019
As far as understand, ARMInstrThumb.td is just #included in the other tablegen file. I guess the ordering might be important for them. Would it work better if it's defined in ARMInstrThumb2.td?
Feb 7 2019
Fixed that bit I missed, plus better formatting and cleanup