- User Since
- May 11 2015, 7:59 AM (187 w, 4 d)
Wed, Dec 12
I've moved the logic under the control of a new TTI flag as it seems that the current shouldFavourPostInc is trying to achieve different things. Hopefully I've also addressed Gil's comments.
Tue, Dec 11
Okay, thanks. We're also seeing some regressions, so I know I've got some tuning to do. Do you have any idea of the characteristics of your regressions? At the moment I'm thinking:
- That the costs that I've added here are overly simplistic; for one, I think I need to add a setup cost.
- It's also probably not worth doing when we know that the loop iteration count is low.
- In the current state, we also see code size regressions, whereas your previous work helps us reduce code size. It may mean that I'll need a different flag to enable this change, but it may also be a symptom of the performance regressions.
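The two tuning ideas above (a setup cost, and skipping loops with a low iteration count) can be sketched as a simple profitability check. This is only an illustration under stated assumptions: `PostIncEstimate`, `worthFavouringPostInc`, the `MinTripCount` threshold and the convention that a trip count of 0 means "unknown" are all hypothetical and are not part of the patch or of any LLVM API.

```cpp
#include <cassert>

// Hypothetical cost summary for one loop; none of these names exist in LLVM.
struct PostIncEstimate {
  unsigned SetupCost;          // assumed one-off cost paid before the loop
  unsigned SavingPerIteration; // assumed saving from post-inc accesses
};

// Returns true when the assumed savings outweigh the assumed setup cost.
// TripCount == 0 is used here to mean "iteration count unknown".
bool worthFavouringPostInc(const PostIncEstimate &E, unsigned TripCount,
                           unsigned MinTripCount = 4) {
  // Unknown trip count: fall back to "any per-iteration saving is enough".
  if (TripCount == 0)
    return E.SavingPerIteration > 0;
  // Known-short loops are unlikely to amortise the setup cost.
  if (TripCount < MinTripCount)
    return false;
  // Widen before multiplying so the product cannot overflow.
  unsigned long long TotalSaving =
      static_cast<unsigned long long>(E.SavingPerIteration) * TripCount;
  return TotalSaving > E.SetupCost;
}
```

With these assumed numbers, a loop that saves 1 unit per iteration and costs 2 units to set up would be rejected at 2 iterations and accepted at 100.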
Thu, Dec 6
Fri, Nov 30
Ah, ok, I don't think they're being generated. I will have a look at GenerateConstantOffsetsImpl. Thanks!
Thu, Nov 29
@qcolombet maybe you could suggest the areas in LSR that would help produce these post-inc loads? What we have with the default complexity is:
Wed, Nov 28
Great to see those other test changes! LGTM with the few minor comments, no need to re-review. cheers!
Thanks for the explanation, LGTM!
Thanks for taking a look and for the value renamer tip! The test case has now been renamed and reduced.
Tue, Nov 27
My point was that, even with this extra work, and as you mentioned in the comments, the test case isn't generating optimal code. We could introduce a subs node to remove the unnecessary cmp, making the sub opaque and improving codegen. Unless there's a reason why we couldn't do this?
Mon, Nov 26
Fri, Nov 23
Wed, Nov 21
Mon, Nov 19
Added some comments and created a test file for switch statements, which includes existing tests plus a couple of new ones.
Thu, Nov 15
Nov 14 2018
Nov 9 2018
Nov 8 2018
Nov 6 2018
Now disallowing icmps that operate on types that are smaller than TypeSize. The handling of truncs being sources and sinks has also been reverted.
- we allow casts if their result or operand is <= TypeSize.
- zexts are sinks if their result > TypeSize.
- truncs are still sinks if their operand == TypeSize.
- truncs are still sources if their result == TypeSize.
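The rules above can be restated as a small standalone model. It is a sketch only: types are reduced to their bit widths, and the names `Cast`, `isAllowedCast`, `isSink`, `isSource` and `isAllowedICmp` are hypothetical and do not come from the patch.

```cpp
#include <cassert>

// Simplified, hypothetical model: a type is represented only by its bit width.
enum class CastKind { ZExt, Trunc };

struct Cast {
  CastKind Kind;
  unsigned OperandBits; // width of the value being cast
  unsigned ResultBits;  // width of the value produced
};

// Casts are allowed if their result or operand is <= TypeSize.
bool isAllowedCast(const Cast &C, unsigned TypeSize) {
  return C.ResultBits <= TypeSize || C.OperandBits <= TypeSize;
}

// zexts are sinks if their result > TypeSize;
// truncs are sinks if their operand == TypeSize.
bool isSink(const Cast &C, unsigned TypeSize) {
  if (!isAllowedCast(C, TypeSize))
    return false;
  if (C.Kind == CastKind::ZExt)
    return C.ResultBits > TypeSize;
  return C.OperandBits == TypeSize;
}

// truncs are sources if their result == TypeSize.
bool isSource(const Cast &C, unsigned TypeSize) {
  if (!isAllowedCast(C, TypeSize))
    return false;
  return C.Kind == CastKind::Trunc && C.ResultBits == TypeSize;
}

// icmps operating on types smaller than TypeSize are disallowed.
bool isAllowedICmp(unsigned OperandBits, unsigned TypeSize) {
  return OperandBits >= TypeSize;
}
```

For example, with a TypeSize of 8, a zext from 8 to 32 bits is a sink, a trunc from 32 to 8 bits is a source, and a trunc from 8 bits to 1 bit is a sink.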
On initialisation, record the functions that contain uses of the sel intrinsic so that we now check on a function basis and not module.
Yes, it's certainly complicating that the ABI and ACLE don't talk about inline assembly or other intrinsics and this is why I still want to have the option guarded by a command line option. I will update the patch to look at the function only.
Nov 5 2018
Nov 2 2018
Fixed a bug that allowed zext to be generated with the same source and destination types.
The one patch I made in this area was corrected by Eli, so he's far more informed than I! The problem that time was also due to interleaving of down/up, and, as Matthias said in his email, I'm not sure why we'd need this ability. I'll take some time today to look into the DAG builder to see if serialising these nodes isn't too much of a pain, because I'd hope that expressing this via dependencies would be better long-term.
Nov 1 2018
Oct 31 2018
Ok, fair point. If we are going to introduce a new node to fix this issue, could we have a SUBS node that can be glued to the CMOV?
Oct 30 2018
I thought the normal way to stop combining was to return the original node. Could you not manually replace N with Res and then return N?
Oct 26 2018
Helpful FileCheck change! Thanks!
Oct 24 2018
Ok, fair enough, my LNT numbers show that the MISched results are more variable:
Added the a72 to a couple of scheduling tests, as well as the basic unroll one.
Oct 23 2018
Oct 17 2018
Cheers Sjoerd, I've added two helper functions to try to clean this up a bit.
Oct 16 2018
Thanks, LGTM. One bonus question: are the fused operations fast on the M7?
Good point. Would it be worth adding a test for the M7 though? We seem to be a little lacking in our m-class FP tests.
Oct 15 2018
Oct 1 2018
Two breaking assumptions were that:
- the base load would be before the offset load.
- each load would only have one user. This is true, but what I really meant, and assumed, was that the sext of the load has one user.
This patch was reverted again in rL343082.
Sep 28 2018
Sep 27 2018
Sep 26 2018
Sep 25 2018
Thanks for the suggestions, Sjoerd. I've evidently had a difficult time wrapping (pun intended) my head around this and really should have put some comments up before. Hopefully I've now also illustrated when and how we can use this.
Sep 24 2018
Committed in rL342870.
Sep 20 2018
The patch caused an assert on some vectorized code because I missed a check that the muls are plain integer types. The provided reproducer has been added as parallel-dsp-top-bottom-neg-vec.ll.
Commit was reverted in rL342260.
Shouldn't we also consider code size here?
Sep 17 2018
Thanks for the reproducer and for the revert.
Sep 14 2018
Sep 13 2018
Sep 12 2018
Added a negative test for loading i8s as well as a positive test for 64-bit macs.
Sep 11 2018
Aug 29 2018
Aug 28 2018
Fixed a couple of typos and added an assert to AddMACCandidate.
Aug 22 2018
Added test for armv8m.main+dsp.
Aug 21 2018
Aug 16 2018
Thanks for putting the time into this, just one nit before it's committed please.
Fixed and recommitted in r339858.
Re-adjusted ShAmt for big endian targets.
Thanks for reverting. The issue was that I was assuming that the instruction operands mapped to arguments for CallInsts. Will be recommitting.
Aug 15 2018
Changed test regexes.
Aug 14 2018
Removed the unnecessary isa<Instruction> checks and updated the test to actually test.
Rebased and fixed the handling of undef values. I've also moved the tests around so we have a standalone file for the call tests. Also added a small piece of control to decide whether we bother to promote or not: if we find nothing but sources, sinks and the icmp, then we don't bother doing anything.
Rebased and updated changes to the x86 codegen tests.