- User Since
- Jan 26 2016, 2:17 AM (374 w, 4 d)
Wed, Mar 29
Gentle ping. Any further opinions on this?
Mon, Mar 27
Fri, Mar 24
It's strange that I first increase the cost, and this proposes it to lower it. Clearly we need some alignment and clarity on this. But most importantly, we need some motivating examples before we continue with any of this.
Thu, Mar 23
I did some performance runs, nothing stands out, so LGTM.
Wed, Mar 22
Sorry, I also forgot about this, but thanks for pinging me.
Mon, Mar 20
Cheers, this makes a lot of sense to me.
Just a heads up that I am going to recommit this tomorrow.
Added FNEG, and checks for BF16 and FP16.
Fri, Mar 17
Thu, Mar 16
Thanks for your help and reviews! I will commit this tomorrow if there will be no further comments.
Added context, fixed the test cases.
I have now benchmarked RAJAPerf too, which is a loop based and FP heavy benchmark. Results are neutral, mostly no changes.
So overall I am only seeing the uplift of this, with no regression.
Wed, Mar 15
I am wondering if you're trying to do too much in this patch. But I only had a quick look at this, and need to look again.
But my first request is going to be about the cost/bonus calculation. Since this is crucial for this transformation, it would be useful to document and comment the idea/algorithm at some place, with all the formulas and definitions (costs, bonus, latency, code size etc). I think this would greatly improve readability of these changes and the code in general.
High level question first, it looks like that geomean compile-times improve, and that is very surprising.
My guess is that one of the functional changes in findSpecialization is a good improvement on itself. Or maybe it is the caching of the CodeMetrics? Anyway, that's my question, just curious if you know what it is?
Tue, Mar 14
Looks like a good fix to me.
Mon, Mar 13
Thanks, I have restored that piece of logic and added the code-size check to it (and added a code size check to the test).
That means that we now get add an additional cost for TCK_RecipThroughput, as well as for TCK_Latency and TCK_SizeAndLatency.
Fri, Mar 10
I like it when I can delete things and achieve the same, so I have just done that. This was my understanding of your comments. Thanks for the suggestion and for looking into this.
Thu, Mar 9
Option --filter-short now accepts an optional arguments, and it defaults to 1.0s.
Some special care had to be taken if this optional argument is omitted, it then needs to recognise that the FILE arguments is not the optional argument to --filter-short, as also explained in the comments.
Wed, Mar 8
Tue, Mar 7
Thanks for the comments, I also like this idea!
I will prepare a new revision to support this.
Mar 1 2023
(accidentally pressed enter)
Thanks, also for that case which I will add, that's interesting indeed.
Feb 28 2023
Feb 27 2023
Ok, cheers, and this looks very reasonable to me.
Looks reasonable to me.
The idea makes sense I think, but just to put things into context, do you already have a case or patch where we can see the benefit of this?
Feb 22 2023
Thanks, and I will apply those changes before committing.
Feb 21 2023
Feb 20 2023
@dmgreen wrote in the other patch:
Thanks for looking into this.
I will let @RKSimon comment on the usage of the TTI hook.
I agree these tests are missing, so this is a good addition.
Feb 17 2023
Feb 16 2023
Feb 15 2023
Feb 14 2023
Feb 10 2023
Feb 9 2023
Yeah, makes sense, cheers. LGTM too
The patch looks good to me, but I was just wondering if another approach would be to just match the sub(x, add(m1, m2)). pattern as mls, or is this easier/better?
Feb 5 2023
We handle this by not making the choice during ISEL but instead selecting a pseudo instruction that gets expanded after register allocation when we have access to more information.
Feb 3 2023
There is no precedent for:
- matching opcodes like that, the latest revision adds the splat_vector opcode check,
- checking the uses of sdnodes like that.
I experimented with replacing AArch64mul_p_firstOpndWithSingleUse -> AArch64mul_p_oneuse :
And in the meantime, can you upload a new revision with full context please, and rebase it on top of your change that precommit tests? Then we can see and discuss the changes this is making, and check if we see any improvements.
Feb 1 2023
And for completeness, summarising previous comments, these are my other 2 concerns.
I think we need at least 3 patches:
Jan 31 2023
and one more request, which I forgot to add earlier:
Ok, I misunderstood a few things, but see comment inlined.
First, I think the description needs some clarifications.
As I understand it, the problem description is that we would like to match/rewrite this pattern:
I will look at this soon.