- User Since
- May 11 2015, 7:59 AM (273 w, 5 d)
Fri, Aug 7
CodeGen looks a little better! LGTM
Thu, Aug 6
Wed, Aug 5
Registers are free unless you go over a limit.
Yes, good point.
Do you mean it's the cost of a phi that is altering things?
I believe so, since I figure there's plenty more of them than branches and returns that would affect the vectorizer. I think register usages should still represent some cost, because they are important physical resources, but I suspect that the vectorizer is taking the scalar cost and multiplying it by the vector factor, even though we're likely to still only use a single register.
If I build up the will, I'll take a look at the vectorizer and figure out how it's using these costs... I'm assuming it's the cost of the phi that causes the problems which, as a register, sounds counter-intuitive to be 'free'.
Tue, Aug 4
Mon, Aug 3
My understanding is that it is a terminator that produces a value, and that does not work very well. If you need to spill the value, you would have not place to spill it.
It also could also have the benefit of other instructions using the value in LR too. Maybe it makes reverting easier too, but I can't even remember now... Either way, I think trying to fuse the operations is quite a bit less invasive than trying to re-implement how we do these loops :) and if it can give us some little improvements, let's do it.
Thanks for sorting this out.
Yeah, makes sense, it's now using isThumb and I've cleaned up the predicate a bit too. The thumbv7 select costs tests have been updated accordingly too.
Fri, Jul 17
Rebased and cleaned up the logic a bit.
Rebased after precommitting the test.
Nice and tidy fix, thanks!
Thu, Jul 16
@efriedma Was there much more to do on your original patch?
Tue, Jul 14
Cheers, added a run for armv8a.
Mon, Jul 13
Jul 8 2020
I guess I just don't understand why the BX_RET wouldn't be marked with using the link register in the first place?
Jul 3 2020
I suspect that the getCFInstr maybe used in an odd way by the vectorizer for these changes to make such a difference. I'm suspect that the vectorizer is multiplying the cost of it, where 0 or 1 would make quite a difference, but where it just shouldn't be... hopefully the cost of a branch and phi doesn't change that much after vectorization / scalarization.
Jul 2 2020
Thanks, fixed typo.
Renamed getUserCost to getInstructionCost.
Is getUserCost a better name than getInstructionCost? I wonder if we're moving in the wrong direction.
I'd prefer to use getInstructionCost too.
Jul 1 2020
Thanks, I'm hoping that doing it just for M-class at the moment won't disrupt too many people if there are some bug to fix.
so maybe wait for it before committing it.
Can this be abandoned now?
Jun 30 2020
- Reduced test case further
- Now multiplying cost by TTI::Basic.
Updated according to @dfukalov suggestions.
Not sure if we'll have to revisit this again if we're doing, say a reduction, and but also have a VCMP in the loop which writes to the VPR meaning it needs to be spilled and restored for the VPSEL... But we can cross that bridge if we come to it, cheers!
- Rebased after adding tests in reductions.ll.
- Re-added support for VADDi8 and VADDi16.
- Added some extra comments and TODOs.
Jun 29 2020
Rebased, which has made checking the VPSEL predicate a much more simple task.
LGTM - and sorry for causing trouble.
Sorry, forgot about this. Please add a quick comment about the importance of 4 lanes for FP16 operations.
Jun 24 2020
Updated with final patch now that all the other pieces are in.
Now only +1 for the IT block when we're querying for code size. But we now +1 for i1 values too to account for (some) cost of rematerialising those values.
Jun 23 2020
There's a lot of changes here... would it be worth committing them as separate patches: MVE, NEON and generic, or splitting of memory and cast?
Fixed typo and also predicated the code for an M-class subtarget.
Jun 18 2020
Can you explain where the cost of 2 is coming form?
Well the IT isn't guaranteed to be free, it still takes up code size and, of course, it doesn't even exist for T1. But in my testing, a few months ago, it wasn't worth differentiating between M-profile versions - but I will double check. The cmsis numbers for the M33 look good, it doesn't effect many kernels but they're all decent improvements.
It is dangerous with how we have things set up at the moment for anything to look at the _type_ of the context instruction
But now we're checking that the queried type matches the type of context instruction too, so I'm not really what other conclusion to draw other than that the user asking: what is the cost of this instruction? Based on the currently limited API, what scenario do you think this would not generally be true?
I've tried to address @dmgreen comments about TLI being queried for an instruction which doesn't match the queried types, so this is now checked. But this wasn't good for X86, as a lot of SLP tests remained scalars, so I've added the TLI check into the X86 backend too. This also reverts the previous test change for PPC.
Jun 17 2020
I've been struggling to create a reproducer, though I can see the changes happening. I think my real problem is that simplify-cfg creates selects of i1 after the phi insertions. I will revert my revert and look into solving the problem.
Jun 16 2020
Jun 15 2020
Thanks, done in 3e39760f8eaa.
Jun 14 2020
Jun 11 2020
Great stuff, I look forward to the other patches then :)
- Changed return value and added comment.
- Added a few internal UndefValue where possible.
- Moved test.
I just run some numbers on the test suite and the results are already looking very good with 1.4% geomean reduction for T32, so many thanks! Is this the final patch of the set?
Was the failure only a fuzzer problem?
Not sure, but either it seems harsh to assert of technically legal IR.
Jun 10 2020
Rebased and cleaned up the test changes.