- User Since
- May 11 2015, 7:59 AM (223 w, 5 d)
Mon, Aug 12
Fri, Aug 9
Huh, surprised this was missing. Add i64 before committing?
Thu, Aug 8
Some codegen tests would be good.
What an eye opener...
Ah, nice. LGTM
Updated with context.
Why does llvm_arm_vctp32 not return a <4 x i1> directly?
The vctp family is defined like that because the ACLE specifies that they return an mve_pred16_t, and I'm assuming this is a scalar, but I can't find a definition! I think that all the user-facing predicate generators will produce a scalar and we will need to do the conversion to make it nice and LLVMy.
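To make the scalar-to-vector conversion concrete, here is a rough model in Python. It assumes mve_pred16_t is a 16-bit scalar with one bit per byte of the 128-bit vector, so each 32-bit lane owns four bits; that is my reading rather than a quoted definition. The helper names vctp32 and pred16_to_v4i1 are hypothetical and only illustrate the idea, they are not the intrinsic or any LLVM API.

```python
LANES = 4          # 32-bit lanes in a 128-bit vector
BITS_PER_LANE = 4  # assumption: one predicate bit per byte of the lane


def vctp32(remaining):
    """Model a tail predicate: lanes below `remaining` are active.

    Returns a 16-bit scalar mask (the assumed shape of mve_pred16_t).
    """
    active = max(0, min(remaining, LANES))
    return (1 << (BITS_PER_LANE * active)) - 1


def pred16_to_v4i1(pred):
    """Convert the scalar mask into a per-lane boolean list, like <4 x i1>.

    Only the lowest bit of each lane's group is inspected, which is enough
    for masks produced by vctp32 above.
    """
    return [bool((pred >> (BITS_PER_LANE * lane)) & 1) for lane in range(LANES)]


if __name__ == "__main__":
    # Two elements left in the loop: only the first two lanes are active.
    print(hex(vctp32(2)), pred16_to_v4i1(vctp32(2)))
```

Under these assumptions the scalar form carries no more information than the <4 x i1>, which is why the conversion looks like pure plumbing.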
Wed, Aug 7
With my confusion cleared, LGTM.
Yes, I'm definitely up for preventing unrolling and I think checking the instructions would be better - we'll catch vector intrinsics that way too.
Tue, Aug 6
Yes, I guess that would be a sensible approach! I am worried that one of the (many) passes will trip somewhere, so this gives me another example test case... For performance, I'm still not convinced this is the best approach because (1) we can't depend on metadata and (2) doesn't this also prevent the scalar remainder from being unrolled too?
This is less about performance: I can't see how we can legally unroll a vectorized loop in the case of tail predication. I also don't know how we'd detect that a loop had been unrolled when we get to the conversion phase, so it seems like this is the only place where we can maintain correctness. And because we need this for legality, we can't rely on metadata...
Anything is allowed to use the value of LR, just not after it has been updated by LoopDec.
Because of the VecLd/LHS typo, I had assumed that there was something wrong with the handling of the 'exchange' instructions. When I added that functionality, the tests weren't very varied so I assumed that I had missed an edge case with operand orderings. What the tests show is that there are some patterns which don't get optimised well, see the TODOs. The tests in overlapping are the kind of accesses that cause the problems and it wasn't something I had thought about. We test for sext loads with multiple users, but not in the case where multiple ones are widened.
Mon, Aug 5
So why doesn't sdiv get optimised like udiv..?
Fri, Aug 2
Thu, Aug 1
Wed, Jul 31
@efriedma Thanks, added Preserved before committing.
Tue, Jul 30
Thanks. Added skipFunction and rebased.
Mon, Jul 29
Fri, Jul 26
Reduced test case and removed change from previous commit.
I made some changes before committing - after noticing I should have attached the def to LoopEnd and not LoopDec.
Jul 25 2019
Not sure why this is still lingering in my phab, this has been closed already.
Jul 24 2019
Any reason not to use this for Thumb2 as well? It looks like it would be useful for T2 codesize.
Jul 23 2019
I trust that those massive test files are okay! LGTM.
Jul 22 2019
Added support for setcc ge/uge, 1
Do you see a way that we can make this more efficient in the future..? If so, could we add a comment somewhere because looking at these code sequences, I hope this doesn't come up often!
- renamed the nodes and the search function.
- added support for setcc opcodes other than EQ and NE.
Jul 12 2019
Jul 11 2019
Committed in rL365749.
Modified logic during lowering.
Jul 10 2019
Added and modified comments.
Jul 9 2019
Committed in rL365363.
Jul 3 2019
Please could you add an opt test for the hardware-loops too? This looks like a corner case that we're not getting generic coverage on.
Thanks for doing this! LGTM.
Jul 2 2019
There's a fair number of sub, load and ptrtoint instructions, so it would be good to narrow the context of the checks: check which basic block the instructions are in, as well as which GEP the load is using. Thanks.
And please could you update the test so that it is clear which instructions you are testing.