- User Since
- May 11 2015, 7:59 AM (179 w, 4 d)
Wed, Oct 17
Cheers Sjoerd, I've added two helper functions to try to clean this up a bit.
Tue, Oct 16
Thanks, LGTM. One bonus question: are the fused operations fast on the M7?
Good point. Would it be worth adding a test for the M7 though? We seem to be a little lacking in our M-class FP tests.
Mon, Oct 15
Mon, Oct 1
Two breaking assumptions were that:
- the base load would be before the offset load.
- each load would only have one user. This is true, but I had also meant, and assumed, that the sext of the load has only one user.
This patch was reverted again in rL343082.
Fri, Sep 28
Thu, Sep 27
Wed, Sep 26
Tue, Sep 25
Thanks for the suggestions, Sjoerd. I've evidently had a difficult time wrapping (pun intended) my head around this and really should have put some comments up before. Hopefully I've now also illustrated when and how we can use this.
Mon, Sep 24
Committed in rL342870.
Thu, Sep 20
The patch caused an assert on some vectorized code because I missed a check that the muls are plain integer types.
Commit was reverted in rL342260.
Shouldn't we also consider code size here?
Sep 17 2018
Thanks for the reproducer and for the revert.
Sep 14 2018
Sep 13 2018
Sep 12 2018
Added a negative test for loading i8s as well as a positive test for 64-bit macs.
Sep 11 2018
Aug 29 2018
Aug 28 2018
Fixed a couple of typos and added an assert to AddMACCandidate.
Aug 22 2018
Added test for armv8m.main+dsp.
Aug 21 2018
Aug 16 2018
Thanks for putting the time into this, just one nit before it's committed please.
Fixed and recommitted in r339858.
Re-adjusted ShAmt for big endian targets.
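As a rough illustration of why a shift amount needs adjusting on big endian targets, here is a minimal standalone sketch. It is not the code from the patch: `narrowLoadByteOffset` and its parameters are hypothetical names, and it assumes the shift amount is being used to locate a narrower field inside a wider value in memory.

```cpp
#include <cassert>

// Hypothetical helper (not from the patch): byte offset in memory of a
// NarrowBits-wide field that sits ShAmt bits above the least significant
// bit of a WideBits-wide value.
unsigned narrowLoadByteOffset(unsigned WideBits, unsigned NarrowBits,
                              unsigned ShAmt, bool IsBigEndian) {
  assert(WideBits % 8 == 0 && NarrowBits % 8 == 0 && ShAmt % 8 == 0 &&
         "sketch only handles byte-aligned fields");
  assert(NarrowBits + ShAmt <= WideBits && "field must lie inside the value");
  // On big endian the most significant byte is stored first, so the
  // shift has to be counted from the other end of the value.
  if (IsBigEndian)
    ShAmt = WideBits - NarrowBits - ShAmt;
  return ShAmt / 8;
}
```

For a 32-bit value stored as `0xAABBCCDD`, the byte `0xCC` (bits 8 to 15) lives at byte offset 1 on little endian and at byte offset 2 on big endian, which is what the helper computes.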
Thanks for reverting. The issue was that I was assuming that the instruction operands mapped to arguments for CallInsts. Will be recommitting.
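To show the kind of mismatch described here, the sketch below models a call whose operand list holds the arguments followed by the callee, which is how LLVM lays out a CallInst's operands. `ToyCall` and `makeCall` are hypothetical stand-ins written for this example, not LLVM classes, and the exact failing case in the patch is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call instruction: the operand list is the
// call arguments followed by the callee.
struct ToyCall {
  std::vector<std::string> Operands; // args..., callee

  std::size_t numOperands() const { return Operands.size(); }
  // One operand is the callee, so there is one fewer argument than operands.
  std::size_t numArgs() const { return Operands.size() - 1; }
  const std::string &callee() const { return Operands.back(); }
  const std::string &arg(std::size_t I) const {
    assert(I < numArgs() && "argument index out of range");
    return Operands[I];
  }
};

// Builds a ToyCall by appending the callee after the arguments.
ToyCall makeCall(std::string Callee, std::vector<std::string> Args) {
  Args.push_back(std::move(Callee));
  return ToyCall{std::move(Args)};
}
```

Code that walks every operand and treats each one as an argument will also visit the callee, so iterating over the arguments explicitly is the safer choice.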
Aug 15 2018
Changed test regexes.
Aug 14 2018
Removed the unnecessary isa<Instruction> checks and updated the test to actually test.
Rebased and fixed the handling of undef values. I've also moved the tests around so we have a standalone file for the call tests. Also added a small piece of control to decide whether we bother to promote or not: if we find nothing but sources, sinks and the icmp, then we don't bother doing anything.
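The "do we bother to promote" check could be sketched roughly as follows. This is only an illustration of the rule as stated in the comment: `NodeKind` and `worthPromoting` are hypothetical names, and the real pass classifies LLVM values rather than an enum.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical classification of the values found while searching from
// an icmp. Anything that is not a source, a sink or the icmp is "Other".
enum class NodeKind { Source, Sink, ICmp, Other };

// Promotion is only worthwhile if the search found at least one value
// other than sources, sinks and the icmp itself.
bool worthPromoting(const std::vector<NodeKind> &Found) {
  return std::any_of(Found.begin(), Found.end(),
                     [](NodeKind K) { return K == NodeKind::Other; });
}
```

With this shape, a chain made up of only a source, the icmp and a sink is left untouched, while a chain containing any intermediate value is considered for promotion.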
Rebased and updated changes to the x86 codegen tests.
Thanks for fixing this, LGTM.
Aug 10 2018
- Removed commented out code
- Added some TODOs
- Expanded the description of what a source and sink are
Aug 9 2018
Moved arguments to occupy a single line.
Aug 8 2018
Thanks for the extra tests, LGTM.
To try to make it clear that these are not user facing intrinsics, I've renamed them to arm.codegen.zeroext
Ok, from your reply on the other ticket - LGTM.
Cheers, shufflevector always confuses me. LGTM.
Aug 2 2018
So this started life in the DAGCombiner, issues around the implementation were raised, and it was suggested that it would be useful to have earlier in the pipeline. But it seems that there hasn't really been any thought, or discussion, about how this would fit in with the existing passes... I think DAG combine has always been the right place for this because we're trying to reuse values, something that DAGs are good for. In DAGCombiner::visitANDLike, we already handle ANDs with SRL operands and the motivating example can be addressed with very little effort:
Aug 1 2018
And what would I need to do about testing for a generic intrinsic? Add to BitCode/compatibility-6.0.ll test and just keep this codegen one too?
Yes exactly, I would like to use these for loops. In ARMCodeGenPrepare I need to insert truncs to keep the IR legal, although I've already proved that the value is already zero extended. In those cases I want to use these intrinsics to carry that knowledge through. So it's not trying to work around the lack of info of target specific nodes. Another idea I wondered about was adding flags to instructions, but that seems far more intrusive.
Jul 31 2018
Jul 25 2018
This LGTM. I don't think there's a problem with solving this niche in the backend.
Jul 24 2018
Is it such a bad idea? Sure, I would like to check whether the sel intrinsic has been used or not, but what happens in the case of inline assembly? The AAPCS is also vague: I'm not sure what a 'public interface' is in terms of an LLVM module. I'd like to have an option so that the user can explicitly say it's fine to use these instructions.
Jul 23 2018
Added tests for:
- non load operand to the add,
- immediate operand to the add,
- volatile store
- non-consecutive loads
Performed a rebase and added a test from a manually unrolled example. I've also added an option to control the use of the GE writing flags - really I think this should go as a subtarget feature so this can be used across this pass and ARMCodeGenPrepare.
Added another test
Now disabled at -O0
Jul 20 2018
I did some work for Thumb-2 last year in a similar vein during the combine phase, in PerformSHLSimplify, but this (unsurprisingly) doesn't handle lshr. I didn't find any headaches from changing the canonical form in those cases, so it would probably be worth having it there.
All of your test cases are rooted at an or, so it makes sense to search up from there. Why not start with searching just from or (and xors?) and then add the search from more operators in later patches?
Jul 19 2018
Jul 18 2018
Ok, thanks for the clarification. I'll have a look in instcombine.
Fixed support for handling switch instructions and added another test.
This looks like an odd solution to me; I haven't seen TokenFactors used like that before. Isn't it okay for the AND to be folded? Why not just check that the AND hasn't been folded into a constant before trying to update its, now non-existent, operands?