- User Since
- Apr 9 2019, 2:05 AM (114 w, 5 d)
Nov 13 2019
Nov 11 2019
Nov 9 2019
Nov 8 2019
Updated after D69975 was reviewed and committed.
Now showing the codegen improvements related with this patch
This is now commited to the github repo
Hi @spatel, @asl
Thanks for reviewing this. I'm getting some trouble at committing to the upstream repo in the way I would like. I asked for help here first, but then I thought it was better to use llvm-dev instead to avoid clutter here. So this is an edit removing my previous comment. Thanks again
Nov 7 2019
Following the above conversation, I created D69975 as the base tests before this patch can be committed.
I put the base tests in a separate patch to actually show the differential effects of this one.
I will open this for review after D69975 is reviewed.
Ok, no problem.
Nov 6 2019
@spatel Thanks for reviewing and commiting my previous patches on this subject. I have now been given commit access. I have not tried yet, but it should be good.
Made sure diff matches latest trunk changes
Oct 31 2019
@asl Ok thanks, I agree that 9 bit shifts may also benefit from the version with amount.
If this is ok, I will propose the TLI function replacement on a different patch after this is reviewed, to keep things organised.
Updated diff to account for the latest trunk changes
Oct 30 2019
@asl I identified you as the creator of the MSP430 patch that optimised 8 bit and larger shifts by combining them with bswap and extend instructions. This creates relatively cheap instruction sequences for exactly 8 bit shifts.
@spatel I updated this revision diff according to your suggestions. I also created a new baseline test revision to help identify the effects of this patch. I hope this is ok. Thanks in advance for your time.
I created a new baseline revision, D69609, that help to better reflect the effects of applying this patch. The referred revision should therefore be applied before this one for consistency reasons.
Oct 29 2019
Updated tests to reflect each covered case in an independent way with more focused IR patterns.
Added comments to the test file that refer to the transformation we are dealing with.
Oct 22 2019
Code style correction
Oct 20 2019
Made sure the updated diff file accounts for recent trunk changes
Oct 19 2019
Move repeated calls to N0.getValueType() into a variable
Oct 18 2019
Updated with latest suggestions
Oct 17 2019
Should be ok now.
I just realised that I somehow messed this test file with the already optimised one. Please discard this until I upload the right one. Thanks
Updated baseline test file with the given recommendations:
- removed explicit target and datalayout from .ll test file
- removed dso_local
Oct 16 2019
I know this is possibly not the right place to ask this particular question, but I thought that I would get a more focused reply by asking here:
Sorry, I somehow missed your post before I posted my last one. Thanks for that.
Thank you for your support. I'll try to do what you suggest regarding generating of baseline tests and splitting the patch.
Oct 15 2019
I abandoned this, created D68982 instead
Oct 14 2019
Oct 13 2019
@spatel I want to express my support to selects over bit manipulation instructions in IR, as stated above, in order to move such optimisations to DAGCombine. Ideally, this should involve the removal of some of the existing InstCombineSelect transformations, particularly most of the ones in foldSelectInstWithICmp. However, as I exposed earlier in LLVM-dev, the DAGCombine code should incorporate hooks to allow targets to decide whether such bihacks are actually profitable, or it's best to keep them as selects.
Oct 12 2019
Oct 3 2019
It works for me.
Jul 8 2019
This fixed bug 42528. okey to me.
Jun 27 2019
@craig.topper , can this be considered ok for the X86?. Thanks.
Jun 26 2019
Hi Craig, thanks for commenting.
Yes, I was actually compiling for -Oz, but the differences when using -Os are even bigger. Let me try to explain every case.
- For Oz, the compiler indeed generates expensive push/pop instructions as an attempt to reduce code size. However, after the patch is applied these instructions are removed without any code size increase. I have checked that in several scenarios and the result is either the same size or less size. This is because the patch reduces the overall number of instructions. On the examples above, the resulting code size is identical, this is because the sequence:
Fixes reported assertion crash on SystemZ
Jun 25 2019
I recently sent a new LSR patch proposal, D63692, that to some extent runs in the opposite direction than this one. I mean, I am adding an additional initial formula for ICmpZero as opposed to excluding it. I think that both patches are however fully compatible and probably worth merging together to get the best of all platforms. I added @shchenz, the author of this, as a D63692 reviewer so he can post any considerations.