- User Since
- Mar 22 2018, 8:44 AM (159 w, 4 d)
Jan 2 2019
Dec 17 2018
Aug 14 2018
Aug 13 2018
Is clang part good to go too?
Implemented suggested changes.
Regarding masking with select - do you mean creating new avx512_padds/avx512_psubs intrinsics without masks and replacing the old calls with new intrinsic+select?
Aug 9 2018
Split to two different patches for signed and unsigned intrinsics.
Corrected version number.
Aug 7 2018
Removed signed intrinsics lowering due to the pattern being too complicated - instead some minor optimizations were introduced on LLVM side.
Brought back signed intrinsics and added constant folding in InstCombine.
Jul 24 2018
Now the pattern is not detected if there are any undef elements in the constant operand.
I removed the tests added after fixing the bug which caused reversion of the patch - in the meantime the pattern conditions changed and the sequence used in those tests is now perfectly valid.
Should FIXME comments be removed?
Jul 17 2018
Removed unnecessary comment.
Rebased and implemented suggested changes.
Jul 16 2018
Jul 9 2018
Jun 28 2018
Jun 27 2018
Jun 18 2018
Jun 15 2018
Jun 14 2018
Fixed rounding mode calls.
Jun 10 2018
Jun 9 2018
Jun 7 2018
Jun 6 2018
Jun 5 2018
Jun 4 2018
Removed CHECK-NOTs for consistency.
Jun 1 2018
Whoops, that's a wrong revision. I'll revert it shortly.
Added missing scalar intrinsics without rounding.
Added missing scalar intrinsics without rounding.
Added lowering of scalar sqrt intrinsics without rounding (relies on D47621).
Mask scalar case is closed and doesn't have any effects on this revision. Besides, I resolved issues connected to lowering scalar sqrt intrinsics without rounding (that is, if D47621 is accepted). Should I add them here to have everything sqrt in one place or upstream this and add them to a new revision?
May 29 2018
Added a check for two constant operands. I'm still waiting for answer for my last question.
Added fast-isel tests - there are some XORs/MOVs which aren't combined but it doesn't look terrible overall.
May 28 2018
May 25 2018
Fixed fma mask3 pattern.
May 24 2018
May 21 2018
Test diffs are now visible.
Note: there is no test for div in combine-select.ll - div pattern gets split into 3 basic block with a branch condition, so this optimization doesn't apply to it.
Nevermind, I got it wrong - after compiling something similar to https://bugs.llvm.org/show_bug.cgi?id=37417, it does split the pattern. Moreover, it splits it in different manner for different intrinsics and hoists more than one more instruction, so it might not be as simple as moving back than one subtraction in the bug reproducer. I don't think there is any other way though - replicating simplifyX86immShift and simplifyX86varShift directly in IR would be a nightmare.
Because emitting shifts in IR is more complicated than just adding an shl/lshr node due to those poison values (see D46946) and would create some redundant code. I guess I can use simplifyX86immShift directly instead of emitting a call here.
As for the bug - much more than one instruction gets thrown out of the loop after applying shift lowering patch - I'm leaning to leaving only non-variable intrinsics in this patch and implement variable ones after the generic intrinsic is introduced.
Added missing test checks.
Can you provide an example of such behavior? I tried various (rather simple) tests like this one:
Added tests for ISelLowering combining and FMA patterns.
May 17 2018
Yes we could, but only for X86ISelLowering folding and FMA patterns - the other ones are multiclasses without any definitions, so they never alter anything at this point. They are intended to be a basis for several upcoming patches which will add defms for these patterns with the tests alongside them. I'm OOO tomorrow so I'll be able to add the tests for first two cases on Monday.
About lib/Target/X86/X86ISelLowering.cpp:36183 - does it make sense to emit normal ADD/SUB without saturation when element type after truncation is larger than before extension?
May 16 2018
I brought back lowering in AutoUpgrade for unsigned intrinsics, although I'm not sure if there's a universal agreement about it. In email correspondence @DavidKreitzer
suggested that it would be better to retain the intrinsics and get rid of them in another time/patch, which Craig disagreed to and argued we can lower non-complex ones (like addus/subus) without much trouble. Is there a consensus about it?
May 15 2018
As was decided, I removed lowering in Autoupgrade.cpp. I also moved the tests into corresponding *target*-intrinsics-canonical.ll files as discussed with @mike.dvoretsky. I added tests for PR37260 in test/CodeGen/X86/avx2-intrinsics-canonical.ll.
What's your opinion on fast-isel subus emmision and my proposition in detectAddSubSatPattern function?
May 11 2018
Nevermind - these four are not strictly truncating. Sorry for the confusion.
There are four other similar intrinsics which convert to 128/256-bit vectors:
May 10 2018
During internal review I proposed something like that (~0 - a) < b ? ~0 : a+b.
It gets rid of zext/trunc in addus pattern but introduces additional subtraction which is presumably more costly. Your solution seems much better, I'll change it to that form.
About test/CodeGen/X86/avx2-intrinsics-fast-isel.ll change - there is a canonical form for subus pattern I'm using here - it's different from adds/addus/subs patterns. While those three use ext/trunc pattern and fold correctly, subus has only max+sub. If fast-isel is enabled, sub is not put into SelectionDAG - that's what prevents it from combining. Instead, it's appended after isel as a lowered node. Is there a machine instruction pass for combining?
May 9 2018
We're currently discussing the possible solutions for the JIT pipeline issue with @DavidKreitzer.