The dead flag on those registers is not set because we do not run the "Live Variable Analysis" pass at the -O0 optimization level.
We can add the Live Variable Analysis pass before the Fast Register Allocator pass, but this causes ~59 LIT tests to fail.
At the same time, in some situations the generated assembly is cleaner and does not contain unnecessary register spills.
I am also not sure whether this could affect code debuggability.
What would be your suggestion? Should we add the Live Variable Analysis pass at -O0?
May 31 2018
May 24 2018
Feb 22 2018
Feb 21 2018
Since a "naked" function can contain nothing but "asm" statements, and those "asm" statements cannot have input or output parameters, it is safe to simply disable register spilling for a "naked" function.
Do naked functions have vregs at all then? Could we just skip the whole register allocation process in runOnMachineFunction() instead?
Feb 20 2018
Jan 30 2018
Jan 29 2018
Jan 10 2018
Four instructions perhaps:
vextractf128, vextractf128, vshufps, vblendps?
That is better than the current six instructions.
Hi Craig, I placed a simple test on Bugzilla; your fix seems to be working fine.
The only thing is that there are a couple or more shuffle masks that look like candidates for a similar optimization, but the code generated for them is still not optimized.
IN0: |0|1| | |4|5| | |
IN1: |8|9| | | | |E|F|
IN0: |0|1| | | | |6|7|
IN1: |8|9| | |C|D| | |
Jan 5 2018
Can you add a test for "128-bit lane swapped" shuffle masks: [8, 10, 12, 14, 0, 2, 4, 6], [9, 11, 13, 15, 1, 3, 5, 7]?
Currently you call lowerVectorShuffleSplitLowHigh() from lowerV8F32VectorShuffle() only.
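To make the two masks above concrete, here is a minimal sketch of how a two-input, 8-element shuffle mask selects its elements, assuming the usual convention that indices 0..7 pick from the first input and 8..15 from the second. The helper name shuffle8 is hypothetical and is not part of LLVM; it only models the mask semantics, not the lowering.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not an LLVM API): applies a two-input shuffle mask
// to 8-element vectors. Mask indices 0..7 select from A, 8..15 from B.
// Assumes every mask entry is in the range 0..15 (no undef entries).
std::array<float, 8> shuffle8(const std::array<float, 8> &A,
                              const std::array<float, 8> &B,
                              const std::array<int, 8> &Mask) {
  std::array<float, 8> Result{};
  for (std::size_t I = 0; I < 8; ++I) {
    int M = Mask[I];
    Result[I] = (M < 8) ? A[M] : B[M - 8];
  }
  return Result;
}
```

With A holding 0..7 and B holding 8..15, the mask [8, 10, 12, 14, 0, 2, 4, 6] puts the even elements of B in the low half and the even elements of A in the high half, which is the low/high split pattern with the two inputs swapped.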
Dec 12 2017
Reimplemented the fix based on the reviewers' recommendations.
Now the fix attempts to transform the sequence:
SHUFFLE<T0>(MASK) --> BITCAST<T1> --> BINOP<T1> --> BITCAST<T0> --> SHUFFLE<T0>(MASK)
into:
BITCAST<T1> --> BINOP<T1> --> SHUFFLE<T1>(NEW_MASK)
This is always possible when the BINOP vector element type is smaller than the SHUFFLE vector element type,
and is sometimes possible when it is not.
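The "always possible" versus "sometimes possible" distinction can be illustrated with a small sketch of re-expressing a shuffle mask at a different element width. This is not the code from the patch: narrowMask and widenMask are hypothetical names, and the sketch covers only the mask rewrite, assuming non-negative indices (no undef entries) and an integer ratio Scale between the two element sizes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: rewrites a mask over wide elements as a mask over
// elements that are Scale times narrower. This always succeeds, because
// each wide element corresponds to Scale consecutive narrow elements.
std::vector<int> narrowMask(const std::vector<int> &WideMask, int Scale) {
  std::vector<int> Narrow;
  Narrow.reserve(WideMask.size() * Scale);
  for (int M : WideMask)
    for (int J = 0; J < Scale; ++J)
      Narrow.push_back(M * Scale + J);
  return Narrow;
}

// Hypothetical helper: tries the opposite direction. It succeeds only if
// every group of Scale narrow indices is an aligned, consecutive run, so
// that the group moves as one wide element. Returns false otherwise.
bool widenMask(const std::vector<int> &NarrowMask, int Scale,
               std::vector<int> &Wide) {
  Wide.clear();
  if (Scale <= 0 || NarrowMask.size() % Scale != 0)
    return false;
  for (std::size_t I = 0; I < NarrowMask.size(); I += Scale) {
    int First = NarrowMask[I];
    if (First % Scale != 0)
      return false; // group does not start on a wide-element boundary
    for (int J = 1; J < Scale; ++J)
      if (NarrowMask[I + J] != First + J)
        return false; // group is not a consecutive run
    Wide.push_back(First / Scale);
  }
  return true;
}
```

For example, the wide mask {1, 0} with Scale 2 always narrows to {2, 3, 0, 1}, while the narrow mask {1, 2, 3, 0} cannot be widened because its groups straddle wide-element boundaries.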
Nov 30 2017
Thanks guys for the valuable comments; I will reimplement the fix as suggested by Sanyaj.
Nov 29 2017
Nov 21 2017
Simon, can you please take a second look at this one when you have a chance?
Nov 20 2017
Craig, when you have a chance, can you please review the update that I made to the initial fix for the bug?
Nov 13 2017
Updated the fix to support 32-bit mode; moved all non-64-bit tests into shift-double.ll
Nov 6 2017
Sep 26 2017
Sep 25 2017
Sep 20 2017
Closing bug PR24319 https://bugs.llvm.org/show_bug.cgi?id=24319 because your fix resolved this bug as well.
Sep 15 2017
Reimplemented the fixup overflow check: PC-relative fixup values are treated as signed, and absolute fixup values are treated as unsigned.
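A minimal sketch of the signed/unsigned distinction, assuming the fixup value arrives as a raw 64-bit integer and the fixup field is Bits wide. The names fitsSigned, fitsUnsigned and fixupFits are hypothetical; the two range checks only mirror the idea behind LLVM's isIntN() and isUIntN(), and this is not the code from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: true if V is representable as a Bits-wide
// two's-complement signed integer.
bool fitsSigned(unsigned Bits, int64_t V) {
  if (Bits == 0)
    return V == 0;
  if (Bits >= 64)
    return true;
  int64_t Min = -(int64_t(1) << (Bits - 1));
  int64_t Max = (int64_t(1) << (Bits - 1)) - 1;
  return V >= Min && V <= Max;
}

// Hypothetical helper: true if V is representable as a Bits-wide
// unsigned integer.
bool fitsUnsigned(unsigned Bits, uint64_t V) {
  if (Bits == 0)
    return V == 0;
  if (Bits >= 64)
    return true;
  return V <= ((uint64_t(1) << Bits) - 1);
}

// PC-relative fixups are displacements and may be negative, so they are
// range-checked as signed; absolute fixups are checked as unsigned.
bool fixupFits(uint64_t RawValue, unsigned Bits, bool IsPCRel) {
  return IsPCRel ? fitsSigned(Bits, static_cast<int64_t>(RawValue))
                 : fitsUnsigned(Bits, RawValue);
}
```

With an 8-bit field, a PC-relative displacement of -4 fits but an absolute value with the same bit pattern does not, while the value 200 fits as absolute but overflows as PC-relative.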
Performance measurements did not show any benefit from the proposed 8-bit bit reversal intrinsic implementation over the existing one.
Sep 1 2017
treat PC relative fixups as signed values and absolute fixups as unsigned values
Aug 24 2017
As Craig pointed out, corrected the isIntN() argument mistake.
Aug 21 2017
Aug 7 2017
Limit the error message to the cases that require copying a non-floating-point datum into an MVT::f80 container.
Jul 31 2017
Jul 18 2017
test regenerated with update_llc_test_checks.py, unnecessary artifacts removed.
Jul 14 2017
updated poorly formatted diff file
Jul 13 2017
replaced llc option -mcpu for -mtriple, the diff file reformatted
Jul 12 2017
The fix now works in 32-bit mode; the test was updated from a crash check to a positive test.