User Details
- User Since
- Nov 4 2019, 3:49 AM (203 w, 4 d)
Jul 13 2023
Rebase again.
Rebase and merge precommit.
Add test with fneg as src modifier.
Something weird is happening with patch application; maybe the parent review https://reviews.llvm.org/D155171 should be merged first?
Rebase.
Removed unnecessary flags from fmul instruction in tests.
Add test with abs as a src0 modifier.
Rebase.
Use selection patterns for selecting V_FMA/MAD_MIX* instead of combiners.
With combiners, fptrunc (fmul a, b) could be transformed into fptrunc (fma a, b, 0), and another combiner could then transform it back to its original form.
Added tests without denormal flushing set as a function attribute.
In all cases except GFX900, the result should be the same as with denormal flushing.
Since it doesn't affect any tests, should I add this line to D153544 instead?
Jul 10 2023
I could add a function attribute containing denormal flushing or add more run lines to these tests.
Jul 5 2023
Jul 4 2023
Remove the add instruction from fptrunc + fmul -> v_fma/mad_mix*.
Instead of writing patterns that select fma/mad_mix*, write combiners for SelectionDAG and GlobalISel that transform fptrunc (fmul a, b) into fptrunc (fma a, b, 0), which is later selected into v_fma/mad_mix*.
Jul 3 2023
Jun 28 2023
Transform fptrunc (mul a, b) -> fma_mix/mad_mix a, b, 0, implicit_def
or
build_vector el0, (fptrunc (mul a, b)) -> fma_mix/mad_mix a, b, 0, el0.
@arsenm Would it be correct to write a pattern in the MadFmaMixPats multiclass that matches (f16 (fptrunc (f32 (fmul %src1, %src2)))) and turns it into v_fma_mixlo_f16 %src1, %src2, 0?
The add would stay the same.
Jun 22 2023
Should it also be done for GlobalISel in the same patch?
What other cases should I cover?
Should I cover the case where we have to pick the higher 16 bits of the mul instruction and select v_fma_mixhi_f16?
Jun 16 2023
Jun 15 2023
Remove an unnecessary if; use isNullValue instead of isZeroValue.
Thanks @foad.
Rebase.
Jun 9 2023
Use findScalarElement instead of a call to computeKnownFPClass.
Jun 8 2023
Jun 5 2023
Jun 2 2023
Remove *_buffer_store instructions from being optimized.
May 26 2023
Add test case with all zero components.
This patch was reverted upstream because of failing CTS tests.
Change the for-loop condition from i >= 0 to i > 0, since we don't want to optimize out the 0th element.
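The effect of the loop-bound change can be illustrated with a small standalone sketch. It is not the patch's code: `trimmedLength` is a hypothetical helper over a plain `std::vector<float>` rather than LLVM IR. Treating only +0.0 as removable mirrors the isNullValue choice recorded in the Jun 15 entry (for floating-point constants, isNullValue matches positive zero only, while isZeroValue also matches negative zero).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): returns how many leading
// components remain after trimming trailing +0.0 components.
// Element 0 is never removed, so an all-zero vector keeps one component.
std::size_t trimmedLength(const std::vector<float>& elts) {
  if (elts.empty())
    return 0;
  std::size_t i = elts.size() - 1;
  // Loop while i > 0 (not i >= 0): the 0th element must survive.
  // Only positive zero counts as removable; -0.0 is kept.
  while (i > 0 && elts[i] == 0.0f && !std::signbit(elts[i]))
    --i;
  return i + 1;
}
```

With an unsigned index such as `std::size_t`, an `i >= 0` condition would also be always true, so the `i > 0` bound is what keeps the loop from running past element 0.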
May 22 2023
May 18 2023
May 15 2023
Move the default case out of the switch.
Refactor.
Do the optimizations for image instructions that were done prior to this patch.
May 10 2023
Use the intrinsic opcode to determine whether the instruction has a DMask, instead of testing whether the instruction has a ConstantInt as its second operand.
Add more run-lines to the test.
Change the name from findDemandedElts to trimTrailingZerosInVector.
Remove some unnecessary dyn_casts.
Refactor and rebase.
Thanks for the review.
May 8 2023
May 3 2023
Rebase and change in comments.
Apr 28 2023
Thank you @foad.
findDemandedElts with correct usage of computeKnownFPClass.
Changes in findDemandedElts, use computeKnownFPClass.
Apr 27 2023
Apr 26 2023
Apr 25 2023
Rebased.
Moved the moreElementsIf call to the bottom. Made changes to the customIf call for G_EXTRACT_VECTOR_ELT and G_INSERT_VECTOR_ELT.
Apr 24 2023
Added a named LegalityPredicate that checks whether the type lacks a corresponding AMDGPU RegClass, and a named LegalizeMutation that, in that case, widens the vector to the next legal RegClass size.
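The predicate/mutation pair can be sketched outside of LLVM as plain functions. This is only an illustration: `hasRegClassForWidth`, `widenedNumElts` and the list of register-class widths are assumptions made for the example, not the backend's actual tables or the LegalityPredicate/LegalizeMutation API.

```cpp
#include <array>
#include <optional>

// Assumed register-class bit widths (illustrative only; the real set is
// defined by the AMDGPU backend and may differ).
constexpr std::array<unsigned, 14> kRegClassWidths = {
    32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 512, 1024};

// Predicate: a register class exists only for an exact width match.
bool hasRegClassForWidth(unsigned bits) {
  for (unsigned w : kRegClassWidths)
    if (w == bits)
      return true;
  return false;
}

// Mutation: element count after widening <numElts x eltBits> to the first
// listed width that fits it and is a multiple of the element size.
// Returns std::nullopt if no listed width is large enough.
std::optional<unsigned> widenedNumElts(unsigned numElts, unsigned eltBits) {
  for (unsigned w : kRegClassWidths)
    if (w >= numElts * eltBits && w % eltBits == 0)
      return w / eltBits;
  return std::nullopt;
}
```

For example, under these assumed widths a 13 x 32-bit vector (416 bits) has no exact class and widens to 16 elements (512 bits), while a 7 x 64-bit vector skips widths that are not multiples of 64 and widens to 8 elements.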
Apr 20 2023
This is an extension of https://reviews.llvm.org/D144198.
It adds an exact-width check in get*ClassForBitWidth and widens the vector operand of G_BUILD_VECTOR, G_INSERT_VECTOR_ELT and G_EXTRACT_VECTOR_ELT instructions.
Apr 12 2023
Rebase and minor changes.
Apr 10 2023
Closed, commit hash: f6e70ed1c73a2f3ac15eb6650423c1c10d278f50
Apr 7 2023
Apr 4 2023
Use SmallVector instead of VectorMap to track which components were already added.
Remove some unnecessary dyn_casts.