Mon, May 9
Thu, Apr 28
Mar 16 2022
Mar 10 2022
- Renamed VecVT to MemVT in the preferGatherScatter function.
- Fixed formatting issues.
Hi @rscottmanley, thank you for taking a look at this patch.
- Added the preferGatherScatter function to AArch64Subtarget, which checks if a gather/scatter with a given type & stride is preferable to contiguous loads/stores on the current target.
- Created new machine memory operands for the contiguous loads/stores instead of reusing the mem operand from the gather/scatter.
- Added tests for floating-point gathers & scatters, and extending & truncating gathers/scatters.
Mar 7 2022
- Moved the changes to performMaskedGatherScatterCombine into a new function, tryCombineToMaskedLoadStore.
- Removed the switch statement and use of getSimpleVT(), using if-else statements which check MemVT for supported types instead.
- Removed the restriction that the mask opcode of the gather/scatter is not an extract_subvector.
Mar 3 2022
Feb 22 2022
Feb 21 2022
- Changes to tryToBlend() so that all incoming values of the Phi are checked for in-loop reductions.
- Added an assert to tryToBlend() that the number of incoming values to the Phi is 2 if an in-loop reduction is found. Also added an assert that only one of the incoming values is an in-loop reduction.
- Reworded the comment describing the number of uses of Phi in getReductionOpChain().
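The two asserts described above for tryToBlend() can be pictured with a small standalone sketch. It does not use any LLVM types: SketchIncoming and findInLoopReductionOperand are hypothetical names made up for illustration, and the real code operates on the Phi's incoming values rather than on a vector of flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one incoming value of a Phi.
struct SketchIncoming {
  int Id;                  // illustrative identifier only
  bool IsInLoopReduction;  // true if this value is an in-loop reduction
};

// Checks every incoming value (not just the first) and returns the index of
// the in-loop reduction, or -1 if there is none.
int findInLoopReductionOperand(const std::vector<SketchIncoming> &Incoming) {
  int Found = -1;
  for (std::size_t I = 0; I < Incoming.size(); ++I) {
    if (!Incoming[I].IsInLoopReduction)
      continue;
    // If an in-loop reduction is present, the Phi must have two inputs.
    assert(Incoming.size() == 2 && "expected exactly two incoming values");
    // Only one of the incoming values may be an in-loop reduction.
    assert(Found == -1 && "expected a single in-loop reduction");
    Found = static_cast<int>(I);
  }
  return Found;
}
```

With this sketch, a Phi whose second incoming value is the reduction yields index 1, and a Phi with no in-loop reduction yields -1 regardless of how many incoming values it has.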
Feb 17 2022
- Moved the AND reduction in the @unconditional_and test into the if.then block.
- Reverted the previous changes which set Scale to TypeSize::Scalable(16) for all opcodes.
- Corrected the Min & Max values added to getMemOpInfo, as these should be the indices -8 to 7 for all structured loads & stores.
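As a rough illustration of the index range mentioned above, the following standalone sketch checks whether an offset can be encoded as an immediate in the range -8 to 7. The helper name encodeStructuredImm is hypothetical, and the per-opcode scale of 16 bytes times the number of vectors is an assumption made for this example, not something taken from getMemOpInfo itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the LLVM API.
// OffsetMinBytes: byte offset per unit of vscale (the known-minimum offset).
// NumVecs:        number of vectors in the structured load/store (2, 3 or 4).
// Index:          receives the encoded immediate when the function returns true.
bool encodeStructuredImm(int64_t OffsetMinBytes, unsigned NumVecs,
                         int64_t &Index) {
  assert(NumVecs >= 2 && NumVecs <= 4 && "expected a 2, 3 or 4 vector access");
  // Assumption: the scale is 16 bytes per vector, times the number of vectors.
  const int64_t Scale = 16 * static_cast<int64_t>(NumVecs);
  if (OffsetMinBytes % Scale != 0)
    return false; // not a whole multiple of the access size
  const int64_t Candidate = OffsetMinBytes / Scale;
  if (Candidate < -8 || Candidate > 7)
    return false; // outside the index range -8 to 7
  Index = Candidate;
  return true;
}
```

Under these assumptions, an offset of 224 bytes on a two-vector access encodes as index 7, while 256 bytes falls just outside the range and would need the address to be materialised separately.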
Feb 15 2022
- Changed Scale to TypeSize::Scalable(16) for all opcodes added to getMemOpInfo, fixing incorrect scaling when the immediate is out of range
Feb 14 2022
- Moved all structured load tests into sve-ldN.mir & all store tests to sve-stN.mir
- After some discussion about the tests in this patch offline, I have removed ldN-reg-imm-alloca.ll & stN-reg-imm-alloca.ll in favour of adding mir tests.
- Removed newlines introduced in AArch64InstrInfo.cpp.
Feb 11 2022
- Changed fixed-width allocas used in the tests to scalable.
- Added tests for offsets at the min/max of the range & tests for offsets outside the min/max range.
- Added the nounwind attribute to all tests.
- Changed the tests for non-zero offsets to remove the second alloca.
Feb 10 2022
- Removed unnecessary stores from tests in stN-reg-imm-alloca.ll which use only one alloca.
- Increased the number of elements for allocas in a number of tests in which a gep was attempting to access data beyond the allocated space.
- Ensure the correct amount of space is allocated in each test by increasing the size of the allocas.
- Added a store to each of the tests to ensure the allocas aren't optimised away.
- Moved vscale_range into the definitions of the tests.
Feb 9 2022
Feb 4 2022
- Renamed the @multiple_cond_ands test to @unconditional_and.
- Removed the -instcombine & -dce flags from scalable-reduction-inloop-cond.ll.
- Simplified the CHECK lines for the negative tests in reduction-inloop-cond.ll.
Feb 3 2022
Thank you for reviewing these changes, @david-arm!
- Changed getNextInstruction to iterate over Cur->users() and handle Phi nodes found by moving to the next user, similar to ICmp/FCmp.
- Removed the dyn_cast<PHINode>(Cur) == LoopExitInstr block as Phis are now handled by getNextInstruction.
- Added tests for various scenarios involving chained reductions where we should not vectorise with in-loop reductions.
Jan 31 2022
Jan 28 2022
- Generated the check lines in sve-alloca.ll with update_llc_test_checks.py
- Used TySize.getFixedValue() when TySize is not scalable
- Fixed AllocSize to ensure we multiply by vscale for scalable vectors
- Added CHECK lines to the test in sve-alloca.ll
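The AllocSize fix above can be sketched in isolation as follows. SketchTypeSize and allocSizeInBytes are hypothetical names, and vscale is modelled here as an ordinary runtime integer purely for illustration; in the compiler the multiplication by vscale is emitted as part of the generated code rather than evaluated directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a type size that may be scalable.
struct SketchTypeSize {
  uint64_t MinValue; // size in bytes when vscale == 1
  bool Scalable;     // true for scalable vector types
};

// Total bytes needed for NumElements objects of the given size.
// Fixed-size types use the size as-is; scalable types are multiplied by vscale.
uint64_t allocSizeInBytes(SketchTypeSize TySize, uint64_t NumElements,
                          uint64_t VScale) {
  assert(VScale >= 1 && "vscale is always at least 1");
  uint64_t Size = TySize.MinValue * NumElements;
  if (TySize.Scalable)
    Size *= VScale;
  return Size;
}
```

For example, four 16-byte elements need 64 bytes when the type is fixed-width, but 128 bytes when the type is scalable and vscale is 2.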
Jan 27 2022
Jan 26 2022
Jan 25 2022
Thank you for this fix @congzhe, LGTM! I have just added one minor comment on the new test.
Jan 24 2022
Jan 21 2022
- Added StartVal as an external def to BestEpiPlan
- Updated pr35432.ll without auto-generating the check lines
Jan 19 2022
- Renamed getReductionOpChain -> findReductionOpChain
- Added !LoopExitInstr->hasNUses(2) back into findReductionOpChain
Jan 18 2022
Jan 17 2022
- Moved getReductionOpChain back to the original location in IVDescriptors.cpp
- Renamed ResumeValues -> ReductionResumeValues
- Removed several tests which were not covering anything new from this patch compared to other tests
Jan 14 2022
This seems like a sensible change to me, LGTM!
Jan 13 2022
- Added a test to epilog-vectorization-reductions.ll where the start value of the reduction is also a Phi node
- Changed getResumeValue to return a PHINode instead of a Value
- Passed the RecurrenceDescriptor to getResumeValue as a const reference
- Added an assert in getResumeValue that the resume value is found
Jan 12 2022
- Added a map to associate resume values with RecurrenceDescriptors in the loop
- Changed the tests to use more meaningful start values
- Added a test case where there is more than one reduction in the loop
- Used the start value of the VPReductionPHIRecipe in fixReduction to check if the resume value is a Phi
- Restructured the changes to fixReduction & createEpilogueVectorizedLoopSkeleton
Jan 10 2022
Dec 10 2021
Thank you for adding these tests @paulwalker-arm, LGTM
Nov 23 2021
Nov 18 2021
Nov 16 2021
Nov 15 2021
Nov 12 2021
Nov 11 2021
Nov 10 2021
Nov 9 2021
Nov 5 2021
Thanks @sdesmalen, this LGTM!
Nov 4 2021
- Added store instructions to the Uniforms list in collectLoopUniforms, instead of the worklist. Added more comments to clarify that instructions in Uniforms may demand the first or last lane.
- Moved the new tests in sve-uniform-store.ll into sve-inv-store.ll.
- Removed the CHECK lines from middle.block from @inv_store_i16
Nov 3 2021
Merged this with the parent patch, D112725
- Merged with D113034, which makes changes to collectLoopUniforms to collect uniform store instructions.
Nov 2 2021
- Removed the uniform-store.ll test added in the previous revision.
Oct 29 2021
- Removed redundant Legal->isUniformMemOp(I) check from setCostBasedWideningDecision
- Added a comment to VPReplicateRecipe::execute
- Removed the State.VF.isScalable() check from VPReplicateRecipe::execute & updated the tests affected by this change. Also added a test of uniform stores for fixed-width.
Oct 28 2021
Thank you @RosieSumpter, this patch looks good to me and I think all of the comments have been addressed.
Oct 27 2021
Oct 25 2021
- Pass isExpandingLoad() to getMaskedLoad