
RosieSumpter (Rosie Sumpter)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 3 2021, 12:56 AM (77 w, 3 d)

Recent Activity

Jul 25 2022

RosieSumpter committed rG034a27e6882f: [AArch64] Add f16 fpimm patterns (authored by RosieSumpter).
Jul 25 2022, 1:12 AM · Restricted Project, Restricted Project
RosieSumpter closed D129989: [AArch64] Add f16 fpimm patterns.
Jul 25 2022, 1:12 AM · Restricted Project, Restricted Project

Jul 21 2022

RosieSumpter updated the diff for D129989: [AArch64] Add f16 fpimm patterns.
  • Use FMOVWHr instead of COPY_TO_REGCLASS
  • Add let Predicates = [HasFullFP16]
  • Removed code to copy from GPR32 to FPR16
  • Updated a couple of recently added tests
Jul 21 2022, 8:29 AM · Restricted Project, Restricted Project

Jul 20 2022

RosieSumpter added inline comments to D129989: [AArch64] Add f16 fpimm patterns.
Jul 20 2022, 9:04 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D129989: [AArch64] Add f16 fpimm patterns.
  • Updated title and summary
  • Made f16 immediates legal for hasFullFP16 rather than just SVE
  • Added f16 fpimm patterns
  • Updated tests
Jul 20 2022, 8:58 AM · Restricted Project, Restricted Project

Jul 19 2022

RosieSumpter committed rG05d424d16563: [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x… (authored by RosieSumpter).
Jul 19 2022, 12:39 AM · Restricted Project, Restricted Project
RosieSumpter closed D129623: [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y).
Jul 19 2022, 12:39 AM · Restricted Project, Restricted Project
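
The fold named in the title above relies on -0.0 being the additive identity for floating-point addition, so lanes replaced by -0.0 contribute nothing to an ordered sum. The following is a minimal Python model of that reasoning, not the actual AArch64 pattern; `fadda` and `select` here are hypothetical stand-ins that only mimic the semantics suggested by the title.

```python
import math

def fadda(pred, init, vec):
    """Ordered (left-to-right) add reduction over the lanes where pred is true."""
    acc = init
    for active, value in zip(pred, vec):
        if active:
            acc = acc + value
    return acc

def select(mask, a, b):
    """Lane-wise select: take a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]

mask = [True, False, True, False]
y = [1.5, 99.0, -2.25, 7.0]
ptrue = [True] * len(y)          # all-true predicate
neg_zero = [-0.0] * len(y)       # splat of -0.0

# Unfolded form: reduce every lane, with inactive lanes replaced by -0.0.
unfolded = fadda(ptrue, 0.5, select(mask, y, neg_zero))
# Folded form: reduce only the lanes selected by the mask.
folded = fadda(mask, 0.5, y)

# Adding -0.0 preserves the sign of a -0.0 accumulator; adding +0.0 does not.
# That is why the fold is written against -0.0 rather than +0.0.
sign_with_neg_zero = math.copysign(1.0, fadda(ptrue, -0.0, neg_zero))
sign_with_pos_zero = math.copysign(1.0, fadda(ptrue, -0.0, [0.0] * len(y)))
```

Both forms visit the active lanes in the same order, so the results agree bit for bit in this model.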

Jul 18 2022

RosieSumpter updated the diff for D129989: [AArch64] Add f16 fpimm patterns.
  • Removed comment from sve-fadda-select.ll test
Jul 18 2022, 1:49 AM · Restricted Project, Restricted Project
RosieSumpter requested review of D129989: [AArch64] Add f16 fpimm patterns.
Jul 18 2022, 1:47 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D129623: [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y).
  • Change definition of SDT_AArch64ReduceWithInit so that fadda patterns don't need to be replicated for different predicate vector types.
Jul 18 2022, 12:33 AM · Restricted Project, Restricted Project

Jul 15 2022

RosieSumpter added inline comments to D129623: [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y).
Jul 15 2022, 9:05 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D129623: [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y).
  • Used PatFrags
Jul 15 2022, 9:05 AM · Restricted Project, Restricted Project
RosieSumpter added a comment to D129852: [AArch64][SVE] Sink op into loop if it's used by PTEST and known to zero inactive lanes..

Thanks @sdesmalen for fixing some of my regressions!

Jul 15 2022, 8:00 AM · Restricted Project, Restricted Project

Jul 13 2022

RosieSumpter requested review of D129623: [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y).
Jul 13 2022, 1:58 AM · Restricted Project, Restricted Project

Jul 12 2022

RosieSumpter closed D129282: [AArch64][SVE] Ensure PTEST operands have type nxv16i1.

e5edc1b5eecf

Jul 12 2022, 1:39 AM · Restricted Project, Restricted Project
RosieSumpter committed rGe5edc1b5eecf: [AArch64][SVE] Ensure PTEST operands have type nxv16i1 (authored by RosieSumpter).
Jul 12 2022, 1:34 AM · Restricted Project, Restricted Project

Jul 8 2022

RosieSumpter added inline comments to D129282: [AArch64][SVE] Ensure PTEST operands have type nxv16i1.
Jul 8 2022, 5:47 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D129282: [AArch64][SVE] Ensure PTEST operands have type nxv16i1.
  • Added assert that PTEST operands have same type
  • Moved PTEST pattern to class definition
Jul 8 2022, 5:46 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D129282: [AArch64][SVE] Ensure PTEST operands have type nxv16i1.
  • Simplified code added to getPTest
  • Moved PTEST's pattern definition
Jul 8 2022, 3:01 AM · Restricted Project, Restricted Project

Jul 7 2022

RosieSumpter updated the diff for D129282: [AArch64][SVE] Ensure PTEST operands have type nxv16i1.
  • Renamed hasZeroedOtherLanes -> isZeroingInactiveLanes
  • Removed `WidenVT`
  • Used existing Pg and Op instead of defining new ReinterpretPg and ReinterpretOp
Jul 7 2022, 7:26 AM · Restricted Project, Restricted Project
RosieSumpter requested review of D129282: [AArch64][SVE] Ensure PTEST operands have type nxv16i1.
Jul 7 2022, 6:10 AM · Restricted Project, Restricted Project

Jul 5 2022

RosieSumpter accepted D129081: [AArch64][SVE] Zero other lanes when doing OR reduction on unpacked predicate using ptest..

LGTM

Jul 5 2022, 8:39 AM · Restricted Project, Restricted Project

Jun 14 2022

RosieSumpter added inline comments to D127210: [AArch64][SME] Add load/store intrinsics.
Jun 14 2022, 3:19 AM · Restricted Project, Restricted Project
RosieSumpter committed rG2c4e44752d1d: [AArch64][SME] Add load/store intrinsics (authored by RosieSumpter).
Jun 14 2022, 3:19 AM · Restricted Project, Restricted Project
RosieSumpter closed D127210: [AArch64][SME] Add load/store intrinsics.
Jun 14 2022, 3:19 AM · Restricted Project, Restricted Project

Jun 13 2022

RosieSumpter updated the diff for D127210: [AArch64][SME] Add load/store intrinsics.
  • Addressed comments from @c-rhodes
  • Will update the patch if needed when there is confirmation about the use of typed pointers
Jun 13 2022, 3:16 AM · Restricted Project, Restricted Project

Jun 7 2022

RosieSumpter requested review of D127210: [AArch64][SME] Add load/store intrinsics.
Jun 7 2022, 5:39 AM · Restricted Project, Restricted Project

May 10 2022

RosieSumpter committed rG131e6636f23c: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins (authored by RosieSumpter).
May 10 2022, 5:42 AM · Restricted Project, Restricted Project
RosieSumpter closed D124850: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
May 10 2022, 5:41 AM · Restricted Project, Restricted Project
RosieSumpter committed rGf635e6370951: [Sema][SVE] Move/simplify Sema testing for SVE ACLE builtins (authored by RosieSumpter).
May 10 2022, 5:24 AM · Restricted Project, Restricted Project
RosieSumpter closed D124924: [Sema][SVE] Move/simplify Sema testing for SVE ACLE builtins.
May 10 2022, 5:23 AM · Restricted Project, Restricted Project
RosieSumpter added inline comments to D124850: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
May 10 2022, 1:19 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D124850: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
  • Added REQUIRES: aarch64-registered-target to files where it was missed
  • Added overloaded forms to bfloat tests
May 10 2022, 1:19 AM · Restricted Project, Restricted Project
RosieSumpter added inline comments to D124924: [Sema][SVE] Move/simplify Sema testing for SVE ACLE builtins.
May 10 2022, 1:16 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D124924: [Sema][SVE] Move/simplify Sema testing for SVE ACLE builtins.

Addressed nits

May 10 2022, 1:15 AM · Restricted Project, Restricted Project

May 9 2022

RosieSumpter committed rG1a2665902f12: [AArch64][SVE] Improve codegen when extracting first lane of active lane mask (authored by RosieSumpter).
May 9 2022, 6:02 AM · Restricted Project, Restricted Project
RosieSumpter closed D125215: [AArch64][SVE] Improve codegen when extracting first lane of active lane mask.
May 9 2022, 6:02 AM · Restricted Project, Restricted Project
RosieSumpter requested review of D125215: [AArch64][SVE] Improve codegen when extracting first lane of active lane mask.
May 9 2022, 2:56 AM · Restricted Project, Restricted Project

May 5 2022

RosieSumpter updated the diff for D124850: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
  • Corrected name const_b16_ptr to const_bf16_ptr in acle_sve2_bfloat.cpp
May 5 2022, 7:47 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D124924: [Sema][SVE] Move/simplify Sema testing for SVE ACLE builtins.
  • Make operand names more descriptive
May 5 2022, 7:23 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D124850: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
  • Changed operand names to be more descriptive
  • Made int/uint/float variables global
  • Moved bfloat tests into a separate file
May 5 2022, 4:53 AM · Restricted Project, Restricted Project

May 4 2022

RosieSumpter requested review of D124924: [Sema][SVE] Move/simplify Sema testing for SVE ACLE builtins.
May 4 2022, 7:32 AM · Restricted Project, Restricted Project

May 3 2022

RosieSumpter requested review of D124850: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
May 3 2022, 5:51 AM · Restricted Project, Restricted Project

Apr 28 2022

RosieSumpter closed D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.

Commit: f7068c82a2560d97bf9826db1e917f931e887017

Apr 28 2022, 5:56 AM · Restricted Project, Restricted Project
RosieSumpter committed rGf7068c82a256: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins (authored by RosieSumpter).
Apr 28 2022, 5:46 AM · Restricted Project, Restricted Project
RosieSumpter added inline comments to D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
Apr 28 2022, 4:30 AM · Restricted Project, Restricted Project
RosieSumpter updated the diff for D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
  • Removed EXPAND... macro
  • Added missing tests
  • Alternate between 0 and 180 argument for test_90_270()
Apr 28 2022, 4:27 AM · Restricted Project, Restricted Project
RosieSumpter added inline comments to D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
Apr 28 2022, 1:26 AM · Restricted Project, Restricted Project
RosieSumpter added inline comments to D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
Apr 28 2022, 1:19 AM · Restricted Project, Restricted Project

Apr 27 2022

RosieSumpter updated the diff for D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.

Added REQUIRES: aarch64-registered-target line to each new test file.

Apr 27 2022, 5:52 AM · Restricted Project, Restricted Project

Apr 26 2022

RosieSumpter updated the diff for D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.

This patch is now for moving/simplifying semantic testing for immediate arguments of the builtins. At the moment this is just for the SVE2 intrinsics, so that the structure of the new tests can be checked; the tests for the SVE intrinsics will be added to this patch later.

Apr 26 2022, 3:56 AM · Restricted Project, Restricted Project

Apr 12 2022

RosieSumpter added inline comments to D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
Apr 12 2022, 8:18 AM · Restricted Project, Restricted Project
RosieSumpter requested review of D123605: [Sema][SVE2] Move/simplify Sema testing for SVE2 ACLE builtins.
Apr 12 2022, 6:26 AM · Restricted Project, Restricted Project

Jan 12 2022

RosieSumpter committed rG552eb372cb81: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter (authored by RosieSumpter).
Jan 12 2022, 5:38 AM
RosieSumpter closed D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.
Jan 12 2022, 5:38 AM · Restricted Project

Jan 10 2022

RosieSumpter added a comment to D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.

Are the X86 behavior changes intentional?

Hi @fhahn, yes, these changes are intentional. The cost model for X86 is now more accurate, and in some cases it ends up with a higher cost through getGSScalarCost because it knows the gather/scatter will be expanded later.

Jan 10 2022, 9:33 AM · Restricted Project
RosieSumpter updated the diff for D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.
  • Updated descriptions of isScalarWithPredication and isPredicatedInst
  • Rebased
Jan 10 2022, 9:33 AM · Restricted Project

Jan 4 2022

RosieSumpter added a comment to D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.

Friendly ping @lebedev.ri, do you have any further comments for this patch?

Jan 4 2022, 2:33 AM · Restricted Project
RosieSumpter added inline comments to D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
Jan 4 2022, 2:28 AM · Restricted Project
RosieSumpter committed rG961f51fdf04f: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without… (authored by RosieSumpter).
Jan 4 2022, 2:26 AM
RosieSumpter closed D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
Jan 4 2022, 2:26 AM · Restricted Project

Dec 17 2021

RosieSumpter added inline comments to D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
Dec 17 2021, 6:46 AM · Restricted Project
RosieSumpter updated the diff for D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
  • Replaced SmallPtrSet CastInstsToRecurrenceType with unsigned MinWidthCastToRecurrenceType (there could possibly be a shorter name for this!)
  • Added 2 test cases to smallest-and-widest-types.ll
Dec 17 2021, 6:44 AM · Restricted Project

Dec 16 2021

RosieSumpter updated the diff for D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
  • Addressed nits
Dec 16 2021, 9:24 AM · Restricted Project
RosieSumpter updated the diff for D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
  • Split the cast instruction SmallPtrSet into 2 separate ones: CastsToRecurrenceType and CastsFromRecurrenceType
  • Removed bool IgnoreCasts parameter in collectCastInstrs
  • Call collectCastInstrs for all recurrence descriptors which get saved at the end of AddReductionVar
  • Query CastsToRecurrenceType when checking casts used by recurrence descriptors in getSmallestAndWidestTypes
Dec 16 2021, 8:12 AM · Restricted Project

Dec 15 2021

RosieSumpter added a comment to D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.

Hi @lebedev.ri, the SLPVectorizer/X86/pr47629 tests have changed because isLegalMaskedGather now returns true for certain cases where it didn't before (due to the check on the number of vector elements now being in forceScalarizeMaskedGather as requested). It then calculates the gather/scatter cost, and because forceScalarizeMaskedGather returns true, it calculates the cost using getGSScalarCost. This cost is higher than before and so it chooses not to vectorize. An alternative approach would be to check forceScalarizeMaskedGather in isLegalMaskedGather instead of when calculating the cost, but this will then mean LoopVectorize will assume these operations need to be scalarized so will cause test failures there. What do you think the preferred option is here?

Dec 15 2021, 8:35 AM · Restricted Project
RosieSumpter updated the diff for D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.
  • Removed forceScalarizeMaskedGather check from SLPVectorizer.cpp (since we don't check it in LoopVectorize.cpp)
  • Updated necessary SLPVectorizer/X86 tests
Dec 15 2021, 3:31 AM · Restricted Project

Dec 14 2021

RosieSumpter added a comment to D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.

I suppose this will fix PR33338 and PR52266?

Dec 14 2021, 5:16 AM · Restricted Project

Dec 13 2021

RosieSumpter updated the diff for D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.
  • Change data type parameter for forceScalarizeMaskedGather/Scatter to be a VectorType instead of a Type
  • Address nits
Dec 13 2021, 8:34 AM · Restricted Project
RosieSumpter added a comment to D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.

As @fhahn pointed out, using the smallest legal integer type for the default max width negatively impacts performance if the smallest type used in the loop is smaller than the smallest legal integer type. Instead, this update iterates through the recurrences in the loop and sets the maximum width to be that of the smallest type used by the recurrences when ElementTypesInLoop is empty. To determine the smallest type used by recurrences, we need to check for any casts on the recurrences’ input operands, which are now found by collectCastInstrs. This means that the max VF isn’t restricted too much in cases where in-loop reductions use types with width smaller than the target's smallest legal int.

Dec 13 2021, 3:09 AM · Restricted Project
RosieSumpter updated the diff for D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
  • Change name of collectCastsToIgnore to collectCastInstrs and make it a member function of RecurrenceDescriptor.
  • Modify collectCastInstrs to also collect cast instructions where the destination type is the same as the recurrence type.
  • Call collectCastInstrs from LoopVectorizationCostModel::getSmallestAndWidestType in the case of in-loop reductions with no loads/stores to check through casts on recurrence operands when determining the max width.
Dec 13 2021, 1:32 AM · Restricted Project
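
The width selection described in the comment above can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not the LoopVectorize code: `max_width_bits`, its parameters, and the rule "VF = register bits / width" are hypothetical simplifications, and the default of 8 bits is taken from the earlier update notes for this patch.

```python
DEFAULT_MAX_WIDTH = 8  # assumed fallback width in bits when nothing constrains it

def max_width_bits(element_type_bits, recurrences):
    """Pick the width (in bits) used to bound the vectorization factor.

    element_type_bits: widths of types used by loads/stores in the loop.
    recurrences: list of (recurrence_type_bits, min_cast_source_bits or None),
                 where the second item is the narrowest type cast into the
                 recurrence type on its input operands, if any.
    """
    if element_type_bits:
        # Loads/stores are present: the widest element type decides.
        return max(element_type_bits)
    if not recurrences:
        return DEFAULT_MAX_WIDTH
    widths = []
    for rec_bits, cast_bits in recurrences:
        # Look through casts: a recurrence fed by a narrower type counts
        # as using that narrower type.
        widths.append(rec_bits if cast_bits is None else min(rec_bits, cast_bits))
    return min(widths)

REGISTER_BITS = 128  # hypothetical vector register size

# An i32 in-loop reduction with no loads/stores and no casts.
vf_plain = REGISTER_BITS // max_width_bits([], [(32, None)])
# The same reduction fed by values extended from i8.
vf_from_i8 = REGISTER_BITS // max_width_bits([], [(32, 8)])
```

In this model the plain i32 reduction is bounded at VF 4 instead of the VF 16 that the 8-bit fallback would allow, while the reduction fed from i8 keeps VF 16, so the maximum VF is not restricted more than the types in use require.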

Dec 10 2021

RosieSumpter updated the diff for D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.
  • Moved code which checks if masked gathers/scatters are legal for the given vector width to forceScalarizeMaskedGather/Scatter for X86
  • Pass VF to isScalarWithPredication so that we are consistently passing a vector type to isLegalMaskedGather/Scatter
  • Add minimum vscale attribute to the necessary AArch64 tests
  • Remove changes to X86 tests (X86 behaviour is now unchanged by this patch)
Dec 10 2021, 3:26 AM · Restricted Project

Dec 8 2021

RosieSumpter requested review of D115329: [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter.
Dec 8 2021, 4:16 AM · Restricted Project

Nov 24 2021

RosieSumpter added inline comments to D113125: [LoopVectorize] Propagate fast-math flags for VPInstruction.
Nov 24 2021, 1:02 AM · Restricted Project
RosieSumpter committed rGdf32a39dd0f6: [LoopVectorize][CostModel] Update cost model for fmuladd intrinsic (authored by RosieSumpter).
Nov 24 2021, 1:00 AM
RosieSumpter committed rG2d33327f9d4c: [LoopVectorize] Print fast-math flags for VPReductionRecipe (authored by RosieSumpter).
Nov 24 2021, 1:00 AM
RosieSumpter committed rG991074012a6c: [LoopVectorize] Propagate fast-math flags for VPInstruction (authored by RosieSumpter).
Nov 24 2021, 1:00 AM
RosieSumpter committed rGc2441b6b89bf: [LoopVectorize] Add vector reduction support for fmuladd intrinsic (authored by RosieSumpter).
Nov 24 2021, 1:00 AM
RosieSumpter closed D111630: [LoopVectorize][CostModel] Update cost model for fmuladd intrinsic.
Nov 24 2021, 1:00 AM · Restricted Project
RosieSumpter closed D113125: [LoopVectorize] Propagate fast-math flags for VPInstruction.
Nov 24 2021, 1:00 AM · Restricted Project
RosieSumpter closed D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.
Nov 24 2021, 12:59 AM · Restricted Project

Nov 23 2021

RosieSumpter updated the diff for D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.
  • Rebase + fixed conflict (added Exit->hasNUsesOrMore(3) check to checkOrderedReduction)
Nov 23 2021, 8:17 AM · Restricted Project
RosieSumpter updated the diff for D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.

These changes are to account for when there are multiple calls to fmuladd:

Nov 23 2021, 4:14 AM · Restricted Project

Nov 22 2021

RosieSumpter updated the diff for D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
  • Remove changes to LoopVectorize/pr32859.ll and LoopVectorize/pr36983.ll tests
Nov 22 2021, 2:37 AM · Restricted Project
RosieSumpter updated the diff for D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
  • Don't set max width to 32 if there are no legal int widths set (leave as 8)
Nov 22 2021, 1:54 AM · Restricted Project
RosieSumpter updated the diff for D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
  • If there are no element types, only set the max width to 32 if there are no legal int sizes set, otherwise set it to the smallest legal int width.
  • update X86/funclet.ll test.
Nov 22 2021, 1:39 AM · Restricted Project

Nov 16 2021

RosieSumpter updated the summary of D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.
Nov 16 2021, 3:07 AM · Restricted Project
RosieSumpter updated the diff for D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.
  • Corrected tests
  • Added negative test case where reduction phi appears as 2 operands of llvm.fmuladd
Nov 16 2021, 3:06 AM · Restricted Project
RosieSumpter updated the summary of D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
Nov 16 2021, 1:51 AM · Restricted Project
RosieSumpter requested review of D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores.
Nov 16 2021, 1:49 AM · Restricted Project

Nov 12 2021

RosieSumpter added a comment to D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.

I think, unlike the other opcodes in a reduction chain, we may need to check that the operand number is correct. The other opcodes are commutative so it doesn't matter which of the operands the reduction passes through, but for fmuladd we need to ensure we are dealing with the last addition parameter.

Nov 12 2021, 1:43 AM · Restricted Project
RosieSumpter updated the diff for D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.
  • Added a check that the reduction phi isn't one of the multiply operands of fmuladd to RecurrenceDescriptor::isRecurrenceInstr
  • Added a test case to strict-fadd.ll for the above
  • Used initializer list instead of SmallVector<> FMulOps when creating FMul VPInstruction
  • Simplified assert in LoopVectorizationPlanner::adjustRecipesForReductions
Nov 12 2021, 1:43 AM · Restricted Project
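
The operand-position concern raised above can be illustrated with a small Python model. It is only a sketch: `fmuladd` is modelled as an unfused `a * b + c`, and `is_valid_fmuladd_reduction` is a hypothetical helper, not the check in RecurrenceDescriptor::isRecurrenceInstr.

```python
def fmuladd(a, b, c):
    """Unfused model of llvm.fmuladd: multiply a and b, then add c."""
    return a * b + c

def is_valid_fmuladd_reduction(operands, phi):
    """The reduction value must be the addend (last operand) and must not
    also appear as one of the two multiply operands."""
    a, b, addend = operands
    return addend == phi and phi not in (a, b)

xs = [1.0, 2.0, 3.0]
ys = [4.0, 5.0, 6.0]

# Valid shape: acc = fmuladd(x, y, acc). This is an add reduction over the
# products, so it can be split into a multiply followed by an ordered add.
acc = 0.0
for x, y in zip(xs, ys):
    acc = fmuladd(x, y, acc)

products = [x * y for x, y in zip(xs, ys)]
ordered_sum = 0.0
for p in products:
    ordered_sum = ordered_sum + p

# Invalid shape: acc = fmuladd(acc, x, y) computes acc * x + y, which is
# not a sum of independent per-iteration terms.
bad = 0.0
for x, y in zip(xs, ys):
    bad = fmuladd(bad, x, y)
```

Here `acc` matches `ordered_sum`, while `bad` does not, which is why a commutativity argument that works for plain add or multiply chains does not carry over to fmuladd.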

Nov 10 2021

RosieSumpter updated the diff for D113125: [LoopVectorize] Propagate fast-math flags for VPInstruction.
  • Changed FMF.print(O) to O << FMF
  • Added fast-math flags to VPReductionRecipe::print
Nov 10 2021, 1:19 AM · Restricted Project

Nov 9 2021

RosieSumpter updated the diff for D113125: [LoopVectorize] Propagate fast-math flags for VPInstruction.
  • Defined overloaded operator << for FastMathFlags and used it in AsmWriter.cpp
Nov 9 2021, 4:22 AM · Restricted Project

Nov 5 2021

RosieSumpter added a comment to D113125: [LoopVectorize] Propagate fast-math flags for VPInstruction.

Thanks for the comments @dmgreen and @fhahn. RE printing the fast-math flags, it seems the only way is to check for each flag and print the correct word, as was being done in WriteOptimizationInfo in llvm/lib/IR/AsmWriter.cpp. Since this is now occurring in two places, I've added a print method to FastMathFlags. Let me know if you think this is the right way to go.

Nov 5 2021, 5:34 AM · Restricted Project
RosieSumpter updated the diff for D113125: [LoopVectorize] Propagate fast-math flags for VPInstruction.
  • Added a print method to the FastMathFlags class and used it in VPInstruction::print
  • Added a test to vplan-printing.ll
  • Assert that the VPInstruction opcode is for a floating-point operation when setting fast-math flags
Nov 5 2021, 5:34 AM · Restricted Project
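
The "check each flag and print the correct word" approach mentioned in the comment above can be sketched as below. This is a toy Python class, not the LLVM FastMathFlags API; the keywords are the LLVM IR fast-math flag spellings, while the class layout and the space-joined output format are assumptions made for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class FastMathFlags:
    # Field names double as the printed keywords; declaration order is print order.
    reassoc: bool = False
    nnan: bool = False
    ninf: bool = False
    nsz: bool = False
    arcp: bool = False
    contract: bool = False
    afn: bool = False

    def __str__(self):
        names = [f.name for f in fields(self) if getattr(self, f.name)]
        # When every flag is set, print the single shorthand keyword instead.
        if len(names) == len(fields(self)):
            return "fast"
        return " ".join(names)

some_flags = str(FastMathFlags(nnan=True, nsz=True))
all_flags = str(FastMathFlags(*[True] * 7))
no_flags = str(FastMathFlags())
```

Keeping the formatting logic in one place means every printer that needs the flags can share it, which is the motivation given for adding a print method.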

Nov 3 2021

RosieSumpter added a comment to D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.

I think you can remove the "strict" from the title and summary of this patch, if I'm understanding what strict means here. As far as I understand it should enable vectorization for strict (inloop) and non-strict (out of loop/fast) reductions of llvm.fmuladd, which is nice.

Nov 3 2021, 10:31 AM · Restricted Project
RosieSumpter updated the diff for D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.
  • Moved setFastMathFlags change to a follow-up patch D113125
  • Used isFMulAddIntrinsic in place of RecurKind::FMulAdd for safety
  • Used an initializer list for the fmul operands
Nov 3 2021, 10:29 AM · Restricted Project
RosieSumpter requested review of D113125: [LoopVectorize] Propagate fast-math flags for VPInstruction.
Nov 3 2021, 10:27 AM · Restricted Project

Nov 2 2021

RosieSumpter added inline comments to D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.
Nov 2 2021, 2:37 AM · Restricted Project