
dmgreen (Dave Green)
User

Projects

User does not belong to any projects.

User Details

User Since
May 24 2016, 8:35 AM (264 w, 6 d)

Recent Activity

Today

dmgreen requested review of D104659: [ARM] Limit v6m unrolling with multiple live outs.
Mon, Jun 21, 11:12 AM · Restricted Project
dmgreen updated the diff for D103263: [AArch64] Add S/UQXTRN tablegen patterns..
Mon, Jun 21, 12:45 AM · Restricted Project
dmgreen added inline comments to D103263: [AArch64] Add S/UQXTRN tablegen patterns..
Mon, Jun 21, 12:45 AM · Restricted Project
dmgreen updated the diff for D103755: [DAG] Fold neg(bvsplat(neg(x)) -> bvsplat(x).

Use getSplatValue.

Mon, Jun 21, 12:14 AM · Restricted Project
dmgreen added inline comments to D103755: [DAG] Fold neg(bvsplat(neg(x)) -> bvsplat(x).
Mon, Jun 21, 12:12 AM · Restricted Project

Yesterday

dmgreen committed rGa24b02193a30: [DSE] Remove stores in the same loop iteration (authored by dmgreen).
Sun, Jun 20, 9:05 AM

Sat, Jun 19

dmgreen accepted D103903: [ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix.

Thanks. This LGTM

Sat, Jun 19, 2:18 AM · Restricted Project

Fri, Jun 18

dmgreen accepted D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv.

Thanks. LGTM if you fix the v2i32 pattern to be like the others.

Fri, Jun 18, 8:45 AM · Restricted Project
dmgreen added a comment to D104538: [CostModel][AArch64] Improve cost model for vector reduction intrinsics.

Do And and Xor reductions work in the same way, with the same costs? Can we do those at the same time too?

Fri, Jun 18, 8:45 AM · Restricted Project
dmgreen added a comment to D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv.

For addp(addlp(x)) -> addv(x), addv generates i32 output from v4i32 input and i16 output from v4i16 input. However, addp(addlp(x)) generates i64 output from v4i32 input and i32 output from v4i16 input. I think the outputs of addp(addlp(x)) and addv(x) may not be the same when the inputs are large numbers. If I missed something, please let me know.

Fri, Jun 18, 3:25 AM · Restricted Project
dmgreen requested review of D104515: [ARM] Lower MVETRUNC to stack operations.
Fri, Jun 18, 3:04 AM · Restricted Project
dmgreen updated the diff for D91921: [ARM] Introduce MVETRUNC ISel lowering.

Sorry about the very long delay. Other things came up, but I still think this is worthwhile. It feels a little odd to introduce a target node that isn't legal, but it helps in the lowering of truncates under MVE. It should allow us to, instead of expanding the trunc into a lot of lane extract/insert operations, lower them to two truncating stack stores and a reload (which I will do as a separate patch). We can then do the same for zext/sext, which should help improve the worst case lowering for a lot of zext/sext/trunc we currently see. And help with the lowering of nodes like VABD like we see here, and things like VMULL and certain vector reductions.
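
As a rough illustration of the "two truncating stack stores and a reload" idea described above, the following Python sketch models the stack slot as a 16-byte buffer. The vector shapes (two v4i32 inputs producing one v8i16), the little-endian layout and the function name trunc_via_stack are assumptions made for the example, not details taken from the patch.

```python
import struct

def trunc_via_stack(lo, hi):
    """Model truncating two 4 x i32 vectors into one 8 x i16 vector via memory."""
    stack_slot = bytearray(16)
    # First truncating store: low 16 bits of each lane of `lo` go to bytes 0..7.
    struct.pack_into("<4H", stack_slot, 0, *[x & 0xFFFF for x in lo])
    # Second truncating store: the same for `hi`, written to bytes 8..15.
    struct.pack_into("<4H", stack_slot, 8, *[x & 0xFFFF for x in hi])
    # A single reload of the whole slot as 8 x i16.
    return list(struct.unpack_from("<8H", stack_slot, 0))

# Hypothetical input lanes, chosen so the discarded high halves are non-zero.
lo = [0x00010002, 0x00030004, 0x00050006, 0x00070008]
hi = [0x0009000A, 0x000B000C, 0x000D000E, 0x000F0010]
result = trunc_via_stack(lo, hi)
```

The result matches a lane-by-lane truncate of the concatenated inputs, without any per-lane extract/insert steps.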

Fri, Jun 18, 3:00 AM · Restricted Project
dmgreen added a comment to D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv.

AArch64uaddv is a slightly overloaded term. It comes from a vecreduce and can be lowered to a number of things. I think for anything that is v2iX, it will produce an addp, not an addv. So we want to turn addp(addlp(x)) ->addv(x), but that is still the same pattern extended to v4i32->v2i64->i64 and v4i16->v2i32->i32 variants.
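
The widths involved can be checked with a small Python model. This is only a sketch of the arithmetic: the helper names mirror the instruction mnemonics, lanes are modelled as plain unsigned integers, and nothing here reflects how the TableGen patterns are actually written.

```python
def uaddlp(lanes, bits):
    """Unsigned add-long pairwise: add adjacent lanes into lanes twice as wide."""
    assert len(lanes) % 2 == 0
    mask = (1 << (2 * bits)) - 1
    return [(lanes[i] + lanes[i + 1]) & mask for i in range(0, len(lanes), 2)]

def addv(lanes, bits):
    """Add across vector: sum all lanes, wrapping at the lane width."""
    return sum(lanes) & ((1 << bits) - 1)

def uaddlv(lanes, bits):
    """Unsigned add-long across vector: sum all lanes into a result twice as wide."""
    return sum(lanes) & ((1 << (2 * bits)) - 1)

# v4i32 with every lane at its maximum value.
x = [0xFFFFFFFF] * 4
widened_then_reduced = addv(uaddlp(x, 32), 64)  # v4i32 -> v2i64 -> i64
direct_long_reduce = uaddlv(x, 32)              # v4i32 -> i64
narrow_reduce = addv(x, 32)                     # v4i32 -> i32, wraps
```

In this model the reduction of the widened vector agrees with uaddlv on the original vector, whereas a plain addv on the narrow type wraps and gives a different value for large inputs.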

Fri, Jun 18, 12:52 AM · Restricted Project

Thu, Jun 17

dmgreen added a comment to D100464: [DSE] Remove stores in the same loop iteration.

spec2000/186 passes now -- thanks.

Thu, Jun 17, 7:18 AM · Restricted Project
dmgreen committed rGfda8b4714e05: [InterleaveAccess] Copy fast math flags when adjusting binary operators in… (authored by dmgreen).
Thu, Jun 17, 1:54 AM
dmgreen closed D104255: [InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass.
Thu, Jun 17, 1:53 AM · Restricted Project

Wed, Jun 16

dmgreen added a comment to D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv.

Thanks. I would expect 2 more types I think. v2i32 and v2i64? It may be better to create a new multiclass like SIMDAcrossLanesIntrinsic.

Wed, Jun 16, 7:01 AM · Restricted Project
dmgreen requested review of D100464: [DSE] Remove stores in the same loop iteration.
Wed, Jun 16, 5:12 AM · Restricted Project
dmgreen reopened D100464: [DSE] Remove stores in the same loop iteration.
Wed, Jun 16, 5:12 AM · Restricted Project
dmgreen updated the diff for D100464: [DSE] Remove stores in the same loop iteration.

Reopening with adjusted code (and more test cases). The issues were related to the checks for all paths leading to the exit collectively postdominating the earlier dead store. But:

  • A store that walked back to itself was ignored, effectively killing itself. This was the issue that came up from the reproducer. It is only true if the access is loop invariant, so it now uses an extra isGuaranteedLoopInvariant() check.
  • Stores in blocks with later PO numbers were again effectively ignored. With loops they need to be handled, in this case by returning None.
Wed, Jun 16, 5:11 AM · Restricted Project
dmgreen committed rG0a714eaa51d0: [ARM] Correct type of setcc results for FP vectors (authored by dmgreen).
Wed, Jun 16, 3:11 AM
dmgreen committed rG3f18fc5ece72: [ARM] Extra tests for sign extended floating point compares. NFC (authored by dmgreen).
Wed, Jun 16, 2:50 AM
dmgreen edited reviewers for D103263: [AArch64] Add S/UQXTRN tablegen patterns., added: sdesmalen, CarolineConcatto; removed: RKSimon.
Wed, Jun 16, 1:58 AM · Restricted Project
dmgreen added a comment to D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv.

It sounds like it needs something like SIMDAcrossLanesIntrinsic but for the combined uaddlp + addv patterns. That way it should be able to handle both the (AArch64uaddv(AArch64uaddlp(..)) form and the extract(insert((AArch64uaddv(AArch64uaddlp(..)))) form, hopefully.

Wed, Jun 16, 1:08 AM · Restricted Project

Tue, Jun 15

dmgreen committed rG93aa445e16f7: Revert "[ARM] Extend narrow values to allow using truncating scatters" (authored by dmgreen).
Tue, Jun 15, 10:19 AM
dmgreen committed rGb9bd2936f9cf: [ARM] Extend narrow values to allow using truncating scatters (authored by dmgreen).
Tue, Jun 15, 9:45 AM
dmgreen closed D103704: [ARM] Extend narrow values to allow using truncating scatters.
Tue, Jun 15, 9:45 AM · Restricted Project
dmgreen committed rG680d3f8f1785: [ARM] Use rq gather/scatters for smaller v4 vectors (authored by dmgreen).
Tue, Jun 15, 9:06 AM
dmgreen closed D103674: [ARM] Use rq gather/scatters for smaller v4 vectors.
Tue, Jun 15, 9:06 AM · Restricted Project
dmgreen committed rG09924cbab780: [ARM] Rejig some of the MVE gather/scatter lowering pass. NFC (authored by dmgreen).
Tue, Jun 15, 7:39 AM
dmgreen added inline comments to D103903: [ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix.
Tue, Jun 15, 7:32 AM · Restricted Project
dmgreen added inline comments to D104255: [InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass.
Tue, Jun 15, 1:08 AM · Restricted Project
dmgreen updated the diff for D104255: [InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass.
Tue, Jun 15, 1:08 AM · Restricted Project

Mon, Jun 14

dmgreen added a comment to D103799: [CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand..

This sounds OK to me, so long as the X86 numbers are not bonkers.

Mon, Jun 14, 2:38 PM · Restricted Project
dmgreen added inline comments to D103903: [ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix.
Mon, Jun 14, 2:18 PM · Restricted Project
dmgreen added a comment to D104236: [AArch64] Add a TableGen pattern to generate uaddlv from uaddlp and addv.

Can we add the other types too? It's good to add all the varieties if we can.

Mon, Jun 14, 2:15 PM · Restricted Project
dmgreen requested review of D104255: [InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass.
Mon, Jun 14, 12:04 PM · Restricted Project
dmgreen added inline comments to D104247: [DAGCombine] reassoc flag shouldn't enable contract.
Mon, Jun 14, 11:43 AM · Restricted Project
dmgreen accepted D104042: [AArch64] Improve SAD pattern.

Thanks. LGTM

Mon, Jun 14, 6:33 AM · Restricted Project

Sun, Jun 13

dmgreen committed rG562593ff82f8: [DSE] Extra multiblock loop tests, NFC. (authored by dmgreen).
Sun, Jun 13, 2:33 PM
dmgreen committed rGbee2f618d599: [ARM] Introduce t2WhileLoopStartTP (authored by dmgreen).
Sun, Jun 13, 5:56 AM
dmgreen closed D103236: [ARM] Introduce t2WhileLoopStartTP.
Sun, Jun 13, 5:56 AM · Restricted Project

Fri, Jun 11

dmgreen added a comment to D104042: [AArch64] Improve SAD pattern.

It is a good point! I have tried the pattern below, following your suggestion, and it seems to work. If you are OK with it, let me add the pattern below to this patch.

let AddedComplexity = 10 in { 
def : Pat<(i32 (extractelt
                 (v4i32 (AArch64uaddv (v4i32 (AArch64uaddlp (v8i16 V128:$op))))),
                 (i64 0))),
          (UADDLVv8i16v V128:$op)>;
}

Do you mind doing this as a new patch? It does feel logically separable. If we can test them, it would be good to add the various other sizes too. And, I'm not sure about this, but maybe it doesn't need to start from the extract, and can produce an INSERT_SUBREG like some of the other patterns do (like the ones from SIMDAcrossLanesIntrinsic). That might remove the need for the added complexity, and the INSERT_SUBREG / EXTRACT_SUBREG should all get cleared up later in the pipeline.

Fri, Jun 11, 9:38 AM · Restricted Project
dmgreen accepted D103952: [CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic.

Looks sensible to me, thanks

Fri, Jun 11, 1:00 AM · Restricted Project

Thu, Jun 10

dmgreen committed rG5d5b686f6bf6: [ARM] Fix Changed status in MVEGatherScatterLoweringPass. (authored by dmgreen).
Thu, Jun 10, 1:53 PM
dmgreen committed rGe0c605f6383c: [ARM] Ensure instructions are simplified prior to GatherScatter lowering. (authored by dmgreen).
Thu, Jun 10, 12:19 PM
dmgreen closed D103150: [ARM] Ensure instructions are simplified prior to GatherScatter lowering..
Thu, Jun 10, 12:18 PM · Restricted Project
dmgreen added a comment to D104042: [AArch64] Improve SAD pattern.

Nice patch. It's a big combine, but we certainly have bigger elsewhere. Sometimes it's possible to do things like this as smaller pieces, that add up to the same thing. Speaking of which, would it be possible to combine (in a separate patch, maybe as a tablegen pattern) addv (uaddlp x) -> uaddlv x ?

Thu, Jun 10, 12:16 PM · Restricted Project
dmgreen committed rG9872551ca09b: [ARM] Skip debug during vpt block creation (authored by dmgreen).
Thu, Jun 10, 6:49 AM
dmgreen committed rGdb9ba830d4b3: [ARM] MVE VPT block tests with debug info. NFC (authored by dmgreen).
Thu, Jun 10, 6:49 AM
dmgreen closed D103610: [ARM] Skip debug during vpt block creation.
Thu, Jun 10, 6:49 AM · Restricted Project
dmgreen accepted D102755: [AArch64] Add cost tests for bitreverse.

Thanks. LGTM

Thu, Jun 10, 2:58 AM · Restricted Project

Wed, Jun 9

dmgreen updated subscribers of D103999: [AArch64][GlobalISel] Mark some G_BITREVERSE types as legal + select them.
Wed, Jun 9, 11:32 PM · Restricted Project
dmgreen added a comment to D103882: [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> expensive..

It seems like a simpler overall route to remove the assert from the vectorizer, and allow it to deal with invalid costs. But I am not blocking this patch.
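
To make the alternative concrete, here is a hypothetical selection loop that tolerates invalid costs instead of asserting on them. It is a sketch only: None stands in for an invalid cost, the integer keys stand in for vectorization factors, and pick_vf is an invented name that does not correspond to any LLVM API.

```python
from typing import Dict, Optional

def pick_vf(costs: Dict[int, Optional[int]]) -> Optional[int]:
    """Return the VF with the lowest per-lane cost, skipping invalid (None) costs."""
    best_vf: Optional[int] = None
    best_per_lane: Optional[float] = None
    for vf, cost in costs.items():
        if cost is None:
            # Invalid cost: this VF cannot be code-generated, so never choose it.
            continue
        per_lane = cost / vf
        if best_per_lane is None or per_lane < best_per_lane:
            best_vf, best_per_lane = vf, per_lane
    return best_vf

# Same made-up costs, once with a magic number and once with an invalid marker.
with_magic_number = pick_vf({1: 9999, 2: 6, 4: 8})
with_invalid_cost = pick_vf({1: None, 2: 6, 4: 8})
nothing_valid = pick_vf({1: None})
```

Both encodings pick the same factor here; the difference is that the invalid marker can never be outweighed by other terms in a sum, and the "nothing is valid" case is reported explicitly.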

Wed, Jun 9, 11:01 AM · Restricted Project
dmgreen updated subscribers of D102755: [AArch64] Add cost tests for bitreverse.
Wed, Jun 9, 10:27 AM · Restricted Project
dmgreen updated subscribers of D103952: [CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic.
Wed, Jun 9, 10:27 AM · Restricted Project
dmgreen added a comment to D102755: [AArch64] Add cost tests for bitreverse.

Costs look good, as far as I can tell.

Wed, Jun 9, 10:04 AM · Restricted Project
dmgreen accepted D103836: [ARM][NEON] Combine base address updates for vld1Ndup intrinsics.

I got confused by the _fixed vs _register vs _UPD for a while, but I see that's pre-existing :)

Wed, Jun 9, 2:08 AM · Restricted Project
dmgreen added a comment to D103882: [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> expensive..

Why 9999? Why not an invalid cost? I thought that was what invalid was for, so we didn't need hacky random numbers like this.

We've chosen not to conflate legalization and cost-modelling, so that the cost-model must return a valid cost when the legalization phase says that a given VF is legal to use. The benefit of that is that it guards the requirement to have a complete (albeit not necessarily accurate) cost-model. For SVE a VF of vscale x 1 is legal in principle, meaning that we should be able to code-generate it. But because our code-generator is incomplete, we want to avoid using these VFs. I haven't added an interface to query the minimum legal VF, since I know we'll remove that interface at some point. The LV will expect all costs calculated to be Valid and will fail this assertion otherwise. So basically return <magic number> is just a temporary stop-gap to avoid that (and other) assertion failures.

Does that make more sense?

Wed, Jun 9, 12:59 AM · Restricted Project
dmgreen added inline comments to D103939: [SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE..
Wed, Jun 9, 12:13 AM · Restricted Project

Tue, Jun 8

dmgreen committed rG0178ae734ca3: [DSE] Add another multiblock loop DSE test. NFC (authored by dmgreen).
Tue, Jun 8, 1:55 PM
dmgreen added a reverting change for rG222aeb4d51a4: [DSE] Remove stores in the same loop iteration: rG297088d1add7: Revert "[DSE] Remove stores in the same loop iteration".
Tue, Jun 8, 1:23 PM
dmgreen committed rG297088d1add7: Revert "[DSE] Remove stores in the same loop iteration" (authored by dmgreen).
Tue, Jun 8, 1:23 PM
dmgreen added a comment to D100464: [DSE] Remove stores in the same loop iteration.

Oh so it does. Thanks for the report. I thought we had a test case for that...

Tue, Jun 8, 1:07 PM · Restricted Project
dmgreen committed rGd7853bae9410: [ARM] Generate VDUP(Const) from constant buildvectors (authored by dmgreen).
Tue, Jun 8, 12:52 PM
dmgreen closed D103808: [ARM] Generate VDUP(Const) from constant buildvectors.
Tue, Jun 8, 12:52 PM · Restricted Project
dmgreen committed rGf44770c32992: [ARM] A couple of extra VMOVimm tests, useful for showing BE codegen. NFC (authored by dmgreen).
Tue, Jun 8, 11:40 AM
dmgreen added inline comments to D103704: [ARM] Extend narrow values to allow using truncating scatters.
Tue, Jun 8, 10:50 AM · Restricted Project
dmgreen added a comment to D103903: [ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix.

Nice one. Sounds funky.

Tue, Jun 8, 10:37 AM · Restricted Project
dmgreen added a comment to D103882: [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> expensive..

Why 9999? Why not an invalid cost? I thought that was what invalid was for, so we didn't need hacky random numbers like this.

Tue, Jun 8, 9:31 AM · Restricted Project
dmgreen added a comment to D103755: [DAG] Fold neg(bvsplat(neg(x)) -> bvsplat(x).

Funnily enough I was wondering about this pattern the other day as a followup to D98778...

Should we always be folding unaryop(splat(x)) -> splat(unaryop(x)) if the unaryop is legal/custom on the scalar type? And then maybe extend that to binop(splat(x),splat(y)) -> splat(binop(x,y)) as well?
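
The equivalences being asked about can be checked on a toy model. The sketch below treats a vector as a Python list and uses 32-bit two's-complement lanes; the helper names are invented for illustration and say nothing about legality checks in the DAG combiner.

```python
BITS = 32
MASK = (1 << BITS) - 1

def splat(x, lanes):
    """Build a vector with every lane equal to x."""
    return [x & MASK] * lanes

def neg_scalar(x):
    """Two's-complement negation of one lane."""
    return (-x) & MASK

def neg(v):
    """Lane-wise negation of a vector."""
    return [neg_scalar(e) for e in v]

def add(a, b):
    """Lane-wise wrapping addition of two vectors."""
    return [(x + y) & MASK for x, y in zip(a, b)]

x, y, lanes = 5, 7, 4
unary_on_vector = neg(splat(x, lanes))           # unaryop(splat(x))
unary_on_scalar = splat(neg_scalar(x), lanes)    # splat(unaryop(x))
double_neg = neg(splat(neg_scalar(x), lanes))    # neg(splat(neg(x)))
binop_on_vector = add(splat(x, lanes), splat(y, lanes))
binop_on_scalar = splat((x + y) & MASK, lanes)
```

In this model each vector form equals its scalar-then-splat counterpart, and the double negation collapses back to the original splat.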

Tue, Jun 8, 2:22 AM · Restricted Project
dmgreen updated the diff for D103755: [DAG] Fold neg(bvsplat(neg(x)) -> bvsplat(x).

Rebase over D103756

Tue, Jun 8, 2:21 AM · Restricted Project
dmgreen committed rGb889c6ee9911: [DAG] Allow isNullOrNullSplat to see truncated zeroes (authored by dmgreen).
Tue, Jun 8, 2:19 AM
dmgreen closed D103756: [DAG] Allow isNullOrNullSplat to see truncated zeroes.
Tue, Jun 8, 2:19 AM · Restricted Project
dmgreen added a comment to D103756: [DAG] Allow isNullOrNullSplat to see truncated zeroes.

Thanks Folks.

Tue, Jun 8, 2:13 AM · Restricted Project
dmgreen updated the diff for D103808: [ARM] Generate VDUP(Const) from constant buildvectors.

Added two new test cases: mov_int8_1234, which as you suggested uses i8 <1,2,3,4,1,2,3,4,..>, and mov_int32_16908546, which is 0x1020102 VDUP'd as an i16.
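
The constants in those test names can be sanity-checked with a few lines of Python. This only verifies the arithmetic; the little-endian byte order used for the i8 case is an assumption, and the variable names are made up for the example.

```python
import struct

value = 16908546
as_hex = hex(value)
# Split the 32-bit constant into its two 16-bit halves.
halves = [value & 0xFFFF, value >> 16]

# Four i8 lanes <1,2,3,4> viewed as one little-endian 32-bit element.
i8_lanes = [1, 2, 3, 4]
as_i32 = struct.unpack("<I", bytes(i8_lanes))[0]
```

Both 16-bit halves of the constant are equal, which is what allows the 32-bit value to be expressed as a 16-bit splat.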

Tue, Jun 8, 12:42 AM · Restricted Project
dmgreen added inline comments to D102755: [AArch64] Add cost tests for bitreverse.
Tue, Jun 8, 12:34 AM · Restricted Project

Mon, Jun 7

dmgreen requested review of D103808: [ARM] Generate VDUP(Const) from constant buildvectors.
Mon, Jun 7, 6:20 AM · Restricted Project
dmgreen added inline comments to D103674: [ARM] Use rq gather/scatters for smaller v4 vectors.
Mon, Jun 7, 12:12 AM · Restricted Project
dmgreen updated the diff for D103674: [ARM] Use rq gather/scatters for smaller v4 vectors.

Update comment

Mon, Jun 7, 12:12 AM · Restricted Project

Sun, Jun 6

dmgreen updated the diff for D103756: [DAG] Allow isNullOrNullSplat to see truncated zeroes.
Sun, Jun 6, 11:56 PM · Restricted Project
dmgreen added a comment to D103756: [DAG] Allow isNullOrNullSplat to see truncated zeroes.

Whilst I'm here also add a call to peekThroughBitcasts as bitcast Zero is still Zero, and removed a related TODO comment from isOneOrOneSplat as the bitcast of 1 isn't always still 1.

Is this necessary for any of the test changes? I just wonder whether we should add this separately.
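
The quoted point about bitcasts can be seen with a small byte-level model. The sketch reinterprets a 2 x i32 vector as 4 x i16 using Python's struct module; the little-endian layout and the function name are assumptions made for the example.

```python
import struct

def bitcast_v2i32_to_v4i16(lanes):
    """Reinterpret the bytes of a 2 x i32 vector as a 4 x i16 vector."""
    raw = struct.pack("<2I", *lanes)
    return list(struct.unpack("<4H", raw))

zero_cast = bitcast_v2i32_to_v4i16([0, 0])  # splat of zero
one_cast = bitcast_v2i32_to_v4i16([1, 1])   # splat of one
```

An all-zero vector stays all-zero under any reinterpretation, while a splat of 1 becomes alternating 1 and 0 lanes and is no longer a splat of 1.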

Sun, Jun 6, 11:55 PM · Restricted Project
dmgreen committed rGc85766f79b2e: [ARM] MVE tests for vmull from a splat. NFC (authored by dmgreen).
Sun, Jun 6, 2:30 PM
dmgreen committed rG8f8273c54db9: [AArch64] Extra tests for vector shift. NFC (authored by dmgreen).
Sun, Jun 6, 2:30 PM

Sat, Jun 5

dmgreen committed rG12f53e5392d6: [AArch64] Remove AArch64ISD::NEG (authored by dmgreen).
Sat, Jun 5, 11:55 AM
dmgreen closed D103703: [AArch64] Remove AArch64ISD::NEG.
Sat, Jun 5, 11:55 AM · Restricted Project
dmgreen requested review of D103756: [DAG] Allow isNullOrNullSplat to see truncated zeroes.
Sat, Jun 5, 11:51 AM · Restricted Project
dmgreen requested review of D103755: [DAG] Fold neg(bvsplat(neg(x)) -> bvsplat(x).
Sat, Jun 5, 11:49 AM · Restricted Project

Fri, Jun 4

dmgreen requested review of D103704: [ARM] Extend narrow values to allow using truncating scatters.
Fri, Jun 4, 9:53 AM · Restricted Project
dmgreen requested review of D103703: [AArch64] Remove AArch64ISD::NEG.
Fri, Jun 4, 9:19 AM · Restricted Project
dmgreen accepted D103604: [AArch64] Further enable UnrollAndJam.

Thanks. LGTM

Fri, Jun 4, 1:21 AM · Restricted Project
dmgreen requested review of D103674: [ARM] Use rq gather/scatters for smaller v4 vectors.
Fri, Jun 4, 1:04 AM · Restricted Project

Thu, Jun 3

dmgreen added a reviewer for D103629: [AArch64] Cost-model i8 vector loads/stores: asavonic.
Thu, Jun 3, 10:18 PM · Restricted Project
dmgreen added a comment to D103604: [AArch64] Further enable UnrollAndJam.

UnJ isn't enabled by default upstream yet in the pass manager, but it should be OK to set the option in the target.

Thu, Jun 3, 8:53 AM · Restricted Project
dmgreen requested review of D103610: [ARM] Skip debug during vpt block creation.
Thu, Jun 3, 5:18 AM · Restricted Project
dmgreen committed rG929c54379a48: [ARM] Prettify gather/scatter debug comments. NFC (authored by dmgreen).
Thu, Jun 3, 4:36 AM
dmgreen updated the diff for D103236: [ARM] Introduce t2WhileLoopStartTP.

Sounds good. Now with a getWhileLoopStartTargetBB used in more places.

Thu, Jun 3, 4:14 AM · Restricted Project
dmgreen added inline comments to D103180: [InstSimplify] Add constant fold for extractelement + splat for scalable vectors.
Thu, Jun 3, 1:58 AM · Restricted Project

Wed, Jun 2

dmgreen accepted D103105: [AArch64] Optimise bitreverse lowering in ISel.

Thanks. LGTM.

Wed, Jun 2, 12:22 AM · Restricted Project

Tue, Jun 1

dmgreen added inline comments to D103105: [AArch64] Optimise bitreverse lowering in ISel.
Tue, Jun 1, 12:42 AM · Restricted Project