
SjoerdMeijer (Sjoerd Meijer)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 26 2016, 2:17 AM (374 w, 4 d)

Recent Activity

Wed, Mar 29

SjoerdMeijer added a comment to D146033: [AArch64][TTI] Cost model FADD/FSUB/FNEG.

Gentle ping. Any further opinions on this?

Wed, Mar 29, 1:10 AM · Restricted Project, Restricted Project

Mon, Mar 27

SjoerdMeijer accepted D145379: [FuncSpec] Cost model improvements..

Cheers, LGTM

Mon, Mar 27, 1:40 AM · Restricted Project, Restricted Project

Fri, Mar 24

SjoerdMeijer added a comment to D146708: [AArch64][CodeGen] Reduce cost of indexed ld1 instructions for Neoverse V1/V2 cores.

It's strange that I first increased the cost, and this now proposes to lower it. Clearly we need some alignment and clarity on this. But most importantly, we need some motivating examples before we continue with any of this.

Fri, Mar 24, 1:49 AM · Restricted Project, Restricted Project

Thu, Mar 23

SjoerdMeijer added a comment to D143162: [AArch64] Add PredictableSelectIsExpensive feature for the Neoverse V1.

I did some performance runs, nothing stands out, so LGTM.

Thu, Mar 23, 2:23 AM · Restricted Project, Restricted Project

Wed, Mar 22

SjoerdMeijer added a comment to D143162: [AArch64] Add PredictableSelectIsExpensive feature for the Neoverse V1.

Sorry, I also forgot about this, but thanks for pinging me.

Wed, Mar 22, 5:48 AM · Restricted Project, Restricted Project

Mon, Mar 20

SjoerdMeijer added a comment to D109958: [LoopFlatten] Enable it by default.

FWIW, I am quite unhappy with the implementation quality of this pass, but I don't think I have the energy to deal with this. In the future, due diligence for pass enablement needs to include a review of the pass implementation by a domain expert, if this was not already done as part of the initial implementation. (Domain expert = SCEV reviewer in this context.)

Mon, Mar 20, 12:16 PM · Restricted Project, Restricted Project
SjoerdMeijer accepted D146407: [AArch64] Combine fadd into fcmla.

Cheers, this makes a lot of sense to me.

Mon, Mar 20, 7:09 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D109958: [LoopFlatten] Enable it by default.

Just a heads up that I am going to recommit this tomorrow.

Mon, Mar 20, 6:33 AM · Restricted Project, Restricted Project
SjoerdMeijer committed rTf36619ce1b38: compare.py: increase --filter-short threshold, and accept optional argument (authored by SjoerdMeijer).
Mon, Mar 20, 6:11 AM · Restricted Project
SjoerdMeijer closed D144101: [test-suite] Increase the --filter-short threshold.
Mon, Mar 20, 6:11 AM · Restricted Project
SjoerdMeijer updated the diff for D146033: [AArch64][TTI] Cost model FADD/FSUB/FNEG.

Added FNEG, and checks for BF16 and FP16.

Mon, Mar 20, 5:16 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D146407: [AArch64] Combine fadd into fcmla.
Mon, Mar 20, 2:31 AM · Restricted Project, Restricted Project

Fri, Mar 17

SjoerdMeijer added inline comments to D145379: [FuncSpec] Cost model improvements..
Fri, Mar 17, 12:40 PM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D146033: [AArch64][TTI] Cost model FADD/FSUB/FNEG.

I think this makes a lot of sense. Especially if we are treating many shuffles with a cost of 1, floating point operations should not be twice the cost. We could consider doing the same for fmul from looking at software optimization guides, but the changes for fadd/fsub already have a high likelihood of causing some large changes. Adding fneg is worth it though as that should be a simple operation.

This might lead to a fair number of changes. From looking at the main regression I found for example, it was a case of identical code being produced by the loop vectorizer, but scalar and vector costs being closer lead it to have a higher minimum trip count for entering the vector body (from D109368). The real problem there is aliasing causing LD4 loads to not be recognized though, so it's tough to see how this change is really to blame. There are a number of improvements too, to make up for that regression.

@fhahn do you have any thoughts on the change from your side, or any benchmark results that could be helpful?

Fri, Mar 17, 12:22 PM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D145379: [FuncSpec] Cost model improvements..
Fri, Mar 17, 5:11 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.

Just a heads up: we are seeing a 10% regression caused by this change in a very SLP-sensitive workload (the original source for the slp-fma-loss.ll tests). I still have to double-check where exactly the slowdown is coming from.

Fri, Mar 17, 3:49 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D145379: [FuncSpec] Cost model improvements..
Fri, Mar 17, 2:23 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D145379: [FuncSpec] Cost model improvements..
Fri, Mar 17, 2:01 AM · Restricted Project, Restricted Project
SjoerdMeijer abandoned D94308: [MachineSink] SinkIntoLoop: analyse stores and aliases in between.
Fri, Mar 17, 1:44 AM · Restricted Project, Restricted Project
SjoerdMeijer abandoned D55059: [ARM] FP16: constant initialised v4f16 and v8f16 vectors.
Fri, Mar 17, 1:43 AM · Restricted Project
SjoerdMeijer abandoned D35011: [ARM] add v4f16 and v8f16 as legal types.
Fri, Mar 17, 1:43 AM · Restricted Project
SjoerdMeijer abandoned D53710: [FileCheck] Warn if a prefix is only used in LABEL checks.
Fri, Mar 17, 1:42 AM · Restricted Project
SjoerdMeijer abandoned D54769: [FileCheck] New option -warn.
Fri, Mar 17, 1:42 AM · Restricted Project
SjoerdMeijer abandoned D23067: TargetInstrInfo: add two new target hooks to analyse branch offsets .
Fri, Mar 17, 1:41 AM · Restricted Project
SjoerdMeijer abandoned D23355: getInstSizeInBytes: sentinel value fix for AArch64, AMDGPU, MSP430.
Fri, Mar 17, 1:41 AM · Restricted Project
SjoerdMeijer abandoned D97050: [LoopInfo] Look through trunc instructions.
Fri, Mar 17, 1:40 AM · Restricted Project, Restricted Project
SjoerdMeijer abandoned D89693: [AArch64] Favor pre-increments and implement TTI::getPreferredAddressingMode.
Fri, Mar 17, 1:40 AM · Restricted Project, Restricted Project
SjoerdMeijer abandoned D138421: [AArch64][SVE] Enable Tail-Folding. WIP.
Fri, Mar 17, 1:39 AM · Restricted Project, Restricted Project
SjoerdMeijer abandoned D125008: [llvm-objdump] Print Mnemonic Histogram.
Fri, Mar 17, 1:39 AM · Restricted Project, Restricted Project
SjoerdMeijer abandoned D104373: [FuncSpec] Option for specializing const globals and function pointers..
Fri, Mar 17, 1:38 AM · Restricted Project, Restricted Project
SjoerdMeijer abandoned D59129: [SROA] Don't lower expensive allocas with minsize.
Fri, Mar 17, 1:36 AM · Restricted Project
SjoerdMeijer abandoned D26749: Generate aeabi_cdcmple libcalls.
Fri, Mar 17, 1:35 AM · Restricted Project
SjoerdMeijer abandoned D102748: [LoopUnroll] Don't unroll before vectorisation.
Fri, Mar 17, 1:34 AM · Restricted Project, Restricted Project

Thu, Mar 16

SjoerdMeijer added a comment to D144101: [test-suite] Increase the --filter-short threshold.

Thanks for your help and reviews! I will commit this tomorrow if there are no further comments.

Thu, Mar 16, 8:19 AM · Restricted Project
SjoerdMeijer retitled D146033: [AArch64][TTI] Cost model FADD/FSUB/FNEG from [AArch64][TTI] Cost model FADD/FSUB. WIP. to [AArch64][TTI] Cost model FADD/FSUB.
Thu, Mar 16, 8:15 AM · Restricted Project, Restricted Project
SjoerdMeijer updated the diff for D146033: [AArch64][TTI] Cost model FADD/FSUB/FNEG.

Added context, fixed the test cases.

Thu, Mar 16, 8:13 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D146033: [AArch64][TTI] Cost model FADD/FSUB/FNEG.

I have now benchmarked RAJAPerf too, which is a loop based and FP heavy benchmark. Results are neutral, mostly no changes.
So overall I am only seeing the uplift of this, with no regression.

Thu, Mar 16, 7:53 AM · Restricted Project, Restricted Project

Wed, Mar 15

SjoerdMeijer added a comment to D145379: [FuncSpec] Cost model improvements..

I am wondering if you're trying to do too much in this patch. But I only had a quick look at this, and need to look again.
But my first request is going to be about the cost/bonus calculation. Since this is crucial for this transformation, it would be useful to document and comment the idea/algorithm at some place, with all the formulas and definitions (costs, bonus, latency, code size etc). I think this would greatly improve readability of these changes and the code in general.

Wed, Mar 15, 12:49 PM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D145819: [FuncSpec] Increase the maximum number of times the specializer can run..

High-level question first: it looks like geomean compile times improve, and that is very surprising.
My guess is that one of the functional changes in findSpecialization is a good improvement in itself. Or maybe it is the caching of the CodeMetrics? Anyway, that's my question, just curious if you know what it is?

Wed, Mar 15, 12:07 PM · Restricted Project, Restricted Project

Tue, Mar 14

SjoerdMeijer accepted D146035: [AArch64] Add FP16 broadcast and transpose costs.

Looks like a good fix to me.

Tue, Mar 14, 5:20 AM · Restricted Project, Restricted Project
SjoerdMeijer requested review of D146033: [AArch64][TTI] Cost model FADD/FSUB/FNEG.
Tue, Mar 14, 4:51 AM · Restricted Project, Restricted Project

Mon, Mar 13

SjoerdMeijer committed rG775451b66a4c: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation (authored by SjoerdMeijer).
Mon, Mar 13, 8:14 AM · Restricted Project, Restricted Project
SjoerdMeijer closed D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.
Mon, Mar 13, 8:14 AM · Restricted Project, Restricted Project
SjoerdMeijer updated the diff for D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.

Thanks, I have restored that piece of logic and added the code-size check to it (and added a code size check to the test).
That means that we now add an additional cost for TCK_RecipThroughput, as well as for TCK_Latency and TCK_SizeAndLatency.

Mon, Mar 13, 6:20 AM · Restricted Project, Restricted Project

Fri, Mar 10

SjoerdMeijer updated the diff for D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.

I like it when I can delete things and achieve the same, so I have just done that. This was my understanding of your comments. Thanks for the suggestion and for looking into this.

Fri, Mar 10, 5:52 AM · Restricted Project, Restricted Project

Thu, Mar 9

SjoerdMeijer added inline comments to D144101: [test-suite] Increase the --filter-short threshold.
Thu, Mar 9, 3:47 AM · Restricted Project
SjoerdMeijer updated the diff for D144101: [test-suite] Increase the --filter-short threshold.

Option --filter-short now accepts an optional argument, which defaults to 1.0s.
Some special care is needed when this optional argument is omitted: the script then has to recognise that the FILE argument is not the optional argument to --filter-short, as also explained in the comments.

Thu, Mar 9, 2:23 AM · Restricted Project
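
The ambiguity described in the comment above (an option with an optional value, followed directly by a positional FILE) can be illustrated with a small sketch. This is not the code from compare.py; the helper names `normalise_argv` and `parse_args` are hypothetical, and the pre-processing approach is just one assumed way to handle it. The idea is to look at the token after `--filter-short` and, if it does not parse as a number, treat it as a FILE and fill in the 1.0s default.

```python
import argparse

DEFAULT_THRESHOLD = 1.0  # seconds; the default mentioned in the comment above


def _is_float(text):
    """Return True if text can be read as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def normalise_argv(argv):
    """Insert the default threshold after a bare --filter-short.

    Hypothetical helper: if the token following --filter-short is missing
    or is not numeric, it is assumed to be a FILE rather than the threshold.
    """
    out = []
    for i, tok in enumerate(argv):
        out.append(tok)
        if tok == "--filter-short":
            nxt = argv[i + 1] if i + 1 < len(argv) else None
            if nxt is None or not _is_float(nxt):
                out.append(str(DEFAULT_THRESHOLD))
    return out


def parse_args(argv):
    """Parse a command line of the form [--filter-short [SECONDS]] FILE..."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--filter-short", type=float, default=None,
                        metavar="SECONDS")
    parser.add_argument("files", nargs="+", metavar="FILE")
    return parser.parse_args(normalise_argv(argv))
```

One limitation of this sketch is that a FILE whose name happens to look like a number (for example `1.0`) would be taken as the threshold, so a real implementation may need a stricter rule.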

Wed, Mar 8

SjoerdMeijer added inline comments to D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.
Wed, Mar 8, 7:16 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.
Wed, Mar 8, 5:02 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.
Wed, Mar 8, 5:00 AM · Restricted Project, Restricted Project
SjoerdMeijer requested review of D145578: [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation.
Wed, Mar 8, 4:55 AM · Restricted Project, Restricted Project

Tue, Mar 7

SjoerdMeijer added a comment to D144101: [test-suite] Increase the --filter-short threshold.

Thanks for the comments, I also like this idea!
I will prepare a new revision to support this.

Tue, Mar 7, 3:46 AM · Restricted Project

Mar 1 2023

SjoerdMeijer committed rGc9843af57856: [AArch64] Precommit some more LD1R splat tests for scalar int/fp loads (authored by SjoerdMeijer).
Mar 1 2023, 12:12 PM · Restricted Project, Restricted Project
SjoerdMeijer closed D145067: [AArch64] Precommit some more LD1R splat tests for scalar int/fp loads.
Mar 1 2023, 12:12 PM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D145067: [AArch64] Precommit some more LD1R splat tests for scalar int/fp loads.

(accidentally pressed enter)

Mar 1 2023, 12:03 PM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D145067: [AArch64] Precommit some more LD1R splat tests for scalar int/fp loads.

Additional tests sound OK. We don't always match splats to a single cost like we should.

Mar 1 2023, 12:00 PM · Restricted Project, Restricted Project
SjoerdMeijer requested review of D145067: [AArch64] Precommit some more LD1R splat tests for scalar int/fp loads.
Mar 1 2023, 6:50 AM · Restricted Project, Restricted Project
SjoerdMeijer committed rGa4c828a9e2b0: [AArch64] Precommit tests to check more ld1r vector splat patterns in D145004. (authored by SjoerdMeijer).
Mar 1 2023, 2:48 AM · Restricted Project, Restricted Project
SjoerdMeijer committed rG2b462eb98d67: [AArch64] More patterns to generate LD1R vector splats (authored by SjoerdMeijer).
Mar 1 2023, 2:48 AM · Restricted Project, Restricted Project
SjoerdMeijer closed D145004: [AArch64] More patterns to generate LD1R vector splats.
Mar 1 2023, 2:48 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D145004: [AArch64] More patterns to generate LD1R vector splats.

Thanks, also for that case which I will add, that's interesting indeed.

Mar 1 2023, 2:41 AM · Restricted Project, Restricted Project

Feb 28 2023

SjoerdMeijer requested review of D145004: [AArch64] More patterns to generate LD1R vector splats.
Feb 28 2023, 1:52 PM · Restricted Project, Restricted Project

Feb 27 2023

SjoerdMeijer accepted D144550: [AArch64] Remove 64bit->128bit vector insert lowering.

Ok, cheers, and this looks very reasonable to me.

Feb 27 2023, 5:14 AM · Restricted Project, Restricted Project
SjoerdMeijer accepted D144850: [AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts.

Looks reasonable to me.

Feb 27 2023, 5:12 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D144550: [AArch64] Remove 64bit->128bit vector insert lowering.

The idea makes sense I think, but just to put things into context, do you already have a case or patch where we can see the benefit of this?

Feb 27 2023, 1:41 AM · Restricted Project, Restricted Project

Feb 22 2023

SjoerdMeijer committed rG314e431406de: [AArch64] Fix N2 SchedModel element-to-element INS latencies (authored by SjoerdMeijer).
Feb 22 2023, 3:21 AM · Restricted Project, Restricted Project
SjoerdMeijer closed D144508: [AArch64] Fix N2 SchedModel INS instruction latencies.
Feb 22 2023, 3:21 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D144508: [AArch64] Fix N2 SchedModel INS instruction latencies.

Thanks, and I will apply those changes before committing.

Feb 22 2023, 12:55 AM · Restricted Project, Restricted Project

Feb 21 2023

SjoerdMeijer requested review of D144508: [AArch64] Fix N2 SchedModel INS instruction latencies.
Feb 21 2023, 11:06 AM · Restricted Project, Restricted Project

Feb 20 2023

SjoerdMeijer added a comment to D144399: [AArch64] Add tests for saba (NFC).

@dmgreen wrote in the other patch:

Feb 20 2023, 8:12 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D144379: [AArch64] Fix abs(sub nsw) -> absd.

Thanks for looking into this.
I will let @RKSimon comment on the usage of the TTI hook.

Feb 20 2023, 5:30 AM · Restricted Project, Restricted Project
SjoerdMeijer accepted D144376: [AArch64] Add SLP test for abs.

I agree these tests are missing, so this is a good addition.

Feb 20 2023, 5:21 AM · Restricted Project, Restricted Project

Feb 17 2023

SjoerdMeijer added a comment to D142359: [TTI][AArch64] Cost model vector INS instructions.

I was expecting that to be similar to the other fmul+fadd vs fma issue we have seen elsewhere, but I'm not sure it is. Does it reduce the value to a single element again?

As for a few examples, this case is a little worse. I'm not sure if it is using bad costs, but the slp vectorization seems to reach through a phi and the result when put through llc is mostly just more instructions: https://godbolt.org/z/z6hnEcPG1.
Another case which is perhaps simpler is this "distance" one from cmsisdsp: https://godbolt.org/z/1xz7GP3fM. It looks like something might be being scalarized, but again I've not looked into the details.

There are some nice improvements too, if we can get the regressions hopefully fixed.

Feb 17 2023, 2:40 AM · Restricted Project, Restricted Project

Feb 16 2023

SjoerdMeijer added a reviewer for D144101: [test-suite] Increase the --filter-short threshold: rengolin.
Feb 16 2023, 5:39 AM · Restricted Project
SjoerdMeijer accepted D144086: [AArch64] Load into zero vector patterns.

Very nice.

Feb 16 2023, 12:41 AM · Restricted Project, Restricted Project

Feb 15 2023

SjoerdMeijer added a comment to D142359: [TTI][AArch64] Cost model vector INS instructions.

The internal embedded benchmarks I tried had some pretty wild swings in both directions. I think it is worth working towards this, if we can try to minimize the regressions in the process. Running more benchmarks from SPEC and perhaps the llvm-test-suite would be good (maybe try to see what is going on in salsa, for example? We might be getting the costs of scalar rotates/funnel shifts wrong?)

There might be quite a few other cases. I can try and provide some examples if I can extract them.

Feb 15 2023, 7:07 AM · Restricted Project, Restricted Project
SjoerdMeijer requested review of D144101: [test-suite] Increase the --filter-short threshold.
Feb 15 2023, 6:14 AM · Restricted Project
SjoerdMeijer added inline comments to D143988: [AArch64] Always lower fp16 zero to FMOVH0.
Feb 15 2023, 3:43 AM · Restricted Project, Restricted Project

Feb 14 2023

SjoerdMeijer added inline comments to D142288: [X86] Add basic vector handling for ISD::ABDS/ABDU (absolute difference) nodes.
Feb 14 2023, 6:51 AM · Restricted Project, Restricted Project
SjoerdMeijer added a reviewer for D143422: [LV] Update logic for calculating register usage due to invariants: fhahn.
Feb 14 2023, 3:25 AM · Restricted Project, Restricted Project

Feb 10 2023

SjoerdMeijer added inline comments to D142288: [X86] Add basic vector handling for ISD::ABDS/ABDU (absolute difference) nodes.
Feb 10 2023, 7:29 AM · Restricted Project, Restricted Project

Feb 9 2023

SjoerdMeijer committed rG079c488c6605: [TTI][AArch64] Cost model insertelement and indexed LD1 instructions (authored by SjoerdMeijer).
Feb 9 2023, 8:28 AM · Restricted Project, Restricted Project
SjoerdMeijer closed D141602: [TTI][AArch64] Cost model insertelement and indexed LD1 instructions.
Feb 9 2023, 8:28 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D143143: [AArch64] Reassociate sub(x, add(m1, m2)) to sub(sub(x, m1), m2).

Yeah, makes sense, cheers. LGTM too

Feb 9 2023, 7:46 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D143143: [AArch64] Reassociate sub(x, add(m1, m2)) to sub(sub(x, m1), m2).

The patch looks good to me, but I was just wondering if another approach would be to just match the sub(x, add(m1, m2)) pattern as mls, or is this easier/better?

Feb 9 2023, 12:41 AM · Restricted Project, Restricted Project
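
The reassociation in the patch title can be checked with a small arithmetic sketch. It assumes that m1 and m2 are multiply results and that lanes are 32-bit wrapping integers; neither is stated in the comment above, and the function names are hypothetical. Under those assumptions, sub(sub(x, m1), m2) is two chained multiply-subtracts and gives the same value as sub(x, add(m1, m2)).

```python
MASK = (1 << 32) - 1  # assumption: model 32-bit wrapping integer lanes


def mls(acc, a, b):
    """Multiply-subtract: acc - a * b, wrapped to 32 bits."""
    return (acc - a * b) & MASK


def original(x, a, b, c, d):
    """sub(x, add(m1, m2)) with m1 = a * b and m2 = c * d."""
    m1 = (a * b) & MASK
    m2 = (c * d) & MASK
    return (x - ((m1 + m2) & MASK)) & MASK


def reassociated(x, a, b, c, d):
    """sub(sub(x, m1), m2), written as two chained multiply-subtracts."""
    return mls(mls(x, a, b), c, d)
```

Because wrapping subtraction is associative in this sense modulo 2^32, both forms agree for every input, which is why the rewrite is safe for integer types.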

Feb 5 2023

SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

We handle this by not making the choice during ISEL but instead selecting a pseudo instruction that gets expanded after register allocation when we have access to more information.

Feb 5 2023, 11:25 AM · Restricted Project, Restricted Project

Feb 3 2023

SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

There is no precedent for:

  • matching opcodes like that, the latest revision adds the splat_vector opcode check,
  • checking the uses of sdnodes like that.
Feb 3 2023, 3:06 PM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

I experimented with replacing AArch64mul_p_firstOpndWithSingleUse -> AArch64mul_p_oneuse :

Feb 3 2023, 3:41 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

And in the meantime, can you upload a new revision with full context please, and rebase it on top of your change that precommits the tests? Then we can see and discuss the changes this is making, and check if we see any improvements.

Feb 3 2023, 3:11 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.
Feb 3 2023, 3:04 AM · Restricted Project, Restricted Project

Feb 1 2023

SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

And for completeness, summarising previous comments, these are my other 2 concerns.

Feb 1 2023, 3:21 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

Hi @SjoerdMeijer, in @sushgokh's defence there is precedent for some of the changes in this patch - by changing from SVE_4_Op_Pat to SVE_4_Mad_Op_Pat we are able to set the AddedComplexity field to the pattern, which is not dissimilar to SVE_3_Op_Pat_SelZero or SVE_3_Op_Pat_Shift_Imm_SelZero, i.e.

let AddedComplexity = 1 in {
class SVE_3_Op_Pat_SelZero<ValueType vtd, SDPatternOperator op, ValueType vt1,
                   ValueType vt2, ValueType vt3, Instruction inst>
: Pat<(vtd (vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), vt3:$Op3))),
      (inst $Op1, $Op2, $Op3)>;

class SVE_3_Op_Pat_Shift_Imm_SelZero<ValueType vtd, SDPatternOperator op,
                                     ValueType vt1, ValueType vt2,
                                     Operand vt3, Instruction inst>
: Pat<(vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), (i32 (vt3:$Op3)))),
      (inst $Op1, $Op2, vt3:$Op3)>;
}

What I don't fully understand is why the complexity has to be so high, since it suggests there are multiple competing patterns and it might be useful to understand what they are. I admit that AArch64mul_p_firstOpndWithSingleUse looks a bit unusual and I'm not sure that we should be checking for explicit opcodes such as TokenFactor, etc.

Feb 1 2023, 3:12 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.
Feb 1 2023, 2:40 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142998: [SVE][codegen] Add few more tests for MUL followed by ADD/SUB (NFC).

I tried separating (1,2) and (3,4) as you say above. However, mla->mad is a side effect of implementing (1), and this makes separating (1) and (4) difficult unless we introduce some hacks to do so.

Feb 1 2023, 2:36 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142998: [SVE][codegen] Add few more tests for MUL followed by ADD/SUB (NFC).

I think we need at least 3 patches:

Feb 1 2023, 1:00 AM · Restricted Project, Restricted Project

Jan 31 2023

SjoerdMeijer added inline comments to D142998: [SVE][codegen] Add few more tests for MUL followed by ADD/SUB (NFC).
Jan 31 2023, 12:19 PM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

and one more request, which I forgot to add earlier:

Jan 31 2023, 8:40 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

Ok, I misunderstood a few things, but see comment inlined.

Jan 31 2023, 8:24 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

First, I think the description needs some clarifications.
As I understand it, the problem description is that we would like to match/rewrite this pattern:

Jan 31 2023, 3:01 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate.

I will look at this soon.

Jan 31 2023, 2:14 AM · Restricted Project, Restricted Project