Mar 10 2022

rscottmanley added a comment to D120912: [AArch64][SVE] Convert gather/scatter with a stride of 2 to contiguous loads/stores.

@kmclaughlin Thanks for the reply and for adding the subtarget check. I figured this might have something to do with scalable vectorization, and this is a reasonable stopgap.

Mar 10 2022, 7:42 AM · Restricted Project, Restricted Project

Mar 7 2022

rscottmanley added a comment to D120912: [AArch64][SVE] Convert gather/scatter with a stride of 2 to contiguous loads/stores.

I find the performance claims interesting. I've previously asked SVE hw engineers which is faster (gather vs. load and shuffle vs. load2), and the answer was essentially "depends on your loop". If you're using up ports for shuffles that could otherwise be used for computation, it seems like this would be a loser. If that analysis is correct, then IMO this decision should be made in LV and the backend should honor the gather. However, if it stays in the backend, I still have some comments:

Mar 7 2022, 9:59 AM · Restricted Project, Restricted Project
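
To make the trade-off in the D120912 comment above concrete, here is a minimal scalar sketch of the two access strategies for a stride of 2. It is not the patch's implementation and uses no SVE intrinsics; the function names `gather_stride2` and `contiguous_then_shuffle` are hypothetical, and the sketch assumes the source holds at least 2*n elements.

```cpp
#include <cstddef>
#include <vector>

// Gather-style access: read every second element directly (indices 0, 2, 4, ...).
std::vector<int> gather_stride2(const std::vector<int>& src, std::size_t n) {
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = src[2 * i];
    }
    return out;
}

// Contiguous-style access: load 2*n consecutive elements, then keep the even
// lanes. The second loop stands in for the shuffle (deinterleave) step, which
// is the extra work the comment above is concerned about.
// Assumption: src.size() >= 2 * n, since the wide load also touches the odd
// lanes that the gather never reads.
std::vector<int> contiguous_then_shuffle(const std::vector<int>& src, std::size_t n) {
    std::vector<int> wide(src.begin(),
                          src.begin() + static_cast<std::ptrdiff_t>(2 * n));
    std::vector<int> even(n);
    for (std::size_t i = 0; i < n; ++i) {
        even[i] = wide[2 * i];
    }
    return even;
}
```

Both functions return the same values; which form is faster on real hardware depends on the surrounding loop, as the comment notes.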
Herald added a project to D99750: [LV, VP] RFC: VP intrinsics support for the Loop Vectorizer (Proof-of-Concept): Restricted Project.

You can also join this LLVM-VP discord channel to follow/contribute to its progress.

Mar 7 2022, 8:20 AM · Restricted Project, Unknown Object (Project), Restricted Project

May 29 2020

rscottmanley added a comment to D80801: [DAGCombiner] allow more folding of fadd + fmul into fma.

Given the constraints in SDAG, we should choose the (fma(fma)) variant by default (assuming as we do here that the target has fma instructions). For example on x86, our best perf heuristic at this stage of compilation on any recent Intel or AMD core is number of uops. The option with separate fmul and fadd always has more uops, so it would be backwards to choose that sequence here and then try to undo that later.

May 29 2020, 2:46 PM · Restricted Project
rscottmanley added a comment to D80801: [DAGCombiner] allow more folding of fadd + fmul into fma.

Not sure if I'm understanding the question. Is there a target or a code pattern with a known disadvantage for the 2 fma variant?

May 29 2020, 12:33 PM · Restricted Project
rscottmanley added a comment to D80801: [DAGCombiner] allow more folding of fadd + fmul into fma.

Wouldn't it be better to choose between what you have here, fmadd(a,b,fma(c,d,n)), and a*b + fmadd(c,d,n), for targets that perform worse with FMA chains?

May 29 2020, 10:24 AM · Restricted Project
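
The two sequences discussed in the D80801 comments above can be sketched at the source level for the expression a*b + c*d + n. This is only an illustration using `std::fma` from `<cmath>`, not the DAGCombiner code; the function names `fma_chain` and `fmul_fadd` are hypothetical.

```cpp
#include <cmath>

// Two chained fused multiply-adds: fma(a, b, fma(c, d, n)).
// The outer fma depends on the result of the inner one.
double fma_chain(double a, double b, double c, double d, double n) {
    return std::fma(a, b, std::fma(c, d, n));
}

// One fused multiply-add plus a separate multiply and add:
// a*b + fma(c, d, n). The multiply does not depend on the inner fma.
// Note: a compiler that contracts floating-point expressions may fuse
// the final multiply and add back into an fma.
double fmul_fadd(double a, double b, double c, double d, double n) {
    double inner = std::fma(c, d, n);
    double prod = a * b;
    return prod + inner;
}
```

For inputs whose intermediate products are exactly representable, both forms give the same result; in general they can differ in the last bit because the second form rounds a*b before the add.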

Jul 30 2019

Herald added a project to D30247: Epilog loop vectorization: Restricted Project.
Jul 30 2019, 11:16 AM · Restricted Project