Page MenuHomePhabricator

SjoerdMeijer (Sjoerd Meijer)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 26 2016, 2:17 AM (281 w, 6 d)

Recent Activity

Today

SjoerdMeijer accepted D104651: [SLP][AArch64] Add SLP vectorizer regression test. NFC.

Yep, looks like a good test to me.

Mon, Jun 21, 8:22 AM · Restricted Project
SjoerdMeijer committed rG071dbaec8759: [FuncSpec] Add minsize test. NFC. (authored by SjoerdMeijer).
[FuncSpec] Add minsize test. NFC.
Mon, Jun 21, 7:22 AM
SjoerdMeijer added inline comments to D104538: [CostModel][AArch64] Improve cost model for vector reduction intrinsics.
Mon, Jun 21, 5:17 AM · Restricted Project
SjoerdMeijer added a comment to D104538: [CostModel][AArch64] Improve cost model for vector reduction intrinsics.

Nice one. This work was motivated by a missed opportunity in the SLP vectoriser.
Can you separately create a reduced test case for that, so that we can first precommit that?
Then, that tests should be modified here.

Mon, Jun 21, 4:54 AM · Restricted Project
SjoerdMeijer added a comment to D104365: [FuncSpec] Enabling specializing direct constant.

Specialising on constant integers would definitely be good, and is indeed what we want. But it's early days for function specialisation, and there are quite a few limitations, and this is just one of them.

Mon, Jun 21, 2:44 AM · Restricted Project
SjoerdMeijer accepted D103597: [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier.

LGTM, but please wait a day with committing this in case there are more comments.

Mon, Jun 21, 1:16 AM · Restricted Project
SjoerdMeijer committed rG342bbb7832b6: [FuncSpec] Don't specialise functions with NoDuplicate instructions. (authored by SjoerdMeijer).
[FuncSpec] Don't specialise functions with NoDuplicate instructions.
Mon, Jun 21, 1:02 AM
SjoerdMeijer closed D104461: [FuncSpec] Don't specialize functions with NoDuplicate instructions..
Mon, Jun 21, 1:02 AM · Restricted Project
SjoerdMeijer added a comment to D104365: [FuncSpec] Enabling specializing direct constant.

Let me explain why I am a bit pushing back at this: I would like to see function specialisation enabled by default one day, and therefore I am interested to know how much more difficult that is going to be when we also specialise on integer constants. In previous comments you mentioned:

Mon, Jun 21, 12:49 AM · Restricted Project

Thu, Jun 17

SjoerdMeijer requested review of D104461: [FuncSpec] Don't specialize functions with NoDuplicate instructions..
Thu, Jun 17, 7:06 AM · Restricted Project
SjoerdMeijer committed rG3f596842e3d2: [FuncSpec] Precommit test: don't specialise funcs with NoDuplicate instrs. NFC. (authored by SjoerdMeijer).
[FuncSpec] Precommit test: don't specialise funcs with NoDuplicate instrs. NFC.
Thu, Jun 17, 6:14 AM
SjoerdMeijer committed rGdcd23d875a7e: [FuncSpec] Don't specialise functions with attribute NoDuplicate. (authored by SjoerdMeijer).
[FuncSpec] Don't specialise functions with attribute NoDuplicate.
Thu, Jun 17, 2:45 AM
SjoerdMeijer closed D104378: [FuncSpec] Don't specialize functions with attribute NoDuplicate..
Thu, Jun 17, 2:45 AM · Restricted Project
SjoerdMeijer added a comment to D104378: [FuncSpec] Don't specialize functions with attribute NoDuplicate..

It makes sense to me. If a function is marked with NoDuplicate, we shouldn't clone it. Everything is fine.
But I want to ask, in what situations, generally a function would be marked with NoDuplicate attribute? I didn't find it in docs or codes.

Thu, Jun 17, 12:41 AM · Restricted Project

Wed, Jun 16

SjoerdMeijer committed rG08c75fc5e358: [FuncSpec] Fixed prefix typo in test function-specialization-noexec.ll. NFC. (authored by SjoerdMeijer).
[FuncSpec] Fixed prefix typo in test function-specialization-noexec.ll. NFC.
Wed, Jun 16, 8:29 AM
SjoerdMeijer updated the diff for D104378: [FuncSpec] Don't specialize functions with attribute NoDuplicate..

Fixed typo in test.

Wed, Jun 16, 8:19 AM · Restricted Project
SjoerdMeijer requested review of D104378: [FuncSpec] Don't specialize functions with attribute NoDuplicate..
Wed, Jun 16, 6:30 AM · Restricted Project
SjoerdMeijer requested review of D104373: [FuncSpec] Option for specializing const globals and function pointers..
Wed, Jun 16, 5:55 AM · Restricted Project
SjoerdMeijer added a comment to D104365: [FuncSpec] Enabling specializing direct constant.

But I guess it may be OK since we control the cost to specialize function incrementally (implemented in getSpecializationCost()). So the amount of new specialized functions wouldn't be much larger.

Wed, Jun 16, 5:14 AM · Restricted Project
SjoerdMeijer added a comment to D104365: [FuncSpec] Enabling specializing direct constant.

Supporting integer constants opens up a whole new set of opportunities for function specialisation. This will have a big impact on compile-times, which is the main concern of this pass.
I suggest we take one step at a time, and add support for integer constants under an option that is off by default. Just for testing purposes this would be convenient, but it also allows to measure the impact on compile-times by toggling this option on/off.

Wed, Jun 16, 3:16 AM · Restricted Project
SjoerdMeijer added a comment to D103597: [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier.

Besides some nits/observations inline, one question about testing. The reproducer that was attached does not get miscompiled anymore? And it is exactly one of these cases that was added, i.e. the modified base register? Just asking to see if there's not another case or test that we are missing.

Wed, Jun 16, 3:02 AM · Restricted Project
SjoerdMeijer committed rGc8a3fce77696: [FuncSpec] Remove other passes in a test RUN line. NFC. (authored by SjoerdMeijer).
[FuncSpec] Remove other passes in a test RUN line. NFC.
Wed, Jun 16, 2:40 AM
SjoerdMeijer added inline comments to D104102: [FuncSpec] Statistics.
Wed, Jun 16, 2:19 AM · Restricted Project
SjoerdMeijer committed rG29843cbc88f6: [FuncSpec] Add test for a call site that will never be executed. NFC. (authored by SjoerdMeijer).
[FuncSpec] Add test for a call site that will never be executed. NFC.
Wed, Jun 16, 2:11 AM
SjoerdMeijer committed rG49ab3b1735b6: [FuncSpec] Statistics (authored by SjoerdMeijer).
[FuncSpec] Statistics
Wed, Jun 16, 1:12 AM
SjoerdMeijer closed D104102: [FuncSpec] Statistics.
Wed, Jun 16, 1:12 AM · Restricted Project

Tue, Jun 15

SjoerdMeijer added inline comments to D104282: [FuncSpec] Use std::pow instead of operator^.
Tue, Jun 15, 3:03 AM · Restricted Project
SjoerdMeijer accepted D104282: [FuncSpec] Use std::pow instead of operator^.
Tue, Jun 15, 3:00 AM · Restricted Project
SjoerdMeijer added inline comments to D104102: [FuncSpec] Statistics.
Tue, Jun 15, 2:00 AM · Restricted Project
SjoerdMeijer updated the diff for D104102: [FuncSpec] Statistics.

This is now only about statistics: commented/renamed the counter, and added a test case.

Tue, Jun 15, 1:02 AM · Restricted Project
SjoerdMeijer added inline comments to D104282: [FuncSpec] Use std::pow instead of operator^.
Tue, Jun 15, 12:38 AM · Restricted Project
SjoerdMeijer added a comment to D104102: [FuncSpec] Statistics.

Thanks for your comments. I quickly put this patch together to collect post-commit comments. I will pick this is up today.

Tue, Jun 15, 12:32 AM · Restricted Project

Fri, Jun 11

SjoerdMeijer updated the diff for D104102: [FuncSpec] Statistics.

The wrong location didn't sit right, so I have precommitted that in rG9907746f5db7 before I continue with the rest and this is a little rebase on top of that.

Fri, Jun 11, 7:06 AM · Restricted Project
SjoerdMeijer committed rG9907746f5db7: Move Function Specialization to its correct location. NFC. (authored by SjoerdMeijer).
Move Function Specialization to its correct location. NFC.
Fri, Jun 11, 7:01 AM
SjoerdMeijer added inline comments to D104102: [FuncSpec] Statistics.
Fri, Jun 11, 3:26 AM · Restricted Project
SjoerdMeijer added a comment to D104102: [FuncSpec] Statistics.

This is sort of a placeholder for follow up comments of D93838 while I am looking into a few things.

Fri, Jun 11, 3:22 AM · Restricted Project
SjoerdMeijer requested review of D104102: [FuncSpec] Statistics.
Fri, Jun 11, 3:20 AM · Restricted Project
SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

I'm not quite sure why the implementation is in llvm/lib/Transforms/Scalar/FunctionSpecialization.cpp, rather than llvm/lib/Transforms/IPO/FunctionSpecialization.cpp? It's interprocedural only, right? The implementation in SCCP.cpp works on both a single function and a module, that's probably with its in the Scalar sub-directory. It's also a separate pass from IPSCCP, so I am not sure if the pass definition should be in llvm/lib/Transforms/IPO/SCCP.cpp.

Fri, Jun 11, 1:51 AM · Restricted Project
SjoerdMeijer accepted D103674: [ARM] Use rq gather/scatters for smaller v4 vectors.

Cheers, looks like a good change to me.

Fri, Jun 11, 1:39 AM · Restricted Project
SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

Many thanks for all your help and reviews. I have committed this because it was lgtm'ed (officially and also unofficially in comments), no major changes have been made/suggested for a while, and this is a first version that is off by default. And this is also just the beginning of function specialisation. I will continue working on this and will be happy to receive post-commit comments for this version, and of course comments for the work that I am planning and starting to do. I.e., I will now start working on some of the limitations before I start thinking how to eventually get this enabled by default.

Fri, Jun 11, 1:24 AM · Restricted Project
SjoerdMeijer committed rGc4a0969b9c14: Function Specialization Pass (authored by SjoerdMeijer).
Function Specialization Pass
Fri, Jun 11, 1:23 AM
SjoerdMeijer closed D93838: [SCCP] Add Function Specialization pass.
Fri, Jun 11, 1:22 AM · Restricted Project

Thu, Jun 10

SjoerdMeijer added a comment to D103597: [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier.

Many thanks for the reproducer!
We are going to have a look at that.

Thu, Jun 10, 12:36 AM · Restricted Project

Wed, Jun 9

SjoerdMeijer added a comment to D103882: [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> expensive..

It seems like a simpler overall route to remove the assert from the vectorizer, and allow it to deal with invalid costs. But I am not blocking this patch.

(Although, you might want to pick a number higher than 9999, it's not particularly high. And like Sjoerd suggested giving it a name might be helpful).

Wed, Jun 9, 11:45 AM · Restricted Project
SjoerdMeijer added inline comments to D102437: [LV] NFC: Decouple foldTailByMasking from isScalarWithPredication..
Wed, Jun 9, 7:00 AM · Restricted Project
SjoerdMeijer added a comment to D103882: [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> expensive..

Why 9999? Why not an invalid cost? I thought that was what invalid was for, so we didn't need hacky random numbers like this.

We've chosen not to conflate legalization and cost-modelling, so that the cost-model must return a valid cost when the legalization phase says that a given VF is legal to use. The benefit of that is that it guards the requirement to have a complete (albeit not necessarily accurate) cost-model. For SVE a VF of vscale x 1 is legal in principle, meaning that we should be able to code-generate it. But because our code-generator is incomplete, we want to avoid using these VFs. I haven't added an interface to query the minimum legal VF, since I know we'll remove that interface at some point. The LV will expect all costs calculated to be Valid and will fail this assertion otherwise. So basically return <magic number> is just a temporary stop-gap to avoid failing that (and other) assertion failures.

Does that make more sense?

Wed, Jun 9, 6:07 AM · Restricted Project
SjoerdMeijer updated the diff for D93838: [SCCP] Add Function Specialization pass.

Removed those updates too, added the test case back. Appreciate it if you can have one more quick look.

Wed, Jun 9, 5:38 AM · Restricted Project
SjoerdMeijer added inline comments to D103952: [CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic.
Wed, Jun 9, 4:19 AM · Restricted Project
SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

Addressed @fhahn 's comments: don't run the solver for specialised functions removed the recursive specialization test for now.

I'm not sure if removing the recursive specialization test is the best thing to do, without known what it is supposed to test? If it is a legitimate test, I thin it would be good to keep it.

Wed, Jun 9, 4:01 AM · Restricted Project
SjoerdMeijer added inline comments to D102437: [LV] NFC: Decouple foldTailByMasking from isScalarWithPredication..
Wed, Jun 9, 2:47 AM · Restricted Project
SjoerdMeijer added a comment to D102437: [LV] NFC: Decouple foldTailByMasking from isScalarWithPredication..

Hmm, strange some comments went missing when I pushed the button previously.

Wed, Jun 9, 2:43 AM · Restricted Project
SjoerdMeijer added inline comments to D102437: [LV] NFC: Decouple foldTailByMasking from isScalarWithPredication..
Wed, Jun 9, 2:40 AM · Restricted Project
SjoerdMeijer accepted D103597: [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier.

Perhaps you can elaborate a little bit in the commit message:

Wed, Jun 9, 12:46 AM · Restricted Project

Tue, Jun 8

SjoerdMeijer added a reviewer for D93838: [SCCP] Add Function Specialization pass: ChuanqiXu.
Tue, Jun 8, 12:54 AM · Restricted Project

Mon, Jun 7

SjoerdMeijer added a comment to D102834: [SLPVectorizer] WIP Implement initial memory versioning (WIP!).

Hi Florian, thanks for putting this up for review and starting the discussion. If you don't mind me asking, how high is this on your todo-list? The reason I am asking is that this well help x264 where we are behind a lot (and it more in general it solves this long outstanding issue that we have). Fixing x264 is high on our list, which is why I put up D102748 to start this discussion. If, for example, you don't see time to work on this, we could consider looking at it.

It's fairly high on my todo list, as we have quite a number of of such cases reported by our internal users. But I'm not planning to rush and I think we should also start collecting micro-benchmarks for the missed opportunities, so we can thoroughly evaluate the impact and do not introduce regressions in the future.

Mon, Jun 7, 6:11 AM · Restricted Project
SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

For our use cases, it's more interesting to see how much benefit we could get from non-function-pointer constants, on top of PGO

Mon, Jun 7, 3:10 AM · Restricted Project
SjoerdMeijer added a comment to D101696: [Matrix] Implement C-style explicit type conversions in CXX for matrix types.

Perhaps for bonus points, update the Clang documentation in https://clang.llvm.org/docs/LanguageExtensions.html#matrix-types with some examples?

Can you please point me to how to submit patches to the documentation? I could only find links like https://clang.llvm.org/hacking.html which talk about code patches. Many thanks.

Mon, Jun 7, 12:38 AM · Restricted Project

Fri, Jun 4

SjoerdMeijer added a comment to D103629: [AArch64] Cost-model i8 vector loads/stores.

The two-instruction sequence leaves the bits in the right positions for a <4 x i16>. If you need a <4 x i32>, you need another zip. If you need sign-extension, you need to sshr the result or something like that. So "%x = load <4 x i8>, <4 x i8>* %a %y = sext <4 x i8> %x to <4 x i32>" would be four instructions total.

Fri, Jun 4, 9:56 AM · Restricted Project
SjoerdMeijer added a comment to D103629: [AArch64] Cost-model i8 vector loads/stores.

And while we don't have a load instruction that supports this

If <4 x i8> loads matter, we should probably convert them to a 32-bit load followed by a zip1, which should would have a cost of 2. (Or possibly 3 on big-endian, I guess.) Basically the inverse of LowerTruncateVectorStore.

Fri, Jun 4, 9:00 AM · Restricted Project
SjoerdMeijer added a comment to D101696: [Matrix] Implement C-style explicit type conversions in CXX for matrix types.

Perhaps for bonus points, update the Clang documentation in https://clang.llvm.org/docs/LanguageExtensions.html#matrix-types with some examples?

Fri, Jun 4, 4:59 AM · Restricted Project
SjoerdMeijer added inline comments to D103674: [ARM] Use rq gather/scatters for smaller v4 vectors.
Fri, Jun 4, 1:59 AM · Restricted Project
SjoerdMeijer added inline comments to D103597: [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier.
Fri, Jun 4, 12:27 AM · Restricted Project
SjoerdMeijer added inline comments to D103629: [AArch64] Cost-model i8 vector loads/stores.
Fri, Jun 4, 12:15 AM · Restricted Project
SjoerdMeijer added inline comments to D103629: [AArch64] Cost-model i8 vector loads/stores.
Fri, Jun 4, 12:07 AM · Restricted Project

Thu, Jun 3

SjoerdMeijer added inline comments to D103629: [AArch64] Cost-model i8 vector loads/stores.
Thu, Jun 3, 9:47 AM · Restricted Project
SjoerdMeijer added inline comments to D103629: [AArch64] Cost-model i8 vector loads/stores.
Thu, Jun 3, 9:21 AM · Restricted Project
SjoerdMeijer requested review of D103629: [AArch64] Cost-model i8 vector loads/stores.
Thu, Jun 3, 9:17 AM · Restricted Project
SjoerdMeijer added a comment to D103597: [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier.

Ideally we want to generate a LDP and STP for the test cases that you added. Why do we not yet get the LDP?

We don't get the LDP since it fails at the very start of this function (called on line 1529) since FirstMI is a load instruction:

 static bool
 canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
                  SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
                  const TargetRegisterInfo *TRI) {
   if (!FirstMI.mayStore())
     return false;

   ...
}

Because this returns false, the renaming of registers isn't attempted for the loads and so they don't get recognised as a pair. It is something I still need to look into in more detail to see if there is something else we can try.

Thu, Jun 3, 8:53 AM · Restricted Project
SjoerdMeijer accepted D103601: [CostModel][AArch64] Add tests for ctlz, ctpop and cttz. NFC..

Sorry, ignore my previous comments, I have spotted them now (with your help).

Thu, Jun 3, 7:53 AM · Restricted Project
SjoerdMeijer added inline comments to D103601: [CostModel][AArch64] Add tests for ctlz, ctpop and cttz. NFC..
Thu, Jun 3, 7:42 AM · Restricted Project
SjoerdMeijer added inline comments to D103601: [CostModel][AArch64] Add tests for ctlz, ctpop and cttz. NFC..
Thu, Jun 3, 2:33 AM · Restricted Project
SjoerdMeijer added a comment to D103597: [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier.

Hi Meera,

Thu, Jun 3, 2:14 AM · Restricted Project

Wed, Jun 2

SjoerdMeijer added a comment to D102834: [SLPVectorizer] WIP Implement initial memory versioning (WIP!).

Hi Florian, thanks for putting this up for review and starting the discussion. If you don't mind me asking, how high is is on your todo-list? The reason I am asking is that this well help x264 where we are behind a lot (and it more in general it solves this long outstanding issue that we have). Fixing x264 is high on our list, which is why I put up D102748 to start this discussion. If, for example, you don't see time to work on this, we could consider looking at it.

Wed, Jun 2, 6:32 AM · Restricted Project
SjoerdMeijer added a comment to D102748: [LoopUnroll] Don't unroll before vectorisation.

Thanks for raising that one. It feels like the number of missed opportunities is endless. From the discussion here however, my impression was that there's very little appetite to skip the full unroller at this point and that running the LV before the full unroller would involve quite some surgery in the optimisation pipeline. My take away was that trying to improve the SLP was probably more low hanging fruit. Thus, I was thinking about abandoning this change....

Wed, Jun 2, 4:33 AM · Restricted Project
SjoerdMeijer updated the diff for D93838: [SCCP] Add Function Specialization pass.

Addressed @fhahn 's comments: don't run the solver for specialised functions, removed the recursive specialization test for now.

Wed, Jun 2, 4:19 AM · Restricted Project
SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

Ran clang-format.

@fhahn: about running the solver first before making IR changes. I think that is happening already. There are some places where the solver is kept up to date after transformations. I think this is a remainder of running function specialisation iteratively that I stripped out for now, but I think it's good to keep these updates to the solver as it's probably good to keep it consistent.

I'm not sure I'm looking at the right thing, but it looks like the specialization is still run iteratively via the code below? RunSCCPSolver seems to modify the IR after solving. Maybe I am looking at the wrong thing?

while (FuncSpecializationMaxIters != I++ &&
       FS.specializeFunctions(FuncDecls, CurrentSpecializations)) {

  // Run solver for the specialized functions only.
  RunSCCPSolver(CurrentSpecializations);

  CurrentSpecializations.clear();
  Changed = true;
}
Wed, Jun 2, 3:56 AM · Restricted Project

Tue, Jun 1

SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

Friendly ping.

Tue, Jun 1, 12:20 AM · Restricted Project

Thu, May 27

SjoerdMeijer added a comment to D99707: Remove "Rewrite Symbols" from codegen pipeline.

Sorry for the delay. Looks reasonable to me.

Thu, May 27, 11:26 PM · Restricted Project
SjoerdMeijer updated the diff for D93838: [SCCP] Add Function Specialization pass.

Ran clang-format.

Thu, May 27, 6:21 AM · Restricted Project

Wed, May 26

SjoerdMeijer committed rG6c92215e07f4: [CostModel][AArch64] Add floating point arithmetic tests. NFC. (authored by SjoerdMeijer).
[CostModel][AArch64] Add floating point arithmetic tests. NFC.
Wed, May 26, 12:32 PM
SjoerdMeijer committed rGb6f6501b2412: [CostModel][AArch64] Add tests for bitreverse. NFC. (authored by SjoerdMeijer).
[CostModel][AArch64] Add tests for bitreverse. NFC.
Wed, May 26, 6:57 AM
SjoerdMeijer added a reviewer for D93838: [SCCP] Add Function Specialization pass: jaykang10.
Wed, May 26, 2:43 AM · Restricted Project

Tue, May 25

SjoerdMeijer added inline comments to D103105: [AArch64] Optimise bitreverse lowering in ISel.
Tue, May 25, 11:52 AM · Restricted Project

Mon, May 24

SjoerdMeijer updated the diff for D93838: [SCCP] Add Function Specialization pass.

This addresses most remarks:

  • transitioned to using InstructionCosts,
  • added tests/testing,
  • removed unnecessary things.
Mon, May 24, 5:34 AM · Restricted Project
SjoerdMeijer added inline comments to D93838: [SCCP] Add Function Specialization pass.
Mon, May 24, 5:30 AM · Restricted Project
SjoerdMeijer added inline comments to D93838: [SCCP] Add Function Specialization pass.
Mon, May 24, 4:00 AM · Restricted Project
SjoerdMeijer added inline comments to D93838: [SCCP] Add Function Specialization pass.
Mon, May 24, 3:50 AM · Restricted Project

May 20 2021

SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

Ah cheers, many thanks for the comments and review! Will start addressing them now.

May 20 2021, 9:22 AM · Restricted Project
SjoerdMeijer added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Thanks, interesting, having a look there!

May 20 2021, 6:49 AM · Restricted Project
SjoerdMeijer added a comment to D102834: [SLPVectorizer] WIP Implement initial memory versioning (WIP!).

Yes, interesting. LoopVersioningLICM came to my mind, except that works on loops of course...

May 20 2021, 6:48 AM · Restricted Project
SjoerdMeijer added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

May 20 2021, 6:29 AM · Restricted Project
SjoerdMeijer added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

May 20 2021, 6:22 AM · Restricted Project
SjoerdMeijer added a comment to D102748: [LoopUnroll] Don't unroll before vectorisation.

Not directly on topic for this review, but looking at some of the examples given, I'd point out that these are simply missed optimizations. If we're given vectorized IR and not performing other simple optimization (loop idiom, dce, etc..), those are optimization bugs we should probably fix.

It also sounds like it might be worth having the vectorizer recognize the case where it can vectorize all iterations in a single vector iteration and break the backedge if so. This would avoid the need for another pass of loop deletion or unrolling before scalar opts could easily kick in.

If we improved the robustness of our other optimizations w.r.t. vector instructions, we'd probably be a lot less sensitive to the proposed pass reordering.

May 20 2021, 2:20 AM · Restricted Project
SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

getting this all right and enabled by default with the first commit is probably not achievable

Maybe yes, well…

If we could pick some the most obvious and profitable cases and enable this pass for them (simple heuristics, otherwise be conservative), plus, if we have PGO data, specialize very hot functions?

Yeah, I see concerns that another off by default pass is not ideal. Just look how many passes are in tree and disabled.

So +1 if we can run this pass even with conservative heuristics.

May 20 2021, 2:00 AM · Restricted Project
SjoerdMeijer added a comment to D93838: [SCCP] Add Function Specialization pass.

Thank you for further discussing this.

May 20 2021, 1:23 AM · Restricted Project

May 19 2021

SjoerdMeijer added a comment to D102748: [LoopUnroll] Don't unroll before vectorisation.

Thanks for those examples. I am going to play a little bit more with this.

May 19 2021, 5:24 AM · Restricted Project
SjoerdMeijer added a comment to D102748: [LoopUnroll] Don't unroll before vectorisation.

I believe unrolling before vectorisation is fundamentally the wrong approach.

It is indeed suboptimal for a subset of loops which can be vectorized by LV and the SLP.

But in a lot other cases, early unrolling enables other passes to perform many additional simplifications as @nikic mentioned and I think it is very easy to come up with examples to show that, because a lot of simplification passes don't work well on loops. Just one example below. With early unrolling, LLVM will eliminate the memset, without early unrolling it won't. I would expect simplifications due to early unrolling to be quite helpful for a lot of general non-benchmark code with few vectorizable loops.

void foo(char *Ptr) {
    memset(Ptr, 0, 16);

    for (unsigned i = 0; i < 16; i++ )
      Ptr[i] = i+1;
}
May 19 2021, 3:38 AM · Restricted Project
SjoerdMeijer added a comment to D102748: [LoopUnroll] Don't unroll before vectorisation.

I don't have suggestion on how to move forward *this* patch, because I think it's fundamentally the wrong direction to take.

May 19 2021, 2:27 AM · Restricted Project
SjoerdMeijer updated the summary of D102748: [LoopUnroll] Don't unroll before vectorisation.
May 19 2021, 1:49 AM · Restricted Project
SjoerdMeijer added a comment to D102748: [LoopUnroll] Don't unroll before vectorisation.

The full unroll pass during function simplification is quite important for certain code patterns and serves a completely different purpose than the runtime unrolling that happens late in the pipeline. It is critical that full unrolling happens relatively early, so that scalar optimizations have a chance to work on the fully unrolled loop.

May 19 2021, 1:37 AM · Restricted Project