
Yesterday

SjoerdMeijer added a parent revision for D70125: [LV] PreferPredicateOverEpilog respecting predicate loop hint: D69845: [ARM][MVE] canTailPredicateLoop.
Tue, Nov 12, 7:32 AM · Restricted Project
SjoerdMeijer added a child revision for D69845: [ARM][MVE] canTailPredicateLoop: D70125: [LV] PreferPredicateOverEpilog respecting predicate loop hint.
Tue, Nov 12, 7:32 AM · Restricted Project
SjoerdMeijer created D70125: [LV] PreferPredicateOverEpilog respecting predicate loop hint.
Tue, Nov 12, 7:31 AM · Restricted Project
SjoerdMeijer accepted D69350: [ARM] Replace arm_neon_vqadds with sadd_sat.
Tue, Nov 12, 1:21 AM · Restricted Project

Mon, Nov 11

SjoerdMeijer added a comment to D69946: [ARM][MVE] Enable narrow vector length.

All the codegen tests are dlstp.32 loops, but do we have or need to have some tests with size = 8, 16, or 64 too?

Mon, Nov 11, 6:23 AM · Restricted Project
SjoerdMeijer added inline comments to D69350: [ARM] Replace arm_neon_vqadds with sadd_sat.
Mon, Nov 11, 5:56 AM · Restricted Project
SjoerdMeijer added inline comments to D69945: [ARM][MVE] Tail predication conversion.
Mon, Nov 11, 5:29 AM · Restricted Project
SjoerdMeijer updated the diff for D69845: [ARM][MVE] canTailPredicateLoop.
  • fptrunc/fpextends: the loops are not that bad actually, but there's some quite terrible codegen around that, so let's indeed reject this for now.
  • stride -1: I think that should be okay, but I do see that complicates things quite a bit for the backend, so let's indeed reject that for now too.
  • this is now enabled/disabled by the existing option -disable-mve-tail-predication.
Mon, Nov 11, 2:47 AM · Restricted Project

Fri, Nov 8

SjoerdMeijer updated the diff for D69845: [ARM][MVE] canTailPredicateLoop.

added one more FP test case.

Fri, Nov 8, 6:53 AM · Restricted Project
SjoerdMeijer updated the diff for D69845: [ARM][MVE] canTailPredicateLoop.

Thanks for checking. Got myself confused about the loads/stores, but I think we indeed want to accept extending loads and narrowing stores, so I've added checks and a bunch of tests:

  • for SEXT/ZEXT, the operand needs to be a load,
  • for TRUNC, the user needs to be a store.
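
The two rules above can be sketched on a toy instruction model. This is only an illustration, not the code from D69845: the `Inst` class, the `make` helper and `conversion_allowed` are hypothetical names, and reading "the user needs to be a store" as "every user is a store" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False, repr=False)
class Inst:
    """Toy stand-in for an IR instruction (hypothetical, not an LLVM class)."""
    opcode: str
    operands: List["Inst"] = field(default_factory=list)
    users: List["Inst"] = field(default_factory=list)


def make(opcode, *operands):
    """Create an instruction and register it as a user of each operand."""
    inst = Inst(opcode, list(operands))
    for op in operands:
        op.users.append(inst)
    return inst


def conversion_allowed(inst):
    """Accept only extending loads and narrowing stores.

    sext/zext: the single operand must be a load.
    trunc: there must be at least one user, and every user must be a store
    (assumed interpretation of "the user needs to be a store").
    Any other opcode is not restricted by this particular check.
    """
    if inst.opcode in ("sext", "zext"):
        return len(inst.operands) == 1 and inst.operands[0].opcode == "load"
    if inst.opcode == "trunc":
        return bool(inst.users) and all(u.opcode == "store" for u in inst.users)
    return True
```

For example, `conversion_allowed(make("sext", make("load")))` is accepted, while a `zext` whose operand is an `add` is rejected.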
Fri, Nov 8, 6:34 AM · Restricted Project

Thu, Nov 7

SjoerdMeijer updated the diff for D69845: [ARM][MVE] canTailPredicateLoop.
  • strides: now allowing 1 and -1, and added a test case for -1.
  • checking the operand of zext/sext/trunc to see if it is a load/store, and added a test case when trunc is actually allowed.
Thu, Nov 7, 8:55 AM · Restricted Project
SjoerdMeijer updated the diff for D69845: [ARM][MVE] canTailPredicateLoop.
Thu, Nov 7, 6:36 AM · Restricted Project

Wed, Nov 6

SjoerdMeijer committed rG6c2a4f5ff93e: [TTI][LV] preferPredicateOverEpilogue (authored by SjoerdMeijer).
[TTI][LV] preferPredicateOverEpilogue
Wed, Nov 6, 2:23 AM
SjoerdMeijer closed D69040: [TTI][LV] preferPredicateOverEpilogue.
Wed, Nov 6, 2:22 AM · Restricted Project

Tue, Nov 5

SjoerdMeijer added a comment to D69845: [ARM][MVE] canTailPredicateLoop.

Thanks for taking a look! Will take this on board.

Tue, Nov 5, 8:28 AM · Restricted Project
SjoerdMeijer added a comment to D69040: [TTI][LV] preferPredicateOverEpilogue.

Thanks all!

Tue, Nov 5, 7:42 AM · Restricted Project
SjoerdMeijer created D69845: [ARM][MVE] canTailPredicateLoop.
Tue, Nov 5, 7:32 AM · Restricted Project
SjoerdMeijer committed rG92164cf25d51: Recommit "[HardwareLoops] Optimisation remarks" (authored by SjoerdMeijer).
Recommit "[HardwareLoops] Optimisation remarks"
Tue, Nov 5, 1:08 AM
SjoerdMeijer closed D69660: [HardwareLoops] Optimisation remarks.
Tue, Nov 5, 1:07 AM · Restricted Project
SjoerdMeijer added a comment to D69660: [HardwareLoops] Optimisation remarks.

I guess the reason you deleted the test case test/Transforms/HardwareLoops/unconditional-latch.ll is that it becomes redundant after adding more checks in test/Transforms/HardwareLoops/ARM/structure.ll?

Tue, Nov 5, 12:55 AM · Restricted Project

Mon, Nov 4

SjoerdMeijer added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Nice one!

Mon, Nov 4, 6:09 AM · Restricted Project
SjoerdMeijer updated the diff for D69660: [HardwareLoops] Optimisation remarks.

Many thanks again!

Mon, Nov 4, 3:08 AM · Restricted Project

Fri, Nov 1

SjoerdMeijer updated the diff for D69660: [HardwareLoops] Optimisation remarks.

This is a pass manager's problem. By comparing the hardwareloop pass with other passes also using the optimisation remarks, I noticed a missing:

Fri, Nov 1, 10:09 AM · Restricted Project
SjoerdMeijer updated the diff for D69040: [TTI][LV] preferPredicateOverEpilogue.

I have:

  • changed the ARM implementation to check the required architecture extensions: check for MVE, the remaining checks are done in isHardwareLoopProfitable so there was no point in duplicating that.
  • added a new test case, to cover testing these architecture extension combos, and the profitability check.
Fri, Nov 1, 7:34 AM · Restricted Project
SjoerdMeijer added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. General nitpick is that you should run clang-format as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. And find some more nitpicks inlined.

Hi Sjoerd, thanks for your comments! I've fixed the inline nitpicks right away. I'll do a style pass for the actual commits.

Fri, Nov 1, 2:19 AM · Restricted Project
SjoerdMeijer added a comment to D69660: [HardwareLoops] Optimisation remarks.

Many thanks!!!
Now I've got something to stare at. :-)

Fri, Nov 1, 1:33 AM · Restricted Project

Thu, Oct 31

SjoerdMeijer created D69660: [HardwareLoops] Optimisation remarks.
Thu, Oct 31, 6:36 AM · Restricted Project
SjoerdMeijer updated the diff for D69628: [Clang] Pragma vectorize_width() implies vectorize(enable), take 3.

Thanks Michael!

Thu, Oct 31, 6:16 AM

Wed, Oct 30

SjoerdMeijer created D69628: [Clang] Pragma vectorize_width() implies vectorize(enable), take 3.
Wed, Oct 30, 10:22 AM

Tue, Oct 29

SjoerdMeijer added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. General nitpick is that you should run clang-format as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. And find some more nitpicks inlined.

Tue, Oct 29, 1:54 PM · Restricted Project
SjoerdMeijer added a comment to D69556: [CodeGen] Move ARMCodegenPrepare to TypePromotion.

I guess this step makes sense: there was very little ARM specific about this, so why not promote this to a generic codegen pass. It's a generic optimisation, so other targets could benefit from this, but I'm not sure if that then requires some evidence before moving this code.

Tue, Oct 29, 7:02 AM · Restricted Project
SjoerdMeijer added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

+1 to what Renato said, I like this direction!
FWIW: we are working on Arm's M-profile Vector Extension (MVE), another vector extension for which this is very useful.

Tue, Oct 29, 6:34 AM · Restricted Project
SjoerdMeijer accepted D69508: [ARM] Add vrev32 NEON fp16 patterns.

Yep, cheers, LGTM

Tue, Oct 29, 2:23 AM · Restricted Project

Mon, Oct 28

SjoerdMeijer added a comment to D69508: [ARM] Add vrev32 NEON fp16 patterns.

Yep! That's what I meant. Neon+fullfp16, this fails to select. This test runs in that configuration now. The RHS check lines are missing because it crashes failing to select.

Without fullfp16 you are right that we would never see a f16 vector, so the patterns would never be used. We would promote the f16's to f32's before that point, I believe.

Mon, Oct 28, 7:23 AM · Restricted Project
SjoerdMeijer added a comment to D69508: [ARM] Add vrev32 NEON fp16 patterns.

These Neon patterns for vrev32.16 appear to be missing, only the i16 patterns are present.

Mon, Oct 28, 6:53 AM · Restricted Project

Sun, Oct 20

SjoerdMeijer added a comment to D69040: [TTI][LV] preferPredicateOverEpilogue.

Thanks, will take that on board, and I will pick this up after the llvm dev conference.

Sun, Oct 20, 4:34 PM · Restricted Project

Fri, Oct 18

SjoerdMeijer committed rG9c155985f17f: [Arm][libsanitizer] Fix arm libsanitizer failure with bleeding edge glibc (authored by SjoerdMeijer).
[Arm][libsanitizer] Fix arm libsanitizer failure with bleeding edge glibc
Fri, Oct 18, 4:02 AM
SjoerdMeijer closed D69104: [Arm][libsanitizer] Fix arm libsanitizer failure with bleeding edge glibc.
Fri, Oct 18, 4:02 AM · Restricted Project, Restricted Project
SjoerdMeijer committed rL375220: [Arm][libsanitizer] Fix arm libsanitizer failure with bleeding edge glibc.
[Arm][libsanitizer] Fix arm libsanitizer failure with bleeding edge glibc
Fri, Oct 18, 4:01 AM

Thu, Oct 17

SjoerdMeijer added a comment to D69040: [TTI][LV] preferPredicateOverEpilogue.

Thinking about this some more, I think it would be best to at least check some features of the loop for legality:

  • no vector widths greater than 128 bits.
  • all vector operations should have the same number of lanes.
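
As a rough illustration of these two checks (not the implementation in D69040), each vector operation can be modelled by its type as a `(lanes, element_bits)` pair. The function name `vector_ops_legal` and the `max_bits` parameter are hypothetical.

```python
def vector_ops_legal(vec_types, max_bits=128):
    """Check the two legality conditions on a loop's vector types.

    vec_types: iterable of (lanes, element_bits) pairs, one per vector
    operation in the loop (a toy model, not LLVM types).
    Returns False if any vector is wider than max_bits, or if the
    operations do not all share the same number of lanes.
    """
    lane_counts = set()
    for lanes, element_bits in vec_types:
        if lanes * element_bits > max_bits:
            return False  # vector width exceeds the register width
        lane_counts.add(lanes)
    # An empty loop body or a single lane count is fine.
    return len(lane_counts) <= 1
```

Under this model, mixing 4 x i32 with 4 x i16 passes because both have four lanes, while mixing 4 x i32 with 8 x i16 fails even though both are 128 bits wide.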
Thu, Oct 17, 6:03 AM · Restricted Project

Wed, Oct 16

SjoerdMeijer updated the summary of D69040: [TTI][LV] preferPredicateOverEpilogue.
Wed, Oct 16, 7:21 AM · Restricted Project
SjoerdMeijer updated the summary of D69040: [TTI][LV] preferPredicateOverEpilogue.
Wed, Oct 16, 7:21 AM · Restricted Project
SjoerdMeijer created D69040: [TTI][LV] preferPredicateOverEpilogue.
Wed, Oct 16, 7:21 AM · Restricted Project
SjoerdMeijer committed rL374992: Revert "[HardwareLoops] Optimisation remarks".
Revert "[HardwareLoops] Optimisation remarks"
Wed, Oct 16, 3:55 AM
SjoerdMeijer committed rG5a131889665f: Revert "[HardwareLoops] Optimisation remarks" (authored by SjoerdMeijer).
Revert "[HardwareLoops] Optimisation remarks"
Wed, Oct 16, 3:55 AM
SjoerdMeijer added a reverting change for rGad763751565b: [HardwareLoops] Optimisation remarks: rG5a131889665f: Revert "[HardwareLoops] Optimisation remarks".
Wed, Oct 16, 3:55 AM
SjoerdMeijer accepted D68838: [ARM] Fix arm_neon.h with -flax-vector-conversions=none, part 3.

Yep, thanks again!

Wed, Oct 16, 3:08 AM · Restricted Project
SjoerdMeijer added a comment to D68579: [HardwareLoops] Optimisation remarks.

Thanks for reviewing!

Wed, Oct 16, 2:11 AM · Restricted Project
SjoerdMeijer committed rGad763751565b: [HardwareLoops] Optimisation remarks (authored by SjoerdMeijer).
[HardwareLoops] Optimisation remarks
Wed, Oct 16, 2:09 AM
SjoerdMeijer closed D68579: [HardwareLoops] Optimisation remarks.
Wed, Oct 16, 2:09 AM · Restricted Project
SjoerdMeijer committed rL374980: [HardwareLoops] Optimisation remarks.
[HardwareLoops] Optimisation remarks
Wed, Oct 16, 2:08 AM

Tue, Oct 15

SjoerdMeijer added a comment to D67392: [ARM][ParallelDSP] Change smlad insertion order.

Thanks for clarifying.

Tue, Oct 15, 7:39 AM · Restricted Project
SjoerdMeijer added inline comments to D67392: [ARM][ParallelDSP] Change smlad insertion order.
Tue, Oct 15, 6:31 AM · Restricted Project

Oct 14 2019

SjoerdMeijer added a comment to D68579: [HardwareLoops] Optimisation remarks.

Just wanted to check if we are happy with this as an initial commit?

Oct 14 2019, 12:56 AM · Restricted Project
SjoerdMeijer committed rG52bfa73af841: [docs] loop pragmas: options implying transformations (authored by SjoerdMeijer).
[docs] loop pragmas: options implying transformations
Oct 14 2019, 12:47 AM
SjoerdMeijer closed D66199: [docs] loop pragmas.
Oct 14 2019, 12:47 AM · Restricted Project
SjoerdMeijer committed rL374756: [docs] loop pragmas: options implying transformations.
[docs] loop pragmas: options implying transformations
Oct 14 2019, 12:38 AM

Oct 11 2019

SjoerdMeijer added inline comments to D68862: [ARM] Allocatable Global Register Variables for ARM.
Oct 11 2019, 6:22 AM · Restricted Project, Restricted Project
SjoerdMeijer added a comment to D68862: [ARM] Allocatable Global Register Variables for ARM.

Bit of a drive-by comment, but I can't say I am a big fan of all the string matching on the register names. Not sure if this is a fair comment, because I haven't looked closely at it yet, but could we use the ARM::R[0-9] values more? Perhaps that's difficult from the Clang parts?

Oct 11 2019, 3:22 AM · Restricted Project, Restricted Project

Oct 10 2019

SjoerdMeijer added a comment to D66199: [docs] loop pragmas.

Sure, will do, thanks again for taking a look.

Oct 10 2019, 11:31 AM · Restricted Project
SjoerdMeijer updated the diff for D66199: [docs] loop pragmas.

Thanks! Typo fixed.

Oct 10 2019, 11:12 AM · Restricted Project
SjoerdMeijer accepted D68567: [ARM] VQSUB instruction.

Yep, looks good to me.

Oct 10 2019, 9:18 AM · Restricted Project
SjoerdMeijer added a comment to D66199: [docs] loop pragmas.

I have committed all my pragma patches, so now back to the last bit, this doc update.
This doc change should now reflect our implementation. Are we happy for this to go in?

Oct 10 2019, 1:44 AM · Restricted Project
SjoerdMeijer committed rG80371c74ae63: Recommit "[Clang] Pragma vectorize_width() implies vectorize(enable)" (authored by SjoerdMeijer).
Recommit "[Clang] Pragma vectorize_width() implies vectorize(enable)"
Oct 10 2019, 1:34 AM
SjoerdMeijer committed rL374288: Recommit "[Clang] Pragma vectorize_width() implies vectorize(enable)".
Recommit "[Clang] Pragma vectorize_width() implies vectorize(enable)"
Oct 10 2019, 1:25 AM
SjoerdMeijer accepted D68743: [ARM] Fix arm_neon.h with -flax-vector-conversions=none, part 2..

Thanks again, looks good.

Oct 10 2019, 12:48 AM · Restricted Project

Oct 9 2019

SjoerdMeijer committed rGd1170dbe5831: [LV] Emitting SCEV checks with OptForSize (authored by SjoerdMeijer).
[LV] Emitting SCEV checks with OptForSize
Oct 9 2019, 6:23 AM
SjoerdMeijer closed D68082: [LV] Emitting SCEV checks with OptForSize.
Oct 9 2019, 6:23 AM · Restricted Project
SjoerdMeijer committed rL374166: [LV] Emitting SCEV checks with OptForSize.
[LV] Emitting SCEV checks with OptForSize
Oct 9 2019, 6:23 AM
SjoerdMeijer added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

Many thanks again for reviewing. I will add the check-not before committing, and as I said, I will follow up soon to address the other improvement opportunities.

Oct 9 2019, 5:59 AM · Restricted Project
SjoerdMeijer updated the diff for D68082: [LV] Emitting SCEV checks with OptForSize.

Cheers, moved the test back to directory LoopVectorize where it once was instead of LoopVectorize\X86 (which was indeed a mistake, and then added the extra runline out of frustration as the skx core was unknown to me).

Oct 9 2019, 4:00 AM · Restricted Project
SjoerdMeijer accepted D68683: ARM] Fix arm_neon.h with -flax-vector-conversions=none.

Nice one, thanks for fixing! I didn't have the bandwidth to look into this.

Oct 9 2019, 12:20 AM · Restricted Project

Oct 8 2019

SjoerdMeijer updated the diff for D68579: [HardwareLoops] Optimisation remarks.

Added two more remarks

Oct 8 2019, 3:58 AM · Restricted Project
SjoerdMeijer updated the diff for D68579: [HardwareLoops] Optimisation remarks.

Thanks for taking a look!
Comments addressed.

Oct 8 2019, 2:09 AM · Restricted Project
SjoerdMeijer accepted D67990: [aarch64] fix generation of fp16 fmls .

Cheers, lgtm

Oct 8 2019, 1:35 AM · Restricted Project

Oct 7 2019

SjoerdMeijer added inline comments to D67990: [aarch64] fix generation of fp16 fmls .
Oct 7 2019, 8:28 AM · Restricted Project
SjoerdMeijer created D68579: [HardwareLoops] Optimisation remarks.
Oct 7 2019, 7:59 AM · Restricted Project
SjoerdMeijer accepted D68566: [ARM] VQADD instructions.

Looks like a nice bit of isel to me.

Oct 7 2019, 5:07 AM · Restricted Project
SjoerdMeijer updated the diff for D68082: [LV] Emitting SCEV checks with OptForSize.

Comments addressed:

  • cleaned up the test case a bit.
  • couldn't reuse an existing run-line, I guess because of -mcpu=skx, but a separate run line seems fine to me.
Oct 7 2019, 2:58 AM · Restricted Project

Oct 5 2019

SjoerdMeijer updated the diff for D68082: [LV] Emitting SCEV checks with OptForSize.

Thanks for your comments and thoughts.

Oct 5 2019, 1:19 AM · Restricted Project

Oct 4 2019

SjoerdMeijer added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

The earlier "fix at the core" thought was for LV to do something along these lines:

Oct 4 2019, 8:35 AM · Restricted Project
SjoerdMeijer added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

Apologies for the early ping and for the impatience! But I am just really keen on getting this fixed. :-)

Oct 4 2019, 5:14 AM · Restricted Project

Oct 1 2019

SjoerdMeijer added a comment to D68101: [MC][ELF] Prevent globals with an explicit section from being mergeable.

Hi Ben, this is not really my area of expertise, but it all starts to make some sense to me. I was expecting this transformation to happen earlier, but this is where the magic happens, and this is probably where it belongs.
Just to make sure I haven't missed anything, I would like to take this patch and run some numbers, for which I need a little bit of time. If in the meantime someone with some more experience in this area has a look too, that would be great....

Oct 1 2019, 6:07 AM · Restricted Project
SjoerdMeijer added a comment to D68101: [MC][ELF] Prevent globals with an explicit section from being mergeable.

I am getting up to speed with this... and created this reproducer:

Oct 1 2019, 2:58 AM · Restricted Project

Sep 30 2019

SjoerdMeijer retitled D68082: [LV] Emitting SCEV checks with OptForSize from [LV] Emitting SCEV checks with OptForSize to [SCEV] Don't add Predicates with OptForSize.
Sep 30 2019, 9:21 AM · Restricted Project
SjoerdMeijer updated the diff for D68082: [LV] Emitting SCEV checks with OptForSize.
  • modified SCEV so that it doesn't add Predicates when OptForSize is enabled. This does indeed have the consequence that vectorisation will happen, and it will scalarise the stores.
  • moved the test case to an existing optsize file.
Sep 30 2019, 9:16 AM · Restricted Project

Sep 27 2019

SjoerdMeijer added a comment to D68127: [ARM] Add a vrinta.f16.f16 alias.

Just checking because you mentioned:

Sep 27 2019, 3:50 AM · Restricted Project
SjoerdMeijer updated the diff for D68082: [LV] Emitting SCEV checks with OptForSize.

Thanks for all the pointers and info, that helped me get back on track.
Hopefully things are actually as simple as the change in this patch.

Sep 27 2019, 3:23 AM · Restricted Project
SjoerdMeijer added a comment to D68101: [MC][ELF] Prevent globals with an explicit section from being mergeable.

Hello James! Thanks for informing us. If this does what you say, it essentially disables global merging with -fdata-sections, then indeed we would get a little bit unhappy about that.
I could run some numbers with this patch, but I can guess what the outcome will be..... I haven't looked at this patch or the PR, but does this need to be default behaviour, or can this e.g. be done under an option?

Sep 27 2019, 1:28 AM · Restricted Project
SjoerdMeijer accepted D67921: [ARM][MVE] Change VCTP operand.

looks good to me

Sep 27 2019, 12:12 AM · Restricted Project

Sep 26 2019

SjoerdMeijer added a reviewer for D68082: [LV] Emitting SCEV checks with OptForSize: uabelho.
Sep 26 2019, 5:41 AM · Restricted Project
SjoerdMeijer created D68082: [LV] Emitting SCEV checks with OptForSize.
Sep 26 2019, 5:39 AM · Restricted Project
SjoerdMeijer added a comment to D67752: [ARM] Loop unrolling preferences for LOB cores.

The added tests check exactly what's being changed/added here, so that's excellent.
I was just wondering if it would be good if we also add a more end-to-end/unit llc test, that shows actual loloop creation for these cases? That is, if these tests are not already there. I had only a very quick look, and am not sure, but you'll probably know.

Sep 26 2019, 1:50 AM

Sep 25 2019

SjoerdMeijer accepted D67957: [ARM] Cortex-M4 schedule additions.

Cheers, nice one, LGTM

Sep 25 2019, 3:00 AM · Restricted Project
SjoerdMeijer added a comment to D67957: [ARM] Cortex-M4 schedule additions.

These all look like very mechanical changes to me. The only thing I am not a big fan of is this UnsupportedFeatures, but I understood that there is not really a much better alternative.

Sep 25 2019, 1:35 AM · Restricted Project

Sep 24 2019

SjoerdMeijer added a comment to D67392: [ARM][ParallelDSP] Change smlad insertion order.

The current implementation of comparing loads is quadratic, yes, but you could use a different algorithm, like splitting loads into a base pointer plus an offset, and constructing a map from base pointers to load offsets.
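
A minimal sketch of that idea, assuming each load has already been split into a base pointer and a constant byte offset. The tuple layout and the function name `consecutive_load_pairs` are hypothetical, and this is not how ARMParallelDSP is written.

```python
from collections import defaultdict


def consecutive_load_pairs(loads, width):
    """Find pairs of loads that read adjacent memory from the same base.

    loads: iterable of (name, base, offset) tuples, where offset is in bytes.
    width: size in bytes of each load.
    Builds a map base -> {offset: name} and then looks up offset + width
    directly, instead of comparing every load against every other load.
    Assumes at most one load per (base, offset); a later duplicate
    overwrites an earlier one in this toy version.
    """
    by_base = defaultdict(dict)
    for name, base, offset in loads:
        by_base[base][offset] = name

    pairs = []
    for offsets in by_base.values():
        for offset in sorted(offsets):
            following = offsets.get(offset + width)
            if following is not None:
                pairs.append((offsets[offset], following))
    return pairs
```

Building the map is linear in the number of loads, and each adjacency query is a dictionary lookup, so the only super-linear step is the per-base sort.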

Sep 24 2019, 11:31 AM · Restricted Project
SjoerdMeijer added a comment to D67392: [ARM][ParallelDSP] Change smlad insertion order.

I also hadn't thought much about complexity, but indeed, function RecordMemoryOps, for example, is a bit of an expensive hobby.
Looking at it again, the bookkeeping looks essential, I don't see an easy way to reduce complexity. Delaying it may help a bit, but fundamentally that won't change much I think.
The usual way to deal with expensive hobbies is to introduce a threshold, and bail if it exceeds that.

Sep 24 2019, 9:31 AM · Restricted Project
SjoerdMeijer added inline comments to D67921: [ARM][MVE] Change VCTP operand.
Sep 24 2019, 3:47 AM · Restricted Project
SjoerdMeijer committed rG0fcb3afb401c: [LV] Forced vectorization with runtime checks and OptForSize (authored by SjoerdMeijer).
[LV] Forced vectorization with runtime checks and OptForSize
Sep 24 2019, 1:03 AM
SjoerdMeijer committed rL372694: [LV] Forced vectorization with runtime checks and OptForSize.
[LV] Forced vectorization with runtime checks and OptForSize
Sep 24 2019, 1:03 AM