SjoerdMeijer (Sjoerd Meijer)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 26 2016, 2:17 AM (142 w, 12 h)

Recent Activity

Today

SjoerdMeijer added a comment to D53315: [ARM] Do not fuse VADD and VMUL, continued (2/2).

With one bonus question, are the fused operations fast on the M7..?

Tue, Oct 16, 8:45 AM
SjoerdMeijer added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

Oops, sorry, forgot to reply, and also got distracted by a few other things. I ran one bigger benchmark, and didn't see anything worth mentioning. A first preliminary conclusion: it looks like we don't miss much by not doing this lowering here. Disclaimer: I've tested only on Arm targets, definitely not on all interesting architecture combinations, and with only a handful of benchmarks.

Tue, Oct 16, 8:16 AM
SjoerdMeijer updated the diff for D53315: [ARM] Do not fuse VADD and VMUL, continued (2/2).

Would it be worth adding a test for the M7 though? We seem to be a little lacking in our m-class FP tests.

Tue, Oct 16, 8:02 AM
SjoerdMeijer updated the diff for D53315: [ARM] Do not fuse VADD and VMUL, continued (2/2).

should the f64 version not also be tested?

Tue, Oct 16, 6:49 AM
SjoerdMeijer updated the diff for D53314: [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2).

I think moving VFP4 check into the useFPVMLx method would help make this easier to read.

Tue, Oct 16, 5:45 AM
SjoerdMeijer updated the summary of D53314: [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2).
Tue, Oct 16, 2:01 AM
SjoerdMeijer retitled D53315: [ARM] Do not fuse VADD and VMUL, continued (2/2) from ARM] Do not fuse VADD and VMUL, continued (2/2) to [ARM] Do not fuse VADD and VMUL, continued (2/2).
Tue, Oct 16, 2:01 AM
SjoerdMeijer created D53315: [ARM] Do not fuse VADD and VMUL, continued (2/2).
Tue, Oct 16, 1:59 AM
SjoerdMeijer updated the summary of D53314: [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2).
Tue, Oct 16, 1:56 AM
SjoerdMeijer created D53314: [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2).
Tue, Oct 16, 1:56 AM

Fri, Oct 12

SjoerdMeijer accepted D53067: [AArch64] Swap comparison operands if that enables some folding..

Looks fine to me

Fri, Oct 12, 6:03 AM

Thu, Oct 11

SjoerdMeijer added inline comments to D53067: [AArch64] Swap comparison operands if that enables some folding..
Thu, Oct 11, 5:40 AM
SjoerdMeijer added inline comments to D53067: [AArch64] Swap comparison operands if that enables some folding..
Thu, Oct 11, 12:29 AM

Thu, Oct 4

SjoerdMeijer added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

I am doing some experiments with a hack that simply comments out the call to InstCombiner::SimplifyAnyMemTransfer. I've run 3 smaller benchmarks. 2 didn't show any difference. The 3rd shows 1 tiny regression and 1 tiny improvement in a test case, and will probably not even show a difference in the geomean. I am doing this as a background task; tomorrow I will run bigger benchmarks, on different platforms. But I guess we need some numbers for non-Arm platforms too.

Thu, Oct 4, 8:33 AM
SjoerdMeijer committed rC343758: [AArch64][ARM] Context sensitive meaning of crypto.
Thu, Oct 4, 12:41 AM
SjoerdMeijer committed rL343758: [AArch64][ARM] Context sensitive meaning of crypto.
Thu, Oct 4, 12:40 AM
SjoerdMeijer closed D50179: [AArch64][ARM] Context sensitive meaning of option "crypto".
Thu, Oct 4, 12:40 AM
SjoerdMeijer added a comment to D50179: [AArch64][ARM] Context sensitive meaning of option "crypto".

Thanks!

Thu, Oct 4, 12:39 AM

Tue, Oct 2

SjoerdMeijer added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.

Tue, Oct 2, 10:46 AM
SjoerdMeijer added a comment to D50179: [AArch64][ARM] Context sensitive meaning of option "crypto".

@efriedma: apologies for the ping, but does this look reasonable?

Tue, Oct 2, 12:54 AM
SjoerdMeijer accepted D52486: [AArch64][v8.5A] Add MTE as an optional AArch64 extension.

Looks okay to me.

Tue, Oct 2, 12:53 AM

Wed, Sep 26

SjoerdMeijer added inline comments to D52486: [AArch64][v8.5A] Add MTE as an optional AArch64 extension.
Wed, Sep 26, 3:33 AM
SjoerdMeijer accepted D52463: [ARM] Fix for PR39060.

Thanks for clarifying! Looks good, with only a nit.

Wed, Sep 26, 2:47 AM
SjoerdMeijer accepted D52488: [AArch64][v8.5A] Add Memory Tagging system registers.

LGTM

Wed, Sep 26, 12:52 AM
SjoerdMeijer accepted D52493: [AArch64][v8.5A] Test clang option for the Memory Tagging Extension.

LGTM

Wed, Sep 26, 12:51 AM
SjoerdMeijer accepted D52492: [AArch64][v8.5A] Test optional Armv8.5-A random number extension.

LGTM

Wed, Sep 26, 12:47 AM
SjoerdMeijer accepted D52481: [AArch64][v8.5A] Add Armv8.5-A random number instructions.

LGTM

Wed, Sep 26, 12:41 AM
SjoerdMeijer accepted D52482: [AArch64][v8.5A] Add speculation restriction system registers.

Looks okay to me.

Wed, Sep 26, 12:39 AM

Tue, Sep 25

SjoerdMeijer accepted D52478: [AArch64][AsmParser] Show name of missing feature for system instructions.

Looks okay to me.

Tue, Sep 25, 9:25 AM
SjoerdMeijer accepted D52491: [ARM/AArch64][v8.5A] Add Armv8.5-A target.

Looks okay to me

Tue, Sep 25, 9:18 AM
SjoerdMeijer added inline comments to D52463: [ARM] Fix for PR39060.
Tue, Sep 25, 8:29 AM

Mon, Sep 24

SjoerdMeijer added a comment to D52257: [Thumb1] Any imm of i8 type on Thumb1 should have cost of 1.

Yep, sounds good, cheers.

Mon, Sep 24, 7:52 AM
SjoerdMeijer committed rL342874: [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33.
Mon, Sep 24, 7:24 AM
This revision was not accepted when it landed; it landed in state Needs Review.
Mon, Sep 24, 7:24 AM
SjoerdMeijer committed rL342862: [ARM][AArch64] Add feature +fp16fml.
Mon, Sep 24, 7:23 AM
SjoerdMeijer committed rC342862: [ARM][AArch64] Add feature +fp16fml.
Mon, Sep 24, 7:22 AM
SjoerdMeijer closed D50229: [ARM][AArch64] Add feature +fp16fml.
Mon, Sep 24, 7:22 AM
SjoerdMeijer updated the diff for D50179: [AArch64][ARM] Context sensitive meaning of option "crypto".

Added FIXMEs, like in D50229, that this needs reimplementation too after the TargetParser rewrite.

Mon, Sep 24, 7:19 AM

Fri, Sep 21

SjoerdMeijer updated the diff for D52289: [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33.

Thanks for the reviews.
Now takes code size into account, and removed an outdated comment.

Fri, Sep 21, 7:45 AM
SjoerdMeijer updated the diff for D50229: [ARM][AArch64] Add feature +fp16fml.

Added FIXMEs.

Fri, Sep 21, 3:47 AM
SjoerdMeijer commandeered D50229: [ARM][AArch64] Add feature +fp16fml.
Fri, Sep 21, 3:42 AM
SjoerdMeijer added a comment to D50229: [ARM][AArch64] Add feature +fp16fml.

Ah, and just for your info, the proposal was just sent to the dev list:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Fri, Sep 21, 3:12 AM
SjoerdMeijer added a comment to D50229: [ARM][AArch64] Add feature +fp16fml.

(I am now picking this up, and will try to progress this patch and also D50179)

Fri, Sep 21, 2:37 AM
SjoerdMeijer accepted D51983: [ARM] bottom-top mul support in ARMParallelDSP.
Fri, Sep 21, 1:55 AM
SjoerdMeijer added a comment to D52257: [Thumb1] Any imm of i8 type on Thumb1 should have cost of 1.

Ah, but looking a bit closer now, I am not sure this is the right thing to do. This change makes any 8-bit value cheap, including negative numbers, whereas the Thumb1 immediates are positive numbers. It looks like this is a workaround for store-merging interacting badly with constant hoisting.
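
To make the concern concrete, here is a minimal sketch (not code from the patch) of the distinction being drawn. It assumes the Thumb1 `MOVS Rd, #imm8` encoding, which takes an unsigned 8-bit immediate in the range 0 to 255. The helper names `fitsThumb1MovImm` and `isAnyI8` are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: true if Value can be materialised by a single
// Thumb1 MOVS Rd, #imm8, whose immediate field is unsigned (0..255).
constexpr bool fitsThumb1MovImm(int64_t Value) {
  return Value >= 0 && Value <= 255;
}

// Hypothetical helper: true if Value fits in 8 bits when read as either
// a signed or an unsigned quantity (-128..255).
constexpr bool isAnyI8(int64_t Value) {
  return Value >= -128 && Value <= 255;
}

// A negative i8 value fits in 8 bits but is not a valid MOVS immediate.
static_assert(fitsThumb1MovImm(200), "200 is encodable as imm8");
static_assert(isAnyI8(-1), "-1 fits in a signed i8");
static_assert(!fitsThumb1MovImm(-1), "-1 is not encodable as imm8");
```

Under these assumptions, a cost rule keyed on "any i8" would treat -1 as cheap even though it cannot be loaded with a single `MOVS`.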

Fri, Sep 21, 1:08 AM
SjoerdMeijer added a comment to D52257: [Thumb1] Any imm of i8 type on Thumb1 should have cost of 1.

I think the test file can use a bit of cleanup: we don't need the attributes, metadata, etc. But more importantly, can it perhaps be further reduced? Do we need all this code?

Fri, Sep 21, 1:01 AM

Thu, Sep 20

SjoerdMeijer added a comment to D52289: [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33.

Good point. I wanted to worry about that later in a follow up patch, but perhaps that doesn't make sense. I will fix it now.

Thu, Sep 20, 3:23 AM
SjoerdMeijer updated the diff for D52289: [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33.

Reshuffled the tests a bit.

Thu, Sep 20, 3:00 AM
SjoerdMeijer created D52289: [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33.
Thu, Sep 20, 1:53 AM

Wed, Sep 19

SjoerdMeijer abandoned D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

This is done better in D35035 (and is no longer stalled).

Wed, Sep 19, 6:07 AM

Mon, Sep 17

SjoerdMeijer accepted D52102: [ARM] Disallow icmp with negative imm and overflow.

LGTM

Mon, Sep 17, 6:13 AM
SjoerdMeijer accepted D52080: [ARM] Cleanup ARM CGP isSupportedValue.

Looks like a straightforward addition/cleanup to me.

Mon, Sep 17, 6:09 AM
SjoerdMeijer added a comment to D35035: [InstCombine] Prevent memcpy generation for small data size.

I would quickly like to check the status of this patch: do you have plans to continue this work? If not, I would like to pick it up.

Mon, Sep 17, 5:56 AM
SjoerdMeijer added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

Cool, I am going to pick that one up then. Cheers.

Mon, Sep 17, 5:48 AM
SjoerdMeijer added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

One more thought then on this:

Mon, Sep 17, 5:11 AM
SjoerdMeijer added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

Do keep in mind that there is more than one backend, more than one target architecture.

Mon, Sep 17, 1:41 AM
SjoerdMeijer added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

Thanks for the feedback and suggestions! Summarising where we are:

  • I think this WIP patch generates the code that we want, for both 32-bit and 64-bit targets. In the 64-bit backends, the 8-byte memcpy is expanded to just a load and a store (see also the comments below about the backend dealing with the memcpy). So what I said earlier, that we also generate the libcall for X86 and AArch64 with -Oz, wasn't true; it was due to a problem in my test.
  • But the main problem now is that we are not happy with the current implementation.
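
As an illustration of the kind of source involved in the first point (a sketch, not the test case from the patch), a fixed-size 8-byte `memcpy` looks like the one below. The function name `copy8` is hypothetical, and whether a given compiler emits a single load/store pair or a library call depends on the target and the optimisation level.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical example: copy exactly 8 bytes from Src to Dst.
// The constant size is what lets a compiler consider replacing the
// call with a plain load and store on targets with 64-bit registers.
void copy8(void *Dst, const void *Src) {
  std::memcpy(Dst, Src, 8);
}
```
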
Mon, Sep 17, 1:15 AM

Sep 14 2018

SjoerdMeijer added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

Thanks both, those are fair points.

Sep 14 2018, 4:45 AM
SjoerdMeijer updated the diff for D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

fixed the test case.

Sep 14 2018, 3:17 AM
SjoerdMeijer created D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.
Sep 14 2018, 2:52 AM

Sep 13 2018

SjoerdMeijer accepted D51983: [ARM] bottom-top mul support in ARMParallelDSP.
Sep 13 2018, 8:11 AM
SjoerdMeijer accepted D52032: [ARM] Fix FixConsts for ARMCodeGenPrepare.

Lgtm

Sep 13 2018, 7:40 AM
SjoerdMeijer accepted D51978: [ARM] Allow truncs as sources in ARM CGP.

LGTM

Sep 13 2018, 1:35 AM

Sep 12 2018

SjoerdMeijer added inline comments to D51429: [AArch64] Return Address Signing B Key Support.
Sep 12 2018, 6:23 AM
SjoerdMeijer accepted D51044: [PatternMatch] Use generic One,Two,ThreeOps_match classes (NFC)..

Looks like a reasonable clean up to me.

Sep 12 2018, 5:11 AM

Sep 11 2018

SjoerdMeijer accepted D50758: [ARM] Allow bitcasts in ARMCodeGenPrepare.

LGTM

Sep 11 2018, 7:22 AM
SjoerdMeijer added inline comments to D50758: [ARM] Allow bitcasts in ARMCodeGenPrepare.
Sep 11 2018, 7:02 AM
SjoerdMeijer accepted D51424: [ARM] Exchange MAC operands in ARMParallelDSP.

That's a cool new trick!

Sep 11 2018, 6:49 AM
SjoerdMeijer accepted D51477: [AArch64] Add parsing of aarch64_vector_pcs attribute..

Looks like a straightforward addition to me.

Sep 11 2018, 6:19 AM
SjoerdMeijer accepted D51920: [ARM] Enable ARMCodeGenPrepare by default.

We've had the pass enabled downstream for a couple of weeks and it seems to be okay, so enable it by default.

Sep 11 2018, 5:41 AM

Aug 29 2018

SjoerdMeijer added inline comments to D50685: [AArch64] Support conversion between fp16 and fp128.
Aug 29 2018, 12:15 AM

Aug 28 2018

SjoerdMeijer updated subscribers of D50685: [AArch64] Support conversion between fp16 and fp128.

This looks good to me, but it's not really my area of expertise, so I would appreciate it if someone else could have a look too. Perhaps @olista01 or @efriedma?

Aug 28 2018, 12:39 AM

Aug 23 2018

SjoerdMeijer accepted D51093: [ARM] Set __ARM_FEATURE_SIMD32 for +dsp cores.

Looks reasonable to me.

Aug 23 2018, 8:21 AM

Aug 22 2018

SjoerdMeijer accepted D51101: [ARM] Add smlald support in ARMParallelDSP.

Looks like a straightforward addition to me. Cheers!

Aug 22 2018, 6:51 AM
SjoerdMeijer accepted D51034: [ARM] Rotated operand patterns for *xtb16.

Looks like a straightforward fix to me.

Aug 22 2018, 12:34 AM

Aug 17 2018

SjoerdMeijer accepted D50885: [AArch64][SVE] Asm: Add SVE System registers.

Thanks, looks good to me.

Aug 17 2018, 6:55 AM
SjoerdMeijer added inline comments to D50885: [AArch64][SVE] Asm: Add SVE System registers.
Aug 17 2018, 2:32 AM
SjoerdMeijer committed rL339997: [ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm description.
Aug 17 2018, 12:34 AM
SjoerdMeijer closed D50846: [ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm description..
Aug 17 2018, 12:34 AM
SjoerdMeijer accepted D50228: [ARM/AArch64] Support FP16 +fp16fml instructions.

LGTM

Aug 17 2018, 12:23 AM

Aug 16 2018

SjoerdMeijer created D50846: [ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm description..
Aug 16 2018, 8:20 AM
SjoerdMeijer accepted D50759: [ARM] Allow zext in ARMCodeGenPrepare.

Looks okay to me.

Aug 16 2018, 3:54 AM
SjoerdMeijer accepted D50762: [ARM] Ignore GEPs in ARMCodeGenPrepare.

Looks like a straightforward fix to me.

Aug 16 2018, 3:44 AM

Aug 15 2018

SjoerdMeijer accepted D50769: [ARM] Typesize lower bound for ARMCodeGenPrepare.

Looks like a straightforward fix to me.

Aug 15 2018, 6:11 AM
SjoerdMeijer accepted D50067: [ARM] Handle signed icmps in ARMCodeGenPrepare.

LGTM

Aug 15 2018, 1:14 AM

Aug 14 2018

SjoerdMeijer added inline comments to D50067: [ARM] Handle signed icmps in ARMCodeGenPrepare.
Aug 14 2018, 9:02 AM
SjoerdMeijer accepted D50054: [ARM] Allow pointer values in ARMCodeGenPrepare.

Looks okay to me

Aug 14 2018, 8:10 AM
SjoerdMeijer added a comment to D50054: [ARM] Allow pointer values in ARMCodeGenPrepare.

Nit picking the subject/description:

Aug 14 2018, 7:38 AM
SjoerdMeijer committed rL339645: [ARM] ParallelDSP: add option to enable/disable the pass.
Aug 14 2018, 12:44 AM
SjoerdMeijer closed D50511: [ARM] ParallelDSP: add option to disable the pass.
Aug 14 2018, 12:44 AM
SjoerdMeijer added a comment to D50511: [ARM] ParallelDSP: add option to disable the pass.

Thanks for reviewing.

Aug 14 2018, 12:40 AM

Aug 10 2018

SjoerdMeijer accepted D50518: [ARM] Disallow zexts in ARMCodeGenPrepare.

These changes look reasonable to me.

Aug 10 2018, 6:26 AM
SjoerdMeijer added inline comments to D50518: [ARM] Disallow zexts in ARMCodeGenPrepare.
Aug 10 2018, 3:48 AM
SjoerdMeijer accepted D50252: [ARM] Added FP16 VREV Vector Instrinsic CodeGen support.

Looks like a straightforward fix to me now.

Aug 10 2018, 3:18 AM

Aug 9 2018

SjoerdMeijer updated the diff for D50179: [AArch64][ARM] Context sensitive meaning of option "crypto".
Aug 9 2018, 1:16 PM
SjoerdMeijer updated the diff for D50179: [AArch64][ARM] Context sensitive meaning of option "crypto".

fixed typo

Aug 9 2018, 12:58 PM
SjoerdMeijer added inline comments to D50179: [AArch64][ARM] Context sensitive meaning of option "crypto".
Aug 9 2018, 8:33 AM
SjoerdMeijer created D50511: [ARM] ParallelDSP: add option to disable the pass.
Aug 9 2018, 7:26 AM
SjoerdMeijer committed rC339347: [AArch64][NFC] better matching of AArch64 target in aarch64-cpus.c tests.
Aug 9 2018, 7:08 AM
SjoerdMeijer committed rL339347: [AArch64][NFC] better matching of AArch64 target in aarch64-cpus.c tests.
Aug 9 2018, 7:08 AM