
Fri, Apr 9

SjoerdMeijer added inline comments to D99272: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR..
Fri, Apr 9, 11:12 AM · Restricted Project
SjoerdMeijer accepted D99662: [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant.

Thanks, LGTM

Fri, Apr 9, 10:53 AM · Restricted Project

Thu, Apr 8

SjoerdMeijer added inline comments to D99490: [NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils.
Thu, Apr 8, 4:57 AM · Restricted Project
SjoerdMeijer added a comment to D99662: [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant.

I think Dave also argued that this patch makes a lot of sense. Thus, I think what is left to do is to address the previous nits.

Thu, Apr 8, 12:36 AM · Restricted Project

Wed, Apr 7

SjoerdMeijer commandeered D93838: [LLVM] [SCCP] : Add Function Specialization pass.

Thanks for your comments. I will take over this work, but will first address the remarks in D93762. After that, I will return to this and then try to address your remarks asap.

Wed, Apr 7, 11:45 AM · Restricted Project
SjoerdMeijer added a comment to D99662: [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant.

Sorry for the delay, mostly nits inlined, one question about missing f16 tests.

Wed, Apr 7, 11:23 AM · Restricted Project

Tue, Apr 6

SjoerdMeijer committed rGd5f1131c812d: [AArch64] Default to zero-cycle-zeroing FP registers (authored by SjoerdMeijer).
[AArch64] Default to zero-cycle-zeroing FP registers
Tue, Apr 6, 1:48 AM
SjoerdMeijer closed D99586: [AArch64] Default to zero-cycle-zeroing FP registers..
Tue, Apr 6, 1:48 AM · Restricted Project
SjoerdMeijer committed rGef05b08c612d: [AArch64] Use 64-bit movi for zeroing halfs/floats (authored by SjoerdMeijer).
[AArch64] Use 64-bit movi for zeroing halfs/floats
Tue, Apr 6, 12:43 AM
SjoerdMeijer closed D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.
Tue, Apr 6, 12:42 AM · Restricted Project

Thu, Apr 1

SjoerdMeijer added a comment to D93762: SCCP: Refactor SCCPSolver.

Ah okay, thanks, got it! Makes perfect sense to me.
I will start looking into that.

Thu, Apr 1, 10:28 AM · Restricted Project
SjoerdMeijer added a comment to D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.

Thanks, and I will wait a few days before committing.

Thu, Apr 1, 9:15 AM · Restricted Project
SjoerdMeijer added inline comments to D99586: [AArch64] Default to zero-cycle-zeroing FP registers..
Thu, Apr 1, 9:14 AM · Restricted Project
SjoerdMeijer updated the diff for D99586: [AArch64] Default to zero-cycle-zeroing FP registers..

Adjust this to D99710, which uses movi d0 to zero 64 bits rather than 128 bits; this allows zero-cycle zeroing of FP registers to become the default for all cores.

Thu, Apr 1, 7:53 AM · Restricted Project
SjoerdMeijer added a comment to D99596: [RFC] [LoopDist] Distribute vectorizable loops.

By just looking at this patch, I find it a bit difficult to get an overview of all the moving parts involved. I.e., this probably makes sense:

Thu, Apr 1, 6:22 AM · Restricted Project
SjoerdMeijer added a comment to D99723: [ARM] Transforming memcpy to Tail predicated Loop.

I know you've worked on this for a while and investigated different strategies, but I think we also need to argue here why we would like to emit a memcpy loop instead of, e.g., having optimised versions in the C library. In other words, is this the best we can do for all the different alignments, sizes, etc.?

Thu, Apr 1, 6:13 AM · Restricted Project
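To make the question concrete, here is a minimal scalar C model of what a tail-predicated copy loop does, assuming a vector width of 16 bytes (one 128-bit MVE vector). Every iteration copies up to 16 bytes, and a per-lane predicate disables the lanes past the end, so no separate remainder loop is needed. `VLEN` and `predicated_memcpy` are hypothetical names; in hardware the predicate would come from a tail-predication instruction such as VCTP rather than from this explicit check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width in bytes (one 128-bit vector). */
#define VLEN 16

/* Scalar model of a tail-predicated copy: each iteration handles one
 * "vector" of VLEN bytes, and lanes at or beyond n are masked off,
 * so the final partial chunk needs no separate remainder loop. */
void *predicated_memcpy(void *dst, const void *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    uint8_t *d = dst;
    const uint8_t *s = src;
    for (size_t i = 0; i < n; i += VLEN) {
        /* Number of active lanes in this iteration (the "predicate"). */
        size_t active = (n - i < VLEN) ? n - i : VLEN;
        for (size_t lane = 0; lane < VLEN; lane++) {
            if (lane < active)  /* inactive lanes never touch memory */
                d[i + lane] = s[i + lane];
        }
    }
    return dst;
}
```

The question above is then whether this one loop shape is competitive with a hand-tuned library memcpy across all alignments and sizes.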
SjoerdMeijer updated the diff for D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.

Now using "movi d0".

Thu, Apr 1, 5:49 AM · Restricted Project
SjoerdMeijer added a comment to D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.

I think the exact suggestion was to use MOVID instead. I'm not sure how much it matters, but it may be a simpler instruction for some cores. This would then match what GCC emits.

Thu, Apr 1, 5:06 AM · Restricted Project
SjoerdMeijer added inline comments to D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.
Thu, Apr 1, 3:27 AM · Restricted Project
SjoerdMeijer accepted D99588: [ARM] Allow v6m runtime loop unrolling.

Nice one, thanks.

Thu, Apr 1, 2:39 AM · Restricted Project
SjoerdMeijer requested review of D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.
Thu, Apr 1, 2:30 AM · Restricted Project

Wed, Mar 31

SjoerdMeijer added inline comments to D99662: [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant.
Wed, Mar 31, 11:45 AM · Restricted Project
SjoerdMeijer updated the diff for D93762: SCCP: Refactor SCCPSolver.

Applied clang-format.

Wed, Mar 31, 11:11 AM · Restricted Project
SjoerdMeijer added a comment to D99586: [AArch64] Default to zero-cycle-zeroing FP registers..

After some more discussions, it turns out the original revision was doing the right thing, except that we should be using the .2s variant, as that may be more efficient on some cores.

Wed, Mar 31, 8:01 AM · Restricted Project
SjoerdMeijer updated the diff for D93762: SCCP: Refactor SCCPSolver.

Rebase to get this applied and compiling again.

Wed, Mar 31, 3:51 AM · Restricted Project
SjoerdMeijer commandeered D93762: SCCP: Refactor SCCPSolver.

Taking over this work.

Wed, Mar 31, 3:48 AM · Restricted Project
SjoerdMeijer updated the diff for D99586: [AArch64] Default to zero-cycle-zeroing FP registers..

This sets FeatureNoZCZeroingFP for some older cores.

Wed, Mar 31, 1:18 AM · Restricted Project

Tue, Mar 30

SjoerdMeijer updated the summary of D99586: [AArch64] Default to zero-cycle-zeroing FP registers..
Tue, Mar 30, 12:01 PM · Restricted Project
SjoerdMeijer added a comment to D99586: [AArch64] Default to zero-cycle-zeroing FP registers..

Fair enough, let's refrain from micro-architectural details. But the point is that zero-cost zeroing idioms are supported on integer operations, which is why this is preferred. This should always give the same or better performance, but it looks like you found a bit of a corner case with dual issuing, which is somewhat surprising but perhaps makes some sense for smaller in-order cores. I will add FeatureNoZCZeroingFP to the A55's description.

Tue, Mar 30, 11:59 AM · Restricted Project
SjoerdMeijer added a comment to D99588: [ARM] Allow v6m runtime loop unrolling.

Just a query on the context of this work: this wasn't enabled at that time because of some regressions. How does that look now? Does this work rely on some fixes to address that, or has the picture changed?

Tue, Mar 30, 6:38 AM · Restricted Project
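For readers less familiar with the transformation: runtime unrolling applies when the trip count is only known at run time. The loop body is replicated a fixed number of times and a remainder loop handles the leftover iterations. Below is a minimal hand-written C sketch of a sum loop unrolled by a factor of 4 with an epilogue remainder loop; the factor and the function names are illustrative and not necessarily what the ARM backend chooses for v6m.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: trip count n is only known at run time. */
int sum(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Runtime-unrolled by 4: the main loop runs n / 4 times with four
 * copies of the body, and an epilogue handles the n % 4 leftovers. */
int sum_unrolled4(const int *a, size_t n) {
    assert(n == 0 || a != NULL);
    int s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)  /* remainder (epilogue) loop */
        s += a[i];
    return s;
}
```

The usual trade-off is fewer compare-and-branch instructions per element against larger code size and more live values in the unrolled body.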
SjoerdMeijer retitled D99586: [AArch64] Default to zero-cycle-zeroing FP registers. from [AArch64] Default to zero-cycling-zeroing FP registers. to [AArch64] Default to zero-cycle-zeroing FP registers..
Tue, Mar 30, 6:14 AM · Restricted Project
SjoerdMeijer requested review of D99586: [AArch64] Default to zero-cycle-zeroing FP registers..
Tue, Mar 30, 6:13 AM · Restricted Project

Wed, Mar 24

SjoerdMeijer added a comment to D99272: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR..

Hi Stelios, many thanks for putting this together, good stuff.
I will do a code review a bit later, but as there's potential for some corner cases here, first a testing question: did you do a bootstrap build and, e.g., run the LLVM test suite?

Wed, Mar 24, 9:09 AM · Restricted Project
SjoerdMeijer added inline comments to D99252: [LoopFlatten] Fix invalid assertion (PR49571).
Wed, Mar 24, 3:12 AM · Restricted Project
SjoerdMeijer accepted D99252: [LoopFlatten] Fix invalid assertion (PR49571).

Yep, agreed, seems reasonable. Thanks for fixing.

Wed, Mar 24, 3:05 AM · Restricted Project
SjoerdMeijer accepted D99251: [NFC] Remove redundant `struct` prefix.

I guess that must have been the C89 programmer in us... ;-)

Wed, Mar 24, 2:56 AM · Restricted Project
SjoerdMeijer accepted D99174: [ARM] Enable UpperBound unrolling for all loops.
Wed, Mar 24, 1:55 AM · Restricted Project

Tue, Mar 23

SjoerdMeijer added inline comments to D99174: [ARM] Enable UpperBound unrolling for all loops.
Tue, Mar 23, 1:38 PM · Restricted Project
SjoerdMeijer added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

Sorry for the delayed response. Please look at the latest changes, which contain the cost model. I have also shared the SPEC CPU 2017 benchmark numbers for the various optimization modes we have added.

Tue, Mar 23, 1:00 PM · Restricted Project
SjoerdMeijer added inline comments to D99174: [ARM] Enable UpperBound unrolling for all loops.
Tue, Mar 23, 5:02 AM · Restricted Project

Mon, Mar 22

SjoerdMeijer committed rG7515e81e8c58: [AArch64] Add some float -> int -> float conversion patterns (authored by SjoerdMeijer).
[AArch64] Add some float -> int -> float conversion patterns
Mon, Mar 22, 4:06 AM
SjoerdMeijer closed D98956: [AArch64] Add some float -> int -> float conversion patterns.
Mon, Mar 22, 4:06 AM · Restricted Project
SjoerdMeijer updated the diff for D98956: [AArch64] Add some float -> int -> float conversion patterns.

Thanks Dave, I forgot about the unsigned variants, but have now added them, as well as the predicates.

Mon, Mar 22, 2:56 AM · Restricted Project
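As an illustration of the source pattern behind these conversion patterns, the C functions below round-trip a value through a signed and an unsigned integer. The assumption here (not spelled out in the log) is that the gain comes from keeping the value in FP/SIMD registers, e.g. using the scalar SIMD-register forms of `fcvtzs`/`scvtf` (and `fcvtzu`/`ucvtf` for the unsigned variants) instead of moving it to a general-purpose register and back. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* float -> int32 -> float: truncates toward zero. The truncated value
 * must fit in int32_t, otherwise the conversion is undefined in C. */
float roundtrip_s32(float f) {
    return (float)(int32_t)f;
}

/* Unsigned variant: the truncated value must fit in uint32_t,
 * i.e. f must lie in (-1, 2^32). */
float roundtrip_u32(float f) {
    assert(f > -1.0f && f < 4294967296.0f);
    return (float)(uint32_t)f;
}

/* The same pattern for double with a 64-bit integer. */
double roundtrip_s64(double d) {
    return (double)(int64_t)d;
}
```

In all three cases the net effect is "truncate toward zero, stay in floating point", which is why the round trip is a natural candidate for a register-class-preserving pattern.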

Fri, Mar 19

SjoerdMeijer requested review of D98956: [AArch64] Add some float -> int -> float conversion patterns.
Fri, Mar 19, 8:35 AM · Restricted Project

Thu, Mar 18

SjoerdMeijer committed rG90ecb862a003: [AArch64] Rewrite (add, csel) to cinc (authored by SjoerdMeijer).
[AArch64] Rewrite (add, csel) to cinc
Thu, Mar 18, 1:50 AM
SjoerdMeijer closed D98704: [AArch64] Rewrite (add, csel) to cinc.
Thu, Mar 18, 1:49 AM · Restricted Project
SjoerdMeijer added a comment to D98704: [AArch64] Rewrite (add, csel) to cinc.

Yeah, thanks Dave, I think I will be looking a bit more in this area, but this is a start...

Thu, Mar 18, 1:49 AM · Restricted Project

Wed, Mar 17

SjoerdMeijer added a comment to D98704: [AArch64] Rewrite (add, csel) to cinc.

This now generates cinc, which is even better.

Wed, Mar 17, 9:29 AM · Restricted Project
SjoerdMeijer updated the diff for D98704: [AArch64] Rewrite (add, csel) to cinc.
Wed, Mar 17, 9:23 AM · Restricted Project
SjoerdMeijer added a comment to D98704: [AArch64] Rewrite (add, csel) to cinc.

Thanks for the suggestion, Dave. I just did the pen-and-paper exercise and agree that:

Wed, Mar 17, 2:15 AM · Restricted Project
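The simplest instance of the equivalence can be checked in plain C: adding the result of a select between 1 and 0 is the same as a conditional increment, which is what AArch64's `cinc xd, xn, cond` computes (`xd = cond ? xn + 1 : xn`). This is a minimal sketch with illustrative function names; whether the compiler emits `cinc` for this exact source depends on the surrounding code.

```c
#include <assert.h>
#include <stdbool.h>

/* Source pattern: a select (csel/cset) feeding an add. */
long add_csel(long x, bool c) {
    long t = c ? 1 : 0;  /* csel / cset */
    return x + t;        /* add */
}

/* Rewritten form: a single conditional increment, matching the
 * semantics of "cinc xd, xn, cond". */
long cinc(long x, bool c) {
    return c ? x + 1 : x;
}

/* Compare both forms over a small range of inputs. */
void check_equivalence(void) {
    for (long x = -4; x <= 4; x++) {
        assert(add_csel(x, true) == cinc(x, true));
        assert(add_csel(x, false) == cinc(x, false));
    }
}
```

The rewrite saves an instruction: the select plus add becomes one conditional increment.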

Tue, Mar 16

SjoerdMeijer accepted D98243: [LV] Account for the cost of predication of scalarized load/store.

This looks very reasonable to me.

Tue, Mar 16, 6:57 AM · Restricted Project
SjoerdMeijer accepted D98210: [ARM] Add VREV MVE shuffle costs.

Looks reasonable to me. Perhaps wait a day in case @RKSimon has more comments.

Tue, Mar 16, 6:42 AM · Restricted Project
SjoerdMeijer accepted D98245: [ARM] Tone down the MVE scalarization overhead.

And the i64 costs go up because they are not legal ops? LGTM.

Tue, Mar 16, 6:28 AM · Restricted Project
SjoerdMeijer updated the summary of D98704: [AArch64] Rewrite (add, csel) to cinc.
Tue, Mar 16, 6:14 AM · Restricted Project
SjoerdMeijer requested review of D98704: [AArch64] Rewrite (add, csel) to cinc.
Tue, Mar 16, 6:13 AM · Restricted Project
SjoerdMeijer accepted D98693: [ARM] Use lrdsb for more thumb1 loads..

Okay, it's a bit of an indirect way, but fair enough I think.

Tue, Mar 16, 5:50 AM · Restricted Project
SjoerdMeijer added a comment to D98693: [ARM] Use lrdsb for more thumb1 loads..

Makes sense to me, but I was just wondering whether you have seen any regressions. If the constant is hoisted, could it contribute to higher register pressure and spilling?

Tue, Mar 16, 4:12 AM · Restricted Project

Mon, Mar 15

SjoerdMeijer accepted D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.

Thanks, nice one, LGTM.

Mon, Mar 15, 7:00 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.
Mon, Mar 15, 2:23 AM · Restricted Project, Restricted Project

Mar 12 2021

SjoerdMeijer added inline comments to D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.
Mar 12 2021, 2:30 AM · Restricted Project, Restricted Project

Mar 11 2021

SjoerdMeijer added inline comments to D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.
Mar 11 2021, 1:08 AM · Restricted Project, Restricted Project
SjoerdMeijer accepted D97729: [ARM] Improve WLS lowering.

Thanks, that definitely helped.
LGTM

Mar 11 2021, 1:05 AM · Restricted Project

Mar 10 2021

SjoerdMeijer added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

I have also become interested in this work, as this regularly shows up as a difference between Clang and GCC. For the first attempt to add a function specialisation pass, in D36432, questions were very rightfully asked about compile times and code size, and these questions were repeated for this work. The table at https://reviews.llvm.org/D36432#836883 shows good numbers, and I think that in order to progress this work we need something similar, i.e., the approach here is a bit different and things may have changed. But this is also mentioned in the description:

Mar 10 2021, 9:08 AM · Restricted Project
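For context, function specialization clones a function for a call site whose argument is a known constant (here a function pointer), so the clone can be optimized independently, e.g. by turning the indirect call into a direct, inlinable one. The sketch below does by hand what such a pass would do automatically; `apply`, `apply_spec_inc`, `inc` and `caller` are hypothetical names. Each clone is an extra copy of the function body, which is where the code-size and compile-time questions come from.

```c
#include <assert.h>
#include <stddef.h>

int inc(int v) { return v + 1; }

/* Generic function: op is only known at run time, so each
 * iteration performs an indirect call. */
int apply(const int *a, size_t n, int (*op)(int)) {
    assert(n == 0 || a != NULL);
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += op(a[i]);
    return s;
}

/* Specialized clone for call sites passing op == inc: the constant
 * argument is propagated and the call is inlined. */
int apply_spec_inc(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] + 1;
    return s;
}

int caller(const int *a, size_t n) {
    /* Before specialization this was: return apply(a, n, inc); */
    return apply_spec_inc(a, n);
}
```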
SjoerdMeijer added inline comments to D97729: [ARM] Improve WLS lowering.
Mar 10 2021, 5:51 AM · Restricted Project
SjoerdMeijer accepted D98269: [AArch64] Add missing intrinsics for scalar fp rounding.

Thanks, looks very reasonable to me.

Mar 10 2021, 5:00 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.
Mar 10 2021, 2:42 AM · Restricted Project, Restricted Project

Mar 9 2021

SjoerdMeijer added inline comments to D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.
Mar 9 2021, 10:41 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D98269: [AArch64] Add missing intrinsics for scalar fp rounding.
Mar 9 2021, 9:24 AM · Restricted Project, Restricted Project
SjoerdMeijer added a reviewer for D98269: [AArch64] Add missing intrinsics for scalar fp rounding: dmgreen.
Mar 9 2021, 9:16 AM · Restricted Project, Restricted Project
SjoerdMeijer added a reviewer for D98230: [LSR] Add reconciliation of unfoldable offsets: dmgreen.
Mar 9 2021, 9:13 AM · Restricted Project
SjoerdMeijer added inline comments to D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.
Mar 9 2021, 9:09 AM · Restricted Project, Restricted Project
SjoerdMeijer added inline comments to D98264: [AArch64] Implement __rndr, __rndrrs intrinsics.
Mar 9 2021, 8:23 AM · Restricted Project, Restricted Project

Mar 5 2021

SjoerdMeijer added a comment to D98012: [RFC][doc] Document that RISC-V's __fp16 has different behavior.

However, we would like slightly different behavior for fp16 than ACLE specifies: the evaluation format of fp16 is set to be the same as _Float16, which means no promotion is performed if there is no hardware half-precision support.

Mar 5 2021, 3:47 AM · Restricted Project
SjoerdMeijer added a comment to rG9b302513f6d8: [AArch64] Add missing intrinsics for vrnd.

This is fixing https://bugs.llvm.org/show_bug.cgi?id=47829

Mar 5 2021, 3:36 AM
SjoerdMeijer accepted D97949: [AArch64] Add missing intrinsics for vrnd.

Yeah, that's fine; if they are missing, we could do that separately.

Mar 5 2021, 3:19 AM · Restricted Project
SjoerdMeijer added a comment to D97949: [AArch64] Add missing intrinsics for vrnd.

Sorry, one more question. This implements the vector variants. How about the scalar ones? Have they been implemented already?

Mar 5 2021, 2:51 AM · Restricted Project
SjoerdMeijer added inline comments to D97949: [AArch64] Add missing intrinsics for vrnd.
Mar 5 2021, 2:22 AM · Restricted Project

Mar 4 2021

SjoerdMeijer added inline comments to D97949: [AArch64] Add missing intrinsics for vrnd.
Mar 4 2021, 11:48 AM · Restricted Project
SjoerdMeijer added inline comments to D97949: [AArch64] Add missing intrinsics for vrnd.
Mar 4 2021, 9:48 AM · Restricted Project
SjoerdMeijer added inline comments to D97949: [AArch64] Add missing intrinsics for vrnd.
Mar 4 2021, 9:20 AM · Restricted Project
SjoerdMeijer added inline comments to D97949: [AArch64] Add missing intrinsics for vrnd.
Mar 4 2021, 9:18 AM · Restricted Project
SjoerdMeijer added inline comments to D97947: [AArch64] Force runtime unrolling for in-order scheduling models.
Mar 4 2021, 6:50 AM · Restricted Project
SjoerdMeijer accepted D97280: [AArch64] Extend vecreduce -> udot handling to mla reductions.

Looks okay to me.

Mar 4 2021, 1:02 AM · Restricted Project

Mar 3 2021

SjoerdMeijer accepted D97775: [AArch64] Add missing intrinsics for vcls.

Thanks, LGTM.

Mar 3 2021, 1:56 AM · Restricted Project

Mar 2 2021

SjoerdMeijer added a comment to D97775: [AArch64] Add missing intrinsics for vcls.

Ah, Dave's remark is a good one. These intrinsics are available in both AArch64 and ARM, and I missed that you covered only ARM here; Dave meant to add these tests for AArch64 too.

Mar 2 2021, 11:58 AM · Restricted Project
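For reference, `vcls` counts leading sign bits per lane: the number of consecutive bits directly below the most significant bit that are equal to it (the sign bit itself is not counted). A scalar C model of one 32-bit lane could look like the following; `cls32` is a hypothetical helper, not an ACLE intrinsic, but values like these are handy as expected results when writing tests for both the ARM and AArch64 variants.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading sign bits of a 32-bit value: the number of bits
 * directly below the sign bit that are equal to the sign bit. */
int cls32(int32_t x) {
    uint32_t u = (uint32_t)x;
    uint32_t sign = u >> 31;
    int count = 0;
    for (int bit = 30; bit >= 0; bit--) {
        if (((u >> bit) & 1u) != sign)
            break;
        count++;
    }
    return count;
}
```

For example, both 0 and -1 give 31, 1 gives 30, and INT32_MIN gives 0.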
SjoerdMeijer added a reviewer for D97775: [AArch64] Add missing intrinsics for vcls: dmgreen.

This is the Clang part. Do we need to add anything for LLVM, for example tests?

Mar 2 2021, 8:30 AM · Restricted Project
SjoerdMeijer added inline comments to D97729: [ARM] Improve WLS lowering.
Mar 2 2021, 3:18 AM · Restricted Project
SjoerdMeijer accepted D97759: [doc] Fix description of _Float16.

Yep, correct.

Mar 2 2021, 2:29 AM · Restricted Project, Restricted Project

Mar 1 2021

SjoerdMeijer abandoned D89894: [AArch64] Backedge indexing.

I am abandoning this in favour of D89693, which I have repurposed to address this, because most of the discussions happened there.

Mar 1 2021, 8:36 AM · Restricted Project
SjoerdMeijer commandeered D89894: [AArch64] Backedge indexing.
Mar 1 2021, 8:35 AM · Restricted Project
SjoerdMeijer updated the diff for D89693: [AArch64] Favor pre-increments and implement TTI::getPreferredAddressingMode.

This is changing our approach to preferring pre-indexed addressing modes, because:

  • this is what we want for runtime unrolled loops, which is what we want to address next,
  • post-indexed is currently better for some cases, but that's mainly because we miss an opportunity in the load/store optimiser. With that fixed, the expectation is that pre-indexed gives the same or better perf than post-indexed,
  • for what it is worth, pre-indexed is also the default for ARM.
Mar 1 2021, 8:34 AM · Restricted Project
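To make the distinction concrete: with pre-indexed addressing (`ldr x0, [x1, #8]!`) the base register is updated first and the load uses the new address, whereas with post-indexed addressing (`ldr x0, [x1], #8`) the load uses the old address and the base is updated afterwards. The C model below is a sketch under the assumption that memory is an array and the base register is an index into it (offsets counted in 8-byte elements); all names are hypothetical. Both forms walk the same data, but the pre-indexed loop has to start its base one element early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-indexed, like "ldr x0, [x1, #8]!": update the base first,
 * then load from the updated address. */
int64_t ldr_pre(const int64_t *mem, ptrdiff_t *base, ptrdiff_t off) {
    *base += off;
    return mem[*base];
}

/* Post-indexed, like "ldr x0, [x1], #8": load from the current
 * address, then update the base. */
int64_t ldr_post(const int64_t *mem, ptrdiff_t *base, ptrdiff_t off) {
    int64_t v = mem[*base];
    *base += off;
    return v;
}

/* Same loop with pre-indexed loads: the base starts one element
 * early so that the first update lands on element 0. */
int64_t sum_pre(const int64_t *mem, ptrdiff_t n) {
    assert(n >= 0);
    int64_t s = 0;
    ptrdiff_t base = -1;
    for (ptrdiff_t i = 0; i < n; i++)
        s += ldr_pre(mem, &base, 1);
    return s;
}

/* Same loop with post-indexed loads: the base starts at element 0. */
int64_t sum_post(const int64_t *mem, ptrdiff_t n) {
    assert(n >= 0);
    int64_t s = 0;
    ptrdiff_t base = 0;
    for (ptrdiff_t i = 0; i < n; i++)
        s += ldr_post(mem, &base, 1);
    return s;
}
```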

Feb 25 2021

SjoerdMeijer added a comment to D89693: [AArch64] Favor pre-increments and implement TTI::getPreferredAddressingMode.

Thanks for commenting Dave. I will have another go at this, and try to come up with a better analysis, at least one we understand.

Feb 25 2021, 4:58 AM · Restricted Project

Feb 23 2021

SjoerdMeijer added a comment to D82692: [ARM] do not consider sp as deprecated for ldm/stm.

Sorry, I missed the earlier message, but this has now been committed.

Feb 23 2021, 5:30 AM · Restricted Project
SjoerdMeijer committed rGe1c3bf6afe09: [ARM] do not consider sp as deprecated for ldm/stm (authored by SjoerdMeijer).
[ARM] do not consider sp as deprecated for ldm/stm
Feb 23 2021, 5:27 AM
SjoerdMeijer closed D82692: [ARM] do not consider sp as deprecated for ldm/stm.
Feb 23 2021, 5:27 AM · Restricted Project

Feb 22 2021

SjoerdMeijer added inline comments to D97188: [AArch64] Add patterns for add(udot(0, x, y), z) -> udot(z, x, y)..
Feb 22 2021, 5:33 AM · Restricted Project
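The identity behind these patterns: `udot` adds, into each 32-bit lane, the sum of four unsigned 8-bit products to that lane's existing accumulator value. Starting from a zero accumulator and adding `z` afterwards therefore gives the same result as using `z` as the accumulator directly (modulo 2^32 in both cases). The scalar C model below assumes a 128-bit vector with four 32-bit lanes and ignores register-level details; `udot_model` and `udot_then_add` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* 128-bit vector: 4 x 32-bit accumulator lanes */

/* Scalar model of vector UDOT: for each 32-bit lane, add the dot
 * product of the corresponding four u8 elements of x and y to acc. */
void udot_model(uint32_t out[LANES], const uint32_t acc[LANES],
                const uint8_t x[4 * LANES], const uint8_t y[4 * LANES]) {
    for (int l = 0; l < LANES; l++) {
        uint32_t sum = acc[l];
        for (int k = 0; k < 4; k++)
            sum += (uint32_t)x[4 * l + k] * y[4 * l + k];
        out[l] = sum;
    }
}

/* The unfolded form: add(udot(0, x, y), z). */
void udot_then_add(uint32_t out[LANES], const uint32_t z[LANES],
                   const uint8_t x[4 * LANES], const uint8_t y[4 * LANES]) {
    const uint32_t zero[LANES] = {0};
    udot_model(out, zero, x, y);
    for (int l = 0; l < LANES; l++)
        out[l] += z[l];
}
```

Folding `z` into the accumulator removes the separate vector add.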
SjoerdMeijer updated the diff for D89693: [AArch64] Favor pre-increments and implement TTI::getPreferredAddressingMode.

This abandons the idea of looking at the IV, and incorporates D89894 to look at the pointer uses.

Feb 22 2021, 12:41 AM · Restricted Project

Feb 19 2021

SjoerdMeijer added inline comments to D89693: [AArch64] Favor pre-increments and implement TTI::getPreferredAddressingMode.
Feb 19 2021, 9:41 AM · Restricted Project
SjoerdMeijer added inline comments to D89693: [AArch64] Favor pre-increments and implement TTI::getPreferredAddressingMode.
Feb 19 2021, 8:46 AM · Restricted Project
SjoerdMeijer requested review of D97050: [LoopInfo] Look through trunc instructions.
Feb 19 2021, 7:35 AM · Restricted Project