Page MenuHomePhabricator

NickGuy (Nicholas Guy)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 23 2020, 7:14 AM (100 w, 3 d)

Recent Activity

Mon, May 23

NickGuy accepted D126118: [ARM] Fix vcvtb/t.f16 input liveness.

LGTM

Mon, May 23, 2:35 AM · Restricted Project, Restricted Project

Apr 8 2022

NickGuy added inline comments to D114174: [ARM][CodeGen] Add support for complex addition and multiplication.
Apr 8 2022, 7:24 AM · Restricted Project, Restricted Project
NickGuy updated the diff for D114174: [ARM][CodeGen] Add support for complex addition and multiplication.
Apr 8 2022, 7:24 AM · Restricted Project, Restricted Project

Apr 4 2022

NickGuy accepted D123012: [AArch64] Alter mull buildvectors(ext(..)) combine to work on shuffles.

LGTM

Apr 4 2022, 5:20 AM · Restricted Project, Restricted Project
NickGuy added a comment to D123012: [AArch64] Alter mull buildvectors(ext(..)) combine to work on shuffles.

Code looks good to me, but a small nitpick on the function name

Apr 4 2022, 12:43 AM · Restricted Project, Restricted Project

Mar 31 2022

NickGuy committed rG7d676714fbf2: [AArch64] Set MaxBytesForLoopAlignment for more targets (authored by NickGuy).
[AArch64] Set MaxBytesForLoopAlignment for more targets
Mar 31 2022, 3:37 AM · Restricted Project, Restricted Project
NickGuy closed D122566: [AArch64] Set MaxBytesForLoopAlignment for more targets.
Mar 31 2022, 3:37 AM · Restricted Project, Restricted Project

Mar 29 2022

NickGuy added inline comments to D122566: [AArch64] Set MaxBytesForLoopAlignment for more targets.
Mar 29 2022, 9:24 AM · Restricted Project, Restricted Project
NickGuy updated the diff for D122566: [AArch64] Set MaxBytesForLoopAlignment for more targets.
Mar 29 2022, 9:23 AM · Restricted Project, Restricted Project

Mar 28 2022

NickGuy updated the diff for D122566: [AArch64] Set MaxBytesForLoopAlignment for more targets.
Mar 28 2022, 6:45 AM · Restricted Project, Restricted Project
NickGuy requested review of D122566: [AArch64] Set MaxBytesForLoopAlignment for more targets.
Mar 28 2022, 2:29 AM · Restricted Project, Restricted Project

Mar 21 2022

NickGuy updated the diff for D114174: [ARM][CodeGen] Add support for complex addition and multiplication.

It's been longer than I would've liked before I've gotten back to this, but I've;

Mar 21 2022, 6:55 AM · Restricted Project, Restricted Project

Feb 17 2022

NickGuy accepted D120018: [AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors.

I wasn't aware of the implicit truncation. In that case, LGTM

Feb 17 2022, 5:53 AM · Restricted Project
NickGuy added a comment to D120018: [AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors.

Looks good overall, with potential wins in both performance and codesize. Just 1 minor thing (and a nitpick about a comment).

Feb 17 2022, 2:36 AM · Restricted Project

Feb 10 2022

NickGuy added a comment to D114174: [ARM][CodeGen] Add support for complex addition and multiplication.

Thanks all for the comments so far (And thanks Dave for taking on the evening shift, as it were)

Feb 10 2022, 2:18 AM · Restricted Project, Restricted Project

Feb 9 2022

NickGuy added a comment to D114174: [ARM][CodeGen] Add support for complex addition and multiplication.

...
I'm not exactly sure why to do target specific pattern matching here. We could simply add generic complex intrinsics and the different patterns could be matched at each archs ISEL, no? I do agree that it should check in the backend if it should generate a complex operation of a given vector type.
...
Is there any strong reason to create target specific intrinsics instead of having generic intrinsics with polymorphism? Such as llvm.complex.add.v2f32?

Feb 9 2022, 3:31 AM · Restricted Project, Restricted Project

Feb 8 2022

NickGuy added a comment to D114174: [ARM][CodeGen] Add support for complex addition and multiplication.

The supportsComplexArithmetic method doesn't add a lot on its own. It would probably be better to represent it in terms of asking the backend if it has certain patterns of certain types available. An f32 fcadd, for example, or a i16 fcmul_with_rotate (I'm not sure what to name them).
The ComplexArithmeticGraph seems like internal details to the pass. It would be best not to pass it over the TTI interface boundary if we can. The backend just needs to create the correct intrinsics of the correct type.
Equally I would keep all the matching logic in the pass, not through matchComplexArithmeticIR. That would be similar to, for example, the MachineCombiner pass which has its own enum of a set of patterns.

Unfortunately, not all architectures/intrinsics follow the same patterns, so this was done to provide a way for architecture-specific stuff to be handled as needed. The MVE halving complex add (which I only now realise was included in this patch) is a good example of this; It only works on integers, and requires some extra instructions to be matched. This instruction is only in MVE, it is not part of NEON in any way.
The intrinsics that are common are also constructed differently; While the NEON intrinsics are separated out by their rotations (aarch64_neon_vcmla_rot0, aarch64_neon_vcmla_rot90), MVE squishes them into one and determines the rotation based on a parameter.
ComplexArithmeticGraph is used to encapsulate all the data the TTI might need in order to correctly generate the right intrinsic. It could be replaced by something a bit lighter (and that doesn't expose the implementation detail of the node list), but something will be needed to provide this info.

Feb 8 2022, 2:23 AM · Restricted Project, Restricted Project

Feb 7 2022

NickGuy updated the diff for D114174: [ARM][CodeGen] Add support for complex addition and multiplication.

Applied feedback received offline, redesigned the interface/implementation, and fixed up a number of fundamental issues in the previous iteration.

Feb 7 2022, 5:33 AM · Restricted Project, Restricted Project

Jan 5 2022

NickGuy committed rG13992498cd96: [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops (authored by NickGuy).
[AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops
Jan 5 2022, 4:54 AM
NickGuy committed rG73d92faa2fc0: [CodeGen] Emit alignment "Max Skip" operand (authored by NickGuy).
[CodeGen] Emit alignment "Max Skip" operand
Jan 5 2022, 4:54 AM
NickGuy closed D114879: [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops.
Jan 5 2022, 4:54 AM · Restricted Project
NickGuy closed D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.
Jan 5 2022, 4:54 AM · Restricted Project

Dec 15 2021

NickGuy updated the diff for D114879: [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops.
Dec 15 2021, 3:37 AM · Restricted Project
NickGuy updated the diff for D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.
Dec 15 2021, 3:36 AM · Restricted Project

Dec 13 2021

NickGuy updated the diff for D114879: [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops.
Dec 13 2021, 3:57 AM · Restricted Project
NickGuy updated the diff for D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.

Fixes the test to pass with this patch in isolation (i.e. without the child revision on top)

Dec 13 2021, 3:53 AM · Restricted Project

Dec 9 2021

NickGuy updated the diff for D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.
Dec 9 2021, 6:37 AM · Restricted Project

Dec 7 2021

NickGuy updated the diff for D114879: [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops.
Dec 7 2021, 2:34 AM · Restricted Project

Dec 3 2021

NickGuy added inline comments to D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.
Dec 3 2021, 6:37 AM · Restricted Project
NickGuy updated the diff for D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.

It may be better to fold this into D114879 so that there is a single patch that can be tested properly.

Dec 3 2021, 6:36 AM · Restricted Project

Dec 1 2021

NickGuy added a comment to D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.

Okay, that's fine. Perhaps upload your WIP patches too then, stack them, so that we can actually see the (full) approach?

Dec 1 2021, 8:09 AM · Restricted Project
NickGuy requested review of D114879: [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops.
Dec 1 2021, 7:51 AM · Restricted Project
NickGuy updated the diff for D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.
Dec 1 2021, 7:48 AM · Restricted Project
NickGuy added a comment to D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.

I've got a follow-up patch nearly ready with tests, yep. I want to settle on an approach here before implementing the other side of it.

Dec 1 2021, 3:07 AM · Restricted Project
NickGuy updated the diff for D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.
Dec 1 2021, 2:58 AM · Restricted Project

Nov 25 2021

NickGuy added a comment to D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.

Do we really want to add this to the base Align class? Or should it just be part of whatever controls the code alignment?

Nov 25 2021, 6:57 AM · Restricted Project
NickGuy retitled D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives from [CodeGen] Emit alignment "Max Skip" operand to [CodeGen] Emit alignment "Max Skip" operand for align directives.
Nov 25 2021, 6:39 AM · Restricted Project
NickGuy requested review of D114590: [CodeGen] Emit alignment "Max Skip" operand for align directives.
Nov 25 2021, 6:39 AM · Restricted Project

Nov 19 2021

NickGuy updated the diff for D114174: [ARM][CodeGen] Add support for complex addition and multiplication.

Nice one Nick.
Can you first run clang-format on this patch? It is pretty much unreadable with all these lint messages.

Nov 19 2021, 2:29 AM · Restricted Project, Restricted Project

Nov 18 2021

NickGuy requested review of D114174: [ARM][CodeGen] Add support for complex addition and multiplication.
Nov 18 2021, 10:29 AM · Restricted Project, Restricted Project

Sep 28 2021

NickGuy accepted D108766: [AArch64] Model Cortex-A55 Q register NEON instructions.

LGTM

Sep 28 2021, 2:23 AM · Restricted Project

Sep 21 2021

NickGuy committed rG9e4d72675f47: [AArch64] Improve schedule modelling on the Cortex-A55 (authored by NickGuy).
[AArch64] Improve schedule modelling on the Cortex-A55
Sep 21 2021, 5:04 AM
NickGuy closed D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.
Sep 21 2021, 5:04 AM · Restricted Project

Sep 10 2021

NickGuy added inline comments to D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.
Sep 10 2021, 1:36 AM · Restricted Project
NickGuy updated the diff for D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.

Updated test

Sep 10 2021, 1:36 AM · Restricted Project

Sep 8 2021

NickGuy added a comment to D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.

OK. Can you add a test? Something that would schedule instructions apart before and keep them together now.

Sep 8 2021, 8:24 AM · Restricted Project
NickGuy updated the diff for D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.
Sep 8 2021, 8:24 AM · Restricted Project

Sep 6 2021

NickGuy added inline comments to D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.
Sep 6 2021, 8:23 AM · Restricted Project
NickGuy updated the diff for D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.
Sep 6 2021, 8:23 AM · Restricted Project
NickGuy requested review of D109323: [AArch64] Improve adrp schedule modelling on the Cortex-A55.
Sep 6 2021, 6:50 AM · Restricted Project

Aug 25 2021

NickGuy committed rG36fcf47fc80d: [AArch64] Generate SMOV in place of sext(fmov(...)) (authored by NickGuy).
[AArch64] Generate SMOV in place of sext(fmov(...))
Aug 25 2021, 7:24 AM
NickGuy closed D108633: [AArch64] Generate smov in place of sext(fmov(...)).
Aug 25 2021, 7:24 AM · Restricted Project
NickGuy updated the diff for D108633: [AArch64] Generate smov in place of sext(fmov(...)).

Addressed comments, adding additional test cases that cover cases with out-of-range indices.

Aug 25 2021, 3:22 AM · Restricted Project

Aug 24 2021

NickGuy requested review of D108633: [AArch64] Generate smov in place of sext(fmov(...)).
Aug 24 2021, 7:47 AM · Restricted Project

Aug 23 2021

NickGuy accepted D108287: [AArch64] Correct store ReadAdrBase operand.
Aug 23 2021, 2:39 AM · Restricted Project

Aug 10 2021

NickGuy accepted D107623: [AArch64] Correct sinking of shuffles to adds/subs.

LGTM

Aug 10 2021, 3:31 AM · Restricted Project

Jul 26 2021

NickGuy accepted D106532: [ARM] Attempt to distribute reductions.

Apart from a personal gripe (which I feel shouldn't stop this being merged), LGTM

Jul 26 2021, 8:44 AM · Restricted Project
NickGuy accepted D106531: [ARM] Turn vecreduce_add(add(x, y)) into vecreduce(x) + vecreduce(y).

LGTM

Jul 26 2021, 8:35 AM · Restricted Project
NickGuy accepted D106797: [ARM] Implement isLoad/StoreFromStackSlot for MVE stack stores accesses.

LGTM

Jul 26 2021, 8:28 AM · Restricted Project

Jul 16 2021

NickGuy committed rG9769535efd56: [AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling (authored by NickGuy).
[AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling
Jul 16 2021, 4:03 AM
NickGuy closed D105882: [AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling.
Jul 16 2021, 4:03 AM · Restricted Project

Jul 13 2021

NickGuy requested review of D105882: [AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling.
Jul 13 2021, 2:48 AM · Restricted Project

Jul 9 2021

NickGuy accepted D105541: [AArch64] Set the latency of A55 stores to 1.

No issues that I can see, LGTM

Jul 9 2021, 7:34 AM · Restricted Project

Jun 4 2021

NickGuy committed rG3043cbc4363a: [AArch64] Further enable UnrollAndJam (authored by NickGuy).
[AArch64] Further enable UnrollAndJam
Jun 4 2021, 6:19 AM
NickGuy closed D103604: [AArch64] Further enable UnrollAndJam.
Jun 4 2021, 6:19 AM · Restricted Project

Jun 3 2021

NickGuy updated the diff for D103604: [AArch64] Further enable UnrollAndJam.
Jun 3 2021, 9:11 AM · Restricted Project
NickGuy requested review of D103604: [AArch64] Further enable UnrollAndJam.
Jun 3 2021, 3:09 AM · Restricted Project

Apr 27 2021

NickGuy committed rG2b6e0c90f981: [AArch64] Enable runtime unrolling for in-order sched models (authored by NickGuy).
[AArch64] Enable runtime unrolling for in-order sched models
Apr 27 2021, 5:22 AM
NickGuy closed D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.
Apr 27 2021, 5:22 AM · Restricted Project

Apr 23 2021

NickGuy updated the diff for D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.
Apr 23 2021, 3:59 AM · Restricted Project

Apr 21 2021

NickGuy added inline comments to D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.
Apr 21 2021, 7:26 AM · Restricted Project
NickGuy updated the diff for D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.

I was under the impression that without a -mcpu it defaulted to cortex-a53 schedule. It looks like it's no-schedule though, which still counts as an in-order core as it has no MicroOpBufferSize. Can we check if ST->getSchedModel().ProcID != 0, which will be the "NoSchedModel".

Apr 21 2021, 7:26 AM · Restricted Project

Apr 19 2021

NickGuy added a comment to D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.

So does this mean the new behavior will be the default if no CPU is specified? I'm not sure if we are ready for that yet, unless we are confident that the current heuristics work well for out-of-order cores too. (Last time I benchmarked this for out-of-order cores there were a few notable regressions, but it's been a few months since then)

Apr 19 2021, 9:07 AM · Restricted Project
NickGuy updated the diff for D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.

Can you remind me about the impact of this? I.e., if -mcpu is omitted, we default to generic which is classified, or is using, an in-order schedmodel description on Android?

Apr 19 2021, 8:01 AM · Restricted Project
NickGuy updated the diff for D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.

The very quick benchmarks I ran didn't show this to be great because of all that extra unrolling, mostly of remainder loops by the look of it. I'm sure you have been running more benchmarks, and perhaps the ones here are not very representative of general A64 code.

Apr 19 2021, 5:11 AM · Restricted Project

Mar 24 2021

NickGuy updated the diff for D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.

This will still unroll the remainders of vectorized loops, which will be quite common but generally unhelpful to unroll. Can we try to prevent unrolling there too?

It seems to be difficult to identify the remainder loop; Checking the llvm.loop.isvectorized attribute (like in ARMTargetTransformInfo) caused all gains to be negated, and caused a slight regression when compared to without this change. I've instead included a check for the llvm.loop.unroll.disable attribute, which was seen on the remainder loop IR. This check causes no difference to the benchmark numbers. (Though thinking about it now, that might be due to it being handled elsewhere, making this check redundant)

Mar 24 2021, 8:53 AM · Restricted Project

Mar 8 2021

NickGuy updated the diff for D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.

I've done some more benchmarks with the provided suggestions;

Mar 8 2021, 5:11 AM · Restricted Project

Mar 4 2021

NickGuy requested review of D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.
Mar 4 2021, 6:40 AM · Restricted Project

Feb 8 2021

NickGuy committed rGcd880442ae66: [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize… (authored by NickGuy).
[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize…
Feb 8 2021, 5:28 AM
NickGuy closed D95631: [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold.
Feb 8 2021, 5:28 AM · Restricted Project

Feb 5 2021

NickGuy updated the diff for D95631: [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold.

If this version is working that's great, but check it's still OK.

Feb 5 2021, 7:47 AM · Restricted Project

Feb 3 2021

NickGuy updated the diff for D95631: [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold.

just have a brief look if there are any meaningful codegen difference in e.g. TSVC/Searching-dbl/Searching-dbl.test.

Feb 3 2021, 7:32 AM · Restricted Project

Feb 2 2021

NickGuy updated the diff for D95631: [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold.

I don't think I would mind merging this with D95632 to keep things in one place; don't think separating brings much value in this case.

Feb 2 2021, 8:01 AM · Restricted Project
NickGuy abandoned D95632: [AArch64] Specify Tail Duplication Size Override.

Abandoning this and combining it with D95631.

Feb 2 2021, 8:01 AM · Restricted Project

Jan 28 2021

NickGuy requested review of D95632: [AArch64] Specify Tail Duplication Size Override.
Jan 28 2021, 9:57 AM · Restricted Project
NickGuy requested review of D95631: [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold.
Jan 28 2021, 9:57 AM · Restricted Project

Jan 18 2021

NickGuy committed rG16bf02c3a19d: Reland "[AArch64] Attempt to sink mul operands"" (authored by NickGuy).
Reland "[AArch64] Attempt to sink mul operands""
Jan 18 2021, 8:01 AM
NickGuy committed rGf5fcbe4e3c68: [AArch64] Further restricts when a dup(*ext) can be rearranged (authored by NickGuy).
[AArch64] Further restricts when a dup(*ext) can be rearranged
Jan 18 2021, 8:00 AM
NickGuy closed D94778: [AArch64] Further restricts when a dup(*ext) can be rearranged.
Jan 18 2021, 8:00 AM · Restricted Project
NickGuy updated the diff for D94778: [AArch64] Further restricts when a dup(*ext) can be rearranged.

Addressed comments

Jan 18 2021, 2:48 AM · Restricted Project

Jan 15 2021

NickGuy requested review of D94778: [AArch64] Further restricts when a dup(*ext) can be rearranged.
Jan 15 2021, 6:51 AM · Restricted Project

Jan 14 2021

NickGuy added a comment to D91271: [AArch64] Attempt to sink mul operands.

This broke compilation of some sources...

Jan 14 2021, 7:53 AM · Restricted Project

Jan 13 2021

NickGuy committed rGdda60035e9f0: [AArch64] Attempt to sink mul operands (authored by NickGuy).
[AArch64] Attempt to sink mul operands
Jan 13 2021, 7:24 AM
NickGuy closed D91271: [AArch64] Attempt to sink mul operands.
Jan 13 2021, 7:23 AM · Restricted Project
NickGuy updated the diff for D91271: [AArch64] Attempt to sink mul operands.

Addressed comments; modifying checks and updating tests.

Jan 13 2021, 6:00 AM · Restricted Project

Jan 11 2021

NickGuy updated the diff for D91271: [AArch64] Attempt to sink mul operands.
Jan 11 2021, 5:25 AM · Restricted Project

Jan 8 2021

NickGuy added inline comments to D94234: [AArch64] Fix crash caused by invalid vector element type.
Jan 8 2021, 4:08 AM · Restricted Project
NickGuy committed rGed23229a64ae: [AArch64] Fix crash caused by invalid vector element type (authored by NickGuy).
[AArch64] Fix crash caused by invalid vector element type
Jan 8 2021, 4:03 AM
NickGuy closed D94234: [AArch64] Fix crash caused by invalid vector element type.
Jan 8 2021, 4:03 AM · Restricted Project

Jan 7 2021

NickGuy updated the diff for D94234: [AArch64] Fix crash caused by invalid vector element type.

this needs a test?

Jan 7 2021, 8:47 AM · Restricted Project