
Apr 1 2021

dmgreen accepted D98232: [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration.

Thanks for the test update. Reluctantly, this LGTM.

Apr 1 2021, 6:44 AM · Restricted Project
dmgreen accepted D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.

Thanks. This LGTM, so long as the Apple folks here are happy with changing the instruction issued.

Apr 1 2021, 6:12 AM · Restricted Project
dmgreen added inline comments to D99699: [AArch64][SVE] Lowering sve.dot to DOT node.
Apr 1 2021, 4:33 AM · Restricted Project
dmgreen added a comment to D99710: [AArch64] Use 64-bit movi for zeroing halfs/floats.

I think the exact suggestion was to use MOVID instead. I'm not sure how much it matters, but it may be a simpler instruction for some cores. This would then match what GCC emits.

Apr 1 2021, 3:28 AM · Restricted Project
dmgreen added inline comments to D99649: [ARM] Updates to arm-block-placement pass.
Apr 1 2021, 3:12 AM · Restricted Project
dmgreen added a comment to D99588: [ARM] Allow v6m runtime loop unrolling.

I'm not sure exactly why T1 unrolling wasn't enabled in the past. I think it was causing more trouble than it was worth and, not being a focus at the time, was dropped fairly early. The extra tuning that was done for T2 after that would have helped T1 not regress too.

Apr 1 2021, 2:33 AM · Restricted Project
dmgreen accepted D99700: [LoopFlatten] Do not report CFG analyses as up-to-date.

Certainly sounds more correct. LGTM. Thanks for the patch.

Apr 1 2021, 12:19 AM · Restricted Project
dmgreen added inline comments to D99699: [AArch64][SVE] Lowering sve.dot to DOT node.
Apr 1 2021, 12:14 AM · Restricted Project

Mar 31 2021

dmgreen added inline comments to D99649: [ARM] Updates to arm-block-placement pass.
Mar 31 2021, 9:57 AM · Restricted Project
dmgreen added a comment to D98232: [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration.

Quick update on the code quality side: I ran spec2006, the llvm benchmarks, the eigen benchmarks, and a few others on x86 (FDO, thinlto) with this patch, with and without enabling consider-local-interval-cost. No significant real effect.

Mar 31 2021, 7:34 AM · Restricted Project
dmgreen added a reviewer for D99649: [ARM] Updates to arm-block-placement pass: samparker.
Mar 31 2021, 5:03 AM · Restricted Project
dmgreen removed a reviewer for D99649: [ARM] Updates to arm-block-placement pass: samparker.

Thanks for the patch. Looks like a nice improvement.

Mar 31 2021, 5:03 AM · Restricted Project

Mar 30 2021

dmgreen committed rG3a6365a439ed: [ARM] Add FeatureHasNoBranchPredictor for Thumb1 cores (authored by dmgreen).
[ARM] Add FeatureHasNoBranchPredictor for Thumb1 cores
Mar 30 2021, 1:46 PM
dmgreen added a comment to D99586: [AArch64] Default to zero-cycle-zeroing FP registers..

OK. I think I see what's wrong. According to the A55 software optimization guide, dual issue for a movi is a little more restrictive than for an fmov, which can lead to slower code. We would probably want to prefer the fmov there, and the same probably applies to other in-order CPUs.

Mar 30 2021, 10:15 AM · Restricted Project
dmgreen requested changes to D99586: [AArch64] Default to zero-cycle-zeroing FP registers..
Mar 30 2021, 9:45 AM · Restricted Project
dmgreen added a comment to D99586: [AArch64] Default to zero-cycle-zeroing FP registers..

What CPU is this expected to be better for? I don't buy the "int -> fp register transfer". I'm not going to pretend to know how cpus work internally, but there is no real register value it is transferring.

Mar 30 2021, 9:45 AM · Restricted Project
dmgreen accepted D99597: [test, ARM] Fix use of var defined in CHECK-NOT.

Sounds good.

Mar 30 2021, 8:10 AM · Restricted Project
dmgreen accepted D99591: [test, HardwareLoops] Fix use of var defined in CHECK-NOT.

SGTM. Thanks.

Mar 30 2021, 7:02 AM · Restricted Project
dmgreen requested review of D99588: [ARM] Allow v6m runtime loop unrolling.
Mar 30 2021, 6:30 AM · Restricted Project
dmgreen committed rGd4b3380dfe62: [ARM] Handle Splats in MVE lane interleaving (authored by dmgreen).
[ARM] Handle Splats in MVE lane interleaving
Mar 30 2021, 3:19 AM
dmgreen closed D97291: [ARM] Handle Splats in MVE lane interleaving.
Mar 30 2021, 3:19 AM · Restricted Project
dmgreen added a comment to D98232: [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration.

Yeah, I agree. 5% is too much to pay. https://reviews.llvm.org/D69437 measured this option as a 25% speed increase in something that was important enough to fix, with a 0.1-0.2% compile time effect. That's a very different question.

Mar 30 2021, 2:45 AM · Restricted Project

Mar 29 2021

dmgreen accepted D99437: [AArch64] Remove custom zext/sext legalization code..

Yeah. None of the tests I ran produced different code either. They appear to be lowered differently, but end up with identical codegen.

Mar 29 2021, 11:12 AM · Restricted Project
dmgreen accepted D99502: [InstructionCost] Don't conflate Invalid costs with Unknown costs..

I agree. "Invalid" is better than "Unknown".

Mar 29 2021, 10:50 AM · Restricted Project
dmgreen added a comment to D99437: [AArch64] Remove custom zext/sext legalization code..

I am not sure if you should directly check for i1 src types, because that would mean we miss other combinations that cause crashes in this function (e.g. sext <1 x i64> %x to <1 x i128>), which are caught by the legal type check. Alternatively, we could explicitly check for element types that are valid for vectors?

Mar 29 2021, 1:27 AM · Restricted Project
dmgreen added a comment to D97291: [ARM] Handle Splats in MVE lane interleaving.

ping

Mar 29 2021, 1:07 AM · Restricted Project
dmgreen committed rG3a68c6d26c94: [ARM] Extend MVE lane interleaving to handle other non-instruction leaves (authored by dmgreen).
[ARM] Extend MVE lane interleaving to handle other non-instruction leaves
Mar 29 2021, 1:06 AM
dmgreen closed D97289: [ARM] Extend MVE lane interleaving to handle other non-instruction leaves.
Mar 29 2021, 1:06 AM · Restricted Project

Mar 28 2021

dmgreen committed rG6c88ffeda31a: [ARM] Fix the Changed value in the MVE lane interleaving pass. (authored by dmgreen).
[ARM] Fix the Changed value in the MVE lane interleaving pass.
Mar 28 2021, 3:48 PM
dmgreen committed rG7b6f760fcd19: [ARM] MVE vector lane interleaving (authored by dmgreen).
[ARM] MVE vector lane interleaving
Mar 28 2021, 11:35 AM
dmgreen closed D95804: [ARM] MVE vector lane interleaving.
Mar 28 2021, 11:35 AM · Restricted Project
dmgreen added a comment to D98232: [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration.

This would be really bad for us, because rust effectively always uses O3, and we expect reasonable compile-time tradeoffs to be made for it as well.

Worth noting that D35816 discussed a number of alternatives to this, one being to handle this in machine copy propagation instead, which should be both much less complex and not have compile-time concerns. The cited disadvantage is that it would only work inside a block. From the examples I've seen long eviction chains inside a single BB seem to be the main problem, so maybe it would be worthwhile to go back to that option. I don't really have a good view on this topic though.

Mar 28 2021, 11:23 AM · Restricted Project
dmgreen added a comment to D99437: [AArch64] Remove custom zext/sext legalization code..

Should we just be bailing on i1 src types? Otherwise if someone adds an i2 type it would just start to fail again.

Mar 28 2021, 9:30 AM · Restricted Project

Mar 26 2021

dmgreen accepted D98963: [LoopVectorize] Change the identity element for FAdd.

Thanks, I'm happy with this. LGTM

Mar 26 2021, 7:28 AM · Restricted Project

Mar 25 2021

dmgreen added a comment to D98963: [LoopVectorize] Change the identity element for FAdd.

It would presumably need to be SROA that added flags to phi's it created? I'm not sure where it would get that info from though.

I'm not sure either. We might need to apply FMF to load, stores, and function args to fill the gap. Another option might be to back-propagate the FMF from the fadd to its phi operand:

%s.0 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
%add = fadd fast float %s.0, %0

Not sure if there's some corner-case I'm overlooking, but I'm imagining something like what instcombine does to fill-in / restore no-wrap flags (and so the FMF setting could also happen in instcombine).

Mar 25 2021, 2:02 PM · Restricted Project
dmgreen added a comment to D98435: [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization.

Thanks. This LGTM, if there are no other comments.

Mar 25 2021, 1:56 PM · Restricted Project
dmgreen committed rGd97189600e26: [ARM] Revert WhileLoopStartLR to DoLoopStart (authored by dmgreen).
[ARM] Revert WhileLoopStartLR to DoLoopStart
Mar 25 2021, 9:44 AM
dmgreen closed D98413: [ARM] Revert WhileLoopStartLR to DoLoopStart.
Mar 25 2021, 9:44 AM · Restricted Project
dmgreen added a comment to D98413: [ARM] Revert WhileLoopStartLR to DoLoopStart.

Thanks. I had forgotten about this one, which is a bit of a bad sign.

Mar 25 2021, 9:40 AM · Restricted Project
dmgreen added a comment to D98963: [LoopVectorize] Change the identity element for FAdd.

It would presumably need to be SROA that added flags to phi's it created? I'm not sure where it would get that info from though.

Mar 25 2021, 6:32 AM · Restricted Project
dmgreen added inline comments to D99272: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR..
Mar 25 2021, 5:23 AM · Restricted Project
dmgreen added a comment to D98232: [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration.

I'm worried that this comes up in a lot of places. Perhaps rare still, but important cases. The aarch64 example we have is just a matrix multiply, and is 25% slower with all the cascading spills, https://bugs.llvm.org/show_bug.cgi?id=26810 quotes the same. Like I said before though, the option didn't fix some examples of the same thing that we were seeing in ARM, so I'm not sure how reliably better it is.

Mar 25 2021, 5:18 AM · Restricted Project
dmgreen added a comment to D97947: [AArch64] Enable runtime unrolling for in-order scheduling models.

It seems to be difficult to identify the remainder loop. Checking the llvm.loop.isvectorized attribute (like in ARMTargetTransformInfo) caused all gains to be negated, and caused a slight regression compared to not having this change. I've instead included a check for the llvm.loop.unroll.disable attribute, which was seen on the remainder loop IR. This check causes no difference to the benchmark numbers. (Though thinking about it now, that might be due to it being handled elsewhere, making this check redundant.)

Mar 25 2021, 2:31 AM · Restricted Project
dmgreen added a comment to D98963: [LoopVectorize] Change the identity element for FAdd.

I think the original version was probably better, only doing this without nsz. A vector like <0.0, -0.0, -0.0, -0.0> is going to be more difficult to materialize than one that is all zeros, without an obvious way of converting it to all zeros. And we should try to not pessimize the existing -Ofast cases.

Mar 25 2021, 1:02 AM · Restricted Project
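
The signed-zero point in the D98963 comment above can be checked with ordinary IEEE-754 doubles. What follows is only an illustrative sketch in Python, not the vectorizer's code: `reduce_lanes` and its four-lane layout (start value in lane 0, identity in the remaining lanes) are hypothetical stand-ins chosen to mirror the `<0.0, -0.0, -0.0, -0.0>` vector mentioned in the comment.

```python
import math

def is_negative_zero(x: float) -> bool:
    """True only for -0.0; copysign exposes the sign bit of a zero."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0

def reduce_lanes(start: float, identity: float, lanes: int = 4) -> float:
    """Hypothetical model: lane 0 holds the start value, the other lanes hold
    the identity element, and the lanes are then added together in order."""
    vector = [start] + [identity] * (lanes - 1)
    total = vector[0]
    for lane in vector[1:]:
        total += lane
    return total

# +0.0 is not a true identity for fadd: it flips the sign of a -0.0 start.
plus_zero_result = reduce_lanes(start=-0.0, identity=0.0)
# -0.0 is: x + (-0.0) gives back x for every x, including x == -0.0.
minus_zero_result = reduce_lanes(start=-0.0, identity=-0.0)

print(is_negative_zero(plus_zero_result))   # False: the result is +0.0
print(is_negative_zero(minus_zero_result))  # True: the result is still -0.0
```

When the sign of zero is declared irrelevant (nsz), the two results are interchangeable, which is the case where an all-zeros start vector remains a reasonable choice.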

Mar 24 2021

dmgreen committed rG14b2ec934ed8: [ARM] Enable UpperBound unrolling for all loops (authored by dmgreen).
[ARM] Enable UpperBound unrolling for all loops
Mar 24 2021, 9:40 AM
dmgreen closed D99174: [ARM] Enable UpperBound unrolling for all loops.
Mar 24 2021, 9:40 AM · Restricted Project
dmgreen committed rGdc206be77b32: [ARM] Regenerate some test checks. NFC (authored by dmgreen).
[ARM] Regenerate some test checks. NFC
Mar 24 2021, 8:35 AM
dmgreen added a comment to D99149: [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass.

This sounds good to me, so long as all the existing tests are still doing OK.

Mar 24 2021, 2:54 AM · Restricted Project
dmgreen added inline comments to D98435: [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization.
Mar 24 2021, 1:15 AM · Restricted Project
dmgreen added inline comments to D99174: [ARM] Enable UpperBound unrolling for all loops.
Mar 24 2021, 12:35 AM · Restricted Project

Mar 23 2021

dmgreen added inline comments to D99174: [ARM] Enable UpperBound unrolling for all loops.
Mar 23 2021, 6:29 AM · Restricted Project
dmgreen committed rG003fab9e8d9b: [ARM] Additional Upper bound unrolling test. NFC (authored by dmgreen).
[ARM] Additional Upper bound unrolling test. NFC
Mar 23 2021, 5:01 AM
dmgreen requested review of D99174: [ARM] Enable UpperBound unrolling for all loops.
Mar 23 2021, 4:51 AM · Restricted Project
dmgreen accepted D99075: [ARM] Handle debug instrs in ARM Low Overhead Loop pass.

Thanks for the fix. LGTM.

Mar 23 2021, 4:19 AM · Restricted Project

Mar 22 2021

dmgreen added inline comments to D99075: [ARM] Handle debug instrs in ARM Low Overhead Loop pass.
Mar 22 2021, 1:32 PM · Restricted Project
dmgreen added inline comments to D99075: [ARM] Handle debug instrs in ARM Low Overhead Loop pass.
Mar 22 2021, 7:16 AM · Restricted Project
dmgreen accepted D98956: [AArch64] Add some float -> int -> float conversion patterns.

Thanks. LGTM

Mar 22 2021, 3:34 AM · Restricted Project
dmgreen added a comment to D98963: [LoopVectorize] Change the identity element for FAdd.

Hmm. Yeah I was thinking it would be OK to always generate -0.0. Does that start creating mixed vectors though?

Mar 22 2021, 1:41 AM · Restricted Project

Mar 21 2021

dmgreen committed rG6d9d2049c853: [ARM] VINS f16 pattern (authored by dmgreen).
[ARM] VINS f16 pattern
Mar 21 2021, 5:00 AM
dmgreen closed D95471: [ARM] VINS f16 pattern.
Mar 21 2021, 5:00 AM · Restricted Project

Mar 19 2021

dmgreen committed rGa2e0312cda40: [ARM] Tone down the MVE scalarization overhead (authored by dmgreen).
[ARM] Tone down the MVE scalarization overhead
Mar 19 2021, 11:30 AM
dmgreen closed D98245: [ARM] Tone down the MVE scalarization overhead.
Mar 19 2021, 11:30 AM · Restricted Project
dmgreen added a comment to D98956: [AArch64] Add some float -> int -> float conversion patterns.

Sounds good to me. Can we add unsigned variants too, to keep them symmetric?

Mar 19 2021, 9:44 AM · Restricted Project

Mar 17 2021

dmgreen committed rG35e0567d58c2: [ARM] Add VREV MVE shuffle costs (authored by dmgreen).
[ARM] Add VREV MVE shuffle costs
Mar 17 2021, 2:22 PM
dmgreen closed D98210: [ARM] Add VREV MVE shuffle costs.
Mar 17 2021, 2:22 PM · Restricted Project
dmgreen added inline comments to D98210: [ARM] Add VREV MVE shuffle costs.
Mar 17 2021, 12:19 PM · Restricted Project
dmgreen accepted D98704: [AArch64] Rewrite (add, csel) to cinc.

The lowering/optimization of CSEL/CSINC/etc feels a bit weak to me, so any improvements are good to see. Tablegen improvements especially, as (as far as I understand) they can help in both SelectionDAG and GlobalISel.

Mar 17 2021, 10:47 AM · Restricted Project
dmgreen committed rGe2935dcfc4c4: [TTI] Add a Mask to getShuffleCost (authored by dmgreen).
[TTI] Add a Mask to getShuffleCost
Mar 17 2021, 10:47 AM
dmgreen closed D98206: [TTI] Add a Mask to getShuffleCost.
Mar 17 2021, 10:46 AM · Restricted Project
dmgreen added a comment to D98206: [TTI] Add a Mask to getShuffleCost.

Thanks.

Mar 17 2021, 10:14 AM · Restricted Project
dmgreen committed rG402f2cae7dca: [ARM] Use lrdsb for more thumb1 loads. (authored by dmgreen).
[ARM] Use lrdsb for more thumb1 loads.
Mar 17 2021, 8:29 AM
dmgreen closed D98693: [ARM] Use lrdsb for more thumb1 loads..
Mar 17 2021, 8:29 AM · Restricted Project
dmgreen added a comment to D98206: [TTI] Add a Mask to getShuffleCost.

Ping. Any other comments?

Mar 17 2021, 8:04 AM · Restricted Project
dmgreen added reviewers for D94964: [LangRef] Describe memory layout for vectors types: nlopes, aqjune.
Mar 17 2021, 7:52 AM · Restricted Project
dmgreen added a comment to D91937: [ISel] Port AArch64 SABD and UABD to DAGCombine.

I still need to do something with D91921 before it can be used with MVE. I was working on that, but it was a bit slow going and other things had come up in the meantime. I was looking at it recently to try and improve trunc/extend lowering, but need to get some time to sort through that properly. There is also some stuff to do with the way that MVE wants to do lane interleaving that I thought might be an issue, but I'm pretty sure that will be fine.

Mar 17 2021, 7:51 AM · Restricted Project
dmgreen requested review of D98781: [AArch64] Enable UseAA globally in the AArch64 backend.
Mar 17 2021, 7:21 AM · Restricted Project
dmgreen accepted D98708: [LoopVectorize] relax FMF constraint for FP induction.

Hi @dmgreen, yes of course you're right. I'd forgotten about the nsz requirement. It's definitely needed at compile time for vectorising FP reduction loops, i.e. clang -fassociative-math -fno-trapping-math -fno-signed-zeros. I guess adding a check for nsz here is consistent with that?

Yes - clang derived its requirements from gcc, so we've passed that into the optimizer in some places (instcombine at least). I don't know of any practical examples where you could have FP reassociation and still guarantee sign-of-zero, but maybe I'm not being imaginative. :)
So currently there's no easy way (starting from C/C++ at least) to have IR that has reassoc without nsz.

Ok if I push this change, so we're consistent within the vectorizer? Then, I'll push a follow-up (we'll need a pile of new regression tests) to add the nsz requirement for both induction and reduction. That way, we'll be conservatively correct in requiring the extra flag, and we'll match the expected IR coming out of clang.

Note that the FMF requirements for fmul/fadd reduction/induction are different than the fmin/fmax patterns that we've also recently updated; fmin/fmax require nnan and nsz to rearrange, but not reassoc (since there's no FP math involved in those ops).

Mar 17 2021, 6:08 AM · Restricted Project
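
The remark above that fmin/fmax need nnan and nsz to be rearranged, but not reassoc, can be illustrated with a compare-and-select minimum. This is a minimal sketch under an assumption: `select_min` is a hypothetical helper modelling a "compare, then pick one operand" pattern, not an LLVM API. No arithmetic happens, so rounding never enters; only NaN and the sign of zero make operand order visible.

```python
import math

def select_min(a: float, b: float) -> float:
    """Hypothetical compare-and-select minimum: a if a < b, otherwise b."""
    return a if a < b else b

nan = float("nan")

# NaN: every ordered comparison with NaN is false, so the second operand wins.
nan_first = select_min(nan, 1.0)    # 1.0
nan_second = select_min(1.0, nan)   # nan

# Signed zero: -0.0 < 0.0 is false as well, so again the second operand wins.
zero_a = select_min(0.0, -0.0)      # -0.0
zero_b = select_min(-0.0, 0.0)      # +0.0

print(nan_first, math.isnan(nan_second))
print(math.copysign(1.0, zero_a), math.copysign(1.0, zero_b))
```

Swapping the operands changes the result only in these two cases, which is consistent with requiring exactly nnan and nsz before reordering such a reduction.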
dmgreen committed rG3c25c40d51e8: [LV] Account for the cost of predication of scalarized load/store (authored by dmgreen).
[LV] Account for the cost of predication of scalarized load/store
Mar 17 2021, 3:58 AM
dmgreen closed D98243: [LV] Account for the cost of predication of scalarized load/store.
Mar 17 2021, 3:58 AM · Restricted Project
dmgreen added a comment to D98708: [LoopVectorize] relax FMF constraint for FP induction.

If we want to make FMF constraints consistent across the IR optimizer, we might want to add nsz too, but that's up for debate (users can't expect associative FP math and preservation of sign-of-zero at the same time?).

Mar 17 2021, 2:58 AM · Restricted Project
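
For context on why reassociation is gated behind a fast-math flag at all, the sketch below compares a strict in-order sum with a two-lane sum of the same values. It is an illustration only: `two_lane_sum` is a hypothetical model of a 2-wide vectorized reduction (even elements in one lane, odd elements in the other), not what the vectorizer actually emits.

```python
def in_order_sum(values):
    """Strict left-to-right floating-point sum."""
    total = 0.0
    for v in values:
        total += v
    return total

def two_lane_sum(values):
    """Hypothetical 2-wide reduction: even and odd elements accumulate in
    separate lanes, which are combined once at the end."""
    lanes = [0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 2] += v
    return lanes[0] + lanes[1]

data = [1e16, 1.0, -1e16, 1.0]

strict = in_order_sum(data)        # the first 1.0 is absorbed by 1e16
reassociated = two_lane_sum(data)  # the large values cancel in lane 0

print(strict, reassociated)  # 1.0 2.0
```

The two orders disagree because doubles near 1e16 are spaced 2 apart, so adding 1.0 to 1e16 rounds back to 1e16.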
dmgreen abandoned D68717: [Codegen] More add_sat and sub_sat promotion.

Hmm Yeah. It was a long time ago now, I think this wasn't a universal win and other changes have been made in the same area.

Mar 17 2021, 2:49 AM · Restricted Project
dmgreen added inline comments to D94964: [LangRef] Describe memory layout for vectors types.
Mar 17 2021, 1:16 AM · Restricted Project

Mar 16 2021

dmgreen added a comment to D98704: [AArch64] Rewrite (add, csel) to cinc.

Should we be generating:

cmp
cset
cmp
csinc

That lets the csinc do the add, if it's only going to be adding 0/1. Maybe that's possible with an extra isel pattern from add(cset(..))?

Mar 16 2021, 1:03 PM · Restricted Project
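
The equivalence behind the suggestion above, that an add of a cset result can be folded into a conditional increment, can be checked with a small model of the instruction semantics. This is a sketch under stated assumptions: the helpers are hypothetical Python models of CSET, CSINC and CINC on 64-bit registers as I understand them from the architecture descriptions, not code from the patch.

```python
MASK64 = (1 << 64) - 1  # assume 64-bit registers with wraparound

def cset(cond: bool) -> int:
    """CSET: 1 if the condition holds, else 0."""
    return 1 if cond else 0

def csinc(cond: bool, rn: int, rm: int) -> int:
    """CSINC: rn if the condition holds, else rm + 1."""
    return rn if cond else (rm + 1) & MASK64

def cinc(cond: bool, rn: int) -> int:
    """CINC: alias of CSINC with both sources equal and the condition inverted."""
    return csinc(not cond, rn, rn)

def add_of_cset(x: int, cond: bool) -> int:
    """The two-instruction form: materialize 0/1 with cset, then add."""
    return (x + cset(cond)) & MASK64

samples = [0, 1, 41, MASK64]
all_match = all(
    add_of_cset(x, c) == cinc(c, x) for x in samples for c in (False, True)
)
print(all_match)  # True
```

Since the cset result is only ever 0 or 1, the separate add carries no extra information, which is the property an isel pattern on add(cset(..)) would rely on.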
dmgreen added inline comments to D98435: [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization.
Mar 16 2021, 9:26 AM · Restricted Project
dmgreen added a comment to D98232: [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration.

I don't feel like I know this code well enough to be confident LGTM'ing it, but the testing I ran still looks good. Almost no changes. Thanks for working on the regression!

Mar 16 2021, 8:57 AM · Restricted Project
dmgreen accepted D98487: [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE.

Thanks. LGTM, if no one else has comments.

Mar 16 2021, 8:01 AM · Restricted Project, Restricted Project
dmgreen added a comment to D98693: [ARM] Use lrdsb for more thumb1 loads..

Yeah as with any change like this, where register allocation is affected, some things will get better and some worse. It's always going to be a bit chaotic like that just because you might spill in a different place, and we only have 8 regs. In general this is an improvement in the testing I ran.

Mar 16 2021, 4:54 AM · Restricted Project
dmgreen requested review of D98693: [ARM] Use lrdsb for more thumb1 loads..
Mar 16 2021, 3:49 AM · Restricted Project
dmgreen added a comment to D98564: [AArch64] Peephole rule to remove redundant cmp after cset..

I imagine that the cset; cmp from the fp16 tests is just a missing optimization in ISel, and there might be good reason to do so there if it helps simplify the graph.

Mar 16 2021, 2:17 AM · Restricted Project
dmgreen added inline comments to D98487: [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE.
Mar 16 2021, 2:08 AM · Restricted Project, Restricted Project

Mar 15 2021

dmgreen committed rG0b2aae42e5ea: [AArch64] Zero extended extract_vector_elt pattern (authored by dmgreen).
[AArch64] Zero extended extract_vector_elt pattern
Mar 15 2021, 7:56 AM
dmgreen closed D98599: [AArch64] Zero extended extract_vector_elt pattern.
Mar 15 2021, 7:56 AM · Restricted Project
dmgreen added a comment to D98487: [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE.

Thanks. This looks sensible, from what I can tell.

Mar 15 2021, 7:09 AM · Restricted Project, Restricted Project
dmgreen added inline comments to D98232: [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration.
Mar 15 2021, 5:54 AM · Restricted Project
dmgreen added inline comments to D98487: [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE.
Mar 15 2021, 4:49 AM · Restricted Project, Restricted Project

Mar 14 2021

dmgreen added a comment to D98512: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost.

I agree that the IV update in the test should have a cost of 1. (It seems strange that it doesn't already; I suspect it isn't matched the same way as an increment by 1.)

Mar 14 2021, 1:30 PM · Restricted Project
dmgreen added a reviewer for D98435: [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization: spatel.

Hello. Fantastic to see this getting used in more cases. In-order reductions sound like a great use of it.

Mar 14 2021, 12:34 PM · Restricted Project
dmgreen added a comment to D98564: [AArch64] Peephole rule to remove redundant cmp after cset..

I see that this is an extension to the existing peephole optimizations, but is there any reason not to do this during ISel?

Mar 14 2021, 8:30 AM · Restricted Project
dmgreen committed rGb0b9126897ed: [AArch64] Expand build-vector-extract.ll tests to i8's. NFC (authored by dmgreen).
[AArch64] Expand build-vector-extract.ll tests to i8's. NFC
Mar 14 2021, 8:29 AM
dmgreen added reviewers for D98599: [AArch64] Zero extended extract_vector_elt pattern: sdesmalen, david-arm.
Mar 14 2021, 8:21 AM · Restricted Project