Phabricator

dmgreen (Dave Green)
User

Projects

User does not belong to any projects.

User Details

User Since
May 24 2016, 8:35 AM (138 w, 18 h)

Recent Activity

Mon, Jan 14

dmgreen planned changes to D52508: [InstCombine] Clean up after IndVarSimplify.

Are you still working on this?

Mon, Jan 14, 8:15 AM
dmgreen added reviewers for D56032: [ARM] Combine ands+lsls to lsls+lsrs for Thumb1.: SjoerdMeijer, samparker.

Sounds good to me. Just adding Sjoerd/Sam who might know more about this code.

Mon, Jan 14, 8:07 AM
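As a rough illustration of why this kind of combine can be valid, here is a minimal Python sketch of one such identity. It assumes a 32-bit register and a mask made of the low `w` bits: masking and then shifting left by `c` gives the same result as shifting left by `32 - w` and then logically right by `32 - w - c`, provided `w + c <= 32`. The function names are hypothetical and this is not the pattern matched in D56032. On Thumb1, `ands` has no immediate form, so the mask would otherwise have to be materialised in a register, whereas `lsls`/`lsrs` take an immediate shift amount.

```python
# Hypothetical illustration, not the code from D56032: two ways of producing
# (x & low_mask(w)) << c in a 32-bit register, assuming w + c <= 32.

MASK32 = 0xFFFFFFFF


def and_then_shl(x: int, w: int, c: int) -> int:
    """ands with a low-bit mask, then lsls by c (32-bit result)."""
    return ((x & ((1 << w) - 1)) << c) & MASK32


def shl_then_lsr(x: int, w: int, c: int) -> int:
    """lsls by 32 - w (dropping high bits), then lsrs by 32 - w - c."""
    return ((x << (32 - w)) & MASK32) >> (32 - w - c)


def check(samples=(0, 1, 0xFF, 0x1234ABCD, MASK32)) -> bool:
    """Compare both forms for every valid (w, c) on a few sample inputs."""
    for x in samples:
        for w in range(1, 33):
            for c in range(0, 33 - w):  # keeps w + c <= 32
                if and_then_shl(x, w, c) != shl_then_lsr(x, w, c):
                    return False
    return True
```

For example, with `x = 0x1234ABCD`, `w = 8` and `c = 2`, both forms give `0x334` (the low byte `0xCD` moved up by two bits).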

Wed, Jan 9

dmgreen accepted D56472: Change test/tools/lto/no-bitcode.s requirement from arm to aarch64.

Looks sensible to me.

Wed, Jan 9, 2:55 PM

Tue, Jan 8

dmgreen accepted D56381: [DA][NewPM] Handle transitive dependencies in the new-pm version of DA.

Some of D56386 seems to be mixed up in here I think?

Tue, Jan 8, 5:43 AM
dmgreen accepted D56386: [DA][NewPM] Add a printerpass and port the testsuite.

LGTM, with a couple of minor points.

Tue, Jan 8, 5:38 AM

Mon, Jan 7

dmgreen added a comment to D56255: [ARM] Size reduce teq to eors.

Perhaps we should add a MIR test, to test this more directly?

Mon, Jan 7, 2:13 AM

Mon, Dec 31

dmgreen added a comment to D55851: Implement basic loop fusion pass.

Great stuff. This looks like it will be very useful. I have a few thoughts/questions/nits.

Mon, Dec 31, 12:24 PM

Fri, Dec 21

dmgreen created D56008: [ARM] Alter the register allocation order for optsize on Thumb2.
Fri, Dec 21, 9:00 AM

Tue, Dec 18

dmgreen accepted D55729: [CodeGenPrepare] Fix bad IR created by large offset GEP splitting..

Looks good to me. Apologies for the delay.

Tue, Dec 18, 1:11 PM

Dec 13 2018

dmgreen accepted D55108: [AArch64] Re-run load/store optimizer after aggressive tail duplication.

This looks fine to me then, especially at the aggressive opt level. IIRC, there is a test for the pass pipeline that I would expect needs updating.

Dec 13 2018, 10:09 AM
dmgreen committed rC349059: Fix CodeCompleteTest.cpp for older gcc plus ccache builds.
Dec 13 2018, 9:23 AM
dmgreen committed rL349059: Fix CodeCompleteTest.cpp for older gcc plus ccache builds.
Dec 13 2018, 9:23 AM

Dec 11 2018

dmgreen added a comment to D55108: [AArch64] Re-run load/store optimizer after aggressive tail duplication.

Do you have any compile time numbers for this?

Dec 11 2018, 9:58 AM

Dec 10 2018

dmgreen added a comment to rL348585: [Targets] Add errors for tiny and kernel codemodel on targets that don't….

Oh, yeah, that looks wrong! It should be testing a powerpc target.

Dec 10 2018, 1:02 PM
dmgreen committed rL348796: [Targets] Fixup incorrect targets in codemodel tests.
Dec 10 2018, 12:58 PM

Dec 7 2018

dmgreen committed rL348585: [Targets] Add errors for tiny and kernel codemodel on targets that don't….
Dec 7 2018, 4:13 AM
dmgreen closed D50141: Add errors for tiny codemodel on targets other than AArch64.
Dec 7 2018, 4:13 AM
dmgreen committed rC348582: Add a AArch64 triple to tiny codemodel test..
Dec 7 2018, 3:19 AM
dmgreen committed rL348582: Add a AArch64 triple to tiny codemodel test..
Dec 7 2018, 3:19 AM

Dec 6 2018

dmgreen accepted D55288: [test] Fix tests for changed optimizer warning message.

This looks fine, of course, to go alongside D49281.

Dec 6 2018, 12:49 PM

Dec 4 2018

dmgreen accepted D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..

When @dmgreen is happy with the unrolling changes, I think you're good to go.

Dec 4 2018, 2:15 PM

Dec 2 2018

dmgreen added inline comments to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
Dec 2 2018, 12:11 PM
dmgreen added inline comments to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
Dec 2 2018, 12:09 PM
dmgreen added reviewers for D54411: [Codegen] Merge tail blocks with no successors after block placement: craig.topper, rnk, RKSimon.

I ran a few bootstraps to check compile time (on AArch64). It didn't seem to make any noticeable difference there, and this certainly looks like it's good for codesize.

Dec 2 2018, 12:06 PM

Nov 25 2018

dmgreen added inline comments to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
Nov 25 2018, 12:09 PM

Nov 23 2018

dmgreen added a comment to D54841: [LoopSimplifyCFG] Don't delete LCSSA Phis.

Thanks for the fix!

Nov 23 2018, 2:59 AM

Nov 22 2018

dmgreen abandoned D54706: [LoopSimplifyCFG] Remove phis whilst removing CFG edges.

Thanks for the fix!

Nov 22 2018, 4:30 AM
dmgreen added a comment to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.

Hello. Another one for you, this time to do with LCSSA not being preserved, when run with "opt -loop-simplifycfg -indvars simplified.ll -S":

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-arm-none-eabi"
Nov 22 2018, 4:28 AM

Nov 19 2018

dmgreen added inline comments to D54706: [LoopSimplifyCFG] Remove phis whilst removing CFG edges.
Nov 19 2018, 9:20 AM
dmgreen created D54706: [LoopSimplifyCFG] Remove phis whilst removing CFG edges.
Nov 19 2018, 9:09 AM
dmgreen added a comment to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.

Hello. I have an error from this, which I think may well be a knock-on effect of the code later in the pass. Something like this:

Nov 19 2018, 8:07 AM

Nov 13 2018

dmgreen updated the diff for D50141: Add errors for tiny codemodel on targets other than AArch64.

I'm happy to change stuff around. I believe it's the way it is at the moment because of where this is called: halfway into a constructor (making virtuals difficult), with some archs requiring different pieces of info (like AArch64 being different for JITs). I figured if in doubt, don't change too much at once.

Nov 13 2018, 10:04 AM
dmgreen added a comment to D54411: [Codegen] Merge tail blocks with no successors after block placement.

Hello. I agree that this looks like an improvement. Can you add a testcase?

Nov 13 2018, 8:25 AM

Nov 7 2018

dmgreen updated the diff for D54142: [ARM] Cortex-M4 schedule.

Cleanup using tablegen classes.

Nov 7 2018, 3:09 AM
dmgreen added inline comments to D54142: [ARM] Cortex-M4 schedule.
Nov 7 2018, 3:09 AM

Nov 6 2018

dmgreen added a comment to D54142: [ARM] Cortex-M4 schedule.

Unfortunately, this also increased codesize a little at -Oz, which I will have to look into.

Nov 6 2018, 2:49 AM
dmgreen created D54142: [ARM] Cortex-M4 schedule.
Nov 6 2018, 2:46 AM

Nov 5 2018

dmgreen planned changes to D53405: [Inliner] Attempt to more accurately model the cost of loops at minsize.

(but can cause problems for cases where the blocks are not in the form they will appear in assembly).

I'm not sure what sort of issue you're running into here?

Nov 5 2018, 8:17 AM
dmgreen committed rL346134: [Inliner] Penalise inlining of calls with loops at Oz.
Nov 5 2018, 6:56 AM
dmgreen closed D52716: [Inliner] Penalise inlining of calls with loops at Oz.
Nov 5 2018, 6:56 AM

Nov 4 2018

dmgreen added inline comments to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.
Nov 4 2018, 1:02 PM

Nov 3 2018

dmgreen added reviewers for D53980: [ARM, AArch64] Move ARM/AArch64 target parsers into separate files to enable future changes.: SjoerdMeijer, olista01, efriedma, peter.smith, t.p.northover.

Hello! Drive by comments. I've not looked in any depth.

Nov 3 2018, 1:31 PM

Nov 1 2018

dmgreen updated subscribers of D53876: Preserve loop metadata when splitting exit blocks.
Nov 1 2018, 2:31 AM

Oct 31 2018

dmgreen added a comment to D52716: [Inliner] Penalise inlining of calls with loops at Oz.

Friendly Ping

Oct 31 2018, 3:21 PM

Oct 25 2018

dmgreen accepted D53582: [AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector.

I've managed to convince myself that this looks OK. A couple of nits depending on what you think of them.

Oct 25 2018, 4:22 AM
dmgreen accepted D53580: [AArch64] Refactor definition of EXT patterns to use a multiclass.

Nice cleanup. LGTM

Oct 25 2018, 4:20 AM
dmgreen accepted D53579: [AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move.

I was thinking about how this might affect other little cores like the A53/A55, especially around the dual issue on q registers. I don't think it will make much difference though, and the CSE benefits look like a bigger win.

Oct 25 2018, 4:20 AM

Oct 24 2018

dmgreen updated the diff for D52716: [Inliner] Penalise inlining of calls with loops at Oz.

Now ignores loops that will never be executed. I also have some code that uses SCEV to calculate whether the backedge count is <= 1 and allows inlining there. It doesn't seem to come up very often, though, and needed some plumbing to get SEs/TLIs in the right places.

Oct 24 2018, 9:12 AM

Oct 22 2018

dmgreen accepted D53453: [ARM] Make InstrEmitter mark CPSR defs dead for Thumb1..

Anyway, this looks good to me.

Oct 22 2018, 3:18 AM
dmgreen updated subscribers of D53453: [ARM] Make InstrEmitter mark CPSR defs dead for Thumb1..

Interesting. I didn't realise it worked like that. I had presumed that a lot of passes would have to be taught about the optional defs, as opposed to them not being marked as dead correctly.

Oct 22 2018, 3:10 AM
dmgreen accepted D53452: [ARM] Allow TBB formation for Thumb1 in more cases..

Yep, this came up a few times in the tests I ran. Looks like a nice improvement, and I don't think it complicates things too much.

Oct 22 2018, 2:49 AM

Oct 18 2018

dmgreen created D53405: [Inliner] Attempt to more accurately model the cost of loops at minsize.
Oct 18 2018, 10:35 AM
dmgreen closed D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4..
Oct 18 2018, 1:41 AM

Oct 16 2018

dmgreen added a comment to D53136: [LNT] Come up with MIN_PERCENTAGE_CHANGE value.

This looks like something that would be useful for us, where some of our benchmarks are very low noise. I currently have a couple of test-ish LNT instances with changes similar to this, either changing this value or setting ignore_small=False from the daily report page.

Oct 16 2018, 2:11 PM
Herald updated subscribers of D53190: ARM: avoid infinite combining loop.
Oct 16 2018, 9:13 AM
Herald updated subscribers of D32564: AArch64: compress jump tables to minimum size needed to reach destinations.
Oct 16 2018, 9:12 AM

Oct 15 2018

dmgreen added inline comments to D52508: [InstCombine] Clean up after IndVarSimplify.
Oct 15 2018, 12:41 PM
dmgreen added a comment to D52508: [InstCombine] Clean up after IndVarSimplify.

I think we should deal with do while in another patch.

Yeah, defo. I just need to come up with a sensible way to fix it. I feel some of this is pushing against the edges of what instcombine should be doing, but I'll keep pushing until someone tells me to stop.

Oct 15 2018, 2:52 AM
dmgreen updated the diff for D52508: [InstCombine] Clean up after IndVarSimplify.
Oct 15 2018, 2:51 AM

Oct 14 2018

dmgreen added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Interesting. Nice improvements. What about small trees? It would seem that any tree with fewer than 75 nodes would always be recalculated. Do the timings you ran include anything showing that this is better? Or were you just looking at larger trees at the time?

Oct 14 2018, 1:22 PM

Oct 12 2018

dmgreen accepted D53177: [builtins] Implement __aeabi_uread4/8 and __aeabi_uwrite4/8..

Yeah, I agree. LGTM.

Oct 12 2018, 1:29 PM
dmgreen added a comment to D53177: [builtins] Implement __aeabi_uread4/8 and __aeabi_uwrite4/8..

Nice idea. The RTABI seems to say:

Write functions return the value written, read functions the value read.

Oct 12 2018, 5:49 AM
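To make the quoted return-value convention concrete, here is a hypothetical Python model (not the compiler-rt code): a `bytearray` stands in for memory and an integer offset for the address, little-endian byte order is assumed, and the argument order (value first, then address, for writes) is an assumption mirroring the RTABI helpers. The function names are made up for illustration.

```python
import struct

# Hypothetical model of the RTABI convention: read helpers return the value
# read, write helpers return the value written. Offsets may be unaligned.


def uread4(mem: bytearray, addr: int) -> int:
    """Model of __aeabi_uread4: load a signed 32-bit value at any offset."""
    return struct.unpack_from("<i", mem, addr)[0]


def uwrite4(value: int, mem: bytearray, addr: int) -> int:
    """Model of __aeabi_uwrite4: store 32 bits, then return the value written."""
    struct.pack_into("<i", mem, addr, value)
    return value


def uread8(mem: bytearray, addr: int) -> int:
    """Model of __aeabi_uread8: load a signed 64-bit value at any offset."""
    return struct.unpack_from("<q", mem, addr)[0]


def uwrite8(value: int, mem: bytearray, addr: int) -> int:
    """Model of __aeabi_uwrite8: store 64 bits, then return the value written."""
    struct.pack_into("<q", mem, addr, value)
    return value
```

For example, `uwrite4(-2, mem, 1)` stores the bytes `fe ff ff ff` at offset 1 and returns `-2`, and a following `uread4(mem, 1)` returns `-2` again.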
dmgreen added a comment to D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.

Yeah, that looks like similar IR to what I was looking at. The vectorised version on Skylake (https://godbolt.org/z/RBS2Os) has a lot of shuffling, perhaps that's deemed unprofitable on Goldmont?

Oct 12 2018, 5:42 AM

Oct 11 2018

dmgreen updated subscribers of D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.

Oh, no. That's not what I wanted to hear. I presume we are looking at the same bit of code!

Oct 11 2018, 3:21 PM
dmgreen added inline comments to D52508: [InstCombine] Clean up after IndVarSimplify.
Oct 11 2018, 11:32 AM
dmgreen updated the diff for D52508: [InstCombine] Clean up after IndVarSimplify.

Whilst we're here, can anyone think of a good way to simplify:
(S + -32) - (-32 & (S + umax(31 - S, -32)))
That's the "do" case. I think if we distribute the -32 & through the add, that, together with the rest of instcombine + CSE + instcombine again, does get us down to:
S & 31

Oct 11 2018, 11:32 AM
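The claimed end result can at least be sanity-checked by brute force. A minimal Python sketch, assuming i8 so that every value of S can be tried; all arithmetic wraps modulo 2**8 as LLVM integer ops do, and `umax` is an unsigned max. Passing on i8 does not by itself prove the identity for wider types, so treat this as a quick check rather than a proof.

```python
# Exhaustive i8 check of:
#   (S + -32) - (-32 & (S + umax(31 - S, -32)))  ==  S & 31
# Values are kept as unsigned 8-bit patterns; all ops wrap modulo 2**BITS.

BITS = 8
MOD = 1 << BITS


def wrap(v: int) -> int:
    """Reduce to an unsigned BITS-wide value, like IR integer wrapping."""
    return v % MOD


def umax(a: int, b: int) -> int:
    """Unsigned max of two BITS-wide values."""
    return max(wrap(a), wrap(b))


def original_expr(s: int) -> int:
    minus32 = wrap(-32)
    inner = wrap(s + umax(31 - s, minus32))
    return wrap(wrap(s + minus32) - (minus32 & inner))


def simplified_expr(s: int) -> int:
    return s & 31


def verify(values=range(MOD)) -> bool:
    """True if both expressions agree for every S in values."""
    return all(original_expr(s) == simplified_expr(s) for s in values)
```

For instance, with S = 40 the umax picks 31 - S, the inner sum is 31, the masked term is 0, and the result is 8, which matches `40 & 31`.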
dmgreen updated the diff for D52508: [InstCombine] Clean up after IndVarSimplify.

OK, now one big ol' matcher. Thanks for the suggestions.

Oct 11 2018, 8:02 AM
dmgreen committed rL344239: [InstCombine] Demand bits of UMin.
Oct 11 2018, 4:31 AM
dmgreen closed D53036: [InstCombine] Demand bits of UMin.
Oct 11 2018, 4:31 AM
dmgreen added a comment to D53036: [InstCombine] Demand bits of UMin.

This is the page I look at as reference for alive coding:
https://github.com/nunoplopes/alive/blob/newsema/rise4fun/language

Oct 11 2018, 4:09 AM
dmgreen committed rL344237: [InstCombine] Demand bits of UMax.
Oct 11 2018, 4:06 AM
dmgreen closed D53033: [InstCombine] Demand bits of UMAX.
Oct 11 2018, 4:06 AM
dmgreen committed rL344236: [InstCombine] Add tests for demand bits of min/max. NFC..
Oct 11 2018, 3:48 AM

Oct 10 2018

dmgreen updated subscribers of D33935: Allow rematerialization of ARM Thumb MOVi8 instruction in some contexts.

I ran some benchmarks for thumb1, they look great.

Oct 10 2018, 10:31 AM
dmgreen added inline comments to D52508: [InstCombine] Clean up after IndVarSimplify.
Oct 10 2018, 8:54 AM
dmgreen added inline comments to D53036: [InstCombine] Demand bits of UMin.
Oct 10 2018, 7:01 AM
dmgreen updated the diff for D53036: [InstCombine] Demand bits of UMin.

Just added a new test, and changed some of the others around a little.

Oct 10 2018, 7:01 AM
dmgreen updated the diff for D52508: [InstCombine] Clean up after IndVarSimplify.

I've taken the demanded-bits parts out of this, and added some extra tests for the do/while and signed/unsigned cases.

Oct 10 2018, 4:48 AM
dmgreen updated the diff for D53036: [InstCombine] Demand bits of UMin.

Luckily, it appears that Alive can do countLeadingOnes too (although I don't see it anywhere in the sources I have).
https://rise4fun.com/Alive/O9i

Oct 10 2018, 3:54 AM
dmgreen updated the diff for D53033: [InstCombine] Demand bits of UMAX.

Updated to use activeBits. Thanks for the suggestions.

Oct 10 2018, 3:46 AM
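A minimal Python sketch of the two demanded-bits rules, checked exhaustively on i8. The conditions are an assumption reconstructed from the comments above (activeBits for UMax, countLeadingOnes for UMin), not code taken from D53033 or D53036: umax(A, C) can be replaced by A when every demanded bit sits above C's highest set bit, and umin(A, C) can be replaced by A when C is all ones from the lowest demanded bit upwards. Helper names are hypothetical.

```python
# Hypothetical sketch: exhaustive i8 check of demanded-bits rules for
# umax/umin with a constant operand C. Conditions are assumed, not copied.

BITS = 8
MAX = (1 << BITS) - 1


def ctz(v: int) -> int:
    """Count trailing zeros of a BITS-wide value (BITS for zero)."""
    if v == 0:
        return BITS
    return (v & -v).bit_length() - 1


def clo(v: int) -> int:
    """Count leading ones of a BITS-wide value."""
    n = 0
    for bit in range(BITS - 1, -1, -1):
        if not (v >> bit) & 1:
            break
        n += 1
    return n


def umax_can_drop(c: int, demanded: int) -> bool:
    # Every demanded bit is above C's highest set bit (C's active bits).
    return ctz(demanded) >= c.bit_length()


def umin_can_drop(c: int, demanded: int) -> bool:
    # C is all ones from the lowest demanded bit upwards.
    return ctz(demanded) >= BITS - clo(c)


def rule_holds(op, can_drop) -> bool:
    """True if op(A, C) and A agree on the demanded bits whenever can_drop allows it."""
    for c in range(MAX + 1):
        for demanded in range(1, MAX + 1):
            if not can_drop(c, demanded):
                continue
            for a in range(MAX + 1):
                if (op(a, c) & demanded) != (a & demanded):
                    return False
    return True
```

Both `rule_holds(max, umax_can_drop)` and `rule_holds(min, umin_can_drop)` should hold (Python's `max`/`min` act as unsigned operations on 0..255). Without the condition the rewrite is wrong: with C = 0x1F, demanded bits 0xF0 and A = 0, `max(0, 0x1F) & 0xF0` is 0x10 while `0 & 0xF0` is 0.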

Oct 9 2018

dmgreen added a comment to D53033: [InstCombine] Demand bits of UMAX.

Ah, I must have been looking at the wrong bit of code. Thanks!

Oct 9 2018, 12:25 PM
dmgreen created D53036: [InstCombine] Demand bits of UMin.
Oct 9 2018, 12:18 PM
dmgreen created D53033: [InstCombine] Demand bits of UMAX.
Oct 9 2018, 12:00 PM
dmgreen added a comment to D52508: [InstCombine] Clean up after IndVarSimplify.

It turns out the other case I ran into above ((S + -32) - (32 & (S + umax(31 - S, -32)))) was from do loops, not while loops. Signed will also be different to unsigned, with signed cases not having quite as small simplified forms.
These are the cases:
https://godbolt.org/z/SE-xhD
With some possible simplifications:
https://rise4fun.com/Alive/slxj

Oct 9 2018, 11:57 AM
dmgreen added a comment to D53005: Implement machine unroller utility class.

Hello, see the part about context in https://llvm.org/docs/Phabricator.html#phabricator-request-review-web. It's easier to review things if we can see the code around the patch as well as the code in the patch.

Oct 9 2018, 11:34 AM
dmgreen added a comment to D53005: Implement machine unroller utility class.

Hello. Very nice. I don't think I can speak to much of the detail here, especially the Hexagon parts, but can you:

  • Add full context to the patch (-U99999)
  • Replace the copyright headers to be more "llvmy"
Oct 9 2018, 2:48 AM

Oct 5 2018

dmgreen committed rC343843: [AArch64] Use filecheck captures for metadata node numbers in test. NFC.
Oct 5 2018, 3:23 AM
dmgreen committed rL343843: [AArch64] Use filecheck captures for metadata node numbers in test. NFC.
Oct 5 2018, 3:23 AM

Oct 2 2018

dmgreen updated the diff for D52716: [Inliner] Penalise inlining of calls with loops at Oz.

I've added a new memcpy test from the original reproducer. It's a byte memcpy (people seem to love writing those), which I think is worth focusing on because it's small but still increases codesize. It expands to:

Oct 2 2018, 11:38 AM
dmgreen planned changes to D52508: [InstCombine] Clean up after IndVarSimplify.

Unfortunately, since I did this everything seems to have changed and we now end up with something like:
(S + -32) - (32 & (S + umax(31 - S, -32)))

Oct 2 2018, 6:04 AM
dmgreen committed rL343569: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.
Oct 2 2018, 2:50 AM
dmgreen closed D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.
Oct 2 2018, 2:50 AM
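The fold relies on `~` reversing both signed and unsigned order, so min(~A, O) == ~max(A, ~O) (and likewise with min/max swapped); then ~X == -X - 1 turns ~A - ~max(A, ~O) into max(A, ~O) - A. A minimal Python sketch that checks all four min/max flavours exhaustively on i8. The helper names are hypothetical, and this is a brute-force check, not how InstCombine implements the fold.

```python
# Exhaustive i8 check of  ~A - Min/Max(~A, O)  ->  Max/Min(A, ~O) - A
# for unsigned and signed min/max. Values are 8-bit patterns (0..255);
# the signed variants reinterpret them as two's complement.

BITS = 8
MOD = 1 << BITS


def wrap(v: int) -> int:
    return v % MOD


def bnot(v: int) -> int:
    """Bitwise not of a BITS-wide value."""
    return wrap(~v)


def to_signed(v: int) -> int:
    return v - MOD if v >= MOD // 2 else v


def umin(a, b): return min(a, b)
def umax(a, b): return max(a, b)
def smin(a, b): return a if to_signed(a) <= to_signed(b) else b
def smax(a, b): return a if to_signed(a) >= to_signed(b) else b


# Each inner min/max is paired with the opposite operation in the result.
PAIRS = [(umin, umax), (umax, umin), (smin, smax), (smax, smin)]


def fold_holds(inner, outer) -> bool:
    """True if ~A - inner(~A, O) == outer(A, ~O) - A for all A, O."""
    for a in range(MOD):
        for o in range(MOD):
            before = wrap(bnot(a) - inner(bnot(a), o))
            after = wrap(outer(a, bnot(o)) - a)
            if before != after:
                return False
    return True
```

Pairing matters: `fold_holds(umin, umin)` fails already at A = 0, O = 0, where the left side is 255 and the right side is 0.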
dmgreen added a comment to D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.

Let me know if you see anything funny from this patch.

Oct 2 2018, 2:49 AM
dmgreen committed rL343561: [InstCombine] Tests for ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A. NFC.
Oct 2 2018, 2:09 AM

Oct 1 2018

dmgreen added a comment to D52716: [Inliner] Penalise inlining of calls with loops at Oz.

For loop-noinline.ll the "Simplified" instructions (the ones that cost nothing) appear to be:
  • br %body
  • 3 x phis
  • one of the geps in the loop
  • the gep outside the loop
  • the ret

Oct 1 2018, 1:20 PM
dmgreen added inline comments to D35035: [InstCombine] Prevent memcpy generation for small data size.
Oct 1 2018, 6:26 AM
dmgreen created D52716: [Inliner] Penalise inlining of calls with loops at Oz.
Oct 1 2018, 4:01 AM

Sep 30 2018

dmgreen abandoned D44043: [DAGCombine] Remove AND in SETCC if we can prove they are unneeded.

Yep, sure. With D52177, I no longer have a motivating case for this.

Sep 30 2018, 2:18 AM

Sep 28 2018

dmgreen accepted D52644: [ARM] Prevent DSP and SIM32 being set for v6m.

LGTM, Thanks

Sep 28 2018, 2:36 AM

Sep 26 2018

dmgreen accepted D52470: [ARM/AArch64][v8.5A] Add Armv8.5-A target.

LGTM

Sep 26 2018, 4:31 AM