dmgreen (Dave Green)
User

Projects

User does not belong to any projects.

User Details

User Since
May 24 2016, 8:35 AM (121 w, 6 d)

Recent Activity

Today

dmgreen committed rL342958: [LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder.
[LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder
Tue, Sep 25, 3:12 AM
dmgreen closed D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.
Tue, Sep 25, 3:12 AM
dmgreen added a comment to D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.

Thanks!

Tue, Sep 25, 3:09 AM

Yesterday

dmgreen updated the diff for D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.

Turns out there wasn't any conflict, but I've tried to clean this up a little and add a few more tests.

Mon, Sep 24, 12:01 PM
dmgreen added inline comments to D52238: [CodeGen] Enable tail calls for functions with NonNull attributes..
Mon, Sep 24, 7:20 AM
dmgreen accepted D52257: [Thumb1] Any imm of i8 type on Thumb1 should have cost of 1.

Sounds good. I'm happy with this, if no one else has any issues.

Mon, Sep 24, 7:19 AM

Fri, Sep 21

dmgreen added inline comments to D52243: [ConstHoist] Do not rebase single (or few) dependent constant.
Fri, Sep 21, 2:06 AM
dmgreen added a comment to D52243: [ConstHoist] Do not rebase single (or few) dependent constant.

This does seem, from the tests I ran, to reduce codesize on average. Especially on Thumb1.

Fri, Sep 21, 1:56 AM
dmgreen added a comment to D52257: [Thumb1] Any imm of i8 type on Thumb1 should have cost of 1.

Is this because we can just use a MOVS and wont have to fill in any higher bits? And MOVS's aren't trivially rematerialisable? And Thumb2/Arm are handled by getT2SOImmVal?

Fri, Sep 21, 1:20 AM

Thu, Sep 20

dmgreen added inline comments to D52289: [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33.
Thu, Sep 20, 3:41 AM

Wed, Sep 19

dmgreen updated the diff for D52238: [CodeGen] Enable tail calls for functions with NonNull attributes..

Minor edit to test (and renamed it to tailcall-dup.ll).

Wed, Sep 19, 4:43 AM
dmgreen updated the diff for D52238: [CodeGen] Enable tail calls for functions with NonNull attributes..

Added a tailcall duplication opt test to Transforms/CodeGenPrepare.

Wed, Sep 19, 4:42 AM
dmgreen added inline comments to D52238: [CodeGen] Enable tail calls for functions with NonNull attributes..
Wed, Sep 19, 4:39 AM

Tue, Sep 18

dmgreen added inline comments to D52238: [CodeGen] Enable tail calls for functions with NonNull attributes..
Tue, Sep 18, 2:50 PM
dmgreen added a comment to D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.

Thanks for pointing to D52070. I think I saw that (it gave me the idea for this), but hadn't realised it had come back out.

Tue, Sep 18, 10:34 AM
dmgreen created D52238: [CodeGen] Enable tail calls for functions with NonNull attributes..
Tue, Sep 18, 8:51 AM
dmgreen added a comment to D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.

Sorry for the delay. Sure, I can do that.

Tue, Sep 18, 3:38 AM
dmgreen committed rL342455: [AArch64] Attempt to parse more operands as expressions.
[AArch64] Attempt to parse more operands as expressions
Tue, Sep 18, 2:48 AM
dmgreen closed D51792: [AArch64] Attempt to parse expressions as adr/adrp operands.
Tue, Sep 18, 2:48 AM
dmgreen added a comment to D51792: [AArch64] Attempt to parse expressions as adr/adrp operands.

OK. Thanks guys.

Tue, Sep 18, 2:47 AM

Mon, Sep 17

dmgreen created D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A.
Mon, Sep 17, 10:02 AM

Fri, Sep 14

dmgreen updated the diff for D51792: [AArch64] Attempt to parse expressions as adr/adrp operands.

I seem to have stepped into a bit of a rabbit hole here.. and a lot of things seem to be a bit broken.

Fri, Sep 14, 5:32 AM
dmgreen added a comment to D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize.

It seems a bit odd to not inline an 8byte memcpy on a 64bit system, even at minsize. Perhaps this should be based on whether i64 is a legal type?

Fri, Sep 14, 3:29 AM

Wed, Sep 12

dmgreen accepted D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4..

The benchmarks came back as about a 0.2% difference in cycle count, and (crucially) there's no way when deciding function alignment to check for OptSize so we'd inevitably pessimize some cases.

Wed, Sep 12, 10:41 AM
dmgreen committed rC342053: [CodeGen] Align rtti and vtable data.
[CodeGen] Align rtti and vtable data
Wed, Sep 12, 7:10 AM
dmgreen committed rL342053: [CodeGen] Align rtti and vtable data.
[CodeGen] Align rtti and vtable data
Wed, Sep 12, 7:10 AM
dmgreen closed D51416: [RTTI] Align rtti types to prevent over-alignment.
Wed, Sep 12, 7:10 AM
dmgreen added a comment to D51416: [RTTI] Align rtti types to prevent over-alignment.

That sounds like a good idea. Thanks for the help on this one.

Wed, Sep 12, 7:07 AM
dmgreen committed rL342043: [CGP] Ensure splitgep gives deterministic output.
[CGP] Ensure splitgep gives deterministic output
Wed, Sep 12, 3:21 AM
dmgreen closed D51851: [CGP] Ensure splitgep gives deterministic output.
Wed, Sep 12, 3:21 AM
dmgreen committed rL342039: [SimplifyCFG] Put an alignment on generated switch tables.
[SimplifyCFG] Put an alignment on generated switch tables
Wed, Sep 12, 2:56 AM
dmgreen closed D51800: [SimpliftyCFG] Put an alignment on generated switch tables.
Wed, Sep 12, 2:56 AM

Tue, Sep 11

dmgreen accepted D51153: Break LoopUtils into an Analysis file..

Yep, LGTM!

Tue, Sep 11, 12:27 PM

Mon, Sep 10

dmgreen updated the diff for D51416: [RTTI] Align rtti types to prevent over-alignment.

Now using getABITypeAlignment, plus added a simple test

Mon, Sep 10, 10:14 AM
dmgreen added a comment to D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4..

Interesting question. There are a couple of reasons to think the tradeoffs are different there. First, I'd probably (based purely on intuition rather than data) expect a loop to be executed more times than most functions. Second, when r7 is the FP, a function is almost guaranteed to start with a 16-bit instruction (push {..., r7, lr}).

Mon, Sep 10, 6:51 AM
dmgreen created D51851: [CGP] Ensure splitgep gives deterministic output.
Mon, Sep 10, 2:16 AM

Sun, Sep 9

dmgreen added a comment to D51153: Break LoopUtils into an Analysis file..

Yeah, IVDescriptors sounds like a good name to me. Thanks for makeing this cleaner!

Sun, Sep 9, 1:23 PM
dmgreen accepted D51837: Move a transformation routine from LoopUtils to LoopVectorize..

Sounds good to me. WIth some optional nits.

Sun, Sep 9, 1:22 PM
dmgreen accepted D51838: Move createMinMaxOp() out of RecurrenceDescriptor..

LGTM!

Sun, Sep 9, 1:19 PM
dmgreen added a comment to D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4..

Good stuff. The downstream version of this we have uses Subtarget->isMClass() && Subtarget->hasThumb2(). (Dont ask me why it's downstream, I'm not sure. It's from before my time). I think you are right about the M7 though, the results there are just different, not necessarily better.

Sun, Sep 9, 1:10 PM

Fri, Sep 7

dmgreen accepted D51472: [ARM] Fix correctness checks in promoteToConstantPool..

Constant Island pass has a bit of a reputation for being the source of difficult bugs. With enough Mir testing, perhaps that doesn't need to be the case though. The general idea of turning the adr's into ldr's seems like a good one, maybe it just needs some better machinary in the pass to handle the pool entries more easily?

Fri, Sep 7, 11:33 AM
dmgreen accepted D51469: [ARM] Use preferred alignment for constants in promoteToConstantPool..

This looks fine to me, if like you say the over aligning to 16 thing is not something that should come up a lot.

Fri, Sep 7, 11:21 AM
dmgreen created D51800: [SimpliftyCFG] Put an alignment on generated switch tables.
Fri, Sep 7, 10:12 AM
dmgreen created D51792: [AArch64] Attempt to parse expressions as adr/adrp operands.
Fri, Sep 7, 8:27 AM

Thu, Sep 6

dmgreen added a comment to D51472: [ARM] Fix correctness checks in promoteToConstantPool..

Hello. Good to see this getting some love.

Thu, Sep 6, 1:49 PM
dmgreen added a comment to D51683: Fix arm_neon.h and arm_fp16.h generation for compiling with std=c89.

These new changes look good to me.

Thu, Sep 6, 6:29 AM
dmgreen added a comment to D40480: MemorySSA backed Dead Store Elimination..

Thanks for taking a look at this, it would be good to make use of it. You may want to check the llvm-commits mails for more of Daniels thoughts, as they unfortunately didn't make it into phabricator.

Thu, Sep 6, 4:56 AM
dmgreen accepted D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.

LGTM. Thanks for the patch

Thu, Sep 6, 2:47 AM
dmgreen committed rL341527: [SLC] Add an alignment to CreateGlobalString.
[SLC] Add an alignment to CreateGlobalString
Thu, Sep 6, 1:45 AM
dmgreen closed D51410: [SLC] Add an alignment to printf->puts global string.
Thu, Sep 6, 1:45 AM

Wed, Sep 5

dmgreen added a comment to D51153: Break LoopUtils into an Analysis file..

These classes to me look large enough to warrant their own file. Maybe something like RecurrenceDescriptor.h/cpp? Or a name that captures InductionDescriptor too? I'm not sure what that would be though..

Wed, Sep 5, 2:47 AM
dmgreen added a comment to D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.

Because I wanted to make sure it worked, I wrote this. Feel free to clean it up and use it as a unittest if you like:

Wed, Sep 5, 2:47 AM
dmgreen added a comment to D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.

I'm happy with this as is, but if you want the API to be tested and ensure it keeps working, you could add a unittest. Something like unittests/Transforms/Utils/BasicBlockUtilsTest.cpp or the ones is CloningTest.cpp.

Wed, Sep 5, 2:30 AM

Mon, Sep 3

dmgreen updated the diff for D51416: [RTTI] Align rtti types to prevent over-alignment.

I've become less sure about what the various alignments should be here. There seem to be multiple ways to calculate such things, I'm not sure which are best. For example, for want of a better option I've used getPointerAlign in ItaniumRTTIBuilder::BuildTypeInfo.

Mon, Sep 3, 9:08 AM
dmgreen updated the diff for D51410: [SLC] Add an alignment to printf->puts global string.

Moved to CreateGlobalString

Mon, Sep 3, 1:50 AM

Thu, Aug 30

dmgreen added a reviewer for D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder: anna.

We do seem to return false in every other case, I'm not very sure why this would be different and considered a precondition we would assert on.

Thu, Aug 30, 9:30 AM
dmgreen accepted D51462: [ARM] Enable GEP offset splitting for 32-bit ARM..

Yeah, I agree. Looks like it really helps out thumb1 code, and the code in CGP looks generic enough, not aarch64 specific.

Thu, Aug 30, 9:30 AM
dmgreen added a comment to D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.

Hello. Please upload with full context to make patched easier to read. See https://llvm.org/docs/Contributing.html

Thu, Aug 30, 7:30 AM
dmgreen committed rL341058: [AArch64] Optimise load(adr address) to ldr address.
[AArch64] Optimise load(adr address) to ldr address
Thu, Aug 30, 4:56 AM
dmgreen closed D51030: [AArch64] Optimise load(adr address) to ldr address.
Thu, Aug 30, 4:56 AM

Wed, Aug 29

dmgreen added a comment to D50141: Add errors for tiny codemodel on targets other than AArch64.

Ping. Everyone happy with this now the main part has gone in?

Wed, Aug 29, 6:37 AM
dmgreen created D51416: [RTTI] Align rtti types to prevent over-alignment.
Wed, Aug 29, 5:33 AM
dmgreen added inline comments to D51377: [NFC] Make getPreferredAlignment honor section markings..
Wed, Aug 29, 3:55 AM
dmgreen created D51410: [SLC] Add an alignment to printf->puts global string.
Wed, Aug 29, 3:52 AM
dmgreen added a comment to D51030: [AArch64] Optimise load(adr address) to ldr address.

Thanks.

Wed, Aug 29, 3:34 AM

Tue, Aug 28

dmgreen updated the diff for D51030: [AArch64] Optimise load(adr address) to ldr address.
Tue, Aug 28, 2:57 PM
dmgreen added inline comments to D51030: [AArch64] Optimise load(adr address) to ldr address.
Tue, Aug 28, 2:56 PM
dmgreen added a comment to D51377: [NFC] Make getPreferredAlignment honor section markings..

Thanks for this.

Tue, Aug 28, 2:54 PM
dmgreen updated the diff for D51030: [AArch64] Optimise load(adr address) to ldr address.

It may be better to make this match the alignment on the global directly, not looking through the ADR. I'm not sure how that's done yet.

Tue, Aug 28, 12:30 PM
dmgreen added inline comments to D51030: [AArch64] Optimise load(adr address) to ldr address.
Tue, Aug 28, 12:12 PM

Mon, Aug 27

dmgreen updated the diff for D51030: [AArch64] Optimise load(adr address) to ldr address.

Many tablegen crashes later.. :)

Mon, Aug 27, 11:59 AM
dmgreen added a comment to D50658: Hot cold splitting pass.

Hello. What version of llvm is this based on? It's still using DEBUG macros. It will need rebasing onto trunk.

Mon, Aug 27, 11:50 AM

Aug 22 2018

dmgreen added a comment to D51030: [AArch64] Optimise load(adr address) to ldr address.

I think that this is likely to be the only time in the test suite that there is a R_AARCH64_LD_PREL_LO19 to a data symbol in a shared library. In theory this should generate a R_AARCH64_COPY dynamic relocation. I think that this was fixed as part of the changes made in https://sourceware.org/bugzilla/show_bug.cgi?id=21532 "[AArch64] Allow COPY relocation elimination" as prior to that R_AARCH64_LD_PREL_LO19 wasn't in the switch statement to generate copy relocations. The change made it into binutils 2.29.

Aug 22 2018, 9:30 AM
dmgreen committed rL340398: [AArch64] Add Tiny Code Model for AArch64.
[AArch64] Add Tiny Code Model for AArch64
Aug 22 2018, 4:35 AM
dmgreen committed rC340398: [AArch64] Add Tiny Code Model for AArch64.
[AArch64] Add Tiny Code Model for AArch64
Aug 22 2018, 4:35 AM
dmgreen closed D49674: [AArch64] Add Tiny Code Model for AArch64.
Aug 22 2018, 4:35 AM
dmgreen added a comment to D49674: [AArch64] Add Tiny Code Model for AArch64.

Thanks

Aug 22 2018, 4:35 AM
dmgreen committed rL340397: [AArch64] Add Tiny Code Model for AArch64.
[AArch64] Add Tiny Code Model for AArch64
Aug 22 2018, 4:32 AM
dmgreen closed D49673: [AArch64] Add Tiny Code Model for AArch64.
Aug 22 2018, 4:32 AM

Aug 21 2018

dmgreen added a comment to D51030: [AArch64] Optimise load(adr address) to ldr address.
  • I can't see anything obviously wrong with the BFD code for handling LO19
Aug 21 2018, 10:41 AM
dmgreen updated the diff for D49673: [AArch64] Add Tiny Code Model for AArch64.

Thanks Peter. This change just adds a FIXME for tiny+tls.

Aug 21 2018, 10:25 AM
dmgreen added a comment to D51030: [AArch64] Optimise load(adr address) to ldr address.

Unfortunately I'm hitting a number of linker problems in testing this.

  • Armlink seems to do fine, but I am testing embedded code there.
  • ld.bfd links the test-suite fine with fpic (except for the expected errors where relocations are out of range). But without pic gives an error about relocating against stderr that I'm not sure of yet, but might point to this alignment check being wrong in some way. Unfortunately the errors are not clear what is out of range, it may be a legitimate error (although ld.gold and lld both are "successful" at linking the same program).
  • ld.gold doesn't like load literal relocations, producing invalid instructions.
  • lld, with some extra patches, does fine for anything that isn't a tls relocation I think.
Aug 21 2018, 2:12 AM
dmgreen created D51030: [AArch64] Optimise load(adr address) to ldr address.
Aug 21 2018, 2:11 AM

Aug 19 2018

dmgreen added inline comments to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
Aug 19 2018, 2:35 AM
dmgreen added inline comments to D50698: [UnJ] Ensure unroll_and_jam metadata is removed once consumed..
Aug 19 2018, 2:31 AM

Aug 15 2018

dmgreen added a comment to D49673: [AArch64] Add Tiny Code Model for AArch64.

Hmm.. So I was expecting this would keep working the same as small model. I believe that would be less efficient but functionally correct (providing you are not un-aligning page boundaries). In adding some tests I can see that there are some code differences though (because it goes through LOADgot). I think the example in arm64-tls-execs is correct, and the other tests here produce the same code (as does your example from above). I've not tested this very well though, and I'm not sure what else would be required. As far as I can tell it should be OK, but producing an error for tiny+tls, same as large+tls, is an option.

Aug 15 2018, 4:21 AM
dmgreen committed rL339762: [UnJ] Rename hasInvariantIterationCount to hasIterationCountInvariantInParent….
[UnJ] Rename hasInvariantIterationCount to hasIterationCountInvariantInParent…
Aug 15 2018, 4:00 AM
dmgreen committed rC339759: Fix ASTMatchersTraversalTest testcase compile on older compilers.
Fix ASTMatchersTraversalTest testcase compile on older compilers
Aug 15 2018, 3:40 AM
dmgreen committed rL339759: Fix ASTMatchersTraversalTest testcase compile on older compilers.
Fix ASTMatchersTraversalTest testcase compile on older compilers
Aug 15 2018, 3:40 AM

Aug 14 2018

dmgreen added a comment to D49673: [AArch64] Add Tiny Code Model for AArch64.

Generally looking good to me. Apologies for the delay.

Thanks for the review

Aug 14 2018, 12:04 PM
dmgreen updated the diff for D49673: [AArch64] Add Tiny Code Model for AArch64.
Aug 14 2018, 12:03 PM
dmgreen added a comment to D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.

I'm no expert, but I thought that SmallPtrSet->erase(BB) would treat BB as an opaque value that it removes from it's set, not accessing it's content in any way. You may be right about the eraseFromParent invalidating things though.

Aug 14 2018, 10:05 AM
dmgreen accepted D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.

I guess because it's only erasing it from a SmallPtrSet, it's not actually _using_ it, but this looks like a very sensible change.

Aug 14 2018, 9:35 AM
dmgreen added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..

I was testing the code and ran into some problems with debug metadata on the loop nodes (actually using -Rpass=unroll in that case). Can you make sure that works as expected?

Aug 14 2018, 4:35 AM
dmgreen created D50698: [UnJ] Ensure unroll_and_jam metadata is removed once consumed..
Aug 14 2018, 4:33 AM

Aug 13 2018

dmgreen added a comment to D49673: [AArch64] Add Tiny Code Model for AArch64.

Ping

Aug 13 2018, 12:20 PM

Aug 12 2018

dmgreen added a comment to D50511: [ARM] ParallelDSP: add option to disable the pass.

The pass will also need "new pass manager" code at some point.

Aug 12 2018, 5:33 AM
dmgreen accepted D50511: [ARM] ParallelDSP: add option to disable the pass.

This looks like a sensible idea.

Aug 12 2018, 5:29 AM

Aug 11 2018

dmgreen committed rL339501: [UnJ] Improve explicit loop count checks.
[UnJ] Improve explicit loop count checks
Aug 11 2018, 12:38 AM
dmgreen closed D50075: [UnJ] Improve explicit loop count checks.
Aug 11 2018, 12:38 AM