This is an archive of the discontinued LLVM Phabricator instance.

AArch64: Allow offsets to be folded into addresses with ELF.
ClosedPublic

Authored by pcc on Apr 2 2018, 10:26 PM.

Details

Summary

This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. It reduces the size of Chromium for
Android's .text section by 46KB, or 56KB with ThinLTO (which exposes
more opportunities to use a direct access rather than a GOT access).

Because the addend range is limited in COFF and Mach-O, this is
enabled for ELF only.

Event Timeline

pcc created this revision.Apr 2 2018, 10:26 PM
t.p.northover added inline comments.Apr 5 2018, 6:51 AM
llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
32–33 ↗(On Diff #140738)

It's a bit of a shame to lose this optimization. Did you look into enhancing AArch64LoadStoreOptimizer.cpp to account for the new instruction shapes?

pcc added inline comments.Apr 8 2018, 9:41 PM
llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
32–33 ↗(On Diff #140738)

I hacked on the load store optimizer to teach it to pair merge and zero store merge instructions with reg+globaladdr operands. It ended up making the pass significantly more complicated (+ ~100 lines), and only ended up reducing the size of .text in Chromium for Android by 1KB (i.e. it fired only about 250 times). So despite the impression that one might get from looking at the test cases, it seems like such an optimization wouldn't be pulling its weight in real world code.

I imagine that in the vast majority of cases, the potentially pair mergeable instructions do not use the address of a global but rather a parameter or a value loaded from memory, which will of course be in a register and therefore still pair mergeable.

I can share the patch if you'd like, but my initial conclusion is that it doesn't seem worth the added complexity to me.

t.p.northover accepted this revision.Apr 9 2018, 12:55 AM

Fair enough, thanks for looking into it. LGTM!

This revision is now accepted and ready to land.Apr 9 2018, 12:55 AM
This revision was automatically updated to reflect the committed changes.
yroux added subscribers: pcc, yroux.Apr 10 2018, 7:08 AM

Hi Peter,

pcc reopened this revision.Apr 10 2018, 10:19 PM
This revision is now accepted and ready to land.Apr 10 2018, 10:19 PM
pcc updated this revision to Diff 141947.Apr 10 2018, 10:20 PM

Proposed reland with a new implementation and an offset bounds check to preserve the code model.

While I was looking at the bot failure from when I originally landed
this patch, I noticed that there is actually a simple way to implement
the offset bounds check while preserving the ldp optimization and
allowing the computed address to be reused, and that is to implement
the offset folding as a DAG combine. I'm still working on code size
numbers but I don't see a reason why they would regress from the
previous patch.

pcc added a comment.Apr 10 2018, 10:55 PM

So with this patch the code size reduction for Chromium without ThinLTO is 108KB -- more than double the 46KB with the previous patch. I guess address reuse was a lot more important than I thought.

pcc updated this revision to Diff 141951.Apr 10 2018, 11:05 PM
  • MinOffset -> Offset
t.p.northover added inline comments.Apr 12 2018, 11:58 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10642

I'm not quite sure about the signs here. I think your code ends up treating negative offsets as huge positive ones, which is equivalent to ignoring them (as it does with ISD::SUB uses).

That's OK since I expect it's a rare case, but I'd be a bit happier if there was a comment about how it works.

10644–10645

Is this some object format constraint? Because in straight LLVM IR I don't think there's any rule against going out of bounds.

10660

Shouldn't this be MinOffset? It looks like this will always be GV+0, when we started with GV+GN->getOffset()

pcc added inline comments.Apr 12 2018, 12:19 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10642

Yes, that is exactly what happens. I will leave a comment here.

10644–10645

It is a code model constraint. Imagine that you have an executable with virtual size of exactly 2GB (which just fits into the small code model), a function at the very start and a variable at the very end. If the function takes the address of the variable and adds a large out-of-bounds offset, folding the offset would push the address outside of the range allowed by the code model, and the linker would be unable to relocate the ADRP instruction. We would still be able to compute the address of the start of the variable and add the offset in code.

10660

Yes it should, good catch.

pcc updated this revision to Diff 142235.Apr 12 2018, 12:36 PM
  • Address review comments
pcc marked 2 inline comments as done.Apr 12 2018, 12:37 PM
t.p.northover accepted this revision.Apr 12 2018, 1:34 PM

Thanks for updating the patch. LGTM!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10644–10645

OK, that makes sense.

This revision was automatically updated to reflect the committed changes.
pcc reopened this revision.Apr 13 2018, 4:30 PM

Re-opening for the second proposed reland of this patch.

This reland includes a check to prevent the DAG combiner from folding an
offset that is smaller than the existing one. This can cause oscillations
between two possible DAGs, which was the cause of the hang and later assertion
failure observed on the lnt-ctmark-aarch64-O3-flto bot.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

This revision is now accepted and ready to land.Apr 13 2018, 4:30 PM
pcc updated this revision to Diff 142487.Apr 13 2018, 4:30 PM

Proposed reland

t.p.northover accepted this revision.Apr 23 2018, 12:56 AM

Yep, I think that ought to work. Sorry about the delay.

This revision was automatically updated to reflect the committed changes.