This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Align constant islands to 8 bytes
ClosedPublic

Authored by yota9 on Mar 19 2022, 4:44 AM.

Details

Summary

AArch64 requires CI to be aligned to 8 bytes due to access instructions
restrictions. E.g. the ldr with imm, where imm must be aligned to 8 bytes.

Diff Detail

Event Timeline

yota9 created this revision.Mar 19 2022, 4:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 4:44 AM
yota9 requested review of this revision.Mar 19 2022, 4:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 4:44 AM
yota9 updated this revision to Diff 416697.Mar 19 2022, 7:44 AM

Add test

yota9 edited the summary of this revision. (Show Details)Mar 19 2022, 7:44 AM
Amir added a comment.Mar 21 2022, 7:21 AM

LGTM but I'm not an AArch64 expert. Rafael is on a vacation, so please tag someone knowledgeable from LLVM AArch64 community (e.g. git blame llvm/unittests/Support/Host.cpp in TEST(getLinuxHostCPUName, AArch64)) to double-check if it's a correct and the only constant island access restriction that needs to be enforced.

bolt/test/runtime/AArch64/constant-island-alignment.s
1 ↗(On Diff #416698)

nit: bolt tool -> BOLT

yota9 marked an inline comment as done.Mar 22 2022, 8:43 AM
yota9 added a subscriber: rafaelauler.

Hello @t.p.northover ! I would we thankful I you will review the patch while the @rafaelauler is on vacation. The very main part is located in BinaryEmitter.cpp, that is we will always emit constant island on 8 bytes alignment since currently we don't track here in BOLT is the instruction that will load data from constant island may support unaligned access or not - we will just always aligne it on 8 bytes boundaries. Thank you for your time in advance!

Amir added a comment.Mar 23 2022, 9:29 AM

Tests pass, so I'm waiting for clearance from aarch64 reviewer

Amir accepted this revision.Mar 23 2022, 4:13 PM

Accepting to unblock dependent diffs.

We'd still appreciate @t.p.northover or someone from ARM community to have a glance.

This revision is now accepted and ready to land.Mar 23 2022, 4:13 PM
maksfb requested changes to this revision.Mar 23 2022, 5:55 PM

@yota9, similar to my concerns on other diffs, I think it's better for regression tests to check for a specific condition and not to rely on the correct execution of the test binary. This way you will be able to run tests on other architectures.

Regarding the change itself, have you considered aligning constant islands so that they straddle a minimum number of cache lines? The trivial way to achieve this is to align them at cache line boundary, but it's not always the optimal way.

This revision now requires changes to proceed.Mar 23 2022, 5:55 PM

Hello @maksfb !

This way you will be able to run tests on other architectures.

Sure, but the testing is currently does not run aarch64 tests at all.
I don't really mind, but sometimes it is just inconvenient to make such tests. And I still would like some tests to be runnable.. I think we need to discuss the testing policy.

Regarding the change itself, have you considered aligning constant islands so that they straddle a minimum number of cache lines?

This could be discussed, but we need to check if there will be any winning in this, I don't think this is a good way just to align to cache line size, it should be more clever system, but this is beyond the goals of this review. But thank you for you suggestion anyway :)

I don't really mind, but sometimes it is just inconvenient to make such tests. And I still would like some tests to be runnable.. I think we need to discuss the testing policy.

It's fair to say that sometimes creating non-runnable test is way easier. However, in this case, will it be sufficient to locate the island in the output binary (e.g. with llvm-objdump) and make sure the address is aligned ({{0|8}})?

Regarding the change itself, have you considered aligning constant islands so that they straddle a minimum number of cache lines?

This could be discussed, but we need to check if there will be any winning in this, I don't think this is a good way just to align to cache line size, it should be more clever system, but this is beyond the goals of this review. But thank you for you suggestion anyway :)

Totally, I don't mean it for this diff. However, could you create a constant for the alignment (instead of sizeof(uint64_t)) that could be easily changed for running experiments?

yota9 updated this revision to Diff 418225.Mar 25 2022, 8:04 AM

@maksfb Thank you for your comments! Resolved both of them.

maksfb accepted this revision.Mar 26 2022, 5:03 PM

Overall LGTM.

I have a general concern that the estimation of the size for islands and the actual emission are decoupled and could get out of sync. At the same time, I don't have a good suggestion on how to fix it. For functions, to calculate the size we create a temporary emitter and run the same code used for real emission, get the size of the emitted code and then discard the code.

bolt/test/AArch64/constant-island-alignment.s
13

nit: place the CHECK next to the asm directive that corresponds to the island being checked. I don't know how the names are assigned for islands and if it's possible to give it a distinctive name.

This revision is now accepted and ready to land.Mar 26 2022, 5:03 PM
yota9 marked an inline comment as done.Mar 27 2022, 12:29 PM

Thank you for your review @maksfb ! I agree with your concerns but currently I don't have a good idea too. Within the time I hope we will find a solution :)

bolt/test/AArch64/constant-island-alignment.s
13

Done. The $d is the arm ABI name, there is no way to manipulate this AFAIK

This revision was automatically updated to reflect the committed changes.
yota9 marked an inline comment as done.