This is an archive of the discontinued LLVM Phabricator instance.

[clang] Implement -falign-loops=N (N is a power of 2) for non-LTO
ClosedPublic

Authored by MaskRay on Jul 23 2021, 12:34 PM.

Details

Summary

GCC supports multiple forms of -falign-loops=.
-falign-loops= is currently ignored in Clang.

This patch implements the simplest but the most useful form where N is a
power of 2.

The underlying implementation uses a llvm::TargetOptions option for now.
Bitcode generation ignores this option.
The user can specify a global -Wl,-plugin-opt=-align-loops=128.

Diff Detail

Event Timeline

MaskRay created this revision.Jul 23 2021, 12:34 PM
MaskRay requested review of this revision.Jul 23 2021, 12:34 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 23 2021, 12:34 PM

Can we hook this up to a LLVM IR function attribute, instead of making it a codegen flag?

Can we hook this up to a LLVM IR function attribute, instead of making it a codegen flag?

The current TargetLoweringBase::PrefLoopAlignment is global. I have considered a function attribute, but it seems overkill for now.
(Inlining behavior is a bit unclear.)
The current use cases just need a global value instead of a refined per-function value.

LGTM. I'll let someone familiar with the old option explicitly approve it.

clang/test/Driver/falign-loops.c
7–8

I would generally expect to see the <= x bound tested with x and x+1, not just x+1.

llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

Nit: it's a convention of the RISC-V backend codegen tests to wrap the RUN lines.

MaskRay updated this revision to Diff 361467.Jul 24 2021, 10:20 AM
MaskRay marked 2 inline comments as done.

comments. add a test/CodeGen test. add HelpText.

MaskRay edited the summary of this revision. (Show Details)Jul 24 2021, 10:22 AM

Can we hook this up to a LLVM IR function attribute, instead of making it a codegen flag?

The current TargetLoweringBase::PrefLoopAlignment is global. I have considered a function attribute, but it seems overkill for now.
(Inlining behavior is a bit unclear.)
The current use cases just need a global value instead of a refined per-function value.

global module metadata is also an option

(what's the motivation for adding this feature - do you have a use-case in mind?)

(the usual: This should probably be committed as separate patches - at least LLVM, then Clang pieces)

MaskRay added inline comments.Jul 24 2021, 10:58 AM
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

only 86 columns. compiler-rt is even transiting to 100 column.

Can we hook this up to a LLVM IR function attribute, instead of making it a codegen flag?

The current TargetLoweringBase::PrefLoopAlignment is global. I have considered a function attribute, but it seems overkill for now.
(Inlining behavior is a bit unclear.)
The current use cases just need a global value instead of a refined per-function value.

global module metadata is also an option

(what's the motivation for adding this feature - do you have a use-case in mind?)

(the usual: This should probably be committed as separate patches - at least LLVM, then Clang pieces)

I would like to have this for experimenting on RISCV. I was proposing to add similar hidden options like X86 in D106570.

jrtc27 added inline comments.Jul 24 2021, 11:01 AM
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

compiler-rt is not the RISC-V backend :)

10 ↗(On Diff #361303)

This isn't autogenerated? Also NOT on .p2align isn't great in general, .balign and .align don't match that yet could have been emitted.

13 ↗(On Diff #361303)
  • not _ in check prefixes

Can we hook this up to a LLVM IR function attribute, instead of making it a codegen flag?

The current TargetLoweringBase::PrefLoopAlignment is global. I have considered a function attribute, but it seems overkill for now.
(Inlining behavior is a bit unclear.)
The current use cases just need a global value instead of a refined per-function value.

global module metadata is also an option

Using a global module metadata needs to think of the merging behavior.
The behavior isn't clear.

(what's the motivation for adding this feature - do you have a use-case in mind?)

Use case: x86 has a cl::opt. RISC-V is exploring D106570.

(the usual: This should probably be committed as separate patches - at least LLVM, then Clang pieces)

(Can commit the llvm/ part first.)

MaskRay marked 2 inline comments as done.Jul 24 2021, 11:55 AM
MaskRay added inline comments.
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

Wrapping lines here just makes the code less readable.

13 ↗(On Diff #361303)

I find that _ in check prefixes is also popular.

It has the benefit that _ cannot conflict with -NOT -LABEL` etc.

jrtc27 added inline comments.Jul 24 2021, 1:06 PM
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

That's your personal opinion, which I disagree with, and it's not true if your terminal isn't wide enough. Going against existing convention in the backend tests should only be done with very good reason, and personal opinion is not that.

13 ↗(On Diff #361303)

I have never seen it before and there are zero uses of it in RISC-V CodeGen tests. Please conform to the existing style by using -.

MaskRay marked an inline comment as done.Jul 24 2021, 2:12 PM
MaskRay added inline comments.
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

Lines longer than 80-column (in this case just 86) are pretty common among tests. I really hope test/CodeGen/RISCV/ can be more tolerant on this matter.

Even the Linux scripts/checkpatch.pl has increased the limit to 100 because in many cases wrapping lines for strict 80-conformance just harms readability.

Of course I don't want to waste time arguing on this matter. So if this turns out to be an issue for RISC-V folks, I'll update it to save my time.

luismarques added inline comments.Jul 24 2021, 3:38 PM
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

Of course I don't want to waste time arguing on this matter. So if this turns out to be an issue for RISC-V folks, I'll update it to save my time.

Personally, I don't particularly care. I don't know if @asb has strong feelings about this. If you think it would be beneficial to relax this convention please raise the issue on llvm-dev. Let's not keep discussing this in every patch touching RISC-V :-)

MaskRay added inline comments.Jul 24 2021, 3:45 PM
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

Personally I don't even think the generic case needs to be raised on llvm-dev:) There are just so many column>80 cases in llvm/test and clang/test. Actually, If someone wants to enforce the 80-column rule more rigidly, that probably needs a discussion.

That said, the argument here is about a subdirectory: llvm/test/CodeGen/RISCV/ ...

asb added inline comments.Jul 27 2021, 5:00 AM
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

I don't have a strong view on this one to be honest. I think I've typically wrapped at 80 columns for these RUN lines after being asked to, but ultimately I think choosing a logical point to split has a greater impact on readability than keeping it strictly to 80 columns.

jrtc27 added inline comments.Jul 27 2021, 6:18 AM
llvm/test/CodeGen/RISCV/loop-alignment.ll
3–4 ↗(On Diff #361303)

FWIW I care less about argument lists extending beyond 80 columns, but I do think the | is a logical point at which to wrap it if you have a long line and keeps things more readable.

Does this work with LTO?

MaskRay added a comment.EditedJul 27 2021, 3:06 PM

Does this work with LTO?

-falign-loops= doesn't affect linker code generation options.

-Wl,-mllvm,--align-loops=128 can be used for now.

MaskRay added a comment.EditedAug 3 2021, 8:58 PM

Ping.

  • -falign-loops= is currently silently ignored.
  • -fliang-loops= has user interest from at least x86 and RISC-V.
  • This patch makes the driver option work for the non-LTO case and gives the LTO case -Wl,-mllvm,--align-loops=128 (some other features were done this way. They had a -mllvm before a driver option.)

I think this is good enough.

The LTO case require a function attribute (this doesn't apply to synthesized functions)/module flags metadata which may be overkill.

luismarques accepted this revision.Aug 4 2021, 1:22 AM

Still LGTM.

BTW, I liked that in the old version the help string included "In GCC =0 is the same as =1".

This revision is now accepted and ready to land.Aug 4 2021, 1:22 AM
MaskRay updated this revision to Diff 364224.Aug 4 2021, 12:46 PM
MaskRay retitled this revision from [clang] Add -falign-loops=N where N is a power of 2 to [clang] Implement -falign-loops=N (N is a power of 2) for non-LTO.
MaskRay edited the summary of this revision. (Show Details)

rebase

Still LGTM.

BTW, I liked that in the old version the help string included "In GCC =0 is the same as =1".

I find that in GCC =0 is not necessarily =1, but not particular clear about its exact behavior;-)

This revision was landed with ongoing or failed builds.Aug 5 2021, 12:18 PM
This revision was automatically updated to reflect the committed changes.
craig.topper added inline comments.Aug 6 2021, 7:47 PM
clang/lib/Driver/ToolChains/Clang.cpp
4749

gcc 5.4 is throwing a -Wparentheses warning here. I'm in the middle of something else in my tree or I would just fix it. Maybe isPowerOf2_32 would be more readable anyway?

MaskRay added inline comments.Aug 6 2021, 8:04 PM
clang/lib/Driver/ToolChains/Clang.cpp
4749

maybe just add parens (Value - 1) ... this is probably a quite common pattern. And the line below has err_drv_alignment_not_power_of_two which is self-explanatory.