Page MenuHomePhabricator

[LLVM][NFC] Adding an Alignment type to LLVM
ClosedPublic

Authored by gchatelet on Jul 16 2019, 2:58 AM.

Details

Summary

This patch introduces a type to straighten LLVM's alignment management.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html

The next steps are:

  • Use this type throughout LLVM
  • Harden the type by setting ALIGN_HARDENED_ALIGNMENT to 1

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gchatelet added inline comments.Jul 17 2019, 5:44 AM
llvm/include/llvm/Support/Alignment.h
101–103 ↗(On Diff #210092)
107 ↗(On Diff #210092)

Yes that's a possibility I wanted to explore as well, but I'm not sure about the tradeoffs at this point.
IMHO the code would be much more straightforward compared to a templated version.

I'm interested in comments from other developers here.

Storing alignment as a byte would allow alignment up to 2^254

0:0
1:1
2:2
3:4
4:8
...
255: 2^254
115 ↗(On Diff #210092)

Yes semantically this is exactly Optional<Align>, the use of this type is pervasive in LLVM where the alignment requirements are first unknown and set down the road when needed.
Problem is llvm::Optional needs an additional bool to know if the value is set or not so MaybeAlign is more compact than Optional<Align> (https://godbolt.org/z/8_uA_D) it's also more customizable.

157–159 ↗(On Diff #210092)

SGTM will remove

179–181 ↗(On Diff #210092)

ALIGN_DEPRECATED is defined only when ALIGN_HARDENED_ALIGNMENT is set.
As of this patch there will be no deprecation message, it's useful as a second step to find fishy code.
Fixing the code while introducing the type is not feasible considering the patch is already humongous.

192–194 ↗(On Diff #210092)

I concur this does not make sense but this is what the code is doing right now. Since this patch is about introducing a type and not fix the code, I've to add this function for now.
This is why these functions will be deprecated afterwards.

210 ↗(On Diff #210092)

Same as previous comment, this is actual code that seems to be wrong but not to be fixed in this patch.
e.g. of isAlignedBogus https://github.com/llvm/llvm-project/blob/8b7041a5c6f0a373d4886ca807d89790ad6dedab/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L8701

251 ↗(On Diff #210092)

SGTM

252 ↗(On Diff #210092)

I'm happy to optimize this afterwards if you don't mind. The review will be easier if I just move code around without changing it.

First things first, thx a lot for the review. I really appreciate it.

Ok so I thought about endianness more and I think that's not an issue because computations are generally done in native endianness.

Ok

I do however think you should definitely have either the 64 bit version be the default or have both a 32 and 64 bit version. This type is most useful in cases where tools like lld, llvm-objcopy, yaml2obj, etc... would be using an internal structure or temporary variable and generally you want to handle both 32-bit and 64-bit operations in these cases. The solution is always to default to the highest bit size which is 64-bits because it will handle the 32-bit case just fine. Unless there's a compelling reason to not use 64-bits I think you should just make it your default. Certainly before lld or llvm-objcopy can use this we need to have a 64-bit version.

Throughout LLVM we mostly use unsigned to represent an alignment. I don't know for sure what was the rationale for using it instead of uint64_t, it can be for historic reasons (just use the machine's word type) or for space reasons.
A quick estimation via git grep -E "^\s+unsigned [Aa]lign.*" | wc -l gives the following number of occurrences (for the whole llvm-project repo)

  • unsigned 200
  • uint16_t 4
  • uint32_t 132
  • uint64_t 44

    One of the occurrences for the uint16_t is in llvm/include/llvm/CodeGen/TargetLowering.h so for this one at least is seems that storage is important.

    This tends to prove that the Alignment type should be parameterized as you suggest.

In those 44 cases and 132 cases includes all ELF code. Also consider lld for more uses. You alienate no one if you support up to 64-bits but you alienate all ELF programmers if you don't support 64-bits. I am one of those people. I cannot use this if 64-bits isn't supported. Other than trying to help llvm code quality improve and being a good citizen I have no reason to review this if I don't get 64-bit support.

llvm/include/llvm/Support/Alignment.h
101–103 ↗(On Diff #210092)

Those are all easily refactored. I'm not sure how "IsPadded" is being propogated though but if that's needed you can check to see if the value changes. In general needing to check for padding like most of those examples are is anti pattern. The error checking case can just be X != alignTo(X, Align)

107 ↗(On Diff #210092)

0 would represent an alignment of '1' not '0' but you might take that route in the "maybe" to encode a missing value.

115 ↗(On Diff #210092)

Right I understand the optimization I'm saying you should match the interface modulo a few operations specific to alignments.

179–181 ↗(On Diff #210092)

You can just remove all the deprecated methods and then gradually refactor other code. Nothing calls this yet so there's no reason to add deprecated functionality.

192–194 ↗(On Diff #210092)

You should introduce a new proper type, and fix code as its migrated over, not introduce more broken code and gradually unbreak it.

First things first, thx a lot for the review. I really appreciate it.

Ok so I thought about endianness more and I think that's not an issue because computations are generally done in native endianness.

Ok

I do however think you should definitely have either the 64 bit version be the default or have both a 32 and 64 bit version. This type is most useful in cases where tools like lld, llvm-objcopy, yaml2obj, etc... would be using an internal structure or temporary variable and generally you want to handle both 32-bit and 64-bit operations in these cases. The solution is always to default to the highest bit size which is 64-bits because it will handle the 32-bit case just fine. Unless there's a compelling reason to not use 64-bits I think you should just make it your default. Certainly before lld or llvm-objcopy can use this we need to have a 64-bit version.

Throughout LLVM we mostly use unsigned to represent an alignment. I don't know for sure what was the rationale for using it instead of uint64_t, it can be for historic reasons (just use the machine's word type) or for space reasons.
A quick estimation via git grep -E "^\s+unsigned [Aa]lign.*" | wc -l gives the following number of occurrences (for the whole llvm-project repo)

  • unsigned 200
  • uint16_t 4
  • uint32_t 132
  • uint64_t 44

    One of the occurrences for the uint16_t is in llvm/include/llvm/CodeGen/TargetLowering.h so for this one at least is seems that storage is important.

    This tends to prove that the Alignment type should be parameterized as you suggest.

In those 44 cases and 132 cases includes all ELF code. Also consider lld for more uses. You alienate no one if you support up to 64-bits but you alienate all ELF programmers if you don't support 64-bits. I am one of those people. I cannot use this if 64-bits isn't supported. Other than trying to help llvm code quality improve and being a good citizen I have no reason to review this if I don't get 64-bit support.

Please don't get me wrong, I'm absolutely willing to support 64-bits. Here I'm trying to see what's the best design since there are different tradeoffs:

  • Have a single 64-bits type: accommodates all usage but can be a problem where space matters
  • Have a templated type so people can use whatever suites them best, this complicates the code though.
  • Have a compressed representation in a single byte but paying an extra cost when accessing it.

I don't think I ever said I would not add 64-bits support, if it looks like so I apologize for the misunderstanding.

gchatelet marked 32 inline comments as done.Jul 19 2019, 7:44 AM
gchatelet added inline comments.
llvm/include/llvm/Support/Alignment.h
101–103 ↗(On Diff #210092)

SGTM, I'll remove it.

115 ↗(On Diff #210092)

Acknowledged. Will do.

179–181 ↗(On Diff #210092)

I understand your reluctance to see these deprecated methods in new code but I've went through the exercise of the whole refactoring. It took me about 4 weeks of full time job. This change is not light, it is massive (patch is 26 000 lines). My decision to go this route was to make the review as straightforward as possible so it has a chance to happen. I also wanted to leave breadcrumbs along the way so fishy code could be fixed as a second step.

Since you seem to feel strongly about this I will remove the deprecated methods but I'm not so sure this refactoring will happen thoroughly considering the required investment.

192–194 ↗(On Diff #210092)

I don't think it introduces "more broken code" since it factors the broken code in a single place but I understand your position and I will try to follow that path.
I'm worried that the incremental fixing may take forever and that I may be dragged away before finishing it.

262 ↗(On Diff #210092)

IMHO ELF shouldn't use the MaybeAlign type since it represent something that is not yet defined.
AFAIU ELF always deals with known alignment so it should use Align that has well defined semantics.
I envision an Align assumeAligned(uint64_t) function that will make 0 be a Align(1)

321 ↗(On Diff #210092)
339 ↗(On Diff #210092)

I'm not sure about this one, I would encourage using the bool operator==(Align Lhs, Align Rhs) version with the assumeAligned() function I was talking earlier - and not compare directly with 0.

441 ↗(On Diff #210092)

Here are the definitions: https://llvm.org/doxygen/Optional_8h_source.html
I've tightened the defintions for MaybeAlign compared to Optional: the code will check fail when comparing the None case to a real value.

501 ↗(On Diff #210092)

Yes I removed them.

llvm/unittests/Support/AlignmentTest.cpp
30–31 ↗(On Diff #210092)

Got rid of them.

gchatelet updated this revision to Diff 210839.Jul 19 2019, 7:58 AM
gchatelet marked 7 inline comments as done.
  • Address most of the comments
gchatelet updated this revision to Diff 210840.Jul 19 2019, 8:06 AM
  • Remove old documentation, fix Log2_32 -> Log2_64
gchatelet updated this revision to Diff 211033.Jul 22 2019, 2:45 AM
  • Enable asserts since we will now fix the code while introducing the
gchatelet updated this revision to Diff 211055.Jul 22 2019, 5:05 AM
  • Add more division tests
gchatelet updated this revision to Diff 211089.Jul 22 2019, 7:05 AM
  • Fix include guard

A gentle ping

jfb added inline comments.Jul 25 2019, 12:27 PM
llvm/include/llvm/Support/Alignment.h
21 ↗(On Diff #211089)

I don't think the prior history is useful here.

73 ↗(On Diff #211089)

You can remove the ctor and instead do uint8_t ShiftValue = 0; above.

99 ↗(On Diff #211089)

Why is this better than using MaybeAligned = Optional<Align>?
Do we care about packing that extra bit?
It also seems that Align can have a shift of 255 and this can't.

gchatelet updated this revision to Diff 212162.Jul 29 2019, 6:39 AM
gchatelet marked 4 inline comments as done.
  • Make MaybeAlign an llvm::Optional.
  • Remove unnecessary documentation,

Thx for the review @jfb. Anything else?

llvm/include/llvm/Support/Alignment.h
21 ↗(On Diff #211089)

Indeed. I removed it.

99 ↗(On Diff #211089)

When the data type for Align was unsigned or uint64_t, llvm::Optional was costly in terms of storage

sizeof(llvm::Optional<unsigned>) == 8
sizeof(llvm::Optional<uint64_t>) == 16

But with Align using uint8_t I agree it's not a big deal anymore sizeof(llvm::Optional<uint8_t>) == 2

I've added a comment about Align::ShiftValue being less than 64 by construction.

jfb added inline comments.Jul 29 2019, 5:12 PM
llvm/include/llvm/Support/Alignment.h
40 ↗(On Diff #212162)

final is kinda weird here. Not that I'd inherit from this... but why have final?

146 ↗(On Diff #212162)

Are the places where Log2 with undefined alignment is called today? Can we report fatal error instead (or assert)?

gchatelet marked 4 inline comments as done.
  • Assert when Log2 on undefined MaybeAlign.
llvm/include/llvm/Support/Alignment.h
40 ↗(On Diff #212162)

It was a way to document that this type is not to be derived from. It does not have a virtual destructor.
Now I don't have a strong opinion and I'm happy to remove it if you think it's too weird.

146 ↗(On Diff #212162)

Since I wanted to first introduce the type and then fix invalid code I never had a chance to run LLVM with all the asserts.
By looking at the code it is definitely possible but I can't tell for sure if it happens. It's one of the motivation for introducing the type in the first place.

For instance it is used in TargetCallingConv where A may well be 0. In that case Log2_32 will return -1, hence the + 1 to encode invalid as zero.

Now that it's clear we want to fix the code as we introduce the type, we can treat it as an error as you suggest.

  • Fix documentation
jfb accepted this revision.Jul 30 2019, 9:09 AM

Overall this looks good, thanks!

llvm/include/llvm/Support/Alignment.h
40 ↗(On Diff #212162)

It's just not something we really do anywhere.

146 ↗(On Diff #212162)

Thanks, I think this is probably the right approach since it forces the changes as adoption happens.

This revision is now accepted and ready to land.Jul 30 2019, 9:09 AM
  • Remove final from Align and MaybeAlign
gchatelet marked 3 inline comments as done.Jul 31 2019, 12:42 AM

Thank you for the review @jfb and @jakehehrlich. I'll submit the patch now and start working on the migration.
Let me know if you have additional comments.

llvm/include/llvm/Support/Alignment.h
40 ↗(On Diff #212162)

I removed the final here and below.

This revision was automatically updated to reflect the committed changes.
gchatelet marked an inline comment as done.

This patch broke buildbot, I've submitted to fix MSVC https://reviews.llvm.org/rL367400

foad added a subscriber: foad.Jul 31 2019, 5:10 AM

This also broke ninja check-llvm-unit on a Release build:

Failing Tests (7):
    LLVM-Unit :: Support/./SupportTests/Alignment.CantConvertUnsetMaybe
    LLVM-Unit :: Support/./SupportTests/Alignment.CompareAlignToUndefMaybeAlign
    LLVM-Unit :: Support/./SupportTests/Alignment.CompareMaybeAlignToZero
    LLVM-Unit :: Support/./SupportTests/Alignment.ComparisonsWithZero
    LLVM-Unit :: Support/./SupportTests/Alignment.Division
    LLVM-Unit :: Support/./SupportTests/Alignment.InvalidCTors
    LLVM-Unit :: Support/./SupportTests/Alignment.Log2

Typical failure:

FAIL: LLVM-Unit :: Support/./SupportTests/Alignment.Division (2696 of 3890)
******************** TEST 'LLVM-Unit :: Support/./SupportTests/Alignment.Division' FAILED ********************
Note: Google Test filter = Alignment.Division
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Alignment
[ RUN      ] Alignment.Division
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:70: Failure
Death test: Align(Value) / 2
    Result: failed to die.
 Error msg:
[  DEATH   ] 
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:71: Failure
Death test: MaybeAlign(Value) / 2
    Result: failed to die.
 Error msg:
[  DEATH   ] 
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:79: Failure
Death test: Align(8) / 0
    Result: failed to die.
 Error msg:
[  DEATH   ] 
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:80: Failure
Death test: Align(8) / 3
    Result: failed to die.
 Error msg:
[  DEATH   ] 
[  FAILED  ] Alignment.Division (2 ms)
[----------] 1 test from Alignment (2 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (2 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Alignment.Division

 1 FAILED TEST

This also broke ninja check-llvm-unit on a Release build:

Thx for letting me know @foad , I'm on it.

@gchatelet This is breaking windows bots: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast/builds/1933 - please can you take a look? I think we'll need to revert until you have a fix.

gchatelet added a comment.EditedJul 31 2019, 5:53 AM

@gchatelet This is breaking windows bots: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast/builds/1933 - please can you take a look? I think we'll need to revert until you have a fix.

If this is about the missing default constructor it's already fixed:
https://reviews.llvm.org/rL367400

As for the failing tests in release the patch is sent as well
https://reviews.llvm.org/rL367427

Thanks - rL367427 seemed to be the key

aprantl added inline comments.
llvm/trunk/include/llvm/Support/Alignment.h
39

Can you please doxygen-ify all the comments in this file? I.e. // -> ///

113

This comment also doesn't follow the LLVM rules. It should be just
/// Checks that ...

gchatelet marked 2 inline comments as done.Aug 1 2019, 12:29 AM

@aprantl I've created https://reviews.llvm.org/D65558 to address your comments since this patch is already submitted.

Thanks for looking at this. I haven't looked at this or the related patches at all, and I know this has landed already. I haven't really looked at the details of the patch, but I did want to highlight a possible conflict in interpretation of 0 alignment. In the elf gABI, the two alignment fields (sh_addralign and p_align) both have the statement "Values 0 and 1 mean no alignment is required". In other words, 0 is not technically an undefined alignment any more than 1 is, at least in the ELF context. That being said, if you are always treating an alignment of 0 as an alignment of 1, I don't think there's any issue.

Thanks for looking at this. I haven't looked at this or the related patches at all, and I know this has landed already. I haven't really looked at the details of the patch, but I did want to highlight a possible conflict in interpretation of 0 alignment. In the elf gABI, the two alignment fields (sh_addralign and p_align) both have the statement "Values 0 and 1 mean no alignment is required". In other words, 0 is not technically an undefined alignment any more than 1 is, at least in the ELF context. That being said, if you are always treating an alignment of 0 as an alignment of 1, I don't think there's any issue.

Thx for the feedback @jhenderson!
I had a quick exchange with @lattner a while ago. He told me that 0 most probably means 1 byte aligned, most of the time, which is inline with what you say.

The type has landed but there's a lot more work to get the codebase converted to use it.
This query shows the patches submitted so far.

I'm actively working on it, along the way I learned that the unsigned sometimes means an alignment in bytes (powers of two) but also sometimes the log2 of it, in which case 0 means 1ULL << 0, i.e. 1 byte aligned (see D65945).
I'm as careful as possible and I make sure I understand the consequences when I introduce the type to a part of the codebase, the whole migration will certainly take time.