This is an archive of the discontinued LLVM Phabricator instance.

[LLVM][NFC] Adding an Alignment type to LLVM
ClosedPublic

Authored by gchatelet on Jul 16 2019, 2:58 AM.

Details

Summary

This patch introduces a type to straighten LLVM's alignment management.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html

The next steps are:

  • Use this type throughout LLVM
  • Harden the type by setting ALIGN_HARDENED_ALIGNMENT to 1

Event Timeline

gchatelet created this revision.Jul 16 2019, 2:58 AM
gchatelet updated this revision to Diff 210071.Jul 16 2019, 5:28 AM
  • Moved functions around, added more documentation.
gchatelet updated this revision to Diff 210074.Jul 16 2019, 5:50 AM
  • Moved comparison functions outside of the classes.
gchatelet updated this revision to Diff 210081.Jul 16 2019, 6:14 AM
  • Fixing typo
gchatelet updated this revision to Diff 210092.Jul 16 2019, 7:17 AM
  • Adding missing inline

Ok so I thought about endianness more and I think that's not an issue because computations are generally done in native endianness. I do however think you should definitely have either the 64 bit version be the default or have both a 32 and 64 bit version. This type is most useful in cases where tools like lld, llvm-objcopy, yaml2obj, etc... would be using an internal structure or temporary variable and generally you want to handle both 32-bit and 64-bit operations in these cases. The solution is always to default to the highest bit size which is 64-bits because it will handle the 32-bit case just fine. Unless there's a compelling reason to not use 64-bits I think you should just make it your default. Certainly before lld or llvm-objcopy can use this we need to have a 64-bit version.

jakehehrlich added inline comments.Jul 16 2019, 2:50 PM
llvm/include/llvm/Support/Alignment.h
70

2^32 isn't a value that uint32_t can represent so you might change this. 2^31 is the largest valid power of two, 2^32 - 1 is the largest value that can be represented. I don't think you need to bring negative values in to this since its all using uint32_t. Also my math on this is rusty but I think int32_t(uint32_t(1) << uint32_t(31)) == -1.

95

Is halving ever really needed?

102–104

Seems like you'd just want to provide helper functions like "alignTo" that do this for you. There's a whole family of them but personally I've primarily used "alignTo" and one other really odd highly specific one for ELF to deal with segment alignment.

108

If size is a concern yet we only want to represent powers of two I think encoding the power two only would let us represent 64-bit alignments using only a 1 byte value at the cost of a slight bit of increased computation on each operation. These operations tend to not be very performance critical however.

116

What's the use case for this type? I don't think any ELF tools have a use case for this at least. ELF uses 0 to mean the same thing as 1 so we'd just want something that converted to 1 asap.

I think its interface should copy Optional if this has a use case. e.g. using Optional<Align> should function close to the same, perhaps with some operations from Align that act in sensible ways.

158–160

I don't think you should be providing this. it's easy enough to use "isSet" and their own assert. If I was reading this I wouldn't expect this to be an assert under the hood and nothing else.

180–182

Why add a deprecated method to new code? Nothing is using this so there's no need for backward compatibility. Also the two methods above look equally dangerous but boil down to Optional's 'get' or dereference.

193–195

Why does choosing false make sense here? It isn't clear to me what should be returned here one way or the other. If padding is needed then one would assume you'd also need a method to perform the padding required. If this was an empty alignment how could you do that?

211

I see a lot of "bogus" or "dangerous" or "deprecated" things being added and these all huge big red flags to me. Can you better justify the need for them?

252

I'd probably not use const Align & and instead just use Align. This matches how alignment is used over the code base anyhow things like ArrayRef and StringRef are used.

253

You can use (Size + A.value() - 1) & -A.value() since you explicitly only represent powers of two. Not critical by any means but a possible implementation that saves on division operations which can be almost as slow as an allocation.

You can think of what you're doing as being like a right shift followed by a left shift which zeros out the extra bits produced in the bump. -A.value() is a mask where all those bits being zeroed out are just zero. I've found that both myself and many other people get confused by this trick so a comment is worth while if you use it.

263

I think this being the default makes sense but it would be worth showing examples of where this is used. This for instance matches what ELF should do with zero alignment but thats really just the same as saying that what MaybeAlignment calls undefined is actully an alignment of 1.

272

Is there a reason to 1) copy the naming convention here or 2) to not just call this "Log2"? Also this operation will just be "return value" if you use the 1-byte representation proposed above.

A merit of using the bit shift representation is that the cost of converting to the origional is low (1 << value) but using the current representation Log2_32 is quite complicated under the hood and requires a large handful of bitwise operations.

322
  1. Does any existing code use this
  2. Why add the '1'? Id expect this to be the inverse of 1 << x but instead its the inverse of 1 << (x - 1) or as you do below (1 << x) >> 1
340

Would be nice to have a version for ELF specifically that treated '0' and '1' as equal.

374

Ditto about ELF specific option. Not sure how other parts of the code treat '0' so this might be a follow up.

442

How does Optional<T> handle equality in the None case? I think we should match those semantics....maybe depends on what they are.

502

Won't std::max do this automatically since you implemented the comparison operators?

519–529

These are again consistent with how ELF treats 0 since 0 <= x for all alignments x in ELF but some motivation for how these are used is required IMO since it isn't a priori clear what these should do.

553–563

ditto on justification for these.

llvm/unittests/Support/AlignmentTest.cpp
31–32

These values are wrong and not powers of two.

First things first, thx a lot for the review. I really appreciate it.

Ok so I thought about endianness more and I think that's not an issue because computations are generally done in native endianness.

Ok

I do however think you should definitely have either the 64 bit version be the default or have both a 32 and 64 bit version. This type is most useful in cases where tools like lld, llvm-objcopy, yaml2obj, etc... would be using an internal structure or temporary variable and generally you want to handle both 32-bit and 64-bit operations in these cases. The solution is always to default to the highest bit size which is 64-bits because it will handle the 32-bit case just fine. Unless there's a compelling reason to not use 64-bits I think you should just make it your default. Certainly before lld or llvm-objcopy can use this we need to have a 64-bit version.

Throughout LLVM we mostly use unsigned to represent an alignment. I don't know for sure what was the rationale for using it instead of uint64_t, it can be for historic reasons (just use the machine's word type) or for space reasons.
A quick estimation via git grep -E "^\s+unsigned [Aa]lign.*" | wc -l gives the following number of occurrences (for the whole llvm-project repo)

  • unsigned 200
  • uint16_t 4
  • uint32_t 132
  • uint64_t 44

One of the occurrences for the uint16_t is in llvm/include/llvm/CodeGen/TargetLowering.h so for this one at least is seems that storage is important.

This tends to prove that the Alignment type should be parameterized as you suggest.

gchatelet marked 8 inline comments as done.Jul 17 2019, 5:44 AM

A few comments for now, I'll reply to the other ones soonish.

llvm/include/llvm/Support/Alignment.h
70

Thx for noticing, this comment is nonsense indeed.
-1 is all ones and can't be a power of two so there's no need bothering.

95
102–104
108

Yes that's a possibility I wanted to explore as well, but I'm not sure about the tradeoffs at this point.
IMHO the code would be much more straightforward compared to a templated version.

I'm interested in comments from other developers here.

Storing alignment as a byte would allow alignment up to 2^254

0:0
1:1
2:2
3:4
4:8
...
255: 2^254
116

Yes semantically this is exactly Optional<Align>, the use of this type is pervasive in LLVM where the alignment requirements are first unknown and set down the road when needed.
Problem is llvm::Optional needs an additional bool to know if the value is set or not so MaybeAlign is more compact than Optional<Align> (https://godbolt.org/z/8_uA_D) it's also more customizable.

158–160

SGTM will remove

gchatelet added inline comments.Jul 17 2019, 5:44 AM
llvm/include/llvm/Support/Alignment.h
180–182

ALIGN_DEPRECATED is defined only when ALIGN_HARDENED_ALIGNMENT is set.
As of this patch there will be no deprecation message, it's useful as a second step to find fishy code.
Fixing the code while introducing the type is not feasible considering the patch is already humongous.

193–195

I concur this does not make sense but this is what the code is doing right now. Since this patch is about introducing a type and not fix the code, I've to add this function for now.
This is why these functions will be deprecated afterwards.

211

Same as previous comment, this is actual code that seems to be wrong but not to be fixed in this patch.
e.g. of isAlignedBogus https://github.com/llvm/llvm-project/blob/8b7041a5c6f0a373d4886ca807d89790ad6dedab/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L8701

252

SGTM

253

I'm happy to optimize this afterwards if you don't mind. The review will be easier if I just move code around without changing it.

First things first, thx a lot for the review. I really appreciate it.

Ok so I thought about endianness more and I think that's not an issue because computations are generally done in native endianness.

Ok

I do however think you should definitely have either the 64 bit version be the default or have both a 32 and 64 bit version. This type is most useful in cases where tools like lld, llvm-objcopy, yaml2obj, etc... would be using an internal structure or temporary variable and generally you want to handle both 32-bit and 64-bit operations in these cases. The solution is always to default to the highest bit size which is 64-bits because it will handle the 32-bit case just fine. Unless there's a compelling reason to not use 64-bits I think you should just make it your default. Certainly before lld or llvm-objcopy can use this we need to have a 64-bit version.

Throughout LLVM we mostly use unsigned to represent an alignment. I don't know for sure what was the rationale for using it instead of uint64_t, it can be for historic reasons (just use the machine's word type) or for space reasons.
A quick estimation via git grep -E "^\s+unsigned [Aa]lign.*" | wc -l gives the following number of occurrences (for the whole llvm-project repo)

  • unsigned 200
  • uint16_t 4
  • uint32_t 132
  • uint64_t 44

One of the occurrences for the uint16_t is in llvm/include/llvm/CodeGen/TargetLowering.h so for this one at least is seems that storage is important.

This tends to prove that the Alignment type should be parameterized as you suggest.

In those 44 cases and 132 cases includes all ELF code. Also consider lld for more uses. You alienate no one if you support up to 64-bits but you alienate all ELF programmers if you don't support 64-bits. I am one of those people. I cannot use this if 64-bits isn't supported. Other than trying to help llvm code quality improve and being a good citizen I have no reason to review this if I don't get 64-bit support.

llvm/include/llvm/Support/Alignment.h
102–104

Those are all easily refactored. I'm not sure how "IsPadded" is being propogated though but if that's needed you can check to see if the value changes. In general needing to check for padding like most of those examples are is anti pattern. The error checking case can just be X != alignTo(X, Align)

108

0 would represent an alignment of '1' not '0' but you might take that route in the "maybe" to encode a missing value.

116

Right I understand the optimization I'm saying you should match the interface modulo a few operations specific to alignments.

180–182

You can just remove all the deprecated methods and then gradually refactor other code. Nothing calls this yet so there's no reason to add deprecated functionality.

193–195

You should introduce a new proper type, and fix code as its migrated over, not introduce more broken code and gradually unbreak it.

First things first, thx a lot for the review. I really appreciate it.

Ok so I thought about endianness more and I think that's not an issue because computations are generally done in native endianness.

Ok

I do however think you should definitely have either the 64 bit version be the default or have both a 32 and 64 bit version. This type is most useful in cases where tools like lld, llvm-objcopy, yaml2obj, etc... would be using an internal structure or temporary variable and generally you want to handle both 32-bit and 64-bit operations in these cases. The solution is always to default to the highest bit size which is 64-bits because it will handle the 32-bit case just fine. Unless there's a compelling reason to not use 64-bits I think you should just make it your default. Certainly before lld or llvm-objcopy can use this we need to have a 64-bit version.

Throughout LLVM we mostly use unsigned to represent an alignment. I don't know for sure what was the rationale for using it instead of uint64_t, it can be for historic reasons (just use the machine's word type) or for space reasons.
A quick estimation via git grep -E "^\s+unsigned [Aa]lign.*" | wc -l gives the following number of occurrences (for the whole llvm-project repo)

  • unsigned 200
  • uint16_t 4
  • uint32_t 132
  • uint64_t 44

One of the occurrences for the uint16_t is in llvm/include/llvm/CodeGen/TargetLowering.h so for this one at least is seems that storage is important.

This tends to prove that the Alignment type should be parameterized as you suggest.

In those 44 cases and 132 cases includes all ELF code. Also consider lld for more uses. You alienate no one if you support up to 64-bits but you alienate all ELF programmers if you don't support 64-bits. I am one of those people. I cannot use this if 64-bits isn't supported. Other than trying to help llvm code quality improve and being a good citizen I have no reason to review this if I don't get 64-bit support.

Please don't get me wrong, I'm absolutely willing to support 64-bits. Here I'm trying to see what's the best design since there are different tradeoffs:

  • Have a single 64-bits type: accommodates all usage but can be a problem where space matters
  • Have a templated type so people can use whatever suites them best, this complicates the code though.
  • Have a compressed representation in a single byte but paying an extra cost when accessing it.

I don't think I ever said I would not add 64-bits support, if it looks like so I apologize for the misunderstanding.

gchatelet marked 32 inline comments as done.Jul 19 2019, 7:44 AM
gchatelet added inline comments.
llvm/include/llvm/Support/Alignment.h
102–104

SGTM, I'll remove it.

116

Acknowledged. Will do.

180–182

I understand your reluctance to see these deprecated methods in new code but I've went through the exercise of the whole refactoring. It took me about 4 weeks of full time job. This change is not light, it is massive (patch is 26 000 lines). My decision to go this route was to make the review as straightforward as possible so it has a chance to happen. I also wanted to leave breadcrumbs along the way so fishy code could be fixed as a second step.

Since you seem to feel strongly about this I will remove the deprecated methods but I'm not so sure this refactoring will happen thoroughly considering the required investment.

193–195

I don't think it introduces "more broken code" since it factors the broken code in a single place but I understand your position and I will try to follow that path.
I'm worried that the incremental fixing may take forever and that I may be dragged away before finishing it.

263

IMHO ELF shouldn't use the MaybeAlign type since it represent something that is not yet defined.
AFAIU ELF always deals with known alignment so it should use Align that has well defined semantics.
I envision an Align assumeAligned(uint64_t) function that will make 0 be a Align(1)

322
340

I'm not sure about this one, I would encourage using the bool operator==(Align Lhs, Align Rhs) version with the assumeAligned() function I was talking earlier - and not compare directly with 0.

442

Here are the definitions: https://llvm.org/doxygen/Optional_8h_source.html
I've tightened the defintions for MaybeAlign compared to Optional: the code will check fail when comparing the None case to a real value.

502

Yes I removed them.

llvm/unittests/Support/AlignmentTest.cpp
31–32

Got rid of them.

gchatelet updated this revision to Diff 210839.Jul 19 2019, 7:58 AM
gchatelet marked 7 inline comments as done.
  • Address most of the comments
gchatelet updated this revision to Diff 210840.Jul 19 2019, 8:06 AM
  • Remove old documentation, fix Log2_32 -> Log2_64
gchatelet updated this revision to Diff 211033.Jul 22 2019, 2:45 AM
  • Enable asserts since we will now fix the code while introducing the
gchatelet updated this revision to Diff 211055.Jul 22 2019, 5:05 AM
  • Add more division tests
gchatelet updated this revision to Diff 211089.Jul 22 2019, 7:05 AM
  • Fix include guard

A gentle ping

jfb added inline comments.Jul 25 2019, 12:27 PM
llvm/include/llvm/Support/Alignment.h
22

I don't think the prior history is useful here.

74

You can remove the ctor and instead do uint8_t ShiftValue = 0; above.

100

Why is this better than using MaybeAligned = Optional<Align>?
Do we care about packing that extra bit?
It also seems that Align can have a shift of 255 and this can't.

gchatelet updated this revision to Diff 212162.Jul 29 2019, 6:39 AM
gchatelet marked 4 inline comments as done.
  • Make MaybeAlign an llvm::Optional.
  • Remove unnecessary documentation,

Thx for the review @jfb. Anything else?

llvm/include/llvm/Support/Alignment.h
22

Indeed. I removed it.

100

When the data type for Align was unsigned or uint64_t, llvm::Optional was costly in terms of storage

sizeof(llvm::Optional<unsigned>) == 8
sizeof(llvm::Optional<uint64_t>) == 16

But with Align using uint8_t I agree it's not a big deal anymore sizeof(llvm::Optional<uint8_t>) == 2

I've added a comment about Align::ShiftValue being less than 64 by construction.

jfb added inline comments.Jul 29 2019, 5:12 PM
llvm/include/llvm/Support/Alignment.h
41

final is kinda weird here. Not that I'd inherit from this... but why have final?

147

Are the places where Log2 with undefined alignment is called today? Can we report fatal error instead (or assert)?

gchatelet marked 4 inline comments as done.
  • Assert when Log2 on undefined MaybeAlign.
llvm/include/llvm/Support/Alignment.h
41

It was a way to document that this type is not to be derived from. It does not have a virtual destructor.
Now I don't have a strong opinion and I'm happy to remove it if you think it's too weird.

147

Since I wanted to first introduce the type and then fix invalid code I never had a chance to run LLVM with all the asserts.
By looking at the code it is definitely possible but I can't tell for sure if it happens. It's one of the motivation for introducing the type in the first place.

For instance it is used in TargetCallingConv where A may well be 0. In that case Log2_32 will return -1, hence the + 1 to encode invalid as zero.

Now that it's clear we want to fix the code as we introduce the type, we can treat it as an error as you suggest.

  • Fix documentation
jfb accepted this revision.Jul 30 2019, 9:09 AM

Overall this looks good, thanks!

llvm/include/llvm/Support/Alignment.h
41

It's just not something we really do anywhere.

147

Thanks, I think this is probably the right approach since it forces the changes as adoption happens.

This revision is now accepted and ready to land.Jul 30 2019, 9:09 AM
  • Remove final from Align and MaybeAlign
gchatelet marked 3 inline comments as done.Jul 31 2019, 12:42 AM

Thank you for the review @jfb and @jakehehrlich. I'll submit the patch now and start working on the migration.
Let me know if you have additional comments.

llvm/include/llvm/Support/Alignment.h
41

I removed the final here and below.

This revision was automatically updated to reflect the committed changes.
gchatelet marked an inline comment as done.

This patch broke buildbot, I've submitted to fix MSVC https://reviews.llvm.org/rL367400

foad added a subscriber: foad.Jul 31 2019, 5:10 AM

This also broke ninja check-llvm-unit on a Release build:

Failing Tests (7):
    LLVM-Unit :: Support/./SupportTests/Alignment.CantConvertUnsetMaybe
    LLVM-Unit :: Support/./SupportTests/Alignment.CompareAlignToUndefMaybeAlign
    LLVM-Unit :: Support/./SupportTests/Alignment.CompareMaybeAlignToZero
    LLVM-Unit :: Support/./SupportTests/Alignment.ComparisonsWithZero
    LLVM-Unit :: Support/./SupportTests/Alignment.Division
    LLVM-Unit :: Support/./SupportTests/Alignment.InvalidCTors
    LLVM-Unit :: Support/./SupportTests/Alignment.Log2

Typical failure:

FAIL: LLVM-Unit :: Support/./SupportTests/Alignment.Division (2696 of 3890)
******************** TEST 'LLVM-Unit :: Support/./SupportTests/Alignment.Division' FAILED ********************
Note: Google Test filter = Alignment.Division
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Alignment
[ RUN      ] Alignment.Division
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:70: Failure
Death test: Align(Value) / 2
    Result: failed to die.
 Error msg:
[  DEATH   ] 
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:71: Failure
Death test: MaybeAlign(Value) / 2
    Result: failed to die.
 Error msg:
[  DEATH   ] 
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:79: Failure
Death test: Align(8) / 0
    Result: failed to die.
 Error msg:
[  DEATH   ] 
/home/jayfoad2/git/llvm-project/llvm/unittests/Support/AlignmentTest.cpp:80: Failure
Death test: Align(8) / 3
    Result: failed to die.
 Error msg:
[  DEATH   ] 
[  FAILED  ] Alignment.Division (2 ms)
[----------] 1 test from Alignment (2 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (2 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Alignment.Division

 1 FAILED TEST

This also broke ninja check-llvm-unit on a Release build:

Thx for letting me know @foad , I'm on it.

@gchatelet This is breaking windows bots: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast/builds/1933 - please can you take a look? I think we'll need to revert until you have a fix.

gchatelet added a comment.EditedJul 31 2019, 5:53 AM

@gchatelet This is breaking windows bots: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast/builds/1933 - please can you take a look? I think we'll need to revert until you have a fix.

If this is about the missing default constructor it's already fixed:
https://reviews.llvm.org/rL367400

As for the failing tests in release the patch is sent as well
https://reviews.llvm.org/rL367427

Thanks - rL367427 seemed to be the key

aprantl added inline comments.
llvm/trunk/include/llvm/Support/Alignment.h
39 ↗(On Diff #212526)

Can you please doxygen-ify all the comments in this file? I.e. // -> ///

113 ↗(On Diff #212526)

This comment also doesn't follow the LLVM rules. It should be just
/// Checks that ...

gchatelet marked 2 inline comments as done.Aug 1 2019, 12:29 AM

@aprantl I've created https://reviews.llvm.org/D65558 to address your comments since this patch is already submitted.

Thanks for looking at this. I haven't looked at this or the related patches at all, and I know this has landed already. I haven't really looked at the details of the patch, but I did want to highlight a possible conflict in interpretation of 0 alignment. In the elf gABI, the two alignment fields (sh_addralign and p_align) both have the statement "Values 0 and 1 mean no alignment is required". In other words, 0 is not technically an undefined alignment any more than 1 is, at least in the ELF context. That being said, if you are always treating an alignment of 0 as an alignment of 1, I don't think there's any issue.

Thanks for looking at this. I haven't looked at this or the related patches at all, and I know this has landed already. I haven't really looked at the details of the patch, but I did want to highlight a possible conflict in interpretation of 0 alignment. In the elf gABI, the two alignment fields (sh_addralign and p_align) both have the statement "Values 0 and 1 mean no alignment is required". In other words, 0 is not technically an undefined alignment any more than 1 is, at least in the ELF context. That being said, if you are always treating an alignment of 0 as an alignment of 1, I don't think there's any issue.

Thx for the feedback @jhenderson!
I had a quick exchange with @lattner a while ago. He told me that 0 most probably means 1 byte aligned, most of the time, which is inline with what you say.

The type has landed but there's a lot more work to get the codebase converted to use it.
This query shows the patches submitted so far.

I'm actively working on it, along the way I learned that the unsigned sometimes means an alignment in bytes (powers of two) but also sometimes the log2 of it, in which case 0 means 1ULL << 0, i.e. 1 byte aligned (see D65945).
I'm as careful as possible and I make sure I understand the consequences when I introduce the type to a part of the codebase, the whole migration will certainly take time.