This is an archive of the discontinued LLVM Phabricator instance.

[IR] Don't assume all functions are 4 byte aligned
ClosedPublic

Authored by rs on Nov 30 2018, 3:38 AM.

Details

Summary

In some cases different alignments for function might be used to save
space e.g. thumb mode with -Oz will try to use 2 byte function
alignment. Similar patch that fixed this in other areas exists here
https://reviews.llvm.org/D46110

Diff Detail

Event Timeline

rs created this revision.Nov 30 2018, 3:38 AM
efriedma added inline comments.
lib/IR/ConstantFold.cpp
1080

Can you fix this to just call getPointerAlignment() instead of trying to recreate it?

rs updated this revision to Diff 176378.Dec 3 2018, 5:36 AM
rs marked an inline comment as done.

Updated to use getPointerAlignment and removed test file that was reliant on compiler using assumptions about function alignment.

efriedma accepted this revision.Dec 3 2018, 11:35 AM

LGTM with formatting fixed.

lib/IR/ConstantFold.cpp
1081–1084

80 columns.

This revision is now accepted and ready to land.Dec 3 2018, 11:35 AM
This revision was automatically updated to reflect the committed changes.
rs updated this revision to Diff 176966.Dec 6 2018, 6:52 AM

This was approved previously https://reviews.llvm.org/D55115 but when committed it caused failures on the sanitizer buildbots when building
llvm with clang (containing this patch). This is now fixed because I've added a check to see if getting the parent module returns null if it does then set the alignment to 0.

Sure, looks fine.

We've noticed significant regressions on x86 in code size that seem likely root caused to this going in... see my comments below. I think this has caused a regression on most targets. =[

lib/IR/ConstantFold.cpp
1081–1085

This is actually a pretty significant regression for platforms where functions are aligned.

The implementation of getPointerAlignment doesn't do any checks and always returns zero for functions. While this may be necessary on some platforms, it seems bad to just do this blindly. In particular, it doesn't even look at explicit alignment in that case, making this a significant and surprising regression in that case.

While I generally support fixing this, I think we need to enhance getPointerAlignment to actually return something useful if we're going to delegate to it here. This may require putting some of the underlying alignment constraints into the datalayout (such as the thumb-mode LSB stashing).

Last but not least, I don't understand why this code can or should support global variables that are not in a module, and then constant fold them differently. That seems like a bad precedent to set. Maybe we should assert that there is a module here and fix the code that fails to put its globals into a module?

efriedma added inline comments.Dec 27 2018, 6:44 PM
lib/IR/ConstantFold.cpp
1081–1085

I've never seen code that computes "FnPtr & -4", but I guess you must have some somewhere. (We fixed getPointerAlignment for all the other places a while back, and nobody noticed.) Which target are you seeing the size regression on? Are you sure LLVM was actually computing the alignment correctly for your code?

In general, there are two aspects to function pointer alignment: one, whether an arbitrary function pointer has known alignment, and two, whether we can compute better alignment by examining a specific function. For the first, there are some targets where the alignment is known: AArch64 function pointers point to an instruction which is four-byte-aligned, PowerPC function pointers point to a descriptor which is pointer-aligned, etc. For the second, we can often do better: x86 function pointers always have the alignment of the function, ARM function pointers have the alignment of the function in ARM mode, etc.

I'm not sure how much of this makes sense to put into the datalayout. We could add a "function pointer alignment" integer for the first. We could also add a boolean "does the alignment of a function always dictate the pointer alignment" bit for the second. If we want to handle targets like ARM, though, we probably want to move the folding elsewhere and use a target hook, instead of trying to put arbitrary predicates into the datalayout.

chandlerc added inline comments.Dec 27 2018, 6:53 PM
lib/IR/ConstantFold.cpp
1081–1085

I don't think they're literally masking, but I think function pointers go down code paths that simplify heavily with alignment guarantees.

Target is x86. We're trying to confirm that this is actually what is happening.

Regarding the design points, I think a reasonable model we can use here: data layout could gave a field for function pointers, and it could take the following forms:
a) is the function alignment (x86)
b) is a known, fixed alignment of N, where N is in the datalayout string (AArch64, PPC)
c) cannot be assumed due to potential use of LSB (ARM due to Thumb mode)

We could even make (c) be represented by a fixed alignment value of 1 and we could make (a) be represented by a fixed alignment value of 0. This would match typical "special" alignment values in LLVM. And the duplication of a pointer's alignment for cases like PPC doesn't seem bad.

I feel like this would cover the vast majority of cases we're ever likely to care about with minimal complexity.

craig.topper added inline comments.
lib/IR/ConstantFold.cpp
1081–1085

One thing we observed. This breaks optimization of the check for the virtual bit in member function pointers in the Itanium C++ ABI which set bit 0 if the member function is virtual. If we can see the creation of the function pointer and the use of the function pointer we could previously tell when it wasn't virtual because we could see it was just a function address which this code assumed was aligned and therefore bit 0 was 0.

chandlerc added inline comments.Jan 2 2019, 3:12 PM
lib/IR/ConstantFold.cpp
1081–1085

Is someone working on a fix? Can we put this behind a flag or something until everything is ready?

Currently, still have regressed workloads...

rupprecht added inline comments.
lib/IR/ConstantFold.cpp
1081–1085

Here is a repro recipe that demonstrates the size increase (sorry for all the steps, I'll try to get it down to pure .ll later):

$ sudo apt install libprotobuf-dev protobuf-compiler
$ cat list.proto
syntax = "proto2";
message Value { optional string name = 1; }
message List { repeated Value values = 1; }
$ protoc list.proto --cpp_out=.
$ clang++ -c list.pb.cc -O2

The object file increases ~1% using clang built w/ this revision. Looking at some of the code that is optimized differently, they are dealing with function pointers, but AFAICT not literally masking them, so probably what Chandler suggested -- we're missing out on optimizations due to lack of alignment guarantees.

@rupprecht, does it look related to the memptr.isvirtual bit that C++ member function pointers use?

@rupprecht, does it look related to the memptr.isvirtual bit that C++ member function pointers use?

Yep, when I compile clang in debug mode and compare .ll I see memptr.virtual where it was previously inlined.

While I'm very happy we have a precise analysis of this, I also want to point out that it would better to do this analysis after reverting the patch. We need to be reverting to green, including for performance.

rs added a subscriber: miyuki.Jan 4 2019, 8:46 AM

@chandlerc I've reverted the patch r350402 sorry for causing the performance issue. I'll investigate the datalayout idea a bit more, @miyuki original solution to the function pointer aligment issue did something like that https://reviews.llvm.org/D44781

Ranjeet, thanks for reverting. Please add me as a reviewer/subscriber to the follow up patch and I'll test for size regressions on it