This is an archive of the discontinued LLVM Phabricator instance.

[AIX] Change the linkage of profiling counter/data to be private
ClosedPublic

Authored by jsji on Sep 24 2021, 8:43 AM.

Details

Summary

We generate symbols like profc/profd for each function, and put them into csects.
When there are weak functions, we generate weak symbols for the functions as well,
with ELF (and some others), linker (binder) will discard and only keep one copy of the weak symbols.

However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect,
as binder can NOT discard a subset of a csect.

This creates a unique challenge for using those symbols to calculate some relative offsets.

This patch changed the linkage of profc/profd symbols to be private, so that all the profc/profd for each weak symbol will be *local* to objects, and all kept in the csect, so we won't have problem. Although only one of the counters will be used, all the pointer in the profd is correct.

The downside is that we won't be able to discard the duplicated counters and profile data,
but those can not be discarded even if we keep the weak linkage,
due to the binder limitation of not discarding a subsect of the csect either .

Diff Detail

Event Timeline

jsji created this revision.Sep 24 2021, 8:43 AM
jsji requested review of this revision.Sep 24 2021, 8:43 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 24 2021, 8:43 AM
jsji updated this revision to Diff 374867.Sep 24 2021, 8:52 AM

Restore the limitation of sample profiling.

Whitney accepted this revision.Sep 24 2021, 9:08 AM
This revision is now accepted and ready to land.Sep 24 2021, 9:08 AM
MaskRay requested changes to this revision.Sep 24 2021, 9:25 AM
MaskRay added inline comments.
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
869

Then you can just keep the existing weak symbols.

The weak symbols will not be selected due to linker semantics.

This revision now requires changes to proceed.Sep 24 2021, 9:25 AM
MaskRay added inline comments.Sep 24 2021, 9:26 AM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
978

This place is for binary formats which can properly GC metadata sections. If XCOFF can't, it shouldn't be added here.

jsji updated this revision to Diff 374879.Sep 24 2021, 9:44 AM

Address comments.

jsji added inline comments.Sep 24 2021, 9:46 AM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
869

Not sure what you meant?

jsji added inline comments.Sep 24 2021, 10:07 AM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
869

To be clear, keeping the weak linkage is causing issues in our test, and we investigate the problem , together with AIX binder owner, and it is the limitation in binder support that we can't ensure the relocation getting resolved to the intended weak symbol. So this is what we can do right now for enabling PGO.

We may be able to lift it this limitation later if binder get updated and do what we want, but this is limitation for now.

we can NOT guarantee that the relocations get resolved to the intended weak symbol, so we can not ensure the correctness of the relative CounterPtr, so we have to use private linkage for counter and data symbols.

In other binary formats the first weak definition is selected and other weak definitions are discarded.
Do you mean that AIX ld doesn't pick the first weak definition?

jsji added a comment.EditedSep 24 2021, 10:40 AM

In other binary formats the first weak definition is selected and other weak definitions are discarded.
Do you mean that AIX ld doesn't pick the first weak definition?

No. I think this is exactly what is causing problem here.

eg: if we have two weak symbols (weak_func), and we generate the weak counter/data as usual:

__llvm_prf_cnts:

__profc_weak_func(1):

__profc_weak_func(2):

__llvm_prf_data:

__profd_weak func(1):

Pos_Rel __profc_weak_func  <=== Relocation to calculate the relative pointer

__profd_weak_func(2):

Pos_Rel __profc_weak_func  <=== Relocation to calculate the relative pointer

The relative pointer calculation in one of the __profd_weak_fun will be wrong after linking.

In other binary formats the first weak definition is selected and other weak definitions are discarded.
Do you mean that AIX ld doesn't pick the first weak definition?

No. I think this is exactly what is causing problem here.

eg: if we have two weak symbols (weak_func), and we generate the weak counter/data as usual:

__llvm_prf_cnts:

__profc_weak_func(1):

__profc_weak_func(2):

__llvm_prf_data:

__profd_weak func(1):

Pos_Rel __profc_weak_func  <=== Relocation to calculate the relative pointer

__profd_weak_func(2):

Pos_Rel __profc_weak_func  <=== Relocation to calculate the relative pointer

The relative pointer calculation in one of the __profd_weak_fun will be wrong after linking.

Can you elaborate the Pos_Rel __profc_weak_func notation?

Note: In other binary formats' scheme, it doesn't matter whether non-first weak definitions are discarded or not.
After linking, the non-first weak definitions will be unreachable anyway, so their internal references don't really matter.

jsji added a comment.Sep 24 2021, 12:31 PM

Pos_Rel profc_weak_func notation is just the normal relocation to symbol profc_weak_func.

The relocation here is to calculate the relative pointer (offset) to profcweak_func from profdweak__func.

In example above, the offsets value after binding will be the same for both instances, as the symbol profcweak__func will be the one that is chosen by binder. Let us say it is the value of first instances.

Then when we calculate the profc_weak_func pointer for 2nd instances, we get garbage value.

The description is still unclear.

Say a.o has a weak foo, __profc_foo, __profd_foo, b.o has a weak foo, __profc_foo, __profd_foo.
The linker picks the definitions from a.o. In the PGO implementation, it doesn't whether the non-discarded b.o __profd_foo has garbage value.

jsji added a comment.Sep 24 2021, 1:00 PM

The description is still unclear.

Say a.o has a weak foo, __profc_foo, __profd_foo, b.o has a weak foo, __profc_foo, __profd_foo.
The linker picks the definitions from a.o. In the PGO implementation, it doesn't whether the non-discarded b.o __profd_foo has garbage value.

We have 3 sets weak symbols here: weak_func, profc_weak_foo, profd_weak_func. We can't ensure that binder always choose 3 of them from same object.

The description is still unclear.

Say a.o has a weak foo, __profc_foo, __profd_foo, b.o has a weak foo, __profc_foo, __profd_foo.
The linker picks the definitions from a.o. In the PGO implementation, it doesn't whether the non-discarded b.o __profd_foo has garbage value.

We have 3 sets weak symbols here: weak_func, profc_weak_foo, profd_weak_func. We can't ensure that binder always choose 3 of them from same object.

Oh, this is a serious enough bug. I am not sure working around the bug by switching to PrivateLinkage will help.
The values will be wrong as well.

jsji added a comment.Sep 24 2021, 1:47 PM

We have verified all the profile counters in SPEC , all are OK.

jsji added a comment.EditedSep 24 2021, 2:12 PM

Using the PrivateLinkage, all the profc/profd for each weak symbol will be *local* to objects, and all kept in the csect,
so we won't have problem. Although only one of the counters will be used, all the pointer in the profd is correct.

The downside is that we won't be able to discard the duplicated counters and profile data,
but those can not be discarded even if we keep the weak linkage,
due to the binder limitation of not discarding only part of the csect either .

jsji added a comment.Sep 27 2021, 6:53 AM

@MaskRay Do you have further comments or alternative solutions? Thanks.

The description isn't clear why the previous weak symbol usage does not work. Have you verified that it does not work?

jsji edited the summary of this revision. (Show Details)Sep 27 2021, 10:27 AM
jsji edited the summary of this revision. (Show Details)
jsji edited the summary of this revision. (Show Details)Sep 27 2021, 10:36 AM
jsji edited the summary of this revision. (Show Details)
jsji added a comment.Sep 27 2021, 10:38 AM

The description isn't clear why the previous weak symbol usage does not work. Have you verified that it does not work?

Yes, we started with the weak symbols for profc/profd first, and tested it. We met problems, and we spent time investigate why they are not working.

I have update the description to include most of the details. Let me know if it is still unclear.

jsji edited the summary of this revision. (Show Details)Sep 27 2021, 10:39 AM
jsji edited the summary of this revision. (Show Details)
jsji edited the summary of this revision. (Show Details)Sep 27 2021, 10:42 AM
MaskRay added a comment.EditedSep 27 2021, 10:42 AM

Sorry, I still don't understand this response:

We have 3 sets weak symbols here: weak_func, profc_weak_foo, profd_weak_func. We can't ensure that binder always choose 3 of them from same object.

So this part of the description isn't clear to me:

However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect, as binder can NOT discard only part of a csect.

This creates a unique challenge for using those symbols to calculate some relative offset.

Say a.o has a weak foo, __profc_foo, __profd_foo. The __profd_foo references the __profc_foo

b.o has another set of weak foo, __profc_foo, __profd_foo. The __profd_foo references the __profc_foo.

What's the linker (called binder?) behavior? "We can't ensure that binder always choose 3 of them from same object."?
But is that a problem? For linkonce_odr functions, they should have identical semantics in a.o and b.o.
(There can be ODR issues if you use advanced PGO techniques like value profiling, but I assume that it does not work and will be difficult to make work and more importantly, it is also of lower value anyway.)

So the linker picking the a.o foo and the b.o __profc_foo/__profd_foo isn't a poblem.

I understand that PrivateLinkage can avoid the issues but you need to justify the complexity in InstrProfiling.cpp with the value.
Currently I don't think using weak symbol is a big problem, especially if the AIX linker will be fixed anyway in the future.

I expect that the description includes the justification.

jsji edited the summary of this revision. (Show Details)Sep 27 2021, 10:43 AM
jsji edited the summary of this revision. (Show Details)
jsji edited the summary of this revision. (Show Details)Sep 27 2021, 11:15 AM
jsji added a comment.EditedSep 27 2021, 1:25 PM

OK, I may not describe the example clearly.

Let me use example code with offsets and llvm internal calculations as the example, so that you might be clearer.

Let us say we have 2 objects , which both have weak function foo (and some non-weak functions).

[2054]  m   0x110000ec0     .data     1  unamex                    __llvm_prf_cnts
[2056]  m   0x110000ef8     .data     1    weak                    __profc__foo
[1389]  m   0x110001540     .data     1  unamex                    __llvm_prf_cnts
[1391]  m   0x1100015a8     .data     1    weak                    __profc__foo                <====== chosen by binder

[2290]  m   0x110001ca8     .data     1  unamex                    __llvm_prf_data
[2292]  m   0x110001ce0     .data     1    weak                    __profd__foo
[1633]  m   0x110003678     .data     1  unamex                    __llvm_prf_data
[1635]  m   0x1100036b0     .data     1    weak                    __profd__foo                <=======  chosen by binder

In binding, binder chose 0x1100015a8 for __profc__foo, and 0x1100036b0 for __profd__foo.
(Not the 1st in the csect, but 1st seen by binder)

At the beginning CountersDelta is 0xfffffffffffff218. ( 0x110000ec0 [2054] - 0x110001ca8 [2290] ) .

The first record is for non-weak function, so we are OK.

Then we move forward to read foo counters,
CountersDelta is then updated to 0xfffffffffffff1e0 ( 0xfffffffffffff218 - sizeof(Data))
CounterPtr is now 0x1100015a8 - 0x1100036b0 = 0xffffffffffffdef8

CountersOffset is now 0x1ffffffffffffda3 ( CounterPtr - CountersDelta / sizeof(unit64_6))

CountersOffset > MaxNumCounters !

If we have more weak symbols, the symbols chosen by binder may be interleaving in the csects,
we won't be able to calculate the CountersOffset correctly for all of them.

If we use private linkage, we will use 0x11000ef8 and 0x110001ce0 to calculate the counterPtr, then we get everything as expected:

CounterPtr: 0xfffffffffffff218
CountersDelta: 0xfffffffffffff218
CountersOffset: 0x0
jsji edited the summary of this revision. (Show Details)Sep 27 2021, 1:26 PM

However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect, as binder can NOT discard only part of a csect.

"as binder can NOT discard only part of a csect." this part caused confusion to me...

CountersOffset is now 0x1ffffffffffffda3 ( CounterPtr - CountersDelta / sizeof(unit64_6))

CountersOffset > MaxNumCounters !

It is true that there are two pairs of weak profc/profd, but the first profc has a zero value, so I assume that it is fine?

The second pair of profc/profd has correct value.

I haven't verified but it is possible that PGO picks the first pair (zero value) and have an incorrect counter for foo.


Does your example imply that the weak symbol __profc_foo has 2 definitions with conflicting values?
I presume it can cause more fallout.
Any idea whether the linker bug will be fixed?

jsji added a comment.EditedSep 27 2021, 2:07 PM

"as binder can NOT discard only part of a csect." this part caused confusion to me...

OK, yeah, csect is special to XCOFF, so this is not the same concept of sections in ELF.

I haven't verified but it is possible that PGO picks the first pair (zero value) and have an incorrect counter for foo.

As I mentioned before, we verified (with llvm-profdata show --all-functions --counts) on all SPEC benchmarks that all the counters are good with private linkage.


Does your example imply that the weak symbol __profc_foo has 2 definitions with conflicting values?

No, they are identical weak functions. Those offsets are actually from real example using std::stringbuf,
the foo is actually _ZNSt3__115basic_stringbufIcNS_11char_traitsIcEENS_9allocatorIcEEEC2Ej.

Any idea whether the linker bug will be fixed?

Unfortunately, I just double confirmed with AIX linker (binder) owner,
there is NO plan in near future of lifting the limitation that binder can NOT discard only part of a csect.

MaskRay added a comment.EditedSep 27 2021, 3:02 PM

"as binder can NOT discard only part of a csect." this part caused confusion to me...

OK, yeah, csect is special to XCOFF, so this is not the same concept of sections in ELF.

"as binder can NOT discard a subset of a csect."?

I haven't verified but it is possible that PGO picks the first pair (zero value) and have an incorrect counter for foo.

As I mentioned before, we verified (with llvm-profdata show --all-functions --counts) on all SPEC benchmarks that all the counters are good with private linkage.


Does your example imply that the weak symbol __profc_foo has 2 definitions with conflicting values?

No, they are identical weak functions. Those offsets are actually from real example using std::stringbuf,
the foo is actually _ZNSt3__115basic_stringbufIcNS_11char_traitsIcEENS_9allocatorIcEEEC2Ej.

Any idea whether the linker bug will be fixed?

Unfortunately, I just double confirmed with AIX linker (binder) owner,
there is NO plan in near future of lifting the limitation that binder can NOT discard only part of a csect.

The linker (binder) doesn't have to discard a subset of csect, but it should ensure there are not two non-local symbols with the same name.
The latter is what I request.

Without section deduplication we just end up with PDP-11 a.out extended with .weak directive, it isn't too bad. Mach-O uses this model as well.
(PE-COFF/ELF properly deduplicates in the absence of section based GC.)
Having two non-local symbols could cause serious symbol resolution problems.

jsji edited the summary of this revision. (Show Details)Sep 27 2021, 5:39 PM
jsji added a comment.EditedSep 27 2021, 5:46 PM

"as binder can NOT discard a subset of a csect."?

Thanks, this does looks better!

The linker (binder) doesn't have to discard a subset of csect, but it should ensure there are not two non-local symbols with the same name.
The latter is what I request.

Without section deduplication we just end up with PDP-11 a.out extended with .weak directive, it isn't too bad. Mach-O uses this model as well.
(PE-COFF/ELF properly deduplicates in the absence of section based GC.)
Having two non-local symbols could cause serious symbol resolution problems.

That I can definitely forward to AIX binder team, but we can't expect they implement it right away for us.

But yeah, once they can fix it, we can update to revert back to weak linkages.

MaskRay added inline comments.Sep 28 2021, 1:25 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
865

"Due to the current limitation of binder,"

Replace "current" with "as of version XX or date"

jsji updated this revision to Diff 375688.Sep 28 2021, 1:40 PM
jsji marked an inline comment as done.

Address comments.

jsji marked an inline comment as done.Sep 28 2021, 1:40 PM
MaskRay added a comment.EditedSep 28 2021, 2:01 PM

I still think the description can be rephrased to be shorter and clearer, and am not very sure this workaround should be added.
But I will take a vacation and be back after one week and don't want to appear that I am blocking this change.

So LGTM once you decrease the number of if (TT.isOSBinFormatXCOFF()) { from two to one.
If you track where Linkage and Visibility are updated it should be clear one place is not needed.

Also ideally the driver change should be in a separate patch.

jsji retitled this revision from [AIX] Enable PGO without LTO to [AIX] Change the linkage of profiling counter/data to be private.Sep 28 2021, 2:41 PM
jsji edited the summary of this revision. (Show Details)
jsji edited the summary of this revision. (Show Details)
jsji edited the summary of this revision. (Show Details)
jsji updated this revision to Diff 375709.Sep 28 2021, 2:57 PM

Split the Driver changes to another patch, also fixed the duplicate code.

jsji updated this revision to Diff 375716.Sep 28 2021, 3:07 PM

Remove empty line.

MaskRay accepted this revision.Sep 28 2021, 3:18 PM
This revision is now accepted and ready to land.Sep 28 2021, 3:18 PM
jsji added a comment.Sep 28 2021, 3:27 PM

@MaskRay Thank you so much for your review and discussion!

This revision was landed with ongoing or failed builds.Sep 28 2021, 5:47 PM
This revision was automatically updated to reflect the committed changes.