Page MenuHomePhabricator

Mark openmp internal global dso_local
Needs ReviewPublic

Authored by kamleshbhalui on Aug 19 2021, 4:11 PM.

Details

Summary

Starting from clang-12 openmp started generating internal global variable with got relocation even when static relocation enabled.
In clang-11 shouldAssumeDSOLocal was assuming it dso_local based on static relocation model.
Since shouldAssumeDSOLocal is cleaned up now for respecting dso_local generated from frontend, marking openmp internal globals as dso_local.

Diff Detail

Event Timeline

kamleshbhalui created this revision.Aug 19 2021, 4:11 PM
kamleshbhalui requested review of this revision.Aug 19 2021, 4:11 PM
Herald added a project: Restricted Project. · View Herald Transcript

clang formatted

ABataev added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
2186–2190

I would not do it here since the linkage is changing in some places. Better to make this kind of change explicitly where required + need to add tests.

updated test and make changes local to auto generated global vars for lock.

Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2021, 3:12 AM

assume dso local if relocation model static

assume dso local if relocation model static

I think these are two separate fixes. TargetMachine::shouldAssumeDSOLocal() change might make sense,
but as the comment there notes, this really should be fixed in in CGOpenMPRuntime::getOrCreateInternalVariable() itself (is that not the right default?),
and in it's every user that incorrectly unmarks said global as being `dso_local.

assume dso local if relocation model static

I think these are two separate fixes. TargetMachine::shouldAssumeDSOLocal() change might make sense,
but as the comment there notes, this really should be fixed in in CGOpenMPRuntime::getOrCreateInternalVariable() itself (is that not the right default?),
and in it's every user that incorrectly unmarks said global as being `dso_local.

Please see this revision https://reviews.llvm.org/differential/diff/367953/
Should I get back to this?

MaskRay requested changes to this revision.EditedAug 21 2021, 9:15 AM

If you read the comment in TargetMachine::shouldAssumeDSOLocal: this is the wrong direction. dso_local is assumed to be marked by the frontend.

Direct accesses and GOT accesses are trade-offs. Direct accesses are not always preferred. The MachO special case is an unfortunate case for their xnu kernel, not a good example here.

This revision now requires changes to proceed.Aug 21 2021, 9:15 AM

updated test and make changes local to auto generated global vars for lock.

MaskRay added inline comments.Aug 23 2021, 8:31 PM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
2197

Can be variable be preemptible on ELF? (i.e. default visibility non-local linkage) If yes, it cannot be marked dso_local in that case.

2197

in that case

> When -fpic is used.

-fpic -shared may give a linker error.

updated test and make changes local to auto generated global vars for lock.

I do not understand why the most original diff here was wrong.
Is is the wrong default for getOrCreateInternalVariable() to default to dso_local?

updated test and make changes local to auto generated global vars for lock.

I do not understand why the most original diff here was wrong.
Is is the wrong default for getOrCreateInternalVariable() to default to dso_local?

Need to check that the linkage/visibility of the created vars is not changing in place. If it is not changing, then it is good to go. Otherwise, need to add some another function/extra parameter with linkage/visibility settings.

updated test and make changes local to auto generated global vars for lock.

I do not understand why the most original diff here was wrong.
Is is the wrong default for getOrCreateInternalVariable() to default to dso_local?

Need to check that the linkage/visibility of the created vars is not changing in place. If it is not changing, then it is good to go. Otherwise, need to add some another function/extra parameter with linkage/visibility settings.

I'm having trouble parsing the comment, but i don't really see why the fact that some users may be changing linkage
should deter us from having the right default. If some users change linkage, then regardless of where this is dealt with,
the users will have to be adjusted, just a different set of them. It seems to me that fixing the ones that already mess with linkage is the right choice,
not fixing the ones that *don't* mess with linkage.

updated test and make changes local to auto generated global vars for lock.

I do not understand why the most original diff here was wrong.
Is is the wrong default for getOrCreateInternalVariable() to default to dso_local?

Need to check that the linkage/visibility of the created vars is not changing in place. If it is not changing, then it is good to go. Otherwise, need to add some another function/extra parameter with linkage/visibility settings.

I'm having trouble parsing the comment, but i don't really see why the fact that some users may be changing linkage
should deter us from having the right default.

Yes, but the fix should not introduce bugs, if possible. Need to check the users and make them work correctly too, that's why need to check the users and if some of them change the linkage/visibility, then need to add a new function or add an extra parameter to track whether the need to add dso_local attribute.

If some users change linkage, then regardless of where this is dealt with, the users will have to be adjusted, just a different set of them. It seems to me that fixing the ones that already mess with linkage is the right choice, not fixing the ones that *don't* mess with linkage.

That's all I'm asking, actually. Or just keep the original code (if no time to go through the users) and fix dso_local only where required.

If you read the comment in TargetMachine::shouldAssumeDSOLocal: this is the wrong direction. dso_local is assumed to be marked by the frontend.

Direct accesses and GOT accesses are trade-offs. Direct accesses are not always preferred. The MachO special case is an unfortunate case for their xnu kernel, not a good example here.

@MaskRay I would like to know more about these trade-offs details, any supporting documents will do.
Considering below testcase, can you shed some light how code generated by llc-12 is better than llc-11 for given testcase?
https://godbolt.org/z/x9xojWb58
And FYI this testcase does not work when build as Linux Kernel Module. LKM loader throw error("Unknown rela relocation: 42")?

MaskRay requested changes to this revision.Sep 1 2021, 10:05 AM

If you read the comment in TargetMachine::shouldAssumeDSOLocal: this is the wrong direction. dso_local is assumed to be marked by the frontend.

Direct accesses and GOT accesses are trade-offs. Direct accesses are not always preferred. The MachO special case is an unfortunate case for their xnu kernel, not a good example here.

@MaskRay I would like to know more about these trade-offs details, any supporting documents will do.
Considering below testcase, can you shed some light how code generated by llc-12 is better than llc-11 for given testcase?
https://godbolt.org/z/x9xojWb58

This is a very minor issue. First, global variable access is rarely a performance bottleneck.
Second, if the symbol turns out to be non-preemptible (which implies that it is defined in the component), the R_X86_64_REX_GOTPCRELX GOT indirection can be optimized out.
The only minor issue is slightly longer code sequence.

And FYI this testcase does not work when build as Linux Kernel Module. LKM loader throw error("Unknown rela relocation: 42")?

This is a kernel issue. Please mention the justification (is it related to OpenMP?) in the first place.
The kernel can be compiled in -fpie mode. In this mode, taking the address of a default-visibility undefined symbol uses R_X86_64_REX_GOTPCRELX.
So the kernel should support R_X86_64_REX_GOTPCRELX anyway.


If we think the optimization is meaningful:

Depending on the property of .gomp_critical_user_.atomic_reduction.var different DSOLocal strategies should be used.
If it is TU-local, make it local linkage. If it is linked image local, make it hidden visibility.
If it may be defined in a shared object and shared with other shared objects or the main executable, (not so sure because such symbol interposition does not work in other binary formats), use dso_preemptable as is.

I believe the current forced dso_local is definitely incorrect because it may break -fpic -shared links.

This revision now requires changes to proceed.Sep 1 2021, 10:05 AM

If you read the comment in TargetMachine::shouldAssumeDSOLocal: this is the wrong direction. dso_local is assumed to be marked by the frontend.

Direct accesses and GOT accesses are trade-offs. Direct accesses are not always preferred. The MachO special case is an unfortunate case for their xnu kernel, not a good example here.

@MaskRay I would like to know more about these trade-offs details, any supporting documents will do.
Considering below testcase, can you shed some light how code generated by llc-12 is better than llc-11 for given testcase?
https://godbolt.org/z/x9xojWb58

This is a very minor issue. First, global variable access is rarely a performance bottleneck.
Second, if the symbol turns out to be non-preemptible (which implies that it is defined in the component), the R_X86_64_REX_GOTPCRELX GOT indirection can be optimized out.
The only minor issue is slightly longer code sequence.

And FYI this testcase does not work when build as Linux Kernel Module. LKM loader throw error("Unknown rela relocation: 42")?

This is a kernel issue. Please mention the justification (is it related to OpenMP?) in the first place.
The kernel can be compiled in -fpie mode. In this mode, taking the address of a default-visibility undefined symbol uses R_X86_64_REX_GOTPCRELX.
So the kernel should support R_X86_64_REX_GOTPCRELX anyway.


If we think the optimization is meaningful:

Depending on the property of .gomp_critical_user_.atomic_reduction.var different DSOLocal strategies should be used.
If it is TU-local, make it local linkage. If it is linked image local, make it hidden visibility.
If it may be defined in a shared object and shared with other shared objects or the main executable, (not so sure because such symbol interposition does not work in other binary formats), use dso_preemptable as is.

I believe the current forced dso_local is definitely incorrect because it may break -fpic -shared links.

@kamleshbhalui ,make the changes accordingly that honour -fpic and don't mark dsolocal in this case.

mark dso_local only when non-pic