Page MenuHomePhabricator

[OpenMP] Add TLS-based implementation for threadprivate directive
ClosedPublic

Authored by sfantao on Jun 25 2015, 4:54 PM.

Details

Summary

Using TLS to implement threadprivate directive has shown 10x performance improvements if compared with the current cache-based implementation in PPC machines.

This patch introduces a TLS-based implementation that is currently activated only for PPC machines. It also creates CGOpenMPRuntimes.cpp, meant to extend the OpenMP codegeneration class in order to drive optimized implementations for different targets.

This patch complements the OpenMP runtime patch under review in http://lists.cs.uiuc.edu/pipermail/openmp-commits/2015-June/000347.html

Diff Detail

Event Timeline

sfantao updated this revision to Diff 28515.Jun 25 2015, 4:54 PM
sfantao retitled this revision from to [OpenMP] Add TLS-based implementation for threadprivate directive.
sfantao updated this object.
sfantao edited the test plan for this revision. (Show Details)
sfantao added reviewers: ABataev, hfinkel, rjmccall.
sfantao added a subscriber: Unknown Object (MLST).
ABataev edited edge metadata.Jun 25 2015, 11:03 PM

Samuel,
Thanks for the patch!

  1. Why you don't want to reuse an existing clang infrastructure for TLS support? I think it would be much easier and portable solution to reuse an existing code, rather than inventing a new one.
  2. I don't like the idea that this feature is enabled only for a specific platform. It requires a new class, some complex code changes etc. Instead I prefer to have a driver/frontend option that makes the compiler to switch codegen from cached version to TLS. If you want to set this feature on by default for the PowerPC you can set this option in frontend if current architecture is PowerPC. In this case we don't need an additional CGOpenMPRuntime-based class.
lib/CodeGen/CGOpenMPRuntime.cpp
1022

Use CtorCGF.VoidPtrTy rather than CtorCGF.ConvertTypeForMem(CGM.getContext().VoidPtrTy)

1052

DtorCGF.VoidPtrTy

lib/CodeGen/CGOpenMPRuntime.h
355 ↗(On Diff #28515)

Should start with an upper case letter, i.e. UseTLSForThreadPrivate

sfantao updated this revision to Diff 28679.EditedJun 29 2015, 8:59 AM
sfantao edited edge metadata.

Use -fopenmp-use-tls option to enable TLS use in OpenMP code generation. The option is set by default for PPC machines.

Hi Alexey,

Thanks for the review!

  1. Why you don't want to reuse an existing clang infrastructure for TLS support? I think it would be much easier and portable solution to reuse an existing code, rather than inventing a new one.

I am afraid I don't understand exactly what you mean by TLS infrastructure. In my understanding, creating a TLS global variable will make each thread to have different storage for the variable but will not cause the Ctors/Dtors to run when a thread is spawn by the OpenMP runtime. Therefore the OpenMP runtime has to be aware of which Ctors/Dtors to use to a given variable even if it is declared as TLS so it can run them when it creates a thread. So from a codegen viewpoint, the step of registering the Dtors and Ctors is still required, only the cache creation and look-up can be skipped. That's the reason why I am reusing the existing code to do the registration.

It is possible, however, that I am missing some feature in clang that would enable the execution of the Cotrs/Dtors when threads are spawn without intervention of any runtime library. If so, please give me a pointer and I will update the patch to use that instead.

  1. I don't like the idea that this feature is enabled only for a specific platform. It requires a new class, some complex code changes etc. Instead I prefer to have a driver/frontend option that makes the compiler to switch codegen from cached version to TLS. If you want to set this feature on by default for the PowerPC you can set this option in frontend if current architecture is PowerPC. In this case we don't need an additional CGOpenMPRuntime-based class.

Done! Let me know if the option name okay.

lib/CodeGen/CGOpenMPRuntime.cpp
1022

Done!

1052

Done!

lib/CodeGen/CGOpenMPRuntime.h
355 ↗(On Diff #28515)

Done!

  1. Why you don't want to reuse an existing clang infrastructure for TLS support? I think it would be much easier and portable solution to reuse an existing code, rather than inventing a new one.

I am afraid I don't understand exactly what you mean by TLS infrastructure. In my understanding, creating a TLS global variable will make each thread to have different storage for the variable but will not cause the Ctors/Dtors to run when a thread is spawn by the OpenMP runtime. Therefore the OpenMP runtime has to be aware of which Ctors/Dtors to use to a given variable even if it is declared as TLS so it can run them when it creates a thread. So from a codegen viewpoint, the step of registering the Dtors and Ctors is still required, only the cache creation and look-up can be skipped. That's the reason why I am reusing the existing code to do the registration.

It is possible, however, that I am missing some feature in clang that would enable the execution of the Cotrs/Dtors when threads are spawn without intervention of any runtime library. If so, please give me a pointer and I will update the patch to use that instead.

Samuel, look at file Decl.cpp, VarDecl::getTLSKind().
I think it is enough to modify this code a little bit:

  1. Why you don't want to reuse an existing clang infrastructure for TLS support? I think it would be much easier and portable solution to reuse an existing code, rather than inventing a new one.

I am afraid I don't understand exactly what you mean by TLS infrastructure. In my understanding, creating a TLS global variable will make each thread to have different storage for the variable but will not cause the Ctors/Dtors to run when a thread is spawn by the OpenMP runtime. Therefore the OpenMP runtime has to be aware of which Ctors/Dtors to use to a given variable even if it is declared as TLS so it can run them when it creates a thread. So from a codegen viewpoint, the step of registering the Dtors and Ctors is still required, only the cache creation and look-up can be skipped. That's the reason why I am reusing the existing code to do the registration.

It is possible, however, that I am missing some feature in clang that would enable the execution of the Cotrs/Dtors when threads are spawn without intervention of any runtime library. If so, please give me a pointer and I will update the patch to use that instead.

Look at file Decl.cpp, method VarDecl::getTLSKind().
I think it would be enough to modify this method to enable codegen for threadprivates just like for TLS:

case TSCS_unspecified:
  if (!hasAttr<ThreadAttr>() && (!getASTContext().getLangOpts().OpenMPUseTLS || !hasAttr<OMPThreadPrivateAttr>))
    return TLS_None;
  return getASTContext().getLangOpts().isCompatibleWithMSVC(
             LangOptions::MSVC2015)
             ? TLS_Dynamic
             : TLS_Static;

Plus you have to disable cached codegen for threadprivates if getASTContext().getLangOpts().OpenMPUseTLS is true

  1. Why you don't want to reuse an existing clang infrastructure for TLS support? I think it would be much easier and portable solution to reuse an existing code, rather than inventing a new one.

I am afraid I don't understand exactly what you mean by TLS infrastructure. In my understanding, creating a TLS global variable will make each thread to have different storage for the variable but will not cause the Ctors/Dtors to run when a thread is spawn by the OpenMP runtime. Therefore the OpenMP runtime has to be aware of which Ctors/Dtors to use to a given variable even if it is declared as TLS so it can run them when it creates a thread. So from a codegen viewpoint, the step of registering the Dtors and Ctors is still required, only the cache creation and look-up can be skipped. That's the reason why I am reusing the existing code to do the registration.

It is possible, however, that I am missing some feature in clang that would enable the execution of the Cotrs/Dtors when threads are spawn without intervention of any runtime library. If so, please give me a pointer and I will update the patch to use that instead.

Look at file Decl.cpp, method VarDecl::getTLSKind().
I think it would be enough to modify this method to enable codegen for threadprivates just like for TLS:

case TSCS_unspecified:
  if (!hasAttr<ThreadAttr>() && (!getASTContext().getLangOpts().OpenMPUseTLS || !hasAttr<OMPThreadPrivateAttr>))
    return TLS_None;
  return getASTContext().getLangOpts().isCompatibleWithMSVC(
             LangOptions::MSVC2015)
             ? TLS_Dynamic
             : TLS_Static;

Plus you have to disable cached codegen for threadprivates if getASTContext().getLangOpts().OpenMPUseTLS is true

Alexey, I tried to have the TLS global be created based on the attributes set in the way you suggested. The problem I see is that the attribute will be checked before SEMA takes places, so it will requires changes in the OpenMP threadprivate SEMA and also in the SEMA for thread _local. The latter requires understanding that some declaration is an OpenMP thread_local, not a user specified thread_local.

To give you an example, thread local variables can only have trivial Ctors/Dtors but OpenMP threadprivate variables can also have non-trivial ones. So if I add that attribute check in TLSKind() that will cause the thread local SEMA to fail.

I also see that the emission of the globals sometimes happen before the ThreadPrivate attribute is set. So, doing this would require some adjustments there, i.e. have the TLS property of the global being set during the threadprivate SEMA.

In my view, taking this approach seems a little more disruptive, but I am happy to do that if you think that is the right thing.

Let me know your thoughts.

Thanks!
Samuel

sfantao updated this revision to Diff 29392.Jul 9 2015, 2:37 PM

Leverage the TLS_dynamic implementation for OpenMP threadprivate and fix the regression tests accordingly.

Sema was changed to allow thread local vars to be declared thread private If TLS support is enabled for OpenMP codegen. Using thread locals in threadprivate is only a problem for the no-tls implementation.

The use of tls for OpenMP thread private is set to 'true' by default for PPC targets, as it has shown to be about 10x faster than the no-tls implementation.

ABataev added inline comments.Jul 9 2015, 9:42 PM
lib/AST/Decl.cpp
1818

thread_local variable cannot be threadprivate, restore the original code

lib/Driver/Tools.cpp
3953–3971

Change it to :
if (!Args.hasFlag(options::OPT_fopenmp_use_tls,

                       options::OPT_fnoopenmp_use_tls,
                       getToolChain().getArch()==llvm::Triple::ppc || getToolChain().getArch()==llvm::Triple::ppc64 || getToolChain().getArch()==llvm::Triple::ppc64le))
CmdArgs.push_back("-fnoopenmp-use-tls");
sfantao updated this revision to Diff 29443.Jul 10 2015, 7:17 AM

Restore error for when thread_local is used in threadprivate directives even if TLS can be used for OpenMP codegen.

Refactor how default value for the -fnoopenmp-use-tls is obtained from the target architecture.

sfantao added inline comments.Jul 10 2015, 7:22 AM
lib/AST/Decl.cpp
1818

Done!

lib/Driver/Tools.cpp
3953–3971

Done!

ABataev accepted this revision.Jul 12 2015, 8:33 PM
ABataev edited edge metadata.
This revision is now accepted and ready to land.Jul 12 2015, 8:33 PM

The 29443 version of the patch causes the following regression on x86_64-apple-darwin14...

FAIL: Clang :: OpenMP/threadprivate_codegen.cpp (4453 of 8377)
******************** TEST 'Clang :: OpenMP/threadprivate_codegen.cpp' FAILED ********************
Script:
--
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -verify -fopenmp -fnoopenmp-use-tls -DBODY -triple x86_64-unknown-unknown -x c++ -emit-llvm /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -fexceptions -fcxx-exceptions -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -fnoopenmp-use-tls -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -fnoopenmp-use-tls -DBODY -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -g -std=c++11 -include-pch /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp -verify /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -emit-llvm -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck --check-prefix=CHECK-DEBUG /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -verify -fopenmp -DBODY -triple x86_64-unknown-unknown -x c++ -emit-llvm /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -fexceptions -fcxx-exceptions -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp --check-prefix=CHECK-TLS
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -DBODY -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -g -std=c++11 -include-pch /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp -verify /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -emit-llvm -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck --check-prefix=CHECK-TLS /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
--
Exit Code: 1

Command Output (stderr):
--
/sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp:362:15: error: expected string not found in input
// CHECK-TLS: [[INIT_LABEL]]:
              ^
<stdin>:73:1: note: scanning from here
; <label>:4 ; preds = %0
^
<stdin>:73:1: note: with variable "INIT_LABEL" equal to "4"
; <label>:4 ; preds = %0
^
<stdin>:73:10: note: possible intended match here
; <label>:4 ; preds = %0
         ^

--

********************
sfantao updated this revision to Diff 29582.Jul 13 2015, 8:56 AM
sfantao edited edge metadata.

Minor change in the threadprivate_codegen regression tests to address issue detected for x86-darwin14.

The 29443 version of the patch causes the following regression on x86_64-apple-darwin14...

FAIL: Clang :: OpenMP/threadprivate_codegen.cpp (4453 of 8377)
******************** TEST 'Clang :: OpenMP/threadprivate_codegen.cpp' FAILED ********************
Script:
--
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -verify -fopenmp -fnoopenmp-use-tls -DBODY -triple x86_64-unknown-unknown -x c++ -emit-llvm /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -fexceptions -fcxx-exceptions -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -fnoopenmp-use-tls -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -fnoopenmp-use-tls -DBODY -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -g -std=c++11 -include-pch /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp -verify /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -emit-llvm -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck --check-prefix=CHECK-DEBUG /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -verify -fopenmp -DBODY -triple x86_64-unknown-unknown -x c++ -emit-llvm /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -fexceptions -fcxx-exceptions -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp --check-prefix=CHECK-TLS
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
gtimeout 1m  /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/clang -cc1 -internal-isystem /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/bin/../lib/clang/3.7.0/include -nostdsysteminc -fopenmp -DBODY -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -g -std=c++11 -include-pch /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/tools/clang/test/OpenMP/Output/threadprivate_codegen.cpp.tmp -verify /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp -emit-llvm -o - | /sw/src/fink.build/llvm37-3.7.0-1/build/stage3/./bin/FileCheck --check-prefix=CHECK-TLS /sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp
--
Exit Code: 1

Command Output (stderr):
--
/sw/src/fink.build/llvm37-3.7.0-1/cfe-3.7.0.src/test/OpenMP/threadprivate_codegen.cpp:362:15: error: expected string not found in input
// CHECK-TLS: [[INIT_LABEL]]:
              ^
<stdin>:73:1: note: scanning from here
; <label>:4 ; preds = %0
^
<stdin>:73:1: note: with variable "INIT_LABEL" equal to "4"
; <label>:4 ; preds = %0
^
<stdin>:73:10: note: possible intended match here
; <label>:4 ; preds = %0
         ^

--

********************

Hi jhowarth,

Can you please verify if the last revision fixes the problem for you?

Thanks!
Samuel

I can confirm that the ID 29582 version of the proposed patch eliminates the regression on x86_64-apple-darwin14.

sfantao closed this revision.Jul 13 2015, 3:55 PM

Committed in r242080.

Thanks!
Samuel