This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF] Add time-trace to ELF LLD (1/2)
ClosedPublic

Authored by russell.gallop on Dec 5 2019, 5:11 AM.

Details

Summary

Following on from RFC here: https://reviews.llvm.org/D69043

This patch extends the TimeProfiler to support multiple threads (for ThinLTO) by making TimeTraceProfilerInstance thread local. timeTraceProfilerFinishThread() moves the thread local instance to a global vector of instances and timeTraceProfilerWrite() writes recorded trace data from all instances. This reduces locking over the initial implementation presented in the RFC and is much quicker even on relatively modest 6 core machine.

In the generated trace, threads are identified based on their thread ids. Totals are reported with artificial thread ids higher than the real ones.

Diff Detail

Event Timeline

russell.gallop created this revision.Dec 5 2019, 5:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2019, 5:11 AM
anton-afanasyev accepted this revision.Dec 9 2019, 1:01 PM

LGTM, thanks!

This revision is now accepted and ready to land.Dec 9 2019, 1:01 PM

Btw, is it planned to add test for multithreading?

russell.gallop added a comment.EditedDec 10 2019, 2:53 AM

Btw, is it planned to add test for multithreading?

Thanks for the review.

There are tests for multi-threaded ThinLto using this within LLD in the next review (https://reviews.llvm.org/D71060).

Were you thinking of this, or of unit tests?

Btw, is it planned to add test for multithreading?

Thanks for the review.

There are tests for multi-threaded ThinLto using this within LLD in the next review (https://reviews.llvm.org/D71060).

Were you thinking of this, or of unit tests?

I mean the test like clang/test/Driver/check-time-trace.cpp, but for the multithreaded invocation. Yes, for instance, for ThinLTO (but I see no tests in the D71060).

I mean the test like clang/test/Driver/check-time-trace.cpp, but for the multithreaded invocation. Yes, for instance, for ThinLTO (but I see no tests in the D71060).

Ah, my mistake. They should be on D71060 but it looks like they got lost somewhere along the way! I'll add them.

ruiu added a comment.Dec 10 2019, 9:47 PM

Can you share an example output of lld with this patch?

Can you share an example output of lld with this patch?

This patch doesn't change lld, just provides support to D71060. I'll add an example to that review.

ruiu accepted this revision.Dec 11 2019, 7:34 PM

LGTM

This revision was automatically updated to reflect the committed changes.
Kai added a subscriber: Kai.Dec 16 2019, 2:24 AM

This PR requires TLS support even if LLVM_ENABLE_THREADS is set to OFF. Is this intended?

In D71059#1785471, @Kai wrote:

This PR requires TLS support even if LLVM_ENABLE_THREADS is set to OFF. Is this intended?

I hadn't considered that, thanks. I'll fix this to work with LLVM_ENABLE_THREADS=OFF.

This patch breaks the build on Apple Clang (Mac OSX 10.13.6).

$ ninja clang
[27/2870] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++  -DGTEST_HAS_RTTI=0 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Support -I/llvm-project/llvm/lib/Support -I/usr/include/libxml2 -Iinclude -I/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -O2 -g -DNDEBUG   -std=c++14  -fno-exceptions -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o -c /llvm-project/llvm/lib/Support/TimeProfiler.cpp
In file included from /llvm-project/llvm/lib/Support/TimeProfiler.cpp:13:
/llvm-project/llvm/include/llvm/Support/TimeProfiler.h:17:8: error: thread-local storage is not supported for the current target
extern thread_local std::unique_ptr<TimeTraceProfiler>
       ^
/llvm-project/llvm/lib/Support/TimeProfiler.cpp:36:1: error: thread-local storage is not supported for the current target
thread_local std::unique_ptr<TimeTraceProfiler> TimeTraceProfilerInstance =
^
2 errors generated.

I guess this is related to the LLVM_ENABLE_THREADS issue mentioned above.

This patch breaks the build on Apple Clang (Mac OSX 10.13.6).

$ ninja clang
[27/2870] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++  -DGTEST_HAS_RTTI=0 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Support -I/llvm-project/llvm/lib/Support -I/usr/include/libxml2 -Iinclude -I/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -O2 -g -DNDEBUG   -std=c++14  -fno-exceptions -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/TimeProfiler.cpp.o -c /llvm-project/llvm/lib/Support/TimeProfiler.cpp
In file included from /llvm-project/llvm/lib/Support/TimeProfiler.cpp:13:
/llvm-project/llvm/include/llvm/Support/TimeProfiler.h:17:8: error: thread-local storage is not supported for the current target
extern thread_local std::unique_ptr<TimeTraceProfiler>
       ^
/llvm-project/llvm/lib/Support/TimeProfiler.cpp:36:1: error: thread-local storage is not supported for the current target
thread_local std::unique_ptr<TimeTraceProfiler> TimeTraceProfilerInstance =
^
2 errors generated.

I guess this is related to the LLVM_ENABLE_THREADS issue mentioned above.

I've put a fix up here: https://reviews.llvm.org/D71548 that I think should work. I don't have a system where it fails to build with LLVM_ENABLE_THREADS=OFF.

russell.gallop updated this revision to Diff 236388.EditedJan 6 2020, 11:23 AM
russell.gallop edited the summary of this revision. (Show Details)

I reverted this at the end of last year MacOS build issues.

This new version fixes some build issues on various platforms and with LLVM_ENABLE_THREADS=OFF by using LLVM_THREAD_LOCAL.

Not all minimum host toolchains [1] support thread_local (i.e. Apple Clang 6.0) so this falls back to thread on MacOS and other platforms which don't advertise it with has_feature(cxx_thread_local) (even if they support thread_local like gcc). On Windows this uses __declspec(thread) which (from vs2017) is effectively thread_local [2].

This involved rolling back use of unique_ptr to raw pointers.

In time (with higher minimum host tools) LLVM_THREAD_LOCAL could become:

#if LLVM_ENABLE_THREADS
#define LLVM_THREAD_LOCAL thread_local
#else
#define LLVM_THREAD_LOCAL
#endif

and this change should still work.

Tested with Linux with clang and gcc host compiler and Windows with MSVC2017. Also tested with LLVM_ENABLE_THREADS=OFF.

Please take another look. Thanks.

[1] https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library
[2] https://docs.microsoft.com/en-us/cpp/cpp/thread?view=vs-2017

russell.gallop reopened this revision.Jan 6 2020, 11:23 AM
This revision is now accepted and ready to land.Jan 6 2020, 11:23 AM
Quuxplusone added inline comments.Jan 6 2020, 1:09 PM
llvm/lib/Support/TimeProfiler.cpp
255–266

Can TimeTraceProfilerInstance ever actually be non-null here?
If so, are you at all worried that this unguarded access in timeTraceProfilerCleanup() might race with the guarded access in timeTraceProfilerFinishThread()?

264

Once this new'ed pointer gets into the vector, I don't see any codepath that ever calls delete on the pointers in the vector — timeTraceProfilerCleanup() just calls clear() and drops them on the floor without deleting. Am I missing something?

Fix memory leak.

russell.gallop marked 3 inline comments as done.Jan 7 2020, 3:45 AM
russell.gallop added inline comments.
llvm/lib/Support/TimeProfiler.cpp
255–266

It looks like Phabricator has lost the location of your comments since I updated the patch. This comment was made against timeTraceProfilerCleanup().

Can TimeTraceProfilerInstance ever actually be non-null here?

Yes, it is the instance used on the main thread (rather than worker threads from the thread pool) so it will be non-null when time-trace is enabled.

If so, are you at all worried that this unguarded access in timeTraceProfilerCleanup() might race with the guarded access in timeTraceProfilerFinishThread()?

Do you mean unguarded access to TimeTraceProfilerInstance? TimeTraceProfilerInstance does not need guarding as it is not shared, ThreadTimeTraceProfilerInstances does.

With LLVM_ENABLE_THREADS=ON TimeTraceProfilerInstance is thread local so no guard is required. With LLVM_ENABLE_THREADS=OFF there are no other threads calling timeTraceProfilerFinishThread() to race.

I'll add a comment to timeTraceProfilerCleanup() to try and clarify.

264

Ah, you're right. Fixed. Thanks.

russell.gallop marked an inline comment as done.

Add comment to clarify usage of timeTraceProfilerCleanup.

russell.gallop marked an inline comment as done.Jan 7 2020, 5:38 AM

Ping. Okay to land new version?

Thanks

I'm not qualified to comment on the functionality. This seems like it'll be OK with respect to thread_local on Mac OSX. No further comments from me.

llvm/lib/Support/TimeProfiler.cpp
133

const auto &E?

Add consts to references in lambda functions.

russell.gallop marked an inline comment as done.Jan 24 2020, 9:11 AM

Thanks, added const.

I'm not qualified to comment on the functionality. This seems like it'll be OK with respect to thread_local on Mac OSX. No further comments from me.

Thanks. As those were the recent changes and it was signed off before I'll recommit next week (unless anyone objects).

This revision was automatically updated to reflect the committed changes.