Details

Reviewers

beanz
MaskRay
phosek
xinxinw1
smeenai
compnerd

Group Reviewers

Restricted Project

Commits

rG3dab7fede201: [CMake] Add clang-bolt target

Summary

This patch adds a CLANG_BOLT_INSTRUMENT option that applies BOLT instrumentation
to Clang, performs a bootstrap build with the resulting Clang, merges the resulting
fdata files into a single profile file, and uses it to perform BOLT optimization
on the original Clang binary.

The projects and targets used for bootstrap/profile collection are configurable via
CLANG_BOLT_INSTRUMENT_PROJECTS and CLANG_BOLT_INSTRUMENT_TARGETS.
The defaults are "llvm" and "count" respectively, which results in a profile with
~5.3B dynamically executed instructions.

The intended use of this functionality is through the BOLT CMake cache file, similar
to a PGO 2-stage build:

cmake <llvm-project>/llvm -C <llvm-project>/clang/cmake/caches/BOLT.cmake
ninja clang++-bolt # pulls clang-bolt
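
For orientation, the instrument, merge and optimize steps behind these targets can be sketched in Python as plain argument lists. This is only an illustration: the flags are copied from the clang/CMakeLists.txt change in this revision, while the function name, the paths and the example fdata file name are assumptions, and the profiling build that runs between the instrument and merge steps is left out.

```python
def bolt_pipeline_commands(clang_path, work_dir, fdata_files):
    """Return argv lists for the instrument, merge and optimize steps.

    Hypothetical helper: nothing is executed, the commands are only assembled.
    """
    profile = f"{work_dir}/prof.fdata"
    return {
        # Step 1: produce an instrumented copy of clang next to the original.
        "instrument": [
            "llvm-bolt", clang_path, "-o", f"{clang_path}-bolt.inst",
            "-instrument", "--instrumentation-file-append-pid",
            f"--instrumentation-file={profile}",
        ],
        # Step 2 (after the profiling build): merge per-process profiles.
        "merge": ["merge-fdata", "-o", profile, *fdata_files],
        # Step 3: optimize the original binary using the merged profile.
        "optimize": [
            "llvm-bolt", clang_path, "-o", f"{clang_path}-bolt",
            "-data", profile,
            "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort+",
            "-split-functions", "-split-all-cold", "-split-eh",
            "-dyno-stats", "-icf=1", "-use-gnu-stack",
        ],
    }


# Example paths and the fdata file name below are made up for illustration.
cmds = bolt_pipeline_commands("bin/clang", "build", ["build/prof.fdata.101.fdata"])
for step, argv in cmds.items():
    print(step + ":", " ".join(argv))
```

Keeping the steps as data makes it easy to see that the optimize step reads the original clang binary, not the instrumented one.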

Stats with a recent checkout (clang-16), pre-built BOLT and Clang, 72vCPU/224G:

Step	Time
CMake configure with host Clang + BOLT.cmake	1m6.592s
Instrumenting Clang with BOLT	2m50.508s
CMake configure `llvm` with instrumented Clang	5m46.364s (~5x slowdown)
CMake build `not` with instrumented Clang	0m6.456s
Merging fdata files	0m9.439s
Optimizing Clang with BOLT	0m39.201s

Building Clang:

cmake ../llvm-project/llvm -DCMAKE_C_COMPILER=... -DCMAKE_CXX_COMPILER=... \
  -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang \
  -DLLVM_TARGETS_TO_BUILD=Native -GNinja

	Release	BOLT-optimized
cmake	0m24.016s	0m22.333s
ninja clang	5m55.692s	4m35.122s

This is not a rigorous measurement, but it gives a ballpark figure.
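
To put the two tables above side by side, the timings can be converted to seconds and compared. The sketch below is only arithmetic on the numbers reported in this summary; the helper name `to_seconds` is made up. It puts the one-time cost of the BOLT steps at about 10m39s and the saving on `ninja clang` at about 81 seconds, or roughly 23% of the Release time.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style string such as '5m46.364s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# One-time cost: every step listed in the first table.
pipeline_steps = ["1m6.592s", "2m50.508s", "5m46.364s",
                  "0m6.456s", "0m9.439s", "0m39.201s"]
pipeline_total = sum(to_seconds(s) for s in pipeline_steps)

# Payoff: `ninja clang` with the Release compiler vs. the BOLT-optimized one.
release = to_seconds("5m55.692s")
bolt = to_seconds("4m35.122s")
saved = release - bolt

print(f"pipeline total: {pipeline_total:.3f}s")
print(f"ninja clang saved: {saved:.3f}s ({saved / release:.1%} of Release time)")
```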

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Amir created this revision. Aug 30 2022, 2:23 PM

Herald added a project: Restricted Project. Aug 30 2022, 2:23 PM

Herald added subscribers: treapster, wenlei, mgorny.

Amir requested review of this revision. Aug 30 2022, 2:23 PM

Herald added a subscriber: cfe-commits.

Amir retitled this revision from [clang][BOLT] Add clangbolt target (WIP) to [clang][BOLT] Add clang-bolt target (WIP). Aug 30 2022, 2:24 PM

Amir added reviewers: beanz, MaskRay.

Herald added a subscriber: StephenFan. Aug 30 2022, 2:25 PM

CMAKE_CURRENT_BINARY_DIR already contains bin/

Harbormaster completed remote builds in B184244: Diff 456799. Aug 30 2022, 3:14 PM

Succeeded instrumenting Clang with BOLT

Harbormaster completed remote builds in B184458: Diff 457102. Aug 31 2022, 3:50 PM

Successfully invoke the bootstrap/profiling build

Amir added reviewers: phosek, xinxinw1, smeenai, compnerd. Aug 31 2022, 9:59 PM

Harbormaster completed remote builds in B184511: Diff 457172. Aug 31 2022, 10:21 PM

This was already on my list of build system features I'd like to implement and I'm glad someone else is already looking into it, thank you! I have two high level comments about your approach.

The first one is related to the use of Clang build as the training data. I think that Clang build is both unnecessarily heavyweight, but also not particularly representative of typical workloads (most Clang users don't use it to build Clang). Ideally, we would give vendors the flexibility to supply their own training data. I'd prefer reusing the existing perf-training setup to do so. In fact, I'd imagine most vendors would likely use the same training data for both PGO and BOLT and that use case should be supported.

The second one is related to applicability. I don't think this mechanism should be limited only to Clang. Ideally, it should be possible to instrument and optimize other tools in the toolchain distribution as well; LLD is likely going to be the most common one after Clang.

Succeeded in producing optimized Clang. Switch the default profiling target
from lld to count, which produces a sufficient Clang coverage of 5.3B exec
insns (along with configure-stage Clang invocations).

Amir retitled this revision from [clang][BOLT] Add clang-bolt target (WIP) to [clang][BOLT] Add clang-bolt target. Sep 1 2022, 2:21 PM

Amir edited the summary of this revision.

Hi Petr, thank you for your comments!

In D132975#3763264, @phosek wrote:

This was already on my list of build system features I'd like to implement and I'm glad someone else is already looking into it, thank you! I have two high level comments about your approach.

The first one is related to the use of Clang build as the training data. I think that Clang build is both unnecessarily heavyweight, but also not particularly representative of typical workloads (most Clang users don't use it to build Clang). Ideally, we would give vendors the flexibility to supply their own training data. I'd prefer reusing the existing perf-training setup to do so. In fact, I'd imagine most vendors would likely use the same training data for both PGO and BOLT and that use case should be supported.

Agree that perf-training might be useful for vendors. I'll try to enable it in a follow-up diff.

Please note that the target for profile collection is not hardcoded to clang, it's configurable via CLANG_BOLT_INSTRUMENT_PROJECTS and CLANG_BOLT_INSTRUMENT_TARGETS. Right now it's the llvm/not tool (the smallest possible).

The second one is related to applicability. I don't think this mechanism should be limited only to Clang. Ideally, it should be possible to instrument and optimize other tools in the toolchain distribution as well; LLD is likely going to be the most common one after Clang.

I thought about it, and I think we can accommodate optimizing arbitrary targets by providing an interface to instrument specified target(s) via -DBOLT_INSTRUMENT_TARGETS. For each of the target binaries, CMake would create targets like bolt-instrument-$TARGET and bolt-optimize-$TARGET.
For bolt-instrument-$TARGET, BOLT would instrument the target binary, placing instrumented binary next to the original one (e.g. $target-bolt.inst). End users would use those instrumented binaries on representative workloads to collect the profile. For bolt-optimize-$TARGET, BOLT would post-process the profiles and create optimized binary ($target-bolt).

I appreciate your suggestions. Do you think we can move incrementally from this diff towards more general uses in follow-up diffs?
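
The naming scheme proposed in the comment above can be sketched as a small mapping. This is a hypothetical illustration of the proposal only: BOLT_INSTRUMENT_TARGETS and the bolt-instrument-/bolt-optimize- targets are not part of this revision, and the function name is made up.

```python
def proposed_bolt_targets(targets):
    """Map each binary name to the target and file names from the proposal."""
    plan = {}
    for name in targets:
        plan[name] = {
            "instrument_target": f"bolt-instrument-{name}",
            "optimize_target": f"bolt-optimize-{name}",
            # Instrumented binary placed next to the original one.
            "instrumented_binary": f"{name}-bolt.inst",
            # Optimized binary produced after the profiles are post-processed.
            "optimized_binary": f"{name}-bolt",
        }
    return plan


# Assumption: the option would be a semicolon-separated CMake list.
plan = proposed_bolt_targets("clang;lld".split(";"))
for name, entry in plan.items():
    print(name, entry)
```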

Harbormaster completed remote builds in B184681: Diff 457400. Sep 1 2022, 4:49 PM

Amir edited the summary of this revision. Sep 1 2022, 5:41 PM

Fix up paths

Amir edited the summary of this revision. Sep 1 2022, 6:11 PM

Harbormaster completed remote builds in B184732: Diff 457467. Sep 1 2022, 6:16 PM

srhines added a subscriber: srhines. Sep 1 2022, 9:42 PM

n-omer added a subscriber: n-omer. Sep 2 2022, 3:15 AM

russell.gallop added a subscriber: russell.gallop. Sep 2 2022, 8:16 AM

In D132975#3765541, @Amir wrote:

I appreciate your suggestions. Do you think we can move incrementally from this diff towards more general uses in follow-up diffs?

That's fine with me. Do you envision replacing the use of LLVM build for training with perf-training or supporting both? I'd lean towards the former for simplicity but would be curious to hear about your use cases and plans.

clang/CMakeLists.txt
884	We could consider moving this block to a separate file which would then be included here since this file is already getting pretty large and the logic in this block is self-contained. That could be done in a follow up change though.
958	I'd like to avoid dependency on shell to make this compatible with Windows. Can we move this logic into a Python script akin to https://github.com/llvm/llvm-project/blob/607f14d9605da801034e7119c297c3f58ebce603/clang/utils/perf-training/perf-helper.py?

phosek added inline comments. Sep 2 2022, 11:58 AM

clang/CMakeLists.txt
933–940	I don't think this is sufficient in the general case, we would need to pass additional variables like `CMAKE_AR` the same way we do for the existing bootstrap logic, see https://github.com/llvm/llvm-project/blob/dc549bf0013e11e8fcccba8a8d59c3a4bb052a3b/clang/CMakeLists.txt#L825. For example, on Fuchsia builders we don't have any system-wide toolchain installation, instead we manually set all necessary `CMAKE_<TOOL>` variables for the first stage, so this call will fail for us because it won't be able to find tools like the archiver. Since handling this properly would likely duplicate a lot of the existing logic from the existing bootstrap logic, I'm wondering if we should instead try to refactor the existing logic and break it up into macros/functions which could then be reused here as well.

rrbutani added a subscriber: rrbutani. Sep 2 2022, 8:19 PM

Will there be eventually a way to build a fully optimised clang/lld with ThinLTO, PGO, and Bolt?

In D132975#3768391, @tschuett wrote:

Will there be eventually a way to build a fully optimised clang/lld with ThinLTO, PGO, and Bolt?

Short answer is likely yes.
For clang, I think this diff should be compatible with PGO, with a caveat that BOLT should be applied to stage-2 clang built with PGO, which means that BOOTSTRAP_ options should be set carefully. And for sure it's compatible with ThinLTO - this one is completely orthogonal.
For lld, I can envision a similar fully automated optimized build, but likely in a future separate diff.

Amir retitled this revision from [clang][BOLT] Add clang-bolt target to [CMake] Add clang-bolt target. Sep 6 2022, 3:39 PM

MaskRay added inline comments. Sep 8 2022, 10:29 PM

clang/CMakeLists.txt
933–940	Supporting other cmake variables will be awesome. I use something like `-DCMAKE_CXX_ARCHIVE_CREATE="$HOME/llvm/out/stable/bin/llvm-ar qcS --thin <TARGET> <OBJECTS>" -DCMAKE_CXX_ARCHIVE_FINISH=:` to make my build smaller.

Amir added a child revision: D133633: [CMake] Add ClangBootstrap configuration. Sep 9 2022, 10:49 PM

Address @phosek's comment about dependency on shell

Amir marked an inline comment as done. Sep 12 2022, 1:08 PM

Amir marked an inline comment as not done.

Amir added inline comments.

clang/CMakeLists.txt
933–940	Addressed in D133633

Harbormaster completed remote builds in B186213: Diff 459539. Sep 12 2022, 1:49 PM

Add an ability to pass extra cmake flags

Amir marked an inline comment as done. Sep 12 2022, 4:32 PM

Amir added inline comments.

clang/CMakeLists.txt
933–940	Done: -DCLANG_BOLT_INSTRUMENT_EXTRA_CMAKE_FLAGS is passed to the cmake step of bolt-instrumentation-profile target. I tested it with -DCLANG_BOLT_INSTRUMENT_EXTRA_CMAKE_FLAGS='-DCMAKE_CXX_ARCHIVE_CREATE="<path/to/llvm/bin>/llvm-ar qcS --thin <TARGET> <OBJECTS>" -DCMAKE_CXX_ARCHIVE_FINISH=:' and that appeared to work.

Harbormaster completed remote builds in B186252: Diff 459588. Sep 12 2022, 5:24 PM

LGTM

This revision is now accepted and ready to land. Sep 21 2022, 1:19 AM

Closed by commit rG3dab7fede201: [CMake] Add clang-bolt target (authored by Amir). Sep 23 2022, 1:10 AM

This revision was automatically updated to reflect the committed changes.

Amir added a commit: rG3dab7fede201: [CMake] Add clang-bolt target.

In D132975#3763264, @phosek wrote:

This was already on my list of build system features I'd like to implement and I'm glad someone else is already looking into it, thank you! I have two high level comments about your approach.

The first one is related to the use of Clang build as the training data. I think that Clang build is both unnecessarily heavyweight, but also not particularly representative of typical workloads (most Clang users don't use it to build Clang). Ideally, we would give vendors the flexibility to supply their own training data. I'd prefer reusing the existing perf-training setup to do so. In fact, I'd imagine most vendors would likely use the same training data for both PGO and BOLT and that use case should be supported.

Do you happen to know any existing perf-training sets? Or is there a simple way to create one?

In D132975#3860896, @Amir wrote:

Do you happen to know any existing perf-training sets? Or is there a simple way to create one?

I'm working on a script for generating perf-training sets from Ninja-based build systems, I can contribute it to LLVM if you think it'd be useful.

In D132975#3861334, @phosek wrote:

I'm working on a script for generating perf-training sets from Ninja-based build systems, I can contribute it to LLVM if you think it'd be useful.

Yes, that would be super useful. BOLT should then also leverage that.

Diff 462415

clang/CMakeLists.txt

[... 437 unchanged lines not shown ...]
 else()
   set(HAVE_CLANG_PLUGIN_SUPPORT OFF)
 endif()
 CMAKE_DEPENDENT_OPTION(CLANG_PLUGIN_SUPPORT
   "Build clang with plugin support" ON
   "HAVE_CLANG_PLUGIN_SUPPORT" OFF)

 # If libstdc++ is statically linked, clang-repl needs to statically link libstdc++
 # itself, which is not possible in many platforms because of current limitations in
 # JIT stack. (more platforms need to be supported by JITLink)
 if(NOT LLVM_STATIC_LINK_CXX_STDLIB)
   set(HAVE_CLANG_REPL_SUPPORT ON)
 endif()

 option(CLANG_ENABLE_ARCMT "Build ARCMT." ON)
 option(CLANG_ENABLE_STATIC_ANALYZER
   "Include static analyzer in clang binary." ON)
[... 421 unchanged lines not shown; enclosing context: foreach(target ${CLANG_BOOTSTRAP_TARGETS}) ...]
     if(target MATCHES "^stage[0-9]*")
       add_custom_target(${target} DEPENDS ${NEXT_CLANG_STAGE}-${target})
     endif()

     ExternalProject_Add_StepTargets(${NEXT_CLANG_STAGE} ${target})
   endforeach()
 endif()

+if (CLANG_BOLT_INSTRUMENT)
+  set(CLANG_PATH ${LLVM_RUNTIME_OUTPUT_INTDIR}/clang)
+  set(CLANGXX_PATH ${CLANG_PATH}++)
+  set(CLANG_INSTRUMENTED ${CLANG_PATH}-bolt.inst)
+  set(CLANGXX_INSTRUMENTED ${CLANGXX_PATH}-bolt.inst)
+  set(CLANG_OPTIMIZED ${CLANG_PATH}-bolt)
+  set(CLANGXX_OPTIMIZED ${CLANGXX_PATH}-bolt)
+
+  # Instrument clang with BOLT
+  add_custom_target(clang-instrumented
+    DEPENDS ${CLANG_INSTRUMENTED}
+  )
+  add_custom_command(OUTPUT ${CLANG_INSTRUMENTED}
+    DEPENDS clang llvm-bolt
+    COMMAND llvm-bolt ${CLANG_PATH} -o ${CLANG_INSTRUMENTED}
+      -instrument --instrumentation-file-append-pid
+      --instrumentation-file=${CMAKE_CURRENT_BINARY_DIR}/prof.fdata
+    COMMENT "Instrumenting clang binary with BOLT"
+    VERBATIM
+  )
+
+  # Make a symlink from clang-bolt.inst to clang++-bolt.inst
+  add_custom_target(clang++-instrumented
+    DEPENDS ${CLANGXX_INSTRUMENTED}
+  )
+  add_custom_command(OUTPUT ${CLANGXX_INSTRUMENTED}
+    DEPENDS clang-instrumented
+    COMMAND ${CMAKE_COMMAND} -E create_symlink
+      ${CLANG_INSTRUMENTED}
+      ${CLANGXX_INSTRUMENTED}
+    COMMENT "Creating symlink from BOLT instrumented clang to clang++"
+    VERBATIM
+  )
+
+  # Build specified targets with instrumented Clang to collect the profile
+  set(STAMP_DIR ${CMAKE_CURRENT_BINARY_DIR}/bolt-instrumented-clang-stamps/)
+  set(BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR}/bolt-instrumented-clang-bins/)
+  set(build_configuration "$<CONFIG>")
+  include(ExternalProject)
+  ExternalProject_Add(bolt-instrumentation-profile
+    DEPENDS clang++-instrumented
+    PREFIX bolt-instrumentation-profile
+    SOURCE_DIR ${CMAKE_SOURCE_DIR}
+    STAMP_DIR ${STAMP_DIR}
+    BINARY_DIR ${BINARY_DIR}
+    EXCLUDE_FROM_ALL 1
+    CMAKE_ARGS
+      ${CLANG_BOLT_INSTRUMENT_EXTRA_CMAKE_FLAGS}
+      # We shouldn't need to set this here, but INSTALL_DIR doesn't
+      # seem to work, so instead I'm passing this through
+      -DCMAKE_INSTALL_PREFIX=${CMAKE_INSTALL_PREFIX}
+      -DCMAKE_C_COMPILER=${CLANG_INSTRUMENTED}
+      -DCMAKE_CXX_COMPILER=${CLANGXX_INSTRUMENTED}
+      -DCMAKE_ASM_COMPILER=${CLANG_INSTRUMENTED}
+      -DCMAKE_ASM_COMPILER_ID=Clang
+      -DCMAKE_BUILD_TYPE=Release
+      -DLLVM_ENABLE_PROJECTS=${CLANG_BOLT_INSTRUMENT_PROJECTS}
+      -DLLVM_TARGETS_TO_BUILD=${LLVM_TARGETS_TO_BUILD}
+    BUILD_COMMAND ${CMAKE_COMMAND} --build ${BINARY_DIR}
+      --config ${build_configuration}
+      --target ${CLANG_BOLT_INSTRUMENT_TARGETS}
+    INSTALL_COMMAND ""
+    STEP_TARGETS configure build
+    USES_TERMINAL_CONFIGURE 1
+    USES_TERMINAL_BUILD 1
+    USES_TERMINAL_INSTALL 1
+  )
+
+  # Merge profiles into one using merge-fdata
+  add_custom_target(clang-bolt-profile
+    DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/prof.fdata
+  )
+  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/prof.fdata
+    DEPENDS merge-fdata bolt-instrumentation-profile-build
+    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+    COMMAND ${Python3_EXECUTABLE}
+      ${CMAKE_CURRENT_SOURCE_DIR}/utils/perf-training/perf-helper.py merge-fdata
+      $<TARGET_FILE:merge-fdata> ${CMAKE_CURRENT_BINARY_DIR}/prof.fdata
+      ${CMAKE_CURRENT_BINARY_DIR}
+    COMMENT "Preparing BOLT profile"
+    VERBATIM
+  )
+
+  # Optimize original (pre-bolt) Clang using the collected profile
+  add_custom_target(clang-bolt
+    DEPENDS ${CLANG_OPTIMIZED}
+  )
+  add_custom_command(OUTPUT ${CLANG_OPTIMIZED}
+    DEPENDS clang-bolt-profile
+    COMMAND llvm-bolt ${CLANG_PATH}
+      -o ${CLANG_OPTIMIZED}
+      -data ${CMAKE_CURRENT_BINARY_DIR}/prof.fdata
+      -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions
+      -split-all-cold -split-eh -dyno-stats -icf=1 -use-gnu-stack
+    COMMENT "Optimizing Clang with BOLT"
+    VERBATIM
+  )
+
+  # Make a symlink from clang-bolt to clang++-bolt
+  add_custom_target(clang++-bolt
+    DEPENDS ${CLANGXX_OPTIMIZED}
+  )
+  add_custom_command(OUTPUT ${CLANGXX_OPTIMIZED}
+    DEPENDS clang-bolt
+    COMMAND ${CMAKE_COMMAND} -E create_symlink
+      ${CLANG_OPTIMIZED}
+      ${CLANGXX_OPTIMIZED}
+    COMMENT "Creating symlink from BOLT optimized clang to clang++"
+    VERBATIM
+  )
+endif()
+
 if (LLVM_ADD_NATIVE_VISUALIZERS_TO_SOLUTION)
   add_subdirectory(utils/ClangVisualizers)
 endif()
 add_subdirectory(utils/hmaptool)

 if(CLANG_BUILT_STANDALONE)
   llvm_distribution_add_targets()
   process_llvm_pass_plugins()
 endif()

 set(CLANG_INSTALL_LIBDIR_BASENAME "lib${CLANG_LIBDIR_SUFFIX}")

 configure_file(
   ${CLANG_SOURCE_DIR}/include/clang/Config/config.h.cmake
   ${CLANG_BINARY_DIR}/include/clang/Config/config.h)

clang/cmake/caches/BOLT.cmake

This file was added.

+set(CMAKE_BUILD_TYPE Release CACHE STRING "")
+set(CLANG_BOLT_INSTRUMENT ON CACHE BOOL "")
+set(CLANG_BOLT_INSTRUMENT_PROJECTS "llvm" CACHE STRING "")
+set(CLANG_BOLT_INSTRUMENT_TARGETS "count" CACHE STRING "")
+set(CMAKE_EXE_LINKER_FLAGS "-Wl,--emit-relocs,-znow" CACHE STRING "")
+set(CLANG_BOLT_INSTRUMENT_EXTRA_CMAKE_FLAGS "" CACHE STRING "")
+
+set(LLVM_ENABLE_PROJECTS "bolt;clang" CACHE STRING "")
+set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
+
+# Disable function splitting enabled by default in GCC8+
+if("${CMAKE_CXX_COMPILER_ID}" MATCHES "GNU")
+  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fno-reorder-blocks-and-partition")
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-reorder-blocks-and-partition")
+endif()

clang/utils/perf-training/perf-helper.py

[... 32 unchanged lines not shown; context: print('Usage: %s clean <path> <extension>\n' % __file__ + ...]
       '\tRemoves all files with extension from <path>.')
     return 1
   for filename in findFilesWithExtension(args[0], args[1]):
     os.remove(filename)
   return 0

 def merge(args):
   if len(args) != 3:
-    print('Usage: %s clean <llvm-profdata> <output> <path>\n' % __file__ +
+    print('Usage: %s merge <llvm-profdata> <output> <path>\n' % __file__ +
       '\tMerges all profraw files from path into output.')
     return 1
   cmd = [args[0], 'merge', '-o', args[1]]
   cmd.extend(findFilesWithExtension(args[2], "profraw"))
   subprocess.check_call(cmd)
   return 0

+def merge_fdata(args):
+  if len(args) != 3:
+    print('Usage: %s merge-fdata <merge-fdata> <output> <path>\n' % __file__ +
+      '\tMerges all fdata files from path into output.')
+    return 1
+  cmd = [args[0], '-o', args[1]]
+  cmd.extend(findFilesWithExtension(args[2], "fdata"))
+  subprocess.check_call(cmd)
+  return 0
+
 def dtrace(args):
   parser = argparse.ArgumentParser(prog='perf-helper dtrace',
     description='dtrace wrapper for order file generation')
   parser.add_argument('--buffer-size', metavar='size', type=int, required=False,
     default=1, help='dtrace buffer size in MB (default 1)')
   parser.add_argument('--use-oneshot', required=False, action='store_true',
     help='Use dtrace\'s oneshot probes')
   parser.add_argument('--use-ustack', required=False, action='store_true',
[... 333 unchanged lines not shown; enclosing context: def genOrderFile(args): ...]
   # Write the order file.
   with open(opts.output_path, 'w') as f:
     f.write("\n".join(result))
     f.write("\n")

   return 0

 commands = {'clean' : clean,
   'merge' : merge,
   'dtrace' : dtrace,
   'cc1' : cc1,
-  'gen-order-file' : genOrderFile}
+  'gen-order-file' : genOrderFile,
+  'merge-fdata' : merge_fdata,
+  }

 def main():
   f = commands[sys.argv[1]]
   sys.exit(f(sys.argv[2:]))

 if __name__ == '__main__':
   main()
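
To see what the new merge-fdata subcommand hands to the merge tool, the sketch below rebuilds its argument list against a temporary directory without running anything. The body of findFilesWithExtension is not part of this diff, so `find_files_with_extension` here is a stand-in whose matching rule is an assumption; the file names are made up as well.

```python
import os
import tempfile


def find_files_with_extension(path, extension):
    """Stand-in for perf-helper.py's findFilesWithExtension (assumed behavior)."""
    found = []
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            if os.path.splitext(name)[1] == f".{extension}":
                found.append(os.path.join(root, name))
    return found


def merge_fdata_command(merge_tool, output, path):
    """Build, without running it, the argv that merge_fdata would execute."""
    cmd = [merge_tool, "-o", output]
    cmd.extend(find_files_with_extension(path, "fdata"))
    return cmd


with tempfile.TemporaryDirectory() as tmp:
    # Two per-process profiles plus an unrelated file that must be ignored.
    for name in ("prof.fdata.11.fdata", "prof.fdata.12.fdata", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    cmd = merge_fdata_command("merge-fdata", os.path.join(tmp, "prof.fdata"), tmp)
    print(cmd)
```

Under this stand-in's matching rule, an already existing prof.fdata in the searched directory would also be picked up as an input, which may be worth keeping in mind when re-running the merge step.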

This is an archive of the discontinued LLVM Phabricator instance.

[CMake] Add clang-bolt target
Closed, Public
