This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/
  CMakeLists.txt
  cmake/modules/
    AddLLVM.cmake
    CrossCompile.cmake
  tools/
    llvm-nm/
      CMakeLists.txt
    llvm-readobj/
      CMakeLists.txt
    llvm-shlib/
      CMakeLists.txt
  utils/
    extract_symbols.py

Differential D149119

[CMake] Use LLVM own tools in extract_symbols.py
ClosedPublic

Authored by ikudrin on Apr 24 2023, 8:52 PM.

Details

Reviewers

john.brawn
daltenty
jsji
simon_tatham
tmatheson
mstorsjo
phosek
chandlerc
beanz
rnk
stevewan
hubert.reinterpretcast
DavidSpickett
jhenderson

Commits

rGf649599ea933: [CMake] Use LLVM own tools in extract_symbols.py

Summary

Currently, extract_symbols.py can use several tools to extract symbols from object files and libraries and to guess whether the target is 32-bit Windows. The tools are found via PATH, so in most cases they are simply the system tools. This approach has a number of limitations, in particular:

System tools may not be able to handle the target format in case of cross-platform builds,
They cannot read symbols from LLVM bitcode files, so the staged LTO build with plugins is not supported,
The auto-selected tools may be suboptimal (see D113557),
Support for multiple tools for a single task increases the complexity of the script code.

The patch proposes using LLVM's own tools to solve these issues. Specifically, llvm-readobj detects the target platform, and llvm-nm reads symbols from all supported formats, including bitcode files. The tools can be built in Release mode for the host platform or overridden using CMake settings LLVM_READOBJ and LLVM_NM respectively. The implementation also supports using precompiled tools via LLVM_NATIVE_TOOL_DIR.
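
The precedence between these options can be modelled roughly as follows. This is only a Python sketch of the selection logic as it appears in the patch's `get_host_tool_path` CMake function, not code from the patch itself; the function name `resolve_host_tool`, its parameters, and the injectable `exists` callback are hypothetical and exist purely for illustration.

```python
import os


def resolve_host_tool(tool_name, explicit_path="", native_tool_dir="",
                      use_host_tools=False, native_build_path="",
                      exe_suffix="", exists=os.path.exists):
    """Return (executable, build_dependency) for a host tool.

    Hypothetical model of the CMake logic; an empty dependency means
    nothing has to be built before the tool can run.
    """
    # 1. An explicit setting (LLVM_NM / LLVM_READOBJ) wins.
    if explicit_path:
        return explicit_path, ""
    # 2. Otherwise use a precompiled tool from LLVM_NATIVE_TOOL_DIR, if present.
    if native_tool_dir:
        candidate = native_tool_dir + "/" + tool_name + exe_suffix
        if exists(candidate):
            return candidate, ""
    # 3. If host tools are required (e.g. when cross-compiling), use the
    #    natively built tool and depend on its output path.
    if use_host_tools:
        return native_build_path, native_build_path
    # 4. Otherwise use the tool produced by the current build.
    return "$<TARGET_FILE:%s>" % tool_name, tool_name
```

For example, `resolve_host_tool("llvm-nm", explicit_path="/opt/llvm/bin/llvm-nm")` returns the given path with no build dependency, which mirrors the "saves building" intent of the cache variable.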

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ikudrin created this revision. Apr 24 2023, 8:52 PM

Herald added a project: Restricted Project. Apr 24 2023, 8:52 PM

Herald added subscribers: ekilmer, inglorion.

ikudrin requested review of this revision. Apr 24 2023, 8:52 PM

This looks like a nice addition. Would it make sense to use llvm-nm always, not restricted to bootstrap builds? And would that work on Windows and allow us to simplify this script substantially by using one tool for all platforms?

In D149119#4295110, @tmatheson wrote:

This looks like a nice addition. Would it make sense to use llvm-nm always, not restricted to bootstrap builds? And would that work on Windows and allow us to simplify this script substantially by using one tool for all platforms?

If I understand it right, we might not be able to build llvm-nm in cases like cross-platform building, right?

Supporting only a single tool and simplifying the script would be my preference as well. I see that the script already supports llvm-readobj, do we need the llvm-nm support in that case?

In D149119#4298024, @phosek wrote:

Supporting only a single tool and simplifying the script would be my preference as well. I see that the script already supports llvm-readobj, do we need the llvm-nm support in that case?

I remember seeing it mentioned somewhere (I don't immediately see where right now though), llvm-readobj only handles regular object files, while llvm-nm handles bitcode too.

In D149119#4297540, @ikudrin wrote:

If I understand it right, we might not be able to build llvm-nm in cases like cross-platform building, right?

LLVM has a way to build tools that need to run on the build machine as part of the build (tablegen for example), llvm-nm could be added to that system and then it would be available when extract_symbols.py is run. It would be an issue if llvm-nm ever had to depend on extract_symbols.py but that is not currently the case afaik.

Do LLVM's current portability goals include the constraint that you can only build LLVM for a platform it can also target? If not, then there surely still needs to be some kind of escape hatch so that you can avoid needing llvm-nm to already support the object file format of the host platform.

I suppose you could say that in that unusual situation it's up to you to adapt extract_symbols.py so that it has some other way to get the answers.

In D149119#4312518, @simon_tatham wrote:

Do LLVM's current portability goals include the constraint that you can only build LLVM for a platform it can also target? If not, then there surely still needs to be some kind of escape hatch so that you can avoid needing llvm-nm to already support the object file format of the host platform.

I suppose you could say that in that unusual situation it's up to you to adapt extract_symbols.py so that it has some other way to get the answers.

If I understand it right, the main targets to use extract_symbols.py are AIX and Windows. For both these platforms llvm-nm and llvm-readobj can be built. Support for more exotic situations can be added relatively easily should it be required.

Reworked the patch to always use llvm-nm to extract symbols and llvm-readobj to detect targeting 32-bit Windows.
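
As an illustration of what the two tools are used for, the parsing side can be sketched as below. The regular expressions and the `COFF-i386` comparison are taken from the diff; the helper names `classify_nm_line` and `is_32bit_windows` are hypothetical, and the functions work on already-captured text rather than spawning the tools.

```python
import re

# Defined global symbol in `llvm-nm -P -g` output: "name type value [size]".
DEFINED_RE = re.compile(r"^(\S+)\s+[BDGRSTuVW]\s+\S+\s+\S*$")
# Undefined symbol: type U, value and size may be absent.
UNDEFINED_RE = re.compile(r"^(\S+)\s+U\s+(\S+\s+\S*)?$")


def classify_nm_line(line):
    """Return (name, is_definition), or None for lines of no interest.

    Lines are expected as read from the tool's stdout, i.e. still ending
    in a newline; the patterns rely on that trailing whitespace.
    """
    match = DEFINED_RE.match(line)
    if match:
        return (match.group(1), True)
    match = UNDEFINED_RE.match(line)
    if match:
        return (match.group(1), False)
    return None


def is_32bit_windows(file_header_text):
    """Check the 'Format:' line of `llvm-readobj --file-header` output."""
    for line in file_header_text.splitlines():
        match = re.match(r"Format: (\S+)", line)
        if match:
            return match.group(1) == "COFF-i386"
    return False
```

The sample inputs used to exercise these helpers are made up for illustration and are not captured tool output.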

Herald added a reviewer: jhenderson. May 8 2023, 12:08 AM

Herald added a subscriber: MaskRay.

Harbormaster completed remote builds in B230569: Diff 520277. May 8 2023, 1:16 AM

LGTM, thank you for doing this. Please give it a couple of days in case others have comments.

This revision is now accepted and ready to land. May 9 2023, 4:17 AM

I've not really looked into this patch significantly, so this may well be addressed in the patch, given I see you have modified stuff to do with the NATIVE build, but in the past I have seen LLVM using its own tools to build other parts of its system. I believe it was using llvm-nm to extract the list of symbols needed for export, possibly to do with part of the clang build, possibly even using this script, I don't remember. The problem was that it was using the just-built version of llvm-nm, rather than specifically one from a release build. On a debug build this caused particularly slow builds for me, so much so that I stopped building the relevant parts of LLVM. Please don't introduce a similar situation/make the situation worse (it's quite possible this was fixed some time ago, but I haven't tried recently, nor do I remember the exact thing causing the issue): much like tablegen, any parts of the LLVM build that use just-built tools should make use of release builds, even in debug configuration, at least if an appropriate cmake option is specified.

One potential area of concern here: If llvm-driver is ever extended to work as a plugin loader (thus exporting its symbols), removing support for the pre-installed host tools could cause a cyclic dependency.

In D149119#4329274, @tmatheson wrote:

LGTM, thank you for doing this. Please give it a couple of days in case others have comments.

Thanks!

In D149119#4329285, @jhenderson wrote:

I've not really looked into this patch significantly, so this may well be addressed in the patch, given I see you have modified stuff to do with the NATIVE build, but in the past I have seen LLVM using its own tools to build other parts of its system. I believe it was using llvm-nm to extract the list of symbols needed for export, possibly to do with part of the clang build, possibly even using this script, I don't remember. The problem was that it was using the just-built version of llvm-nm, rather than specifically one from a release build. On a debug build this caused particularly slow builds for me, so much so that I stopped building the relevant parts of LLVM. Please don't introduce a similar situation/make the situation worse (it's quite possible this was fixed some time ago, but I haven't tried recently, nor do I remember the exact thing causing the issue): much like tablegen, any parts of the LLVM build that use just-built tools should make use of release builds, even in debug configuration, at least if an appropriate cmake option is specified.

Your concerns are legit, but the tools in this patch follow the same principle as TableGen, i.e. if LLVM_OPTIMIZED_TABLEGEN is ON then the tools are forced to be built with optimization.
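
For reference, the location of such a natively built tool can be modelled as below. This is a Python sketch of the path computation in the patch's `get_native_tool_path` CMake function, under the assumption that the native build always uses the Release configuration, as the comment in CrossCompile.cmake states; the name `native_tool_path` and its parameters are hypothetical.

```python
def native_tool_path(native_build_dir, target, multi_config=False,
                     exe_suffix=""):
    """Model where the native (Release) build places a tool.

    multi_config stands in for CMAKE_CONFIGURATION_TYPES being set,
    exe_suffix for LLVM_HOST_EXECUTABLE_SUFFIX.
    """
    # Multi-config generators add a per-configuration directory.
    subdir = "Release/bin" if multi_config else "bin"
    return "%s/%s/%s%s" % (native_build_dir, subdir, target, exe_suffix)
```

With a single-config generator, `native_tool_path("/build/NATIVE", "llvm-nm")` yields `/build/NATIVE/bin/llvm-nm`.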

In D149119#4329618, @beanz wrote:

One potential area of concern here: If llvm-driver is ever extended to work as a plugin loader (thus exporting its symbols), removing support for the pre-installed host tools could cause a cyclic dependency.

In that case, we will need to add an option to build the tools without plugin support so that they can be used in the build process.

In D149119#4331207, @ikudrin wrote:

In D149119#4329274, @tmatheson wrote:

LGTM, thank you for doing this. Please give it a couple of days in case others have comments.

Thanks!

In D149119#4329285, @jhenderson wrote:

I've not really looked into this patch significantly, so this may well be addressed in the patch, given I see you have modified stuff to do with the NATIVE build, but in the past I have seen LLVM using its own tools to build other parts of its system. I believe it was using llvm-nm to extract the list of symbols needed for export, possibly to do with part of the clang build, possibly even using this script, I don't remember. The problem was that it was using the just-built version of llvm-nm, rather than specifically one from a release build. On a debug build this caused particularly slow builds for me, so much so that I stopped building the relevant parts of LLVM. Please don't introduce a similar situation/make the situation worse (it's quite possible this was fixed some time ago, but I haven't tried recently, nor do I remember the exact thing causing the issue): much like tablegen, any parts of the LLVM build that use just-built tools should make use of release builds, even in debug configuration, at least if an appropriate cmake option is specified.

Your concerns are legit, but the tools in this patch follow the same principle as TableGen, i.e. if LLVM_OPTIMIZED_TABLEGEN is ON then the tools are forced to be built with optimization.

Thanks - that's fine with me (though raises the question as to whether we should be renaming that variable at some point...).

Closed by commit rGf649599ea933: [CMake] Use LLVM own tools in extract_symbols.py (authored by ikudrin). May 15 2023, 4:21 PM

This revision was automatically updated to reflect the committed changes.

ikudrin added a commit: rGf649599ea933: [CMake] Use LLVM own tools in extract_symbols.py.

Revision Contents

Path                                      Size
llvm/CMakeLists.txt                       5 lines
llvm/cmake/modules/AddLLVM.cmake          31 lines
llvm/cmake/modules/CrossCompile.cmake     16 lines
llvm/tools/llvm-nm/CMakeLists.txt         2 lines
llvm/tools/llvm-readobj/CMakeLists.txt    2 lines
llvm/tools/llvm-shlib/CMakeLists.txt      15 lines
llvm/utils/extract_symbols.py             177 lines

Diff 522378

llvm/CMakeLists.txt

[1,123 lines not shown]	if( ${CMAKE_SYSTEM_NAME} MATCHES SunOS )
# special hack for Solaris to handle crazy system sys/regset.h		# special hack for Solaris to handle crazy system sys/regset.h
include_directories("${LLVM_MAIN_INCLUDE_DIR}/llvm/Support/Solaris")		include_directories("${LLVM_MAIN_INCLUDE_DIR}/llvm/Support/Solaris")
endif( ${CMAKE_SYSTEM_NAME} MATCHES SunOS )		endif( ${CMAKE_SYSTEM_NAME} MATCHES SunOS )

# Make sure we don't get -rdynamic in every binary. For those that need it,		# Make sure we don't get -rdynamic in every binary. For those that need it,
# use export_executable_symbols(target).		# use export_executable_symbols(target).
set(CMAKE_SHARED_LIBRARY_LINK_CXX_FLAGS "")		set(CMAKE_SHARED_LIBRARY_LINK_CXX_FLAGS "")

set(LLVM_EXTRACT_SYMBOLS_FLAGS ""
CACHE STRING "Additional options to pass to llvm/utils/extract_symbols.py.
These cannot override the options set by cmake, but can add extra options
such as --tools.")

include(AddLLVM)		include(AddLLVM)
include(TableGen)		include(TableGen)

include(LLVMDistributionSupport)		include(LLVMDistributionSupport)

if( MINGW AND NOT "${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang" )		if( MINGW AND NOT "${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang" )
# People report that -O3 is unreliable on MinGW. The traditional		# People report that -O3 is unreliable on MinGW. The traditional
# build also uses -O2 for that reason:		# build also uses -O2 for that reason:
[204 lines not shown]

llvm/cmake/modules/AddLLVM.cmake

[1,240 lines not shown]	while(NOT "${new_libs}" STREQUAL "")
set(newer_libs "")		set(newer_libs "")
endwhile()		endwhile()
list(REMOVE_DUPLICATES static_libs)		list(REMOVE_DUPLICATES static_libs)
if (MSVC)		if (MSVC)
set(mangling microsoft)		set(mangling microsoft)
else()		else()
set(mangling itanium)		set(mangling itanium)
endif()		endif()
		get_host_tool_path(llvm-nm LLVM_NM llvm_nm_exe llvm_nm_target)
		get_host_tool_path(llvm-readobj LLVM_READOBJ llvm_readobj_exe llvm_readobj_target)
add_custom_command(OUTPUT ${exported_symbol_file}		add_custom_command(OUTPUT ${exported_symbol_file}
COMMAND "${Python3_EXECUTABLE}" ${LLVM_MAIN_SRC_DIR}/utils/extract_symbols.py ${LLVM_EXTRACT_SYMBOLS_FLAGS} --mangling=${mangling} ${static_libs} -o ${exported_symbol_file}		COMMAND "${Python3_EXECUTABLE}"
		${LLVM_MAIN_SRC_DIR}/utils/extract_symbols.py
		--mangling=${mangling} ${static_libs}
		-o ${exported_symbol_file}
		--nm=${llvm_nm_exe}
		--readobj=${llvm_readobj_exe}
WORKING_DIRECTORY ${LLVM_LIBRARY_OUTPUT_INTDIR}		WORKING_DIRECTORY ${LLVM_LIBRARY_OUTPUT_INTDIR}
DEPENDS ${LLVM_MAIN_SRC_DIR}/utils/extract_symbols.py ${static_libs}		DEPENDS ${LLVM_MAIN_SRC_DIR}/utils/extract_symbols.py
		${static_libs} ${llvm_nm_target} ${llvm_readobj_target}
VERBATIM		VERBATIM
COMMENT "Generating export list for ${target}")		COMMENT "Generating export list for ${target}")
add_llvm_symbol_exports( ${target} ${exported_symbol_file} )		add_llvm_symbol_exports( ${target} ${exported_symbol_file} )
# If something links against this executable then we want a		# If something links against this executable then we want a
# transitive link against only the libraries whose symbols		# transitive link against only the libraries whose symbols
# we aren't exporting.		# we aren't exporting.
set_target_properties(${target} PROPERTIES INTERFACE_LINK_LIBRARIES "${other_libs}")		set_target_properties(${target} PROPERTIES INTERFACE_LINK_LIBRARIES "${other_libs}")
# The default import library suffix that cmake uses for cygwin/mingw is		# The default import library suffix that cmake uses for cygwin/mingw is
[1,157 lines not shown]	if(git_result EQUAL 0)
return()		return()
endif()		endif()
endif()		endif()
set(${out_var} "${git_dir}/logs/HEAD" PARENT_SCOPE)		set(${out_var} "${git_dir}/logs/HEAD" PARENT_SCOPE)
endif()		endif()
endif()		endif()
endfunction()		endfunction()

function(setup_host_tool tool_name setting_name exe_var_name target_var_name)		function(get_host_tool_path tool_name setting_name exe_var_name target_var_name)
set(${setting_name}_DEFAULT "${tool_name}")		set(${setting_name}_DEFAULT "")

if(LLVM_NATIVE_TOOL_DIR)		if(LLVM_NATIVE_TOOL_DIR)
if(EXISTS "${LLVM_NATIVE_TOOL_DIR}/${tool_name}${LLVM_HOST_EXECUTABLE_SUFFIX}")		if(EXISTS "${LLVM_NATIVE_TOOL_DIR}/${tool_name}${LLVM_HOST_EXECUTABLE_SUFFIX}")
set(${setting_name}_DEFAULT "${LLVM_NATIVE_TOOL_DIR}/${tool_name}${LLVM_HOST_EXECUTABLE_SUFFIX}")		set(${setting_name}_DEFAULT "${LLVM_NATIVE_TOOL_DIR}/${tool_name}${LLVM_HOST_EXECUTABLE_SUFFIX}")
endif()		endif()
endif()		endif()

set(${setting_name} "${${setting_name}_DEFAULT}" CACHE		set(${setting_name} "${${setting_name}_DEFAULT}" CACHE
STRING "Host ${tool_name} executable. Saves building if cross-compiling.")		STRING "Host ${tool_name} executable. Saves building if cross-compiling.")

if(NOT ${setting_name} STREQUAL "${tool_name}")		if(${setting_name})
set(exe_name ${${setting_name}})		set(exe_name ${${setting_name}})
set(target_name ${${setting_name}})		set(target_name "")
elseif(LLVM_USE_HOST_TOOLS)		elseif(LLVM_USE_HOST_TOOLS)
build_native_tool(${tool_name} exe_name DEPENDS ${tool_name})		get_native_tool_path(${tool_name} exe_name)
set(target_name ${exe_name})		set(target_name ${exe_name})
else()		else()
set(exe_name $<TARGET_FILE:${tool_name}>)		set(exe_name $<TARGET_FILE:${tool_name}>)
set(target_name ${tool_name})		set(target_name ${tool_name})
endif()		endif()
set(${exe_var_name} "${exe_name}" CACHE STRING "")		set(${exe_var_name} "${exe_name}" CACHE STRING "")
set(${target_var_name} "${target_name}" CACHE STRING "")		set(${target_var_name} "${target_name}" CACHE STRING "")
endfunction()		endfunction()

		function(setup_host_tool tool_name setting_name exe_var_name target_var_name)
		get_host_tool_path(${tool_name} ${setting_name} ${exe_var_name} ${target_var_name})
		# Set up a native tool build if necessary
		if(LLVM_USE_HOST_TOOLS AND NOT ${setting_name})
		build_native_tool(${tool_name} exe_name DEPENDS ${tool_name})
		add_custom_target(${target_var_name} DEPENDS ${exe_name})
		endif()
		endfunction()

llvm/cmake/modules/CrossCompile.cmake

[91 lines not shown]	add_custom_command(OUTPUT ${${project_name}_${target_name}_BUILD}/CMakeCache.txt
DEPENDS CREATE_${project_name}_${target_name}		DEPENDS CREATE_${project_name}_${target_name}
COMMENT "Configuring ${target_name} ${project_name}...")		COMMENT "Configuring ${target_name} ${project_name}...")

add_custom_target(CONFIGURE_${project_name}_${target_name}		add_custom_target(CONFIGURE_${project_name}_${target_name}
DEPENDS ${${project_name}_${target_name}_BUILD}/CMakeCache.txt)		DEPENDS ${${project_name}_${target_name}_BUILD}/CMakeCache.txt)

endfunction()		endfunction()

		function(get_native_tool_path target output_path_var)
		if(CMAKE_CONFIGURATION_TYPES)
		set(output_path "${${PROJECT_NAME}_NATIVE_BUILD}/Release/bin/${target}")
		else()
		set(output_path "${${PROJECT_NAME}_NATIVE_BUILD}/bin/${target}")
		endif()
		set(${output_path_var} ${output_path}${LLVM_HOST_EXECUTABLE_SUFFIX} PARENT_SCOPE)
		endfunction()

# Sets up a native build for a tool, used e.g. for cross-compilation and		# Sets up a native build for a tool, used e.g. for cross-compilation and
# LLVM_OPTIMIZED_TABLEGEN. Always builds in Release.		# LLVM_OPTIMIZED_TABLEGEN. Always builds in Release.
# - target: The target to build natively		# - target: The target to build natively
# - output_path_var: A variable name which receives the path to the built target		# - output_path_var: A variable name which receives the path to the built target
# - DEPENDS: Any additional dependencies for the target		# - DEPENDS: Any additional dependencies for the target
function(build_native_tool target output_path_var)		function(build_native_tool target output_path_var)
cmake_parse_arguments(ARG "" "" "DEPENDS" ${ARGN})		cmake_parse_arguments(ARG "" "" "DEPENDS" ${ARGN})

if(CMAKE_CONFIGURATION_TYPES)		get_native_tool_path(${target} output_path)
set(output_path "${${PROJECT_NAME}_NATIVE_BUILD}/Release/bin/${target}")
else()
set(output_path "${${PROJECT_NAME}_NATIVE_BUILD}/bin/${target}")
endif()
set(output_path ${output_path}${LLVM_HOST_EXECUTABLE_SUFFIX})

# Make chain of preceding actions		# Make chain of preceding actions
if(CMAKE_GENERATOR MATCHES "Visual Studio")		if(CMAKE_GENERATOR MATCHES "Visual Studio")
get_property(host_targets GLOBAL PROPERTY ${PROJECT_NAME}_HOST_TARGETS)		get_property(host_targets GLOBAL PROPERTY ${PROJECT_NAME}_HOST_TARGETS)
set_property(GLOBAL APPEND PROPERTY ${PROJECT_NAME}_HOST_TARGETS ${output_path})		set_property(GLOBAL APPEND PROPERTY ${PROJECT_NAME}_HOST_TARGETS ${output_path})
endif()		endif()

llvm_ExternalProject_BuildCmd(build_cmd ${target} ${${PROJECT_NAME}_NATIVE_BUILD}		llvm_ExternalProject_BuildCmd(build_cmd ${target} ${${PROJECT_NAME}_NATIVE_BUILD}
[9 lines not shown]

llvm/tools/llvm-nm/CMakeLists.txt

[19 lines not shown]	add_llvm_tool(llvm-nm
llvm-nm.cpp		llvm-nm.cpp

DEPENDS		DEPENDS
NmOptsTableGen		NmOptsTableGen
intrinsics_gen		intrinsics_gen
GENERATE_DRIVER		GENERATE_DRIVER
)		)

		setup_host_tool(llvm-nm LLVM_NM llvm_nm_exe llvm_nm_target)

if(LLVM_INSTALL_BINUTILS_SYMLINKS)		if(LLVM_INSTALL_BINUTILS_SYMLINKS)
add_llvm_tool_symlink(nm llvm-nm)		add_llvm_tool_symlink(nm llvm-nm)
endif()		endif()

llvm/tools/llvm-readobj/CMakeLists.txt

[24 lines not shown]	add_llvm_tool(llvm-readobj
Win64EHDumper.cpp		Win64EHDumper.cpp
WindowsResourceDumper.cpp		WindowsResourceDumper.cpp
XCOFFDumper.cpp		XCOFFDumper.cpp
DEPENDS		DEPENDS
ReadobjOptsTableGen		ReadobjOptsTableGen
GENERATE_DRIVER		GENERATE_DRIVER
)		)

		setup_host_tool(llvm-readobj LLVM_READOBJ llvm_readobj_exe llvm_readobj_target)

add_llvm_tool_symlink(llvm-readelf llvm-readobj)		add_llvm_tool_symlink(llvm-readelf llvm-readobj)

if(LLVM_INSTALL_BINUTILS_SYMLINKS)		if(LLVM_INSTALL_BINUTILS_SYMLINKS)
add_llvm_tool_symlink(readelf llvm-readobj)		add_llvm_tool_symlink(readelf llvm-readobj)
endif()		endif()

llvm/tools/llvm-shlib/CMakeLists.txt

[160 lines not shown]	else()
# Write out the full lib names into file to be read by the python script.		# Write out the full lib names into file to be read by the python script.
file(WRITE ${LIBSFILE} "${FILE_CONTENT}")		file(WRITE ${LIBSFILE} "${FILE_CONTENT}")
endif()		endif()

# Generate the exports file dynamically.		# Generate the exports file dynamically.
set(GEN_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/gen-msvc-exports.py)		set(GEN_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/gen-msvc-exports.py)

set(LLVM_EXPORTED_SYMBOL_FILE ${LLVM_BINARY_DIR}/${CMAKE_CFG_INTDIR}/libllvm-c.exports)		set(LLVM_EXPORTED_SYMBOL_FILE ${LLVM_BINARY_DIR}/${CMAKE_CFG_INTDIR}/libllvm-c.exports)
if(NOT LLVM_NM)		get_host_tool_path(llvm-nm LLVM_NM llvm_nm_exe llvm_nm_target)
if(CMAKE_CROSSCOMPILING)
build_native_tool(llvm-nm llvm_nm)
set(llvm_nm_target "${llvm_nm}")
else()
set(llvm_nm $<TARGET_FILE:llvm-nm>)
set(llvm_nm_target llvm-nm)
endif()
else()
set(llvm_nm ${LLVM_NM})
set(llvm_nm_target "")
endif()

add_custom_command(OUTPUT ${LLVM_EXPORTED_SYMBOL_FILE}		add_custom_command(OUTPUT ${LLVM_EXPORTED_SYMBOL_FILE}
COMMAND "${Python3_EXECUTABLE}" ${GEN_SCRIPT} --libsfile ${LIBSFILE} ${GEN_UNDERSCORE} --nm "${llvm_nm}" -o ${LLVM_EXPORTED_SYMBOL_FILE}		COMMAND "${Python3_EXECUTABLE}" ${GEN_SCRIPT} --libsfile ${LIBSFILE} ${GEN_UNDERSCORE} --nm "${llvm_nm_exe}" -o ${LLVM_EXPORTED_SYMBOL_FILE}
DEPENDS ${LIB_NAMES} ${llvm_nm_target}		DEPENDS ${LIB_NAMES} ${llvm_nm_target}
COMMENT "Generating export list for LLVM-C"		COMMENT "Generating export list for LLVM-C"
VERBATIM )		VERBATIM )

# Finally link the target.		# Finally link the target.
add_llvm_library(LLVM-C SHARED INSTALL_WITH_TOOLCHAIN ${SOURCES} DEPENDS intrinsics_gen)		add_llvm_library(LLVM-C SHARED INSTALL_WITH_TOOLCHAIN ${SOURCES} DEPENDS intrinsics_gen)

if (LLVM_INTEGRATED_CRT_ALLOC AND MSVC)		if (LLVM_INTEGRATED_CRT_ALLOC AND MSVC)
# Make sure we search LLVMSupport first, before the CRT libs		# Make sure we search LLVMSupport first, before the CRT libs
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -INCLUDE:malloc")		set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -INCLUDE:malloc")
endif()		endif()

endif()		endif()

llvm/utils/extract_symbols.py

[17 lines not shown]
from __future__ import print_function		from __future__ import print_function
import sys		import sys
import re		import re
import os		import os
import subprocess		import subprocess
import multiprocessing		import multiprocessing
import argparse		import argparse

# Define functions which extract a list of pairs of (symbols, is_def) from a		# Define a function which extracts a list of pairs of (symbols, is_def) from a
# library using several different tools. We use subprocess.Popen and yield a		# library using llvm-nm becuase it can work both with regular and bitcode files.
# symbol at a time instead of using subprocess.check_output and returning a list		# We use subprocess.Popen and yield a symbol at a time instead of using
# as, especially on Windows, waiting for the entire output to be ready can take		# subprocess.check_output and returning a list as, especially on Windows, waiting
# a significant amount of time.		# for the entire output to be ready can take a significant amount of time.
		def nm_get_symbols(tool, lib):
def dumpbin_get_symbols(lib):		# '-P' means the output is in portable format,
process = subprocess.Popen(['dumpbin','/symbols',lib], bufsize=1,		# '-g' means we only get global symbols,
stdout=subprocess.PIPE, stdin=subprocess.PIPE,		# '-Xany' enforce handling both 32- and 64-bit objects on AIX,
universal_newlines=True)		# '--no-demangle' ensure that C++ symbol names are not demangled; note
process.stdin.close()		# that llvm-nm do not demangle by default, but the system nm on AIX does
for line in process.stdout:		# that, so the behavior may change in the future,
# Look for external symbols		# '-p' do not waste time sorting the symbols.
match = re.match("^.+(SECT\|UNDEF).+External\s+\\|\s+(\S+).*$", line)		cmd = [tool,'-P','-g','-Xany','--no-demangle','-p']
if match:
yield (match.group(2), match.group(1) != "UNDEF")
process.wait()

def nm_get_symbols(lib):
# -P means the output is in portable format, and -g means we only get global
# symbols.
cmd = ['nm','-P','-g']
if sys.platform.startswith('aix'):
cmd += ['-Xany','-C','-p']
process = subprocess.Popen(cmd+[lib], bufsize=1,		process = subprocess.Popen(cmd+[lib], bufsize=1,
stdout=subprocess.PIPE, stdin=subprocess.PIPE,		stdout=subprocess.PIPE, stdin=subprocess.PIPE,
universal_newlines=True)		universal_newlines=True)
process.stdin.close()		process.stdin.close()
for line in process.stdout:		for line in process.stdout:
# Look for external symbols that are defined in some section		# Look for external symbols that are defined in some section
# The POSIX format is:		# The POSIX format is:
# name type value size		# name type value size
# The -P flag displays the size field for symbols only when applicable,		# The -P flag displays the size field for symbols only when applicable,
# so the last field is optional. There's no space after the value field,		# so the last field is optional. There's no space after the value field,
# but \s+ match newline also, so \s+\S* will match the optional size field.		# but \s+ match newline also, so \s+\S* will match the optional size field.
match = re.match("^(\S+)\s+[BDGRSTuVW]\s+\S+\s+\S*$", line)		match = re.match("^(\S+)\s+[BDGRSTuVW]\s+\S+\s+\S*$", line)
if match:		if match:
yield (match.group(1), True)		yield (match.group(1), True)
# Look for undefined symbols, which have type U and may or may not		# Look for undefined symbols, which have type U and may or may not
# (depending on which nm is being used) have value and size.		# (depending on which nm is being used) have value and size.
match = re.match("^(\S+)\s+U\s+(\S+\s+\S*)?$", line)		match = re.match("^(\S+)\s+U\s+(\S+\s+\S*)?$", line)
if match:		if match:
yield (match.group(1), False)		yield (match.group(1), False)
process.wait()		process.wait()

def readobj_get_symbols(lib):		# Define a function which determines if the target is 32-bit Windows (as that's
process = subprocess.Popen(['llvm-readobj','--symbols',lib], bufsize=1,
stdout=subprocess.PIPE, stdin=subprocess.PIPE,
universal_newlines=True)
process.stdin.close()
for line in process.stdout:
# When looking through the output of llvm-readobj we expect to see Name,
# Section, then StorageClass, so record Name and Section when we see
# them and decide if this is an external symbol when we see
# StorageClass.
match = re.search('Name: (\S+)', line)
if match:
name = match.group(1)
match = re.search('Section: (\S+)', line)
if match:
section = match.group(1)
match = re.search('StorageClass: (\S+)', line)
if match:
storageclass = match.group(1)
if section != 'IMAGE_SYM_ABSOLUTE' and \
storageclass == 'External':
yield (name, section != 'IMAGE_SYM_UNDEFINED')
process.wait()

# Define functions which determine if the target is 32-bit Windows (as that's
# where calling convention name decoration happens).		# where calling convention name decoration happens).
		def readobj_is_32bit_windows(tool, lib):
def dumpbin_is_32bit_windows(lib):		output = subprocess.check_output([tool,'--file-header',lib],
# dumpbin /headers can output a huge amount of data (>100MB in a debug
# build) so we read only up to the 'machine' line then close the output.
process = subprocess.Popen(['dumpbin','/headers',lib], bufsize=1,
stdout=subprocess.PIPE, stdin=subprocess.PIPE,
universal_newlines=True)
process.stdin.close()
retval = False
for line in process.stdout:
match = re.match('.+machine $(\S+)$', line)
if match:
retval = (match.group(1) == 'x86')
break
process.stdout.close()
process.wait()
return retval

def objdump_is_32bit_windows(lib):
output = subprocess.check_output(['objdump','-f',lib],
universal_newlines=True)
for line in output.splitlines():
match = re.match('.+file format (\S+)', line)
if match:
return (match.group(1) == 'pe-i386')
return False

def readobj_is_32bit_windows(lib):
output = subprocess.check_output(['llvm-readobj','--file-header',lib],
universal_newlines=True)		universal_newlines=True)
for line in output.splitlines():		for line in output.splitlines():
match = re.match('Format: (\S+)', line)		match = re.match('Format: (\S+)', line)
if match:		if match:
return (match.group(1) == 'COFF-i386')		return (match.group(1) == 'COFF-i386')
return False		return False

# On AIX, there isn't an easy way to detect 32-bit windows objects with the system toolchain,
# so just assume false.
def aix_is_32bit_windows(lib):
return False

# MSVC mangles names to ?<identifier_mangling>@<type_mangling>. By examining the		# MSVC mangles names to ?<identifier_mangling>@<type_mangling>. By examining the
# identifier/type mangling we can decide which symbols could possibly be		# identifier/type mangling we can decide which symbols could possibly be
# required and which we can discard.		# required and which we can discard.
def should_keep_microsoft_symbol(symbol, calling_convention_decoration):		def should_keep_microsoft_symbol(symbol, calling_convention_decoration):
# Keep unmangled (i.e. extern "C") names		# Keep unmangled (i.e. extern "C") names
if not '?' in symbol:		if not '?' in symbol:
if calling_convention_decoration:		if calling_convention_decoration:
# Remove calling convention decoration from names		# Remove calling convention decoration from names
[204 lines not shown]	while len(arg) > 0:
arg = match.group(2)		arg = match.group(2)
continue		continue
# Some other kind of name that we can't handle		# Some other kind of name that we can't handle
components.append((arg, False))		components.append((arg, False))
return components		return components
return components		return components

def extract_symbols(arg):		def extract_symbols(arg):
get_symbols, should_keep_symbol, calling_convention_decoration, lib = arg		llvm_nm_path, should_keep_symbol, calling_convention_decoration, lib = arg
symbol_defs = dict()		symbol_defs = dict()
symbol_refs = set()		symbol_refs = set()
for (symbol, is_def) in get_symbols(lib):		for (symbol, is_def) in nm_get_symbols(llvm_nm_path, lib):
symbol = should_keep_symbol(symbol, calling_convention_decoration)		symbol = should_keep_symbol(symbol, calling_convention_decoration)
if symbol:		if symbol:
if is_def:		if is_def:
symbol_defs[symbol] = 1 + symbol_defs.setdefault(symbol,0)		symbol_defs[symbol] = 1 + symbol_defs.setdefault(symbol,0)
else:		else:
symbol_refs.add(symbol)		symbol_refs.add(symbol)
return (symbol_defs, symbol_refs)		return (symbol_defs, symbol_refs)

[17 lines not shown]	def get_template_name(sym, mangling):
# If any component is a template then return it		# If any component is a template then return it
for name, is_template in names:		for name, is_template in names:
if is_template:		if is_template:
return name		return name

# Not a template		# Not a template
return None		return None

if __name__ == '__main__':		def parse_tool_path(parser, tool, val):
tool_exes = ['dumpbin','nm','objdump','llvm-readobj']
parser = argparse.ArgumentParser(
description='Extract symbols to export from libraries')
parser.add_argument('--mangling', choices=['itanium','microsoft'],
required=True, help='expected symbol mangling scheme')
parser.add_argument('--tools', choices=tool_exes, nargs='*',
help='tools to use to extract symbols and determine the'
' target')
parser.add_argument('libs', metavar='lib', type=str, nargs='+',
help='libraries to extract symbols from')
parser.add_argument('-o', metavar='file', type=str, help='output to file')
args = parser.parse_args()

# Determine the function to use to get the list of symbols from the inputs,
# and the function to use to determine if the target is 32-bit windows.
tools = { 'dumpbin' : (dumpbin_get_symbols, dumpbin_is_32bit_windows),
'nm' : (nm_get_symbols, None),
'objdump' : (None, objdump_is_32bit_windows),
'llvm-readobj' : (readobj_get_symbols, readobj_is_32bit_windows) }
get_symbols = None
is_32bit_windows = aix_is_32bit_windows if sys.platform.startswith('aix') else None
# If we have a tools argument then use that for the list of tools to check
if args.tools:
tool_exes = args.tools
# Find a tool to use by trying each in turn until we find one that exists
# (subprocess.call will throw OSError when the program does not exist)
get_symbols = None
for exe in tool_exes:
try:		try:
# Close std streams as we don't want any output and we don't		# Close std streams as we don't want any output and we don't
# want the process to wait for something on stdin.		# want the process to wait for something on stdin.
p = subprocess.Popen([exe], stdout=subprocess.PIPE,		p = subprocess.Popen([val], stdout=subprocess.PIPE,
stderr=subprocess.PIPE,		stderr=subprocess.PIPE,
stdin=subprocess.PIPE,		stdin=subprocess.PIPE,
universal_newlines=True)		universal_newlines=True)
p.stdout.close()		p.stdout.close()
p.stderr.close()		p.stderr.close()
p.stdin.close()		p.stdin.close()
p.wait()		p.wait()
# Keep going until we have a tool to use for both get_symbols and		return val
# is_32bit_windows		except Exception:
if not get_symbols:		parser.error(f'Invalid path for {tool}')
get_symbols = tools[exe][0]
if not is_32bit_windows:		if __name__ == '__main__':
is_32bit_windows = tools[exe][1]		parser = argparse.ArgumentParser(
if get_symbols and is_32bit_windows:		description='Extract symbols to export from libraries')
break		parser.add_argument('--mangling', choices=['itanium','microsoft'],
except OSError:		required=True, help='expected symbol mangling scheme')
continue		parser.add_argument('--nm', metavar='path',
if not get_symbols:		type=lambda x: parse_tool_path(parser, 'nm', x),
print("Couldn't find a program to read symbols with", file=sys.stderr)		help='path to the llvm-nm executable')
exit(1)		parser.add_argument('--readobj', metavar='path',
if not is_32bit_windows:		type=lambda x: parse_tool_path(parser, 'readobj', x),
print("Couldn't find a program to determining the target", file=sys.stderr)		help='path to the llvm-readobj executable')
exit(1)		parser.add_argument('libs', metavar='lib', type=str, nargs='+',
		help='libraries to extract symbols from')
		parser.add_argument('-o', metavar='file', type=str, help='output to file')
		args = parser.parse_args()

# How we determine which symbols to keep and which to discard depends on		# How we determine which symbols to keep and which to discard depends on
# the mangling scheme		# the mangling scheme
if args.mangling == 'microsoft':		if args.mangling == 'microsoft':
should_keep_symbol = should_keep_microsoft_symbol		should_keep_symbol = should_keep_microsoft_symbol
else:		else:
should_keep_symbol = should_keep_itanium_symbol		should_keep_symbol = should_keep_itanium_symbol

[14 lines not shown]	for lib in args.libs:
break		break
if not any([lib.endswith(s) for s in suffixes]):		if not any([lib.endswith(s) for s in suffixes]):
print("Don't know what to do with argument "+lib, file=sys.stderr)		print("Don't know what to do with argument "+lib, file=sys.stderr)
exit(1)		exit(1)
libs.append(lib)		libs.append(lib)

# Check if calling convention decoration is used by inspecting the first		# Check if calling convention decoration is used by inspecting the first
# library in the list		# library in the list
calling_convention_decoration = is_32bit_windows(libs[0])		calling_convention_decoration = readobj_is_32bit_windows(args.readobj, libs[0])

# Extract symbols from libraries in parallel. This is a huge time saver when		# Extract symbols from libraries in parallel. This is a huge time saver when
# doing a debug build, as there are hundreds of thousands of symbols in each		# doing a debug build, as there are hundreds of thousands of symbols in each
# library.		# library.
pool = multiprocessing.Pool()		pool = multiprocessing.Pool()
try:		try:
# Only one argument can be passed to the mapping function, and we can't		# Only one argument can be passed to the mapping function, and we can't
# use a lambda or local function definition as that doesn't work on		# use a lambda or local function definition as that doesn't work on
# windows, so create a list of tuples which duplicates the arguments		# windows, so create a list of tuples which duplicates the arguments
# that are the same in all calls.		# that are the same in all calls.
vals = [(get_symbols, should_keep_symbol, calling_convention_decoration, x) for x in libs]		vals = [(args.nm, should_keep_symbol, calling_convention_decoration, x) for x in libs]
# Do an async map then wait for the result to make sure that		# Do an async map then wait for the result to make sure that
# KeyboardInterrupt gets caught correctly (see		# KeyboardInterrupt gets caught correctly (see
# http://bugs.python.org/issue8296)		# http://bugs.python.org/issue8296)
result = pool.map_async(extract_symbols, vals)		result = pool.map_async(extract_symbols, vals)
pool.close()		pool.close()
libs_symbols = result.get(3600)		libs_symbols = result.get(3600)
except KeyboardInterrupt:		except KeyboardInterrupt:
# On Ctrl-C terminate everything and exit		# On Ctrl-C terminate everything and exit
[34 lines not shown]
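
The new side of the diff validates each tool path while the command line is being parsed: the `type=` callback for `--nm` and `--readobj` tries to launch the given executable and reports a parser error if that fails. The following is a minimal standalone sketch of that pattern, not the committed code. The names `check_tool_path` and `build_parser` are hypothetical, and the sketch uses `subprocess.run` with `DEVNULL` streams and catches only `OSError`, whereas the patch uses `subprocess.Popen` with pipes and catches `Exception`.

```python
import argparse
import subprocess


def check_tool_path(parser, tool, val):
    """Return val if it can be launched as a program; otherwise exit via parser.error.

    Hypothetical helper modelled on parse_tool_path from the diff.
    """
    try:
        # Start the candidate with no arguments and discard all of its
        # streams. Its exit status is ignored: only whether the operating
        # system can launch it matters here.
        subprocess.run([val],
                       stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
    except OSError:
        # Raised when the file is missing or cannot be executed.
        # parser.error prints a usage message and exits with status 2.
        parser.error(f'Invalid path for {tool}')
    return val


def build_parser():
    """Build a reduced parser with a single validated tool option."""
    parser = argparse.ArgumentParser(
        description='Sketch: validate a tool path during argument parsing')
    # The lambda closes over the parser so the callback can report errors
    # through it, as the patch does for --nm and --readobj.
    parser.add_argument('--nm', metavar='path',
                        type=lambda x: check_tool_path(parser, 'nm', x),
                        help='path to the llvm-nm executable')
    parser.add_argument('libs', metavar='lib', type=str, nargs='+',
                        help='libraries to extract symbols from')
    return parser
```

Calling `build_parser().parse_args([...])` with a launchable program for `--nm` returns that path unchanged in `args.nm`. A path that cannot be started ends the run with the `Invalid path for nm` message before any library is inspected, so a misconfigured tool is reported up front rather than from inside a worker process.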

Revision Contents

Diff 522378
