This is an archive of the discontinued LLVM Phabricator instance.

Replace gen_dynamic_list.py with a portable shell script
AbandonedPublic

Authored by krytarowski on Dec 31 2018, 2:00 PM.

Details

Reviewers
mgorny
vitalybuka
Summary

Reimplement the Python script with a shell one, as
Python is not available in all environments.

This is especially needed in the integration process
of compiler-rt sanitizers into the NetBSD distribution,
where Python is absent and must not be used as a tool.

New script is also designed to be applicable for
cross-building to a different CPU architecture.
It checks the target CPU with file(1) rather than
relying on host's uname value.

No functional change for existing users. The shell
script generates exactly the same result as the
original Python file for all the .syms files on NetBSD.

Diff Detail

Repository
rL LLVM

Event Timeline

krytarowski created this revision.Dec 31 2018, 2:00 PM

Also, note that technically all commands in shell may fail, and especially commands operating on external files, so you may consider adding some more error handling. You may get very weird results e.g. if you ran out of space in the middle of the script.

lib/sanitizer_common/scripts/gen_dynamic_list.sh
19

You still have .py here.

33

You can fold the command inside, i.e. if ! $my_nm ...; then

34

Why are you using printf when the same result could be achieved by simpler echo?

42

By convention, global variables in shell script should be uppercase.

45

You probably want to respect $TMPDIR here. Also, $(...) is strongly preferred over backticks which are unreadable and non-nesting friendly. Quote the filenames, in case TMPDIR contained spaces.

And, well, speaking of portability mktemp(1) is not POSIX.

48

Quote the file paths inside too.

50

Why not

: "${NM:=nm}"

?

61

Don't use the x hack, it is really, really obsolete and even autotools discourage that by now.

63

echo

Also, don't you want to exit 1 or like on error?

177

I think the standard awk way would be to use ~ regex match here.

199

Quote file paths here and below.

Why are you passing $tmpfile2 twice?

krytarowski marked 8 inline comments as done.Dec 31 2018, 9:32 PM

Also, note that technically all commands in shell may fail, and especially commands operating on external files, so you may consider adding some more error handling. You may get very weird results e.g. if you ran out of space in the middle of the script.

Is it fine to go here for set -e?

lib/sanitizer_common/scripts/gen_dynamic_list.sh
33

Right, but it's easier for debugging this code with unfolded commands.

34

No reason, I can switch to echo.

42

Style in LLVM? It depends whom you ask. Uppercase symbols are for global variables pulled from environment. If this is the style of LLVM, I will align to it.

45

Right with $TMPDIR.

Is there a better replacement for mktemp(1)? It's available in all modern BSDs, Darwin, GNU and commercial Unices (Solaris, HP/UX). The API behind the command is POSIX though (mktemp(3)).

And if I would need to invent mktemp(1) with a handwritten random number generator, it's tricky to get such number as well.

backticks vs $() is a matter of taste here, if LLVM recommends one or the other, I will switch it (or keep it).

61

It's not a hack, but portability idiom. There might be issues with evaluating empty strings "" or using -z.

63

Right, missing exit.

177

For some reason ^_interceptor_ didn't work as expected and there is need to skip namespaced (and mangled) symbols. We always know the length/position here so it's a straightforward solution.

199

There are two passes NR==FNR { to catch the first one and the other branch for the 2nd one.

Do we support building in paths with spaces in names of directories? It doesn't cost much to add quotes so I will add them.

krytarowski marked an inline comment as done.Dec 31 2018, 9:37 PM

Does this work and produce exactly the same result on Linux (.syms content) with the original .py script and the .sh version?

lib/sanitizer_common/scripts/gen_dynamic_list.sh
50

:= would be needed if nm could be expanded to something, we probably don't need it property.

krytarowski marked an inline comment as done.Dec 31 2018, 9:48 PM
krytarowski added inline comments.
lib/sanitizer_common/scripts/gen_dynamic_list.sh
19

It probably overly closely emulated the original version.. I will change it.

Shell is not available on Windows, python however is considered a portable shell within the project. There are a number of tools within the LLVM project that require python, I'm not sure that removing the dependency is the right thing to do here.

Alternatively, I think this could be implemented in CMake script.

Shell is not available on Windows, python however is considered a portable shell within the project. There are a number of tools within the LLVM project that require python, I'm not sure that removing the dependency is the right thing to do here.

It's not portable enough to be kept as a build dependency (it's fine for tests and auxiliary infrastructure). I need to replace it with something else.

Python is not a base of any BSD and every project integrating gen_dynamic_list will need to reinvent/replace this Python script. It's the current case with NetBSD.

As an alternative to shell, I can write it fully with awk(1) (+ nm(1), file(1)) but it will probably face the same concern as sh(1).

Alternatively, I think this could be implemented in CMake script.

In the end I'm replacing CMake with BSD Makefiles too. I'm aware that CMake isn't used by at least RTEMS too.
Not sure if I can express this script with CMake rules (at least few years ago it used to be at least impracticably difficult to script with CMake, if possible at all).

A hybrid solution of keeping this .sh file around in upstream repo is fine too.

I can write a C program that is built as a tool and generates these .syms files (and calls with system(3) (C89 call) and assume that there is available nm(1) and file(1)).

If there is no interest upstream I can keep the shell script downstream.

mgorny added a comment.Jan 1 2019, 7:26 AM

Hmm, a C++ program using LLVMSupport, alike tablegen, would probably work for everyone, wouldn't it?

krytarowski abandoned this revision.Jan 1 2019, 8:16 AM

Using LLVM libraries is fine, a little bit of dependency as we need to get LLVM libraries/headers to craft tools.

We have currently a similar issue with tablegen used as a tool.

I will keep using downstream this .sh for now. If someone intends to pick this .sh (and even integrate with LLVM) feel free to do so.

lib/sanitizer_common/sanitizer_common_interceptors.inc