This is an archive of the discontinued LLVM Phabricator instance.

[lit] Fix tool substitution when paths or arguments contain spaces
AbandonedPublic

Authored by zturner on Nov 28 2018, 1:35 PM.

Details

Summary

If you try to substitute a path or argument with spaces in it, this would lead to an invalid substitution. This should fix it.

Diff Detail

Event Timeline

zturner created this revision.Nov 28 2018, 1:35 PM
stella.stamenova accepted this revision.Nov 28 2018, 1:48 PM

I tested this with your other change and the tests ran successfully. Some of the test code does use single quotes on Windows and double quotes on Linux, though, so it may be worth checking for both regardless of platform.

This revision is now accepted and ready to land.Nov 28 2018, 1:48 PM
rnk added inline comments.Nov 28 2018, 3:57 PM
llvm/utils/lit/lit/llvm/subst.py
134–139

This is in LLVM, so let's just assume we're dealing with the lit internal shell or bash. In that case, we can use ' everywhere and the escaping is easy:

return "\'" + x.replace(r"'", r"'\''") + "\'"

Then you don't need a big comment explaining why we only care about spaces etc.

zturner abandoned this revision.EditedNov 29 2018, 11:41 AM

So unfortunately I think this isn't going to work, and I don't think there's a good way to fix it. This is just one example, but it illustrates the general problem that comes up in several other places as well.

Somewhere, for clang, we have this:

ToolSubst('%clang_analyze_cc1', command='%clang_cc1', extra_args=['-analyze', '%analyze']+additional_flags),
ToolSubst('%clang_cc1', command=self.config.clang, extra_args=['-cc1', '-internal-isystem', builtin_include_dir, '-nostdsysteminc']+additional_flags),

The order here matters. Since we apply substitutions in forward list order, if we see %clang_analyze_cc1 in a test file, we apply the first substitution, and since now everything is quoted, it is replaced with this:

'%clang_cc1' '-analyze' '%analyze`

Now we apply substitutions again, and we end up with this:

''path/to/clang' '-cc1' '-internal-system' 'some/dir' '-nostdsysteminc''

So now this entire string is quoted twice. By the time it goes to actually run the line, you have something like this:

"d:\src\llvmbuild\cl\debug\x64\bin\clang.EXE -cc1 -internal-isystem d:\src\llvmbuild\cl\debug\x64\lib\clang\8.0.0\include -nostdsysteminc" "-analyze" "-analyzer-constraints=range" "-triple" "x86_64-unknown-linux-gnu" "-analyzer-checker=core" "-verify" "D:\src\llvm-mono\clang\test\Analysis\z3\apsint.c"

Note here that d:\src\llvmbuild\cl\debug\x64\bin\clang.EXE -cc1 -internal-isystem d:\src\llvmbuild\cl\debug\x64\lib\clang\8.0.0\include -nostdsysteminc will be treated as a single executable name, which obviously is incorrect.

I don't really think there's a good way to handle this. We could do something like force the user to write:

ToolSubst('%clang_analyze_cc1', command=NestedSubstitution('%clang_cc1'), extra_args=['-analyze', NestedSubstitution('%analyze')]+additional_flags)

but that seems to just be adding more complexity. So I'll probably just fix this at the call site.