This is an archive of the discontinued LLVM Phabricator instance.

Fix dependency file escaping
ClosedPublic

Authored by probinson on Apr 22 2015, 1:32 PM.

Details

Summary

When writing a dependency (.d) file, if space or # is immediately
preceded by backslashes, escape the backslashes as well as the space
or # character.
This straddles the fence between BSD Make (which does no escaping at
all, and does not support space or # in filespecs) and GNU Make (which
does support escaping, but will fall back to the filespec as-written
if the escaping doesn't match an existing file).

Diff Detail

Event Timeline

probinson updated this revision to Diff 24255.Apr 22 2015, 1:32 PM
probinson retitled this revision from to Fix dependency file escaping.
probinson updated this object.
probinson edited the test plan for this revision. (Show Details)
probinson added reviewers: silvas, emaste.
probinson added a subscriber: Unknown Object (MLST).
silvas requested changes to this revision.Apr 22 2015, 2:31 PM
silvas edited edge metadata.
silvas added inline comments.
lib/Frontend/DependencyFile.cpp
298
This revision now requires changes to proceed.Apr 22 2015, 2:31 PM
probinson added inline comments.Apr 22 2015, 3:20 PM
lib/Frontend/DependencyFile.cpp
298

No, we're trying to produce something that GNU Make can use. What gcc produces is broken by any definition.

$ cat tspace.c
#include "foo\ bar.h"
#include "foo\#bar.h"
int i;
$ ls
foo\ bar.h foo\#bar.h tspace.c
$ gcc -c -MMD tspace.c
$ cat tspace.d
tspace.o: tspace.c foo\\\ bar.h foo\\#bar.h
$ make -f tspace.d tspace.o
make: *** No rule to make target foo\', needed by tspace.o'. Stop.
$

silvas added inline comments.Apr 22 2015, 4:15 PM
lib/Frontend/DependencyFile.cpp
298

Please file a bug with gcc.

Also, curiously that .d file seems to work for me on windows under cygwin:

Sean@Sean-PC ~
$ cat >tspace.c
#include "foo\ bar.h"
#include "foo\#bar.h"
int i;

Sean@Sean-PC ~
$ mkdir foo

Sean@Sean-PC ~
$ cd foo

Sean@Sean-PC ~/foo
$ mkdir ' bar.h'

Sean@Sean-PC ~/foo
$ mkdir '#bar.h'

Sean@Sean-PC ~/foo
$ cd ..

Sean@Sean-PC ~
$ cat >tspace.d
tspace.o: tspace.c foo\\\ bar.h foo\\#bar.h

Sean@Sean-PC ~
$ make -f tspace.d
cc    -c -o tspace.o tspace.c
make: cc: Command not found
<builtin>: recipe for target 'tspace.o' failed
make: *** [tspace.o] Error 127

Also, clang's current output seems to work fine on cygwin. In what cases is clang's output currently breaking? Is it just the case of filenames containing backslashes followed by space/hash on unix?

The set of strings that can be correctly escaped in make is inherently limited, so we are inherently compromising on what file names can be properly represented. Does it make sense to extend the set of properly representable filenames to files containing backslashes in their name?

E.g. include "foo\ bar.h". On unix we would be instructing make to look for a file 'foo\ bar.h' in the current directory but on windows in my testing above we would be instructing make to look for a file ' bar.h' in directory 'foo'. Any project for which that would actually be correct behavior is insane. I think that realistically, backslashes are only going to end up in includes on windows where they will be intended as directory separators, and will result in compilation errors on other platforms. I believe that clang's current output actually works correctly in that case.

We might be overthinking this. What case are we fixing? And does it make sense to fix that case?

probinson added inline comments.Apr 22 2015, 6:51 PM
lib/Frontend/DependencyFile.cpp
298

Cygwin is not doing the right thing for me. Using Clang's depfile:
{code}
probinson@PROBINSON-T3610 ~
$ /cygdrive/e/upstream/build/debug/bin/clang -c -MMD tspace.c

probinson@PROBINSON-T3610 ~
$ cat tspace.d
tspace.o: tspace.c foo\\ bar.h foo\\#bar.h

probinson@PROBINSON-T3610 ~
$ touch 'foo\#bar.h'

probinson@PROBINSON-T3610 ~
$ make -f tspace.d tspace.o
make: *** No rule to make target 'bar.h', needed by 'tspace.o'. Stop.
{code}
Escaping the space is no good unless you escape the preceding backslash.

With gcc's depfile:
{code}
probinson@PROBINSON-T3610 ~
$ cat tspace.d
tspace.o: tspace.c foo\\\ bar.h foo\\#bar.h

probinson@PROBINSON-T3610 ~/foo
$ touch 'foo/#bar.h'

probinson@PROBINSON-T3610 ~
$ make -f tspace.d tspace.o
make: 'tspace.o' is up to date.
{code}
I suspect it's looking at directory foo, and treating #bar.h as a comment.

But if I add the extra backslash manually, it works correctly.
{code}
probinson@PROBINSON-T3610 ~
$ cat tspace.d
tspace.o: tspace.c foo\\\ bar.h foo\\\#bar.h

probinson@PROBINSON-T3610 ~
$ touch 'foo/#bar.h'

probinson@PROBINSON-T3610 ~
$ make -f tspace.d tspace.o
cc -c -o tspace.o tspace.c
{code}

So, what am I fixing? If we're going to do anything special with space and hash, then we should do a special thing that causes them to work properly with a tool that knows how to interpret the special thing. The only tool I have handy that is willing to do a special thing with space and hash is GNU Make (on Linux or Windows). The special thing to do is backslash-quote those characters. But that doesn't work in GNU Make unless you also backslash-quote the preceding backslashes; and that characteristic is the same for quoting either space or hash.

Therefore, what clang produces won't work at all, and what gcc produces will work only for space and not for hash. My fix makes Clang's output work for both space and hash. Provably.

I freely admit--always have, I think--that this is a rather obscure corner case that just isn't going to happen in practice. But it's provably wrong, and my fix provably corrects it. This is a problem?

Alternatively, we can just abandon any special processing for space and hash, and say Don't Do That. But leaving this thing in its current provably wrong state just leaves me professionally offended.

probinson added inline comments.Apr 23 2015, 12:48 PM
lib/Frontend/DependencyFile.cpp
298

And why I want to fix this... aside from the professionally-offended part... :-)

We are looking at modifying the PS4 toolchain's builder to move away from double-quotes, to doing things the GNU-Make style way. But I don't want to require our builder to correctly interpret broken depfiles; Clang should be producing correctly formed depfiles, and our builder should be able to depend on depfiles being correctly formed.

Please file a bug with gcc.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65852
I traded comments on this with somebody yesterday; not actually Confirmed, but he also didn't reject it.

probinson updated this revision to Diff 24568.Apr 28 2015, 12:11 PM
probinson edited edge metadata.

Rebase on top of the other depfile change.

Ping. Sean, I filed a gcc bug as noted previously, which two people have commented on and admitted it looks funny.

silvas added a comment.May 8 2015, 2:37 PM

Please add a lengthy comment describing exactly how to reproduce the case where GCC's output is "broken" and our deviation from its behavior results in things not being broken. Also, in the test file, please isolate all testing of this behavioral deviation and identify it as such. Other than that, this LGTM.

thakis added a subscriber: thakis.May 8 2015, 3:09 PM

Does gcc intend to fix this soon? Isn't being compatible with gcc important
than the other things?

Does gcc intend to fix this soon? Isn't being compatible with gcc important
than the other things?

If gcc emitted an incorrect relocation, would you argue that it's important to be compatible with gcc? Even if you could not point to any linker that handled that buggy relocation in a reasonable way?

probinson updated this revision to Diff 25392.May 8 2015, 4:51 PM
probinson edited edge metadata.

Update the test to isolate the not-like-gcc behavior, and incorporate a Lengthy Comment on the function implementing this behavior as requested by reviewer.

If a program (say, ninja) tries to be compatible with gnu make's depfile parsing, it would previously convert "\ " to a space from what I understand. Now it's going to get "\\\ " and think that that's "\ ".

You're mixing things up. #include "\ " would be converted by gcc to "\\ " (because it escapes the space but not the backslash) which would be de-escaped by GNU Make as "\" followed by a space delimiter.
Now Clang will give it "\\\ " which will be handled as "\ " which is correct.
(Remember that the string you hand to #include is NOT a normal C string; it has no escaping in it.)
--paulr

From: cfe-commits-bounces@cs.uiuc.edu [mailto:cfe-commits-bounces@cs.uiuc.edu] On Behalf Of Nico Weber
Sent: Friday, May 08, 2015 4:36 PM
To: reviews+D9208+public+0ac91376ffbd3e34@reviews.llvm.org
Cc: cfe-commits@cs.uiuc.edu
Subject: Re: [PATCH] Fix dependency file escaping

This revision was automatically updated to reflect the committed changes.

Paul, can you go ahead and commit the non-controversial part of this patch? It sounds like we're getting really held up on the escaping of # when the other part of this patch is pure win.

Looks like we timed-out simultaneously... if people really insist that the # part is a problem, I can undo that as a post-commit fixup.