This is an archive of the discontinued LLVM Phabricator instance.

Use filename in linemarker when compiling preprocessed source
ClosedPublic

Authored by twoh on Jan 16 2017, 11:54 PM.

Details

Summary

Clang appears to always use name as specified on the command
line, whereas gcc uses the name as specified in the linemarker at the
first line when compiling a preprocessed source. This results mismatch
between two compilers in FILE symbol table entry. This patch makes clang
to resemble gcc's behavior in finding the original source file name and
use it as an input file name.

Even with this patch, values of FILE symbol table entry may still be
different because clang uses dirname+basename for the entry whlie gcc
uses basename only. I'll write a patch for that once this patch is
committed.

Diff Detail

Repository
rL LLVM

Event Timeline

twoh created this revision.Jan 16 2017, 11:54 PM

Hey @twoh! Can you say a little more about the larger picture here? What part of our output is different from gcc's and what does that affect? We already have code that processes linemarkers (e.g. https://reviews.llvm.org/diffusion/L/browse/cfe/trunk/lib/Lex/PPDirectives.cpp;292239$1296) - can you use that to get the right output?

lib/Frontend/FrontendAction.cpp
390 ↗(On Diff #84637)

This seems fragile. Does the directive specifying the source file name only apply to preprocessed files, or can we remove the if statement here and always look for the marker? If we must have a special case for preprocessed files, can we add a method to FrontendInputFile to check for preprocessed files? Also see my other comment about the existing processing of linemarkers, which also processes markers not at the beginning of the file.

twoh added a comment.Jan 17 2017, 11:18 AM

Hello @inglorion, Thanks for the comment! Following is the repro for the issue we have:

$ touch test.c
$ gcc -E -o test.i test.c
$ gcc -c -o test.o test.i
$ readelf -a test.o | grep test
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.c
$ clang test.c -E -o test.i
$ clang -c -o test.o test.i
$ readelf -a test.o | grep test
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.i
$ clang -c -o test.o test.c
$ readelf -a test.o | grep test
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.c

As you can see, FILE symbol is different for gcc and clang if preprocessed file is given as an input.

I'm trying to resemble the gcc behavior of https://github.com/gcc-mirror/gcc/blob/master/libcpp/init.c#L641-L647. It is only for preprocessed file, and only for the first line. I was thinking of using existing code for handling linemarkers as well, but decided not to do it because it may slowdown the general cases.

I like your idea of having a method to check if it is preprocessed input. I'll update it shortly. Thanks!

twoh updated this revision to Diff 84711.Jan 17 2017, 11:31 AM

Addressing comments from @inglorion

ddunbar resigned from this revision.Jan 17 2017, 11:35 AM

I added a couple of comments about the test. I still want to take a look at if we can get the file name using the existing code. It may take me a few days before I can get to that, so I'm sending the comments on the test in the meantime.

test/Frontend/preprocessed-input.c
2 ↗(On Diff #84711)

We don't put the input filename in the object file for all targets. To make the test portable, you can add "-triple i686-linux-gnu" to the arguments you're passing to clang.

3 ↗(On Diff #84711)

I don't think readelf is necessarily available on all systems. Can you use llvm-objdump -t instead?

twoh updated this revision to Diff 84887.Jan 18 2017, 2:08 PM

Addressing comments about the test from @inglorion. Thanks!

twoh updated this revision to Diff 84889.Jan 18 2017, 2:12 PM

Something was wrong with the previous update.

I've had some more time to look at this, and I think your approach is basically good. My idea of reading the source location after we've done some parsing would be a lot more work to implement: the string we care about is currently passed as a StringRef to the functions BeginSourceFile calls; to implement what I was thinking, we would need some object we could query for the name. Let's punt on that. With that said, I have a couple of inline comments. Once you've addressed those, I'm happy with the diff.

lib/Frontend/FrontendAction.cpp
192 ↗(On Diff #84889)

Since you're relying on InputFile not being null, please make it a reference instead of a pointer.

204 ↗(On Diff #84889)

Two comments here. First, the code reads tokens until it reaches the next line, but we really only care about the first 3 tokens, and only if they are hash, numeric literal, and string literal, in that order. Secondly, there is a handy class StringLiteralParser which will decode strings in source code, which is probably more correct than just stripping quotes (and also what the other linemarker processing code uses). How about:

This will read tokens until we reach the next line, but we really only care about the first three, and only if the first two are a hash and a numeric literal. How about:

Token T;
if (RawLexer->LexFromRawLexer(T) || T.getKind() != tok::hash)
  return false;
if (RawLexer->LexFromRawLexer(T) || T.isAtStartOfLine() ||
    T.getKind() != tok::numeric_constant)
  return false;
if (RawLexer->LexFromRawLexer(T) || T.isAtStartOfLine() ||
    T.getKind() != tok::string_literal)
  return false;

StringLiteralParser Literal(T, CI.getPreprocessor());
InputFile = Literal.GetString().str();
return true;

You will need to #include "clang/Lex/LiteralSupport.h"

389 ↗(On Diff #84889)

The lifetime of this string doesn't match the lifetime of the string that normally backs the filename (which is stored in Input). I don't think anything currently depends on the StringRef being valid after BeginSourceFile returns, but it may be worth adding a comment to that effect (or extending the lifetime).

twoh added inline comments.Jan 19 2017, 4:00 PM
lib/Frontend/FrontendAction.cpp
192 ↗(On Diff #84889)

I made it a pointer to make it clear that InputFile can be modified inside this function. Does LLVM have a guideline for this? I couldn't find one from http://llvm.org/docs/CodingStandards.html.

twoh updated this revision to Diff 85059.Jan 19 2017, 4:39 PM

Addressing comments from @inglorion. Thanks!

This revision is now accepted and ready to land.Jan 20 2017, 8:41 AM
twoh added a comment.Jan 23 2017, 1:42 PM

Hello @inglorion, I'm experiencing an issue with landing this patch with my userid now. If you have a commit privilege, can you please land it for me?

This revision was automatically updated to reflect the committed changes.