This is an archive of the discontinued LLVM Phabricator instance.

Fix USR generation in the presence of #line directives or linemarkes
Needs ReviewPublic

Authored by mikhail.ramalho on Feb 6 2018, 7:29 AM.

Details

Summary

Small change on how the USRGen code prints the location.

The patch fixes an issue when there are #line directives or linemarkes in the file, e.g.:

#line 3
int Func(int arg);

#line 10 "file.h"
int Func(int arg1);

# 1 "file1.c" 1
int Func(int arg);

These testes were added to test/Index/usrs.cpp.

I changed the printLoc function to, instead of adding the declaration offset to the name, it gets the presumedLoc (which handles line directives or linemarkes), and adds the line and column to the name, in the format <line>:<column>.

The patch only changes the printLoc function in lib/index/USRGeneration.cpp; the other changes are test updates.

Diff Detail

Event Timeline

mikhail.ramalho created this revision.Feb 6 2018, 7:29 AM
pmatos added a subscriber: pmatos.Feb 21 2018, 6:50 AM

Could you elaborate on why the old behaviour is broken?

Sure. Basically, the previous code would not generate the USR for the function's parameters.

The issue was that SM.getFileEntryForID would return NULL because there is no actual file, that's why I changed to get the presumedLoc and build the name using the column/line.

I know that using column/line not the preferable method to generate USR but I couldn't find a way to generate the offset of a presumed location.

This comment was removed by ilya-biryukov.

Sorry for the delay.

Sure. Basically, the previous code would not generate the USR for the function's parameters.
The issue was that SM.getFileEntryForID would return NULL because there is no actual file

Why wasn't there a file for function parameter? Function parameters *are* declared in some file, or am I missing something?

More general question is: how do we want USRs for function parameters to work, specifically should USR of the same param of different declarations be the same or different?

Why wasn't there a file for function parameter? Function parameters *are*
declared in some file, or am I missing something?

They are declared in some file defined by the line markers; the file are
not registered in the SourceManager as actual files, so getting the
FileEntry will always fail, that's why I changed it to get the PresumedLoc.

More general question is: how do we want USRs for function parameters to

work, specifically should USR of the same param of different declarations
be the same or different?

That's a good point, this patch will generated different names for the same
function param if a function is first defined then declared somewhere else.

I guess it should follow the USR generation pattern for FunctionDecls, what
do you think?

They are declared in some file defined by the line markers; the file are
not registered in the SourceManager as actual files, so getting the
FileEntry will always fail, that's why I changed it to get the PresumedLoc.

That's the part I'm confused about.
Does any of the examples in the current patch have this case (files are not registered in the source manager, but defined by the file markers)?
I assume all examples in the current patch will produce USRs even without your changes, is this correct or am I missing something?

More general question is: how do we want USRs for function parameters to

work, specifically should USR of the same param of different declarations
be the same or different?

That's a good point, this patch will generated different names for the same
function param if a function is first defined then declared somewhere else.

I guess it should follow the USR generation pattern for FunctionDecls, what
do you think?

I guess it depends on the use-case, but USRs for function params do not seem to provide much value if they aren't equal across different decls of the same function.

But I'm not sure whether they were designed with this use-case in mind or not.
E.g. if they are equal, we can two Decls with the same USR, but different names:

int func(int param1);
int func(int param2); 
// param1 and param2 could both have the same USR, but different names. That might (or might not) be surprising.
ilya-biryukov added a comment.EditedApr 25 2018, 2:00 AM

I assume all examples in the current patch will produce USRs even without your changes, is this correct or am I missing something?

Or is it the case that whenever there's a #line directive we get into a "virtual" file that's not registered in the SourceManager?

Hi,

Or is the that whenever there's a #line directive we get into a
"virtual" file that's not registered in the SourceManager?

The virtual file is actually registered in the SourceManager but the
FileEntry for it is NULL (USRGeneration.cpp:33), which forces printLoc to
return true (USRGeneration.cpp:38) and the USR is not generated.

I believe the USR gen for params should have follow the functionDecl
convention. I'm reworking the patch now.

int func(int param1);
int func(int param2);
// param1 and param2 could both have the same USR, but different names.
That might (or might not) be surprising.

I agree here, they should have the same USR.

The virtual file is actually registered in the SourceManager but the
FileEntry for it is NULL (USRGeneration.cpp:33), which forces printLoc to
return true (USRGeneration.cpp:38) and the USR is not generated.

I still don't get why the file is virtual. Looking at the code in SourceManager, presumed location uses FileEntry of passed location (actually, its expansion location, but that shouldn't matter) and then translates line numbers according to the #line directives in the file.
So the question is: why is FileEntry null for original location, but not null for PresumedLoc?

Sorry for confusion, if any, I just want to understand to make sure we're looking at the right place to solve your original problem.

int func(int param1);
int func(int param2);
// param1 and param2 could both have the same USR, but different names.
That might (or might not) be surprising.

I agree here, they should have the same USR.

Let's get more opinions on this, I'm not 100% certain about it myself :-)

@arphaman, @bkramer, @hokein, @ioeric, @sammccall, should parameters of different declarations for the same function overload have the same or different USRs? WDYT?

Re: locations in parameter USRs:

OK, so reading through the code, it looks like locations are included in USRs:

  • for macros (except from system headers)
  • for decls that aren't externally visible (static etc, function parameters, locals)
  • an objective-c class extension case I don't really understand

After thinking about this a bit, and use cases like rename and find-declaration that could be USR based, I think including some location information is the right way to go, which I think is the current behavior.

(I think for the patch itself we're waiting on details about the case that doesn't currently work?)

Hi,

After thinking about this a bit, and use cases like rename and
find-declaration that could be USR based, I think including some location
information is the right way to go, which I think is the current behavior.

What do you man by location information? Only the filename or filename +
offset (current behaviour)?

Hi,

After thinking about this a bit, and use cases like rename and
find-declaration that could be USR based, I think including some location
information is the right way to go, which I think is the current behavior.

What do you man by location information? Only the filename or filename +
offset (current behaviour)?

Filename + offset for things like function parameters, where we have to identify the particular function declaration they're part of.
For static functions themselveds, just the filename. I think this is current behavior in both cases.

Hi,

Filename + offset for things like function parameters, where we have to
identify the particular function declaration they're part of.
For static functions themselveds, just the filename. I think this is
current behavior in both cases.

But then we're back to the initial question, what to do when there are
linemarkers that change the filename/line number?

Should we ignore linemarkers and use filename + offset of the real file?
Should we try to calculate the offset of the decl in the virtual file?

Thank you,

Should we ignore linemarkers and use filename + offset of the real file?

I would say "yes". Let's not rely on linemarkers, unless we can explain why that's a good idea.

Should we try to calculate the offset of the decl in the virtual file?

Where do virtual files come from in the first place?

mikhail.ramalho added a comment.EditedMay 2 2018, 7:18 AM

Hi,

I would say "yes". Let's not rely on linemarkers, unless we can explain
why that's a good idea.

Sounds reasonable to me, specially considering cases like rename and
find-declaration.

Where do virtual files come from in the first place?

From the linemarker:
https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html

For instance:

$ cat foo.c

int f(int a);

# 1 "file1.c" 1
int g(int b);

clang generates:

|-FunctionDecl 0x6866ff0 <foo.c:2:1, col:12> col:5 f 'int (int)'
| `-ParmVarDecl 0x6866f30 <col:7, col:11> col:11 a 'int'
`-FunctionDecl 0x6867138 <file1.c:1:1, col:12> col:5 g 'int (int)'
  `-ParmVarDecl 0x68670b0 <col:7, col:11> col:11 b 'int'

Note that the location of f and g are different, despite being in the same
file.

The preprocessor inserts linemarkers by default:

$ clang foo.c -E
# 1 "foo.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 349 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "foo.c" 2

int f(int a);


# 1 "file1.c" 1
int g(int b);

unless you call it with -P:

$ clang foo.c -E -P

int f(int a);


int g(int b);

Hi,

Where do virtual files come from in the first place?

From the linemarker:

I tried dumping locations in presence of line markers.
They have non-null FileEntry and a reasonable offset, so the original code should work just fine in presence of line markers.
I don't see why changing to presumed locations fixes the issue with not generating the USRs.

Could you provide a minimal example where USRs are not generated? It might be the case that there are other ways to fix it.

Could you provide a minimal example where USRs are not generated? It might
be the case that there are other ways to fix it.

Sure, I'll try to reduce our testcase, but basically we have an
ASTFrontendAction [0] that adds a set of intrinsics [1] to the preprocessor
[2].

[0]
https://github.com/esbmc/esbmc/blob/master/src/clang-c-frontend/AST/esbmc_action.h
[1]
https://github.com/esbmc/esbmc/blob/master/src/clang-c-frontend/clang_c_language.cpp#L206
[2]
https://github.com/esbmc/esbmc/blob/master/src/clang-c-frontend/AST/esbmc_action.h#L31