Page MenuHomePhabricator

[LibTooling] Add retrieval of extended AST-node source to FixIt library
ClosedPublic

Authored by ymandel on Feb 22 2019, 2:32 PM.

Details

Summary

Introduces variants of getText and getSourceRange that extract the source text of an AST node potentially with a trailing token.

Some of the new functions manipulate CharSourceRanges, rather than SourceRanges, because they document and dynamically enforce their type. So, this revision also updates the corresponding existing FixIt functions to manipulate CharSourceRanges. This change is not strictly necessary, but seems like the correct choice, to keep the API self-consistent.

This revision is the first in a series intended to improve the abstractions available to users for writing source-to-source transformations. A full discussion of the end goal can be found on the cfe-dev list with subject "[RFC] Easier source-to-source transformations with clang tooling".

Diff Detail

Repository
rL LLVM

Event Timeline

ymandel created this revision.Feb 22 2019, 2:32 PM

Thanks, the APIs totally make sense. And seem to fit into the other functions we have in FixIt.h, although the name of the file is somewhat misleading.
Mostly comments about naming and one comment about the necessity of using matchers from my side.

clang/include/clang/Tooling/FixIt.h
60 ↗(On Diff #187957)

Do you have alternative names in mind? It would be nice to (1) not mention the SourceRange now that we return CharSourceRange, (2) change "auto" to something more descriptive.

Was thinking about getNodeRange() or getSpannedRange(), but that completely misses the "auto" part (maybe it's fine, though).
WDYT? Maybe other ideas?

73 ↗(On Diff #187957)

What are other tricky cases you have in mind for the future?

77 ↗(On Diff #187957)

Could you add an example of the API use here? The anticipated use-case are removing or textually replacing a node, right?

clang/lib/Tooling/FixIt.cpp
52 ↗(On Diff #187957)

Do you expect this function to be on the hot path?
If so, I'd advice against using the matchers here. They do add enough overhead to be avoided in hot functions.

I guess the problem is that we can't get a hold of the parent node with using the matchers, right?
Not sure if there's an easy way out of it in that case.

ymandel marked 4 inline comments as done.Feb 26 2019, 5:43 AM
ymandel added inline comments.
clang/include/clang/Tooling/FixIt.h
60 ↗(On Diff #187957)

I completely agree. I went through quite a few iterations on this name and disliked this one the least. ;) I think you're right, though, that once we're choosing a different name, the "auto" part doesn't really need to be in it. I like getSpannedRange better than this. Other thoughts:

getLogicalRange
getExtendedRange
getAssociatedRange

any preference?

73 ↗(On Diff #187957)

I just assumed that we'd hit more as we dig into them, but, I'm not so sure now. The two others I can think of offhand are 1) extending to include associated comments, 2) semicolons after declarations. Commas present a similar challenge (if a bit simpler) when used in a list (vs. the comma operator). Are there any other separators in C++?

At a higher level, it would be nice to align this with your work on tree transformations. That is, think of these functions as short-term shims to simulate the behavior we'll ultimately get from that new library. However, it might be premature to consider those details here.

77 ↗(On Diff #187957)

Yes, that's right. Will do (on next edit, once we've resolved naming, etc.)

clang/lib/Tooling/FixIt.cpp
52 ↗(On Diff #187957)

In the context of transformer, I expect that this will be called in proportion to the number of times that a match callback is invoked. so, I expect there to already be matcher use in the control flow.

Yes, I'm using the matchers almost entirely for the hasParent() functionality.

Note that we can change the order of the guards in lines 67-69 and first check for a trailing semi which I'd guess is cheaper than calling the matcher. In that case, matching will only happen for expressions followed by semicolons.

Alternatively, I think I could restructure the uses of this function to *provide* the parent node. In that case, callers can decide what makes the most sense performance-wise for getting the parent.

kimgr added a subscriber: kimgr.Mar 5 2019, 2:06 AM
kimgr added inline comments.
clang/include/clang/Tooling/FixIt.h
73 ↗(On Diff #187957)

Peanut gallery comment on this:

The two others I can think of offhand are

  1. extending to include associated comments,
  2. semicolons after declarations.

Commas present a similar challenge (if a bit simpler) when used in a list (vs. the comma operator).
Are there any other separators in C++?

Would it make sense to let callers choose what level of expansion they want with a flag mask? Somehow I think that makes it easier to name the function, too, e.g.:

StringRef getExpandedRange(const Stmt &S, ASTContext &Context, ExpansionFlags Flags);
ymandel marked 5 inline comments as done.Mar 5 2019, 8:17 AM
ymandel added inline comments.
clang/include/clang/Tooling/FixIt.h
73 ↗(On Diff #187957)

Yes, I think that's a good idea. I even like the name except that it will likely be confused with Expansion locs. Maybe getExtendedRange?

kimgr added inline comments.Mar 5 2019, 11:01 PM
clang/include/clang/Tooling/FixIt.h
73 ↗(On Diff #187957)

Extended sounds good to me too!

ymandel updated this revision to Diff 190037.Mar 10 2019, 7:25 PM
ymandel marked an inline comment as done.
ymandel edited the summary of this revision. (Show Details)

This update significantly simplifies the change. It removes any "smarts", instead allowing the user to request the extension of the source range to include the next specified token (if present). This turns out to be a more basic and useful function for the needs of the upcoming changes.

ymandel marked 7 inline comments as done.Mar 10 2019, 7:28 PM
ymandel added inline comments.
clang/include/clang/Tooling/FixIt.h
73 ↗(On Diff #187957)

I went with "getExtended..." but ended up drastically simplifying the function, since the "smarts" turned out to be not smart enough. Fundamentally, the caller needs to know when to look for a trailing token (and which one).

clang/lib/Tooling/FixIt.cpp
52 ↗(On Diff #187957)

Removed, so no longer relevant.

ymandel updated this revision to Diff 190039.Mar 10 2019, 7:39 PM
ymandel marked an inline comment as done.
ymandel edited the summary of this revision. (Show Details)

Remove unneeded includes.

Ah, had some comments and forgot to send them out before I went on vacation :-(

clang/include/clang/Tooling/FixIt.h
60 ↗(On Diff #187957)

getSpannedRange and getAssociatedRange would be my favorites.
Both seem to leave no room for interpretation.

73 ↗(On Diff #187957)

Would it be fair to characterize those as "the text range that you needs to be removed when removing a node"?
Trying to come up with a comment that does not rely on the intuition, since that might differ from one reader to the other.

Maybe move this comment to getSourceRangeAuto and in this function's comment simply mention that we return the text for the node covered with getSourceRangeAuto?

At a higher level, it would be nice to align this with your work on tree transformations. That is, think of these functions as short-term shims to simulate the behavior we'll ultimately get from that new library. However, it might be premature to consider those details here.

Agree, we can figure it out as soon as the syntax trees are in a good enough shape.

Ignore my previous comment, I'll take another look

ilya-biryukov accepted this revision.Mar 13 2019, 8:36 AM

LGTM.
getExtendedRange is a really good choice!

This revision is now accepted and ready to land.Mar 13 2019, 8:36 AM
ymandel retitled this revision from [LibTooling] Add "smart" retrieval of AST-node source to FixIt library to [LibTooling] Add retrieval of extended AST-node source to FixIt library.Mar 13 2019, 12:20 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2019, 12:47 PM