This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Store a SourceRange for multi-token builtin types
ClosedPublic

Authored by malcolm.parsons on Oct 7 2016, 4:39 AM.

Details

Summary

clang-tidy's modernize-use-auto check uses the SourceRange of a
TypeLoc when replacing the type with auto.
This was producing the wrong result for multi-token builtin types
like long long:

-long long *ll = new long long();
+auto long *ll = new long long();

Diff Detail

Repository
rL LLVM

Event Timeline

malcolm.parsons retitled this revision from to Store a SourceRange for multi-token builtin types.
malcolm.parsons updated this object.
malcolm.parsons added a reviewer: Prazek.
malcolm.parsons added a subscriber: cfe-commits.
Prazek edited edge metadata.Oct 8 2016, 8:30 AM

Thanks for the patch! I think some unit test should be added.

Do you also handle cases like

unsigned long long
unsigned volatile long const long static

etc.
The problem here is that the whole type like "unsigned long long" could be in other tokens.
I talked with Richard Smith while ago and one of the solutions proposed was to have "isValid()" for source range
returning false for cases like this

unsigned const long
include/clang/AST/TypeLoc.h
529 ↗(On Diff #73915)

Can you add a comment here? It might be because I don't know the AST API, but it is not clear for me what does it do.

lib/Sema/DeclSpec.cpp
621 ↗(On Diff #73915)

What about cases like

unsigned int
unsigned long

This is not only about long.

Thanks for the patch! I think some unit test should be added.

Are there any existing unit tests for TypeLoc that I can add to?

Do you also handle cases like

unsigned long long

Yes - see tests in D25316.

The problem here is that the whole type like "unsigned long long" could be in other tokens.
I talked with Richard Smith while ago and one of the solutions proposed was to have "isValid()" for source range
returning false for cases like this

unsigned const long

I see. I hope that's a rare case that I can ignore for now.

lib/Sema/DeclSpec.cpp
621 ↗(On Diff #73915)

for unsigned int, DeclSpec::SetTypeSpecSign() sets TSSLoc to the location of unsigned and DeclSpec::SetTypeSpecType() sets TSTLoc to the location of int so VisitBuiltinTypeLoc() already has the whole range available.

for unsigned long, DeclSpec::SetTypeSpecSign() sets TSSLoc to the location of unsigned and DeclSpec::SetTypeSpecWidth() sets TSWLoc to the location of long so VisitBuiltinTypeLoc() already has the whole range available.

The sign and type parts can't appear more than once, but the width part can.
I'm treating the final width part as being the type part.
This works for long long and long long int, but it looks like it would have issues with int long long.
Maybe TSWLoc should be replaced by a range.

malcolm.parsons edited edge metadata.

Replace TSWLoc with TSWRange

malcolm.parsons added a comment.EditedOct 8 2016, 2:19 PM

I've tested this with clang-query using match typeLoc() and a lot of permutations of int, short, long, unsigned and const.

Prazek added a comment.Oct 9 2016, 5:58 AM

Thanks for the patch! I think some unit test should be added.

Are there any existing unit tests for TypeLoc that I can add to?

Do you also handle cases like

unsigned long long

Yes - see tests in D25316.

The problem here is that the whole type like "unsigned long long" could be in other tokens.
I talked with Richard Smith while ago and one of the solutions proposed was to have "isValid()" for source range
returning false for cases like this

unsigned const long

I see. I hope that's a rare case that I can ignore for now.

I think there are some. Look at unittests/AST/ or unittests/ASTMatchers/. Sorry for being not so precise, but can't check it right now. I can check it later if you won't find it.

Prazek added a comment.Oct 9 2016, 5:59 AM

I've tested this with clang-query using match typeLoc() and a lot of permutations of int, short, long, unsigned and const.

I am pretty sure it works, but someone could change the implementation and break your check without knowing it - not everyone compile and tests clang-extra, so we should have independent test.

Ensure SourceRange is initialized

Add unit tests

aaron.ballman added inline comments.Oct 17 2016, 4:52 PM
include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

Since this doubles the size of the type loc for builtin types, do you happen to have any data on what practical impact this has on RAM usage, say for bootstrapping LLVM (or compiling any large source base, really)? Hopefully it's not a lot, but it would be nice to know if it's a .1%, 1%, 10%, etc increase in usage (or does the change get lost in the noise).

unittests/AST/SourceLocationTest.cpp
228 ↗(On Diff #74124)

Can you also add a test that the range is correct for something like long double and long double _Complex?

include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

I don't have any data.
I'm not sure how to collect that data.

unittests/AST/SourceLocationTest.cpp
228 ↗(On Diff #74124)

ComplexTypeLoc isn't implemented.

include/clang/AST/TypeLoc.h:

// FIXME: location of the '_Complex' keyword.
class ComplexTypeLoc : public InheritingConcreteTypeLoc<TypeSpecTypeLoc,
                                                        ComplexTypeLoc,
                                                        ComplexType> {
};

Add unit tests for long double.

malcolm.parsons retitled this revision from Store a SourceRange for multi-token builtin types to [Sema] Store a SourceRange for multi-token builtin types.Oct 18 2016, 4:01 AM
aaron.ballman added inline comments.Oct 18 2016, 6:09 AM
include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

It's likely platform dependent, but I was thinking something as simple as looking at peak RAM usage between two different builds of the compiler. Something like top would probably work if you're on Linux (unless someone knows of a better way, I'm not strong on Linux).

unittests/AST/SourceLocationTest.cpp
228 ↗(On Diff #74124)

Ah, neat, then don't worry about that one. :-)

include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

Before:
/usr/bin/time clang++ ... -c llvm/tools/clang/lib/AST/ExprConstant.cpp
5.56user 0.13system 0:05.91elapsed 96%CPU (0avgtext+0avgdata 256820maxresident)k

After:
/usr/bin/time clang++ ... -c llvm/tools/clang/lib/AST/ExprConstant.cpp
5.67user 0.12system 0:05.98elapsed 97%CPU (0avgtext+0avgdata 256940maxresident)k

((256940 - 256820) / 256820) * 100 = 0.05%

aaron.ballman added inline comments.Oct 19 2016, 11:16 AM
include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

Thank you for this -- is there a bigger delta for compilation of LLVM as a whole? ExprConstant.cpp is an interesting case, but not really representative of the project as a whole (for instance, there's not a lot of template metaprogramming in ExprConstant.cpp).

include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

I can try running time on a full build.
For every file I sampled the increase was 0.05%.
Do TypeLocs increase when using TMP?

aaron.ballman added inline comments.Oct 19 2016, 1:03 PM
include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

I *think* that template arguments use TypeLocs, but I could be remembering incorrectly (and I unfortunately won't have the chance to check until next week sometime). I was asking for the whole project just in case there is some construct that has a heavier use of them. Your findings of .05% are really promising though.

include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

Time reports max resident for building lib dir of llvm as 480780 before, 480920 after.
0.03%

aaron.ballman accepted this revision.Oct 20 2016, 5:52 AM
aaron.ballman edited edge metadata.

LGTM!

include/clang/AST/TypeLoc.h
513 ↗(On Diff #74124)

Awesome, thank you for getting this information; this seems like a reasonable consumption change to me.

This revision is now accepted and ready to land.Oct 20 2016, 5:52 AM
include/clang/AST/TypeLoc.h
533 ↗(On Diff #74975)

I suspect that using min and max on SourceLocations is only valid if both locations are in the same file.

Doing this
long.h:

long

unsigned.cpp:

unsigned
#include "long.h"
i;

causes
clang-query> match typeLoc()
...
clang-query: llvm/tools/clang/lib/Frontend/DiagnosticRenderer.cpp:273: clang::SourceLocation retrieveMacroLocation(clang::SourceLocation, clang::FileID, clang::FileID, const llvm::SmallVectorImpl<clang::FileID>&, bool, const clang::SourceManager*): Assertion `SM->getFileID(Loc) == MacroFileID' failed.

Is there a better way to combine SourceRanges, or should the final range be accumulated in DeclSpec.cpp?

aaron.ballman added inline comments.Oct 20 2016, 6:39 AM
include/clang/AST/TypeLoc.h
533 ↗(On Diff #74975)

Hmm, that's a good point, but I'm not aware of a better utility. Perhaps @rsmith knows of one?

include/clang/AST/TypeLoc.h
533 ↗(On Diff #74975)

I get the same assertion failure from a clang-query without my changes if I match a CompoundStmt that starts and ends in different files.
So I don't think it's my fault.
It's also a very rare case.
OK to commit?

aaron.ballman added inline comments.Oct 21 2016, 8:56 AM
include/clang/AST/TypeLoc.h
533 ↗(On Diff #74975)

Thank you for checking. I think you are okay to commit, we can correct the bug in another patch. However, if you wouldn't mind filing a bug in bugzilla so that we don't lose this information, that would be great.

This revision was automatically updated to reflect the committed changes.
include/clang/AST/TypeLoc.h
533 ↗(On Diff #74975)

PR30765.