This is an archive of the discontinued LLVM Phabricator instance.

PTH-- Remove feature entirely-
ClosedPublic

Authored by erichkeane on Nov 14 2018, 2:37 PM.

Details

Summary

When debugging a boost build with a modified
version of Clang, I discovered that the PTH implementation
stores TokenKind in 8 bits. However, we currently have 368
TokenKinds.

The result is that the value gets truncated and the wrong token
gets picked up when including PTH files. It seems that this will
go wrong every time someone uses a token that uses the 9th bit.

Upon asking on IRC, it was brought up that this was a highly
experimental features that was considered a failure. I discovered
via googling that BoostBuild (mostly Boost.Math) is the only user of this
feature, using the CC1 flag directly. I believe that this can be
transferred over to normal PCH with minimal effort:
https://github.com/boostorg/build/issues/367

Based on advice on IRC and research showing that this is a nearly
completely unused feature, this patch removes it entirely.

Note: I considered leaving the build-flags in place and making them
emit an error/warning, however since I've basically identified and
warned the only user, it seemed better to just remove them.

Diff Detail

Repository
rC Clang

Event Timeline

erichkeane created this revision.Nov 14 2018, 2:37 PM
erichkeane added inline comments.Nov 14 2018, 2:41 PM
include/clang/Driver/Options.td
261

The default behavior is exactly to use this flag. I removed the flag since googling showed that it is never really used.

lib/Frontend/CompilerInstance.cpp
385

IdentifierInfoLookup is an abstraction around ASTReader and (previously) PTHManager. It seems that this abstraction could still be useful, so I've chosen to leave it in place.

When PTH wasn't enabled, PTHMgr was nullptr anyway, so this just uses that instead.

unittests/Tooling/DiagnosticsYamlTest.cpp
85 ↗(On Diff #174103)

I'm unsure about what caused this change. It seems to me that these two should be equivalent, and hopefully someone will correct me if I'm wrong.

Should likely add release notes about this.

Also might be worth sending a note to cfe-dev as a heads up and give folks some time to say "wait wait".

Should likely add release notes about this.

Also might be worth sending a note to cfe-dev as a heads up and give folks some time to say "wait wait".

+1 to both of these points, but if doesn't cause too big of a burden for folks, I'm all in favor of it.

I don't think I can comment more broadly on this, but
there is some PTH-related code in Basic/IdentifierTable,
in IdentifierInfo::getNameStart and IdentifierInfo::getLength.
Maybe this should be removed too ?

Added to the release notes. Also, an email was sent out to cfe-dev.

@riccibruno and I are separately looking into IdentifierInfo, because it seems that there are some pretty massive optimization opportunities if we can remove PTH/indirect identifiers.

Added to the release notes. Also, an email was sent out to cfe-dev.

@riccibruno and I are separately looking into IdentifierInfo, because it seems that there are some pretty massive optimization opportunities if we can remove PTH/indirect identifiers.

As discussed on IRC:
In any case this is not blocking the removal of PTH. The code around IdentifierInfo
is very performance sensitive and it will take some time to evaluate the possible
changes, if any.

Would it be better to deprecate the use of PTH right now for the current release, so that the users will be aware that we will remove PTH support in LLVM 9 once they upgrade to LLVM 8? Then we can remove it right after LLVM 8 branch is created.

Would it be better to deprecate the use of PTH right now for the current release, so that the users will be aware that we will remove PTH support in LLVM 9 once they upgrade to LLVM 8? Then we can remove it right after LLVM 8 branch is created.

I considered that. However, it is:
A- Really broken
B- Only has 1 user
C- Only accessible by a cc1 option, users aren't supposed to use these anyway.

Because of this, I/we seem to have little sympathy for the users.

I have some experience with PTH implementation, because had to fix it for Java-port of Clang (https://github.com/java-port/clank).

It was sometime ago, but making it completely workable was not hard.
As I remember the key fix was just to have PTH be EMITTED using raw-Lex mode (where real keyword-IDs are not used, so all Token Kinds nicely fit 8-bits).
It worked, because token automatically became keyword by PTHLexer::Lex:
...

   // Change the kind of this identifier to the appropriate token kind, e.g.
   // turning "for" into a keyword.
Tok.setKind(II->getTokenID());

...

We used PTH, because multiple translation units are parsed in the context of single run.
In this case preprocessing phase was upto 10x faster when token stream was deserialized from PTH (i.e. for all system headers lexed once).
Also it was helpful, because we were able to parse from concurrent threads using the same shared immutable mmaped PTH.

I thought clang-d service is using it to speed up indexing.

I have some experience with PTH implementation, because had to fix it for Java-port of Clang (https://github.com/java-port/clank).

It was sometime ago, but making it completely workable was not hard.
As I remember the key fix was just to have PTH be EMITTED using raw-Lex mode (where real keyword-IDs are not used, so all Token Kinds nicely fit 8-bits).
It worked, because on token automatically became keyword by PTHLexer::Lex:
...

   // Change the kind of this identifier to the appropriate token kind, e.g.
   // turning "for" into a keyword.
Tok.setKind(II->getTokenID());

...

We used PTH, because multiple translation units are parsed in the context of single run.
In this case preprocessing phase was upto 10x faster when token stream was deserialized from PTH (i.e. for all system headers lexed once).
Also it was helpful, because we were able to parse from concurrent threads using the same shared immutable mmaped PTH.

I thought clang-d service is using it to speed up indexing.

Presumably, I could also just make PTH use another bit or two for the TokenID and it would work fine. However, when I mentioned this on IRC the general response was that it is a 'failed experiment'. In your use case, I wonder why you couldn't just use PCH and get even further performance improvements?

I thought clang-d service is using it to speed up indexing.

I believe that clangd is using PCH and not PTH.
(by the highly sophisticated method of grepping for pth and pch inside clangd/)

I thought clang-d service is using it to speed up indexing.

Presumably, I could also just make PTH use another bit or two for the TokenID and it would work fine. However, when I mentioned this on IRC the general response was that it is a 'failed experiment'. In your use case, I wonder why you couldn't just use PCH and get even further performance improvements?

To be fair, I don't remember exactly. :-)
I remember we shared the same PTH for all C and C++ files and built some preprocessor-based features for incomplete code written in editor based on that.

Hmm... One more thing I had to fix was: lex in mode where we emit all preprocessor tokens there as well (like "#define MACRO A" => #, define, MACRO, A)

PHT was also used for rebuilding PCH, when something is changed. Because often single changed file can completely invalidate PCH, while PTH just contain pp-lexed tokens which doesn't carry semantic. So, PTHManager could skip providing tokens for changed file, but keep provide them for all other files.

I looked at how ASTReader create the IdentifierInfos returned by IdentifierInfo *ASTReader::get(StringRef Name),
and I ended up in ASTIdentifierLookupTrait::ReadData, which calls among other things IdentifierTable::getOwn.
The IdentifierInfo created by getOwn will always have the Entry pointer set and therefore I think it is
possible to remove the if (Entry) in IdentifierInfo::getLength and IdentifierInfo::getNameStart.

I made a patch implementing this change here D54866.

include/clang/Lex/Preprocessor.h
396

Would it make sense to remove this alias now that it
will always refer to CurLexer ?

unittests/Tooling/DiagnosticsYamlTest.cpp
85 ↗(On Diff #174212)

It was fixed in r346884 : [Support] Teach YAMLIO about polymorphic types
so you can remove this from your patch.

erichkeane marked 2 inline comments as done.Nov 26 2018, 6:38 AM
erichkeane added inline comments.
include/clang/Lex/Preprocessor.h
396

There seems to be some logic around this that I'm not totally sure about dealing with IsFileLexer.

riccibruno added inline comments.Nov 26 2018, 6:43 AM
include/clang/Lex/Preprocessor.h
396

Sounds fine to me. Someone more familiar with this can
always come back later and rework this.

jyknight accepted this revision.Dec 3 2018, 3:07 PM

Since you've already submitted a fix to Boost, https://github.com/boostorg/build/commit/3385fe2aa699a45e722a1013658f824b6a7c761f, I think this is fine to remove.

This revision is now accepted and ready to land.Dec 3 2018, 3:07 PM
This revision was automatically updated to reflect the committed changes.
test/Driver/pth.c