This is an archive of the discontinued LLVM Phabricator instance.

[clang-extdef-mapping] Directly process .ast files

Authored by thieta on Jun 28 2022, 12:08 AM.



When doing CTU analysis setup you pre-compile .cpp to .ast and then
you run clang-extdef-mapping on the .cpp file as well. This is a
pretty slow process since we have to recompile the file each time.

With this patch you can now run clang-extdef-mapping directly on
the .ast file. That saves a lot of time.

I tried this on llvm/lib/AsmParser/Parser.cpp and running
extdef-mapping on the .cpp file took 5.4s on my machine.

While running it on the .ast file it took 2s.

This can save a lot of time for the setup phase of CTU analysis.

Diff Detail

Event Timeline

thieta created this revision.Jun 28 2022, 12:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 12:08 AM
Herald added a subscriber: rnkovacs. · View Herald Transcript
thieta requested review of this revision.Jun 28 2022, 12:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 12:08 AM

This still needs tests - but I wanted to get your early input on the approach here and what you all think.

thieta updated this revision to Diff 440558.Jun 28 2022, 3:51 AM

Add test. Just repurpose the same test we already have but add a step to
generate ast first and then pushing that through extdef-mapping. It should
always produce the same result.

thieta added inline comments.Jun 28 2022, 3:56 AM

Not sure if I can just create a compilerinstance here and if I should feed it anymore data then I already do.


For some reason CI.getFileManager() often returns no filemanager - but sometimes it works? I just ended up creating my own to make the path absolute.


Pretty sure 128 is wrong here - but I searched the codebase and that seems to be the most common size used? I couldn't find anything using PATH_MAX or something like that.


I tried to figure out how to "run" the consumer on a already loaded ASTUnit - but I didn't find anything in the api for that and this way works since it's more or less what will happen when the consumer is executed anyway.

thieta added inline comments.Jun 29 2022, 7:57 AM

Never mind 128 is just the default size not the absolute size. Got it.

thieta updated this revision to Diff 441279.Jun 30 2022, 12:05 AM

Clean-ups and error handling.

Thanks, this is a great addition!

  • Please use Capitalized names for all the variables (as per the LLVM coding standards suggests here).
  • Could you please update the release notes part of Clang Static Analyzer as this is a cool user facing change.

I tried this on llvm/lib/AsmParser/Parser.cpp and running extdef-mapping on the .cpp file took 5.4s on my machine. While running it on the .ast file it took 2s.

I am wondering if you could have a comparison on more files? I think a measurement run on all clang source/ast files comparing the overall run-time could be even more persuasive.


I think it is okay to create the CI like you do. However, it would be enough to create that only once, not in every HandleAST call. Perhaps, it could be static or you might pass a reference to it from main.


Is there a way to reuse the DiagOpts (and the other Diag variables) over the multiple HandleAST calls? If not then we might use smart pointers to avoid leaking.


128 looks good to me.


Could this be a simple non-pointer variable?


I'd rather use a plural from here, there might be more than one source files.


Could you please add a one-liner explanatory comment for this loop?


Could you please extend the documentation of the tool with the new behavior?

thieta updated this revision to Diff 441930.Jul 3 2022, 2:08 AM
thieta marked 6 inline comments as done.

Addressed review feedback.

thieta marked an inline comment as done.Jul 3 2022, 2:11 AM

Thanks for the review! I uploaded a new version addressing all (I think) of your feedback and added a release note.

The help text is a bit weird since it's using a commonoptionsparser we can't really change the text to add <ast0> as a specific input and we would have to create some kind of extension point in the parser or something. Not sure what you feel about this issue.

I will try to do some extensive testing - but I need to figure out a way to do all that ast building and extdef dumping in a automated way from the compile_commands database. I might run it on some internal code instead since I have a setup for that. Meanwhile let's make sure the code is in good shape.


martong accepted this revision.Jul 4 2022, 3:17 AM

Thanks for the update! LGTM (with very minor naming nits).

This revision is now accepted and ready to land.Jul 4 2022, 3:17 AM
thieta added a comment.Jul 5 2022, 1:00 AM

Ran some internal benchmarks on 1570 files (C and C++ mixed but much more C++ than C):

Running clang-extdef-mapping on the source files took: 268s
Running clang-extdef-mapping on the AST: 102s

That's quite a large speed up if you already need to generate the AST files.

I'll fix the variable names and then merge this one unless someone complains.

thieta updated this revision to Diff 442211.Jul 5 2022, 2:12 AM
thieta marked 3 inline comments as done.

Uppercase all variables

This revision was landed with ongoing or failed builds.Jul 5 2022, 4:46 AM
This revision was automatically updated to reflect the committed changes.

Sorry about my delayed response. I was busy.
I've left a couple comments inline. Nothing serious.
Thanks for the patch!


I think we need to use double back ticks for preformatted/code texts.


Why is this not initialized in the initialized-list like the rest of the members?


What takes the ownership of CI? When is it deleted?

thieta added inline comments.Jul 7 2022, 4:47 AM

Ah no reason - I think I moved this around a few times so it just happened to end up here. I will push a NFC with that.


I don't think anyone takes ownership and it's never properly deleted. I don't think we really need to since this application just runs and exits and never really keep any state. Do you see a problem with it never being deleted?

steakhal added inline comments.Jul 7 2022, 7:29 AM

Not a big deal. I just think that in general, leak suggests bad software design.
On the other hand, one should not have non-primitive global variables, such as smart-pointers. I mean, it's still bad, but let's not touch it xD