This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add base types for building dataflow analyses

Authored by sgatev on Nov 19 2021, 4:32 AM.



This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

Diff Detail

Event Timeline

sgatev created this revision.Nov 19 2021, 4:32 AM
sgatev requested review of this revision.Nov 19 2021, 4:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 19 2021, 4:32 AM
sgatev updated this revision to Diff 388465.Nov 19 2021, 4:42 AM

Fix ifndefs.

sgatev updated this revision to Diff 388469.Nov 19 2021, 4:56 AM

Use triple slash at the start of declaration comments.

ymandel added inline comments.Nov 19 2021, 6:21 AM
21–22 ↗(On Diff #388469)

It's more common in .cpp files to use using declarations instead:

using clang;
using dataflow;
sgatev updated this revision to Diff 388490.Nov 19 2021, 6:35 AM

Add using namespace declarations in the cpp file.

sgatev marked an inline comment as done.Nov 19 2021, 6:38 AM
xazax.hun added inline comments.Nov 19 2021, 11:45 AM

Does the Dynamic in the suffix refer to the fact this is using type erasure as opposed to templates?

I have multiple questions at this point:

  • Why do we prefer type erasure over generic programming?
  • Do you plan to have non-dynamic counterparts?

Nit: having Dynamic both in the class name and in the method names sounds overly verbose to me.
Nit: please add a comment what dynamic refers to in the name,


Do we need a ctor? Or is this a workaround for aggregate initialization not working well with certain library facilities?

26 ↗(On Diff #388490)

Is there a way to query how the CFG was built? Adding an assert to see if setAlwaysAdd was set might be useful.

sgatev updated this revision to Diff 389166.Nov 23 2021, 4:31 AM

Remove unnecessary constructor.

sgatev updated this revision to Diff 389189.Nov 23 2021, 6:31 AM
sgatev marked an inline comment as done.

Add a note about asserting the requirements of the CFG object.

sgatev updated this revision to Diff 389522.Nov 24 2021, 9:23 AM
sgatev marked an inline comment as done.

Document the role of the "Dynamic" suffix in the name of DataflowAnalysisDynamic and its members.

sgatev updated this revision to Diff 389773.Nov 25 2021, 6:52 AM

Put typed and type-erased interfaces in separate files.

sgatev updated this revision to Diff 389774.Nov 25 2021, 7:00 AM

Rename Environment.h to DataflowEnvironment.h.

sgatev added inline comments.Nov 25 2021, 8:13 AM

Right. The "Dynamic" suffix emphasizes the type-erased nature of the class and its members and is used to differentiate them from their typed counterparts in DataflowAnalysis. I added this to the documentation of the type. I also split the typed and type-erased interfaces in separate files. Users of the framework shouldn't need to interact with types and functions from DataflowAnalysisDynamic.h.

The names are verbose, but should be used in relatively few internal places in the framework. Currently, we have one call site for each of the methods of DataflowAnalysisDynamic. Nevertheless, if you have ideas for better names I'd be happy to change them.

The reason we went with a type-erased layer is to avoid pulling a lot of code in the templates. Over time the implementation got cleaner and as I'm preparing these patches I see some opportunities to simplify it further. However, it's still non-trivial amount of code. I think we should revisit this decision at some point and consider having entirely template-based implementation of the algorithm. I personally don't see clear benefits of one approach over the other at this point. If you have strong preference for using templates, we can consider going down that route now.


We don't really need it at this point. Removed.

26 ↗(On Diff #388490)

Good point. I believe there isn't a way to do that currently, but this is something we should consider implementing. I added a FIXME.

xazax.hun accepted this revision.Nov 25 2021, 3:21 PM

Thanks, it looks good to me. Most of my comments are just brainstorming, exploring alternative ideas. Feel free to ignore some/all of them.


Thanks for the explanation! I don't have a strong preference, we could stick with the type-erased version unless we see some reason to change in the future. However, I don't see much value in the "Dynamic" suffix for the method names. What do you think about simply dropping them?


If I understand this correctly, this could derive from DataflowAnalysisStateDynamic, it could just provide a getter function that casts the type erased lattice element to LatticeT, returning a reference to the contents of the any object. As a result, you would no longer need to do move/copy in runDataflowAnalysis. On the other hand, the user would need to call a getter to get out the lattice element. I guess we expect lattice elements to be relatively cheap to move. Feel free to leave this unchanged, it is more of an observation.


While it is probably obvious to most of us, I wonder if it is obvious to all future readers that the block IDs are the indices of the vector. Depending on how beginner-friendly do we want these comments to be we could make that more explicit.

45 ↗(On Diff #389774)

Alternatively, we could replace Dynamic with TypeErased in the class name making the comment redundant.

This revision is now accepted and ready to land.Nov 25 2021, 3:21 PM
sgatev updated this revision to Diff 390029.Nov 26 2021, 5:18 AM
sgatev marked 3 inline comments as done.

Replace "Dynamic" with "TypeErased" in the names of types and their members.

sgatev updated this revision to Diff 390034.Nov 26 2021, 5:49 AM

Minor tweaks to documentation.

gribozavr2 accepted this revision.Nov 26 2021, 5:50 AM

Thanks Gábor and Dmitri!


Well, the only reason I see to have the suffix is to differentiate the type-erased methods from the methods of the concrete dataflow analysis class that will derive from DataflowAnalysis. For example, we could have

class ConstantPropagationAnalysis : public DataflowAnalysis<
  ConstantPropagationAnalysis, ConstantPropagationLattice> {
  ConstantPropagationLattice initialElement();

  ConstantPropagationLattice transfer(
    const Stmt *, const ConstantPropagationLattice &, Environment &);

In this setup ConstantPropagationAnalysis has two methods that return the initial element and two methods that perform the transfer function - one pair in the definition and one pair inherited from TypeErasedDataflowAnalysis. For the initial element we need to use different names as the two methods differ only in their return types. For the transfer function we could use the same name and rely on overload resolution, however it seems that's not recommended - The names of the remaining methods (joinTypeErased and isEqualTypeErased) have the suffix for consistency.


Acknowledged. I'll keep this option in mind. For now I suggest we keep the typed and type-erased structs separate if possible and if the cost isn't too high.


Let's make it clear. I added a note on that.

45 ↗(On Diff #389774)

That would be appropriate indeed. I updated the names.

ymandel accepted this revision.Nov 29 2021, 6:36 AM
This revision was automatically updated to reflect the committed changes.