This is an archive of the discontinued LLVM Phabricator instance.

Move some functions from ClangASTContext to ClangUtil
AcceptedPublic

Authored by zturner on Mar 28 2016, 4:31 PM.

Details

Summary

This is pretty mechanical, so you don't really have to look at this in too much detail. I'm mostly posting it here to make sure people are ok with the high level idea. Basically ClangASTContext is a monstrous file and I want to reduce the size by splitting out some of the functionality. There's a ton of functions in ClangASTContext that are static and are basically helper functions, but then other people like DWARFASTParserClang, or ClangASTImporter, or other places re-use those functions. So the idea is just to move all these common clang helper functions into a single place.

In doing so, it makes testing this type of functionality very easy, because you can write a unit test for every function in the file. I did that in this patch, and actually found one bug as a result of my unittest failing (yay for unit tests). Since that is the only functional change in the patch, you may want to look at it specifically. It's ClangUtil::RemoveFastQualifiers. The version before my patch did nothing, it returned exactly the same value it was passed in, because QualType::getQualifiers() returns a clang::Qualifiers by value, so it was not modifying the QualType's qualifiers, but a copy of them, which was immediately discarded.

If anyone can think of a way to exercise this in a public API test, let me know.

I'm hoping to gradually move some more of the ClangASTContext functions over to ClangUtil in subsequent patches. It makes understanding the important parts of ClangASTContext easier, and it will allow me to add more unittests for the rest of the functions as well (hopefully turning up more bugs).

Diff Detail

Event Timeline

zturner updated this revision to Diff 51853.Mar 28 2016, 4:31 PM
zturner retitled this revision from to Move some functions from ClangASTContext to ClangUtil.
zturner updated this object.
zturner added reviewers: spyffe, clayborg.
zturner added a subscriber: lldb-commits.

Any issues with this? Honestly you can probably just read the description
and lgtm it based on that, because as I said it's 99% a code move, and
there is no functional change except for the one bug fix in
ClangUtil::RemoveFastQualifiers.

I have a followup patch almost complete which moves some more function, and
adds some more unit tests, and (so far) fixes 2 more bugs that were found
as a result of the unit tests. So I think the work is a win just on the
grounds that I've already found 3 bugs as a result, even disregarding the
benefits to ClangASTContext in file size.

Let me know if I can continue this effort without review, it would save all
of us some time and obviously I can use my judgement if I think the change
is non-trivial.

clayborg requested changes to this revision.Mar 30 2016, 10:52 AM
clayborg edited edge metadata.

The main reason I don't like this is all of the merge headaches it will create for us. We merge to swift.org and then over into some internal git repos. We have like 5 branches that we are always propagating things over to. This kind of change looks good in top of tree, but causes a ton if issues down the line. So this is my main hesitation. If we could keep any functions that were in ClangASTContext in ClangASTContext, because that is where they really belong. I don't really like the ClangUtil.cpp file because it is actually doing clang::ASTContext related things so the functions actually belong in ClangASTContext from a design point of view. So I would rather see the ClangUtil.cpp file go away and have everything be back in ClangASTContext.cpp.

This revision now requires changes to proceed.Mar 30 2016, 10:52 AM

I do like the tests that you added and I think we can still do the tests in TestClangASTContext.cpp.

One or two of the functions, you are right. They do clang::ASTContext related things. That was actually an oversight on my part. I meant to move only functions that specifically did NOT do clang::ASTContext things. Like RemoveFastQualifiers, or converting enums, or converting from qual types to canonical qual types. Very simple utility functions.

The reason why I don't think they should be in ClangASTContext from a design point of view is because they are used from all over the places, including from places that aren't actually holding and don't need an ASTContext.

What if I move the clang::ASTContext functions back but keep the truly independent ones here? ClangASTContext is a really egregiously oversized file, so I think splitting it up a bit would be helpful. I also find it helpful to break a lot of these cyclic dependencies, where A.cpp includes B.h and B.cpp includes A.h, and we're doing that all over the place in large part because B.cpp is ClangASTContext and A.cpp is anything else that has to do anything, no matter what it is, to a clang type, even if it doesn't need an ASTContext.

What kind of merge conflict does it create? There's no tricky changes here, why wouldn't they be mergeable to the other branch? I'm not really crazy about the idea of having decisions about upstream being based on someone's downstream issues though :-/ If something is good for the upstream it seems like that's merit enough to get it in.

One or two of the functions, you are right. They do clang::ASTContext related things. That was actually an oversight on my part. I meant to move only functions that specifically did NOT do clang::ASTContext things. Like RemoveFastQualifiers, or converting enums, or converting from qual types to canonical qual types. Very simple utility functions.

They are still clang::ASTContext related things since clang::QualType is part of that whole infrastructure.

The reason why I don't think they should be in ClangASTContext from a design point of view is because they are used from all over the places, including from places that aren't actually holding and don't need an ASTContext.

Does it just come down to QualType stuff? Those are fine the exist in ClangUtil.cpp, but again, we have a ton or merges we are doing daily and any moving of code does cause merge conflicts when they are just moved to be logically moved elsewhere. When we have a fix, it first goes into llvm.org SVN, then we cherry pick it over to swift.org into two branches: master and master-next, then we cherry pick these fixes over to our internal branches where we at least have two more merges. Each merge is possibly going into older code where we would not have ClangUtil.cpp in that branch. So then we need to be very careful with our merges to make sure we do them right. We finally got to a place after splitting using DWARFASTParser and all that where merges are going very smoothy most of the time since anything language specific is off in a separate file.

What if I move the clang::ASTContext functions back but keep the truly independent ones here? ClangASTContext is a really egregiously oversized file, so I think splitting it up a bit would be helpful. I also find it helpful to break a lot of these cyclic dependencies, where A.cpp includes B.h and B.cpp includes A.h, and we're doing that all over the place in large part because B.cpp is ClangASTContext and A.cpp is anything else that has to do anything, no matter what it is, to a clang type, even if it doesn't need an ASTContext.

We can probably simplify our header includes. The only time you need to include a header file is if it affects the layout of the class in question. That means base classes and instance variables only. Any functions that take full fledged types like:

class Bar
{
    void Foo(lldb_private::FileSpec file_spec);
};

Doesn't require FileSpec.h, just a forward declaration of FileSpec, because it doesn't affect the layout of the class. Anyone that includes the above header might need to manually include "FileSpec.h", but we can probably simplify our includes quite a bit, so that might help.

So again, moving anything can cause merging headaches. See below for more details.

What kind of merge conflict does it create? There's no tricky changes here, why wouldn't they be mergeable to the other branch? I'm not really crazy about the idea of having decisions about upstream being based on someone's downstream issues though :-/ If something is good for the upstream it seems like that's merit enough to get it in.

So one example is the RemoveFastQualifiers() fix. If we merge that back into a branch that doesn't have ClangUtil.cpp, we need to take the contents from any functions in ClangUtils.cpp and find the old corresponding functions and paste the new code back into there.

So if we only had private branches at Apple I wouldn't be making this argument. But on swift.org, we have public merges going on all the time with auto mergers. They can often fail when we do things like this (move stuff for organization sake), so then we have to manually intervene. There are 3 branches on swift.org that are always merging from other places. The LLVM and Clang on swift.org doesn't track top of tree SVN llvm and clang, but might be up to two months behind.

So I really implore us to not make these kinds of disruptive changes as it ends up causing our automation steps to fail and causes us to have to manually intervene and spend a lot of time doing manual merges.

But wouldn't doing that manual merge once then make things easier for the future? For example, if you merge back to a branch that doesn't have ClangUtil.cpp, and then you just add ClangUtil.cpp to that branch, the problem is solved for the future. So while it's an annoying manual merge, it only has to happen once.
Incidentally, splitting up code like this makes merges easier over the long run, just like the example you gave with DWARFASTParserClang where now language specific stuff is in its own file.

idk, I just don't feel very good about starting from the premise that code re-organizations are discouraged because of X, where X is, well pretty much anything. It's impossible for everyone to know about all the issues that all the downstream customers have to deal with, and in principle I feel like changes such as this should even be able to go through without review because the technical merits of having a healthy code organization are well understood, and while discouraging code re-organization might make one particular customer's life easier, I think it weighs on the larger project and indirectly every other downstream customer as a result.

I'm curious now what clang does in this regard, so +rsmith in case he has some insight into what happens in clang when someone wants to re-organize some code and how other downstream customers (for example, swift) deal with this.

Just to be clear, I'm genuinely sympathetic to the issues other people have to face with regards to their downstream build issues, so if there's any way for me to make this less painful I will go out of my way and spend extra time to make it so. But I think we should only be discussing the technical merits of the design and whether the code organization.

clayborg accepted this revision.Mar 30 2016, 12:04 PM
clayborg edited edge metadata.

I have explained my beef with the changes. If you feel you must make them as you want to we will deal with the fallout and extra work.

This revision is now accepted and ready to land.Mar 30 2016, 12:04 PM

I don't maintain any downstream branch to worry about merges but my personal opinion is moving large amount of code around can cause some issue in the future even for upstream. The 2 main issue I can think about:

  • "git log" and "git blame" will not display the original change for the code what is problematic when we try to track down why a piece of code have been added (especially for special cases)
  • As an active LLDB developer I have a pretty good idea about where a given piece of code lives. Moving the code around invalidates this knowledge for every developer in the project

So in general if we get some actual advantage from moving the code (e.g. cleaner API, better testing options, etc...) then I have no issue around it but if the only gain is the file size reduction and the removal of the cyclic dependencies then I think this causes more problem then the benefit of the change (same reason why we don't run clang-format over the full code base).

In this concrete case I don't see the benefit neither in the API nor for testing as all functions were accessible before in the same way and the tests can call the functions from the ClangASTContext instead of the ClangUtils with the same functionality.

rsmith edited edge metadata.Mar 30 2016, 12:52 PM

I'm curious now what clang does in this regard, so +rsmith in case he has some insight into what happens in clang when someone wants to re-organize some code and how other downstream customers (for example, swift) deal with this.

Clang has no API stability guarantees for its C++ interface. We reserve the right to reorganize at will, and to update, rename, and remove interfaces. This applies doubly to our internal code organization. Yes, this creates a burden on people maintaining out-of-tree patches, but in some ways that's a benefit, as it encourages patches to be contributed upstream when they make sense, or at least for patches to be proposed to refactor upstream so that the local changes are cleanly separable from the upstream code.

Organizational changes do have a cost to upstream development in many ways (people are no longer familiar with the code organization, merging fixes to point releases is harder, work-in-progress changes may need non-trivial work to rebase, ...), so we'd only make them when they make sense from a technical / organizational perspective, but in the long term, it's not possible to maintain a healthy codebase if cleanup changes are held back due to those short-term costs.

I don't maintain any downstream branch to worry about merges but my personal opinion is moving large amount of code around can cause some issue in the future even for upstream. The 2 main issue I can think about:

  • "git log" and "git blame" will not display the original change for the code what is problematic when we try to track down why a piece of code have been added (especially for special cases)
  • As an active LLDB developer I have a pretty good idea about where a given piece of code lives. Moving the code around invalidates this knowledge for every developer in the project

So in general if we get some actual advantage from moving the code (e.g. cleaner API, better testing options, etc...) then I have no issue around it but if the only gain is the file size reduction and the removal of the cyclic dependencies then I think this causes more problem then the benefit of the change (same reason why we don't run clang-format over the full code base).

In this concrete case I don't see the benefit neither in the API nor for testing as all functions were accessible before in the same way and the tests can call the functions from the ClangASTContext instead of the ClangUtils with the same functionality.

In this case I only moved a few functions just to test the waters, but if you look through ClangASTContext.h there are probably 40-50 functions that could theory be moved under the same idea. It's a couple thousand lines of code of jsut generic helper functions, which is not insignificant.

The corollary to your statement is that we can just keep adding everything clang related to this one file, which if so maybe it jumps from 10,000 lines to 20,000 lines. File size is not normally a major concern, but at the same time I don't think we should just allow files and classes to grow unbounded in lines of code and number of member functions without being willing to think about if there should be a split.

But relevant code should be grouped together as much as possible. Small interfaces are easier to understand than big interfaces. Answering the question "is there a way to do X" is easier when you only have to look an interface containing 30 functions than when you have to look through an interface containing 200 functions.

As for "git log", "git blame", and knowing where a piece of code lives, there will always be times where you have to do some code archaeology to trace a line of code back to its origin. It's usually only a few extra commands. For example, you git blame ClangUtil.cpp, the revision of a line is the first commit of that file. The diff (or the commit message) shows it was moved from ClangASTContext.cpp. So you git blame the corresponding lines of ClangASTContext.cpp starting with x~1 where x is the revision shown in the first diff. It doesn't really much extra effort at all, certainly not so much that the extra effort outweighs the benefits of maintaining healthy code.

Just my 2c, if nobody likes the idea then so be it, but I think it's easy to get stuck viewing this from the lens of someone who's already spent a great deal of effort understanding the code. So I would encourage you to think about it from the viewpoint of a newcomer to the codebase trying to understand the code.

So, in thinking about this some more, my end goal does not necessarily involve the creation of a new file. The primary goal is group related functions together into a more bite-sized interface in order to make it easier to understand the code.

How about keeping everything in the same file, but still splitting these functions out into another class defined in that file? For example, the ClangUtil class could still be in ClangASTContext.h. I think that eliminates the concern about merging, and while it doesn't address the issue of the massive file (which I still think is an important consideration for the long term health of this code), it at least makes some progress in that it groups everything together so that it makes the interface more easily digestible, and makes a move to another file easier in the future if someone wanted to do it.

So, in thinking about this some more, my end goal does not necessarily involve the creation of a new file. The primary goal is group related functions together into a more bite-sized interface in order to make it easier to understand the code.

How about keeping everything in the same file, but still splitting these functions out into another class defined in that file? For example, the ClangUtil class could still be in ClangASTContext.h. I think that eliminates the concern about merging, and while it doesn't address the issue of the massive file (which I still think is an important consideration for the long term health of this code), it at least makes some progress in that it groups everything together so that it makes the interface more easily digestible, and makes a move to another file easier in the future if someone wanted to do it.

That still doesn't help downstream merging.

So, in thinking about this some more, my end goal does not necessarily involve the creation of a new file. The primary goal is group related functions together into a more bite-sized interface in order to make it easier to understand the code.

How about keeping everything in the same file, but still splitting these functions out into another class defined in that file? For example, the ClangUtil class could still be in ClangASTContext.h. I think that eliminates the concern about merging, and while it doesn't address the issue of the massive file (which I still think is an important consideration for the long term health of this code), it at least makes some progress in that it groups everything together so that it makes the interface more easily digestible, and makes a move to another file easier in the future if someone wanted to do it.

That still doesn't help downstream merging.

I thought you said the issue was if you try to merge to a branch that didn't have the ClangUtil.cpp file. If you just copy and paste a function from one location in a file to another location and change the class it belongs to, that's still a problem?

That's a mighty big burden for upstream developers to have to deal with.

The last time we talked about this, you said something along the lines of "Changes to LLVM and LLDB at the same time are a problem, but any refactor that happens entirely within LLDB is fine"

I thought you said the issue was if you try to merge to a branch that didn't have the ClangUtil.cpp file. If you just copy and paste a function from one location in a file to another location and change the class it belongs to, that's still a problem?

So merging to current top of tree is easier than merging to a release branch. When merging to a top of tree for any repository that is currently mirroring a close relative of the current code is easy and the extra file doesn't matter. The issues arise when we find a bug deep in one of our released branches. Then to fix it we actually fix it into llvm.org SVN, then merge to swift.org into many branches, and then merge into many internal branches and eventually over to the release branch. We want minimal code changes so that we minimize our risk when taking fixes into release branches.

Anytime we introduce manual merging where we have to say "Oh, this function in 'ClangUtil::DoSomething' used to be in 'ClangASTContext::DoSomething', so I will need to take the guts of the function and manually copy it over and make sure it actually works". This is when errors happen. The other worse case is where we actually just merge the ClangUtil.cpp and ClangUtil.h file into our older branch, make sure it gets built in the Xcode project file, and then none of our code gets fixed because we still have the "ClangASTContext::DoSomething" that is buggy, but our merging worked just fine by merging in a new source that compiles but no one uses.

That's a mighty big burden for upstream developers to have to deal with.

But it is where we are and it is our reality. So I am trying to minimize and point out when these kinds of issues will cause problems as I review patches. I know it is tempting to move stuff around, and we used to just accept these patches, but as we have over the past year, we have run into many issues of exactly the flavor we are talking about here, so I am proactively trying to avoid it when the need to do so isn't as strong. Many times I agree that things were architected incorrectly, or just are architected the way they are because of the way the code has evolved over the years, and those I believe we should fix. I would rather avoid this when we can when there isn't a strong reason to do so. I believe this is one of those cases.

I will move most of this back to ClangASTContext, but I want to state again that I would like to reach a point where downstream merge issues are not even a topic that comes up in code reviews. I see many huge refactors coming through from people that do not go up for review, do not ask about other peoples' downstream build issues or how it affects them, and are not discussed in the open before going through.

So I do not want there to be a double standard. Either everyone needs to run every change by everyone in the community before committing anything, or we adopt the LLVM policy of post commit review and downstream maintainers handling their downstream problems silently.

It is not fair to even mention the issue of downstream merge issues, because it stifles improvements to code health (even if they are minor improvements), when LLVM has a clear policy that downstream problems are not the upstream's problem. I'm fine working together and compromising or whatever, but I am pretty strongly opposed to rejecting changes that improve code health because of someone's merge problems. That needs to be an issue that is dealt with downstream -- and more importantly not even mentioned as a point of discussion in the upstream.

With that said, I will undo most of these changes (but continue writing the unittests)

labath added a subscriber: labath.Mar 31 2016, 6:56 AM

Cyclic code dependencies are not an imaginary problem. Right now, it is very hard to make lldb-server small, because everything in lldb depends on everything else, and so lldb-server ends up containing chunks e.g. clang, even though that code will never get execute, because it is impossible to remove it without causing link errors. So, I welcome anything which makes this task easier, now or in the future. Breaking things up into smaller classes/files is definitely one of those things. I also welcome more unittests.