When a replacement's offset is set to UINT_MAX or -1U, it is treated as
a header insertion replacement by cleanupAroundReplacements(). The new #include
directive is then inserted into the correct block.
Details
Diff Detail
Event Timeline
lib/Format/Format.cpp | ||
---|---|---|
1444 | But you should pass it as an ArrayRef or a SmallVectorImpl. | |
1446 | nit: no braces | |
1451 | From the docs in Format.h: "If none of the regular expressions match, INT_MAX is assigned". If that is true for an existing #include as well as the inserted one, I don't see a reason, why those should belong to the same block. So, I think you should return INT_MAX here and remove all the special-casing below. | |
1464 | We already have an include regex in this file, I don't think we should have two. Can you pull out the string into a constant? | |
1468 | There is a lot of duplicated code between this and sortCppIncludes. Pull common code out into functions. | |
1487 | There is no guarantee that people assign categories sequentially, I can easily see people assigning category "1000" or "10000" to be able to add more categories later. I think this should be a map and I think you then also don't really need the vector. Also note that categories can be negative, which I think would break this. | |
1491 | I am guessing that AfterFirstDefine is used so that we insert #includes after header guards. I think we should name it AfterHeaderGuard than and either make the behavior more strict or at least add a FIXME for it (only in header files, only if the #define is after an #ifndef, ...) | |
1515 | I think if you assign the EndOffset of all lower categories that aren't set yet, that will simplify the logic below. | |
1542 | What happens if there is no category with higher priority, but there is a category with lower priority. Shouldn't you then insert the #include right before that? | |
1581 | I think you can remove this FIXME.. Seems obvious. | |
1584 | Why do you split out all the header insertion replacements here instead of just ignoring the ones that aren't header insertions in fixCppIncludeInsertions? | |
1587 | nit: No braces. |
lib/Format/Format.cpp | ||
---|---|---|
1584 | I think this enables us to exit early in fixCppIncludeInsertions if there is no header insertion replacement at all. Also, it is easier to split out header insertion replacements and then merge fixed replacements with other replacements instead of doing this in place, which IMO might not be more efficient since there are still set deletion and insertion for each header insertion replacement. |
include/clang/Format/Format.h | ||
---|---|---|
778 | Why don't we just stick with UINT_MAX everywhere? | |
lib/Format/Format.cpp | ||
1456 | Unused | |
1465 | Why not just a set? | |
1487 | nit: s/more strict/stricter/ | |
1493 | Are you sure that this is what's implemented? The way I see it is:
| |
1512 | I'd leave all of this logic out. This only affects headers with no current #includes, which is rare to begin with and on top of that, this is more to do with whether or not we want to automatically add new lines between the different blocks. As it currently is, this would do both: #define HEADER_GUARD #include <a> // inserted and #define HEADER_GUARD #include <a> // inserted #include <b> Which might be the right behavior, but I think this is the wrong place to worry about it. In the long run, we should probably make clang-format inserted empty lines around the #include blocks if desired. | |
1524 | Add a comment: // By this point, CategoryEndOffset[Highest] is always set appropriately: // - to an appropriate location before/after existing #includes, or // - to right after the header guard, or // - to the beginning of the file. | |
1530 | This should be done together with the check for the offset being -1u. Eventually we are going to have something else that we are going to insert (e.g. using declarations) and we don't want to confuse those. And even more so, if all replacements at offset -1u are not #includes, we don't need to execute most of this function. | |
1537 | Why is CheckMainHeader false? | |
1540 | No parentheses. | |
1549 | Hm.. Not happy with the naming. There is no real difference between fixHeaderInsertions and fixCppIncludeInsertions. You could argue that the "Cpp" part is the difference, but the actual difference is that one iterates over an entire set of replacements whereas the other only modifies header insertions. I think I'd just merge the two functions, but that probably also ties back to my wish not to separate out the two replacement sets ;-). |
- Addressed reviewers' comments.
lib/Format/Format.cpp | ||
---|---|---|
1549 | How about fixInsertionReplacements? |
lib/Format/Format.cpp | ||
---|---|---|
1284 | Nit: Otherwise, returns the priority of the matching category or INT_MAX. | |
1502 | nit: I think you should be using "I" and "E". | |
1507 | Can you just use "std::prev(I)" instead of Prev? | |
1549 | So, not knowing anything of it, what's the difference between fixCppIncludeInsertions and fixInsertionReplacements and when do you need to call what. To me the fact that it's that difficult to find a name for it is telling.. Also, I still don't understand why you think it is better to separate the two functions. In that way you are currently implementing it, it is certainly less efficient. You are always copying all the replacements, which (in the way you are doing it) is O(N lg N), even if there isn't a single header insertion. Maybe at least use set_difference (similar to what I am showing below). So, how I would structure it:
tooling::Replacements HeaderInsertions; for (const auto &R : Replaces) if (isHeaderInsertion(R)) HeaderInsertions.insert(R); if (HeaderInsertions.empty()) return Replaces; tooling::Replacements Result; std::set_difference(Replaces.begin(), Replaces.end(), HeaderInsertions.begin(), HeaderInsertions.end(), Result.begin());
|
- Addressed reviewer's comments.
lib/Format/Format.cpp | ||
---|---|---|
1549 | I really appreciate the way you structure it, which does make the code clearer. Just one coveat: std::set_difference() is not applicable on Replacements since it calls std::copy() on Replacement, which Replacement does not support. I think another way to approach it would be to delete header insertion replacement and insert new replacements back if we can assume that there are only few header insertion replacements. |
lib/Format/Format.cpp | ||
---|---|---|
1548 | So, add a copy constructor to Replacement? |
lib/Format/Format.cpp | ||
---|---|---|
1444 | This error output doesn't belong here. I think we can just remove it. If you'd prefer to keep it, add it to the loop in fixCppIncludeInsertions(): for (const auto &R : Replaces) { if (isHeaderInsertion(R)) HeaderInsertions.insert(R); else if (R.getOffset() == UINT_MAX) log::errs() << ... } | |
1556 | I'd move this check into fixCppIncludeInsertions. | |
unittests/Format/CleanupTest.cpp | ||
310 | I'd pull out a lot of these environment setup things into abstractions in the test fixture. Maybe all you need is two functions like: insert(Code, Replaces); and insertAndFormat(Code, Replaces); ? | |
311 | Can you just make this: tooling::Replacements Replaces = { tooling::Replacement("fix.cpp", UINT_MAX, 0, "#include \"b.h\"")}; ? | |
510 | I'd create a function: tooling::Replacement createInsertion(StringRef HeaderName) { return tooling::Replacement("fix.cpp", UINT_MAX, 0, HeaderName); } |
- Addressed commments.
unittests/Format/CleanupTest.cpp | ||
---|---|---|
310 | I had tried to pull out the environment setup...but it didn't seem to save much code without sacrificing flexibility. In some test cases, we also need to add replacements other than header insertions, and those replacements construction uses FileID, so createInMemoryFile needs to stay in test functions. FormatStyle differs among test cases, so it has to stay. All we save might just be the two lines which could be either cleanup or cleanup+format.\ Any better ideas? |
unittests/Format/CleanupTest.cpp | ||
---|---|---|
310 | Well, the only reason you seem to be using the FileID or "Context" for that matter is to translate between line/col and offset. I'd just pull that functionality out into a separate function (which does it based on the "Code" input) or not at all (hard-coding the offset doesn't seem *that* bad). | |
510 | Have you seen this comment? |
A few nitpicks, but otherwise looks good.
lib/Format/Format.cpp | ||
---|---|---|
1287 | I'd consistently try to use i and e for integers vs. I and E for iterators. | |
unittests/Format/CleanupTest.cpp | ||
257 | I'd probably make "Style" a class member, defaulting to LLVM Style that you can then overwrite for other styles. | |
271 | I'd consider making "Code" a member variable. But it has pros and cons. | |
297 | Why are you starting and ending the header guards with a double underscore? I think this is actually reserved for built-in identifiers. Maybe just stick with the LLVM naming scheme? |
Why don't we just stick with UINT_MAX everywhere?