Page MenuHomePhabricator

fix -fcodegen-modules code when used with PCH (PR44953)
AbandonedPublic

Authored by llunak on Feb 19 2020, 9:16 AM.

Details

Summary

In D69778 I incorrectly changed two places (the commit was supposed to make -fmodules-codegen work also for PCH, but those two places do not handle -fmodules-codegen). Revert the change in those two places.

Diff Detail

Repository
rC Clang

Event Timeline

llunak created this revision.Feb 19 2020, 9:16 AM
aganea added a subscriber: aganea.

I've tested this patch and it solves our original issue.

Could anyone please take a look so that it can be landed & merged into 10.0?

Please simplify the test case - at least to remove the long/uppercase identifiers to use placeholder names (this avoids any accidental semantic meaning they might carry (or appear to carry, but not actually carry)).

What's the failure mode this is avoiding? There's no checking for the behavior of the compiler here, other than "it does not crash" - that's usually a sign of undertesting. If the program crashed without the fix, then there's some specific behavior that's desirable that was hiding behind the crash, something more specific than "does anything other than crash". Please include testing (I'm guessing some FileChecking on IR output in this case, but haven't looked closely)

@llunak Small typo: your patch fixes PR44953, not PR44958.

llunak updated this revision to Diff 246122.Feb 23 2020, 1:00 PM
llunak retitled this revision from fix -fcodegen-modules code when used with PCH (PR44958) to fix -fcodegen-modules code when used with PCH (PR44953).
llunak edited the summary of this revision. (Show Details)

Upon further investigation it turns out I probably should not have enabled those two places for PCH in D69778 at all, since they do not deal with -fmodules-codegen. So reverting those two places should be the right fix. That also makes a test irrelevant (not much point in testing a rare corner-case for a code path never taken).

Upon further investigation it turns out I probably should not have enabled those two places for PCH in D69778 at all, since they do not deal with -fmodules-codegen. So reverting those two places should be the right fix. That also makes a test irrelevant (not much point in testing a rare corner-case for a code path never taken).

I know it's a bit of an awkward situation to test - but please include one to demonstrate why the original code was inappropriate. At least useful for reviewing/validating the patch, but also might help someone else not to make the same change again in the future.

@llunak Do you have time to address this patch this week? If not, I can finish it and land it. Please let me know. We'd like to see it merged into 10.0.

I know it's a bit of an awkward situation to test - but please include one to demonstrate why the original code was inappropriate. At least useful for reviewing/validating the patch, but also might help someone else not to make the same change again in the future.

I don't know how to do that, except by doing the does-not-crash test that you refused above. The only way I can trigger the change in ASTDeclWriter::VisitVarDecl() to make any difference is by using the _declspec(selectany), but that crashes without the fix and does nothing with it. I've tried static variables, static const variables, templated statics, inline variables, statics in functions and I don't know what the code actually affects other than _declspec(selectany) crashing.

Also, since I apparently don't know what the code in fact causes or why _declspec(selectany) crashes, I actually also don't know why the original code was inappropriate, other than that it crashes for an unknown reason. Since I simply reused the modules code for PCH, maybe _declspec(selectany) would crash with modules too? I don't know. But before we get to this, can we please first fix my patch for 10.0? I didn't check properly for -fmodules-codegen in D69778, that's for certain. That can be fixed before finding out why exactly it causes the trouble it causes.

@llunak Do you have time to address this patch this week? If not, I can finish it and land it. Please let me know. We'd like to see it merged into 10.0.

Feel free to do whatever you think would help you.

I know it's a bit of an awkward situation to test - but please include one to demonstrate why the original code was inappropriate. At least useful for reviewing/validating the patch, but also might help someone else not to make the same change again in the future.

I don't know how to do that, except by doing the does-not-crash test that you refused above. The only way I can trigger the change in ASTDeclWriter::VisitVarDecl() to make any difference is by using the _declspec(selectany), but that crashes without the fix and does nothing with it.

Could you explain more what you mean by "does nothing"? I would've assumed it would go down a path to generate IR that was otherwise/previously untested/unreachable (due to the crash) - so testing that the resulting IR is as expected (that the IR contains a selectany (or however that's lowered to IR) declaration and that declaration is used to initialize the 'desc' variable in main?) is what I would think would be appropriate here.

I've tried static variables, static const variables, templated statics, inline variables, statics in functions and I don't know what the code actually affects other than _declspec(selectany) crashing.

Also, since I apparently don't know what the code in fact causes or why _declspec(selectany) crashes, I actually also don't know why the original code was inappropriate, other than that it crashes for an unknown reason. Since I simply reused the modules code for PCH, maybe _declspec(selectany) would crash with modules too? I don't know.

Could you verify this? My attempt to do so doesn't seem to have reproduced the failure with modular codegen:

pch.h

struct CD3DX12_DEFAULT {};
extern const __declspec(selectany) CD3DX12_DEFAULT D3D12_DEFAULT;

struct CD3DX12_CPU_DESCRIPTOR_HANDLE {
  CD3DX12_CPU_DESCRIPTOR_HANDLE(CD3DX12_DEFAULT) { ptr = 0; }
  unsigned int ptr;
};

pch.modulemap

module pch { header "pch.h" }

use.cpp

#include "pch.h"

int main() {
  CD3DX12_CPU_DESCRIPTOR_HANDLE desc = CD3DX12_CPU_DESCRIPTOR_HANDLE(D3D12_DEFAULT);
  return desc.ptr;
}

Commands

Crashes:
clang-cl /Ycpch.h pch.cpp /c /Fox.obj /Fpx.pch 
clang-cl /Yupch.h use.cpp /I. /c /Foy.obj /Fpx.pch

Passes:
clang -cc1 -fdeclspec -triple x86_64-pc-windows-msvc19.11.0 -fmodules-codegen -fmodules -emit-module -fmodule-name=pch pch.modulemap -x c++ -o pch.pcm
clang -cc1 -triple x86_64-pc-windows-msvc19.11.0 -emit-llvm -o pch.ll pch.pcm
clang -cc1 -triple x86_64-pc-windows-msvc19.11.0 -fmodule-name=pch.pcm -fdeclspec use.cpp -emit-llvm

(tried it with a specifically windows-y triple, just in case that was relevant - still possible that there's some other variations between the clang-cl driver and the clang cc1 invocations I'm using for the modules testing here that would turn out to be the critical difference, rather than pch V modules)

But before we get to this, can we please first fix my patch for 10.0? I didn't check properly for -fmodules-codegen in D69778, that's for certain. That can be fixed before finding out why exactly it causes the trouble it causes.

Generally I'm not in favor of committing fixes/changes that are not understood (that leaves traps in the codebase & potentially other bugs, if we're committing code we don't understand). I'd rather revert the original patch until it's better understood.

@llunak Do you have time to address this patch this week? If not, I can finish it and land it. Please let me know. We'd like to see it merged into 10.0.

Feel free to do whatever you think would help you.

But before we get to this, can we please first fix my patch for 10.0? I didn't check properly for -fmodules-codegen in D69778, that's for certain. That can be fixed before finding out why exactly it causes the trouble it causes.

Generally I'm not in favor of committing fixes/changes that are not understood (that leaves traps in the codebase & potentially other bugs, if we're committing code we don't understand). I'd rather revert the original patch until it's better understood.

I think it'll help to split this into smaller parts:

  1. In D69778 I made it possible to use -fmodules-codegen also for PCHs by simply enabling the ModulesCodegen code also when -fmodules-codegen is used together with -building-pch-with-obj. Looking at ASTDeclWriter::VisitVarDecl() in current git it can be clearly seen that I messed up, as the code around line 1020 depends on -building-pch-with-obj but not -fmodules-codegen, so the ModulesCodegen code gets activated also when clang-cl /Yc is used, without -fmodules-codegen. That's clearly wrong and that's why aganea runs into a crash caused by ModulesCodegen code despite not asking for it. I find that part very clear and understood, both versions of my patch fixed that in some way, and that's what should be fixed for 10.0.
  2. In D69778 I included all the ModulesCodegen code, even the one checking for Module::ModuleInterfaceUnit instead of -fmodules-codegen (and that's how I made the mistake). I assumed Module::ModuleInterfaceUnit was some kind of equivalent to the PCH's own object file. That might have been a mistake, I don't know. Given that I'm unable to find a variable for which that piece of code would make a difference, I can't even check. Regardless of what it is, it's unimportant for 10.0.
  3. There's the crash about the _declspec(selectany) variable from the bugreport. It was triggered by my mistake, but I don't know what the actual cause of it is. And as said before, it may not be related to my patch at all other than that it got triggered by it. Again, it's presumably not important enough for 10.0.

I know it's a bit of an awkward situation to test - but please include one to demonstrate why the original code was inappropriate. At least useful for reviewing/validating the patch, but also might help someone else not to make the same change again in the future.

I don't know how to do that, except by doing the does-not-crash test that you refused above. The only way I can trigger the change in ASTDeclWriter::VisitVarDecl() to make any difference is by using the _declspec(selectany), but that crashes without the fix and does nothing with it.

Could you explain more what you mean by "does nothing"? I would've assumed it would go down a path to generate IR that was otherwise/previously untested/unreachable (due to the crash) - so testing that the resulting IR is as expected (that the IR contains a selectany (or however that's lowered to IR) declaration and that declaration is used to initialize the 'desc' variable in main?) is what I would think would be appropriate here.

"Does nothing" means that my current patch disables the ModulesCodegen code for it, so it "does nothing" other than taking the normal non-ModulesCodegen paths, so there's nothing to test compared to the normal case. At least for the 1) case I don't see what to test and how to do it. And for case 2) I also don't know what to test, because I can't find what difference it makes in the PCH case, except for triggering the crash. And I don't know how to test the crash either (except by crashing/not-crashing), because it crashes before it generates any code.

Also, since I apparently don't know what the code in fact causes or why _declspec(selectany) crashes, I actually also don't know why the original code was inappropriate, other than that it crashes for an unknown reason. Since I simply reused the modules code for PCH, maybe _declspec(selectany) would crash with modules too? I don't know.

Could you verify this? My attempt to do so doesn't seem to have reproduced the failure with modular codegen:

Your testcase is not equivalent. In your case ASTDeclWriter::VisitVarDecl() has ModulesCodegen set to false for 'D3D12_DEFAULT', because 'Writer.WritingModule->Kind == Module::ModuleInterfaceUnit' is false. Due to my mistake, ModulesCodegen was true in the PCH case.

hans added a comment.Feb 27 2020, 5:43 AM

I've reverted cbc9d22e49b434b6ceb2eb94b67079d02e0a7b74 on master in 7ea9a6e0220da36ff2fd1fbc29c2755be23e5166 until it can be resolved properly.

The revert was also pushed to 10.x as a8684e93a347b5a7ecbb07bad40301f1699a813a.

Thanks @hans! Nevertheless I think this is a good change and we should push it for 11.0. I was indeed seeing better timings on our end with what @llunak was proposing, we are heavy users of PCH headers (and migrating to modules is not trivial in our case, lots of code branching & integrations)

hans added a comment.Feb 27 2020, 9:21 AM

Thanks @hans! Nevertheless I think this is a good change and we should push it for 11.0. I was indeed seeing better timings on our end with what @llunak was proposing, we are heavy users of PCH headers (and migrating to modules is not trivial in our case, lots of code branching & integrations)

Right, I have no objections against this re-landing on master once the problem in PR44953 is understood and fixed.

Ping. I don't particularly care about the declspec(selectany) corner case, but at least the mistake from D69778 should be fixed (and it's a simple fix), so that it can be committed again.

If I understand correctly, this patch reverts the behavior to what it was before your changes, to only handle modules, but not PCH? Is that correct?

@llunak : It seems @hans already reverted this in rG7ea9a6e0220da36ff2fd1fbc29c2755be23e5166, could you please ensure this patch is still relevant?

@aganea: This change reverts only a small part of my previous change (see my comment here from Feb 23). Hans reverted the original change only until this problem gets sorted out (see his comment from Feb 27 right before the revert).

My point is that all the changes in this patch are already in the upstream. If you want to re-land D69778, you should rather re-open and update that patch instead? (and close this current one)

What is the practical difference? Either way the end result will be the same. And given that I get to wait ages for reviews of my PCH changes I find it more likely to get a review for a small patch rather than a large one.

To ease reviewers comprehension you have to propose patches that diff against the master. People sometimes apply the patches locally before accepting them.
This current patch is a subtraction after D69778 is applied locally on master. It seems a bit complicated.

I think it is better if you closed this, and applied all the changes you mean to do on D69778 and re-open. I think it is a good patch, it was already accepted, you only need to demonstrate that the edge cases where it was failing are solved.

Once you do that, maybe you could send another message to the cfe-dev mailing list to ask for reviewers who have interest into optimizing PCH? I do, but I don't have the deep knowledge of this code.

Yeah, I was confused by all this too - @aganea's right - this review doesn't make sense to me, since it doesn't show a proposed change to LLVM, it shows a proposed change on top of another patch, that would necessarily be committed together/in a single commit (a proposed change on top of another change that's part of a series of independently valid changes are quite different - such a patch series is generally encouraged to ensure small/isolated changes). Things that are going to be committed in a single commit should be reviewed as such.

Reopening the previous review, or starting a new one (including this one - but updating it to include all the changes you're proposing to commit together/a diff relative to the current upstream trunk/tip-of-tree) & linking to the old one for context, sounds suitable.

(updating the previous review has the advantage that reviewers can review the diff relative to the previous version of the patch, using Phabricator's "History" feature, so they can review the delta since the committed/reverted version)

llunak abandoned this revision.May 3 2020, 1:27 PM

This review did make sense when it was created, it's just that it took so long to get this processed that it was simpler to revert the original patch.

Anyway, I've merged the change to https://reviews.llvm.org/D69778 .