This is an archive of the discontinued LLVM Phabricator instance.

Clear the KnownModules cache if the preprocessor is going away
AbandonedPublic

Authored by dexonsmith on Feb 21 2019, 3:29 AM.

Details

Summary

When the Preprocessor (PP) in compiler instance is going away, we should clear the cache of any pointers that it owns as they will be destroyed.

This is one possible fix for the issue I outlined in http://lists.llvm.org/pipermail/cfe-dev/2019-February/061293.html that received no responses. We have now encountered the issue in multiple internal buildbots and I have even seen it in external bots as well. This really should be fixed.

As outlined in my cfe-dev post, it is exceedingly difficult to produce a reliable test case for this (at least for me) so I have not provided one.

If others should be on the list of reviewers, please add them.

Diff Detail

Repository
rC Clang

Event Timeline

nemanjai created this revision.Feb 21 2019, 3:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2019, 3:29 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript

Ping.

If there are no objections in the next week or so, I'll commit this and it can be reviewed post-commit.

Ping.

Unfortunately Richard Smith is out for a few weeks at the moment, so might take a little bit before he can get to this.

It's odd to me that this lacks a test case - but you mention it's shown up on buildbots? Does it reproduce consistently there? Under what conditions (which buildbots/configurations show this - are they permanently failing because of this?)?

A test case, if at all possible, would be super helpful.

If there are no objections in the next week or so, I'll commit this and it can be reviewed post-commit.

That's generally not considered acceptable practice - if something is sent for review it's because it needs review & time doesn't change that. (there are some exceptions to this - some folks send things out for "hey, anyone got other ideas on this, otherwise I think it's fine" sort of thing)

Ping.

Unfortunately Richard Smith is out for a few weeks at the moment, so might take a little bit before he can get to this.

It's odd to me that this lacks a test case - but you mention it's shown up on buildbots? Does it reproduce consistently there? Under what conditions (which buildbots/configurations show this - are they permanently failing because of this?)?

A test case, if at all possible, would be super helpful.

The failure this causes always shows up in the Modules/builtins.m test (at least in my experience). It is far from predictable and it does not consistently reproduce on any build bot. It occasionally shows up and slight perturbations in the source make it go away.
Honestly, I don't find this to be all that surprising. Using memory after freeing it has inherently unpredictable behaviour. There are certain toolchains that will diagnose freeing the same memory twice, but that's not the case here - we just happen to use it after freeing it.

If there are no objections in the next week or so, I'll commit this and it can be reviewed post-commit.

That's generally not considered acceptable practice - if something is sent for review it's because it needs review & time doesn't change that. (there are some exceptions to this - some folks send things out for "hey, anyone got other ideas on this, otherwise I think it's fine" sort of thing)

I am really sorry about how this came across. I understand that given the context, this could quite reasonably be interpreted as me stating "I don't want to wait any longer, so I'm just going to commit this." That was not at all my intention. I merely meant to state that I don't believe this to be in any way controversial. I have shown quite clearly in my email that KnownModules will have pointers to data that the Preprocessor owns. If the existing Preprocessor shared pointer is the last reference, it will obviously be deleted now that we're reassigning to it. Thereby, we are deleting the Preprocessor which will delete all the data it owns and we are keeping KnownModules alive (with cached pointers to data that is being deleted). There is no situation I can think of in which it is reasonable to keep pointers to deleted data. If I came across an issue of this nature - clearly undefined behaviour - in the PPC back end where I spend most of my time, I'd probably not post for review as a fix is clearly in order. But since I am not intimately familiar with this code, I thought I'd get another opinion on the fix by sending an email to the dev list and posting on Phabricator.

All that being said, it sounds like there is an objection to me committing this so I certainly won't proceed without an approval on this review. If you or anyone else can offer a suggestion on how I might come up with a test case for this - or perhaps an alternative fix for this issue, I am more than happy to incorporate your suggestions.

Ping. Does anyone think this is a good idea? Bad idea? Have any further comments?

Is this still a problem?

I'd be interested to understand *why* the preprocessor is going away here. Is the CompilerInstance being reused for two different compilations? Is that what we should fix instead?

Do we need KnownModules at all? It seems to serve a very similar purpose to the Modules string map on ModuleMap. (The main difference seems to be that KnownModules can cache module load failures.) If we can keep only a single data structure tracking this, owned by the ModuleMap (which also owns the modules), that should remove the possibility for dangling module pointers.

I have not seen this problem resurface to be honest. When we initially hit it, changing the path for the build worked around the problem for us so we weren't really hitting in any longer. I posted this because I realized the possibility exists of having these dangling pointers (and it certainly fixed our original problem even without changing the build path).
However, I must say that I am completely out of my depth here as I am not a front end developer and don't really know how any of this stuff is supposed to work (i.e. whether it makes sense for PP here to be pointing to something that will go away).

That being said, I am perfectly happy to do what the experts suggest here, including abandoning the patch, updating it as requested, etc.

I don't think you need prior frontend expertise, just some time and patience since the modules code has some technical debt. If you are still willing to look at it, I agree with Richard's suggestion that we could merge this with the map in the Module Manager. One approach would be to start caching (and invalidating?) module load failures in the ModuleManager somehow, redirect APIs using KnownModules to point there, and then delete this cache entirely. Another approach would be to re-evaluate if we need to cache module load failures; hypothetically, it's possible we don't need that feature (anymore).

dexonsmith commandeered this revision.Nov 3 2019, 8:37 PM
dexonsmith edited reviewers, added: nemanjai; removed: dexonsmith.
dexonsmith abandoned this revision.Nov 3 2019, 8:43 PM

I just pushed 31e14f41a21f9016050a20f07d5da03db2e8c13e, which moves KnownModules into ModuleMap as an alternative.

Do we need KnownModules at all? It seems to serve a very similar purpose to the Modules string map on ModuleMap. (The main difference seems to be that KnownModules can cache module load failures.) If we can keep only a single data structure tracking this, owned by the ModuleMap (which also owns the modules), that should remove the possibility for dangling module pointers.

ModuleMap::Modules only has top-level modules, but KnownModules also indexes submodules. I kept them both.