
Redefine deref(N) attribute family as point-in-time semantics (aka deref-at-point)
Needs Revision · Public

Authored by reames on Sep 29 2021, 11:23 AM.

Details

Summary

This change shifts all of our mechanisms for specifying dereferenceability to be explicitly point-in-time. This change covers both argument and return attribute forms of both dereferenceable(N) and dereferenceable_or_null(N) attributes, as well as the parallel family of metadata.
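For orientation, here is a minimal sketch of the three forms in question (written with opaque pointers for brevity; the function and value names are invented for illustration and are not taken from the patch):

```llvm
; Argument attribute form: %p is dereferenceable for at least 4 bytes.
define i32 @callee(ptr dereferenceable(4) %p) {
  %v = load i32, ptr %p
  ret i32 %v
}

; Return attribute form; the _or_null variant additionally allows null.
declare dereferenceable_or_null(8) ptr @make()

; Metadata form, attached to a load that produces a pointer.
define ptr @indirect(ptr %pp) {
  %p = load ptr, ptr %pp, !dereferenceable !0
  ret ptr %p
}

!0 = !{i64 8}
```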

The change in semantics is believed to be backwards compatible - that is, legacy bitcode which expected the previous semantics will not miscompile. There may be a non-trivial performance (i.e. optimization quality) impact with this change. Please see detailed discussion later on.

A bit of history for context...

Our existing semantics (as implemented in the optimizer) were primarily scope based. Once an argument was marked dereferenceable, the underlying memory was assumed to *remain* dereferenceable until the end of the dynamic scope. It's been pointed out a couple of times that this didn't align with the way clang uses the attribute, and there were even mentions of potential miscompiles.
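A minimal sketch of where the two readings diverge (all names invented): under the old scoped semantics the load below could be speculated above both the call and the branch, while under point-in-time semantics the guarantee only covers function entry.

```llvm
declare void @unknown(ptr)

define i32 @caller(ptr dereferenceable(4) %p, i1 %c) {
entry:
  ; As far as the IR is concerned, @unknown may free the memory behind %p.
  call void @unknown(ptr %p)
  br i1 %c, label %use, label %done
use:
  ; Scoped semantics: %p is still known dereferenceable here, so this load
  ; could be hoisted to the entry block. Point-in-time semantics: the entry
  ; fact does not survive the call, so the hoist is no longer justified.
  %v = load i32, ptr %p
  ret i32 %v
done:
  ret i32 0
}
```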

The tension comes from the fact that the legacy semantics happen to be really useful for other frontends, and we've been hesitant to give up the optimization power they imply. We don't really have a great mechanism for context-sensitive dereferenceability.

The last iteration of this - from 2019 - eventually evolved into a proposal to split the attributes into two variants - one scoped, one point-in-time. You can see the history and discussion on D61652, but the proposal never landed.

Earlier this year, I returned to the topic and worked with Johannes (the author of the last attempt) to come up with a proposal which allowed us - we thought - to use the point-in-time semantics while recovering the scoped semantics in all practical cases. Unfortunately, this attempt failed as it turns out we don't have broad agreement on what some of our function and argument attributes actually mean. See D99100 and linked reviews/RFC for the history.

While this latest effort didn't succeed for *all* cases where the scoped semantics might be useful, it did succeed for one in particular - the GCed language use case which motivated my strong objections to Johannes' earlier proposals.

All of this is to say, we're going back to the original idea of just redefining the attributes to the weaker semantics, and giving up some optimization potential in the process.

Expected code quality impact

There may be a negative code quality (performance) impact on some workloads. In particular, there is significant uncertainty about the impact on Rust code compiled by rustc. On the C/C++/Clang side, no large impact has been identified to date, but we also have not run extensive benchmarks. (Reviewer help with that would be very appreciated!)

The risk of negative code quality impacts will be reduced for LTO and languages with larger modules. We have managed to strengthen the (on by default) inference of the nosync and nofree function attributes, and one of the few cases we got everyone on board with was that an argument to such a function couldn't be freed within its dynamic scope. As a result, attribute inference will tend to extend point-in-time semantics to scoped semantics for many common cases, provided we can see the whole call graph.
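As a rough sketch of that recovery (names invented): if the callee is known nofree nosync, nothing during its execution can free the argument's memory, so the entry-time fact effectively covers the whole body again.

```llvm
; With nofree nosync on the function, the memory behind %p cannot be freed
; while @pick runs, so dereferenceable(8) established at entry can be relied
; on anywhere inside the function, much as under the old scoped semantics.
define i32 @pick(ptr dereferenceable(8) %p, i1 %c) nofree nosync {
entry:
  br i1 %c, label %lo, label %hi
lo:
  %a = load i32, ptr %p
  ret i32 %a
hi:
  %q = getelementptr i8, ptr %p, i64 4
  %b = load i32, ptr %q
  ret i32 %b
}
```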

However, once we inline that function out of existence, we will tend to lose the strong deref fact in the caller's scope. This means there's a pass ordering interaction which is undesirable, but also hard to avoid as we don't yet have a fully context sensitive deref analysis. We may, in the future, need to consider implementing one. (To be clear, this is not a totally new problem. The old scoped semantics had the same issue for the argument attributes, which were the most heavily used.)

If you think you might be seeing a code quality impact, you can test with the -use-dereferenceable-at-point-semantics=false flag. Once confirmed, help with reproducing and reducing the issue would be very appreciated.

On test changes

I had previously made an effort to go through and add "nofree nosync" to tests where doing so seemed to preserve the spirit of the test. What's left are cases where a) the semantic change is meaningful, or b) tests which have been updated since my last sweep. Out of these, the only ones I really haven't investigated are the AMDGPU changes.


Event Timeline

reames created this revision. Sep 29 2021, 11:23 AM
reames requested review of this revision. Sep 29 2021, 11:23 AM
Herald added projects: Restricted Project, Restricted Project. Sep 29 2021, 11:23 AM
reames edited the summary of this revision. Sep 29 2021, 11:42 AM
nlopes accepted this revision. Oct 1 2021, 2:43 AM

LGTM. We have discussed this before. It's important to fix the reference types in C++.

llvm/test/Analysis/BasicAA/dereferenceable.ll:66

This one is unfortunate. They can't alias, because objsize(%obj) < the dereferenceable size of %ret. This case should be trivial to catch.
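A minimal sketch of the shape of case the inline comment is describing (names invented, not the actual test):

```llvm
declare dereferenceable(8) ptr @get()

define void @no_alias_by_size() {
  ; %obj is a 4-byte local, while %ret is known dereferenceable for 8 bytes.
  ; The comment's argument: since objsize(%obj) < 8, %ret cannot point into
  ; %obj, so BasicAA should be able to report NoAlias from the sizes alone.
  %obj = alloca i32
  %ret = call dereferenceable(8) ptr @get()
  store i32 0, ptr %obj
  store i64 1, ptr %ret
  ret void
}
```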

This revision is now accepted and ready to land. Oct 1 2021, 2:43 AM
nikic requested changes to this revision. Oct 1 2021, 3:26 AM

Sorry, but the fact that there is still no way to opt-in to the old behavior is still a blocker from my side. If we can't use dereferenceable + nofree arguments for that purpose, then we need to provide a different way to do that. Like dereferenceable + really_nofree. It looks like the current implementation doesn't even accept the dereferenceable + nofree + noalias case originally proposed (which is pretty bad from a design perspective, but would at least work fairly well for rustc in practice). I don't think that our current analysis capabilities are sufficient to land this change at this time.
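For concreteness, the kind of opt-in being asked for would look roughly like this at the IR level (a sketch with invented names; nofree here is the existing parameter attribute, and whether this combination restores the whole-call guarantee is exactly what is under discussion):

```llvm
declare void @opaque()

; A rustc-style reference argument: noalias, nofree and dereferenceable(4).
; The request is for some such combination to keep meaning "dereferenceable
; for the entire call", even under the new point-in-time default, so that the
; load below could still be speculated above the opaque call.
define i32 @use_ref(ptr noalias nofree dereferenceable(4) %r, i1 %c) {
entry:
  call void @opaque()
  br i1 %c, label %use, label %done
use:
  %v = load i32, ptr %r
  ret i32 %v
done:
  ret i32 0
}
```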

This revision now requires changes to proceed. Oct 1 2021, 3:26 AM
reames added a comment. Oct 1 2021, 3:25 PM

> Sorry, but the fact that there is still no way to opt-in to the old behavior is still a blocker from my side. If we can't use dereferenceable + nofree arguments for that purpose, then we need to provide a different way to do that. Like dereferenceable + really_nofree. It looks like the current implementation doesn't even accept the dereferenceable + nofree + noalias case originally proposed (which is pretty bad from a design perspective, but would at least work fairly well for rustc in practice). I don't think that our current analysis capabilities are sufficient to land this change at this time.

@nikic Do you have any specific examples of where this causes a workload to regress? At this point, I really need something specific as opposed to a general concern. We're at the point where perfection is very much the enemy of the good here. As noted, I've already spent a lot of time trying to minimize impact.

ychen added a subscriber: ychen. Oct 2 2021, 12:39 PM

This really needs to be properly benchmarked.

llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll:9

No longer happens.

llvm/test/Transforms/LICM/hoist-deref-load.ll:419

Regression in that C code?

reames added a comment.

@nikic ping on previous question. It's been a month, and this has been LGTMed. Without response, I plan to land this.

> This really needs to be properly benchmarked.

This has been benchmarked on every workload I care about, and shows no interesting regressions. Unfortunately, those are all non-public Java workloads.

On the C/C++ side, I don't have a ready environment in which to run anything representative. From the semantic change, I wouldn't expect C++ to show much difference, and besides, this is fixing a long standing fairly major correctness issue. If you have particular suites you care about, please run them and share results.

At this point, I strongly lean towards committing and letting regressions be reported. We might revert, or we might simply fix forward depending on what comes up.

nikic added a subscriber: aeubanks. Sat, Nov 13, 8:55 AM

> @nikic ping on previous question. It's been a month, and this has been LGTMed. Without response, I plan to land this.

Sorry, I did do some measurements but forgot to report back. The only run-time workloads I can easily measure are rustc check builds, where I observed regressions ranging between -0.6% and +4.8% across different projects (the -0.6% "regressions" indicate an improvement, i.e. that some deref-based optimization happens to be non-profitable). I'm not sure that helps you much, apart from saying that yes indeed, this does have some impact on Rust code. It's not catastrophic (though caveat emptor: the impact on "benchmarky" code may well be higher), but it's also avoidable.

> This really needs to be properly benchmarked.

> This has been benchmarked on every workload I care about, and shows no interesting regressions. Unfortunately, those are all non-public Java workloads.

I think something important to mention here is that the Java (i.e. GC'd language) case is also the only one where we don't really expect a regression, because it has good modeling under the proposed patch: GC'd pointers can't be freed during the optimization pipeline (until statepoint lowering), so they're simply not affected by this change. For that reason I don't think the fact that Java workloads don't see regressions really tells us anything about how this would behave for other frontends, which are mostly not GC'd.

> On the C/C++ side, I don't have a ready environment in which to run anything representative. From the semantic change, I wouldn't expect C++ to show much difference, and besides, this is fixing a long standing fairly major correctness issue. If you have particular suites you care about, please run them and share results.

Maybe @asbirlea or @aeubanks can run some benchmarks? I would expect regressions for C++, because nofree inference is very limited (e.g. won't be inferred for pretty much any code using growing STL containers). Though at least in the C++ case regressions may be somewhat justified, in that this fixes a correctness issue in some cases.

> At this point, I strongly lean towards committing and letting regressions be reported. We might revert, or we might simply fix forward depending on what comes up.

I'm not sure what you're expecting from that. At least as far as Rust is concerned, the problem here seems pretty well understood to me: you are dropping (without replacement) the ability to specify that an argument is dereferenceable within a function. I'm perfectly happy with the change in default behavior; all I want is a way to get the old one back. I don't think that having an example of that in "real" code is going to add any useful information.