This change shifts all of our mechanisms for specifying dereferenceability to be explicitly point-in-time. This change covers both argument and return attribute forms of both dereferenceable(N) and dereferenceable_or_null(N) attributes, as well as the parallel family of metadata.
The change in semantics is believed to be backwards compatible - that is, legacy bitcode which expected the previous semantics will not miscompile. There may be a non-trivial performance (i.e. optimization quality) impact with this change. Please see detailed discussion later on.
A bit of history for context...
Our existing semantics (as implemented in the optimizer), were primarily scope based. Once an argument was marked dereferenceable, the underlying memory was assumed to *remain* dereferenceable until the end of the dynamic scope. It's been pointed out a couple of times that this didn't align with the way clang uses the attribute, and there were even mentions of potential miscompiles.
The tension comes up from the fact that the legacy semantics happen to be really useful for other frontends, and we've been hesitant to give up the optimization power implied. We don't really have a great mechanism for context-sensitive dereferenceability.
The last iteration of this - from 2019 - eventually evolved into a proposal to split the attributes into two variants - one scoped, one point-in-time. You can see the history and discussion on D61652, but the proposal never landed.
Earlier this year, I returned to the topic and worked with Johannes (the author of the last attempt) to come up with a proposal which allowed us - we thought - to use the point-in-time semantics while recovering the scoped semantics in all practical cases. Unfortunately, this attempt failed as it turns out we don't have broad agreement on what some of our function and argument attributes actually mean. See D99100 and linked reviews/RFC for the history.
While this latest effort didn't succeed for *all* cases where the scoped semantics might be useful, it did succeed for one in particular - the GCed language use case which motivated my strong objections to Johannes' earlier proposals.
All of this is to say, we're going back to the original idea of just redefining the attributes to the weaker semantics, and giving up some optimization potential in the progress.
Expected code quality impact
There may be a negative code quality (performance) impact on some workloads. In particular, there is great uncertainty about the impact on rust code compiled by rustc. On the C/C++/Clang side, no large impact has been identified to date, but we also have not run extensive benchmarks. (Reviewer help with that would be very appreciated!)
The risk of negative code quality impacts will be reduced for LTO and languages with larger modules. We have managed to strengthen the (on by default) inference of the nosync and nofree function attributes, and one of the few cases we got everyone on board with was that an argument to such a function couldn't be freed within it's dynamic scope. As a result, attribute inference will tend to extend point-in-time semantics to scoped semantics for many common cases, provided we can see the whole call graph.
However, once we inline that function out of existence, we will tend to loose the strong deref fact in the callers scope. This means there's a pass ordering interaction which is undesirable, but also hard to avoid as we don't yet have a fully context sensitive deref analysis. We may, in the future, need to consider implementing one. (To be clear, this is not a totally new problem. The old scoped semantics had the same issue for the argument attributes, which were the most heavily used.)
If you think you might have code quality impact, you can test using the -use-dereferenceable-at-point-semantics=false flag. Once confirmed, help reproducing and reducing would be very appreciated.
On test changes
I had previously made an effort to go through and add "nofree nosync" to tests where doing so seemed to preserve the spirit of the test. What's left are cases where a) the semantic change is meaningful, or b) tests which have been updated since my last sweep. Out of these, the only ones I really haven't investigated are the AMDGPU changes.
this one is unfortunate.. They can't alias because objsize(%obj) < dereferenceable size of %ret. This case should be trivial to catch.