Page MenuHomePhabricator

[IR][RFC] Memory region declaration intrinsic
AcceptedPublic

Authored by lebedev.ri on Dec 7 2021, 11:23 AM.

Diff Detail

Unit TestsFailed

TimeTest
60,180 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg_mask_mf.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg_mask_mf.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg_mask_mf.c
60,170 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlsegff_mask.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv32 -target-feature +f -target-feature +d -target-feature +v -target-feature +zfh -disable-O0-optnone -fallow-half-arguments-and-returns -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vlsegff_mask.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV32 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vlsegff_mask.c
60,190 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vloxseg_mask_mf.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vloxseg_mask_mf.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vloxseg_mask_mf.c
60,450 msx64 debian > Clang.OpenMP::target_defaultmap_codegen_01.cpp
Script: -- : 'RUN: at line 8'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -no-enable-noundef-analysis -DCK1 -verify -fopenmp -fopenmp-version=50 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/OpenMP/target_defaultmap_codegen_01.cpp -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --allow-unused-prefixes -allow-deprecated-dag-overlap /var/lib/buildkite-agent/builds/llvm-project/clang/test/OpenMP/target_defaultmap_codegen_01.cpp --check-prefix CK1
60,210 msx64 debian > Clang.OpenMP::target_parallel_for_simd_codegen_registration.cpp
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/OpenMP/target_parallel_for_simd_codegen_registration.cpp -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --allow-unused-prefixes /var/lib/buildkite-agent/builds/llvm-project/clang/test/OpenMP/target_parallel_for_simd_codegen_registration.cpp
View Full Test Results (10 Failed)

Event Timeline

lebedev.ri created this revision.Dec 7 2021, 11:23 AM
lebedev.ri updated this revision to Diff 392578.Dec 7 2021, 3:50 PM

Some more blurb.

courbet added inline comments.
llvm/docs/LangRef.rst
20833

*a memory region

Maybe even something like:

annotates a pointer with memory region that is accessible through that pointer.

20842

Do we want to specify that end_offset > begin_offset ?

20868–20870

I have a hard time parsing that sentence, did you mean something like:

The only ``inbounds`` addresses that can be computed from pointer ``%ptr`` are those that lie within offsets ``[-8, 24]`` from that pointer.
Other offsets will result in a :ref:`poison value <poisonvalues>`.
20869

Given the semantics above, 24 should be excluded: [8,24)

lebedev.ri marked an inline comment as done.Dec 8 2021, 2:57 AM
lebedev.ri added inline comments.
llvm/docs/LangRef.rst
20842

end_offset can not be required to be bigger than begin_offset.
E.g. consider the following 32-byte region: [%ptr - 32, %ptr + 0],
where %ptr is the end pointer.

Likewise, i don't think we can specify that
the total region size is non-zero,
because then we can't declare empty memory regions,
where there only is a end pointer.

What i think is something that still needs to be better clarified,
is whether the begin_offset should specify the
offset from the beginning of the memory region to the pointer (i.e. be non-negative (begin_offset s>= 0)),
or the offset from the pointer to the beginning of the memory region (i.e. be non-positive (begin_offset s<= 0)),

I guess, given that end_offset specifies the offset from the pointer
to the end of the region, we want begin_offset to be symmetrical,
and specify the offset from the pointer to the beginning of the memory region
(i.e. be non-positive (begin_offset s<= 0)).

20869

Hm, no, why? Is the end pointer not inbounds?

courbet added inline comments.Dec 8 2021, 4:07 AM
llvm/docs/LangRef.rst
20842

end_offset can not be required to be bigger than begin_offset.
E.g. consider the following 32-byte region: [%ptr - 32, %ptr + 0],
where %ptr is the end pointer.

In that base, begin_offset is -32 and end_offset is 0, right ? So we still have begin_offset <= end_offset.

Likewise, i don't think we can specify that
the total region size is non-zero,
because then we can't declare empty memory regions,
where there only is a end pointer.

Yes you're right, obviously for the case of int a[0]. BTW there is an inconsistency between GNU C and C99 on this for flexible array members: in the former case we should not emit an annotation for this while we should for the latter.

What i think is something that still needs to be better clarified,
is whether the begin_offset should specify the
offset from the beginning of the memory region to the pointer (i.e. be non-negative (begin_offset s>= 0)),
or the offset from the pointer to the beginning of the memory region (i.e. be non-positive (begin_offset s<= 0)),

My vote goes to begin_offset >= 0 => memory region begins after the pointer.

lebedev.ri updated this revision to Diff 392711.Dec 8 2021, 4:29 AM
lebedev.ri marked 3 inline comments as done.

Misc wording improvements.

(@courbet i believe i have addressed your comments)

courbet added inline comments.Dec 10 2021, 1:14 AM
llvm/docs/LangRef.rst
20852–20855

I'm still confused as to why we need that restriction.

For example, why do we want to disallow the following:

%p2 = getelementptr inbounds i32, i32* %p1, i64 -42
%p2_bounded = @llvm.memory.region.decl.p0i8(%p2, 42, 43)
%p3 = getelementptr inbounds i32, i32* %p2_bounded, i64 42
%v = load i32, i32* %p3, align 4
lebedev.ri marked an inline comment as done.Dec 10 2021, 1:24 AM
lebedev.ri added inline comments.
llvm/docs/LangRef.rst
10364–10371

I'm still confused as to why we need that restriction.

nikic added a subscriber: nikic.Dec 13 2021, 3:02 AM
nikic added inline comments.
llvm/docs/LangRef.rst
20852–20855

At least as defined, %p2_bounded would already be poison in that case, because %p2_bounded would be before the declared memory region. The fact that %p3 would point back into the memory region doesn't matter in that case, because the pointer is already poison.

20896

Doesn't this sentence contradict the first sentence in "Semantics"? If you want to make a distinction between inbounds/non-inbounds, then I think you have to do that in terms of restricting the visible allocated object, rather than saying that any pointer based on it cannot be outside the range. That would mean that something like %p = memory.region.decl(%p0, 8, 16) would not be poison, though dereferencing it would be and doing an inbounds gep would be, while doing a non-inbounds gep by 8 and then dereferencing would be legal.

llvm/include/llvm/IR/Intrinsics.td
1197

While technically correct, annotating it Returned means that the intrinsic is simply going to be optimized away. You want to drop returned here and instead add the intrinsic to isIntrinsicReturningPointerAliasingArgumentWithoutCapturing(), which handles various intrinsics of this kind.

lebedev.ri marked 5 inline comments as done.

Rebased, addressed review notes.
Ping, i feel like i'd want to start making progress here.

Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 12:13 PM
lebedev.ri added inline comments.Mar 2 2022, 12:13 PM
llvm/docs/LangRef.rst
20896

Hmm, i do not remember why i've added that footprint here.
I believe we don't want to make a distinction between inbounds/non-inbounds.

I still believe this should be a generic assume.passthrough intrinsic with an operand bundle for the specific use case.
I briefly described that here: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154281.html

The intrinsic would also be speculatable, readnone, all the good stuff.
The resulting value has all the annotated properties *or* is poison.
This matches attributes on call sites and as such we can retain attributes from call sites:

call @foo(i8* %p, i8* align(16) nonnull %p);

will become

%arg1 = llvm.assume.passthrough(%p) ["align"(16), "nonnull"]
call @foo(i8* %p, i8* align(16) nonnull %arg1);

just before inlining. This way the information is retained properly.
Note that we should already retain it with llvm.assume if there is a noundef as well.

Does anybody else have any thoughts/opinions on this?

I still believe this should be a generic assume.passthrough intrinsic with an operand bundle for the specific use case.

Is that a blocker?
I'm guessing you also argue that the noalias stuff should likewise also be designed this way?

nlopes added a comment.Mar 6 2022, 4:27 AM

This concept of sub-objects also shows up in the 'restrict' stuff. The difference is that with 'restrict' the aliasing constraints are dynamic: the promise is that e.g. two (dynamic) accesses in the original program don't alias. Here the constrains are static.

Ideally we would have a single solution for all this stuff, but instead right now we have multiple disjoint solutions. Each one will have to go through the pain of patching all optimizers so they understand the new intrinsics so they don't throttle back. It's not ideal..
I just wish we could sit physically in some place for a few days and work on a solution once and for all.

It's also very hard to discuss designs when so much code has been written already (especially for the 'restrict' stuff).

Regarding this patch in particular, I would mention that the storage of the returned sub-object is shared with the parent object, just to make it explicit. The concept looks ok.
Again, my concern is that to be able to use this new intrinsic from clang by default will take a lot of work. You'll likely face many perf regressions before patching a bunch of optimizations.

Thanks for taking a look!

This concept of sub-objects also shows up in the 'restrict' stuff. The difference is that with 'restrict' the aliasing constraints are dynamic: the promise is that e.g. two (dynamic) accesses in the original program don't alias. Here the constrains are static.

Ideally we would have a single solution for all this stuff, but instead right now we have multiple disjoint solutions. Each one will have to go through the pain of patching all optimizers so they understand the new intrinsics so they don't throttle back. It's not ideal..

Right.

I just wish we could sit physically in some place for a few days and work on a solution once and for all.

It's also very hard to discuss designs when so much code has been written already (especially for the 'restrict' stuff).

FWIW, currently i don't have any further code for this.

Regarding this patch in particular, I would mention that the storage of the returned sub-object is shared with the parent object, just to make it explicit. The concept looks ok.

Done.

Again, my concern is that to be able to use this new intrinsic from clang by default will take a lot of work. You'll likely face many perf regressions before patching a bunch of optimizations.

I acknowledge this reality. I'm just not sure we have a better approach than that,
it would be good to come up with a "a solution once and for all",
but i'm not sure how that would look.

In general I like this and would be very happy if we had a better way of dealing with sub-objects in LLVM.

Looking at this from the CHERI perspective, it seems like we could enforce this intrinsic at runtime by lowering it the same way as we current do for the out-of-tree`@llvm.cheri.bounds.set()`. This intrinsic also creates a memory subregion, the only difference as far as I can see it is that negative start offsets have to be encoded by doing a negative GEP first since we only have a pointer+length argument. And of course that it is enforced at runtime by narrowing the bounds that are part of the CHERI pointer.

llvm/docs/LangRef.rst
20876

Do you envision this being used for all sub-object pointer creations? If so it might need a flag to disable it since it might break some C patterns such as container_of.

According to https://godbolt.org/z/evTbejaMf the container_of macro results in an inbounds GEP, so with sufficient inlining things might break?

About three years ago I spent quite a lot of time enforcing sub-object bounds at runtime using CHERI. Almost all code works just fine but there are things such as container_of() that require opt-out annotations. I wrote about the incompatibilities that I found in Chapter 5 of https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-949.pdf. TL;DR: not many changes needed - about 50 annotations across the entire FreeBSD source tree. Almost all annotations due to container_of or emulation of C++ inheritance in C.

lebedev.ri marked an inline comment as done.Mar 21 2022, 3:47 PM

Thank you for taking a look!

llvm/docs/LangRef.rst
20876

I'm not sure what you mean by "all sub-object pointer creations".

Roughly, front-ends should emit this intrinsic on some pointer with some bounds
iff they know that it would be UB to go *from that specific pointer* (aka, as per def-use)
outside of the specified bounds.

The one case we know of is C arrays within structs.

arichardson added inline comments.Mar 21 2022, 4:04 PM
llvm/docs/LangRef.rst
20876

By sub-object pointer creations I mean something like &obj->field. You could conceivable treat that as declaring a new sub-object bounded to just field. E.g. something like this: https://godbolt.org/z/bM7j1bxqs

lebedev.ri marked 2 inline comments as done.Mar 21 2022, 4:13 PM
lebedev.ri added a subscriber: aaron.ballman.
lebedev.ri added inline comments.
llvm/docs/LangRef.rst
20876

I think the question is slightly wrong. It's up to front-ends to decide when they can and can't emit this.

If you are asking about https://godbolt.org/z/adq5EWx17,
then as per previous conversations about this, i do believe that code to be well-defined and not UB.

IOW, i do not believe that as per the current C/C++ standards wording each data member of a struct
is it's own sub-object from which you are not allowed to get to it's neighbor objects,
But perhaps @aaron.ballman wants to correct me on this.

arichardson added inline comments.Mar 21 2022, 4:29 PM
llvm/docs/LangRef.rst
20876

Yes absolutely agreed that this is purely up to the frontend, I just assumed you had plans to update clang to emit the new intrinsic.

It has been a long time since I looked at the C standard with regards to subobjects but if I recall correctly you are right that this is not defined as being illegal. However, doesn't that also mean that you can access the member before/after an embedded array?

lebedev.ri marked 2 inline comments as done.Mar 21 2022, 4:39 PM
lebedev.ri added inline comments.
llvm/docs/LangRef.rst
20876

Right. Having a pointer to the array member of a struct isn't going to do anything.
The magic happens when you have a pointer to the *element* of the array member:
https://godbolt.org/z/cjE5bY4G4 <- manually crafted

aaron.ballman added inline comments.Mar 22 2022, 5:37 AM
llvm/docs/LangRef.rst
20876

IOW, i do not believe that as per the current C/C++ standards wording each data member of a struct is it's own sub-object from which you are not allowed to get to it's neighbor objects, But perhaps @aaron.ballman wants to correct me on this.

The rules on pointer arithmetic in C++: http://eel.is/c++draft/expr.add#4, so https://godbolt.org/z/adq5EWx17 looks like UB to me in C++. C2x 6.5.6p9 reads similarly, so it also looks like UB to me in C.

What language rules make you think otherwise?

lebedev.ri marked 2 inline comments as done.Mar 22 2022, 6:40 AM
lebedev.ri added inline comments.
llvm/docs/LangRef.rst
20876

Hm, thank you for correcting me, that does not match my recollection from the previous time we discussed this.
What about https://godbolt.org/z/8T54zxT43, is that pointer well-defined?

aaron.ballman added inline comments.Mar 22 2022, 6:59 AM
llvm/docs/LangRef.rst
20876

I believe that's also UB for the same reason -- the subtraction violates http://eel.is/c++draft/expr.add#4.3, so we can't say much about the resulting pointer value. Note, a slightly different case of using + 1 instead of - 1 is valid because any object can be treated as an array of one (https://eel.is/c++draft/basic.compound#3.sentence-12) and a one-past-the-end pointer is valid (http://eel.is/c++draft/expr.add#4.2) so long as it's not dereferenced.

lebedev.ri marked 2 inline comments as done.Mar 22 2022, 7:04 AM

@aaron.ballman thank you!

llvm/docs/LangRef.rst
20876

Right, +1 (end pointer) is non-dereferenceable, and is fine.

Hmmmm, that sounds too good to be true. Then i stand corrected,
i guess we could emit it even for such cases, although that will likely break code,
and i'm not sure if a sanitizer could be implemented to catch the UB.

Does anybody else have any thoughts/opinions on this?

I still believe this should be a generic assume.passthrough intrinsic with an operand bundle for the specific use case.

Is that a blocker?

No. Though, we should put a TODO somewhere if someone comes looking for a nice project to cleanup our messes.

lebedev.ri marked an inline comment as done.Mar 23 2022, 1:31 PM

Does anybody else have any thoughts/opinions on this?

I still believe this should be a generic assume.passthrough intrinsic with an operand bundle for the specific use case.

Is that a blocker?

No. Though, we should put a TODO somewhere if someone comes looking for a nice project to cleanup our messes.

Awesome! There has been a lot of approval of the proposal, but does anyone want to rubber-stamp it before i land this?

arichardson accepted this revision.Mar 23 2022, 1:45 PM

I am happy with this but I feel like someone else should also approve it :)

llvm/docs/LangRef.rst
20876

Not a sanitizer for x86, but you can use CHERI LLVM and compile with -cheri-bounds=subobject-safe to detect such violations at runtime. Code can then be run on QEMU or also on Arm's Morello boards that were recently released (if you happen to be one of the recipients).
We use that flag for the FreeBSD kernel and it works very well with only a few minor adjustments.

I believe that a sanitizer for non-CHERI hardware would require complete provenance (bounds) shadow memory for every pointer so would be a big engineering effort.

This revision is now accepted and ready to land.Mar 23 2022, 1:45 PM

FWIW, this sounds like a generally useful feature to me and I believe it can be used to better express some C and C++ semantics.

Ping. Do you plan to commit the change? Any blockage?

Ping. Do you plan to commit the change? Any blockage?

I don't know. I think i'm waiting for the dust to settle on the github pr debacle.

nikic added a comment.Apr 6 2022, 6:56 AM

Ping. Do you plan to commit the change? Any blockage?

I don't know. I think i'm waiting for the dust to settle on the github pr debacle.

Any context on that? I feel like I missed something here...

Anyway, I find the overall wording here still a bit confusing. I would put more emphasis on the fact that this effectively restricts the "allocated object" to a certain offset range, which should have the following three effects:

  • For non-inbounds GEPs, this should have no effect.
  • For inbounds GEPs, if the GEP goes outside the range [ptr+begin_offset, ptr+end_offset], the GEP result is poison.
  • For accesses, if the access is outside the range [ptr+begin_offset, ptr+end_offset-1], the behavior is undefined.

The current wording mostly emphasizes the middle point, but not so much the first and the last. And the fact that the "one past the end of the region" address is only valid for GEP inbounds but not for accesses is probably important for optimization purposes.

llvm/docs/LangRef.rst
20841

nit: "offset to" -> "offset from"?

llvm/include/llvm/IR/Intrinsics.td
1196

nit: Is the ReadNone here actually meaningful if the whole intrinsic is already IntrNoMem?

Ping. Do you plan to commit the change? Any blockage?

I don't know. I think i'm waiting for the dust to settle on the github pr debacle.

Any context on that? I feel like I missed something here...

The TLDR is that i don't feel like contributing to projects that continuously shit on their fellow contributors.
Migration from phab to github pull requests will likely be my tipping point.

Anyway, I find the overall wording here still a bit confusing. I would put more emphasis on the fact that this effectively restricts the "allocated object" to a certain offset range, which should have the following three effects:

  • For non-inbounds GEPs, this should have no effect.
  • For inbounds GEPs, if the GEP goes outside the range [ptr+begin_offset, ptr+end_offset], the GEP result is poison.
  • For accesses, if the access is outside the range [ptr+begin_offset, ptr+end_offset-1], the behavior is undefined.

The current wording mostly emphasizes the middle point, but not so much the first and the last. And the fact that the "one past the end of the region" address is only valid for GEP inbounds but not for accesses is probably important for optimization purposes.

brooks added a subscriber: brooks.Apr 8 2022, 1:47 PM
This comment was removed by brooks.
bsmith added a subscriber: bsmith.May 9 2022, 7:22 AM
Matt added a subscriber: Matt.Sep 3 2022, 11:52 AM