Page MenuHomePhabricator

[LangRef] Describe why the pointer aliasing rules are currently unsound.
Needs ReviewPublic

Authored by efriedma on Oct 5 2020, 4:04 PM.

Details

Summary

I'd like some short description in LangRef we can point to when questions come up, rather than the complicated discussion in https://bugs.llvm.org/show_bug.cgi?id=34548 .

Diff Detail

Unit TestsFailed

TimeTest
240 mslinux > Clang.Driver::riscv-cpus.c
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -target riscv32 -### -c /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/riscv-cpus.c 2>&1 -mcpu=rocket-rv32 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=MCPU-ROCKET32 /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/riscv-cpus.c

Event Timeline

efriedma created this revision.Oct 5 2020, 4:04 PM
Herald added a project: Restricted Project. · View Herald Transcript
efriedma requested review of this revision.Oct 5 2020, 4:04 PM

I'm not sure if I'd be any wiser after reading this. I mean, except that I might think "it seems to be complicated".

I'll try to think about whether there's any way to make this more readable. Maybe some examples would help?

I'll try to think about whether there's any way to make this more readable. Maybe some examples would help?

Yes. Some text and the one example that shows why int2ptr->ptr2int is not a no-op would suffice in addition to a link to the other resources.

aqjune added inline comments.Oct 5 2020, 11:37 PM
llvm/docs/LangRef.rst
2558

The contents below talk about loads that produce pointers as well - maybe the title should be more comprehensive?

2571

Let's talk about inttoptr in this paragraph first, and talk about load pointers as separate paragraph(s). I believe inttoptr already has quite a few things to talk about.

I think it is important to mention that, with this inttoptr semantics, inttoptr isn't a scalar operation anymore. It cannot be freely hoisted or sunk across pointer escaping instructions, but it is currently done by LLVM.
Considering that a function with inttoptr is currently tagged as readnone, this will bring a non trivial change in performance, so it will be desirable to keep inttoptr as scalar ops. I suggest making this explicit in the text.

Another possible semantics that would be good to mention here too is to make inttoptr i simply based on i.
The caveat of the semantics is that requires defining based-on relation between non-pointer values as well, which is super tricky; for example, given two integers i and j, if (i == j) use(i) cannot be optimized to if (i == j) use(j) anymore because i and j may be based on different objects. Also, there are big amount of int/float operations (shift, icmp, fptosi, ...) to deal with.

2578

FYI: Alive2 uses this semantics, and it returns poison when a pointer-byte is read as a non-pointer type (and vice versa). The main blocker is a load type canonicalization in InstCombine, which changes load pointer to load i64.

Thank you for the patch!

lebedev.ri added inline comments.Oct 5 2020, 11:46 PM
llvm/docs/LangRef.rst
2578
aqjune added inline comments.Oct 5 2020, 11:56 PM
llvm/docs/LangRef.rst
2578

Wow, this is super nice!!

nlopes added a comment.Oct 6 2020, 3:05 AM

My meta-comment about this patch is that I'm not sure LangRef is the right place for this content. I see LangRef as the stuff that is set in stone, not necessarily for ongoing discussions.
However, since LangRef doesn't get these bits right, it might be ok to have a warning section about stuff that is disputed/under discussion so that readers know that part is not set in stone.

llvm/docs/LangRef.rst
2586

my understanding is that for C we need to allow a pointer to be at the end of the object (not past, just at the end; you can't dereference it still). This is, for example, because of loops that increment pointers and check them against the end ptr.

That end pointer can have the same underlying machine value as the beginning of another object, thus making inttoptr complicated.
If the allocator guaranteed that this case didn't happen, it would make things easier, but for small objects that's probably a big penalty. And LLVM can't assume that would hold for custom allocators.

We support this semantics of control-flow contributing to the aliasing of an inttoptr result. To fold inttoptr(ptrtoint(ptr)) -> ptr one needs to prove that ptr is dereferenceable though.

My meta-comment about this patch is that I'm not sure LangRef is the right place for this content. I see LangRef as the stuff that is set in stone, not necessarily for ongoing discussions. However, since LangRef doesn't get these bits right, it might be ok to have a warning section about stuff that is disputed/under discussion so that readers know that part is not set in stone.

Given the feedback, I'm going to cut the bits describing the potential solutions, to try to keep it more readable.

efriedma added inline comments.Oct 6 2020, 9:43 AM
llvm/docs/LangRef.rst
2586

my understanding is that for C we need to allow a pointer to be at the end of the object

Well, for pointers we know are dereferenced, we could mark up the GEPs more aggressively. But yes, that's where this sort of model breaks down.

To fold inttoptr(ptrtoint(ptr)) -> ptr one needs to prove that ptr is dereferenceable though.

I guess if you have load(inttoptr(ptrtoint(ptr))), you can optimize that to load(ptr) if ptr is dereferenceable: the load would be UB if it wasn't equivalent. I don't think it helps more generally, though.