Page MenuHomePhabricator

[BitcodeReader] Allow reading pointer types from old IR
Needs ReviewPublic

Authored by sebastian-ne on Jun 14 2022, 2:09 AM.

Details

Summary

When opaque pointers are enabled and old IR with typed pointers is read,
the BitcodeReader automatically upgrades all typed pointers to opaque
pointers. This is a lossy conversion, i.e. when a function argument is a
pointer and unused, it’s impossible to reconstruct the original type
behind the pointer.

There are cases where the type information of pointers is needed. One is
reading DXIL, which is bitcode of old LLVM IR and makes a lot of use of
pointers in function signatures.
We’d like to keep using up-to-date llvm to read in and process DXIL, so
in the face of opaque pointers, we need some way to access the type
information of pointers from the read bitcode.

This patch allows extracting type information by supplying a function to
parseBitcodeFile that gets called for each function signature. This
function can access the type information via the reader’s type IDs and
the getTypeByID and getContainedTypeID functions.
The AccessBitcodeTypeInfo test exemplarily shows how type info from
pointers can be stored in metadata for use after the BitcodeReader
finished.

Diff Detail

Unit TestsFailed

TimeTest
60,100 msx64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -shared-libasan -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxDynamicConfig/TestCases/Output/scariness_score_test.cpp.tmp
60,120 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/scariness_score_test.cpp.tmp
60,070 msx64 debian > Clang.Driver::emit-reproducer.c
Script: -- : 'RUN: at line 1'; rm -rf /var/lib/buildkite-agent/builds/llvm-project/build/tools/clang/test/Driver/Output/emit-reproducer.c.tmp && mkdir /var/lib/buildkite-agent/builds/llvm-project/build/tools/clang/test/Driver/Output/emit-reproducer.c.tmp

Event Timeline

sebastian-ne created this revision.Jun 14 2022, 2:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 2:09 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
sebastian-ne requested review of this revision.Jun 14 2022, 2:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 2:09 AM
nikic added a comment.Jun 15 2022, 4:21 AM

I think exposing something like this is reasonable. My high-level question would be whether function signatures are the only place where this is needed or whether we would benefit from a more generic mechanism -- for example, does a similar use-case for calls exist?

beanz added a comment.Jun 15 2022, 7:11 AM

This approach seems reasonable to me, but I'm curious about a meta question here.

DXIL and SPIR-V (and I'm sure other targets as well), require some degree of support for typed pointers.

Might a more general annotation in the IR make sense, then the bitcode reader could annotate the module with type information which would persist after the module is created.

We have a guarantee that we can read older bitcode, but it seems weird to amend the bitcode reader to support a specific version of older LLVM IR.
What's the use case for wanting to read and preserve everything in the DXIL?

I think exposing something like this is reasonable.

That’s encouraging, thanks for the feedback.

My high-level question would be whether function signatures are the only place where this is needed or whether we would benefit from a more generic mechanism -- for example, does a similar use-case for calls exist?

Good point. As far as I know, dxil does not use indirect calls, so one can always do CallInst->getCalledFunction() and get types from the function signature.
I’ll try to get some opaque pointer support into our existing software to see if there’s more needed.

To allow extending the callback (potentially later), we could give the lambda a Value instead of a Function?

Might a more general annotation in the IR make sense, then the bitcode reader could annotate the module with type information which would persist after the module is created.

The elementtype(<ty>) attribute on function arguments sounded sounds quite fitting as a general annotation. It’s explicitly restricted to intrinsics though.
I think there’s more problems than preserving type information. As far as I understand, the SPIR-V backend wants to compile things like C code, which allows pointers and union casts, but I think SPIR-V and DXIL both don’t allow to re-interpret memory, so not sure how that’s handled at all.

I found one more place where type information from dxil needs to be extracted and that’s metadata.
The patch now adds a second callback, which allows changing metadata while it’s read in.
In the test, that’s used to replace a pointer value metadata with a tuple of the original value and metadata that stores its type information.
I wasn’t able to store type info about metadata without replacing it because the values get indistinguishable after being read (e.g. an i8* and an i32* both end up as a ptr).

The callback for function types now takes a Value, which should make it easy to extend for CallInstructions or others if that’s needed.