This is an archive of the discontinued LLVM Phabricator instance.

Support converting pointers from opaque to typed
ClosedPublic

Authored by beanz on Mar 22 2022, 4:19 PM.

Details

Summary

Using the pointer type analysis we can re-constitute typed pointers and
populate the correct types in the bitcasts throughout the IR.

This doesn't yet handle all cases, but this should be illustrative as to the dirction and feasability of
the solution.

Diff Detail

Event Timeline

beanz created this revision.Mar 22 2022, 4:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 4:19 PM
beanz requested review of this revision.Mar 22 2022, 4:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 4:19 PM

I'm guessing this isn't actually meant for upstreaming, is it?

beanz added a comment.Mar 23 2022, 5:34 AM

I'm guessing this isn't actually meant for upstreaming, is it?

It very much is. This is part of the work required to support DXIL generation. See: https://discourse.llvm.org/t/rfc-adding-hlsl-and-directx-support-to-clang-llvm/60783/

beanz edited the summary of this revision. (Show Details)Mar 23 2022, 5:35 AM

I'm guessing this isn't actually meant for upstreaming, is it?

It very much is. This is part of the work required to support DXIL generation. See: https://discourse.llvm.org/t/rfc-adding-hlsl-and-directx-support-to-clang-llvm/60783/

Then i don't believe this design as present is acceptable for inclusion into llvm.

beanz added a comment.Mar 23 2022, 5:41 AM

Then i don't believe this design as present is acceptable for inclusion into llvm.

Please elaborate.

beanz added a comment.Mar 23 2022, 7:27 AM

A few notes about this PR.

For those not familiar with the bitcode writing infrastructure, the ValueEnumerator assigns stable IDs to IR constructs (llvm::Values, llvm::Types...) so they can refer to each other. It is not directly involved in writing bitcode, and the general approach in the ValueEnumerator has been unchanged for a very long time.

I'm trying to balance two conflicting factors with the work to support DXIL:
(1) Keeping the DirectX backend small and easy to maintain by avoiding unnecessary code duplication
-and-
(2) Keeping the DXIL writer contained in the backend

This patch adds a small override to the ValueEnumerator to insert an i8* type at the beginning of the type list. Having i8* at the beginning of the list is required to satisfy TypeList ordering constraints that the 3.7 bitcode reader (and the modern one too) requires.

beanz updated this revision to Diff 418050.Mar 24 2022, 2:40 PM

Updating to use dxil::TypedPointerType, and handle FunctionTypes containing opaque pointers.

A few notes about this PR.

For those not familiar with the bitcode writing infrastructure, the ValueEnumerator assigns stable IDs to IR constructs (llvm::Values, llvm::Types...) so they can refer to each other. It is not directly involved in writing bitcode, and the general approach in the ValueEnumerator has been unchanged for a very long time.

I'm trying to balance two conflicting factors with the work to support DXIL:
(1) Keeping the DirectX backend small and easy to maintain by avoiding unnecessary code duplication
-and-
(2) Keeping the DXIL writer contained in the backend

This patch adds a small override to the ValueEnumerator to insert an i8* type at the beginning of the type list. Having i8* at the beginning of the list is required to satisfy TypeList ordering constraints that the 3.7 bitcode reader (and the modern one too) requires.

I would consider forking the bitcode writer and its dependencies like ValueEnumerator rather than trying to share code. Having to support 3.7 bitcode features in the "official" bitcode writer doesn't seem appealing, however small the features may be. I see 3.7 bitcode as a completely new backend rather than a slight variation on LLVM head IR. The bitcode reader is already quite complicated due to maintaining backward compatibility, I would prefer if that didn't happen with the bitcode writer.

beanz added a comment.Mar 24 2022, 2:58 PM

I would consider forking the bitcode writer and its dependencies like ValueEnumerator rather than trying to share code. Having to support 3.7 bitcode features in the "official" bitcode writer doesn't seem appealing, however small the features may be. I see 3.7 bitcode as a completely new backend rather than a slight variation on LLVM head IR. The bitcode reader is already quite complicated due to maintaining backward compatibility, I would prefer if that didn't happen with the bitcode writer.

That is a reasonable approach. Worth noting, the ValueEnumerator really just maps IR constructs to stable IDs, it is used by the bitcode writer, but other than a few cases of unclean abstractions, has no real binding to the bitcode format.

My patch does completely fork the bitcode writer, I'm just trying to share the ValueEnumerator and bitcode writer pass at the moment. I can also fork both of those too.

Sound good to fork BitcodeWriter for DXIL. When the masm-style assembler was added, there was also a decision that it should be forked to llvm/lib/MC/MCParser/MasmParser.cpp to avoid complexity to the GNU assembler/Darwin assembler style AsmParser.cpp

This approach looks good to me.

Depending on bitcast instructions inserted by the DXILPrepare pass feels slightly iffy. It's an assumption about the IR that is not covered by the standard "IR contract". Then again, it is quite common for the existing backends to have such assumptions, e.g. around legality in GlobalISel. So I think this is acceptable.

Making ValueEnumerator more generically useful makes a lot of sense to me, I really only have one concern about how that is done here, see the inline comment.

llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
365 ↗(On Diff #418050)

Can you explain why PrefixType is needed? It seems like a strange wart in the API.

If this is really necessary, I feel like a cleaner API for the ValueEnumerator itself would be to split the enumeration out of the constructor, so that it's used as (not sure where ShouldPreserveUseListOrder should go):

ValueEnumerator VE;
VE.EnumerateModule(M, ShouldPreserveUseListOrder);

You could then inject an EnumerateType in the DirectX backend. There may have to be a Finalize method that must be called exactly once.

I haven't thought about how that change would propagate through the BitCodeModuleWriter.

beanz updated this revision to Diff 422953.Apr 14 2022, 1:44 PM

Rebasing patches on the updated patch stack.

beanz added a comment.Apr 14 2022, 1:49 PM

@nhaehnle sorry, I realized this patch hadn't been rebased since the request to just fork the full bitcode writer.

The reason I add a "PrefixType" in the ValueEnumerator is because the ValueEnumerator's constructor does a module walk and registers types. Because Types can't be referenced out of order in the bitcode, I need to make sure to shove in the i8* type at the beginning.

My latest update to this patch keeps that behavior even though the full ValueEnumerator is forked, but with it forked I think there is a better solution I can implement to have it insert the pointer types in the right place, and move the PointerTypeMap into the ValueEnumerator (which probably makes more sense anyways).

I'll work on reworking this patch over the next day or two, but may not get an update uploaded until I get back from vacation on 4/25.

Thank you!

beanz updated this revision to Diff 425276.Apr 26 2022, 12:01 PM

Rebasing and cleaning up code.

kuhar added inline comments.May 8 2022, 7:06 PM
llvm/lib/Target/DirectX/DXILWriter/CMakeLists.txt
2

Could we use target_include_directories instead or a global ones?

beanz added inline comments.May 8 2022, 7:26 PM
llvm/lib/Target/DirectX/DXILWriter/CMakeLists.txt
2

include_directories sets the directory property not the global one, so it only applies to this directory and any subdirectories. There's no reason we couldn't use the target-specific one, but there is no material difference.

kuhar added inline comments.May 8 2022, 7:31 PM
llvm/lib/Target/DirectX/DXILWriter/CMakeLists.txt
2

I see this similar to using global variables in C++. There's no difference *now* but in my experience cmake tends to grow by copy-paste-style development and in X years we may have N targets here all with the same include directories. As we port these to other builds systems like GN or blaze, having clear target_ includes/libraries/definitions makes life much easier.

kuhar added inline comments.May 8 2022, 7:33 PM
llvm/lib/Target/DirectX/DXILWriter/CMakeLists.txt
2

s/blaze/bazel

beanz added inline comments.May 9 2022, 7:28 AM
llvm/lib/Target/DirectX/DXILWriter/CMakeLists.txt
2

I'm a lot concerned about this.

On a technical level, target_include_directories is used in 3 places in LLVM (all except for one in modules not lists), however include_directories is used almost two dozen times many of them throughout the Target subdirectories. The convention is to use include_directories. Using directory-based build settings is fundamental to the LLVM build system and has been since the days of the old Makefile build. Anyone translating CMake to another build system needs to understand that and emulate CMake correctly otherwise this will all go off the rails.

I don't think you can draw a parallel between global variables in C++ and directory-scope properties in CMake. They're not the same thing, and our entire build system is built around directory-scoped behaviors which isn't uncommon or unique to CMake. Our old Makefile build had directory-based variable overrides too.

While this may seem like a small change (and should have no material impact), you're asking me to deviate from established conventions in the codebase and I don't believe your reasons justify it. Which leads me to my even bigger concern...

On a policy level, we absolutely should not be considering the impacts on other build systems. The GN and Bazel builds were allowed in-tree on the strict policy that they were optional and would not impact the development or usage of the CMake build. Had anyone suggested in integrating GN or Bazel that we would alter our CMake conventions or usage to accommodate the new build systems they likely wouldn't have been allowed in-tree. The single biggest concern and complaint about adding GN and Bazel was that the community must have _only_ one build system. As the guy who spent 18 months cleaning up CMake so that we could delete the Makefile builds, that's a really important feature.

If, as a policy, we want to move away from directory-based build settings we should have that larger discussion before pushing one patch in a different direction.

kuhar added inline comments.May 9 2022, 7:54 AM
llvm/lib/Target/DirectX/DXILWriter/CMakeLists.txt
2

I don't have a strong opinion on this, this was just a suggestion. Sorry if this wasn't initially clear from my comment, I'll try to be more explicit in the future.

target_include_directories is used in 3 places in LLVM (all except for one in modules not lists)
While this may seem like a small change (and should have no material impact), you're asking me to deviate from established conventions in the codebase and I don't believe your reasons justify it.

I counted 80, but that's probably still not much compared to the size of the monorepo.

➜ ~/llvm/llvm-project git:(main)                                                   
$ ag target_include -Gtxt | wc -l                                                  
80

compared to:

$ ag include_directories -Gtxt | grep -v target_ | wc -l
215

So unless I counted wrong, I wouldn't say that using target_include_directories would be a deviation from the existing practices in llvm-project.

On a policy level, we absolutely should not be considering the impacts on other build systems.
...

Other build systems don't consume cmake files in any way. What I wrote was a hand-wavy argument that being explicit with target_ makes it easier to understand what cmake targets consist of, and to support that claim, in my experience, using target_ makes it easier to port builds to other (more explicit) build systems.

If, as a policy, we want to move away from directory-based build settings we should have that larger discussion before pushing one patch in a different direction.

I'm not an expert on cmake by any means, but my impression was that this shift was largely agreed on and part of 'modern cmake': https://cliutils.gitlab.io/modern-cmake/chapters/basics.html#:~:text=Targets%20are%20your,an%20include%20directory%3A

On the high level, please don't spend too much time on this if you don't agree with this suggestion.

dblaikie added a project: Restricted Project.May 9 2022, 10:44 AM
pete added a comment.May 24 2022, 10:00 AM

This looks good to me.

Only thing I can think to add is a test for non-zero address spaces. So probably the gep/load/store tests you have already but an address space of 1 (or any address space that makes sense in DXIL, not sure if any are special).

beanz updated this revision to Diff 433085.May 31 2022, 7:35 AM

Updating to handle addrspaces correctly.

pete accepted this revision.Jun 6 2022, 9:34 AM

Updating to handle addrspaces correctly.

Thanks! Given that change, this all LGTM.

This revision is now accepted and ready to land.Jun 6 2022, 9:34 AM
This revision was landed with ongoing or failed builds.Jun 6 2022, 9:39 AM
This revision was automatically updated to reflect the committed changes.