Page MenuHomePhabricator

[WebAssembly] WIP: Add support for reference types
AbandonedPublic

Authored by pmatos on Aug 9 2019, 3:26 PM.

Details

Summary

This is a quick-and-dirty straw man implementation of reference-types
support for the WebAssembly backend. anyref value types are represented
as non-integral pointers in addrspace(1) and tables are represented as
external global in addrspace(1) pointing to pointers also in addrspace(1).

The bulk of the work here is just adding all the boiler plate for multiple tables
and the new instruction encodings. The ISel part of this is very primitive
and just the quickest thing I could possibly do to play with this. In its current
state, I've gotten it to emit .o files that I think are correctly formed,
but I haven't been able to test them properly (I think at least some of the
problem is browser bugs I'm looking into).

This isn't ready for review or anything, but I figured
a) This might save somebody else some time in implementing the boiler plate
b) It could serve as a starting point to discuss how to represent this in IR

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Harbormaster completed remote builds in B36554: Diff 214467.
fitzgen added a subscriber: fitzgen.Aug 9 2019, 3:38 PM
tlively added a subscriber: tlively.Aug 9 2019, 4:22 PM

Thank you! We also have been floating some ideas about supporting reference types as types in another address space, but haven't actually started implementing it AFAIK. I like the general direction. Let's wait for other people's comments too.

  • Are you planning to land this at some point (after adding some tests and such), or is this purely for discussions for the direction? If it's the latter, maybe I don't need to comment in the code in too much detail.. But anyway I did, so please ignore them if you are not actually planning to land this.
  • Even if not to the extent to cover the whole CL, I think a few test cases still would be helpful for discussions to see how reference types can be represented in .ll files and printed in .s files.
  • There are many places that add anyref but not funcref.
  • Out of curiousity, is Julia planning to use reference types? If so, for which concept?
  • I know it is not finished, but please clang-format if you are gonna land this later.
  • Maybe you've already seen this, but if we end up landing this, we should update the linking spec in the tool convention, because we add a new relocation type.
lld/wasm/WriterUtils.cpp
235

Why not funcref?

llvm/include/llvm/BinaryFormat/WasmRelocs.def
18

I guess this is for the index to distinguish tables, not elements within the table, right? The name is not very distinguishable with R_WASM_TABLE_INDEX_SLEB and R_WASM_TABLE_INDEX_I32 above, so I guess we need a new name or something

llvm/include/llvm/MC/MCExpr.h
293

Do we need this? Other symbol kinds (global, function, and events) don't have corresponding VK_WSM_ entries here. VK_WASM_TYPEINDEX is left for some other reason (D58472).

llvm/lib/MC/WasmObjectWriter.cpp
1204

Can't this be other reference types, like FUNCREF?

llvm/lib/Object/WasmObjectFile.cpp
823

event -> table

995

Why should we only allow anyref?

llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
51

Global -> Table

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1155

It looks it assumes at this point we are not PIC. Maybe we can assert in if (isPositionIndependent()) { above that nonzero address space is not supported?

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
63

How are we gonna represent other reference types, such as funcref or exnref?

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td
149

Not sure what this means. Isn't this the type of table itself, not that of the elements?

llvm/lib/Target/WebAssembly/WebAssemblyInstrTable.td
6
  • Nit: For these two lines: We usually don't seem to have a whitespace after : for *.td files
  • Is the reason for using offset instead of index here that you plan to use named operand table methods?
11

I guess the name should be anyref.table.get to follow the wasm backend convention (even though we don't have separate instructions). And because "table.get" part is the constant, we only need to parameterize the prefix. This can be similar to something like here.

13

I guess rules to translate instructions for non-zero indices have not been implemented here, right?

llvm/lib/Target/WebAssembly/WebAssemblyRegisterInfo.td
48

Real nit: maybe we don't need a newline between reference types

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
144

Is this gonna support existing .ll files with the previous data layout?

wingo added a subscriber: wingo.Aug 12 2019, 6:38 AM

Overall direction looks good to me as well.

Ditto that tests would be useful just to make it clearer how this is represented in IR.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
56

We'll almost-surely want to pull this 1 out into a constant, WasmAnyrefAddressSpace or something

x86 uses address spaces starting at 256 and counting up for its architecture-specific address spaces. The docs say "Address spaces 1-255 are currently reserved for user-defined code." so we should consider using 256 here.

vchuravy commandeered this revision.Oct 2 2019, 1:31 PM
vchuravy added a reviewer: loladiro.
vchuravy updated this revision to Diff 222900.Oct 2 2019, 1:31 PM
  • change anyref AS to 256
  • fix some comments and address nit

Keno has asked me to take this over for him and I will work on getting this into shape so that it can get landed.

Another approach to reference types we should look at is clang's upcoming sizeless types support (https://reviews.llvm.org/D62962, RFC: http://lists.llvm.org/pipermail/cfe-dev/2019-June/062523.html). This would allow reference types to be constructed at the source level but most operations such as sizeof, loading, and storing would be disallowed because they would be treated as incomplete types.

I don't think that approach is inconsistent with this one, though. This patch deals mostly with the backend while sizeless types are a frontend concept. I'm not sure what the codegen and lowering would look like, though.

It would be great to see some test .ll files here so we can get a better understanding for what is implemented here. I would also like to see tests of what happens if you try to do a ptrtoint or some other illegal operation on a reference type.

llvm/lib/Target/WebAssembly/WebAssembly.h
88

I assume other reference types will get their own address spaces so we can differentiate them in IR. Would it then make sense to put the table itself in another address space? We should also think about how best to partition the address space space looking forward to when we will have multiple memories, multiple tables, and multiple function references or reference type imports.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1121

I think this message needs to be updated for the new AS numbers.

One other thought: Does the implementation in this diff care what type the reference type pointers are pointing to? It would be nice if we could enforce the use of an opaque pointee type to prevent the reference types from being dereferenced.

vchuravy updated this revision to Diff 231343.Nov 27 2019, 7:42 PM
vchuravy marked 6 inline comments as done.
  • add test for passing anyref through a function
  • fix wrong name for funcref
  • fixes and formatting
  • fix error message
vchuravy marked 4 inline comments as done.Nov 27 2019, 7:52 PM

Rebased onto current master and added an initial test. (I will start adding more as I start integrating this with the rest of the toolchain)

llvm/lib/Object/WasmObjectFile.cpp
995

Not sure what Keno meant by this comment, but it might be to check that reference-types is enabled before allowing it as a element type.

llvm/lib/Target/WebAssembly/WebAssembly.h
88

Yes. funcref should get its own address space but I am not sure about its usage yet.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
63

The question is which of these types need representing. As far as I can tell exnref and is produced by LLVM already and funcref is primarily used for function tables in modules and isn't necessarily a frontend facing time. Albeit nothing in the spec says that it couldn't be.

If funcref would need a frontend facing representation it would need a new AS.

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
144

I don't see why it shouldn't without activating -mattr=+reference-types new .ll files won't compile,
but old one should.

vchuravy updated this revision to Diff 231460.Nov 28 2019, 12:08 PM

support old DL modules

vchuravy updated this revision to Diff 231462.Nov 28 2019, 12:18 PM

restore previous changes

vchuravy marked an inline comment as done.Nov 28 2019, 12:23 PM
vchuravy added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
144

I see what you meant know after building emscripten/binaryen with these changes, I made the DL changes depending on the feature flag.

vchuravy updated this revision to Diff 231601.Nov 30 2019, 2:59 PM
  • fix AS in anyref testfile
yowl added a subscriber: yowl.May 20 2020, 1:42 PM
pmatos added a subscriber: pmatos.Jul 10 2020, 2:35 AM

I am interested in continuing this work and have a patch in progress based on the current available one here. Should I post the new patch here or under a new bug?

vchuravy abandoned this revision.Jul 27 2020, 7:30 AM

I am interested in continuing this work and have a patch in progress based on the current available one here. Should I post the new patch here or under a new bug?

Feel free to commandeer this revision

I am interested in continuing this work and have a patch in progress based on the current available one here. Should I post the new patch here or under a new bug?

Feel free to commandeer this revision

Thanks. I will post something soon.

pmatos commandeered this revision.Jul 28 2020, 2:27 AM
pmatos added a reviewer: vchuravy.
pmatos updated this revision to Diff 281154.Jul 28 2020, 2:29 AM

Initial implementation of reference types in the WebAssembly backend

Please ignore my .gitlab-ci.yml. That's just an internal change that I got uploaded by mistake.
I am looking to see this through and start discussion on this with the goal of landing it.

At the moment, for example this test crashes:

struct {
  __attribute__((address_space(256))) char *a[0];
} b;
void c() {
  for (int d = 0; d < 10; d++)
    b.a[d] = 0;
}

This picked up from previous work by @vchuravy. I am still surprised by, and don't understand the reasoning behind, using a an i8 * for the externref representation. If anything I would expect to see i32 *.
In any case, the test above crashes in loop vectorization in TargetLowering.h getValueType because externref is casted just fine to a pointer type and it shouldn't be since externref is supposed to be opaque.

I would be keen to hear some comments and suggestions going forward on this.

There's also this line of work on opaque types that could be potentially interested but seems far from being landed: https://groups.google.com/g/llvm-dev/c/Dw_DYSXGFto/m/OzzK-CkGAwAJ

Please ignore my .gitlab-ci.yml. That's just an internal change that I got uploaded by mistake.
I am looking to see this through and start discussion on this with the goal of landing it.

It's great to see this patch being picked up again!

At the moment, for example this test crashes:

struct {
  __attribute__((address_space(256))) char *a[0];
} b;
void c() {
  for (int d = 0; d < 10; d++)
    b.a[d] = 0;
}

This picked up from previous work by @vchuravy. I am still surprised by, and don't understand the reasoning behind, using a an i8 * for the externref representation. If anything I would expect to see i32 *.
In any case, the test above crashes in loop vectorization in TargetLowering.h getValueType because externref is casted just fine to a pointer type and it shouldn't be since externref is supposed to be opaque.

I'm not aware of any particular reason to use i8* and I would expect i32* to work as well. Certainly once LLVM has opaque pointers, we will want to take advantage of those. What is this test case supposed to do? As far as I can tell, it's not very meaningful to have reference types in a struct since they can't be stored in memory. Is the zero assignment supposed to become a nullref assignment?

I would be keen to hear some comments and suggestions going forward on this.

I am curious about how you plan to use this functionality from frontends going forward and what exact use cases you are aiming to support.

clang/lib/Basic/Targets/WebAssembly.cpp
43 ↗(On Diff #281154)

I would split the target-features changes out into a separate patch that we can land right away. There is utility in having the feature flag available to be put into the target-features section even if LLVM cannot itself emit reference types usage yet.

183 ↗(On Diff #281154)

It would be good to have a comment about what is different in this data layout string. Also, is it really necessary to have a different data layout string when reference types are enabled?

llvm/include/llvm/CodeGen/ValueTypes.td
169

Do we need funcref here as well, or do we just use externref everywhere and infer separate funcref types later?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4091

What is this change for?

pmatos updated this revision to Diff 281540.Jul 29 2020, 5:47 AM

Update patch to compile against current master branch

I will be splitting the part enabling the target feature through clang into a separate revision as suggested by @tlively

clang/lib/Basic/Targets/WebAssembly.cpp
43 ↗(On Diff #281154)

Sure - I will create a separate revision for this bit then.

183 ↗(On Diff #281154)

ni will define 256 as a non-integral address space. This is required for externref but I don't think it hurts to have it even if reference types is disabled but it feels cleaner to not have it if not needed.

llvm/include/llvm/CodeGen/ValueTypes.td
169

That's the plan for now. I think I haven't seen the use of having funcref here as well so I haven't added it.

I will be splitting the part enabling the target feature through clang into a separate revision as suggested by @tlively

I just noticed that most of this work landed in an earlier commit: https://github.com/llvm/llvm-project/commit/764f4089e89e4693b7bb8f1ee18080703ce760dd
I should have seen it but I based my work initially on 10.0.0 so failed to notice that.

pmatos updated this revision to Diff 282194.Jul 31 2020, 5:04 AM

Update externref patch

I will be splitting the part enabling the target feature through clang into a separate revision as suggested by @tlively

I just noticed that most of this work landed in an earlier commit

Sorry, I forgot about that!

It would be good to split this into (at least) two patches. The first should do the minimal testable amount of work to get instruction selection working, and follow-on patches can add the other parts, like additions to the object file format. Part of the reason for that split is that different people will be better at reviewing those different parts of the patch.

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
131–133

I'm not sure why we're starting with address space 256. The AMDGPU backend, for example, uses address spaces 1-7, so I think it would be ok for us to start at 1.

I also think it's somewhat strange for target features to change the datalayout. I looked at a handful of other targets and most of their data layouts were determined by the triple alone, if possible, and some considered other granular settings like the CPU. I don't really understand the discussion about compatibility that @vchuravy and @aheejin had below, though.

llvm/test/CodeGen/WebAssembly/externref.ll
6 ↗(On Diff #282194)

This would probably be better modeled by

%extern = type opaque
%externref = type %extern addrspace(256)*

rather than using i8 pointers. The LLVM IR validator would then disallow loads and stores from %externrefs.

pmatos abandoned this revision.Aug 5 2020, 3:40 AM