Context: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md
This is just a first step, adding the new instruction variants while keeping the existing 32-bit functionality working.
Some of the basic load/store tests have new wasm64 versions that show that the basics of the target are working.
Further features need implementation, but these will be added in followups to keep things reviewable.
Details
Diff Detail
Event Timeline
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td | ||
---|---|---|
62 | side note: since we are about to start depending even more on the asm syntax, and having users try to use it, it's probably getting to now-or-never on this fix. | |
llvm/test/CodeGen/WebAssembly/cpus.ll | ||
4 | probably not to fix in this CL, but triple=wasm64 and -mcpu=mvp probably shouldn't be allowed. | |
llvm/test/CodeGen/WebAssembly/load-ext.ll | ||
5–6 | should the datalayout be removed here too? | |
llvm/test/MC/WebAssembly/wasm64.s | ||
3 | Are we able to check the encoding of the instructions yet? See e.g. https://github.com/llvm/llvm-project/blob/master/llvm/test/MC/WebAssembly/atomics-encodings.s |
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td | ||
---|---|---|
62 | Is this about swapping offset & align? I could do that in a next commit, though personally this doesn't sound important to me. @sbc100 opinion? | |
llvm/test/CodeGen/WebAssembly/cpus.ll | ||
4 | I was actually thinking about that, but looking at the code that deals with the cpu flags (in WebAssembly.cpp) and couldn't see a good place to error out on this. Maybe in setABI ? | |
llvm/test/CodeGen/WebAssembly/load-ext.ll | ||
5–6 | Yes, forgot to delete it here. I assumed it was better to rely on whatever is default for wasm32/64, but not familiar with under what conditions this could produce values other than the ones in datalayout. | |
llvm/test/MC/WebAssembly/wasm64.s | ||
3 | Yes, we should be able to add that to this test. Not sure what the point is though, since all load/store ops have exactly the same encoding as in wasm32. Frankly, this test doesn't test anything, since the only thing that changes is that load/store now accept 64-bit operands. We need to test that those are being emitted by LLVM (which we don't test here), and that that combination is accepted by verifiers (which we also don't test here, the assembler doesn't verify types, and happily constructs illegal instruction sequences). What is new here (and is currently commented out pending new reloc types) is i64.const label, but we can't check the encoding of that, since at the .o level it will just be all zeroes. |
llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyWasmObjectWriter.cpp | ||
---|---|---|
95 | Might be worth a separate case for fixup_sleb128_i64 since we know they can only be memory addresses. And we probably want R_WASM_MEMORY_ADDR_SLEB64 and R_WASM_MEMORY_ADDR_I64 as new relocation types. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td | ||
---|---|---|
62 | Yes, it's about swapping offset and align. I don't have a strong opinion on that, other than that we should decide whether or not to do it, and remove/update the comment as applicable. One argument for keeping things the same might be that offsets are much more common (at least when hand-writing) than alignments, and a format like <align>:<offset> might require an alignment when there otherwise might not need to be one. | |
llvm/test/CodeGen/WebAssembly/cpus.ll | ||
4 | yeah I guess the feature flags are a bit entangled in the CPU. If we can it would be good to error on any cpu setting (or feature flag setting) that doesn't support memory64. | |
llvm/test/CodeGen/WebAssembly/load-ext.ll | ||
5–6 | In general everything should have an explicit datalayout chosen by the frontend (also the default datalayout for a target is AFAIK the same as what clang uses explicitly when there are no ABI-affecting cflags). But when there is no frontend and it's just tests, it gets a little more interesting, and it sometimes makes sense to have tests to double duty. I think it's fine to leave it out, as long as we are actually intending to run the test 2 different ways and we've verified that we are testing what we intend to test. Otherwise it should have an explicit datalayout just because it's a general default thing to do. | |
llvm/test/MC/WebAssembly/wasm64.s | ||
3 | Oh right I had forgotten that the encodings were all the same. |
llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyWasmObjectWriter.cpp | ||
---|---|---|
95 | Yup, doing that in the follow-up, if that's ok. |
Question: Is this feature gonna be handled by -mtriple and not -mattrs like other features?
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
131 | Patterns and instruction definitions in WebAssemblyInstrMemory.td are defined as multiclasses to group A32 and A64 versions. I think it's better to be consistent here too..? The same for all patterns and defs for atomic.wait, atomic.notify and AStoreOffset. | |
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td | ||
44 | Nit: Can we set do reqs = [] as a default argument so we don't need [] in many cases below? | |
72 | Requires only affects patterns and not instruction definitions. Not sure if it is Tablegen bug... So I think it's better to have Requires<HasAddr64> in patterns, here and others, like multiclass LoadPatNoOffset<ValueType ty, PatFrag kind, string inst> { def : Pat<(ty (kind I32:$addr)), (!cast<NI>(inst # "_A32") 0, 0, I32:$addr)>; def : Pat<(ty (kind I64:$addr)), (!cast<NI>(inst # "_A64") 0, 0, I64:$addr)>, Requires<[HasAddr64]>; } The same for other patterns in this file, atomics and SIMD. | |
251 | Don't we need reqs parameter and Requires<!listconcat(reqs, [HasAddr64])> here as in WebAssemblyLoad here? But actually the funny thing is Requires doesn't work in instruction definitions.. Not sure if it's a Tablegen bug. We have them on definitions anyway, and I think it's ok, and and if we have them it's better to be consistent. | |
llvm/test/MC/WebAssembly/wasm64.s | ||
107 | If the only thing this file tests currently is whether LLVM accepts this file without an error, do we need to have these CHECK lines, which are the same as the original program? Roundtripping capability is tested in other files already. I might be mistaken, so please let me know if so. |
@aheejin yes, this is part of the triple, much like x86 vs x64. That structure was already in place before I started.
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
131 | I tried to stay close to the original definitions. I do not feel the amount of duplication here is excessive to warrant trying to factor things out into more common definitions, given that trying to abstract things in tablegen leads to less readable results than in a normal programming language. | |
llvm/test/MC/WebAssembly/wasm64.s | ||
107 | Yes, the value of these CHECKs is low at the moment, but they check a little bit more than just the instruction being accepted, since they convert the instruction back to text. There is nothing in here that likely isn't covered by other tests, but that is hopefully changing soon. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
131 | I'm not sure the structure of original definition of these is different from that of InstrMemory.td. For ImmOff patterns, here we have class WaitPatImmOff<ValueType ty, Intrinsic kind, PatFrag operand, NI inst> : Pat<(i32 (kind (operand I32:$addr, imm:$off), ty:$exp, I64:$timeout)), (inst 0, imm:$off, I32:$addr, ty:$exp, I64:$timeout)>; def : WaitPatImmOff<i32, int_wasm_atomic_wait_i32, regPlusImm, ATOMIC_WAIT_I32>; def : WaitPatImmOff<i32, int_wasm_atomic_wait_i32, or_is_add, ATOMIC_WAIT_I32>; def : WaitPatImmOff<i64, int_wasm_atomic_wait_i64, regPlusImm, ATOMIC_WAIT_I64>; def : WaitPatImmOff<i64, int_wasm_atomic_wait_i64, or_is_add, ATOMIC_WAIT_I64>; And in WebAssemblyInstrMemory.td we have class LoadPatImmOff<ValueType ty, PatFrag kind, PatFrag operand, NI inst> : Pat<(ty (kind (operand I32:$addr, imm:$off))), (inst 0, imm:$off, I32:$addr)>; def : LoadPatImmOff<i32, load, regPlusImm, LOAD_I32>; def : LoadPatImmOff<i64, load, regPlusImm, LOAD_I64>; def : LoadPatImmOff<f32, load, regPlusImm, LOAD_F32>; def : LoadPatImmOff<f64, load, regPlusImm, LOAD_F64>; ... And I don't think the amount of duplication is excessive either, but I suggested this more for consistency, because other patterns are grouped with multiclasses. BinRMW and TerRMW related patterns in this file are grouped as well, even though they have the same number of entries with these patterns. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
131 | Agree it would be more consistent, but to make the change in this particular example requires doing instruction string concatenation (which I had to do in other cases to reduce the amount of duplication), whereas here it would just make things worse. I can make the change if you feel strongly about it, but I don't think it actually makes anything better. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
131 | I don't know why string concatenation will be specially ugly only here, given that we are doing it everywhere else? Is there something I'm missing that something is different here? (And AStore patterns too) I'm not sure what would be the reason for grouping multiclass in most of other patterns but not some of them, that's all. If the reason is there's not enough entries, BinRMW and BinTer have the same number of entries with this one too. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
131 | Just looked if I could make this happen. Making ATOMIC_I generate 2 variants is not as easy as for WebAssembyLoad, since it takes full dag arguments, which have several components that would have to be replaced with !subst or whatever.. this would get tricky. Adding a multi-class just for every pair of instructions would actually make the code bigger, and certainly less readable. For the patterns, you're looking at changing this: class WaitPatNoOffset<ValueType ty, Intrinsic kind, WebAssemblyRegClass rc, NI inst> : Pat<(i32 (kind rc:$addr, ty:$exp, I64:$timeout)), (inst 0, 0, rc:$addr, ty:$exp, I64:$timeout)>; def : WaitPatNoOffset<i32, int_wasm_atomic_wait_i32, I32, ATOMIC_WAIT_I32_A32>; def : WaitPatNoOffset<i64, int_wasm_atomic_wait_i64, I32, ATOMIC_WAIT_I64_A32>; def : WaitPatNoOffset<i32, int_wasm_atomic_wait_i32, I32, ATOMIC_WAIT_I32_A32>; def : WaitPatNoOffset<i64, int_wasm_atomic_wait_i64, I32, ATOMIC_WAIT_I64_A32>; def : WaitPatNoOffset<i32, int_wasm_atomic_wait_i32, I64, ATOMIC_WAIT_I32_A64>; def : WaitPatNoOffset<i64, int_wasm_atomic_wait_i64, I64, ATOMIC_WAIT_I64_A64>; def : WaitPatNoOffset<i32, int_wasm_atomic_wait_i32, I64, ATOMIC_WAIT_I32_A64>; def : WaitPatNoOffset<i64, int_wasm_atomic_wait_i64, I64, ATOMIC_WAIT_I64_A64>; into: multiclass WaitPatNoOffset<ValueType ty, Intrinsic kind, string inst> { def : Pat<(i32 (kind I32:$addr, ty:$exp, I64:$timeout)), (!cast<NI>(inst#_A32) 0, 0, I32:$addr, ty:$exp, I64:$timeout)>; def : Pat<(i32 (kind I64:$addr, ty:$exp, I64:$timeout)), (!cast<NI>(inst#_A64) 0, 0, I64:$addr, ty:$exp, I64:$timeout)>; } defm : WaitPatNoOffset<i32, int_wasm_atomic_wait_i32, "ATOMIC_WAIT_I32">; defm : WaitPatNoOffset<i64, int_wasm_atomic_wait_i64, "ATOMIC_WAIT_I64">; defm : WaitPatNoOffset<i32, int_wasm_atomic_wait_i32, "ATOMIC_WAIT_I32">; defm : WaitPatNoOffset<i64, int_wasm_atomic_wait_i64, "ATOMIC_WAIT_I64">; If you think that is better, I'll go make similar changes below. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
131 | (Chatted offline and posting the summary here) I meant grouping patterns, and wasn't talking about making two variants of ATOMIC_I with dags and everything. This CL makes grouped patterns in WebAssemblyInstrSIMD.td but doesn't touch SIMD_I itself. That's what I meant in this file too. |
Some lines in WebAssemblyInstrAtomics.td and WebAssemblyInstrSIMD.td look they exceed 80 cols, so please fix that too :)
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
120 | These two patterns are still ungrouped.. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrAtomics.td | ||
---|---|---|
120 | introducing a similar structure to the patterns below for these would create a multiclass with 2 patterns only called once, which is strictly worse than what is here |
llvm/test/CodeGen/WebAssembly/cpus.ll | ||
---|---|---|
4 | We have a bunch of errors for incompatible features and options in the driver code in clang already. That would probably be a good place for these checks, too. As for the default cpu for the wasm64 triple, we could either do a cpu with 64-bit memory and no other features like mvp, or we could do a cpu that has 65-bit memory and all the small features that were standardized and widely implemented before it. Since there aren't many such features right now, though, I guess an mvp-like mvp64 cpu would be best for now. |
llvm/test/CodeGen/WebAssembly/cpus.ll | ||
---|---|---|
4 | Ok, where do I add this? :) Maybe ok as a followup? |
I think this is good now. If the Requires I mentioned above should actually go back in, it can be in a followup CL (presumably where we add MEMORY_SIZE_I64 and friends).
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td | ||
---|---|---|
369 | Question for the tblgen experts maybe, if a def has both an instruction definition and a pattern as this one does, does the Requires field actually apply to the pattern? Or does Requires only work for freestanding patterns? | |
llvm/test/CodeGen/WebAssembly/cpus.ll | ||
4 | oh and yes, I think it's fine for a followup and i mentioned above. |
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td | ||
---|---|---|
369 | I would have thought that Requires should apply to both kinds of patterns. In particular, inheriting it just sets the Predicates field in both Instruction and Pattern. But I read @aheejin's comment above to mean that this doesn't work after all, so perhaps I am wrong. |
Might be worth a separate case for fixup_sleb128_i64 since we know they can only be memory addresses.
And we probably want R_WASM_MEMORY_ADDR_SLEB64 and R_WASM_MEMORY_ADDR_I64 as new relocation types.