This allows loads at global address + x to be selected better, by putting the global address operand into the offset.
From splitting up D139530
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/WebAssembly/userstack.ll | ||
---|---|---|
333 | so the result of this is that the frame offset will end up in a const or add expression (consumed by the i32.load8_u), and the global address will be folded? |
llvm/test/CodeGen/WebAssembly/userstack.ll | ||
---|---|---|
333 | In this case the frame offset wasn't being folded, although I'm not sure if that is intentional or not. This is the codegen before the patch: frame_offset_with_global_address: .functype frame_offset_with_global_address () -> (i32) i32.const $push0=, str global.get $push5=, __stack_pointer i32.const $push6=, 16 i32.sub $push9=, $pop5, $pop6 i32.const $push7=, 12 i32.add $push8=, $pop9, $pop7 i32.add $push1=, $pop0, $pop8 i32.load8_u $push2=, 0($pop1) i32.const $push3=, 67 i32.and $push4=, $pop2, $pop3 return $pop4 With the patch the global address gets folded in which saves a const and an add instruction: frame_offset_with_global_address: .functype frame_offset_with_global_address () -> (i32) global.get $push3=, __stack_pointer i32.const $push4=, 16 i32.sub $push7=, $pop3, $pop4 i32.const $push5=, 12 i32.add $push6=, $pop7, $pop5 i32.load8_u $push1=, str($pop6) i32.const $push0=, 67 i32.and $push2=, $pop1, $pop0 return $pop2 |
llvm/lib/Target/WebAssembly/WebAssemblyRegisterInfo.cpp | ||
---|---|---|
78–92 | Now that (add tga x) is now selected into something like i32.load offset=tga, x, the above assertion was being triggered because it assumed that any offset operand would always be an immediate, not a target global address. So this just wraps around it. |
llvm/test/CodeGen/WebAssembly/userstack.ll | ||
---|---|---|
333 | Ah ok yeah this test was added in https://reviews.llvm.org/D90577 which fixed a bug in the case below the one you are modifying in this CL. So I guess what's happening now is that this IR is being ISel'ed differently and is ending up in the first case in eliminateFrameIndex instead of the second. |
llvm/test/CodeGen/WebAssembly/userstack.ll | ||
---|---|---|
333 | Maybe we could add to the comment something like "(this allows the global address to be relocated)" Since this test doesn't cover that second case anymore, I wonder if we have any tests that do. I would assume that clause is still needed, maybe at least for when we aren't using DAG ISel... |
llvm/test/CodeGen/WebAssembly/userstack.ll | ||
---|---|---|
333 | I commented the if added in D90577 out and ran this test, and this test still passes. When it was added that wouldn't have been the case, so I'm not sure what changed since then to make this pass. So what I'm saying is, this test is not covering the if in D90577 even now. 🤷🏻 Not sure what we should do. Delete the test and come up with a new test that covers the case? But that shouldn't be this CL's responsibility.. @dschuff And I have another question. I am not very familiar with this prolog-epilog code..
|
llvm/test/CodeGen/WebAssembly/userstack.ll | ||
---|---|---|
333 | By the way I don't think we need to hold up this CL because of this issue..? |
llvm/test/CodeGen/WebAssembly/userstack.ll | ||
---|---|---|
333 | Oh, that's interesting.
|
Sorry, I missed your comment 2 days ago. See my comment below.
llvm/test/CodeGen/WebAssembly/negative-base-reg.ll | ||
---|---|---|
20–22 | OK, so looking back at https://github.com/llvm/llvm-project/issues/29497 and D24053 this actually does look like the problem that we fixed back then. |
llvm/test/CodeGen/WebAssembly/negative-base-reg.ll | ||
---|---|---|
20–22 | Thanks, that makes sense. At the risk of going down the rabbit hole here, do you think there's a way to generate a gep that results in an ISD::ADD with nuw? I.e. is there a way to say that %n is unsigned here so that we are still allowed to fold @global_i32 in? define i32 @load_i32_global_address_with_folded_offset(i32 %n) { %s = getelementptr inbounds i32, i32* @global_i32, i32 %n %t = load i32, i32* %s ret i32 %t } https://reviews.llvm.org/D15544 suggests that it used to be enough to just specify inbounds: ; CHECK-LABEL: load_i32_with_folded_gep_offset: ; CHECK: i32.load $push0=, 24($0){{$}} define i32 @load_i32_with_folded_gep_offset(i32* %p) { %s = getelementptr inbounds i32, i32* %p, i32 6 %t = load i32, i32* %s ret i32 %t } But now it looks like the pointer needs to be manually calculated: ; CHECK-LABEL: load_i32_with_folded_offset: ; CHECK: i32.load $push0=, 24($0){{$}} define i32 @load_i32_with_folded_offset(ptr %p) { %q = ptrtoint ptr %p to i32 %r = add nuw i32 %q, 24 %s = inttoptr i32 %r to ptr %t = load i32, ptr %s ret i32 %t } |
llvm/test/CodeGen/WebAssembly/negative-base-reg.ll | ||
---|---|---|
20–22 | So I guess my question is given the following C: extern int *data; int f(unsigned idx) { return data[idx]; } Should it not be semantically possible to somehow generate i32.load data($0) from it, given that: a) we know the argument is unsigned |
llvm/test/CodeGen/WebAssembly/negative-base-reg.ll | ||
---|---|---|
20–22 | Just also writing this down here so I remember later: Because the spec defines the memarg offset as an unsigned integer, the addition of the offset to the base address won't wrap. And it looks like it may be impossible to do this optimisation because the signed-ness of an integer is lost in LLVM IR. |
llvm/test/CodeGen/WebAssembly/negative-base-reg.ll | ||
---|---|---|
20–22 | To illustrate my point: Given the following C: extern char data[1024]; char h(unsigned idx) { return data[idx]; } char i(signed idx) { return data[idx]; } The emitted IR (clang -emit-llvm -S --target=wasm32 foo.c -O3) is exactly the same for the two functions: ; Function Attrs: mustprogress nofree norecurse nosync nounwind readonly willreturn define hidden signext i8 @h(i32 noundef %0) local_unnamed_addr #0 { %2 = getelementptr inbounds [1024 x i8], ptr @data, i32 0, i32 %0 %3 = load i8, ptr %2, align 1, !tbaa !2 ret i8 %3 } ; Function Attrs: mustprogress nofree norecurse nosync nounwind readonly willreturn define hidden signext i8 @i(i32 noundef %0) local_unnamed_addr #0 { %2 = getelementptr inbounds [1024 x i8], ptr @data, i32 0, i32 %0 %3 = load i8, ptr %2, align 1, !tbaa !2 ret i8 %3 } If we really wanted to cater for this, could we maybe get clang to generate the ptrtoint -> add nuw -> inttoptr sequence instead of a gep when it knows that the type is unsigned? |
Yeah, we had some of this discussion with @sunfish back in 2016 as well. I just ended up disabling some of our tests and accepting reduced optimizations in D24053 and we never came up with a way to get them back at the time.
I haven't had a chance to think about this more since my earlier comment but thanks for writing this down, I'll come back to this in a couple of weeks (feel free to ping me if I don't).
Thanks for working on this anyway, sorry it didn't quite work out. I had some hope that we could make some progress here. Perhaps in the future :(
Not at all, I haven't had any time to work on this either! Just marked it as closed to clean up the review queues. I'd be happy to reopen this/convert it to a GitHub issue. I think the discussion here is a useful starting point
Now that (add tga x) is now selected into something like i32.load offset=tga, x, the above assertion was being triggered because it assumed that any offset operand would always be an immediate, not a target global address. So this just wraps around it.