Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Details
- Reviewers
RKSimon filcab reames javed.absar hfinkel - Commits
- rGe1b2d314688e: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
rG85d758299e48: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
rGdaaa0c0f7d40: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
rL296651: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
rL295336: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
rL295314: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Diff Detail
Event Timeline
Thanks Artur.
Can you commit the tests before, so we can easily show that there's a difference? Like in the other patch.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4459 | ANY_EXTEND mentions that the bits are undefined. Maybe we can include it too (probably there's no difference in most cases, though)? | |
test/CodeGen/AArch64/load-combine-big-endian.ll | ||
355 | Should we be doing anything to addrspace !=0 pointers? | |
test/CodeGen/ARM/fp16-promote.ll | ||
858 | This looks sketchy. Does pop change the PC on ARM? Did it appear but wasn't there? |
- Land tests separately and rebase the patch against it.
- Remove addrspace(1) from the test cases.
- Change CHECK pattern matching in fp16-promote.ll test_extractelement slightly.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4459 | calculateByteProvider can return either memory location (ByteProvider with memory location specified), zero constant (zero constant ByteProvider) or unknown (None). We can only express the origin of extend bytes for ZERO_EXTEND. In other cases including ANY_EXTEND it's unknown. Or you meant something different? | |
test/CodeGen/AArch64/load-combine-big-endian.ll | ||
355 | No, there is nothing special we should do. I removed addrspace qualifiers from the tests. | |
test/CodeGen/ARM/fp16-promote.ll | ||
858 | Here is the codegen for this test case: test_extractelement: .fnstart @ BB#0: .save {r4, r5, r11, lr} push {r4, r5, r11, lr} .pad #8 sub sp, sp, #8 ldrd r4, r5, [r1] and r1, r2, #3 mov r2, sp orr r1, r2, r1, lsl #1 stm sp, {r4, r5} ldrh r1, [r1] strh r1, [r0] add sp, sp, #8 pop {r4, r5, r11, pc} So, in fact we push lr and then pop it to pc. I changed the pattern matching slightly to make it more clear. |
No problems on my side, but let's see if Renato has comments on the ARM codegen.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4459 | I was aiming for an "a read of undef can take any value" approach. But I'm not sure if "the value is undefined", in SDAG, has the same semantics. I don't think this is expected to be much of a benefit (or at all), so I think we might as well drop this question. | |
test/CodeGen/ARM/fp16-promote.ll | ||
858 | Ah. That makes sense, yes. I didn't remember you could pop multiple registers in that way. |
test/CodeGen/ARM/fp16-promote.ll | ||
---|---|---|
858 | The loads and stores are not ideal, but they're expected: we're scalarizing the <4 x half>, and the default expansion of extractelement involves a stack temporary. The reason the prologue and epilogue are changing is that load merging allows the backend to form ldrd: ldrd requires an even/odd register pair, so the register allocator chooses to use use r4 and r5, which therefore requires spilling r4/r5. The end result looks fine. |
ANY_EXTEND mentions that the bits are undefined. Maybe we can include it too (probably there's no difference in most cases, though)?