This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
ClosedPublic

Authored by apilipenko on Feb 6 2017, 10:06 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

apilipenko created this revision.Feb 6 2017, 10:06 AM
filcab edited edge metadata.Feb 6 2017, 10:31 AM

Thanks Artur.
Can you commit the tests before, so we can easily show that there's a difference? Like in the other patch.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4465 ↗(On Diff #87251)

ANY_EXTEND mentions that the bits are undefined. Maybe we can include it too (probably there's no difference in most cases, though)?

test/CodeGen/AArch64/load-combine-big-endian.ll
377 ↗(On Diff #87251)

Should we be doing anything to addrspace !=0 pointers?

test/CodeGen/ARM/fp16-promote.ll
857 ↗(On Diff #87251)

This looks sketchy. Does pop change the PC on ARM? Did it appear but wasn't there?
If so, shouldn't bx lr still be there, to go back to the caller? (I'm not very familiar with ARM asm)

apilipenko updated this revision to Diff 87425.Feb 7 2017, 7:04 AM
  • Land tests separately and rebase the patch against it.
  • Remove addrspace(1) from the test cases.
  • Change CHECK pattern matching in fp16-promote.ll test_extractelement slightly.
apilipenko added inline comments.Feb 7 2017, 7:15 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4465 ↗(On Diff #87251)

calculateByteProvider can return either memory location (ByteProvider with memory location specified), zero constant (zero constant ByteProvider) or unknown (None). We can only express the origin of extend bytes for ZERO_EXTEND. In other cases including ANY_EXTEND it's unknown.

Or you meant something different?

test/CodeGen/AArch64/load-combine-big-endian.ll
377 ↗(On Diff #87251)

No, there is nothing special we should do. I removed addrspace qualifiers from the tests.

test/CodeGen/ARM/fp16-promote.ll
857 ↗(On Diff #87251)

Here is the codegen for this test case:

test_extractelement:
	.fnstart
@ BB#0:
	.save	{r4, r5, r11, lr}
	push	{r4, r5, r11, lr}
	.pad	#8
	sub	sp, sp, #8
	ldrd	r4, r5, [r1]
	and	r1, r2, #3
	mov	r2, sp
	orr	r1, r2, r1, lsl #1
	stm	sp, {r4, r5}
	ldrh	r1, [r1]
	strh	r1, [r0]
	add	sp, sp, #8
	pop	{r4, r5, r11, pc}

So, in fact we push lr and then pop it to pc. I changed the pattern matching slightly to make it more clear.

filcab added a subscriber: rengolin.Feb 7 2017, 8:44 AM

No problems on my side, but let's see if Renato has comments on the ARM codegen.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4465 ↗(On Diff #87251)

I was aiming for an "a read of undef can take any value" approach. But I'm not sure if "the value is undefined", in SDAG, has the same semantics. I don't think this is expected to be much of a benefit (or at all), so I think we might as well drop this question.

test/CodeGen/ARM/fp16-promote.ll
857 ↗(On Diff #87251)

Ah. That makes sense, yes. I didn't remember you could pop multiple registers in that way.
What doesn't make sense is that code.
@rengolin Can you take a look at the post-patch test_extractelement?
It's loading stuff onto registers only to spill it again and reload the actual data we want. It's also saving r11 for no reason.

RKSimon added inline comments.
test/CodeGen/ARM/fp16-promote.ll
857 ↗(On Diff #87251)

@rengolin @jmolloy Please can you advise on the ARM codegen in test_extractelement? Is it something to worry about?

efriedma added inline comments.
test/CodeGen/ARM/fp16-promote.ll
857 ↗(On Diff #87251)

The loads and stores are not ideal, but they're expected: we're scalarizing the <4 x half>, and the default expansion of extractelement involves a stack temporary. The reason the prologue and epilogue are changing is that load merging allows the backend to form ldrd: ldrd requires an even/odd register pair, so the register allocator chooses to use use r4 and r5, which therefore requires spilling r4/r5.

The end result looks fine.

filcab accepted this revision.Feb 16 2017, 3:09 AM

LGTM since ARM codegen seems to be ok

This revision is now accepted and ready to land.Feb 16 2017, 3:09 AM
This revision was automatically updated to reflect the committed changes.