Details

Reviewers

Summary

DAGCombiner.cpp misses an optimization when selecting post-indexed load and store operations.
This patch fixes the code in CombineToPostIndexedLoadStore that inspects the uses of an ADD/SUB operation; that code does not correctly identify the real uses.
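For readers unfamiliar with post-indexed addressing, the following is a minimal sketch of what the combine is trying to form. It is a toy register/memory model, not LLVM code; `Machine`, `loadOffset`, `loadPostIndexed` and `addImm` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy machine state (hypothetical): a few registers and a sparse memory.
struct Machine {
  std::map<int, int32_t> reg;      // register number -> value
  std::map<int32_t, int32_t> mem;  // byte address -> 32-bit word
};

// Models "ldr rt, [rn, #offset]": load from rn + offset, rn is unchanged.
inline void loadOffset(Machine &m, int rt, int rn, int32_t offset) {
  m.reg[rt] = m.mem[m.reg[rn] + offset];
}

// Models "ldr rt, [rn], #offset": load from rn, then write rn + offset
// back into rn.
inline void loadPostIndexed(Machine &m, int rt, int rn, int32_t offset) {
  assert(rt != rn && "writeback with rt == rn is not modelled in this sketch");
  int32_t addr = m.reg[rn];
  m.reg[rt] = m.mem[addr];
  m.reg[rn] = addr + offset;
}

// Models "add rd, rn, #imm" (use a negative imm for a subtraction).
inline void addImm(Machine &m, int rd, int rn, int32_t imm) {
  m.reg[rd] = m.reg[rn] + imm;
}
```

In this model, `loadOffset(m, 1, 0, 0)` followed by `addImm(m, 0, 0, 16)` leaves the same register state as a single `loadPostIndexed(m, 1, 0, 16)`, which is the saving the patch is after.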

Diff Detail

Event Timeline

fdeferriere updated this revision to Diff 24014. Apr 20 2015, 5:28 AM

fdeferriere retitled this revision to Fix CombineToPostIndexedLoadStore in DAGCombiner.cpp.

fdeferriere updated this object.

fdeferriere edited the test plan for this revision.

Herald added a subscriber: aemerson. Apr 20 2015, 5:28 AM

fdeferriere added a reviewer: qcolombet. Apr 20 2015, 5:31 AM

fdeferriere edited subscribers, added: Unknown Object (MLST); removed: aemerson.

Hi François,

Since you are fixing two problems, I would prefer having two different patches/commits.

The fix for #1 looks good to me.
Please add a test case for it and commit it separately.

The fix for #2 is almost good. You just need the change "TryNext to RealUse" and the removal of the ADD/SUB check. The outer loop must not be removed.
Please add your test case to the patch (not just in the comment) so that it runs with make check and upload the new patch.

Thanks,
-Quentin

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8853	I don’t get why this change is useful.
8878	This change looks wrong to me. You basically assume that BasePtr == Ptr, which AFAICT, is not necessarily true. To me, the proper fix would be to simply replace TryNext by RealUse and use the opposite logic, like you did.

Thanks Quentin for your review.

I have uploaded a new version of the patch in which I removed the first part of the original patch; that part will be delivered as a separate patch. I also took your remarks into account, except for one point:

I added the following code before the inner loop:
  if (Use != Op)
    continue;

With this code, the semantics remain identical to my previous commit. I checked again this morning on our target that we need this check to avoid catching unprofitable cases. However, I do not know about other targets, and on our target it does not make a big difference in most cases, so I can remove this check if you prefer.

fdeferriere added inline comments. Apr 21 2015, 5:24 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8853	Since N is a Load/Store operation, the check is redundant with the ADD/SUB check. This change was just to simplify the code, I will remove it.

Hi François,

Almost good to me. See my inlined comments.

Could you run clang-format on your patch?
Some indents are suspicious.

Thanks,
-Quentin

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9088	This check still does not make sense to me. This is indeed equivalent to your previous commit, but then, it has the same problem. The bottom line is, yes, please remove it :).
9092	I would remove the RealUse variable…
9094	And directly set TryNext here...
9101	Then, this if-block becomes useless.
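To make the TryNext discussion easier to follow, here is a toy model of the two loop shapes as they appear in the two columns of Diff 29157. It is a sketch under stated assumptions, not LLVM code: `Node`, `Opcode`, `canFold`, `shouldTryNextOld` and `shouldTryNextNew` are hypothetical stand-ins, and `canFold` simply treats loads and stores as foldable where the real code calls canFoldInAddressingMode.

```cpp
#include <cassert>
#include <vector>

enum class Opcode { Add, Sub, Load, Store, Other };

// Hypothetical stand-in for an SDNode: an opcode plus the nodes that use it.
struct Node {
  Opcode opcode;
  std::vector<const Node *> uses;
};

// Stand-in for canFoldInAddressingMode (assumption: only loads/stores fold).
inline bool canFold(const Node & /*use*/, const Node &useUse) {
  return useUse.opcode == Opcode::Load || useUse.opcode == Opcode::Store;
}

inline bool isAddOrSub(const Node &n) {
  return n.opcode == Opcode::Add || n.opcode == Opcode::Sub;
}

// Old shape: give up as soon as one ADD/SUB use has only foldable users.
inline bool shouldTryNextOld(const Node &basePtr, const Node *ptr) {
  bool tryNext = false;
  for (const Node *use : basePtr.uses) {
    if (use == ptr)
      continue;
    if (isAddOrSub(*use)) {
      bool realUse = false;
      for (const Node *useUse : use->uses)
        if (!canFold(*use, *useUse))
          realUse = true;
      if (!realUse) {
        tryNext = true;
        break;
      }
    }
  }
  return tryNext;
}

// New shape: start pessimistic and clear the flag once a real use is found.
inline bool shouldTryNextNew(const Node &basePtr, const Node *ptr) {
  bool tryNext = true;
  for (const Node *use : basePtr.uses) {
    if (use == ptr)
      continue;
    if (isAddOrSub(*use)) {
      for (const Node *useUse : use->uses) {
        if (!canFold(*use, *useUse)) {
          tryNext = false;
          break;
        }
      }
    }
  }
  return tryNext;
}
```

In this model, a base pointer with one ADD whose only user is a load and another ADD with a non-foldable user is skipped by the old shape but accepted by the new one, because a single real use is now enough.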

Removed the RealUse variable and the extra check.

Hi François,

LGTM with a pair of nitpicks.

Please commit with those fixes, or let me know if you want me to commit for you.

Nice catch BTW.

Thanks,
-Quentin

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9079	Period at the end of the comment.
test/CodeGen/ARM/automod-test.ll
34	Add a CHECK-LABEL line.

This revision is now accepted and ready to land. Apr 22 2015, 9:37 AM

Fixed comments by Q.Colombet.

Quentin, please proceed with the commit, or tell me how to do it myself.
Thanks.

François.

Hi François,

The test part needs to be updated for top-of-tree trunk.
Indeed, thanks to your change we form many more post/pre addressing modes for ARM, and we must update the regexps of the related test cases for them to pass.

Let me know if you need help.

Thanks,
-Quentin

PS: Attached the your patch for TOT, your test case needed to be updated as well.

Attachment: francois1.patch (3 KB)

Hi Quentin,

When running the ARM self tests I found 9 failing tests.

I was able to fix 5 of them:
    2012-10-04-AAPCS-byval-align8.ll
    2012-10-18-PR14099-ByvalFrameAddress.ll
    2013-01-21-PR14992.ll
    byval_load_align.ll
    ldrd.ll

There are 3 tests for which the code is worse, due to save/restore of registers modified by the automod. On our target, we added a target specific pass to fix these cases and fully benefit from the automod addressing mode. These tests are :
    truncstore-dag-combine.ll
    unaligned_load_store.ll
    wrong-t2stmia-size-opt.ll

There is one test for which I suspect that the new code is wrong. A muls is generated in a sequence where it is checked that no muls is generated. The test is :
     avoid-cpsr-rmw.ll

Also, what do you mean by "Attached the your patch for TOT, your test case needed to be updated as well" ? 

Please tell me how you want me to proceed.

Thanks,

 - François.

Hi François,

> There are 3 tests for which the code is worse, due to save/restore of registers modified by the auto mod.

For those we can file a PR when the change lands. Just make sure to update those tests with a comment explaining why the code is worse, and check that they still pass make check.

> There is one test for which I suspect that the new code is wrong. A muls is generated in a sequence where it is checked that no muls is generated. The test is :

Please investigate this one. Let me know if you need help.

> Also, what do you mean by "Attached the your patch for TOT, your test case needed to be updated as well" ?

I meant that I attached to my previous email a version of your patch in which I updated the test case to use the new "load <dstty>, <ptrty>" syntax instead of "load <ty>". That was supposed to help you, not confuse you :). Feel free to ignore it.

Cheers,
-Quentin

Fixed the ARM regression tests for:
test/CodeGen/ARM/2012-10-18-PR14099-ByvalFrameAddress.ll
test/CodeGen/ARM/2013-01-21-PR14992.ll
test/CodeGen/ARM/automod-test.ll
test/CodeGen/ARM/avoid-cpsr-rmw.ll
test/CodeGen/ARM/byval_load_align.ll
test/CodeGen/ARM/ldrd.ll
test/CodeGen/ARM/wrong-t2stmia-size-opt.ll

Opened PR 24049 to report additional copies in some cases after this fix, and marked the following ARM regression tests as XFAIL:
test/CodeGen/ARM/truncstore-dag-combine.ll
test/CodeGen/ARM/unaligned_load_store.ll

Hi François,

Thanks for fixing the tests.
I’ll have a quick look at the tests that you’ve XFAILed before deciding whether or not this is OK.

Cheers,
-Quentin

test/CodeGen/ARM/2012-10-18-PR14099-ByvalFrameAddress.ll
29	Looks like the load store optimizer could use some improvement to catch this case (assuming r0 is not used afterward).
test/CodeGen/ARM/avoid-cpsr-rmw.ll
31	Could you comment on why this check is failing now? I am not saying this is wrong; I’d like to check we are not missing something.

Hi François,

A couple more comments.

It seems your patch exposed a few shortcomings in the load store optimizer for ARM. I am not sure we can proceed with this commit without fixing the load store optimizer first; otherwise we may regress a bunch of things. That is unfortunate, since your patch seems a good general improvement to me.

Did you happen to run benchmarks on ARM with/without this change?
That would help us make up our minds.

Anyway, let me look at the XFAILed test cases.

Cheers,
-Quentin

test/CodeGen/ARM/automod-test.ll
27	Looks like we could improve the load/store optimizer to catch those cases as well.
test/CodeGen/ARM/avoid-cpsr-rmw.ll
31	Never mind me, I found the answer in an exchange we had offline. For the record, the assembly code generated without and with your patch is the following:
	Without the patch:
	  ldr.w r9, [r0]
	  ldr r3, [r0, #4]
	  ldr r2, [r0, #8]
	  ldr.w r12, [r0, #12]
	  adds r0, #16        <- after this point, having muls would be wrong.
	  mul r3, r3, r9
	  mul r2, r3, r2
	  mul r2, r2, r12
	With the patch:
	  ldr.w r12, [r0, #4]
	  ldr r3, [r0, #8]
	  ldr.w r9, [r0, #12]
	  ldr r2, [r0], #16
	  mul r2, r12, r2
	  muls r2, r3, r2     <- Now the muls is necessary, since we got rid of the adds.
	  mul r2, r2, r9

Hi François,

Looks like the XFAILed test cases were just us being lucky in the previous version.
This is unfortunate, but like I said, I do not think we can land the patch as is, since it may regress some stuff.

Looking at the test cases makes me feel like we may not want to form the pre/post addressing mode that early, but instead rely on later passes to catch those. The rationale is that I would prefer we grab the multiple load/store first since they impact the performance, whereas pre/post increment are mainly a size optimization and thus less important.

The bottom line is that, IMHO, we should investigate what we would gain by completely getting rid of that DAG combine and doing the pre/post optimization later.

Let me know what you think.

Cheers,
-Quentin

test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
33	This one is funny because we change:
	  ld [@addr]
	  ld [@addr, 4]
	into:
	  ld [@addr], #4
	  ld [@addr]          {reg used for @addr is redefined}
	And that prevents the load store optimizer from catching it.
test/CodeGen/ARM/2013-01-21-PR14992.ll
18	This one is worrisome, as I expect it may expose runtime regressions and because it seems plausible to happen frequently.
	Without:
	  ldm r0!, {r4, r5, r6}
	  bl bar
	With:
	  ldr r4, [r0, #4]
	  ldr r5, [r0, #8]
	  ldr r6, [r0], #12
	  bl bar
	The problem here is that after register allocation, without recoloring and with the support of PRE and POST automod, we won’t be able to form a ldm since the registers are not in the right order (r6, r4, r5, instead of r4, r5, r6). The pre-regalloc load/store optimizer could be taught to reorder them to make that possible, but that does not seem trivial work.
test/CodeGen/ARM/wrong-t2stmia-size-opt.ll
21	We miss that we can express this as:
	  [@addr], #8
	  [@addr, -#4]
	That said, the stored registers are not in the right order after reg alloc to be able to recognize that this is stm @addr!, {r1, r2}.
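The register-ordering remark on 2013-01-21-PR14992.ll can be illustrated with a small check. LDM transfers the lowest-numbered register from the lowest address, so a run of word loads can only be merged when the register numbers increase with the offsets. The sketch below is a simplified model under that single assumption; `WordLoad` and `canFormLdm` are hypothetical names and this is not the logic of the ARM load/store optimizer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of "ldr r<reg>, [base, #offset]" on a shared base.
struct WordLoad {
  int reg;
  int offset;
};

// Simplified legality check: offsets must be consecutive words and register
// numbers must strictly increase with the address.
inline bool canFormLdm(std::vector<WordLoad> loads) {
  if (loads.size() < 2)
    return false;
  std::sort(loads.begin(), loads.end(),
            [](const WordLoad &a, const WordLoad &b) {
              return a.offset < b.offset;
            });
  for (std::size_t i = 1; i < loads.size(); ++i) {
    if (loads[i].offset != loads[i - 1].offset + 4)
      return false;
    if (loads[i].reg <= loads[i - 1].reg)
      return false;
  }
  return true;
}
```

With r4, r5, r6 at offsets 0, 4, 8 the check succeeds. With the allocation from the patched output (r6 at offset 0, r4 at 4, r5 at 8) it fails, which matches the "r6, r4, r5" order described in the comment.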

This revision now requires changes to proceed.Jul 7 2015, 4:21 PM

Hi Quentin,

Thanks for your detailed review.

I agree with you that this patch introduces too many regressions to be delivered. I now see two possibilities:

1- Still rely on this pass to select pre/post addressing modes, but add a kind of "repair" pass after code selection to fix the regressions. This is what we did on our target. However, I am not sure whether on ARM this could fix the code so that the multiple load/store instructions are caught again.

2- Disable this pass and add a new pass on machine instructions. The main advantage I see is that it would allow looking for pre/post patterns on regions larger than a DAG, which is currently a limitation. For our target, it would probably replace our "repair" pass.
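As a rough idea of what the second option could look like, here is a toy peephole over a flat instruction list. It is only a sketch under strong assumptions: it folds a zero-offset load immediately followed by an add/sub of the same base register, and ignores everything a real machine pass would need (liveness, non-adjacent pairs, stores, target legality). `Inst`, `Kind` and `foldPostIndexed` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Load, AddImm, LoadPost, Other };

// Hypothetical instruction record.
//   Load:     dst = loaded reg, base = address reg, imm = offset
//   AddImm:   dst = base + imm (negative imm models a sub)
//   LoadPost: dst = loaded reg, base = address reg, imm = writeback amount
struct Inst {
  Kind kind;
  int dst;
  int base;
  int imm;
};

// Folds "ldr rX, [rB]" + "add rB, rB, #imm" into "ldr rX, [rB], #imm"
// when the two instructions are adjacent and rX != rB.
inline std::vector<Inst> foldPostIndexed(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const Inst &cur = in[i];
    if (cur.kind == Kind::Load && cur.imm == 0 && cur.dst != cur.base &&
        i + 1 < in.size()) {
      const Inst &next = in[i + 1];
      if (next.kind == Kind::AddImm && next.dst == cur.base &&
          next.base == cur.base) {
        out.push_back(Inst{Kind::LoadPost, cur.dst, cur.base, next.imm});
        ++i; // the add/sub has been absorbed into the load
        continue;
      }
    }
    out.push_back(cur);
  }
  return out;
}
```

On the pair "ldrh r3, [r2]" / "sub r2, r2, #6" from the automod test, this model produces a single post-indexed load with a writeback of -6.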

Should I add an "Abandon Revision" comment to close this patch?
Should I log a kind of "request for enhancement" (in Bugzilla?) to keep a trace of this analysis?

Regards,

-François.

Hi François,

Either way sounds good to me; it depends on what is most comfortable for you, I guess.

See my inline comments for a few hints/remarks.

Diff 29157

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Context not available.
         continue;

       // Check for #1.
-      bool TryNext = false;
+      // Look for a real use, i.e. one use that is not a load / store op, or one
+      // that cannot be folded as addressing mode.
+      bool TryNext = true;
       for (SDNode *Use : BasePtr.getNode()->uses()) {
         if (Use == Ptr.getNode())
           continue;

-        // If all the uses are load / store addresses, then don't do the
-        // transformation.
-        if (Use->getOpcode() == ISD::ADD || Use->getOpcode() == ISD::SUB){
-          bool RealUse = false;
+        // If one use is not a load / store address, then do the transformation.
+        if (Use->getOpcode() == ISD::ADD || Use->getOpcode() == ISD::SUB) {
           for (SDNode *UseUse : Use->uses()) {
-            if (!canFoldInAddressingMode(Use, UseUse, DAG, TLI))
-              RealUse = true;
+            if (!canFoldInAddressingMode(Use, UseUse, DAG, TLI)) {
+              TryNext = false;
+              break;
+            }
           }
-
-          if (!RealUse) {
-            TryNext = true;
-            break;
-          }
         }
       }

Context not available.

test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll

Context not available.
 ; CHECK: movw [[BASE:r[0-9]+]], :lower16:static_val
 ; CHECK: movt [[BASE]], :upper16:static_val
 ; ldm is not formed when the coalescer failed to coalesce everything.
-; CHECK: ldrd r2, [[TMP:r[0-9]+]], {{\[}}[[BASE]]{{\]}}
+; CHECK: ldr r2, {{\[}}[[BASE]]{{\]}}, #4
+; CHECK: ldr [[TMP:r[0-9]+]], {{\[}}[[BASE]]{{\]}}
 ; CHECK: movw r0, #555
 define i32 @main() {
 entry:
Context not available.
 ; CHECK: movw [[BASE:r[0-9]+]], :lower16:static_val
 ; CHECK: movt [[BASE]], :upper16:static_val
 ; ldm is not formed when the coalescer failed to coalesce everything.
-; CHECK: ldrd r2, [[TMP:r[0-9]+]], {{\[}}[[BASE]]{{\]}}
+; CHECK: ldr r2, {{\[}}[[BASE]]{{\]}}, #4
+; CHECK: ldr [[TMP:r[0-9]+]], {{\[}}[[BASE]]{{\]}}
 ; CHECK: movw r0, #555
 define i32 @main_fixed_arg() {
 entry:
Context not available.

test/CodeGen/ARM/2012-10-18-PR14099-ByvalFrameAddress.ll

Context not available.
 ; CHECK-LABEL: caller:
 define void @caller() {

-; CHECK: ldm r0, {r1, r2, r3}
+; CHECK: ldr r3, [r0, #8]
+; CHECK: ldr r1, [r0], #4
+; CHECK: ldr r2, [r0]
 call void @t(i32 0, %struct.s* @v);
 ret void
 }
Context not available.

test/CodeGen/ARM/2013-01-21-PR14992.ll

Context not available.
 %2 = load i32, i32* %arrayidx2, align 4
 %add.ptr = getelementptr inbounds i32, i32* %a, i32 3
 ;Make sure we do not have a duplicated register in the front of the reg list
-;EXPECTED: ldm [[BASE:r[0-9]+]]!, {[[REG:r[0-9]+]], {{r[0-9]+}},
-;CHECK-NOT: ldm [[BASE:r[0-9]+]]!, {[[REG:r[0-9]+]], [[REG]],
+;EXPECTED: ldr [[REG:r[0-9]+]], [{{r[0-9]+}}
+;CHECK: ldr [[REG:r[0-9]+]], [{{r[0-9]+}}
+;CHECK-NOT: ldr [[REG]], [{{r[0-9]+}}
 tail call void @bar(i32* %add.ptr) nounwind optsize
 %add = add nsw i32 %1, %0
 %add3 = add nsw i32 %add, %2
Context not available.

test/CodeGen/ARM/automod-test.ll

				; Test that checks that automod addressing mode is selected
				;
				; RUN: llc -O2 < %s -march=arm | FileCheck %s
				;
				; ======================================================
				; Without the fix, the generated code is the following :
				;
				; ldrh r3, [r2, #2]
				; strh r3, [r1, #-2]
				; ldrh r3, [r2]
				; sub r2, r2, #6
				; strh r3, [r1]
				; ldr r3, [r0], #48
				; add r1, r1, #6
				; cmp r3, #0
				; bne .LBB0_1
				;
				; With the patch, post modifying addressing modes are selected :
				;
				; ldrh r3, [r2, #2]
				; strh r3, [r1, #-2]
				; ldrh r3, [r2], #-6
				; strh r3, [r1], #6
				; ldr r3, [r0], #48
				; cmp r3, #0
				; bne .LBB0_1
				; ======================================================

				@input_tab64 = common global [32 x i16] zeroinitializer, align 2
				@output_tab64 = common global [32 x i16] zeroinitializer, align 2

				; Function Attrs: nounwind
				define void @compute(i32* nocapture readonly %IDX) #0 {
				entry:
				; CHECK-LABEL: compute:

				%0 = load i32, i32* %IDX, align 4
				%tobool14 = icmp eq i32 %0, 0
				br i1 %tobool14, label %for.end, label %for.body

				for.body: ; preds = %entry, %for.body
				%i.015 = phi i32 [ %add8, %for.body ], [ 0, %entry ]
				%sub = sub nsw i32 32, %i.015
				%sub1 = add nsw i32 %sub, -1
				%arrayidx2 = getelementptr inbounds [32 x i16], [32 x i16]* @input_tab64, i32 0, i32 %sub1
				%1 = load i16, i16* %arrayidx2, align 2
				%arrayidx3 = getelementptr inbounds [32 x i16], [32 x i16]* @output_tab64, i32 0, i32 %i.015
				store i16 %1, i16* %arrayidx3, align 2
				%sub5 = add nsw i32 %sub, -2
				%arrayidx6 = getelementptr inbounds [32 x i16], [32 x i16]* @input_tab64, i32 0, i32 %sub5
				%2 = load i16, i16* %arrayidx6, align 2
				%add = add nsw i32 %i.015, 1
				%arrayidx7 = getelementptr inbounds [32 x i16], [32 x i16]* @output_tab64, i32 0, i32 %add
				store i16 %2, i16* %arrayidx7, align 2
				%add8 = add nsw i32 %i.015, 3
				%shl = shl i32 %add8, 2
				%arrayidx = getelementptr inbounds i32, i32* %IDX, i32 %shl
				%3 = load i32, i32* %arrayidx, align 4
				%tobool = icmp eq i32 %3, 0
				br i1 %tobool, label %for.end, label %for.body

				; CHECK: ldrh r{{[0-9]+}}, [r{{[0-9]+}}], #-6
				; CHECK: strh r{{[0-9]+}}, [r{{[0-9]+}}], #6

				for.end: ; preds = %for.body, %entry
				ret void
				}

test/CodeGen/ARM/avoid-cpsr-rmw.ll

Context not available.
 while.body:
 ; CHECK: while.body
 ; CHECK: mul r{{[0-9]+}}
-; CHECK-NOT: muls
 %ptr1.addr.09 = phi i32* [ %add.ptr, %while.body ], [ %ptr1, %entry ]
 %ptr2.addr.08 = phi i32* [ %incdec.ptr, %while.body ], [ %ptr2, %entry ]
 %0 = load i32, i32* %ptr1.addr.09, align 4
Context not available.

test/CodeGen/ARM/byval_load_align.ll

Context not available.
 ; rdar://15144402
 ; Make sure we don't assume 4-byte alignment when loading from a byval argument
 ; with alignment of 2.
-; CHECK: ldr r1, [r[[REG:[0-9]+]]]
-; CHECK: ldr r2, [r[[REG]], #4]
-; CHECK: ldr r3, [r[[REG]], #8]
+; CHECK: ldr r3, [r[[REG:[0-9]+]], #8]
+; CHECK: ldr r1, [r[[REG]]], #4
+; CHECK: ldr r2, [r[[REG]]]
 ; CHECK-NOT: ldm
 ; CHECK: .align 1 @ @sID

Context not available.

test/CodeGen/ARM/ldrd.ll

Context not available.
 ;
 ; BASIC: @f
 ; BASIC: %bb
-; BASIC: ldrd
+; BASIC: ldr
 ; BASIC: str
 ; GREEDY: @f
 ; GREEDY: %bb
-; GREEDY: ldrd
+; GREEDY: ldr
 ; GREEDY: str
 define void @f(i32* nocapture %a, i32* nocapture %b, i32 %n) nounwind {
 entry:
Context not available.

test/CodeGen/ARM/truncstore-dag-combine.ll

 ; RUN: llc -mtriple=arm-eabi -mattr=+v4t %s -o - | FileCheck %s

+; XFAIL: *
+; PR 24049
+
 define void @bar(i8* %P, i16* %Q) {
 entry:
 %P1 = bitcast i8* %P to i16* ; <i16*> [#uses=1]

test/CodeGen/ARM/unaligned_load_store.ll

Context not available.
 ; rdar://7113725
 ; rdar://12091029

+; XFAIL: *
+; PR 24049
+
 define void @t(i8* nocapture %a, i8* nocapture %b) nounwind {
 entry:
 ; EXPANDED-LABEL: t:
Context not available.

test/CodeGen/ARM/wrong-t2stmia-size-opt.ll

Context not available.

 ; Check that stm writes two registers. The bug caused one of registers (LR,
 ; which invalid for Thumb1 form of STMIA instruction) to be dropped.
-; CHECK: stm{{[^,]}}, {{{.,.*}}}
+; CHECK: str r1
+; CHECK: str.w lr
Context not available.

This is an archive of the discontinued LLVM Phabricator instance.

Fix CombineToPostIndexedLoadStore in DAGCombiner.cpp
Needs Revision · Public
