This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Compute known bits of CopyFromReg
ClosedPublic

Authored by piotr on Mar 19 2019, 1:15 AM.

Details

Summary

Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual register it reads has only one def.

This can be particularly useful when calling isBaseWithConstantOffset()
with an ISD::CopyFromReg argument, as more optimizations may be enabled
as a result.
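
In outline, the new handling in SelectionDAG::computeKnownBits has roughly the following shape (a simplified sketch, not the exact code under review: reaching the single def's known bits through FunctionLoweringInfo's cached live-out info, and eliding the explicit single-def check, are assumptions made for illustration):

// Sketch only: assumes computeKnownBits can reach the current
// FunctionLoweringInfo (FLI), whose LiveOutInfo caches known bits for a
// virtual register once its single definition has been analysed.
case ISD::CopyFromReg: {
  auto *R = cast<RegisterSDNode>(Op.getOperand(1));
  const unsigned Reg = R->getReg();

  // Only virtual registers can carry this per-function information.
  if (!FLI || !TargetRegisterInfo::isVirtualRegister(Reg))
    break;

  // Bail out if there is no cached info or the bit widths do not match.
  const FunctionLoweringInfo::LiveOutInfo *LOI = FLI->GetLiveOutRegInfo(Reg);
  if (!LOI || LOI->Known.getBitWidth() != BitWidth)
    break;

  Known = LOI->Known;
  break;
}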

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Diff Detail

Repository
rL LLVM

Event Timeline

piotr created this revision.Mar 19 2019, 1:15 AM
Herald added a project: Restricted Project.Mar 19 2019, 1:15 AM
lebedev.ri added inline comments.
lib/Target/X86/X86ISelLowering.cpp
19513 ↗(On Diff #191258)

Does it trigger any tests without the rest of the diff?
Does this *have* to be in the same diff as the rest of the change?

piotr marked an inline comment as done.Mar 19 2019, 2:29 AM
piotr added inline comments.
lib/Target/X86/X86ISelLowering.cpp
19513 ↗(On Diff #191258)

The change in X86ISelLowering.cpp, when applied on its own, does not trigger on any of the existing tests. However, when applying the rest of the diff without the change in X86ISelLowering.cpp, two assertions are triggered, in CodeGen/X86/early-ifcvt.ll and CodeGen/X86/switch.ll.

Therefore I think the rest of the diff is necessary to trigger the new condition handled here.

RKSimon added inline comments.Mar 19 2019, 5:28 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3181 ↗(On Diff #191258)

auto R = cast<RegisterSDNode>(Op.getOperand(1));

lib/Target/X86/X86ISelLowering.cpp
19513 ↗(On Diff #191258)

DAG.getAnyExtOrTrunc ?
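
For reference, the suggestion is to let SelectionDAG choose between an any-extend, a truncate, or no operation based on the source and destination types, rather than hard-coding ISD::TRUNCATE. Roughly, with hypothetical value and type names:

// Instead of, e.g.:
//   Idx = DAG.getNode(ISD::TRUNCATE, DL, PtrVT, Idx);
// let SelectionDAG insert ANY_EXTEND, TRUNCATE, or nothing as needed:
Idx = DAG.getAnyExtOrTrunc(Idx, DL, PtrVT);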

test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
146 ↗(On Diff #191258)

Please commit this with trunk's current codegen and then rebase the patch to show the diff.

piotr marked an inline comment as done.Mar 19 2019, 5:40 AM
piotr added inline comments.
test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
146 ↗(On Diff #191258)

Just to make sure I understand what is being asked: you would like to see the raw output of the new tests with 1) current trunk and 2) the version with the patch?

RKSimon added inline comments.Mar 19 2019, 6:06 AM
test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
146 ↗(On Diff #191258)

Yes, so you commit the new tests with checks for trunk's current codegen. Then this patch just shows the codegen delta.

piotr updated this revision to Diff 191854.Mar 22 2019, 4:07 AM

Addressed review comments, extracted new tests and rebased.

piotr marked an inline comment as done.Mar 22 2019, 4:15 AM
piotr added inline comments.
test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
146 ↗(On Diff #191258)

https://reviews.llvm.org/D59690 created. I will rebase this patch when D59690 gets pushed.

piotr updated this revision to Diff 192574.Mar 28 2019, 12:48 AM

Rebased.

RKSimon accepted this revision.Apr 4 2019, 4:18 AM

LGTM - are you intending to do the same for ComputeNumSignBits ?

This revision is now accepted and ready to land.Apr 4 2019, 4:18 AM

LGTM - are you intending to do the same for ComputeNumSignBits ?

Don't sign bits for live ins usually become an AssertSExt?

This revision was automatically updated to reflect the committed changes.
piotr added a comment.Apr 5 2019, 12:44 AM

@RKSimon No, I do not intend to modify ComputeNumSignBits at this time.

dmgreen added a subscriber: dmgreen.Apr 8 2019, 2:23 AM

Hello

We are seeing a lot of codesize increases and performance changes from this patch, unfortunately. Consider this code, which I believe is fairly common in an embedded context:

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv6m-arm-none-eabi"

; Function Attrs: minsize nounwind optsize
define dso_local i32 @C(i32 %x, i32* nocapture %y) local_unnamed_addr #0 {
entry:
  br label %for.cond

for.cond:                                         ; preds = %B.exit, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %B.exit ]
  %exitcond = icmp eq i32 %i.0, 128
  br i1 %exitcond, label %for.end, label %for.body

for.body:                                         ; preds = %for.cond
  %mul = shl i32 %i.0, 2
  %add = add i32 %mul, %x
  store volatile i32 0, i32* inttoptr (i32 1342226444 to i32*), align 4
  store volatile i32 %add, i32* inttoptr (i32 1342226436 to i32*), align 4
  store volatile i32 1, i32* inttoptr (i32 1342226448 to i32*), align 16
  tail call void @llvm.arm.isb(i32 15) #1
  br label %while.cond.i

while.cond.i:                                     ; preds = %while.cond.i, %for.body
  %0 = load volatile i32, i32* inttoptr (i32 1342226448 to i32*), align 16
  %tobool.i = icmp eq i32 %0, 0
  br i1 %tobool.i, label %B.exit, label %while.cond.i

B.exit:                                           ; preds = %while.cond.i
  %1 = load volatile i32, i32* inttoptr (i32 1342226440 to i32*), align 8
  %arrayidx = getelementptr inbounds i32, i32* %y, i32 %i.0
  store i32 %1, i32* %arrayidx, align 4
  %inc = add nuw nsw i32 %i.0, 1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  ret i32 0
}

; Function Attrs: nounwind
declare void @llvm.arm.isb(i32) #1

attributes #0 = { minsize nounwind optsize "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-m0plus" "target-features"="+armv6-m,+strict-align,+thumb-mode,-crc,-dotprod,-dsp,-fp16fml,-hwdiv,-hwdiv-arm,-ras" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind }

Before this patch, constant hoisting would take those constant addresses, which refer to memory-mapped registers/IO, and turn them into a base constant plus a number of inttoptr casts. That way there is only one base, plus a number of offsets that are turned into STR rn, [base, offset].

With this new patch, it just sees straight through the CopyFromReg though! (Technically, it turns the ADD into an OR, and from there into a constant.) So we end up with multiple bases, each of which has to be materialised from a constant pool.

There are other things that constant hoisting and CGP do in a similar manner to create base+offset pairs for architectures like Thumb, where the offset range is fairly small and materialising constants is not always easy.

So although I like the idea of this patch, it seems to be breaking some optimisations that are relying on this not happening. Why we don't have tests that show any of that, I'm not sure...

piotr added a comment.Apr 10 2019, 2:32 AM

It's indeed unfortunate that you saw a regression with my patch. The patch makes SelectionDAG see more context, so it provides more optimization opportunities. Arguably, the fact that you see worse code now is a limitation of the optimization you mentioned, which has only become visible now. Have you checked whether the problematic optimization could be restricted (possibly only on targets that do not benefit from the broader scope of SelectionDAG)?

The patch makes SelectionDAG see more context, so it provides more optimization opportunities.

I think it may be more accurate to say that this breaks a whole class of existing optimisations. They were designed under the expectation that selection dag wouldn't do this. You can argue whether that was a sensible design or not, but it's the state we are in right now.

Do you have examples of this performing useful optimisations? Most of the tests here don't show any changes from what I would call sensible code. I would expect the IR transforms to have handled most things like this already (or to be doing the split into different BBs deliberately).

I think the most sensible approach is to revert this for now (and hopefully add some more tests!). It seems that this is new info that wasn't taken into account in the creation/review of this patch, so unless you can fix this quickly we need to avoid breaking the old code. I'm a little chagrined to do that, because there are some improvements in the results I've seen; they are just outweighed by all the regressions. I'm happy to try and help out with extra testing.

piotr added a comment.Apr 10 2019, 5:15 AM

Yes, we see improvements on the AMDGPU target.

I think it may be more accurate to say that this breaks a whole class of existing optimisations.

We need a test for each of these cases. Otherwise we risk having the same problem all over again.

@dmgreen Please revert the patch and add new tests in a separate review. I would prefer if you added the new tests first, but it is not essential.

Thanks. I've reverted in r358113 and will attempt to add some extra tests in.

We need a test for each of these cases. Otherwise we risk having the same problem all over again.

Definitely agree.

I guess this is something that will be fixed in GlobalISel. In the meantime, perhaps we could do with something that is more explicit about not being looked through, more than just a bitcast. Happy to help if I can, just let me know.

OK, thanks!

Thanks. I've reverted in r358113 and will attempt to add some extra tests in.

Yes, please add tests (ideally for x86 too).

We need a test for each of these cases. Otherwise we risk having the same problem all over again.

Definitely agree.

I guess this is something that will be fixed in GlobalISel. In the meantime, perhaps we could do with something that is more explicit about not being looked through, more than just a bitcast. Happy to help if I can, just let me know.

D60294 caught my attention, that may be a related fix.

Carrot added a subscriber: Carrot.Apr 17 2019, 11:08 AM

Just FYI.

With this patch, LLVM generates wrong instructions for one of our internal applications. It also causes a crash in a debug build of the compiler, in X86DAGToDAGISel::getAddressOperands: when it computes the load address of a jump table, it gets an AM with a non-zero disp, which is unexpected by getAddressOperands. I'm not sure if it is the same problem as described in https://reviews.llvm.org/rL358113, or a separate issue.

Unfortunately I don't have a small test case to reproduce it; the original code is large and difficult to simplify.

piotr added a comment.Apr 30 2019, 5:56 AM

@Carrot Thanks for letting me know.

@dmgreen Do you have an ETA on the new tests for the problematic cases you had noticed?

Hello. I added a constant address case in rL358114, and a couple of other tests for minsize in rL358128. I was trying to look into X86 tests, but the addressing modes being so extensive makes it difficult. There will possibly be some RISCV testcases out of D60294 too.

One of the problems is that I think, because this was working on demanded bits, it was actually converting something from aaaaaaxx + xxxxxxbb -> aaaaaaxx | xxxxxxbb and from there computing aaaaaabb directly (if you understand my meaning; the x's are unknowns). If there wasn't the right separation of bits, this may not have come up. Hence the existing test cases not catching things, but it coming up quite a few times in practice.
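
To make that interaction concrete: in the case above both the hoisted base and the small offset end up with fully known bits, so the add can be rewritten as an or and then folded to a single constant. Below is a standalone sketch using llvm::KnownBits, not DAGCombiner's actual code; the base/offset split 0x5000C000 + 0xC is hypothetical, chosen so the folded value 0x5000C00C is the 1342226444 from the IR above.

// Standalone sketch, not DAGCombiner code: shows why fully known base and
// offset bits let an ADD become an OR and then fold to a plain constant.
#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  const unsigned BW = 32;

  // Hoisted base address seen through CopyFromReg, e.g. 0x5000C000:
  // every bit is known, and in particular the low bits are known zero.
  KnownBits Base(BW);
  Base.One = APInt(BW, 0x5000C000);
  Base.Zero = ~Base.One;

  // Small offset added back by the addressing code, e.g. 0xC.
  KnownBits Off(BW);
  Off.One = APInt(BW, 0xC);
  Off.Zero = ~Off.One;

  // In every bit position at least one operand is known zero, so
  // base + off == base | off; this is what the add -> or rewrite relies on.
  bool NoCommonBits = (~(Base.Zero | Off.Zero)) == 0;

  // Known bits of the OR: one if set in either operand, zero if clear in both.
  KnownBits OrKnown(BW);
  OrKnown.One = Base.One | Off.One;
  OrKnown.Zero = Base.Zero & Off.Zero;

  bool FullyKnown = (~(OrKnown.Zero | OrKnown.One)) == 0;

  outs() << "no common bits: " << (NoCommonBits ? "yes" : "no") << "\n";
  outs() << "folds to constant: " << (FullyKnown ? "yes" : "no") << "\n";
  outs() << "value: 0x";
  outs().write_hex(OrKnown.One.getZExtValue());
  outs() << "\n";
  return 0;
}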

Hopefully the new tests are at least enough to show some problem cases. Let me know if not, or if there's any other way I can help.