This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Fold redundant masking operations of shifted value
Abandoned · Public

Authored by dnsampaio on Jun 18 2018, 5:33 AM.


Details

Reviewers

samparker
SjoerdMeijer
javed.absar
spatel
craig.topper
RKSimon
lebedev.ri
thakis
efriedma

Commits

rL336426: [SelectionDAG] https://reviews.llvm.org/D48278

Summary

Allow reducing redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
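A brute-force check is a quick way to confirm that the summary's rewrite preserves the value. The C++ sketch below assumes x is an unsigned 32-bit value and that >> is a logical shift (the summary leaves the type unspecified). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// x2 as written in the summary: shift first, then mask with 0xAB.
uint32_t x2_original(uint32_t x) { return (x >> 8) & 0xABu; }

// x2 after the fold: reuse x1 = x & 0xAB00 and shift it right by 8,
// so only one mask constant (0xAB00) is needed.
uint32_t x2_folded(uint32_t x) {
  uint32_t x1 = x & 0xAB00u;
  return x1 >> 8;
}

// Only bits 8..15 of x can reach either result, so every 16-bit input,
// with the upper half both clear and set, exercises all relevant cases.
bool foldHoldsExhaustively() {
  for (uint32_t x = 0; x <= 0xFFFFu; ++x) {
    if (x2_original(x) != x2_folded(x)) return false;
    if (x2_original(x | 0xFFFF0000u) != x2_folded(x | 0xFFFF0000u)) return false;
  }
  return true;
}
```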


Event Timeline

dnsampaio created this revision. Jun 18 2018, 5:33 AM

Herald added a subscriber: llvm-commits. Jun 18 2018, 5:33 AM

dnsampaio added a parent revision: D46749: [SelectionDAG]Reduce masked data movement chains and memory access widths. Jun 18 2018, 5:34 AM

dnsampaio added a child revision: D47730: [SelectionDAG]Reduce masked data movement chains and memory access widths pt3.

Hi Diogo,

Some nitpick comments. Please add negative tests for shifts with multiple uses and where the shift and mask aren't by constants.

cheers,
sam

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4236	Why not check that you have a supported shift here and potentially exit early?
4275	Remove or uncomment.
4301	use LLVM_FALLTHROUGH
4310	remove
4324	I think you should be adding NewShift instead.
test/CodeGen/ARM/2018_05_29_FoldRedundantMask.ll
119 ↗	(On Diff #151700)	Would you mind updating the target triple to something more modern, like v7 and v7m, to check that the code generation is better here. This extra load concerns me a bit.

dnsampaio removed a parent revision: D46749: [SelectionDAG]Reduce masked data movement chains and memory access widths. Jul 4 2018, 8:47 AM

Replaced the tests so that they no longer depend on the load-width reduction.
Added one positive test per case, and two negative tests: one where a mask is not a constant, and another where the shift amount is not a constant.

LGTM. For future reference, and before committing: ARM and AArch64 tests live in different CodeGen directories, so please split the test into the two subdirectories. Thanks!

This revision is now accepted and ready to land. Jul 5 2018, 1:24 AM

AArch64 tests go into the folder AArch64

Give patch more context.

dnsampaio mentioned this in rL336426: [SelectionDAG] https://reviews.llvm.org/D48278. Jul 6 2018, 2:47 AM

dnsampaio added a commit: rL336426: [SelectionDAG] https://reviews.llvm.org/D48278. Jul 6 2018, 2:50 AM

This patch was reverted with rL336453 because it caused:
https://bugs.llvm.org/show_bug.cgi?id=38084

Beyond that, I don't understand the motivation. The patch increases the latency of a computation. Why is that beneficial? The x86 diff doesn't look like a win to me.

I don't know what the ARM/AArch output looked like before this patch. Always check in the baseline tests before making a code change, so we have that as a reference (and in case the patch is reverted, we won't lose the test coverage that motivated the code patch).

This revision now requires changes to proceed. Jul 8 2018, 8:44 AM

Like @spatel I'm not clear on what you're really trying to accomplish here - has the arm/arm64 codegen improved?

test/CodeGen/AArch64/FoldRedundantShiftedMasking.ll
2	There's no need to use a check-prefix, remove it and use CHECK
89	Confusing - please move the shl_nogood checks before the shl_nogood2 define
test/CodeGen/ARM/FoldRedundantShiftedMasking.ll
5	There's no need to use a check-prefix, remove it and use CHECK
90	Confusing - please move the shl_nogood checks before the shl_nogood2 define

In D48278#1155201, @spatel wrote:

This patch was reverted with rL336453 because it caused:
https://bugs.llvm.org/show_bug.cgi?id=38084

Sorry about that.

Beyond that, I don't understand the motivation. The patch increases the latency of a computation. Why is that beneficial? The x86 diff doesn't look like a win to me.

It reduces the number of operations from 3 to 2, and the number of constants needed for the masking from 2 to 1.
I don't see how it increases the latency if you are going to perform the masking and the shift anyway.

I don't know what the ARM/AArch output looked like before this patch. Always check in the baseline tests before making a code change, so we have that as a reference (and in case the patch is reverted, we won't lose the test coverage that motivated the code patch).

See the inline comments for the code changes on x86-64, AArch64, and ARM.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4236	All the checks are already at the top of the function (except for the ISD::ROTL case). Merging them into a single big if statement just makes it harder to understand, and harder for someone to add an exception to one of the rejection rules. Technically speaking, one big test or a sequence of tests is the same thing.
4301	Added LLVM_FALLTHROUGH. But gcc 8.1 still gives a warning, so I'm also keeping the /* fall-through */ comment:
/home/diosam01/LLVM/local/src/lib/CodeGen/SelectionDAG/DAGCombiner.cpp: In member function ‘llvm::SDValue {anonymous}::DAGCombiner::foldRedundantShiftedMasks(llvm::SDNode*)’:
/home/diosam01/LLVM/local/src/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:4180:18: warning: this statement may fall through [-Wimplicit-fallthrough=]
     N0Opcode = ISD::SRL;
     ~~~~~~~~~^~~~~
/home/diosam01/LLVM/local/src/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:4182:5: note: here
     case ISD::SRL:
     ^~~~
4324	No, NewShift and its users are added to the worklist automatically at DAGCombiner.cpp lines 1483 and 1484. The log shows that a transformed node is always revisited in the same combining phase. What I want to add is the other user of this value, as it might be the only user left.
test/CodeGen/AArch64/FoldRedundantShiftedMasking.ll
4–13	Before patch:
    mov w9, #15
    mov w8, #3855
    movk w9, #3840, lsl #16
    and w8, w0, w8
    and w9, w9, w0, ror #8
    orr w0, w9, w8
  After patch:
    mov w8, #3855
    and w8, w0, w8
    orr w0, w8, w8, ror #8
19–27	Before patch:
    sxth w8, w0
    mov w9, #172
    mov w10, #44032
    and w9, w8, w9
    and w8, w10, w8, lsl #8
    orr w0, w9, w8
  After patch:
    mov w8, #172
    and w8, w0, w8
    orr w0, w8, w8, lsl #8
19–27	In x86-64, before patch:
    movswl %di, %eax
    movl %eax, %ecx
    andl $172, %ecx
    shll $8, %eax
    andl $44032, %eax # imm = 0xAC00
    orl %ecx, %eax
  After patch:
    andl $172, %edi
    movl %edi, %eax
    shll $8, %eax
    leal (%rax,%rdi), %eax
33–41	Before patch:
    sxth w8, w0
    mov w9, #44032
    mov w10, #172
    and w9, w8, w9
    and w8, w10, w8, lsr #8
    orr w0, w9, w8
  After patch:
    mov w8, #44032
    and w8, w0, w8
    orr w0, w8, w8, lsr #8
33–41	In x86-64, before patch:
    movswl %di, %eax
    movl %eax, %ecx
    andl $44032, %ecx # imm = 0xAC00
    shrl $8, %eax
    andl $172, %eax
    orl %ecx, %eax
  After patch:
    andl $44032, %edi # imm = 0xAC00
    movl %edi, %eax
    shrl $8, %eax
    leal (%rax,%rdi), %eax
47–55	Before patch:
    sxth w8, w0
    mov w9, #44032
    mov w10, #172
    and w9, w8, w9
    and w8, w10, w8, lsr #8
    orr w0, w9, w8
  After patch:
    mov w8, #44032
    and w8, w0, w8
    orr w0, w8, w8, lsr #8
47–55	In x86-64, before patch:
    movswl %di, %eax
    movl %eax, %ecx
    andl $44032, %ecx # imm = 0xAC00
    shrl $8, %eax
    andl $172, %eax
    orl %ecx, %eax
  After patch:
    andl $44032, %edi # imm = 0xAC00
    movl %edi, %eax
    shrl $8, %eax
    leal (%rax,%rdi), %eax
test/CodeGen/ARM/FoldRedundantShiftedMasking.ll
23	Before patch:
    mov r1, #15
    mov r2, #15
    orr r1, r1, #3840
    orr r2, r2, #251658240
    and r1, r0, r1
    and r0, r2, r0, ror #8
    orr r0, r0, r1
  After patch:
    mov r1, #15
    orr r1, r1, #3840
    and r0, r0, r1
    orr r0, r0, r0, ror #8
24–35	Before patch:
    lsl r0, r0, #16
    mov r1, #172
    and r0, r1, r0, asr #16
    orr r0, r0, r0, lsl #8
  After patch:
    and r0, r0, #172
    orr r0, r0, r0, lsl #8
37–45	Before patch:
    lsl r0, r0, #16
    mov r1, #44032
    and r1, r1, r0, asr #16
    asr r0, r0, #16
    mov r2, #172
    and r0, r2, r0, lsr #8
    orr r0, r1, r0
  After patch:
    and r0, r0, #44032
    orr r0, r0, r0, lsr #8
50–58	Before patch:
    lsl r0, r0, #16
    mov r1, #44032
    and r1, r1, r0, asr #16
    asr r0, r0, #16
    mov r2, #172
    and r0, r2, r0, lsr #8
    orr r0, r1, r0
  After patch:
    and r0, r0, #44032
    orr r0, r0, r0, lsr #8

In D48278#1155230, @RKSimon wrote:

Like @spatel I'm not clear on what you're really trying to accomplish here - has the arm/arm64 codegen improved?

In these examples, it roughly halves the number of ARM instructions and constants required, as the OR and SHIFT operations can be combined.

In D48278#1155560, @dnsampaio wrote:

It reduces the number of operations from 3 to 2, and the number of constants needed for the masking from 2 to 1.
I don't see how it increases the latency if you are going to perform the masking and the shift anyway.

Ah, I see that now. But I'm not convinced this is the right approach. Why are we waiting to optimize this in the backend? This is a universally good optimization, so it should be in IR:
https://rise4fun.com/Alive/O04

I'm not sure exactly where that optimization belongs. Ie, is it EarlyCSE, GVN, somewhere else, or is it its own pass? But I don't see any benefit in waiting to do this in the DAG.

In D48278#1155779, @spatel wrote:

In D48278#1155560, @dnsampaio wrote:

It reduces the number of operations from 3 to 2, and the number of constants needed for the masking from 2 to 1.
I don't see how it increases the latency if you are going to perform the masking and the shift anyway.

Ah, I see that now. But I'm not convinced this is the right approach. Why are we waiting to optimize this in the backend? This is a universally good optimization, so it should be in IR:
https://rise4fun.com/Alive/O04

I'm not sure exactly where that optimization belongs. Ie, is it EarlyCSE, GVN, somewhere else, or is it its own pass? But I don't see any benefit in waiting to do this in the DAG.

This also raises a question that has come up in another review recently - D41233. If we reverse the canonicalization of shl+and, we would solve the most basic case that I showed above:

define i32 @shl_first(i32 %a) {
  %t2 = shl i32 %a, 8
  %t3 = and i32 %t2, 44032
  ret i32 %t3
}

define i32 @mask_first(i32 %a) {
  %a2 = and i32 %a, 172
  %a3 = shl i32 %a2, 8
  ret i32 %a3
}
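The two IR functions compute the same value because 44032 (0xAC00) is 172 (0xAC) shifted left by 8. The following C++ brute-force sketch of that equivalence assumes i32 maps to uint32_t. It is only a sanity check, not the Alive proof linked above, and its function names simply mirror the IR functions.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// @shl_first: shift left by 8, then mask with 44032 (0xAC00).
uint32_t shl_first(uint32_t a) { return (a << 8) & 44032u; }

// @mask_first: mask with 172 (0xAC), then shift left by 8.
uint32_t mask_first(uint32_t a) { return (a & 172u) << 8; }

// Both results depend only on the low byte of a, so trying every low byte
// under a few different upper-bit patterns covers all behaviour.
bool shlFirstEqualsMaskFirst() {
  for (uint32_t low = 0; low < 256; ++low)
    for (uint32_t high : {0x00000000u, 0x12345600u, 0xFFFFFF00u}) {
      uint32_t a = high | low;
      if (shl_first(a) != mask_first(a)) return false;
    }
  return true;
}
```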

Added checks for the shift opcodes so the function exits early if none is found.
Validate the mask widths (although it would be an error elsewhere in the code if they differed).

spatel mentioned this in D41233: [InstCombine] Canonizing 'and' before 'shl'. Jul 9 2018, 7:22 AM

Ah, I see that now. But I'm not convinced this is the right approach. Why are we waiting to optimize this in the backend? This is a universally good optimization, so it should be in IR:

Agreed, I also intend to implement this transformation in IR. But there are cases where this pattern only appears after some instructions have been combined in the DAG, so why not do it here as well? It is also a requirement for a future patch that detects opportunities to reduce load and store widths.

I'm not sure exactly where that optimization belongs. Ie, is it EarlyCSE, GVN, somewhere else, or is it its own pass? But I don't see any benefit in waiting to do this in the DAG.

This also raises a question that has come up in another review recently - D41233. If we reverse the canonicalization of shl+and, we would solve the most basic case that I showed above:
define i32 @shl_first(i32 %a) {
  %t2 = shl i32 %a, 8
  %t3 = and i32 %t2, 44032
  ret i32 %t3
}

define i32 @mask_first(i32 %a) {
  %a2 = and i32 %a, 172
  %a3 = shl i32 %a2, 8
  ret i32 %a3
}

dnsampaio added a reviewer: thakis. Jul 9 2018, 11:47 PM

In D48278#1155803, @dnsampaio wrote:

Ah, I see that now. But I'm not convinced this is the right approach. Why are we waiting to optimize this in the backend? This is a universally good optimization, so it should be in IR:

Agreed, I also intend to implement this transformation in IR. But there are cases where this pattern only appears after some instructions have been combined in the DAG, so why not do it here as well? It is also a requirement for a future patch that detects opportunities to reduce load and store widths.

IMO, this is backwards, we optimize first in IR (because the sooner we can fold something like this, the more it helps later transforms). Then, only if there's reason to create redundancy (because the patterns emerge late) do we repeat a fold in the backend. Can you post a motivating example that would not be solved by IR transforms, so we can see why this is necessary for the DAG?

efriedma added inline comments. Jul 10 2018, 12:05 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5986	Walking the uses of an SDNode can get expensive... is this O(N^2) in the number of uses of MaskedValue? If not, please add a comment explaining why it isn't.

spatel mentioned this in D49229: [AggressiveInstCombine] Fold redundant masking operations of shifted value. Jul 12 2018, 6:51 AM

In D48278#1155793, @spatel wrote:
In D48278#1155779, @spatel wrote:

In D48278#1155560, @dnsampaio wrote:

It reduces the number of operations from 3 to 2, and the number of constants needed for the masking from 2 to 1.
I don't see how it increases the latency if you are going to perform the masking and the shift anyway.

Ah, I see that now. But I'm not convinced this is the right approach. Why are we waiting to optimize this in the backend? This is a universally good optimization, so it should be in IR:
https://rise4fun.com/Alive/O04

I'm not sure exactly where that optimization belongs. Ie, is it EarlyCSE, GVN, somewhere else, or is it its own pass? But I don't see any benefit in waiting to do this in the DAG.

This also raises a question that has come up in another review recently - D41233. If we reverse the canonicalization of shl+and, we would solve the most basic case that I showed above:
define i32 @shl_first(i32 %a) {
  %t2 = shl i32 %a, 8
  %t3 = and i32 %t2, 44032
  ret i32 %t3
}

define i32 @mask_first(i32 %a) {
  %a2 = and i32 %a, 172
  %a3 = shl i32 %a2, 8
  ret i32 %a3
}

This "canonicalization" won't prevent even basic duplicated masked values when an lshr is used:

%0 = sext i16 %a to i32
%1 = lshr i32 %0, 8
%2 = and i32 %1, 172
%3 = and i32 %0, 44032
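In this fragment, %2 can be recomputed from %3 as (lshr %3, 8), which makes the constant 172 redundant. The following C++ sketch checks this over every i16 input, assuming i16/i32 map to int16_t/uint32_t. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the IR: %0 = sext i16 %a to i32.
uint32_t sext16(int16_t a) { return static_cast<uint32_t>(static_cast<int32_t>(a)); }

// %2 = and (lshr %0, 8), 172: the shift-then-mask form.
uint32_t shiftThenMask(int16_t a) { return (sext16(a) >> 8) & 172u; }

// Reuse %3 = and %0, 44032 and shift it instead, so 172 is no longer needed.
uint32_t reuseMasked(int16_t a) { return (sext16(a) & 44032u) >> 8; }

// Exhaustive over all 65536 possible i16 inputs.
bool lshrMasksAreRedundant() {
  for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
    int16_t a = static_cast<int16_t>(v);
    if (shiftThenMask(a) != reuseMasked(a)) return false;
  }
  return true;
}
```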

And an even simpler case, already in the test file, that won't be handled at the IR level:
define i32 @ror(i32 %a) {
entry:
  %m2 = and i32 %a, 3855
  %shl = shl i32 %a, 24
  %shr = lshr i32 %a, 8
  %or = or i32 %shl, %shr
  %m1 = and i32 %or, 251658255
  %or2 = or i32 %m1, %m2
  ret i32 %or2
}

The shl and lshr instructions become a ror, and the rotated value is then masked with a rotated copy of the same mask (251658255 is 3855 rotated right by 8).
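The following C++ sketch of @ror checks that the fold matches the "After patch" AArch64 output (orr w0, w8, w8, ror #8), assuming i32 maps to uint32_t. rotr32 and the other names are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit rotate right; n must be in [1, 31] to avoid an undefined shift by 32.
uint32_t rotr32(uint32_t v, unsigned n) {
  assert(n >= 1 && n <= 31);
  return (v >> n) | (v << (32 - n));
}

// Direct transcription of @ror: the shl/lshr/or triple is a rotate right by 8.
uint32_t ror_original(uint32_t a) {
  uint32_t m2 = a & 3855u;              // 0x00000F0F
  uint32_t rot = (a << 24) | (a >> 8);  // same as rotr32(a, 8)
  uint32_t m1 = rot & 251658255u;       // 0x0F00000F, i.e. rotr32(0xF0F, 8)
  return m1 | m2;
}

// Folded form: rotate the already-masked m2 instead of re-masking.
uint32_t ror_folded(uint32_t a) {
  uint32_t m2 = a & 3855u;
  return rotr32(m2, 8) | m2;
}

// Both forms read only the bits of a selected by 0xF0F, which all lie in the
// low 16 bits, so checking every 16-bit input is exhaustive.
bool rorFoldHolds() {
  for (uint32_t a = 0; a <= 0xFFFFu; ++a)
    if (ror_original(a) != ror_folded(a)) return false;
  return true;
}
```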

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5986	OK, I will change it to work only when MaskedValue is used by exactly one shift and one AND operation.

Only accepts values with exactly 2 uses (the AND and SHIFT operations), so looping through the uses is cheap, and we avoid it in most cases.
Removed the recursive known-bits computations (computeKnownBits).

efriedma added inline comments. Jul 13 2018, 11:22 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6104	hasNUsesOfValue(). (use_size is linear in the number of uses.)

Replaced num_uses by !hasNUsesOfValue as requested.

dnsampaio added inline comments. Jul 14 2018, 2:24 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6104	Fixed.

dnsampaio marked an inline comment as done. Jul 16 2018, 3:10 AM

Please can you regenerate the diff with context?

Added context.

dnsampaio abandoned this revision. Dec 28 2018, 9:27 AM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	116 lines
test/CodeGen/AArch64/FoldRedundantShiftedMasking.ll	96 lines
test/CodeGen/ARM/FoldRedundantShiftedMasking.ll	96 lines
test/CodeGen/X86/FoldRedundantShiftedMasking.roll.ll	18 lines
test/CodeGen/X86/pr32329.ll	40 lines

Diff 155818

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


(456 lines not shown)	private:
SDValue reduceBuildVecExtToExtBuildVec(SDNode *N);		SDValue reduceBuildVecExtToExtBuildVec(SDNode *N);
SDValue reduceBuildVecConvertToConvertBuildVec(SDNode *N);		SDValue reduceBuildVecConvertToConvertBuildVec(SDNode *N);
SDValue reduceBuildVecToShuffle(SDNode *N);		SDValue reduceBuildVecToShuffle(SDNode *N);
SDValue createBuildVecShuffle(const SDLoc &DL, SDNode *N,		SDValue createBuildVecShuffle(const SDLoc &DL, SDNode *N,
ArrayRef<int> VectorMask, SDValue VecIn1,		ArrayRef<int> VectorMask, SDValue VecIn1,
SDValue VecIn2, unsigned LeftIdx);		SDValue VecIn2, unsigned LeftIdx);
SDValue matchVSelectOpSizesWithSetCC(SDNode *Cast);		SDValue matchVSelectOpSizesWithSetCC(SDNode *Cast);

		SDValue foldRedundantShiftedMasks(SDNode *N);
/// Walk up chain skipping non-aliasing memory nodes,		/// Walk up chain skipping non-aliasing memory nodes,
/// looking for aliasing nodes and adding them to the Aliases vector.		/// looking for aliasing nodes and adding them to the Aliases vector.
void GatherAllAliases(SDNode *N, SDValue OriginalChain,		void GatherAllAliases(SDNode *N, SDValue OriginalChain,
SmallVectorImpl<SDValue> &Aliases);		SmallVectorImpl<SDValue> &Aliases);

/// Return true if there is any possibility that the two addresses overlap.		/// Return true if there is any possibility that the two addresses overlap.
bool isAlias(LSBaseSDNode *Op0, LSBaseSDNode *Op1) const;		bool isAlias(LSBaseSDNode *Op0, LSBaseSDNode *Op1) const;

(3,754 lines not shown)

// Unfold		// Unfold
// x & (-1 'logical shift' y)		// x & (-1 'logical shift' y)
// To		// To
// (x 'opposite logical shift' y) 'logical shift' y		// (x 'opposite logical shift' y) 'logical shift' y
// if it is better for performance.		// if it is better for performance.
SDValue DAGCombiner::unfoldExtremeBitClearingToShifts(SDNode *N) {		SDValue DAGCombiner::unfoldExtremeBitClearingToShifts(SDNode *N) {
assert(N->getOpcode() == ISD::AND);		assert(N->getOpcode() == ISD::AND);

SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);

// Do we actually prefer shifts over mask?		// Do we actually prefer shifts over mask?
if (!TLI.preferShiftsToClearExtremeBits(N0))		if (!TLI.preferShiftsToClearExtremeBits(N0))
return SDValue();		return SDValue();

// Try to match (-1 '[outer] logical shift' y)		// Try to match (-1 '[outer] logical shift' y)
(22 lines not shown)	SDValue DAGCombiner::unfoldExtremeBitClearingToShifts(SDNode *N) {
else if (matchMask(N0))		else if (matchMask(N0))
X = N1;		X = N1;
else		else
return SDValue();		return SDValue();

SDLoc DL(N);		SDLoc DL(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

// tmp = x 'opposite logical shift' y		// tmp = x 'opposite logical shift' y
SDValue T0 = DAG.getNode(InnerShift, DL, VT, X, Y);		SDValue T0 = DAG.getNode(InnerShift, DL, VT, X, Y);
// ret = tmp 'logical shift' y		// ret = tmp 'logical shift' y
SDValue T1 = DAG.getNode(OuterShift, DL, VT, T0, Y);		SDValue T1 = DAG.getNode(OuterShift, DL, VT, T0, Y);

return T1;		return T1;
}		}

SDValue DAGCombiner::visitAND(SDNode *N) {		SDValue DAGCombiner::visitAND(SDNode *N) {
(9 lines not shown)	SDValue DAGCombiner::visitAND(SDNode *N) {
if (VT.isVector()) {		if (VT.isVector()) {
if (SDValue FoldedVOp = SimplifyVBinOp(N))		if (SDValue FoldedVOp = SimplifyVBinOp(N))
return FoldedVOp;		return FoldedVOp;

// fold (and x, 0) -> 0, vector edition		// fold (and x, 0) -> 0, vector edition
if (ISD::isBuildVectorAllZeros(N0.getNode()))		if (ISD::isBuildVectorAllZeros(N0.getNode()))
// do not return N0, because undef node may exist in N0		// do not return N0, because undef node may exist in N0
return DAG.getConstant(APInt::getNullValue(N0.getScalarValueSizeInBits()),		return DAG.getConstant(APInt::getNullValue(N0.getScalarValueSizeInBits()),
SDLoc(N), N0.getValueType());		SDLoc(N), N0.getValueType());
if (ISD::isBuildVectorAllZeros(N1.getNode()))		if (ISD::isBuildVectorAllZeros(N1.getNode()))
// do not return N1, because undef node may exist in N1		// do not return N1, because undef node may exist in N1
return DAG.getConstant(APInt::getNullValue(N1.getScalarValueSizeInBits()),		return DAG.getConstant(APInt::getNullValue(N1.getScalarValueSizeInBits()),
SDLoc(N), N1.getValueType());		SDLoc(N), N1.getValueType());

// fold (and x, -1) -> x, vector edition		// fold (and x, -1) -> x, vector edition
if (ISD::isBuildVectorAllOnes(N0.getNode()))		if (ISD::isBuildVectorAllOnes(N0.getNode()))
return N1;		return N1;
if (ISD::isBuildVectorAllOnes(N1.getNode()))		if (ISD::isBuildVectorAllOnes(N1.getNode()))
return N0;		return N0;
}		}

// fold (and c1, c2) -> c1&c2		// fold (and c1, c2) -> c1&c2
ConstantSDNode *N0C = getAsNonOpaqueConstant(N0);		ConstantSDNode *N0C = getAsNonOpaqueConstant(N0);
ConstantSDNode *N1C = isConstOrConstSplat(N1);		ConstantSDNode *N1C = isConstOrConstSplat(N1);
if (N0C && N1C && !N1C->isOpaque())		if (N0C && N1C && !N1C->isOpaque())
return DAG.FoldConstantArithmetic(ISD::AND, SDLoc(N), VT, N0C, N1C);		return DAG.FoldConstantArithmetic(ISD::AND, SDLoc(N), VT, N0C, N1C);

		if (N1C && !VT.isVector()) {
		  if (SDValue R = foldRedundantShiftedMasks(N))
		    return R;
		}
// canonicalize constant to RHS		// canonicalize constant to RHS
if (DAG.isConstantIntBuildVectorOrConstantInt(N0) &&		if (DAG.isConstantIntBuildVectorOrConstantInt(N0) &&
!DAG.isConstantIntBuildVectorOrConstantInt(N1))		!DAG.isConstantIntBuildVectorOrConstantInt(N1))
return DAG.getNode(ISD::AND, SDLoc(N), VT, N1, N0);		return DAG.getNode(ISD::AND, SDLoc(N), VT, N1, N0);
// fold (and x, -1) -> x		// fold (and x, -1) -> x
if (isAllOnesConstant(N1))		if (isAllOnesConstant(N1))
return N0;		return N0;
// if (and x, c) is known to be zero, return 0		// if (and x, c) is known to be zero, return 0
unsigned BitWidth = VT.getScalarSizeInBits();		unsigned BitWidth = VT.getScalarSizeInBits();
(1,645 lines not shown)	SDValue DAGCombiner::visitShiftByConstant(SDNode *N, ConstantSDNode *Amt) {

// Fold the constants, shifting the binop RHS by the shift amount.		// Fold the constants, shifting the binop RHS by the shift amount.
SDValue NewRHS = DAG.getNode(N->getOpcode(), SDLoc(LHS->getOperand(1)),		SDValue NewRHS = DAG.getNode(N->getOpcode(), SDLoc(LHS->getOperand(1)),
N->getValueType(0),		N->getValueType(0),
LHS->getOperand(1), N->getOperand(1));		LHS->getOperand(1), N->getOperand(1));
assert(isa<ConstantSDNode>(NewRHS) && "Folding was not successful!");		assert(isa<ConstantSDNode>(NewRHS) && "Folding was not successful!");

// Create the new shift.		// Create the new shift.
SDValue NewShift = DAG.getNode(N->getOpcode(),		SDValue NewShift = DAG.getNode(N->getOpcode(),
SDLoc(LHS->getOperand(0)),		SDLoc(LHS->getOperand(0)),
VT, LHS->getOperand(0), N->getOperand(1));		VT, LHS->getOperand(0), N->getOperand(1));

// Create the new binop.		// Create the new binop.
return DAG.getNode(LHS->getOpcode(), SDLoc(N), VT, NewShift, NewRHS);		return DAG.getNode(LHS->getOpcode(), SDLoc(N), VT, NewShift, NewRHS);
}		}

SDValue DAGCombiner::distributeTruncateThroughAnd(SDNode *N) {		SDValue DAGCombiner::distributeTruncateThroughAnd(SDNode *N) {
(63 lines not shown)	if (C1 && C2 && C1->getValueType(0) == C2->getValueType(0)) {
return DAG.getNode(N->getOpcode(), dl, VT, N0->getOperand(0),		return DAG.getNode(N->getOpcode(), dl, VT, N0->getOperand(0),
CombinedShiftNorm);		CombinedShiftNorm);
}		}
}		}
}		}
return SDValue();		return SDValue();
}		}

		// fold expressions x1 and x2 alike:
		//   x1 = (and x, 0x00FF)
		//   x2 = ((shl x, 8) and 0xFF00)
		// into
		//   x2 = shl x1, 8 ; reuse the computation of x1
		SDValue DAGCombiner::foldRedundantShiftedMasks(SDNode *AND) {
		  // 1st we check for the desired structure
		  if (!AND || AND->getOpcode() != ISD::AND)
		    return SDValue();

		  const SDValue &SHIFT = AND->getOperand(0);
		  if ((SHIFT.getNumOperands() != 2) || (!SHIFT.hasOneUse()))
		    return SDValue();

		  unsigned N0Opcode = SHIFT.getOpcode();
		  switch (N0Opcode) {
		  case ISD::SHL:
		  case ISD::SRA:
		  case ISD::SRL:
		  case ISD::ROTR:
		  case ISD::ROTL:
		    break;
		  default:
		    return SDValue();
		  }

		  const ConstantSDNode *ShiftAmount =
		      dyn_cast<ConstantSDNode>(SHIFT.getOperand(1));
		  if (!ShiftAmount)
		    return SDValue();

		  const APInt &ShiftValue = ShiftAmount->getAPIntValue();
		  const ConstantSDNode *Mask = dyn_cast<ConstantSDNode>(AND->getOperand(1));
		  if (!Mask)
		    return SDValue();

		  const APInt &MaskValue = Mask->getAPIntValue();
		  SDValue MASKED = SHIFT.getOperand(0);
		  if (MASKED.getValueType().isVector() || !MASKED->hasNUsesOfValue(2, 0))
		    return SDValue();

		  // MASKED has 2 uses and 1 is the shift operation
		  for (SDNode *OtherUser : MASKED->uses()) {
		    if (OtherUser == SHIFT.getNode())
		      continue;

		    if (OtherUser->getOpcode() != ISD::AND ||
		        OtherUser->getValueType(0).isVector())
		      return SDValue();

		    ConstantSDNode *OtherMask =
		        dyn_cast<ConstantSDNode>(OtherUser->getOperand(1));

		    if (!OtherMask)
		      continue;

		    const APInt &OtherMaskValue = OtherMask->getAPIntValue();

		    if (OtherMaskValue.getBitWidth() != MaskValue.getBitWidth())
		      return SDValue();

		    // 2nd we check if the mask / shifted mask combine
		    switch (N0Opcode) {
		    case ISD::SHL:
		      if (!(OtherMaskValue.shl(ShiftValue) == MaskValue ||
		            MaskValue.lshr(ShiftValue) == OtherMaskValue))
		        return SDValue();
		      break;
		    case ISD::SRA:
		      if (!(OtherMaskValue.ashr(ShiftValue) == MaskValue))
		        return SDValue();
		      break;
		    case ISD::SRL:
		      if (!(OtherMaskValue.lshr(ShiftValue) == MaskValue ||
		            MaskValue.shl(ShiftValue) == OtherMaskValue))
		        return SDValue();
		      break;
		    case ISD::ROTR:
		      if (!(OtherMaskValue.rotr(ShiftValue) == MaskValue))
		        return SDValue();
		      break;
		    case ISD::ROTL:
		      if (!(OtherMaskValue.rotl(ShiftValue) == MaskValue))
		        return SDValue();
		      break;
		    default:
		      return SDValue();
		    }
		    // 3rd we do a transformation
		    LLVM_DEBUG(
		        dbgs() << "\tValue being masked and shift-masked: ";
		        MASKED.dump();
		        dbgs() << "\t\tApplied mask: 0x"
		               << OtherMaskValue.toString(16, false) << " : ";
		        OtherUser->dump();
		        dbgs() << "\tShifted by: " << ShiftValue.getZExtValue() << " : ";
		        SHIFT.dump();
		        dbgs() << "\t\tAnd masked by: 0x" << MaskValue.toString(16, false)
		               << " : ";
		        AND->dump();
		        dbgs() << "\tReusing the value of: ";
		        OtherUser->dump(););

		    SDValue ShiftTheAND(OtherUser, 0);
		    const SDLoc DL(SHIFT);
		    EVT VT = AND->getValueType(0);
		    SDValue NewShift =
		        DAG.getNode(N0Opcode, DL, VT, ShiftTheAND, SHIFT.getOperand(1));
		    AddToWorklist(OtherUser);
		    return NewShift;
		  }
		  return SDValue();
		}

SDValue DAGCombiner::visitSHL(SDNode *N) {		SDValue DAGCombiner::visitSHL(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
EVT VT = N0.getValueType();		EVT VT = N0.getValueType();
unsigned OpSizeInBits = VT.getScalarSizeInBits();		unsigned OpSizeInBits = VT.getScalarSizeInBits();

// fold vector ops		// fold vector ops
if (VT.isVector()) {		if (VT.isVector()) {
(12,374 lines not shown)
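The per-opcode mask checks in foldRedundantShiftedMasks can be restated over plain 32-bit integers. This makes it easier to see which (OtherMask, Mask) pairs the fold accepts, and to confirm that the rewrite keeps the value. The following C++ code is a sketch under stated assumptions, not LLVM code: ShiftKind, applyShift, masksCompatible and foldPreservesValue are hypothetical names, values are fixed at 32 bits, and shift amounts are limited to [1, 31].

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the ISD opcodes the patch accepts.
enum class ShiftKind { Shl, Srl, Sra, Rotr, Rotl };

// 32-bit shift/rotate; amt must be in [1, 31].
uint32_t applyShift(ShiftKind k, uint32_t v, unsigned amt) {
  assert(amt >= 1 && amt <= 31);
  switch (k) {
  case ShiftKind::Shl:
    return v << amt;
  case ShiftKind::Srl:
    return v >> amt;
  case ShiftKind::Sra: {
    uint32_t logical = v >> amt;
    // Copy the sign bit into the vacated high bits (portable arithmetic shift).
    return (v & 0x80000000u) ? (logical | ~(~0u >> amt)) : logical;
  }
  case ShiftKind::Rotr:
    return (v >> amt) | (v << (32 - amt));
  case ShiftKind::Rotl:
    return (v << amt) | (v >> (32 - amt));
  }
  return 0;
}

// Same per-opcode test as the patch: otherMask is applied to x directly,
// mask is applied after the shift.
bool masksCompatible(ShiftKind k, uint32_t otherMask, uint32_t mask,
                     unsigned amt) {
  switch (k) {
  case ShiftKind::Shl:
    return (otherMask << amt) == mask || (mask >> amt) == otherMask;
  case ShiftKind::Srl:
    return (otherMask >> amt) == mask || (mask << amt) == otherMask;
  default: // Sra, Rotr, Rotl: the shifted otherMask must equal mask exactly.
    return applyShift(k, otherMask, amt) == mask;
  }
}

// The rewrite: (shift x, amt) & mask  ==>  shift (x & otherMask), amt.
bool foldPreservesValue(ShiftKind k, uint32_t x, uint32_t otherMask,
                        uint32_t mask, unsigned amt) {
  return (applyShift(k, x, amt) & mask) == applyShift(k, x & otherMask, amt);
}
```

For example, the pair (0xAC, 0xAC00) with a left shift by 8 is accepted, while (0xAB, 0xAC00) is rejected, because neither 0xAB << 8 nor 0xAC00 >> 8 matches the other mask.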

test/CodeGen/AArch64/FoldRedundantShiftedMasking.ll

This file was added.

				; RUN: llc -march=aarch64 %s -o - | FileCheck %s

				define i32 @ror(i32 %a) {
				entry:
				%m2 = and i32 %a, 3855
				%shl = shl i32 %a, 24
				%shr = lshr i32 %a, 8
				%or = or i32 %shl, %shr
				%m1 = and i32 %or, 251658255
				%or2 = or i32 %m1, %m2
				ret i32 %or2
				}
				; CHECK-LABEL: ror
				; CHECK: mov [[R1:w[0-9]]], #3855
				; CHECK-NEXT: and [[R2:w[0-9]]], w0, [[R1]]
				; CHECK-NEXT: orr [[R3:w[0-9]]], [[R2]], [[R2]], ror #8

				define i32 @shl(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 172
				%2 = shl i32 %0, 8
				%3 = and i32 %2, 44032
				%4 = or i32 %1, %3
				ret i32 %4
				}
				; CHECK-LABEL:shl:
				; CHECK: mov w8, #172
				; CHECK-NEXT: and w8, w0, w8
				; CHECK-NEXT: orr w0, w8, w8, lsl #8
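For `@shl`, the second mask 44032 (0xAC00) is the first mask 172 (0xAC) shifted left by 8, so `(x << 8) & 0xAC00` can be rebuilt as `(x & 0xAC) << 8` and the first `and` is reused. The sketch below, assuming the `i16` argument is modelled as an `int16_t` sign-extended to 32 bits, compares both forms for every input; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models `sext i16 %a to i32` on the bit pattern.
static uint32_t sext16(int16_t a) {
  return static_cast<uint32_t>(static_cast<int32_t>(a));
}

// @shl before the fold: two independent masks (0x00AC and 0xAC00).
uint32_t shlBefore(int16_t a) {
  uint32_t x = sext16(a);
  return (x & 172u) | ((x << 8) & 44032u);
}

// @shl after the fold: 44032 == 172 << 8, so the shifted half is rebuilt
// from the first masked value.
uint32_t shlAfter(int16_t a) {
  uint32_t lo = sext16(a) & 172u;
  return lo | (lo << 8);
}

// Exhaustively compares both forms over every i16 input.
bool shlFoldHolds() {
  static_assert((172u << 8) == 44032u, "second mask must be the first mask << 8");
  for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
    int16_t a = static_cast<int16_t>(v);
    if (shlBefore(a) != shlAfter(a)) return false;
  }
  return true;
}
```

Both masks keep only bits 0 to 15, so the sign-extension bits never survive, which is consistent with `sxth` disappearing from the after-patch AArch64 output.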

				define i32 @lshr(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 44032
				%2 = lshr i32 %0, 8
				%3 = and i32 %2, 172
				%4 = or i32 %1, %3
				ret i32 %4
				}
				; CHECK-LABEL: lshr:
				dnsampaio: Before patch:
				    sxth w8, w0
				    mov w9, #44032
				    mov w10, #172
				    and w9, w8, w9
				    and w8, w10, w8, lsr #8
				    orr w0, w9, w8
				After patch:
				    mov w8, #44032
				    and w8, w0, w8
				    orr w0, w8, w8, lsr #8
				dnsampaio: In x86-64, before patch:
				    movswl %di, %eax
				    movl %eax, %ecx
				    andl $44032, %ecx # imm = 0xAC00
				    shrl $8, %eax
				    andl $172, %eax
				    orl %ecx, %eax
				After patch:
				    andl $44032, %edi # imm = 0xAC00
				    movl %edi, %eax
				    shrl $8, %eax
				    leal (%rax,%rdi), %eax
				; CHECK: mov w8, #44032
				; CHECK-NEXT: and w8, w0, w8
				; CHECK-NEXT: orr w0, w8, w8, lsr #8
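`@lshr` uses the mirror-image relation: the mask on the shifted value, 172, equals the first mask 44032 shifted right by 8. The hypothetical helper below checks `((x >> c) & m2) == ((x & m1) >> c)` for every sign-extended `i16` input, and shows that a mask which does not match, such as 173, breaks the rewrite.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every i16 input a with x = sext(a),
//   (x >> c) & m2  ==  (x & m1) >> c      (logical shifts)
// i.e. the masked-and-shifted value can reuse x & m1 directly.
bool lshrFoldHoldsFor(uint32_t m1, uint32_t m2, unsigned c) {
  assert(c < 32);
  for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
    uint32_t x = static_cast<uint32_t>(v);  // sign-extended i16 bit pattern
    if (((x >> c) & m2) != ((x & m1) >> c)) return false;
  }
  return true;
}
```

173 adds bit 0 of the shifted value, which is bit 8 of `x`; 0xAC00 does not keep bit 8, so the two forms differ for inputs such as 256.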

				define i32 @ashr(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 44032
				%2 = ashr i32 %0, 8
				%3 = and i32 %2, 172
				%4 = or i32 %1, %3
				ret i32 %4
				}
				; CHECK-LABEL: ashr:
				dnsampaio: Before patch:
				    sxth w8, w0
				    mov w9, #44032
				    mov w10, #172
				    and w9, w8, w9
				    and w8, w10, w8, lsr #8
				    orr w0, w9, w8
				After patch:
				    mov w8, #44032
				    and w8, w0, w8
				    orr w0, w8, w8, lsr #8
				dnsampaio: In x86-64, before patch:
				    movswl %di, %eax
				    movl %eax, %ecx
				    andl $44032, %ecx # imm = 0xAC00
				    shrl $8, %eax
				    andl $172, %eax
				    orl %ecx, %eax
				After patch:
				    andl $44032, %edi # imm = 0xAC00
				    movl %edi, %eax
				    shrl $8, %eax
				    leal (%rax,%rdi), %eax
				; CHECK: mov w8, #44032
				; CHECK-NEXT: and w8, w0, w8
				; CHECK-NEXT: orr w0, w8, w8, lsr #8
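The `@ashr` checks emit a logical `lsr #8` even though the IR uses `ashr`. That is sound here because an arithmetic and a logical right shift by `c` only differ in the top `c` bits, and the mask 172 clears all of them. A C++ sketch under that reading; the function names are illustrative, and the arithmetic shift is written with unsigned operations so it does not depend on `>>` of negative signed values, which is implementation-defined before C++20.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right on a 32-bit pattern using unsigned operations.
static uint32_t ashr32(uint32_t x, unsigned c) {
  assert(c < 32);
  uint32_t r = x >> c;
  if (x & 0x80000000u) r |= ~(0xFFFFFFFFu >> c);  // copy the sign bit into the top c bits
  return r;
}

// ashr and lshr by c differ only in the top c bits, so a mask that clears
// those bits makes the two shifts interchangeable.
bool maskIgnoresSignFill(uint32_t m2, unsigned c) {
  assert(c < 32);
  return (m2 & ~(0xFFFFFFFFu >> c)) == 0;
}

// @ashr before the fold, with a sign-extended i16 argument.
uint32_t ashrBefore(int16_t a) {
  uint32_t x = static_cast<uint32_t>(static_cast<int32_t>(a));
  return (x & 44032u) | (ashr32(x, 8) & 172u);
}

// @ashr after the fold: a logical shift of the first masked value,
// matching the "lsr #8" CHECK line.
uint32_t ashrAfter(int16_t a) {
  uint32_t hi = static_cast<uint32_t>(static_cast<int32_t>(a)) & 44032u;
  return hi | (hi >> 8);
}

// Exhaustively compares both forms over every i16 input.
bool ashrFoldHolds() {
  if (!maskIgnoresSignFill(172u, 8)) return false;
  for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
    int16_t a = static_cast<int16_t>(v);
    if (ashrBefore(a) != ashrAfter(a)) return false;
  }
  return true;
}
```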


				define i32 @shl_nogood(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 172
				%2 = shl i32 %0, %1
				%3 = and i32 %2, 44032
				%4 = or i32 %1, %3
				ret i32 %4
				}

				; CHECK-LABEL: shl_nogood:
				; CHECK: sxth w8, w0
				; CHECK-NEXT: mov w9, #172
				; CHECK-NEXT: and w9, w8, w9
				; CHECK-NEXT: lsl w8, w8, w9
				; CHECK-NEXT: mov w10, #44032
				; CHECK-NEXT: and w8, w8, w10
				; CHECK-NEXT: orr w0, w9, w8
				; CHECK-NEXT: ret

				define i32 @shl_nogood2(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 172
				%2 = shl i32 %0, 8
				%3 = and i32 %2, %0
				%4 = or i32 %1, %3
				ret i32 %4
				}
				RKSimon: Confusing - please move the shl_nogood checks before the shl_nogood2 define.
				; CHECK-LABEL: shl_nogood2:
				; CHECK: sxth w8, w0
				; CHECK-NEXT: mov w9, #172
				; CHECK-NEXT: and w9, w8, w9
				; CHECK-NEXT: and w8, w8, w8, lsl #8
				; CHECK-NEXT: orr w0, w9, w8
				; CHECK-NEXT: ret
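The two negative tests correspond to the restriction in the summary that the fold only applies when the masks and the shift amount are constants: `@shl_nogood` shifts by a variable amount (`%1`), and `@shl_nogood2` masks the shifted value with `%0`. The sketch below models that gate with a hypothetical `MaskedShiftPair` struct and `canFoldMaskedShift` predicate; neither is an LLVM API, and requiring the masks to match exactly is an assumption made for illustration, not necessarily the exact condition the patch tests.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified view of `(x OP c) & m2` next to `x & m1`.
// An empty optional stands for an operand that is not a constant.
enum class ShiftKind { Shl, Lshr, Ashr };

struct MaskedShiftPair {
  ShiftKind kind;
  std::optional<uint32_t> mask1;     // mask applied to the unshifted value
  std::optional<uint32_t> shiftAmt;  // shift amount
  std::optional<uint32_t> mask2;     // mask applied to the shifted value
};

// True when (x OP c) & m2 can be rewritten as (x & m1) OP' c, where OP' is
// a logical shift in the same direction.
bool canFoldMaskedShift(const MaskedShiftPair &p) {
  // Non-constant operands (@shl_nogood, @shl_nogood2): the mask relation
  // cannot be checked at compile time, so there is no fold.
  if (!p.mask1 || !p.shiftAmt || !p.mask2) return false;
  const uint32_t c = *p.shiftAmt;
  if (c >= 32) return false;  // shifting an i32 by >= 32 yields poison in IR
  const uint32_t m1 = *p.mask1, m2 = *p.mask2;
  switch (p.kind) {
  case ShiftKind::Shl:
    return m2 == (m1 << c);
  case ShiftKind::Lshr:
  case ShiftKind::Ashr:
    // m1 >> c has its top c bits clear, so any sign-filled bits of an
    // arithmetic shift are masked away and lshr can stand in for ashr.
    return m2 == (m1 >> c);
  }
  return false;
}
```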

test/CodeGen/ARM/FoldRedundantShiftedMasking.ll

This file was added.

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "armv4t-arm-none-eabi"

				; RUN: llc -march=arm < %s | FileCheck %s

				RKSimon: There's no need to use a check-prefix; remove it and use CHECK.
				define i32 @ror(i32 %a) {
				entry:
				%m2 = and i32 %a, 3855
				%shl = shl i32 %a, 24
				%shr = lshr i32 %a, 8
				%or = or i32 %shl, %shr
				%m1 = and i32 %or, 251658255
				%or2 = or i32 %m1, %m2
				ret i32 %or2
				}
				; CHECK-LABEL: ror
				; CHECK: mov [[R1:r[0-9]]], #15
				; CHECK-NEXT: orr [[R2:r[0-9]]], [[R1]], #3840
				; CHECK-NEXT: and [[R3:r[0-9]]], r0, [[R2]]
				; CHECK-NEXT: orr [[R4:r[0-9]]], [[R3]], [[R3]], ror #8
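In the ARM `@ror` checks, the 3855 (0x0F0F) mask takes two instructions (`mov #15`, then `orr #3840`), whereas the `@shl`, `@lshr` and `@ashr` checks use 172 and 44032 directly as `and` immediates. That matches the A32 rule that a data-processing immediate is an 8-bit value rotated right by an even amount; the author's before-patch `@ror` sequence also has to build 251658255 (0x0F00000F) as `mov #15` plus `orr #251658240` for the same reason. A small C++ sketch of that encodability test; `isArmModifiedImm` is an illustrative name, not the ARM backend's own check, and it ignores Thumb-2's different immediate forms.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n bits, n in [0, 31]; n == 0 avoids a shift by 32.
static uint32_t rotl32(uint32_t v, unsigned n) {
  assert(n < 32);
  return n == 0 ? v : (v << n) | (v >> (32 - n));
}

// True if v fits an A32 data-processing immediate: an 8-bit value rotated
// right by an even amount (0, 2, ..., 30). Rotating v left by that amount
// must then give back a value that fits in 8 bits.
bool isArmModifiedImm(uint32_t v) {
  for (unsigned rot = 0; rot < 32; rot += 2)
    if (rotl32(v, rot) <= 0xFFu) return true;
  return false;
}
```

0x0F0F and 0x0F00000F each spread their set bits over a 12-bit window, so neither can be encoded, while 0xAC00 is 0xAC rotated right by 24.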

				dnsampaio: Before patch:
				    mov r1, #15
				    mov r2, #15
				    orr r1, r1, #3840
				    orr r2, r2, #251658240
				    and r1, r0, r1
				    and r0, r2, r0, ror #8
				    orr r0, r0, r1
				After patch:
				    mov r1, #15
				    orr r1, r1, #3840
				    and r0, r0, r1
				    orr r0, r0, r0, ror #8

				define i32 @shl(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 172
				%2 = shl i32 %0, 8
				%3 = and i32 %2, 44032
				%4 = or i32 %1, %3
				ret i32 %4
				}
				; CHECK-LABEL: shl:
				; CHECK: and r0, r0, #172
				; CHECK-NEXT: orr r0, r0, r0, lsl #8

				dnsampaio: Before patch:
				    lsl r0, r0, #16
				    mov r1, #172
				    and r0, r1, r0, asr #16
				    orr r0, r0, r0, lsl #8
				After patch:
				    and r0, r0, #172
				    orr r0, r0, r0, lsl #8

				define i32 @lshr(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 44032
				%2 = lshr i32 %0, 8
				%3 = and i32 %2, 172
				%4 = or i32 %1, %3
				ret i32 %4
				}
				; CHECK-LABEL: lshr:
				; CHECK: and r0, r0, #44032
				dnsampaio: Before patch:
				    lsl r0, r0, #16
				    mov r1, #44032
				    and r1, r1, r0, asr #16
				    asr r0, r0, #16
				    mov r2, #172
				    and r0, r2, r0, lsr #8
				    orr r0, r1, r0
				After patch:
				    and r0, r0, #44032
				    orr r0, r0, r0, lsr #8
				; CHECK-NEXT: orr r0, r0, r0, lsr #8

				define i32 @ashr(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 44032
				%2 = ashr i32 %0, 8
				%3 = and i32 %2, 172
				%4 = or i32 %1, %3
				ret i32 %4
				}
				; CHECK-LABEL: ashr:
				; CHECK: and r0, r0, #44032
				dnsampaio: Before patch:
				    lsl r0, r0, #16
				    mov r1, #44032
				    and r1, r1, r0, asr #16
				    asr r0, r0, #16
				    mov r2, #172
				    and r0, r2, r0, lsr #8
				    orr r0, r1, r0
				After patch:
				    and r0, r0, #44032
				    orr r0, r0, r0, lsr #8
				; CHECK-NEXT: orr r0, r0, r0, lsr #8

				define i32 @shl_nogood(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 172
				%2 = shl i32 %0, %1
				%3 = and i32 %2, 44032
				%4 = or i32 %1, %3
				ret i32 %4
				}

				; CHECK-LABEL: shl_nogood:
				; CHECK: lsl r0, r0, #16
				; CHECK-NEXT: mov r1, #172
				; CHECK-NEXT: and r1, r1, r0, asr #16
				; CHECK-NEXT: asr r0, r0, #16
				; CHECK-NEXT: mov r2, #44032
				; CHECK-NEXT: and r0, r2, r0, lsl r1
				; CHECK-NEXT: orr r0, r1, r0

				define i32 @shl_nogood2(i16 %a) {
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 172
				%2 = shl i32 %0, 8
				%3 = and i32 %2, %0
				%4 = or i32 %1, %3
				ret i32 %4
				}
				; CHECK-LABEL: shl_nogood2:
				; CHECK: lsl r0, r0, #16
				RKSimon: Confusing - please move the shl_nogood checks before the shl_nogood2 define.
				; CHECK-NEXT: mov r1, #172
				; CHECK-NEXT: asr r2, r0, #16
				; CHECK-NEXT: and r1, r1, r0, asr #16
				; CHECK-NEXT: lsl r2, r2, #8
				; CHECK-NEXT: and r0, r2, r0, asr #16
				; CHECK-NEXT: orr r0, r1, r0

test/CodeGen/X86/FoldRedundantShiftedMasking.roll.ll

This file was added.

				; RUN: llc -march=x86-64 %s -o - | FileCheck %s

				define i32 @roll(i32 %a) {
				entry:
				%m2 = and i32 %a, 3855
				%shl = shl i32 %a, 24
				%shr = lshr i32 %a, 8
				%or = or i32 %shl, %shr
				%m1 = and i32 %or, 251658255
				%or2 = or i32 %m1, %m2
				ret i32 %or2
				}

				; CHECK-LABEL: roll:
				; CHECK: andl $3855, %edi
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: roll $24, %eax
				; CHECK-NEXT: orl %edi, %eax

test/CodeGen/X86/pr32329.ll

	; X86-NEXT: pushl %edi			; X86-NEXT: pushl %edi
	; X86-NEXT: .cfi_def_cfa_offset 16			; X86-NEXT: .cfi_def_cfa_offset 16
	; X86-NEXT: pushl %esi			; X86-NEXT: pushl %esi
	; X86-NEXT: .cfi_def_cfa_offset 20			; X86-NEXT: .cfi_def_cfa_offset 20
	; X86-NEXT: .cfi_offset %esi, -20			; X86-NEXT: .cfi_offset %esi, -20
	; X86-NEXT: .cfi_offset %edi, -16			; X86-NEXT: .cfi_offset %edi, -16
	; X86-NEXT: .cfi_offset %ebx, -12			; X86-NEXT: .cfi_offset %ebx, -12
	; X86-NEXT: .cfi_offset %ebp, -8			; X86-NEXT: .cfi_offset %ebp, -8
	; X86-NEXT: movl obj, %edx
	; X86-NEXT: movsbl var_27, %eax			; X86-NEXT: movsbl var_27, %eax
	; X86-NEXT: movzwl var_2, %esi			; X86-NEXT: movzwl var_2, %esi
	; X86-NEXT: movl var_310, %ecx			; X86-NEXT: movl var_310, %ecx
	; X86-NEXT: imull %eax, %ecx			; X86-NEXT: imull %eax, %ecx
	; X86-NEXT: addl var_24, %ecx			; X86-NEXT: addl var_24, %ecx
	; X86-NEXT: andl $4194303, %edx # imm = 0x3FFFFF			; X86-NEXT: movl $4194303, %edi # imm = 0x3FFFFF
	; X86-NEXT: leal (%edx,%edx), %ebx			; X86-NEXT: andl obj, %edi
	; X86-NEXT: subl %eax, %ebx			; X86-NEXT: leal (%edi,%edi), %edx
	; X86-NEXT: movl %ebx, %edi			; X86-NEXT: subl %eax, %edx
	; X86-NEXT: subl %esi, %edi			; X86-NEXT: movl %edx, %ebx
	; X86-NEXT: imull %edi, %ecx			; X86-NEXT: subl %esi, %ebx
				; X86-NEXT: imull %ebx, %ecx
	; X86-NEXT: addl $-1437483407, %ecx # imm = 0xAA51BE71			; X86-NEXT: addl $-1437483407, %ecx # imm = 0xAA51BE71
	; X86-NEXT: movl $9, %esi			; X86-NEXT: movl $9, %esi
	; X86-NEXT: xorl %ebp, %ebp			; X86-NEXT: xorl %ebp, %ebp
	; X86-NEXT: shldl %cl, %esi, %ebp			; X86-NEXT: shldl %cl, %esi, %ebp
	; X86-NEXT: shll %cl, %esi			; X86-NEXT: shll %cl, %esi
	; X86-NEXT: testb $32, %cl			; X86-NEXT: testb $32, %cl
	; X86-NEXT: cmovnel %esi, %ebp			; X86-NEXT: cmovnel %esi, %ebp
	; X86-NEXT: movl $0, %ecx			; X86-NEXT: movl $0, %ecx
	; X86-NEXT: cmovnel %ecx, %esi			; X86-NEXT: cmovnel %ecx, %esi
	; X86-NEXT: cmpl %edx, %edi			; X86-NEXT: cmpl %edi, %ebx
	; X86-NEXT: movl %ebp, var_50+4			; X86-NEXT: movl %ebp, var_50+4
	; X86-NEXT: movl %esi, var_50			; X86-NEXT: movl %esi, var_50
	; X86-NEXT: setge var_205			; X86-NEXT: setge var_205
	; X86-NEXT: imull %eax, %ebx			; X86-NEXT: imull %eax, %edx
	; X86-NEXT: movb %bl, var_218			; X86-NEXT: movb %dl, var_218
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: .cfi_def_cfa_offset 16			; X86-NEXT: .cfi_def_cfa_offset 16
	; X86-NEXT: popl %edi			; X86-NEXT: popl %edi
	; X86-NEXT: .cfi_def_cfa_offset 12			; X86-NEXT: .cfi_def_cfa_offset 12
	; X86-NEXT: popl %ebx			; X86-NEXT: popl %ebx
	; X86-NEXT: .cfi_def_cfa_offset 8			; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: popl %ebp			; X86-NEXT: popl %ebp
	; X86-NEXT: .cfi_def_cfa_offset 4			; X86-NEXT: .cfi_def_cfa_offset 4
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # %bb.0: # %entry			; X64: # %bb.0: # %entry
	; X64-NEXT: movl {{.*}}(%rip), %eax
	; X64-NEXT: movsbl {{.*}}(%rip), %r9d			; X64-NEXT: movsbl {{.*}}(%rip), %r9d
	; X64-NEXT: movzwl {{.*}}(%rip), %r8d			; X64-NEXT: movzwl {{.*}}(%rip), %r8d
	; X64-NEXT: movl {{.*}}(%rip), %ecx			; X64-NEXT: movl {{.*}}(%rip), %ecx
	; X64-NEXT: imull %r9d, %ecx			; X64-NEXT: imull %r9d, %ecx
	; X64-NEXT: addl {{.*}}(%rip), %ecx			; X64-NEXT: addl {{.*}}(%rip), %ecx
	; X64-NEXT: andl $4194303, %eax # imm = 0x3FFFFF			; X64-NEXT: movl $4194303, %esi
	; X64-NEXT: leal (%rax,%rax), %edi			; X64-NEXT: andl obj(%rip), %esi
				; X64-NEXT: leal (%rsi,%rsi), %edi
	; X64-NEXT: subl %r9d, %edi			; X64-NEXT: subl %r9d, %edi
	; X64-NEXT: movl %edi, %esi			; X64-NEXT: movl %edi, %edx
	; X64-NEXT: subl %r8d, %esi			; X64-NEXT: subl %r8d, %edx
	; X64-NEXT: imull %esi, %ecx			; X64-NEXT: imull %edx, %ecx
	; X64-NEXT: addl $-1437483407, %ecx # imm = 0xAA51BE71			; X64-NEXT: addl $-1437483407, %ecx # imm = 0xAA51BE71
	; X64-NEXT: movl $9, %edx			; X64-NEXT: movl $9, %eax
	; X64-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-NEXT: shlq %cl, %rdx			; X64-NEXT: shlq %cl, %rax
	; X64-NEXT: movq %rdx, {{.*}}(%rip)			; X64-NEXT: movq %rax, {{.*}}(%rip)
	; X64-NEXT: cmpl %eax, %esi			; X64-NEXT: cmpl %esi, %edx
	; X64-NEXT: setge {{.*}}(%rip)			; X64-NEXT: setge {{.*}}(%rip)
	; X64-NEXT: imull %r9d, %edi			; X64-NEXT: imull %r9d, %edi
	; X64-NEXT: movb %dil, {{.*}}(%rip)			; X64-NEXT: movb %dil, {{.*}}(%rip)
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%bf.load = load i32, i32* bitcast (%struct.AA* @obj to i32*), align 8			%bf.load = load i32, i32* bitcast (%struct.AA* @obj to i32*), align 8
	%bf.clear = shl i32 %bf.load, 1			%bf.clear = shl i32 %bf.load, 1
	%add = and i32 %bf.clear, 8388606			%add = and i32 %bf.clear, 8388606
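In both columns of the pr32329.ll diff, the `shl i32 %bf.load, 1` / `and i32 %bf.clear, 8388606` pair is emitted as `andl $4194303` followed by `leal (x,x)`, i.e. mask first, then double; the right-hand (new) column differs mainly in materializing the constant in a register and folding the `obj` load into the `andl`. The mask-first form is valid because 8388606 (0x7FFFFE) is 4194303 (0x3FFFFF) shifted left by 1. A minimal C++ check with illustrative function names:

```cpp
#include <cassert>
#include <cstdint>

// From the test IR: %bf.clear = shl i32 %bf.load, 1
//                   %add = and i32 %bf.clear, 8388606   (0x7FFFFE)
uint32_t shiftThenMask(uint32_t load) { return (load << 1) & 8388606u; }

// Shape of the x86 code: andl $4194303 (0x3FFFFF), then leal (%r,%r),
// which adds the masked value to itself.
uint32_t maskThenDouble(uint32_t load) {
  uint32_t masked = load & 4194303u;
  return masked + masked;  // cannot overflow: masked < 2^22
}

// Compares both forms on a strided sample of the 32-bit input space.
bool pr32329MaskFoldHolds() {
  static_assert((4194303u << 1) == 8388606u, "mask relation");
  for (uint64_t a = 0; a <= 0xFFFFFFFFull; a += 65521) {
    uint32_t v = static_cast<uint32_t>(a);
    if (shiftThenMask(v) != maskThenDouble(v)) return false;
  }
  return true;
}
```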