
Details

Reviewers

spatel
lebedev.ri
RKSimon
hfinkel
nemanjai
xbolva00
kparzysz
craig.topper

Commits

rGc17c5864fff6: [InstCombine] recognize popcount.
rL374512: [InstCombine] recognize popcount.

Summary

Try to recognize the following popcount implementation from Hacker's Delight:

int popcount32(unsigned i) {
  i = i - ((i >> 1) & 0x55555555);
  i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
  i = ((i + (i >> 4)) & 0x0F0F0F0F);
  return (i * 0x01010101) >> 24; 
}

This helps platforms that support a hardware popcount instruction (e.g. PowerPC) gain some performance on the deepsjeng benchmark of CPU2017.
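As a quick sanity check of the pattern in the summary, the sketch below restates it with fixed-width types and compares it against a simple loop. The names `naivePopcount` and `checkSamples` are illustrative helpers only and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// The parallel bit count from the summary, using fixed-width types.
inline uint32_t popcount32(uint32_t i) {
  i = i - ((i >> 1) & 0x55555555u);                  // sums of 2-bit fields
  i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);  // sums of 4-bit fields
  i = (i + (i >> 4)) & 0x0F0F0F0Fu;                  // sums per byte
  return (i * 0x01010101u) >> 24;                    // add all bytes into the top byte
}

// Hypothetical reference: clear the lowest set bit until none remain.
inline uint32_t naivePopcount(uint32_t i) {
  uint32_t n = 0;
  for (; i != 0; i &= i - 1)
    ++n;
  return n;
}

// Compare both implementations on a handful of sample values.
inline void checkSamples() {
  const uint32_t samples[] = {0u,          1u,          0x80000000u,
                              0xFFFFFFFFu, 0xDEADBEEFu, 0x12345678u};
  for (uint32_t s : samples)
    assert(popcount32(s) == naivePopcount(s));
}
```

The transform discussed in this review replaces the whole sequence with a single call to the ctpop intrinsic, which a target with hardware support can lower to one instruction.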

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

shchenz created this revision.Sep 29 2019, 12:56 AM

Herald added a project: Restricted Project. Sep 29 2019, 12:56 AM

Herald added subscribers: llvm-commits, steven.zhang, wuzish, hiraditya.

I think this should go into AggressiveInstCombine.

address @lebedev.ri comment

Cool!

Maybe we need to add TargetTransformInfo in InstCombiner to make sure this combination only happens when getPopcntSupport is true.

At least on X86 it is not needed, since the popcount expander expands the intrinsic to exactly this pattern.

https://godbolt.org/z/i-rswr

xbolva00 added a comment.Sep 29 2019, 4:30 AM

This comment was removed by xbolva00.

shchenz added a reviewer: xbolva00.Sep 29 2019, 6:25 PM

In D68189#1687190, @xbolva00 wrote:

Maybe we need to add TargetTransformInfo in InstCombiner to make sure this combination only happens when getPopcntSupport is true.

At least on X86 it is not needed, since the popcount expander expands the intrinsic to exactly this pattern.

https://godbolt.org/z/i-rswr

Thanks, yes, it should be OK on PowerPC and X86. But I am afraid there are some platforms that do not have hardware popcount. Since we are folding many arithmetic operations into one intrinsic, we may miss some combination/canonicalization opportunities for these operations on such platforms. I am not sure ^-^

In D68189#1687354, @shchenz wrote:

In D68189#1687190, @xbolva00 wrote:

Maybe we need to add TargetTransformInfo in InstCombiner to make sure this combination only happens when getPopcntSupport is true.

At least on X86 it is not needed, since the popcount expander expands the intrinsic to exactly this pattern.

https://godbolt.org/z/i-rswr

Thanks, yes, it should be OK on PowerPC and X86. But I am afraid there are some platforms that do not have hardware popcount. Since we are folding many arithmetic operations into one intrinsic, we may miss some combination/canonicalization opportunities for these operations on such platforms. I am not sure ^-^

No. This is a canonicalization pass. A native LLVM IR intrinsic is more canonical than some IR blob.

Please add m_OneUse everywhere.

xbolva00 added inline comments.Sep 30 2019, 5:39 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
297	Recognized

In D68189#1687830, @xbolva00 wrote:

Please add m_OneUse everywhere.

@xbolva00 Do we need to add m_OneUse for every operation? If some instruction, like the first lshr (%2 = lshr i32 %0, 1), has other uses, the transformation may still be a win.
If we ignore m_OneUse, the worst case is that all instructions have other uses and we can only replace the final lshr (%13 = lshr i32 %12, 24) with the popcount intrinsic. One instruction versus one intrinsic: I think that may not be a loss?

I meant something like in foldAnyOrAllBitsSet. The final lshr is OK to have multiple uses; all partial computations should be m_OneUse.

In D68189#1688114, @xbolva00 wrote:

I meant something like in foldAnyOrAllBitsSet. The final lshr is OK to have multiple uses; all partial computations should be m_OneUse.

I don't think that's necessary for this transform. The difference is we are only creating a single instruction for this transform. Therefore, we can always replace the final instruction in the sequence with some other instruction without producing any extra instructions overall.

spatel added inline comments.Sep 30 2019, 7:52 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
269–271	This might actually be easier to read if generalized for more bitwidths. Can we use APInt and handle power-of-2 sizes from 8- to 128-bit? http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel

spatel added inline comments.Sep 30 2019, 8:01 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
269–271	For reference, we know it's safe to do that transform more generally because the backend handles the expansion back to this pattern more generally: https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L5923 (That expansion is why it is safe to do this transform in IR without a TTI hook in the first place; we don't expect any regressions because we should reverse the transform.)
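The width generalization suggested above boils down to building each mask by repeating one byte across the type. The following is only a rough sketch of that idea using plain `uint64_t` instead of APInt, so it covers widths up to 64 bits rather than the 128 bits requested. `splatByte` and `popcountWidth` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Repeat one byte across the low `len` bits. This plays the role that
// APInt::getSplat(Len, APInt(8, B)) would play, limited to 64 bits here.
inline uint64_t splatByte(unsigned len, uint8_t byte) {
  assert(len >= 8 && len <= 64 && len % 8 == 0);
  uint64_t v = 0;
  for (unsigned shift = 0; shift < len; shift += 8)
    v |= static_cast<uint64_t>(byte) << shift;
  return v;
}

// Width-generic parallel popcount for len in {16, 24, ..., 64}.
// Bits of `i` above `len` are ignored.
inline uint64_t popcountWidth(uint64_t i, unsigned len) {
  const uint64_t all = splatByte(len, 0xFF);  // emulates len-bit wraparound
  i &= all;
  i = i - ((i >> 1) & splatByte(len, 0x55));
  i = (i & splatByte(len, 0x33)) + ((i >> 2) & splatByte(len, 0x33));
  i = (i + (i >> 4)) & splatByte(len, 0x0F);
  // The multiply sums all bytes into the top byte; the shift is len - 8.
  return ((i * splatByte(len, 0x01)) & all) >> (len - 8);
}
```

Only the four splat masks and the final shift amount depend on the width, which is why a single matcher can cover every supported size.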

jsji added a subscriber: ppc-slack.Sep 30 2019, 1:14 PM

xbolva00 added inline comments.Oct 1 2019, 6:52 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
270	Type size - 8?

vector support?

I agree it would be best to generalize this.
There was a previous patch that already did that (D45173) and it was generic IIRC, but it got stuck and is now under the wrong license.
Maybe @kparzysz is willing to relicense it, though?

This revision now requires changes to proceed.Oct 6 2019, 11:01 AM

Yes, D45173 recognizes some standard forms of popcount, but it cannot recognize the one in the deepsjeng benchmark. There are many forms of popcount; this one is the form that TargetLowering::expandCTPOP chooses to expand to. So I guess we should add some specific combination code to recognize it?

address comments.

shchenz marked 3 inline comments as done.Oct 8 2019, 1:55 AM

shchenz added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
270	When the type size is 8, we don't need the final `lshr`, so we cannot do it in instcombine for opcode `lshr`, but we can still do it based on the final `mul`. For now I have left it as a follow-up issue until a real-world case is found. Hope this is OK.
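To illustrate why the 8-bit case has no final shift to anchor on: with a single byte, multiplying by 0x01 and shifting by 8 - 8 = 0 are both no-ops, so the sequence ends at the last `and`. A minimal sketch, with `checkAllBytes` as a hypothetical helper for this note:

```cpp
#include <cassert>
#include <cstdint>

// 8-bit variant of the pattern. The casts undo C++ integer promotion.
inline uint8_t popcount8(uint8_t i) {
  i = static_cast<uint8_t>(i - ((i >> 1) & 0x55));
  i = static_cast<uint8_t>((i & 0x33) + ((i >> 2) & 0x33));
  i = static_cast<uint8_t>((i + (i >> 4)) & 0x0F);
  return i;  // no multiply or shift left: both would be no-ops
}

// Exhaustively compare against a bit-by-bit count.
inline void checkAllBytes() {
  for (unsigned v = 0; v < 256; ++v) {
    unsigned expected = 0;
    for (unsigned bits = v; bits != 0; bits >>= 1)
      expected += bits & 1u;
    assert(static_cast<unsigned>(popcount8(static_cast<uint8_t>(v))) == expected);
  }
}
```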

Thanks, new patch looks great.

minor fix for testcase

Ok for me

craig.topper added a subscriber: craig.topper.Oct 9 2019, 1:52 PM

craig.topper added inline comments.

llvm/include/llvm/IR/PatternMatch.h
664–670	Can we std::move this into the specific_intval constructor and then std::move it again in the class? Otherwise we're making multiple heap allocations whenever the value is more than 64 bits.

avoid unnecessary heap allocations in APInt copy constructor.

shchenz edited the summary of this revision. Oct 9 2019, 6:50 PM

shchenz added a reviewer: craig.topper.

shchenz added inline comments.

llvm/include/llvm/IR/PatternMatch.h
664–670	Right. Thanks for pointing it out.
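The by-value-then-move idiom requested here can be sketched outside LLVM as follows. `Wide` is a hypothetical stand-in for an APInt wider than 64 bits (one whose copy allocates), and `Matcher`, `makeMatcher`, and `copyCount` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts copy constructions of Wide (for illustration only).
inline int &copyCount() {
  static int n = 0;
  return n;
}

// Hypothetical stand-in for a wide APInt: copying it allocates.
struct Wide {
  std::vector<unsigned long long> words;
  explicit Wide(std::size_t n) : words(n, 0) {}
  Wide(const Wide &other) : words(other.words) { ++copyCount(); }
  Wide(Wide &&) noexcept = default;
};

// Take the value by value, then move it into the member.
struct Matcher {
  Wide val;
  explicit Matcher(Wide v) : val(std::move(v)) {}
};

// The factory forwards its by-value parameter with a second std::move.
inline Matcher makeMatcher(Wide v) { return Matcher(std::move(v)); }

inline void demoCopies() {
  copyCount() = 0;
  Wide w(4);
  Matcher fromLvalue = makeMatcher(w);           // one copy, into the parameter
  assert(copyCount() == 1);
  Matcher fromTemporary = makeMatcher(Wide(4));  // moves only, no copies
  assert(copyCount() == 1);
  (void)fromLvalue;
  (void)fromTemporary;
}
```

Without the two `std::move` calls, each hop from factory to constructor to member would copy the value again.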

craig.topper added inline comments.Oct 10 2019, 10:14 PM

llvm/test/Transforms/AggressiveInstCombine/popcount.ll
3	Why does this run the entire -O3 pipeline and not just the aggressive instcombine pass?

This revision was not accepted when it landed; it landed in state Needs Review.Oct 10 2019, 10:14 PM

Closed by commit rGc17c5864fff6: [InstCombine] recognize popcount. (authored by shchenz).

This revision was automatically updated to reflect the committed changes.

shchenz marked an inline comment as done.Oct 10 2019, 10:28 PM

shchenz added inline comments.

llvm/test/Transforms/AggressiveInstCombine/popcount.ll
3	Thanks, done in rL374514

Diff 224539

llvm/include/llvm/IR/PatternMatch.h

Show First 20 Lines • Show All 637 Lines • ▼ Show 20 Lines	if (const auto *CV = dyn_cast<ConstantInt>(V))
VR = CV->getZExtValue();		VR = CV->getZExtValue();
return true;		return true;
}		}
return false;		return false;
}		}
};		};

/// Match a specified integer value or vector of all elements of that		/// Match a specified integer value or vector of all elements of that
// value.		/// value.
struct specific_intval {		struct specific_intval {
uint64_t Val;		APInt Val;

specific_intval(uint64_t V) : Val(V) {}		specific_intval(APInt V) : Val(std::move(V)) {}

template <typename ITy> bool match(ITy *V) {		template <typename ITy> bool match(ITy *V) {
const auto *CI = dyn_cast<ConstantInt>(V);		const auto *CI = dyn_cast<ConstantInt>(V);
if (!CI && V->getType()->isVectorTy())		if (!CI && V->getType()->isVectorTy())
if (const auto *C = dyn_cast<Constant>(V))		if (const auto *C = dyn_cast<Constant>(V))
CI = dyn_cast_or_null<ConstantInt>(C->getSplatValue());		CI = dyn_cast_or_null<ConstantInt>(C->getSplatValue());

return CI && CI->getValue() == Val;		return CI && APInt::isSameValue(CI->getValue(), Val);
}		}
};		};

/// Match a specific integer value or vector with all elements equal to		/// Match a specific integer value or vector with all elements equal to
/// the value.		/// the value.
inline specific_intval m_SpecificInt(uint64_t V) { return specific_intval(V); }		inline specific_intval m_SpecificInt(APInt V) {
		return specific_intval(std::move(V));
		}

		inline specific_intval m_SpecificInt(uint64_t V) {
		return m_SpecificInt(APInt(64, V));
		}

/// Match a ConstantInt and bind to its value. This does not match		/// Match a ConstantInt and bind to its value. This does not match
/// ConstantInts wider than 64-bits.		/// ConstantInts wider than 64-bits.
inline bind_const_intval_ty m_ConstantInt(uint64_t &V) { return V; }		inline bind_const_intval_ty m_ConstantInt(uint64_t &V) { return V; }

/// Match a specified basic block value.		/// Match a specified basic block value.
struct specific_bbval {		struct specific_bbval {
BasicBlock *Val;		BasicBlock *Val;
▲ Show 20 Lines • Show All 1,265 Lines • Show Last 20 Lines

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

Show First 20 Lines • Show All 244 Lines • ▼ Show 20 Lines	static bool foldAnyOrAllBitsSet(Instruction &I) {
Value *And = Builder.CreateAnd(MOps.Root, Mask);		Value *And = Builder.CreateAnd(MOps.Root, Mask);
Value *Cmp = MatchAllBitsSet ? Builder.CreateICmpEQ(And, Mask)		Value *Cmp = MatchAllBitsSet ? Builder.CreateICmpEQ(And, Mask)
: Builder.CreateIsNotNull(And);		: Builder.CreateIsNotNull(And);
Value *Zext = Builder.CreateZExt(Cmp, I.getType());		Value *Zext = Builder.CreateZExt(Cmp, I.getType());
I.replaceAllUsesWith(Zext);		I.replaceAllUsesWith(Zext);
return true;		return true;
}		}

		// Try to recognize below function as popcount intrinsic.
		// This is the "best" algorithm from
		// http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
		// Also used in TargetLowering::expandCTPOP().
		//
		// int popcount(unsigned int i) {
		// i = i - ((i >> 1) & 0x55555555);
		// i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
		// i = ((i + (i >> 4)) & 0x0F0F0F0F);
		// return (i * 0x01010101) >> 24;
		// }
		static bool tryToRecognizePopCount(Instruction &I) {
		if (I.getOpcode() != Instruction::LShr)
		return false;

		Type *Ty = I.getType();
		if (!Ty->isIntOrIntVectorTy())
		return false;
		unsigned Len = Ty->getScalarSizeInBits();
		// FIXME: fix Len == 8 and other irregular type lengths.
		if (!(Len <= 128 && Len > 8 && Len % 8 == 0))
		return false;

		APInt Mask55 = APInt::getSplat(Len, APInt(8, 0x55));
		APInt Mask33 = APInt::getSplat(Len, APInt(8, 0x33));
		APInt Mask0F = APInt::getSplat(Len, APInt(8, 0x0F));
		APInt Mask01 = APInt::getSplat(Len, APInt(8, 0x01));
		APInt MaskShift = APInt(Len, Len - 8);

		Value *Op0 = I.getOperand(0);
		Value *Op1 = I.getOperand(1);
		Value *MulOp0;
		// Matching "(i * 0x01010101...) >> 24".
		if ((match(Op0, m_Mul(m_Value(MulOp0), m_SpecificInt(Mask01)))) &&
		match(Op1, m_SpecificInt(MaskShift))) {
		Value *ShiftOp0;
		// Matching "((i + (i >> 4)) & 0x0F0F0F0F...)".
		if (match(MulOp0, m_And(m_c_Add(m_LShr(m_Value(ShiftOp0), m_SpecificInt(4)),
		m_Deferred(ShiftOp0)),
		m_SpecificInt(Mask0F)))) {
		Value *AndOp0;
		// Matching "(i & 0x33333333...) + ((i >> 2) & 0x33333333...)".
		if (match(ShiftOp0,
		m_c_Add(m_And(m_Value(AndOp0), m_SpecificInt(Mask33)),
		m_And(m_LShr(m_Deferred(AndOp0), m_SpecificInt(2)),
		m_SpecificInt(Mask33))))) {
		Value *Root, *SubOp1;
		// Matching "i - ((i >> 1) & 0x55555555...)".
		if (match(AndOp0, m_Sub(m_Value(Root), m_Value(SubOp1))) &&
		match(SubOp1, m_And(m_LShr(m_Specific(Root), m_SpecificInt(1)),
		m_SpecificInt(Mask55)))) {
		LLVM_DEBUG(dbgs() << "Recognized popcount intrinsic\n");
		IRBuilder<> Builder(&I);
		Function *Func = Intrinsic::getDeclaration(
		I.getModule(), Intrinsic::ctpop, I.getType());
		I.replaceAllUsesWith(Builder.CreateCall(Func, {Root}));
		return true;
		}
		}
		}
		}

		return false;
		}

/// This is the entry point for folds that could be implemented in regular		/// This is the entry point for folds that could be implemented in regular
/// InstCombine, but they are separated because they are not expected to		/// InstCombine, but they are separated because they are not expected to
/// occur frequently and/or have more than a constant-length pattern match.		/// occur frequently and/or have more than a constant-length pattern match.
static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {		static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {
bool MadeChange = false;		bool MadeChange = false;
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
// Ignore unreachable basic blocks.		// Ignore unreachable basic blocks.
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
continue;		continue;
// Do not delete instructions under here and invalidate the iterator.		// Do not delete instructions under here and invalidate the iterator.
// Walk the block backwards for efficiency. We're matching a chain of		// Walk the block backwards for efficiency. We're matching a chain of
// use->defs, so we're more likely to succeed by starting from the bottom.		// use->defs, so we're more likely to succeed by starting from the bottom.
// Also, we want to avoid matching partial patterns.		// Also, we want to avoid matching partial patterns.
// TODO: It would be more efficient if we removed dead instructions		// TODO: It would be more efficient if we removed dead instructions
// iteratively in this loop rather than waiting until the end.		// iteratively in this loop rather than waiting until the end.
for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {		for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
MadeChange |= foldAnyOrAllBitsSet(I);		MadeChange |= foldAnyOrAllBitsSet(I);
MadeChange |= foldGuardedRotateToFunnelShift(I);		MadeChange |= foldGuardedRotateToFunnelShift(I);
		MadeChange |= tryToRecognizePopCount(I);
}		}
}		}

// We're done with transforms, so remove dead instructions.		// We're done with transforms, so remove dead instructions.
if (MadeChange)		if (MadeChange)
for (BasicBlock &BB : F)		for (BasicBlock &BB : F)
SimplifyInstructionsInBlock(&BB);		SimplifyInstructionsInBlock(&BB);

▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

llvm/test/Transforms/AggressiveInstCombine/popcount.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -O3 < %s -instcombine -S | FileCheck %s

				;int popcount8(unsigned char i) {
				; i = i - ((i >> 1) & 0x55);
				; i = (i & 0x33) + ((i >> 2) & 0x33);
				; i = ((i + (i >> 4)) & 0x0F);
				; return (i * 0x01010101);
				;}
				define signext i32 @popcount8(i8 zeroext %0) {
				; CHECK-LABEL: @popcount8(
				; CHECK-NEXT: [[TMP2:%.*]] = lshr i8 [[TMP0:%.*]], 1
				; CHECK-NEXT: [[TMP3:%.*]] = and i8 [[TMP2]], 85
				; CHECK-NEXT: [[TMP4:%.*]] = sub i8 [[TMP0]], [[TMP3]]
				; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[TMP4]], 51
				; CHECK-NEXT: [[TMP6:%.*]] = lshr i8 [[TMP4]], 2
				; CHECK-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], 51
				; CHECK-NEXT: [[TMP8:%.*]] = add nuw nsw i8 [[TMP7]], [[TMP5]]
				; CHECK-NEXT: [[TMP9:%.*]] = lshr i8 [[TMP8]], 4
				; CHECK-NEXT: [[TMP10:%.*]] = add nuw nsw i8 [[TMP9]], [[TMP8]]
				; CHECK-NEXT: [[TMP11:%.*]] = and i8 [[TMP10]], 15
				; CHECK-NEXT: [[TMP12:%.*]] = zext i8 [[TMP11]] to i32
				; CHECK-NEXT: ret i32 [[TMP12]]
				;
				%2 = lshr i8 %0, 1
				%3 = and i8 %2, 85
				%4 = sub i8 %0, %3
				%5 = and i8 %4, 51
				%6 = lshr i8 %4, 2
				%7 = and i8 %6, 51
				%8 = add nuw nsw i8 %7, %5
				%9 = lshr i8 %8, 4
				%10 = add nuw nsw i8 %9, %8
				%11 = and i8 %10, 15
				%12 = zext i8 %11 to i32
				ret i32 %12
				}

				;int popcount32(unsigned i) {
				; i = i - ((i >> 1) & 0x55555555);
				; i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
				; i = ((i + (i >> 4)) & 0x0F0F0F0F);
				; return (i * 0x01010101) >> 24;
				;}
				define signext i32 @popcount32(i32 zeroext %0) {
				; CHECK-LABEL: @popcount32(
				; CHECK-NEXT: [[TMP2:%.*]] = tail call i32 @llvm.ctpop.i32(i32 [[TMP0:%.*]]), !range !0
				; CHECK-NEXT: ret i32 [[TMP2]]
				;
				%2 = lshr i32 %0, 1
				%3 = and i32 %2, 1431655765
				%4 = sub i32 %0, %3
				%5 = and i32 %4, 858993459
				%6 = lshr i32 %4, 2
				%7 = and i32 %6, 858993459
				%8 = add nuw nsw i32 %7, %5
				%9 = lshr i32 %8, 4
				%10 = add nuw nsw i32 %9, %8
				%11 = and i32 %10, 252645135
				%12 = mul i32 %11, 16843009
				%13 = lshr i32 %12, 24
				ret i32 %13
				}

				;int popcount64(unsigned long long i) {
				; i = i - ((i >> 1) & 0x5555555555555555);
				; i = (i & 0x3333333333333333) + ((i >> 2) & 0x3333333333333333);
				; i = ((i + (i >> 4)) & 0x0F0F0F0F0F0F0F0F);
				; return (i * 0x0101010101010101) >> 56;
				;}
				define signext i32 @popcount64(i64 %0) {
				; CHECK-LABEL: @popcount64(
				; CHECK-NEXT: [[TMP2:%.*]] = tail call i64 @llvm.ctpop.i64(i64 [[TMP0:%.*]]), !range !1
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32
				; CHECK-NEXT: ret i32 [[TMP3]]
				;
				%2 = lshr i64 %0, 1
				%3 = and i64 %2, 6148914691236517205
				%4 = sub i64 %0, %3
				%5 = and i64 %4, 3689348814741910323
				%6 = lshr i64 %4, 2
				%7 = and i64 %6, 3689348814741910323
				%8 = add nuw nsw i64 %7, %5
				%9 = lshr i64 %8, 4
				%10 = add nuw nsw i64 %9, %8
				%11 = and i64 %10, 1085102592571150095
				%12 = mul i64 %11, 72340172838076673
				%13 = lshr i64 %12, 56
				%14 = trunc i64 %13 to i32
				ret i32 %14
				}

				;int popcount128(__uint128_t i) {
				; __uint128_t x = 0x5555555555555555;
				; x <<= 64;
				; x |= 0x5555555555555555;
				; __uint128_t y = 0x3333333333333333;
				; y <<= 64;
				; y |= 0x3333333333333333;
				; __uint128_t z = 0x0f0f0f0f0f0f0f0f;
				; z <<= 64;
				; z |= 0x0f0f0f0f0f0f0f0f;
				; __uint128_t a = 0x0101010101010101;
				; a <<= 64;
				; a |= 0x0101010101010101;
				; unsigned mask = 120;
				; i = i - ((i >> 1) & x);
				; i = (i & y) + ((i >> 2) & y);
				; i = ((i + (i >> 4)) & z);
				; return (i * a) >> mask;
				;}
				define signext i32 @popcount128(i128 %0) {
				; CHECK-LABEL: @popcount128(
				; CHECK-NEXT: [[TMP2:%.*]] = tail call i128 @llvm.ctpop.i128(i128 [[TMP0:%.*]]), !range !2
				; CHECK-NEXT: [[TMP3:%.*]] = trunc i128 [[TMP2]] to i32
				; CHECK-NEXT: ret i32 [[TMP3]]
				;
				%2 = lshr i128 %0, 1
				%3 = and i128 %2, 113427455640312821154458202477256070485
				%4 = sub i128 %0, %3
				%5 = and i128 %4, 68056473384187692692674921486353642291
				%6 = lshr i128 %4, 2
				%7 = and i128 %6, 68056473384187692692674921486353642291
				%8 = add nuw nsw i128 %7, %5
				%9 = lshr i128 %8, 4
				%10 = add nuw nsw i128 %9, %8
				%11 = and i128 %10, 20016609818878733144904388672456953615
				%12 = mul i128 %11, 1334440654591915542993625911497130241
				%13 = lshr i128 %12, 120
				%14 = trunc i128 %13 to i32
				ret i32 %14
				}

				;vector unsigned char popcount8vec(vector unsigned char i)
				;{
				; i = i - ((i>> 1) & 0x55);
				; i = (i & 0x33) + ((i >> 2) & 0x33);
				; i = ((i + (i >> 4)) & 0x0F);
				; return (i * 0x01);
				;}
				define <16 x i8> @popcount8vec(<16 x i8> %0) {
				; CHECK-LABEL: @popcount8vec(
				; CHECK-NEXT: [[TMP2:%.*]] = lshr <16 x i8> [[TMP0:%.*]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP3:%.*]] = and <16 x i8> [[TMP2]], <i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85>
				; CHECK-NEXT: [[TMP4:%.*]] = sub <16 x i8> [[TMP0]], [[TMP3]]
				; CHECK-NEXT: [[TMP5:%.*]] = and <16 x i8> [[TMP4]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
				; CHECK-NEXT: [[TMP6:%.*]] = lshr <16 x i8> [[TMP4]], <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
				; CHECK-NEXT: [[TMP7:%.*]] = and <16 x i8> [[TMP6]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
				; CHECK-NEXT: [[TMP8:%.*]] = add nuw nsw <16 x i8> [[TMP7]], [[TMP5]]
				; CHECK-NEXT: [[TMP9:%.*]] = lshr <16 x i8> [[TMP8]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
				; CHECK-NEXT: [[TMP10:%.*]] = add nuw nsw <16 x i8> [[TMP9]], [[TMP8]]
				; CHECK-NEXT: [[TMP11:%.*]] = and <16 x i8> [[TMP10]], <i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15>
				; CHECK-NEXT: ret <16 x i8> [[TMP11]]
				;
				%2 = lshr <16 x i8> %0, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
				%3 = and <16 x i8> %2, <i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85, i8 85>
				%4 = sub <16 x i8> %0, %3
				%5 = and <16 x i8> %4, <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
				%6 = lshr <16 x i8> %4, <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
				%7 = and <16 x i8> %6, <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
				%8 = add nuw nsw <16 x i8> %7, %5
				%9 = lshr <16 x i8> %8, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
				%10 = add nuw nsw <16 x i8> %9, %8
				%11 = and <16 x i8> %10, <i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15>
				ret <16 x i8> %11
				}

				;vector unsigned int popcount32vec(vector unsigned int i)
				;{
				; i = i - ((i>> 1) & 0x55555555);
				; i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
				; i = ((i + (i >> 4)) & 0x0F0F0F0F);
				; return (i * 0x01010101) >> 24;
				;}
				define <4 x i32> @popcount32vec(<4 x i32> %0) {
				; CHECK-LABEL: @popcount32vec(
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> [[TMP0:%.*]])
				; CHECK-NEXT: ret <4 x i32> [[TMP2]]
				;
				%2 = lshr <4 x i32> %0, <i32 1, i32 1, i32 1, i32 1>
				%3 = and <4 x i32> %2, <i32 1431655765, i32 1431655765, i32 1431655765, i32 1431655765>
				%4 = sub <4 x i32> %0, %3
				%5 = and <4 x i32> %4, <i32 858993459, i32 858993459, i32 858993459, i32 858993459>
				%6 = lshr <4 x i32> %4, <i32 2, i32 2, i32 2, i32 2>
				%7 = and <4 x i32> %6, <i32 858993459, i32 858993459, i32 858993459, i32 858993459>
				%8 = add nuw nsw <4 x i32> %7, %5
				%9 = lshr <4 x i32> %8, <i32 4, i32 4, i32 4, i32 4>
				%10 = add nuw nsw <4 x i32> %9, %8
				%11 = and <4 x i32> %10, <i32 252645135, i32 252645135, i32 252645135, i32 252645135>
				%12 = mul <4 x i32> %11, <i32 16843009, i32 16843009, i32 16843009, i32 16843009>
				%13 = lshr <4 x i32> %12, <i32 24, i32 24, i32 24, i32 24>
				ret <4 x i32> %13
				}

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] recognize popcount implemented in hacker's delight.
ClosedPublic