Currently on PowerPC, LLVM does not expand calls to memcmp inline as it does for memcpy and memmove. When the size of a memcmp call and the alignment of the two sources being compared are known at compile time, we can expand the call inline rather than calling the library routine.
The expansion is implemented at the IR level in CodeGenPrepare.cpp, in bool CodeGenPrepare::optimizeCallInst(CallInst *CI, bool& ModifiedDT), where other intrinsic calls are also expanded. It is enabled by default only for PowerPC; other targets can enable it by returning true for expandMemCmp().
This patch expands memcmp using a MaxLoadSize set by the target. For PowerPC, this is set to 8 bytes per load. The class MemCmpExpansion sets up the basic block structure required for the expansion and performs the inline expansion through the member function getMemCmpExpansion, which returns the final value that replaces the memcmp call instruction.
The expansion works by loading the number of bytes specified by LoadSize from each of the two memcmp sources. It then subtracts the two loaded values and compares the difference to zero to see whether the values differ. If a difference is found, it branches with an early exit to the ResultBlock, which calculates which source was larger and returns the correct result (Src1 > Src2 ? 1 : -1). Otherwise, it falls through to either the next LoadCmpBlock or, if this is the last LoadCmpBlock, the EndBlock.
The patch also adds a special case for when the memcmp result is only used in an equality comparison with 0: the result calculation is skipped and the expansion returns either 0 or 1 depending on whether a difference is found. In this special case, there is also a command line option ("memcmp-num-loads-per-block") that can be used to place more loads per block. For example, with 2 loads per block, this builds the basic block with an IR sequence like:
load a, load b, load c, load d
xor x1 = a^b, xor x2 = c^d
or o1 = x1 | x2
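As a rough illustration of the control flow described above, here is a minimal C++ sketch of what the expanded code computes. It is not the code in the patch. The names expandedMemCmp, expandedMemCmpIsNonZero and loadBigEndian are hypothetical, the big-endian byte assembly stands in for a native PowerPC load, and handling the tail with smaller loads is an assumption of this sketch. In the real expansion the size is a compile-time constant, so the loops below would be fully unrolled into LoadCmpBlocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t MaxLoadSize = 8;    // value the description gives for PowerPC
constexpr unsigned NumLoadsPerBlock = 2;  // mirrors memcmp-num-loads-per-block=2

// Hypothetical helper: read n bytes (n <= 8) as a big-endian integer so that
// unsigned integer order matches memcmp's byte-wise order.
static std::uint64_t loadBigEndian(const unsigned char *p, std::size_t n) {
  assert(n <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i)
    v = (v << 8) | p[i];
  return v;
}

// General case: each inner iteration plays the role of one LoadCmpBlock.
int expandedMemCmp(const void *s1, const void *s2, std::size_t size) {
  const auto *a = static_cast<const unsigned char *>(s1);
  const auto *b = static_cast<const unsigned char *>(s2);
  std::size_t offset = 0;
  // Assumption: the tail is covered by progressively smaller loads (8, 4, 2, 1).
  for (std::size_t loadSize = MaxLoadSize; loadSize >= 1; loadSize /= 2) {
    while (size - offset >= loadSize) {
      std::uint64_t x = loadBigEndian(a + offset, loadSize);
      std::uint64_t y = loadBigEndian(b + offset, loadSize);
      if (x - y != 0)           // subtract and compare to zero
        return x > y ? 1 : -1;  // "ResultBlock": early exit
      offset += loadSize;
    }
  }
  return 0;  // "EndBlock": no difference found
}

// Equality-with-zero case: XOR each pair, OR the results, branch once per block.
int expandedMemCmpIsNonZero(const void *s1, const void *s2, std::size_t size) {
  const auto *a = static_cast<const unsigned char *>(s1);
  const auto *b = static_cast<const unsigned char *>(s2);
  std::size_t offset = 0;
  while (offset < size) {
    std::uint64_t combined = 0;
    for (unsigned i = 0; i < NumLoadsPerBlock && offset < size; ++i) {
      // Simplification: the final load just takes whatever bytes remain.
      std::size_t loadSize = std::min(MaxLoadSize, size - offset);
      combined |= loadBigEndian(a + offset, loadSize) ^
                  loadBigEndian(b + offset, loadSize);
      offset += loadSize;
    }
    if (combined != 0)
      return 1;  // some pair differed; no ordering is computed
  }
  return 0;
}
```

In the second function, the single branch per block corresponds to testing the o1 value from the sequence above, which is why packing more loads into a block reduces the number of branches when only equality matters.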