This is an archive of the discontinued LLVM Phabricator instance.

Differential D47917

[ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
Closed, Public

Authored by efriedma on Jun 7 2018, 4:01 PM.

Details

Reviewers

spatel
deadalnix
t.p.northover
fhahn
javed.absar
dmgreen

Commits

rG96e3cd85bd2d: [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
rL340458: [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.

Summary

The inline sequence is very long (about 70 bytes on Thumb1), so it's not a good idea to inline it, especially when optimizing for size.

Diff Detail

Repository: rL LLVM

Event Timeline

efriedma created this revision. Jun 7 2018, 4:01 PM

Herald added a reviewer: javed.absar. Jun 7 2018, 4:01 PM

Herald added subscribers: chrib, kristof.beyls.

dmgreen added a subscriber: dmgreen. Jun 8 2018, 8:59 AM

I'm not too familiar with ARM/Thumb codegen. Add the extra RUN with the current output, so we can see the difference?

Should there also be a test to demonstrate the difference between regular codegen and codegen with the optsize/minsize attribute?

I'm not checking for optsize because I don't think it makes sense to inline even when optimizing for speed... although maybe that's not right. The current code for Thumb1 is something like the following (which is essentially computing popcount(nextpoweroftwo(x)-1)).

test:
        .fnstart
@ %bb.0:
        lsrs    r1, r0, #1
        orrs    r1, r0
        lsrs    r0, r1, #2
        orrs    r0, r1
        lsrs    r1, r0, #4
        orrs    r1, r0
        lsrs    r0, r1, #8
        orrs    r0, r1
        lsrs    r1, r0, #16
        orrs    r1, r0
        mvns    r0, r1
        lsrs    r1, r0, #1
        ldr     r2, .LCPI0_0
        ands    r2, r1
        subs    r0, r0, r2
        ldr     r1, .LCPI0_1
        lsrs    r2, r0, #2
        ands    r0, r1
        ands    r2, r1
        adds    r0, r0, r2
        lsrs    r1, r0, #4
        adds    r0, r0, r1
        ldr     r1, .LCPI0_2
        ands    r1, r0
        ldr     r0, .LCPI0_3
        muls    r0, r1, r0
        lsrs    r0, r0, #24
        bx      lr
        .p2align        2
@ %bb.1:
.LCPI0_0:
        .long   1431655765              @ 0x55555555
.LCPI0_1:
        .long   858993459               @ 0x33333333
.LCPI0_2:
        .long   252645135               @ 0xf0f0f0f
.LCPI0_3:
        .long   16843009                @ 0x1010101
.Lfunc_end0:
        .size   test, .Lfunc_end0-test
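
For readers less familiar with Thumb1, the listing above can be read roughly as the following C++ sketch. It is a hand transcription for illustration, not code from the patch, and the name `ctlz32_inline` is made up here. The first half smears the highest set bit into all lower positions, the inversion leaves set bits only in the leading-zero positions, and the second half is a standard bit-parallel population count using the four literal-pool constants.

```cpp
#include <cstdint>

// Illustrative transcription of the Thumb1 sequence (hypothetical name).
constexpr std::uint32_t ctlz32_inline(std::uint32_t x) {
    // Smear the highest set bit into every lower bit position.
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    // After inversion, the set bits are exactly the leading zeros of the input.
    x = ~x;
    // Bit-parallel population count (constants match .LCPI0_0 .. .LCPI0_3).
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}
```

Written this way it is easier to see where the size goes: ten shift/or instructions for the smear, then roughly sixteen more plus four 32-bit constants for the population count.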

I like the idea, I think. Should this be guarded by some sort of gnueabi check, though?

__clzsi2 is provided by libgcc/compiler-rt, so it should be generally available. I guess it might be possible to construct an environment where it isn't, but I'm not sure how.

So we (Arm Compiler 6) don't ship or compile against compiler-rt, at least not at the moment. I'm not sure why; it has been like that since before my time, and we have survived like that for a long while. I'm guessing our C library has always filled in the gaps (at least the parts that we need).

I asked some people around here, and Peter S suggested this probably doesn't count as gnueabi, seeing as it works on many platforms (Darwin, Windows, etc.). We probably have to presume these runtime functions are present, which I think makes sense from a compiler perspective. From reading the libgcc docs, they are always presumed to be present (even with nostdlib, AFAICT).

I can fix this up downstream, either by adding the function to our C library, or more likely by partially reverting this just in AC6 until we have these functions.
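
As a rough idea of what "adding the function" to a C library could involve, here is a minimal binary-search sketch in C++. It is an assumption-laden illustration, not the libgcc or compiler-rt source, and the name `clzsi2_fallback` is hypothetical (chosen so it does not clash with the real `__clzsi2` symbol). The libgcc documentation leaves the result of `__clzsi2` undefined for a zero argument, so the sketch makes no attempt to handle zero specially.

```cpp
#include <cstdint>

// Hypothetical stand-in for a runtime-library count-leading-zeros routine.
// The input is assumed to be non-zero, as with __clzsi2.
constexpr int clzsi2_fallback(std::uint32_t x) {
    int n = 0;
    // Halve the search window each step: if the upper part is empty,
    // count its width and shift the lower part up.
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8; }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4; }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2; }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}
```

Because the routine exists once in the runtime library, every call site shrinks to a single branch to the shared routine, which is where the code-size win comes from.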

With that out of the way, LGTM. This looks like a good code-size improvement.

This revision is now accepted and ready to land. Jun 13 2018, 8:21 AM

Closed by commit rL340458: [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available. (authored by efriedma). Aug 22 2018, 2:48 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                  Size
llvm/trunk/include/llvm/IR/RuntimeLibcalls.def        3 lines
llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp   15 lines
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp         4 lines
llvm/trunk/test/CodeGen/ARM/clz.ll                    8 lines

Diff 162061

llvm/trunk/include/llvm/IR/RuntimeLibcalls.def

 HANDLE_LIBCALL(SDIVREM_I128, nullptr)
 HANDLE_LIBCALL(UDIVREM_I8, nullptr)
 HANDLE_LIBCALL(UDIVREM_I16, nullptr)
 HANDLE_LIBCALL(UDIVREM_I32, nullptr)
 HANDLE_LIBCALL(UDIVREM_I64, nullptr)
 HANDLE_LIBCALL(UDIVREM_I128, nullptr)
 HANDLE_LIBCALL(NEG_I32, "__negsi2")
 HANDLE_LIBCALL(NEG_I64, "__negdi2")
+HANDLE_LIBCALL(CTLZ_I32, "__clzsi2")
+HANDLE_LIBCALL(CTLZ_I64, "__clzdi2")
+HANDLE_LIBCALL(CTLZ_I128, "__clzti2")
 
 // Floating-point
 HANDLE_LIBCALL(ADD_F32, "__addsf3")
 HANDLE_LIBCALL(ADD_F64, "__adddf3")
 HANDLE_LIBCALL(ADD_F80, "__addxf3")
 HANDLE_LIBCALL(ADD_F128, "__addtf3")
 HANDLE_LIBCALL(ADD_PPCF128, "__gcc_qadd")
 HANDLE_LIBCALL(SUB_F32, "__subsf3")

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

@@ case ISD::UDIVREM:
     ExpandDivRemLibCall(Node, Results);
     break;
   case ISD::MUL:
     Results.push_back(ExpandIntLibCall(Node, false,
                                        RTLIB::MUL_I8,
                                        RTLIB::MUL_I16, RTLIB::MUL_I32,
                                        RTLIB::MUL_I64, RTLIB::MUL_I128));
     break;
+  case ISD::CTLZ_ZERO_UNDEF:
+    switch (Node->getSimpleValueType(0).SimpleTy) {
+    default:
+      llvm_unreachable("LibCall explicitly requested, but not available");
+    case MVT::i32:
+      Results.push_back(ExpandLibCall(RTLIB::CTLZ_I32, Node, false));
+      break;
+    case MVT::i64:
+      Results.push_back(ExpandLibCall(RTLIB::CTLZ_I64, Node, false));
+      break;
+    case MVT::i128:
+      Results.push_back(ExpandLibCall(RTLIB::CTLZ_I128, Node, false));
+      break;
+    }
+    break;
   }
 
   // Replace the original node with the legalized result.
   if (!Results.empty()) {
     LLVM_DEBUG(dbgs() << "Successfully converted node to libcall\n");
     ReplaceNode(Node, Results.data());
   } else
     LLVM_DEBUG(dbgs() << "Could not convert node to libcall\n");

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

@@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,
   // ARM does not have ROTL.
   setOperationAction(ISD::ROTL, MVT::i32, Expand);
   for (MVT VT : MVT::vector_valuetypes()) {
     setOperationAction(ISD::ROTL, VT, Expand);
     setOperationAction(ISD::ROTR, VT, Expand);
   }
   setOperationAction(ISD::CTTZ, MVT::i32, Custom);
   setOperationAction(ISD::CTPOP, MVT::i32, Expand);
-  if (!Subtarget->hasV5TOps() || Subtarget->isThumb1Only())
+  if (!Subtarget->hasV5TOps() || Subtarget->isThumb1Only()) {
     setOperationAction(ISD::CTLZ, MVT::i32, Expand);
+    setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, LibCall);
+  }
 
   // @llvm.readcyclecounter requires the Performance Monitors extension.
   // Default to the 0 expansion on unsupported platforms.
   // FIXME: Technically there are older ARM CPUs that have
   // implementation-specific ways of obtaining this information.
   if (Subtarget->hasPerfMon())
     setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Custom);

llvm/trunk/test/CodeGen/ARM/clz.ll

-; RUN: llc -mtriple=arm-eabi -mattr=+v5t %s -o - | FileCheck %s
+; RUN: llc -mtriple=arm-eabi -mattr=+v5t %s -o - | FileCheck %s -check-prefixes=CHECK,INLINE
+; RUN: llc -mtriple=arm-eabi %s -o - | FileCheck %s -check-prefixes=CHECK,LIBCALL
 
 declare i32 @llvm.ctlz.i32(i32, i1)
 
 define i32 @test(i32 %x) {
-; CHECK: test
-; CHECK: clz r0, r0
+; CHECK-LABEL: test
+; INLINE: clz r0, r0
+; LIBCALL: b __clzsi2
         %tmp.1 = call i32 @llvm.ctlz.i32( i32 %x, i1 true )
         ret i32 %tmp.1
 }