Skip memops if the total value profiled count is 0: we can't correctly scale up the counts, and there is no point anyway.
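For context, a minimal sketch of the kind of guard being added, assuming the scale factor is ActualCount/SavedTotalCount as discussed in the comments below; the function and variable layout here are simplified placeholders, not the exact code from IndirectCallPromotion.cpp.

```cpp
#include <cstdint>

// Sketch only: names and structure are simplified placeholders, not the
// exact code in IndirectCallPromotion.cpp.
bool shouldOptimizeMemOp(uint64_t ActualCount, uint64_t TotalCount) {
  // If the value profile recorded no samples at all, the scale factor
  // ActualCount / TotalCount would divide by zero, and there is nothing
  // worth transforming anyway, so bail out early.
  if (TotalCount == 0)
    return false;

  uint64_t SavedTotalCount = TotalCount;
  // Each profiled size count is later scaled by ActualCount / SavedTotalCount.
  double Scale = static_cast<double>(ActualCount) / SavedTotalCount;
  (void)Scale; // rest of the transformation elided
  return true;
}
```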
Diff Detail
- Repository: rL LLVM
Event Timeline
lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:877 (On Diff #97018)

Or change it to if (TotalCount < MemOpCountThreshold) and move this check before the actual count check.
lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:877 (On Diff #97018)

I could move this up, but note that the counts will be scaled up by ActualCount/SavedTotalCount (where SavedTotalCount == TotalCount at this point). So it is possible that TotalCount < MemOpCountThreshold while ActualCount is not, and neither are the scaled counts. That check therefore seems overly conservative (and would have an effect beyond just preventing divides by zero).
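To make that concern concrete with invented numbers (the MemOpCountThreshold value and all counts below are hypothetical, chosen only for illustration):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical values, chosen only to show the scenario described above.
  const uint64_t MemOpCountThreshold = 100;
  uint64_t TotalCount = 80;    // below the threshold
  uint64_t ActualCount = 400;  // BB count, well above the threshold
  uint64_t SavedTotalCount = TotalCount;

  // A profiled size record with count 60 (also below the threshold)...
  uint64_t SizeCount = 60;
  // ...is scaled by ActualCount / SavedTotalCount = 5x, giving 300,
  // which does exceed the threshold. An up-front TotalCount check would
  // have rejected this site.
  uint64_t Scaled = SizeCount * ActualCount / SavedTotalCount;
  std::printf("threshold = %llu, scaled count = %llu\n",
              (unsigned long long)MemOpCountThreshold,
              (unsigned long long)Scaled);
  return 0;
}
```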
lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:877 (On Diff #97018)

The scaling actually scales the counts down in most cases -- this is because the BB's count is updated after inlining, while the value profiling total count is not. In other words, it won't cause conservative behavior for hot sites.
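A small sketch of that scale-down case with invented counts (the specific numbers are assumptions, not taken from the patch):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Invented numbers, only to illustrate scaling down after inlining.
  uint64_t ActualCount = 100;       // BB count for this inline instance (updated)
  uint64_t SavedTotalCount = 1000;  // value profile total count (not updated)
  uint64_t SizeCount = 800;         // a profiled size record

  // Scale factor is 100/1000 = 0.1, so the record shrinks from 800 to 80.
  uint64_t Scaled = SizeCount * ActualCount / SavedTotalCount;
  assert(Scaled == 80);
  return 0;
}
```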
lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:877 (On Diff #97018)

I checked in one of our large apps, and there are over 44K memory intrinsics where we scale up the counts. So at the least, I would like to consider that change separately from this bugfix.