This is an archive of the discontinued LLVM Phabricator instance.

Add a cache for DL.getTypeAllocSize() to BasicAA.
Needs Review · Public

Authored by jlebar on Sep 15 2022, 5:16 PM.

Details

Reviewers

asbirlea
nikic

Summary

getTypeAllocSize is surprisingly expensive. In my (private, sorry)
testcase, we spend 400ms in getTypeAllocSize out of a total of 3s in
BasicAA. After this change, that 400ms drops to roughly 0, and I see no
measurable overhead from the hashtable lookup. BasicAA therefore goes
from about 3s to about 2.6s, a >1.1x speedup in my test.
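
For readers without the diff at hand, the idea in the summary can be sketched roughly as below. This is a self-contained illustration, not the patch itself: `ToyType`, `computeAllocSize`, `slowPathCalls` and `AllocSizeCache` are hypothetical names, and rounding the size up to the ABI alignment merely stands in for the expensive query.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an IR type; this is not llvm::Type.
struct ToyType {
  uint64_t SizeInBytes; // size before padding
  uint64_t ABIAlign;    // ABI alignment, assumed to be a power of two
};

// Counts slow-path invocations so the effect of the cache is observable.
inline uint64_t &slowPathCalls() {
  static uint64_t N = 0;
  return N;
}

// Stand-in for the expensive query: size rounded up to the ABI alignment.
uint64_t computeAllocSize(const ToyType &T) {
  ++slowPathCalls();
  assert(T.ABIAlign != 0 && (T.ABIAlign & (T.ABIAlign - 1)) == 0);
  return (T.SizeInBytes + T.ABIAlign - 1) & ~(T.ABIAlign - 1);
}

// Memoizes computeAllocSize per type pointer. The cache lives only as long
// as the object that owns it.
class AllocSizeCache {
  std::unordered_map<const ToyType *, uint64_t> Cache;

public:
  uint64_t get(const ToyType *T) {
    assert(T && "type must be non-null");
    // try_emplace inserts a placeholder only when the key is absent.
    auto [It, Inserted] = Cache.try_emplace(T, 0);
    if (Inserted)
      It->second = computeAllocSize(*T);
    return It->second;
  }

  std::size_t size() const { return Cache.size(); }
};
```

Repeated calls to `get` for the same type pointer reach `computeAllocSize` only once; every later call is a single hash lookup.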

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,040 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,060 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test
	60,050 ms	x64 debian > libFuzzer.libFuzzer::out-of-process-fuzz.test
	60,030 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

jlebar created this revision. · Sep 15 2022, 5:16 PM

Herald added a project: Restricted Project. · Sep 15 2022, 5:16 PM

Herald added subscribers: jeroen.dobbelaere, hiraditya.

jlebar requested review of this revision. · Sep 15 2022, 5:16 PM

Herald added a project: Restricted Project. · Sep 15 2022, 5:16 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B186984: Diff 460561. · Sep 15 2022, 5:57 PM

Use structured binding for iterator.

jlebar added a child revision: D134006: Add an optional cache to computeKnownBits. · Sep 15 2022, 8:06 PM

Harbormaster completed remote builds in B187023: Diff 460602. · Sep 15 2022, 8:08 PM

nikic added a reviewer: nikic. · Sep 16 2022, 12:55 AM

Compile-time on CTMark: http://llvm-compile-time-tracker.com/compare.php?from=71e52a125cb5f532192c40cf13692c18ede18cb4&to=14ba5930d47843050c2618b70c6cf6fc7d0fc66f&stat=instructions

So not a great deal of impact end-to-end.

getTypeAllocSize() is indeed fairly expensive due to the ABI alignment calculation. I think a cache could generally make sense, though I wonder why it is BasicAA specific, and not part of DataLayout itself?

I would also be interested in whether the GEPs in your case are mostly constant offset GEPs. It's on my roadmap to canonicalize those to i8 GEPs, which would allow us to save on a lot of redundant offset calculation in both BasicAA and other places.
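
As a rough illustration of what a constant-offset GEP amounts to (a toy model, not LLVM code): each index is multiplied by the alloc size of the type it steps over, and the products are summed into one byte offset. `constantByteOffset` and its `Strides` parameter are hypothetical; the model covers only array-style indexing and ignores structs, vectors and index-width wrapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: Strides[I] is the alloc size in bytes of the type that
// index I steps over. All indices are compile-time constants here.
int64_t constantByteOffset(const std::vector<int64_t> &Indices,
                           const std::vector<int64_t> &Strides) {
  assert(Indices.size() == Strides.size());
  int64_t Offset = 0;
  for (std::size_t I = 0; I < Indices.size(); ++I)
    Offset += Indices[I] * Strides[I];
  return Offset;
}
```

For example, indexing `[4 x [8 x float]]` with constant indices 0, 2, 3 uses strides of 128, 32 and 4 bytes, giving 0*128 + 2*32 + 3*4 = 76. An i8 GEP carrying the offset 76 directly would let later consumers skip the per-index size queries.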

> I think a cache could generally make sense, though I wonder why it is BasicAA specific, and not part of DataLayout itself?

I was hesitant to add a cache on DataLayout because:

- It's of unbounded size and lifetime (though I guess every Type that's created also permanently adds some memory usage in the LLVM context, so maybe this isn't an issue?). Whereas here the lifetime is bounded by the lifetime of the BasicAAResult.
- In DataLayout, it's hard to justify why we cache getTypeAllocSize but not any of the other properties.
- This is clearly a win in BasicAA (in my benchmark), but who knows how people use DataLayout; maybe it's not a win for all ways one could use it.

That said I don't feel strongly! WDYT?

> I would also be interested in whether the GEPs in your case are mostly constant offset GEPs. It's on my roadmap to canonicalize those to i8 GEPs, which would allow us to save on a lot of redundant offset calculation in both BasicAA and other places.

In my case I don't think they are. This is an XLA benchmark, so we tend to have 4D pointers where one or a few offsets are variable. Though I agree that canonicalization would help if they were all constant.
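
To make the mixed case concrete, here is a hedged sketch (again a toy model, not BasicAA's actual code) of splitting such an address computation into a constant part plus scaled variable terms. `ToyDecomposedOffset` and `decomposeOffset` are hypothetical names; an index of `std::nullopt` marks a variable index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct ToyDecomposedOffset {
  int64_t Constant = 0; // summed contribution of the constant indices
  // (index position, byte scale) for each variable index
  std::vector<std::pair<std::size_t, int64_t>> Scaled;
};

// Strides[I] is the alloc size in bytes of the type that index I steps over.
ToyDecomposedOffset
decomposeOffset(const std::vector<std::optional<int64_t>> &Indices,
                const std::vector<int64_t> &Strides) {
  assert(Indices.size() == Strides.size());
  ToyDecomposedOffset D;
  for (std::size_t I = 0; I < Indices.size(); ++I) {
    if (Indices[I])
      D.Constant += *Indices[I] * Strides[I]; // folds into the constant part
    else
      D.Scaled.emplace_back(I, Strides[I]);   // the scale still needs the size
  }
  return D;
}
```

With `[2 x [4 x [8 x [16 x float]]]]` and indices `0, i, 1, j, 3`, the strides are 4096, 2048, 512, 64 and 4 bytes, so the constant part is 524 and the variable terms are `2048*i` and `64*j`. Every variable index still needs the size of the type it steps over, which is where a per-type size cache would keep paying off in this model.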

Revision Contents

Path	Size
llvm/include/llvm/Analysis/BasicAliasAnalysis.h	12 lines

lib/