This is an archive of the discontinued LLVM Phabricator instance.

Add a cache for DL.getTypeAllocSize() to BasicAA.
Needs ReviewPublic

Authored by jlebar on Sep 15 2022, 5:16 PM.

Details

Reviewers
asbirlea
nikic
Summary

getTypeAllocSize is surprisingly expensive. In my (private, sorry)
testcase, we spend 400ms in getTypeAllocSize out of a total of 3s in
BasicAA. After this change, this goes to 0, and I see no measurable
overhead from the hashtable lookup. Therefore this is a >1.1x speedup
to BasicAA, in my test.

Diff Detail

Event Timeline

jlebar created this revision.Sep 15 2022, 5:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 5:16 PM
jlebar requested review of this revision.Sep 15 2022, 5:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 5:16 PM
jlebar updated this revision to Diff 460602.Sep 15 2022, 7:21 PM

Use structured binding for iterator.

nikic added a comment.Sep 16 2022, 2:18 AM

Compile-time on CTMark: http://llvm-compile-time-tracker.com/compare.php?from=71e52a125cb5f532192c40cf13692c18ede18cb4&to=14ba5930d47843050c2618b70c6cf6fc7d0fc66f&stat=instructions So not a great deal of impact end-to-end.

getTypeAllocSize() is indeed fairly expensive due to the ABI alignment calculation. I think a cache could generally make sense, though I wonder why it is BasicAA specific, and not part of DataLayout itself?

I would also be interested in whether the GEPs in your case are mostly constant offset GEPs. It's on my roadmap to canonicalize those to i8 GEPs, which would allow us to save on a lot of redundant offset calculation in both BasicAA and other places.

I think a cache could generally make sense, though I wonder why it is BasicAA specific, and not part of DataLayout itself?

I was hesitant to add a cache on DataLayout because

  • it's of unbounded size and lifetime (though I guess every Type that's created also permanently adds some memory usage in the LLVM context, so maybe this isn't an issue?). Whereas in here the lifetime is bounded by the lifetime of the BasicAAResult.
  • in DataLayout, it's hard to justify why we cache getTypeAllocSize but not any of the other properties.
  • this is clearly a win in BasicAA (in my benchmark) but like, who knows how people use DataLayout, maybe it's not a win for all ways one could use it.

That said I don't feel strongly! WDYT?

I would also be interested in whether the GEPs in your case are mostly constant offset GEPs. It's on my roadmap to canonicalize those to i8 GEPs, which would allow us to save on a lot of redundant offset calculation in both BasicAA and other places.

In my case I don't think they are. This is an XLA benchmark, so we tend to have 4D pointers where one or a few offsets are variable. Though I agree that canonicalization would help if they were all constant.