This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][WIP] Avoid generating Record layouts for pointee types
Needs ReviewPublic

Authored by teemperor on Aug 19 2021, 2:07 PM.

Details

Summary

This is a WIP patch that tries to avoid creating a RecordLayout in Clang and instead just emit an opaque structure type
as if we only had a forward declarations. The main motivation for this patch is actually just supporting a use case in LLDB
where laying out types can be very expensive as it usually triggers parsing of debug information.

The changes in this patch can be summarized as:

  • CodeGenTypes::ConvertRecordDeclType (and related funcs) have a new parameter that tells us if we need the definition. It's currently only set to false for Clang pointer types.
  • There are a few new places where I added (temporary) calls to ConvertTypeForMem() on some pointee types. The reason is that the code after is usually creating GEP instructions where we need a non-opaque source type. We can't do this automatically from the GEP factory methods as they would need to know the clang::Type to automatically do this (and they only have the llvm::Type that can't be mapped back to a clang::Type from what I can see, but that might be incorrect).
  • A few test that needed to be adjusted as they relied on e.g. Foo *x to be enough to force Foo to be laid out/emitted.

There are still about a dozen more tests failing but from what I can see they all just need to be adjusted to force specific types to be emitted. I'll fix those up once there is consensus that this patch is going in the right direction.

Some benchmarks: I did a stage2 build of LLVM+Clang with my patch and those are the stats:

current ToT Clang:
2232421 - total amount of struct types created
  94911 - of which are opaque struct types

with this patch:
1715074 - total amount of struct types created (-23%)
 173127 - of which are opaque struct types (+82%)

I built a part of Clang (the last 300 source files in the compile_commands.json) and the average time on my 64 core machine changes like this (as per hyperfine):

Benchmark #1: parallel --progress -j63 -a ToT-clang
  Time (mean ± σ):     27.703 s ±  0.168 s    [User: 1434.619 s, System: 66.687 s]
  Range (min … max):   27.459 s … 27.891 s    10 runs
 
Benchmark #2: parallel --progress -j63 -a with-patch
  Time (mean ± σ):     27.439 s ±  0.111 s    [User: 1427.739 s, System: 66.220 s]
  Range (min … max):   27.300 s … 27.625 s    10 runs
 
Summary
  'parallel --progress -j63 -a with-patch' ran
    1.01 ± 0.01 times faster than 'parallel --progress -j63 -a ToT-clang'

Diff Detail

Event Timeline

teemperor created this revision.Aug 19 2021, 2:07 PM
teemperor requested review of this revision.Aug 19 2021, 2:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2021, 2:07 PM

I'm mostly putting this up to get some early feedback if anyone sees a problem with using opaque types here (e.g. it breaks some optimizations, etc.). If it does, it would still be nice if we could at least make this happen on some opt-in bases as it would be very beneficial for improving the performance of LLDB.

llvm/include/llvm/IR/Instructions.h
1176

)This change and the one below slipped in by accident, that was more of a debugging help that I wanted to put up as a separate patch.)

Notion seems plausible - though if there's some way to refactor so there's less need for manual insertion/maintenance of calls to ConvertTypeForMem that'd be good/important. I don't think there'd be anything fundamentally wrong with this approach - though checking some workloads to see if you can get bit identical results (eg: does some interesting binaries (including a clang selfhost) built with/without this patch compile to exactly the same file?) would probably be a good place to start to check the soundness.

I have no problem with breaking LLVM analyses that rely on record types being filled in when they don't need to be. I've been consistently telling people for years that they shouldn't be relying on IR types for things like that.

I would stick with the frontend terminology of a "complete" type, though. And you might consider adding a function to CGT which completes a type, both for clarity and so that you can fast-path the common case where you've already got a sized type.

You can probably rip out the UpdateCompletedType logic when you're done.