Page MenuHomePhabricator

[RFC] Clang 64-bit source locations
Needs ReviewPublic

Authored by miyuki on Feb 22 2021, 9:18 AM.

Details

Reviewers
rsmith
lebedev.ri
Summary

This patch is a draft implementation of 64-bit source locations in
Clang. The intent is to evaluate viability of using 64-bit source
locations.

The patch implements a compile-time switch for the bit width of source
locations.

A major thing worth noting is that 64-bit source locations will
require an ABI breakage in libclang. This draft patch changes
the bit width in libclang unconditionally, rather than making it
configurable.

In some places libclang uses void * to store source locations, this
obviously will not work on 32-bit hosts, I plan to address this issue
if the overall approach gets accepted.

I've performed two benchmarks to evaluate the effects of the patch on
memory consumption and performance:

  1. LLVM+Clang release build:
    • Average memory usage during a parallel (32 thread) build increased by 8.333 ± 0.185 % (95 % confidence interval)
    • Peak memory usage increased by 8.319 ± 1.680 %
    • Total time increased by 0.669 ± 0.047 %
  2. Parsing and semantic analysis (i.e. -fsyntax-only) of a large C++ source file from the LLVM LNT test suite: https://github.com/llvm/llvm-test-suite/tree/main/MultiSource/Benchmarks/tramp3d-v4
    • Memory usage increased by 8.97 % (exact measurement)

Diff Detail

Event Timeline

miyuki created this revision.Feb 22 2021, 9:18 AM
miyuki requested review of this revision.Feb 22 2021, 9:18 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 22 2021, 9:18 AM

Hi @miyuki,

A major thing worth noting is that 64-bit source locations will
require an ABI breakage in libclang. This patch changes the bit width
in libclang unconditionally, rather than making it configurable.

Is it possible to make the libclang change configurable as well? A libclang ABI breakage would be problematic for us at Apple. We support dynamically loading and using multiple libclang dylibs from different toolchains, which can be different clang versions.

Hi @miyuki,

A major thing worth noting is that 64-bit source locations will
require an ABI breakage in libclang. This patch changes the bit width
in libclang unconditionally, rather than making it configurable.

Is it possible to make the libclang change configurable as well? A libclang ABI breakage would be problematic for us at Apple. We support dynamically loading and using multiple libclang dylibs from different toolchains, which can be different clang versions.

Yes, this should be possible. Python bindings will need some adjustments, but nothing overly complicated.

miyuki updated this revision to Diff 325762.Feb 23 2021, 6:21 AM
miyuki edited the summary of this revision. (Show Details)

Fixed python bindings and formatting.

Thanks for doing this!

The 8-9% memory hit is better than I'd feared, but still seems uncomfortably large. I've left comments on a couple of places where I think we could substantially reduce this.
The performance regression seems large enough that people will notice, but perhaps not prohibitively huge. I assume that's all caused by increasing the size of the working set.

Can we avoid a libclang ABI break if we don't allow the use of 64-bit source locations for builds with 32-bit pointers?

It would be useful to get data points from more users.

clang/include/clang/AST/DeclBase.h
1763–1784

I think allowing these to grow is likely a bad tradeoff -- we're paying 8 bytes per declaration in order to make only ObjCContainerDecls smaller (actually, not even that -- there'll be 4 bytes of padding in an ObjCContainerDecl either way in 64-bit-SourceLocation mode). Can you try moving the SourceLocation out of ObjCContainerDeclBitfields and into ObjCContainerDecl and see if that makes a noticeable difference to the memory overhead of this patch?

clang/include/clang/AST/Stmt.h
1154

Oof, we're wasting a lot of space here. Lots of Stmt subclasses only need a couple of bytes of bitfields, but we're padding them all out to 16 because some of them store a SourceLocation. It'd be nice to estimate how much of that we could get back by moving source locations to the derived classes, if you're prepared to do the work there. (I suspect that would result in too many #ifdefs to be reasonable to check in, though.)

clang/include/clang/Analysis/ProgramPoint.h
90–111

Can we reasonably say that 32-bit clang builds don't support 64-bit SourceLocations, and keep using a void* (or uintptr_t) here?

clang/lib/Serialization/ASTReader.cpp
3893–3894

I think we should ensure the on-disk AST file format doesn't depend on the new macro. Going to 64 bits unconditionally here is probably fine.

Can we avoid a libclang ABI break if we don't allow the use of 64-bit source locations for builds with 32-bit pointers?

To @rsmith's point, the simplest option may be to avoid building libclang if 64-bit source locations are enabled.

Thanks for doing this!

The 8-9% memory hit is better than I'd feared, but still seems uncomfortably large. I've left comments on a couple of places where I think we could substantially reduce this.

Thanks, I'll try that.

Can we avoid a libclang ABI break if we don't allow the use of 64-bit source locations for builds with 32-bit pointers?

No, unfortunately in some structs libclang stores source locations in 'unsigned int' fields, e.g. CXToken. In CXSourceLocation and CXSourceRange some space can be saved by an extra indirection, so 64-bit locations could fit there, AFAICT.