Page MenuHomePhabricator

[RFC] Clang 64-bit source locations
Needs ReviewPublic

Authored by simon_tatham on Feb 22 2021, 9:18 AM.

Details

Summary

This patch is a draft implementation of 64-bit source locations in
Clang. The intent is to evaluate viability of using 64-bit source
locations.

The patch implements a compile-time switch for the bit width of source
locations.

The libclang ABI is unchanged, regardless of that compile-time switch.
So libclang clients will only be able to handle compilations whose
source locations all fit in 32 bits; internal clang SourceLocations
are bounds-checked before converting them to the libclang types, and
replaced with the 'null location' error indicator if the bounds check
fails.

This patch is a joint effort between Mikhail Maltsev and Simon Tatham.

Mikhail performed two benchmarks on a previous version of the patch to
evaluate the effects of the patch on memory consumption and
performance:

  1. LLVM+Clang release build:
    • Average memory usage during a parallel (32 thread) build increased by 8.333 ± 0.185 % (95 % confidence interval)
    • Peak memory usage increased by 8.319 ± 1.680 %
    • Total time increased by 0.669 ± 0.047 %
  2. Parsing and semantic analysis (i.e. -fsyntax-only) of a large C++ source file from the LLVM LNT test suite: https://github.com/llvm/llvm-test-suite/tree/main/MultiSource/Benchmarks/tramp3d-v4
    • Memory usage increased by 8.97 % (exact measurement)

The current version should improve on that somewhat, but no
measurements are available yet.

Diff Detail

Event Timeline

miyuki created this revision.Feb 22 2021, 9:18 AM
miyuki requested review of this revision.Feb 22 2021, 9:18 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 22 2021, 9:18 AM

Hi @miyuki,

A major thing worth noting is that 64-bit source locations will
require an ABI breakage in libclang. This patch changes the bit width
in libclang unconditionally, rather than making it configurable.

Is it possible to make the libclang change configurable as well? A libclang ABI breakage would be problematic for us at Apple. We support dynamically loading and using multiple libclang dylibs from different toolchains, which can be different clang versions.

Hi @miyuki,

A major thing worth noting is that 64-bit source locations will
require an ABI breakage in libclang. This patch changes the bit width
in libclang unconditionally, rather than making it configurable.

Is it possible to make the libclang change configurable as well? A libclang ABI breakage would be problematic for us at Apple. We support dynamically loading and using multiple libclang dylibs from different toolchains, which can be different clang versions.

Yes, this should be possible. Python bindings will need some adjustments, but nothing overly complicated.

miyuki updated this revision to Diff 325762.Feb 23 2021, 6:21 AM
miyuki edited the summary of this revision. (Show Details)

Fixed python bindings and formatting.

Thanks for doing this!

The 8-9% memory hit is better than I'd feared, but still seems uncomfortably large. I've left comments on a couple of places where I think we could substantially reduce this.
The performance regression seems large enough that people will notice, but perhaps not prohibitively huge. I assume that's all caused by increasing the size of the working set.

Can we avoid a libclang ABI break if we don't allow the use of 64-bit source locations for builds with 32-bit pointers?

It would be useful to get data points from more users.

clang/include/clang/AST/DeclBase.h
1763–1784

I think allowing these to grow is likely a bad tradeoff -- we're paying 8 bytes per declaration in order to make only ObjCContainerDecls smaller (actually, not even that -- there'll be 4 bytes of padding in an ObjCContainerDecl either way in 64-bit-SourceLocation mode). Can you try moving the SourceLocation out of ObjCContainerDeclBitfields and into ObjCContainerDecl and see if that makes a noticeable difference to the memory overhead of this patch?

clang/include/clang/AST/Stmt.h
1154

Oof, we're wasting a lot of space here. Lots of Stmt subclasses only need a couple of bytes of bitfields, but we're padding them all out to 16 because some of them store a SourceLocation. It'd be nice to estimate how much of that we could get back by moving source locations to the derived classes, if you're prepared to do the work there. (I suspect that would result in too many #ifdefs to be reasonable to check in, though.)

clang/include/clang/Analysis/ProgramPoint.h
90–111

Can we reasonably say that 32-bit clang builds don't support 64-bit SourceLocations, and keep using a void* (or uintptr_t) here?

clang/lib/Serialization/ASTReader.cpp
3893–3894

I think we should ensure the on-disk AST file format doesn't depend on the new macro. Going to 64 bits unconditionally here is probably fine.

Can we avoid a libclang ABI break if we don't allow the use of 64-bit source locations for builds with 32-bit pointers?

To @rsmith's point, the simplest option may be to avoid building libclang if 64-bit source locations are enabled.

Thanks for doing this!

The 8-9% memory hit is better than I'd feared, but still seems uncomfortably large. I've left comments on a couple of places where I think we could substantially reduce this.

Thanks, I'll try that.

Can we avoid a libclang ABI break if we don't allow the use of 64-bit source locations for builds with 32-bit pointers?

No, unfortunately in some structs libclang stores source locations in 'unsigned int' fields, e.g. CXToken. In CXSourceLocation and CXSourceRange some space can be saved by an extra indirection, so 64-bit locations could fit there, AFAICT.

simon_tatham commandeered this revision.Jun 17 2021, 1:24 AM
simon_tatham added a reviewer: miyuki.
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2021, 1:24 AM

@miyuki is working on other things at the moment, and I've picked this up.

This updated diff reverts the libclang ABI break completely. In the libclang ABI, there's still only 32 bits of space for a SourceLocation, and this is handled by bounds-checking on the clang side when converting into that format (and we return an invalid location if the bounds check fails). So any use of libclang that previously worked should still work, and ones that would previously have failed will now fail in a different way.

Perhaps in future we might introduce a secondary libclang ABI coexisting with the existing one, with different function names, and expanded data structures that can hold a full 64-bit SourceLocation? And then each client could switch over at their own convenience (or not at all).

I've addressed @rsmith's review suggestions by making a new pair of opaque structures called SourceLocation::LowBits and SourceLocation::OptionalHighBits, and a pair of functions to convert between a SourceLocation and a pair of those. So, in all the places where a SourceLocation was stored in one of those 'bits' unions, the union still contains a SourceLocation::LowBits containing 32 of the bits, and the derived class contains an OptionalHighBits which is zero size or another 32 bits depending on what size of SourceLocation we're compiling for.

The AST file format now uses a 64-bit field unconditionally for a source location. I'm not too familiar with this part of clang, but it looks as if AST files already have a versioning system based on the clang git commit id, so that incompatibility between this and the previous format will already be detected without me having to manually bump a format version number anywhere?

I haven't yet managed to repeat @miyuki's memory-usage analysis, so I can't say how much of the previous 9% loss is recovered by these changes. But I'll look at doing that next.

Incidentally, in passing I spotted what looks like an obvious bug in the libclang SourceLocation conversion code. Raised D104442 separately.

simon_tatham edited the summary of this revision. (Show Details)Jun 17 2021, 1:49 AM
stuij added a subscriber: stuij.Jun 17 2021, 5:40 AM

Thanks to @miyuki for repeating the previous benchmark with this version of the patch (and on the same machine as before, which was better than I could have done).

The revised results now have the memory usage increase (compared to 32-bit SourceLocation) in the region of 5.5% to 7.5% instead of 8% to 9%. In detail:

  • average memory usage during a clang build increased by 6.858 ± 0.260 %
  • peak memory usage during a clang build increased by 5.529 ± 1.641 %
  • memory usage during parsing of TraMP3d source file increased by 7.32 %

The change in speed between the previous version of the patch and this one is negligible (both values are within the uncertainty interval of the other one).

Hmmm. Two people have pointed out to me that my strategy of having a 32-bit SourceLocations::LowBits and an 0- or 32-bit SourceLocations::OptionalHighBits doesn't actually work, because an empty struct still takes at least 1 byte. So this version of the patch will still increase memory usage in the 32SL configuration, which is just what I was trying to avoid. Whoops.

@rsmith, do you have any thoughts on what would be an acceptable replacement strategy? You mentioned wanting to avoid actual #ifdefs all over the AST class hierarchy. Some possible alternatives:

We could define a family of macros: one for declaring a possible-SourceLocation in a branch of the Stmt bitfields union, one for declaring the same SourceLocation in a particular Stmt subclass, and a pair for the accessor functions that read and write whichever of those exists. Then there'd be slightly ugly macro calls all over Stmt.h and friends, but only one outright #ifdef, where the macros are defined.

(And then there are two options for how to define the macros in the 64SL case: either move the whole SourceLocation into the subclasses for speed, or keep half of it in the bitfields as now, for space. But that decision could be localized into the macro definitions, and easily changed.)

Alternatively, @miyuki suggests that my SourceLocation::OptionalHighBits could become an extra base class of some of the Stmt subclasses, so that empty base optimization would allow it to take up 0 bytes in the 32SL configuration.

The macro approach is the kind of thing I wouldn't mind doing in my own projects, but then, I have a strong stomach for macros :-) and I'd rather check before I do all the work to rewrite this patch in one of those styles, only to find out that you'd prefer another, or something I haven't even thought of.

This patch doesn't seem to have attracted much review attention in the last couple of weeks. On the theory that perhaps it's just too big and monolithic to review in one go, I've started to break it up into smaller pieces.

So far I've only covered the preparation stages, so I hope the following patches won't be too controversial:
D105490: Remove unused parameter from parseMSInlineAsm.
D105491: [clang] Use i64 for the !srcloc metadata on asm IR nodes.
D105492: [clang] Introduce SourceLocation::[U]IntType typedefs.
D105493: [clang] Change set type used for SourceLocation.
D105494: [clang] Introduce a union inside ProgramPoint.
D105495: [clang] Make negative getLocWithOffset widening-safe.
D105497: [clang] Serialize source locations as 64-bit in PCH.
D105498: [clang] Remove assumption about SourceLocation alignment.