This is an archive of the discontinued LLVM Phabricator instance.

[Support] Make line-number cache robust against access patterns.
ClosedPublic

Authored by graydon on Mar 28 2018, 3:39 PM.

Details

Summary

The LLVM SourceMgr class (which is used indirectly by Swift, though not Clang)
has a routine for looking up line numbers of SMLocs. This routine uses a
shared, special-purpose cache that handles exactly one access pattern
efficiently: looking up the line number of an SMLoc that points into the same
buffer as the last query made to the SourceMgr, at a location in the buffer at
or ahead of the last query.

When this works it's fine, but when it fails it's catastrophic for performancer:
one recent out-of-order access from a Swift utility routine ran for tens of
seconds, spending 99% of its time repeatedly scanning buffers for '\n'.

This change removes the shared cache from the SourceMgr and installs a new
cache in each SrcBuffer. The per-SrcBuffer caches are also "full", in the sense
that rather than caching a single last-query pointer, they cache _all_ the
line-ending offsets, in a binary-searchable array, such that once it's
populated (on first access), all subsequent access patterns run at the same
speed.

Performance measurements I've done show this is actually a little bit faster on
real codebases (though only a couple fractions of a percent). Memory usage is
up by a few tens to hundreds of bytes per SrcBuffer that has a line lookup done
on it; I've attempted to minimize this by using dynamic selection of integer
sized when storing offset arrays. But the main motive here is to
make-impossible the cases we don't always see, that show up by surprise when
there is an out-of-order access pattern.

Diff Detail

Event Timeline

graydon created this revision.Mar 28 2018, 3:39 PM

That's potentially a lot more memory usage, since the various vectors never get freed. Have you measured what the effect on memory usage is on a large Swift project with diagnostics in many files?

That's potentially a lot more memory usage, since the various vectors never get freed. Have you measured what the effect on memory usage is on a large Swift project with diagnostics in many files?

I've not been able to measure any memory difference when building local projects here (rusage maxrss fluctuates a few tens of kb anyways due to goodness knows what nondeterminism). Keep in mind we only fill the vector on a lookup, so most buffers will never be populated. And those that do, assume a file is (say) 1000 lines long, it'll cost 8kb to index. Even if we indexed a thousand such files -- a diagnostic in every file of a large project! -- we'd only be talking 8mb.

I'll keep trying a few other cases and post back.

jordan_rose added a comment.EditedMar 29 2018, 10:03 AM

And those that do, assume a file is (say) 1000 lines long, it'll cost 8kb to index. Even if we indexed a thousand such files -- a diagnostic in every file of a large project! -- we'd only be talking 8mb.

I'm not sure 1000 lines is a reasonable assumption, but the orders of magnitude probably do still work out. Okay, I'm less concerned, and this looks okay.

Did a bit more precise measurement: on a medium-sized module (802 files, 200kloc) when we're doing -emit-module for a whole module (which turns out to map _every file_ in order to get doc-comment locations, when it's emitting module docs) the normal compilation uses 191MB RSS and these maps account for 3.3MB of that (1.7%). Largest map is 32KB, median is 4KB. Not trivial, but not huge.

However, in those cases it also represents a more-noticeable speedup: 271s -> 267s (and 925bn instructions -> 918bn).

Given that the majority of the files are quite small -- certainly smaller than 2^64 bits! -- I've made a variant that uses variable size offsets (8, 16, 32 or 64 bit) rather than pointers. That variant uses only 956KB for the indexes in that testcase (with 200kloc / 191MB RSS). I think that's about as memory-cheap as I can make it using this approach.

graydon updated this revision to Diff 140387.Mar 30 2018, 12:05 AM

Modification to use variable unit-size offset vectors rather than pointer vectors.

graydon updated this revision to Diff 140388.Mar 30 2018, 12:08 AM

Fix comments.

graydon edited the summary of this revision. (Show Details)Mar 30 2018, 12:11 AM
graydon edited the summary of this revision. (Show Details)

That seems about as clever as possible—anything more and it would definitely be overboard. Can you add tests with 255-, 256-, and 257-byte buffers, then? With and without newlines as the last character, and testing the just-past-the-end pointer in addition to something in-bounds?

There's commentary in lib/MC/MCParser/AsmParser.cpp about the ungraceful degradation in SourceMgr's cache. Would this help or simplify what AsmParser is doing?

graydon updated this revision to Diff 140714.Apr 2 2018, 5:23 PM
graydon edited the summary of this revision. (Show Details)

Update to add more tests around boundary conditions of buffer-map expansion.

There's commentary in lib/MC/MCParser/AsmParser.cpp about the ungraceful degradation in SourceMgr's cache. Would this help or simplify what AsmParser is doing?

I believe this should eliminate the behaviour those comments describe (and thus remove the need for the code), though I feel a bit under-qualified tinkering with the code there itself. Would you like me to try, or shall I leave it to the owners of that file?

There's commentary in lib/MC/MCParser/AsmParser.cpp about the ungraceful degradation in SourceMgr's cache. Would this help or simplify what AsmParser is doing?

I believe this should eliminate the behaviour those comments describe (and thus remove the need for the code), though I feel a bit under-qualified tinkering with the code there itself. Would you like me to try, or shall I leave it to the owners of that file?

If you could have a go at it, that would be great. It's fine to do it as a follow-up NFC patch. Or file a bug and cc me. It's not really my area either but I keep running across that comment and it has always bothered me, maybe enough to do something now.

graydon updated this revision to Diff 141239.Apr 5 2018, 4:43 PM
  • [AsmParser] Remove code that compensated for formerly-slow SrcMgr.FindLineNumber()

There's commentary in lib/MC/MCParser/AsmParser.cpp about the ungraceful degradation in SourceMgr's cache. Would this help or simplify what AsmParser is doing?

I believe this should eliminate the behaviour those comments describe (and thus remove the need for the code), though I feel a bit under-qualified tinkering with the code there itself. Would you like me to try, or shall I leave it to the owners of that file?

If you could have a go at it, that would be great. It's fine to do it as a follow-up NFC patch. Or file a bug and cc me. It's not really my area either but I keep running across that comment and it has always bothered me, maybe enough to do something now.

Ok, after a little staring it seemed clear it was just a small change. I made a synthetic test with 10,000 line-directives and built it with -c -g, both with and without the change, and only saw a 0.1% increase in instructions retired / 1ms increase in time (of 300ms). So I figure this is probably acceptable?

Removing code with effectively no performance or functionality penalty is a beautiful thing. That part LGTM.
@jordan_rose still needs to sign off, I think.

jordan_rose accepted this revision.Apr 6 2018, 9:52 AM

Ah, yes, Graydon's convinced me that the change will work and will not regress memory usage terribly, and the new tests look good.

This revision is now accepted and ready to land.Apr 6 2018, 9:52 AM
This revision was automatically updated to reflect the committed changes.