This is an archive of the discontinued LLVM Phabricator instance.

[MC] Always encode instruction into SmallVector
ClosedPublic

Authored by aengelke on Mar 10 2023, 6:56 AM.

Details

Summary

All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.

This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream and therefore allow for
performance improvments in encoding. A default path that uses existing
raw_ostream implementations is provided.

Diff Detail

Event Timeline

aengelke created this revision.Mar 10 2023, 6:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2023, 6:56 AM
aengelke requested review of this revision.Mar 10 2023, 6:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2023, 6:56 AM
aengelke updated this revision to Diff 504167.Mar 10 2023, 8:52 AM

Attempt to fix bolt

As I said on D145792 - it'd help us more if we could address any obviously missed perf in raw_svector_ostream first before trying a refactor like this.

I don't think there are "obviously missed" problems; IMHO the main problem is the abstraction itself, which for every write involves a function call (raw_ostream::write, unlikely to be inlined due to size and unlikely to be optimizable as the buffer pointers/mode are mutable members) and an indirect function call (write_impl, extremely unlikely to be devirtualized due to general code complexity). The impact of the latter could probably be reduced by buffering in raw_svector_ostream, but that'd be a substantial and massively breaking change with unknown (and hardly quantifiable) benefits. If you have any suggestions for more generally applicable approaches, I'd be happy to try them; but right now getting rid of the (apparently completely unneeded) abstraction appears to be the easiest, least intrusive, and most performant change.

MaskRay accepted this revision.Mar 10 2023, 11:30 AM

LGTM, but make sure @RKSimon is happy as well.

I don't think there are "obviously missed" problems; IMHO the main problem is the abstraction itself, which for every write involves a function call (raw_ostream::write, unlikely to be inlined due to size and unlikely to be optimizable as the buffer pointers/mode are mutable members) and an indirect function call (write_impl, extremely unlikely to be devirtualized due to general code complexity). The impact of the latter could probably be reduced by buffering in raw_svector_ostream, but that'd be a substantial and massively breaking change with unknown (and hardly quantifiable) benefits. If you have any suggestions for more generally applicable approaches, I'd be happy to try them; but right now getting rid of the (apparently completely unneeded) abstraction appears to be the easiest, least intrusive, and most performant change.

The change makes sense to me. It removes an unneeded abstraction and simplifies many call sites.
raw_ostream::write =(inlinable)=> flush_tied_then_write (unneeded TiedStream check) =(virtual function call)=> raw_svector_ostream::write_impl ==> SmallVector append(ItTy in_start, ItTy in_end) (range; less efficient then push_back).

Every step adds some overhead.

This revision is now accepted and ready to land.Mar 10 2023, 11:30 AM
Amir accepted this revision.Mar 10 2023, 5:36 PM

BOLT changes LGTM

@RKSimon Are you ok with these two patches or do you have any concerns/alternative ideas?

@RKSimon ping? We get a consistent 0.5%–1.5% compile-time improvement from these changes.

@RKSimon ping? We get a consistent 0.5%–1.5% compile-time improvement from these changes.

Sorry - I missed this entirely! LGTM

This revision was automatically updated to reflect the committed changes.
skan added a subscriber: skan.Apr 11 2023, 7:02 PM