This is an archive of the discontinued LLVM Phabricator instance.

[lldb][intelpt] Reduce trace memory usage by grouping instructions
AbandonedPublic

Authored by zrthxn on Apr 8 2022, 3:18 AM.

Details

Reviewers
wallace
jj10306
Summary

Grouping instructions with the same high 48 bits and storing prefixes. Simple lookup improved to get exponentially faster instruction address lookups.

Diff Detail

Event Timeline

zrthxn created this revision.Apr 8 2022, 3:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 3:18 AM
zrthxn requested review of this revision.Apr 8 2022, 3:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 3:18 AM
zrthxn edited the summary of this revision. (Show Details)Apr 8 2022, 3:19 AM
zrthxn added a comment.Apr 8 2022, 3:23 AM

Example of improvement in memory usage

thread #1: tid = 42805
  Raw trace size: 4096 KiB
  Total number of instructions: 900004
  Total approximate memory usage: 4394.58 KiB
  Average memory usage per instruction: 5.00 bytes

where previously the same trace took ~12 KiB and 13 bytes per instruction

wallace requested changes to this revision.Apr 11 2022, 5:11 PM

I did a first pass on this diff. I'm asking to refactor a bit the InstructionBlock classes so that they are smarter. Besides that, if you use IDs more ubiquitously and stop using instruction indices everywhere, everything becomes much simpler.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
112–135

Let's rename it to InstructionsBlock or another similar name. An instruction pointer is actually another way to refer to an instruction address.
Let's also try to remove numbers from the comments, so that if we modify them, we don't need to update the comment. It's hard to update comments because the compiler will never complaint if the comment doesn't make sense anymore.

Let's also make the size of the suffix dynamic, so that we can easily experiment with it later. We can use templates asking for a suffix type that must be unsigned.

If we embed the type of the suffix in the template, then we can do the splitting inside this class instead of outside. If you use my proposal for the constructor, please move its implementation to the cpp file.

Also, as this struct is not just a simple bag of data, let's make it a proper class.

I renamed a few things to create a nicer API.

First of all, I'm renaming the Append method to TryAppend, which receives the full load address and returns true if the append could happen, and false otherwise.

Besides that, I'm asking for a new class called Instructions that contains all the instruction blocks and receives instruction ids, and then it's able to defer the job to the actual instruction block.

173–201

all of these become simpler if you work directly with the id instead of the instruction index

260–263

you can move these two to the new InstructionsBlock class

271

if you convert this from instruction index -> TSC to instruction id -> TSC, then everything becomes simpler

274–276

i have the impression that you don't need this anymore

279

you can also make it instruction id -> error message

This revision now requires changes to proceed.Apr 11 2022, 5:11 PM
zrthxn added a comment.EditedApr 18 2022, 2:55 AM

I've tried most of these changes to see the effect it has, but in my opinion this adds quite a lot of code complexity for not enough benefit in terms of memory usage which was our goal. I think this will make the DecodedThread even more of a monolith class with single-use subclasses... are you sure we should continue with this?
We can still use instruction ID more widely though without having to turn instruction block into a class.

let me think about it :)

zrthxn abandoned this revision.Jun 17 2022, 11:38 AM