Grouping instructions with the same high 48 bits and storing prefixes. Simple lookup improved to get exponentially faster instruction address lookups.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Example of improvement in memory usage
thread #1: tid = 42805 Raw trace size: 4096 KiB Total number of instructions: 900004 Total approximate memory usage: 4394.58 KiB Average memory usage per instruction: 5.00 bytes
where previously the same trace took ~12 KiB and 13 bytes per instruction
I did a first pass on this diff. I'm asking to refactor a bit the InstructionBlock classes so that they are smarter. Besides that, if you use IDs more ubiquitously and stop using instruction indices everywhere, everything becomes much simpler.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
---|---|---|
112–135 | Let's rename it to InstructionsBlock or another similar name. An instruction pointer is actually another way to refer to an instruction address. Let's also make the size of the suffix dynamic, so that we can easily experiment with it later. We can use templates asking for a suffix type that must be unsigned. If we embed the type of the suffix in the template, then we can do the splitting inside this class instead of outside. If you use my proposal for the constructor, please move its implementation to the cpp file. Also, as this struct is not just a simple bag of data, let's make it a proper class. I renamed a few things to create a nicer API. First of all, I'm renaming the Append method to TryAppend, which receives the full load address and returns true if the append could happen, and false otherwise. Besides that, I'm asking for a new class called Instructions that contains all the instruction blocks and receives instruction ids, and then it's able to defer the job to the actual instruction block. | |
173–201 | all of these become simpler if you work directly with the id instead of the instruction index | |
260–263 | you can move these two to the new InstructionsBlock class | |
271 | if you convert this from instruction index -> TSC to instruction id -> TSC, then everything becomes simpler | |
274–276 | i have the impression that you don't need this anymore | |
279 | you can also make it instruction id -> error message |
I've tried most of these changes to see the effect it has, but in my opinion this adds quite a lot of code complexity for not enough benefit in terms of memory usage which was our goal. I think this will make the DecodedThread even more of a monolith class with single-use subclasses... are you sure we should continue with this?
We can still use instruction ID more widely though without having to turn instruction block into a class.
Let's rename it to InstructionsBlock or another similar name. An instruction pointer is actually another way to refer to an instruction address.
Let's also try to remove numbers from the comments, so that if we modify them, we don't need to update the comment. It's hard to update comments because the compiler will never complaint if the comment doesn't make sense anymore.
Let's also make the size of the suffix dynamic, so that we can easily experiment with it later. We can use templates asking for a suffix type that must be unsigned.
If we embed the type of the suffix in the template, then we can do the splitting inside this class instead of outside. If you use my proposal for the constructor, please move its implementation to the cpp file.
Also, as this struct is not just a simple bag of data, let's make it a proper class.
I renamed a few things to create a nicer API.
First of all, I'm renaming the Append method to TryAppend, which receives the full load address and returns true if the append could happen, and false otherwise.
Besides that, I'm asking for a new class called Instructions that contains all the instruction blocks and receives instruction ids, and then it's able to defer the job to the actual instruction block.