This is an archive of the discontinued LLVM Phabricator instance.

[intelpt] Refactoring instruction decoding for flexibility
ClosedPublic

Authored by zrthxn on Mar 23 2022, 12:18 AM.

Details

Summary

Refactored (but not yet tested) version of the instruction decoding code

Diff Detail

Event Timeline

zrthxn created this revision.Mar 23 2022, 12:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 12:19 AM
zrthxn requested review of this revision.Mar 23 2022, 12:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 12:19 AM
wallace added inline comments.Mar 23 2022, 11:16 AM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
178

you need to have something like

std::unordered_map<uint64_t, llvm::Error> m_errors;

that way, you'll be able to quickly look for the error associated with an instruction index. The IntelPTInstruction int his case, instead of storing the Error, can just store one bit of information has_error = true/false;

zrthxn updated this revision to Diff 417709.Mar 23 2022, 12:19 PM

Introduced unordered map for errors in DecodedThread

jj10306 requested changes to this revision.Mar 23 2022, 12:35 PM
jj10306 added inline comments.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
178

nit: from https://llvm.org/docs/CodingStandards.html#c-standard-library
prefer llvm::DenseMap over std::map's unless there's a specific reason not to.

Don't forget to update CalculateApproximateMemoryUsage() as well! Also, besides for being inline with the coding standards I linked above, using llvm::DenseMap here has the actual advantage that it exposes its approximate size via getMemorySize(), whereas there is no easy way to get the size of std::map.

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
120
This revision now requires changes to proceed.Mar 23 2022, 12:35 PM
zrthxn added inline comments.Mar 23 2022, 12:50 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
178

DenseMap looks interesting we should try that.

Yes i will update the mem calculation and make it more accurate so some refactor will be needed. That'll be the next small patch once this works.

jj10306 added inline comments.Mar 23 2022, 12:56 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
152

Return a reference here to avoid potential expensive copy when returning.

Something else to consider is if we need/want an API exposing all the entire error map or if something like:
llvm::Error GetErrorByInstructionIndex(uint64_t insn_index) const that allows the caller to specify the key into the map would make more sense?
This also has the advantage that it hides the implementation detail of what data type is being used under the hood to represent the error map!
@wallace @zrthxn wdyt?

zrthxn added inline comments.Mar 23 2022, 1:00 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
152

Yea I did think about that. In my opinion returning a reference would be good since with that you can get the size and error at each index and we don't need to have functions for each operation which would have long clunky names making the code less readable...

wallace requested changes to this revision.Mar 23 2022, 5:15 PM
wallace added inline comments.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
105–107

remove this

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
152

ahh!! DenseMap is for sure the way to go. The getMemorySize() method will help us a good deal. Thanks @jj10306

Besides that, we shouldn't expose publicly the internal representation of an object whose data structure might change. For example, sooner than later we might prefer to store the errors in a different way. So, to prevent breaking callers if we change this data structure, let's go for what Jakob proposes

llvm::Error GetErrorByInstructionIndex(uint64_t insn_index) const

which returns Error::success() in case the instruction is not an actual error

154

Omit the thread, because you are appending to a DecodedThread. In any case, it's redundant to mention where you are appending to

157

you should be able to omit the insn_index by using m_instructions.size() + m_errors.size() as the index

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
102–104

update the documentation

110

don't call it thread, call it decoded_thread. The thread is the one passed as parameter

110

use make_shared here and return a DecodedThreadSP (alias for shared_ptr<DecodedThread>). That way, you'll avoid having to invoke shared_from_this() below

125

remove this

148

This will create a copy of the IntelPTInstruction before storing it in the vector. Instead, you should use the same semantics as vector::emplace_back(), which uses paratemer packs/variadic templates. You can even rename Append to Emplace in this case

184–215

see below. We don't need to pass the raw_buffer size because it can be gotten from the buffer itself. Also, return the DecodedThreadSP directly

228–241

Now we don't need to return the raw_trace_size as an out parameter because we are creating the decoded thread deeper in the stack.

242–243

we don't need raw_trace_size to be an out parameter anymore. We can directly use it deep inside our decoding logic

255

remove raw_trace_size as an out parameter

267–268

the trace_size doesn't need to be passed

zrthxn marked 8 inline comments as done.Mar 23 2022, 11:07 PM
zrthxn added inline comments.
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
148

Yea I was doing that before, the idea was to send those variadic args to emplace_back but that wasnt working. I introduced this to avoid having 2 Appends since we already have 2 constructors which fulfill that requirement, and I can change this to std::move to avoid copies if thats a concern

wallace added inline comments.Mar 23 2022, 11:08 PM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
148

Yes, use the same pattern than emolace_back uses with a parameter pack

zrthxn updated this revision to Diff 417832.Mar 23 2022, 11:11 PM

A few changes to remove redundant things

wallace added inline comments.Mar 24 2022, 8:30 AM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
105–107

Errors can only be copied, that's why we need to create a new instance of the error that is a copy of the original one. We can draw inspiration from IntelPTInstruction::ToError(), which can now be deleted

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
148–151

we can make the documentation clearer

155

this has to use the new parameter pack semantics, so that you can pass either {pt_insn} or {pt_insn, timestamp} without having to create copies of the IntelPTInstruction class

157
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
110

make this a shared pointer since the beginning. Use make_shared here

wallace added inline comments.Mar 24 2022, 8:30 AM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
148

ideally you should be able to do decoded_thread.AppendInstruction({insn}) here

wallace added inline comments.Mar 24 2022, 8:33 AM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
228

don't use expected. The DecodedThread object already can store errors. Just return a DecodedThreadSP and assign it a single error and return it. If you need to return the failed DecodedThread before you know the size of the buffer, just pass 0 as size

242

remove the *, because we will make DecodeLiveThread not return an Expected

zrthxn updated this revision to Diff 417969.Mar 24 2022, 10:10 AM

Error gettting method

wallace requested changes to this revision.Mar 24 2022, 10:15 AM

there are many comments from the previous versions of this diff that you didn't apply. Go through all of them first :)

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
88–91

delete it. We don't want to leave old code as comments

122

now that you changed this, could show share in the description of this diff the difference in byte size between the old and new code when tracing the same number of instructions?

148–151

you didn't apply these changes

152

or GetErrorByInstructionIndex

157

same here

This revision now requires changes to proceed.Mar 24 2022, 10:15 AM
zrthxn marked 17 inline comments as done.Mar 24 2022, 11:05 AM
zrthxn updated this revision to Diff 418000.Mar 24 2022, 11:47 AM

Incorporate other comments

zrthxn retitled this revision from [wip][intelpt] Refactoring instruction decoding for flexibility to [intelpt] Refactoring instruction decoding for flexibility.Mar 24 2022, 11:48 AM
zrthxn updated this revision to Diff 418154.EditedMar 25 2022, 1:23 AM

Resolved many runtime errors.
One small thing still remains, with post mortem decoder

zrthxn updated this revision to Diff 418249.Mar 25 2022, 9:10 AM

Refactor to use more templates and param packs

wallace requested changes to this revision.Mar 25 2022, 9:40 AM

much closer! I'm glad you are starting to understand the patterns we use for this kind of code

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
13

delete this

14

you don't need to import it. Maybe your c++ vscode extension has been autoimporting this one, in which case it's better to disable that feature.

146–149

emplace will prevent unnecessary copies and also doesn't need you to pass a pair

152–153

in order to have correct formatting all along, you need to use git clang-format: https://llvm.org/docs/Contributing.html#format-patches

follow that guide. Whenever you are going to submit a patch, first run git clang-format and it will format your code correctly, so that you never again have to lose time doing that. It can even format comments

153–154

here you also need to ask for the size of the DenseMap

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
42–45

delete commented code

72

Pass the error by const reference, because we don't modify it

142
151
156

remove this one. We need the version that accepts a parameter pack, as we discussed offline

158–159

same here

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
111–112

the magic of make_shared is that it uses parameter packs, so that it only constructs the object right where it'll store it, thus preventing unnecessary copies

182

don't pass the process, because we can get it from the thread_sp object

205

if the compiler complains mentioning that the Process pointer you are passing is const, you can do an unsafe cast that removes the const qualifier and write a comment here. I hope you don't need it anyway

236

use make_shared instead of shared_from_this. That will be much more performant

240

don't pass the process, it's not necessary anymore

244–248

same here

251–258

just return a direct DecodedThreadSP that holds any errors you might find here, same like LiveThreadDecoder::DoDecode()

270–275

this will become simpler once DecodeTraceFile returns directly a DecodedThreadSP

This revision now requires changes to proceed.Mar 25 2022, 9:40 AM
zrthxn marked 15 inline comments as done.Mar 25 2022, 10:17 AM

Before refactor

thread #1: tid = 37275
  Raw trace size: 4 KiB
  Total number of instructions: 21
  Total approximate memory usage: 5.38 KiB

After refactor

(lldb) thread trace dump info
Trace technology: intel-pt

thread #1: tid = 13690
  Raw trace size: 4 KiB
  Total number of instructions: 20
  Total approximate memory usage: 5.34 KiB
zrthxn marked 2 inline comments as done.Mar 25 2022, 12:37 PM
zrthxn updated this revision to Diff 418299.Mar 25 2022, 12:38 PM

Incoporate more feedback,
Only parameter pack isnt done yet

zrthxn marked 4 inline comments as done.Mar 25 2022, 1:03 PM
zrthxn updated this revision to Diff 418306.Mar 25 2022, 1:03 PM

Finalize diff

zrthxn updated this revision to Diff 418309.Mar 25 2022, 1:14 PM

Added average memory per instruction

jj10306 added inline comments.Mar 25 2022, 2:02 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
39–43

Feels kinda weird that the err param isn't being used? Not sure if it would be preferred but another option would be to make the default constructor construct an error instruction.
Passing the error is more clear that this constructor will create an error instruction, but then it feels a bit unnecessary to pass it since it's now unused. I'm a little torn so would be curious to hear others opinions (:

46

Is this boolean necessary? In the case of an error, the other two fields also indicate an error so this doesn't seem to add much information.
If we remove it, you can just update IsError to check the other two fields accordingly.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
157

Should we be using std::forward() here? Same question for the AppendError function

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

nit: should we update this to use the error map? I don't think there's a significant difference performance wise, but the code would be a little cleaner imo and consistent with how GetError() works.

zrthxn added inline comments.Mar 25 2022, 2:22 PM
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

That would sort of look like this I think

if (m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos).isA<ErrorSuccess>())
  return false;
else true;
zrthxn updated this revision to Diff 418327.Mar 25 2022, 2:28 PM

Updated tests

jj10306 added inline comments.Mar 25 2022, 3:28 PM
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

What about
return (bool)m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos);
Another idea is to just remove the IsError() function entirely since calling GetError() tells you if it's an error. iirc all error checks actually use GetError except for the checks inside of TraceHtr which is soon going to be deleted by @wallace in new patches, so you could just change those couple instances of IsError and remove it all together. Definitely not necessary, just spitballing ideas (:
@wallace what do you think?

wallace added inline comments.Mar 25 2022, 3:56 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
26
118–120

let's improve this method a bit

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
88–92

good

156

just call it args or instruction_args instead of __args

157

the goal is to avoid creating explicitly a IntelPTInstruction, so you should be able to achieve something like

m_instructions.emplace_back(args...);
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
205

delete this

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
122–123 ↗(On Diff #418309)

print it in bytes instead

wallace added inline comments.Mar 25 2022, 3:59 PM
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

I don't think that's a good idea. The problem is calling GetErrorByInstructionIndex is that you then have an Error object that you need to consume. There's also the cost of creating this object even if you just want to know if there's an error or not and you don't want to do anything with the actual error message. It's better then to create the Error object only when needed.

zrthxn marked 13 inline comments as done.Mar 25 2022, 10:12 PM
zrthxn added inline comments.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
26

This is needed at DecodedThread.h:66, not here

zrthxn marked an inline comment as done.Mar 25 2022, 10:15 PM

Test program info, (llvm-project/lldb/test/API/commands/trace/intelpt-trace/a.out)

(lldb) thread trace dump info
Trace technology: intel-pt

thread #1: tid = 3842849
  Raw trace size: 4 KiB
  Total number of instructions: 22
  Total approximate memory usage: 6.38 KiB
  Average memory usage per instruction: 296 bytes
zrthxn updated this revision to Diff 418376.Mar 25 2022, 10:17 PM

Clean up and finalize

wallace commandeered this revision.Mar 26 2022, 10:53 AM
wallace edited reviewers, added: zrthxn; removed: wallace.
wallace updated this revision to Diff 418404.Mar 26 2022, 10:55 AM
  • make tests pass
  • simplified the error handling. In fact, using Error objects might be too expensive and potentially provides little

value in the API, because the user needs to consume the Error forcefully. Besides that, once we expose this python,
the error will be a plain string, therefore, I'm now storing the error as a string. Error won't be that frequent,
so the cost of that is okay.

wallace accepted this revision.Mar 26 2022, 11:03 AM
zrthxn commandeered this revision.Mar 26 2022, 11:04 AM
zrthxn removed a reviewer: zrthxn.
jj10306 accepted this revision.Mar 26 2022, 11:14 AM
This revision is now accepted and ready to land.Mar 26 2022, 11:14 AM