This is an archive of the discontinued LLVM Phabricator instance.

[intelpt] Refactoring instruction decoding for flexibility
ClosedPublic

Authored by zrthxn on Mar 23 2022, 12:18 AM.

Details

Summary

Refactored (but not yet tested) version of the instruction decoding code

Diff Detail

Event Timeline

zrthxn created this revision.Mar 23 2022, 12:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 12:19 AM
zrthxn requested review of this revision.Mar 23 2022, 12:19 AM
wallace added inline comments.Mar 23 2022, 11:16 AM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
181

you need to have something like

std::unordered_map<uint64_t, llvm::Error> m_errors;

that way, you'll be able to quickly look up the error associated with an instruction index. The IntelPTInstruction in this case, instead of storing the Error, can just store one bit of information: has_error = true/false.
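
For reference, a minimal sketch of that layout (member names are hypothetical, and the map type switches to llvm::DenseMap later in this review):

#include <cstdint>
#include <unordered_map>
#include <vector>
#include "llvm/Support/Error.h"

class IntelPTInstruction {
  // ...
  bool m_has_error = false; // full error details live in DecodedThread::m_errors
};

class DecodedThread {
  // ...
  std::vector<IntelPTInstruction> m_instructions;
  // Error keyed by the index of the corresponding entry in m_instructions.
  std::unordered_map<uint64_t, llvm::Error> m_errors;
};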

zrthxn updated this revision to Diff 417709.Mar 23 2022, 12:19 PM

Introduced unordered map for errors in DecodedThread

jj10306 requested changes to this revision.Mar 23 2022, 12:35 PM
jj10306 added inline comments.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
181

nit: from https://llvm.org/docs/CodingStandards.html#c-standard-library
prefer llvm::DenseMap over std::map unless there's a specific reason not to.

Don't forget to update CalculateApproximateMemoryUsage() as well! Also, besides being in line with the coding standards I linked above, using llvm::DenseMap here has the practical advantage that it exposes its approximate size via getMemorySize(), whereas there is no easy way to get the size of std::map.
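
A sketch of how the size accounting could use that (member names are hypothetical; a sketch only, not the actual patch):

size_t DecodedThread::CalculateApproximateMemoryUsage() const {
  // m_errors is an llvm::DenseMap keyed by instruction index.
  return sizeof(IntelPTInstruction) * m_instructions.size() +
         m_errors.getMemorySize();
}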

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
114
This revision now requires changes to proceed.Mar 23 2022, 12:35 PM
zrthxn added inline comments.Mar 23 2022, 12:50 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
181

DenseMap looks interesting; we should try that.

Yes, I will update the memory calculation and make it more accurate, so some refactoring will be needed. That'll be the next small patch once this works.

jj10306 added inline comments.Mar 23 2022, 12:56 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
147

Return a reference here to avoid a potentially expensive copy when returning.

Something else to consider is whether we need/want an API exposing the entire error map, or whether something like:
llvm::Error GetErrorByInstructionIndex(uint64_t insn_index) const
that allows the caller to specify the key into the map would make more sense?
This also has the advantage that it hides the implementation detail of what data type is being used under the hood to represent the error map!
@wallace @zrthxn wdyt?

zrthxn added inline comments.Mar 23 2022, 1:00 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
147

Yeah, I did think about that. In my opinion, returning a reference would be good, since with that you can get the size and the error at each index, and we don't need a function for each operation, which would have long, clunky names that make the code less readable...

wallace requested changes to this revision.Mar 23 2022, 5:15 PM
wallace added inline comments.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
102–104

remove this

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
147

ahh!! DenseMap is for sure the way to go. The getMemorySize() method will help us a good deal. Thanks @jj10306

Besides that, we shouldn't publicly expose the internal representation of an object whose data structure might change. For example, sooner rather than later we might prefer to store the errors in a different way. So, to avoid breaking callers if we change this data structure, let's go with what Jakob proposes:

llvm::Error GetErrorByInstructionIndex(uint64_t insn_index) const

which returns Error::success() in case the instruction is not an actual error
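
On the caller side, that contract would look roughly like this (a sketch only; variable names are made up, and note that the returned llvm::Error must still be checked or consumed):

llvm::Error err = decoded_thread.GetErrorByInstructionIndex(insn_index);
if (err) {
  // The entry at insn_index is an error; handling or consuming it is mandatory.
  llvm::consumeError(std::move(err));
} else {
  // Error::success() was returned; the bool check above marks it as handled.
}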

149

Omit the thread, because you are appending to a DecodedThread. In any case, it's redundant to mention where you are appending.

152

you should be able to omit the insn_index by using m_instructions.size() + m_errors.size() as the index

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
103–105

update the documentation

107

don't call it thread, call it decoded_thread. The thread is the one passed as a parameter

107

use make_shared here and return a DecodedThreadSP (alias for shared_ptr<DecodedThread>). That way, you'll avoid having to invoke shared_from_this() below
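
A minimal sketch of that shape (the function name and signature here are hypothetical):

DecodedThreadSP DecodeToDecodedThread(Thread &thread) {
  DecodedThreadSP decoded_thread_sp = std::make_shared<DecodedThread>();
  // ... decode the trace, appending instructions and errors to *decoded_thread_sp ...
  return decoded_thread_sp; // callers share ownership; no shared_from_this() needed
}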

119

remove this

145

This will create a copy of the IntelPTInstruction before storing it in the vector. Instead, you should use the same semantics as vector::emplace_back(), which uses parameter packs/variadic templates. You can even rename Append to Emplace in this case

175–207

see below. We don't need to pass the raw_buffer size because it can be obtained from the buffer itself. Also, return the DecodedThreadSP directly

211–224

Now we don't need to return the raw_trace_size as an out parameter because we are creating the decoded thread deeper in the stack.

214–215

we don't need raw_trace_size to be an out parameter anymore. We can directly use it deep inside our decoding logic

249

remove raw_trace_size as an out parameter

255–256

the trace_size doesn't need to be passed

zrthxn marked 8 inline comments as done.Mar 23 2022, 11:07 PM
zrthxn added inline comments.
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
145

Yeah, I was doing that before; the idea was to send those variadic args to emplace_back, but that wasn't working. I introduced this to avoid having two Appends, since we already have two constructors which fulfill that requirement, and I can change this to std::move to avoid copies if that's a concern

wallace added inline comments.Mar 23 2022, 11:08 PM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
145

Yes, use the same pattern that emplace_back uses with a parameter pack
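
For reference, the emplace_back-style pattern being suggested looks roughly like this (names are illustrative, not the actual patch):

// In DecodedThread.h: construct the IntelPTInstruction in place,
// dispatching to whichever constructor matches the arguments.
template <typename... Ts> void AppendInstruction(Ts &&... args) {
  m_instructions.emplace_back(args...);
}

// Call sites:
decoded_thread.AppendInstruction(insn);            // uses IntelPTInstruction(insn)
decoded_thread.AppendInstruction(insn, timestamp); // uses IntelPTInstruction(insn, timestamp)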

zrthxn updated this revision to Diff 417832.Mar 23 2022, 11:11 PM

A few changes to remove redundant things

wallace added inline comments.Mar 24 2022, 8:30 AM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
102–104

Errors can only be moved, not copied; that's why we need to create a new instance of the error that carries the same information as the original one. We can draw inspiration from IntelPTInstruction::ToError(), which can now be deleted

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
143–146

we can make the documentation clearer

150

this has to use the new parameter pack semantics, so that you can pass either {pt_insn} or {pt_insn, timestamp} without having to create copies of IntelPTInstruction objects

152
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
107

make this a shared pointer from the beginning. Use make_shared here

wallace added inline comments.Mar 24 2022, 8:30 AM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
145

ideally you should be able to do decoded_thread.AppendInstruction({insn}) here

wallace added inline comments.Mar 24 2022, 8:33 AM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
211

don't use Expected. The DecodedThread object can already store errors. Just create a DecodedThreadSP, assign it a single error, and return it. If you need to return the failed DecodedThread before you know the size of the buffer, just pass 0 as the size

214

remove the *, because we will make DecodeLiveThread not return an Expected

zrthxn updated this revision to Diff 417969.Mar 24 2022, 10:10 AM

Error getter method

wallace requested changes to this revision.Mar 24 2022, 10:15 AM

there are many comments from the previous versions of this diff that you didn't apply. Go through all of them first :)

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
85–87

delete it. We don't want to leave old code as comments

117

now that you changed this, could you share in the description of this diff the difference in byte size between the old and new code when tracing the same number of instructions?

143–146

you didn't apply these changes

147

or GetErrorByInstructionIndex

152

same here

This revision now requires changes to proceed.Mar 24 2022, 10:15 AM
zrthxn marked 17 inline comments as done.Mar 24 2022, 11:05 AM
zrthxn updated this revision to Diff 418000.Mar 24 2022, 11:47 AM

Incorporate other comments

zrthxn retitled this revision from [wip][intelpt] Refactoring instruction decoding for flexibility to [intelpt] Refactoring instruction decoding for flexibility.Mar 24 2022, 11:48 AM
zrthxn updated this revision to Diff 418154.EditedMar 25 2022, 1:23 AM

Resolved many runtime errors.
One small thing still remains with the post-mortem decoder

zrthxn updated this revision to Diff 418249.Mar 25 2022, 9:10 AM

Refactor to use more templates and param packs

wallace requested changes to this revision.Mar 25 2022, 9:40 AM

much closer! I'm glad you are starting to understand the patterns we use for this kind of code

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
13

delete this

14

you don't need to include it. Maybe your C++ VSCode extension has been auto-including this one, in which case it's better to disable that feature.

128–131

in order to have correct formatting throughout, you need to use git clang-format: https://llvm.org/docs/Contributing.html#format-patches

Follow that guide. Whenever you are going to submit a patch, first run git clang-format and it will format your code correctly, so that you never have to lose time doing that again. It can even format comments

129–130

here you also need to ask for the size of the DenseMap

136–139

emplace will prevent unnecessary copies and also doesn't require you to pass a pair
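
For example, with llvm::DenseMap the value can be constructed in place instead of building a std::pair first (variable names here are made up):

// Instead of: m_errors.insert({insn_index, std::move(error_message)});
m_errors.try_emplace(insn_index, std::move(error_message));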

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
43–46

delete commented code

72

Pass the error by const reference, because we don't modify it

135–137
146
151

remove this one. We need the version that accepts a parameter pack, as we discussed offline

153–154

same here

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
108–109

the magic of make_shared is that it uses parameter packs, so that it only constructs the object right where it'll store it, thus preventing unnecessary copies

175

don't pass the process, because we can get it from the thread_sp object

199–201

if the compiler complains that the Process pointer you are passing is const, you can do an unsafe cast that removes the const qualifier and write a comment here. I hope you don't need it anyway

206–207

use make_shared instead of shared_from_this. That will be much more performant

210

don't pass the process, it's not necessary anymore

214–218

same here

245–246

just return a DecodedThreadSP directly, holding any errors you might find here, just like LiveThreadDecoder::DoDecode()

256–261

this will become simpler once DecodeTraceFile returns a DecodedThreadSP directly

This revision now requires changes to proceed.Mar 25 2022, 9:40 AM
zrthxn marked 15 inline comments as done.Mar 25 2022, 10:17 AM

Before refactor

thread #1: tid = 37275
  Raw trace size: 4 KiB
  Total number of instructions: 21
  Total approximate memory usage: 5.38 KiB

After refactor

(lldb) thread trace dump info
Trace technology: intel-pt

thread #1: tid = 13690
  Raw trace size: 4 KiB
  Total number of instructions: 20
  Total approximate memory usage: 5.34 KiB
zrthxn marked 2 inline comments as done.Mar 25 2022, 12:37 PM
zrthxn updated this revision to Diff 418299.Mar 25 2022, 12:38 PM

Incorporate more feedback.
Only the parameter pack isn't done yet.

zrthxn marked 4 inline comments as done.Mar 25 2022, 1:03 PM
zrthxn updated this revision to Diff 418306.Mar 25 2022, 1:03 PM

Finalize diff

zrthxn updated this revision to Diff 418309.Mar 25 2022, 1:14 PM

Added average memory per instruction

jj10306 added inline comments.Mar 25 2022, 2:02 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
38

Feels kinda weird that the err param isn't being used? Not sure if it would be preferred, but another option would be to make the default constructor construct an error instruction.
Passing the error makes it clearer that this constructor will create an error instruction, but then it feels a bit unnecessary to pass it since it's now unused. I'm a little torn, so I'd be curious to hear others' opinions (:

41

Is this boolean necessary? In the case of an error, the other two fields also indicate an error so this doesn't seem to add much information.
If we remove it, you can just update IsError to check the other two fields accordingly.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
152

Should we be using std::forward() here? Same question for the AppendError function
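
For context on the question: std::forward preserves the value category of each forwarded argument, so rvalues keep being moved instead of copied. A minimal sketch (names are illustrative):

template <typename... Ts> void AppendInstruction(Ts &&... args) {
  // Without std::forward, the named parameters are treated as lvalues and the
  // in-place constructor would copy them; with it, rvalue arguments are still moved.
  m_instructions.emplace_back(std::forward<Ts>(args)...);
}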

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

nit: should we update this to use the error map? I don't think there's a significant difference performance-wise, but the code would be a little cleaner imo and consistent with how GetError() works.

zrthxn added inline comments.Mar 25 2022, 2:22 PM
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

That would sort of look like this, I think:

if (m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos).isA<ErrorSuccess>())
  return false;
else
  return true;
zrthxn updated this revision to Diff 418327.Mar 25 2022, 2:28 PM

Updated tests

jj10306 added inline comments.Mar 25 2022, 3:28 PM
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

What about
return (bool)m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos);
Another idea is to just remove the IsError() function entirely, since calling GetError() tells you if it's an error. IIRC all error checks actually use GetError except for the checks inside of TraceHtr, which is soon going to be deleted by @wallace in new patches, so you could just change those couple of instances of IsError and remove it altogether. Definitely not necessary, just spitballing ideas (:
@wallace what do you think?

wallace added inline comments.Mar 25 2022, 3:56 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
25
101–107

let's improve this method a bit

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
85–87

good

151

just call it args or instruction_args instead of __args

152

the goal is to avoid explicitly creating an IntelPTInstruction, so you should be able to achieve something like

m_instructions.emplace_back(args...);
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
199–201

delete this

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
122–123

print it in bytes instead

wallace added inline comments.Mar 25 2022, 3:59 PM
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
77

I don't think that's a good idea. The problem with calling GetErrorByInstructionIndex is that you then have an Error object that you need to consume. There's also the cost of creating this object even if you just want to know if there's an error or not and you don't want to do anything with the actual error message. It's better, then, to create the Error object only when needed.
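
In other words, even a yes/no check would still have to consume the returned object, roughly like this (a sketch illustrating the point, not the actual patch; assertion-enabled builds abort if a failure Error is destroyed unchecked):

llvm::Error err = m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos);
bool is_error = static_cast<bool>(err); // true if it holds a failure
llvm::consumeError(std::move(err));     // still required before err goes out of scope
return is_error;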

zrthxn marked 13 inline comments as done.Mar 25 2022, 10:12 PM
zrthxn added inline comments.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
25

This is needed at DecodedThread.h:66, not here

zrthxn marked an inline comment as done.Mar 25 2022, 10:15 PM

Test program info (llvm-project/lldb/test/API/commands/trace/intelpt-trace/a.out)

(lldb) thread trace dump info
Trace technology: intel-pt

thread #1: tid = 3842849
  Raw trace size: 4 KiB
  Total number of instructions: 22
  Total approximate memory usage: 6.38 KiB
  Average memory usage per instruction: 296 bytes
zrthxn updated this revision to Diff 418376.Mar 25 2022, 10:17 PM

Clean up and finalize

wallace commandeered this revision.Mar 26 2022, 10:53 AM
wallace edited reviewers, added: zrthxn; removed: wallace.
wallace updated this revision to Diff 418404.Mar 26 2022, 10:55 AM
  • make tests pass
  • simplified the error handling. In fact, using Error objects might be too expensive and potentially provides little value in the API, because the user needs to consume the Error forcefully. Besides that, once we expose this to Python, the error will be a plain string; therefore, I'm now storing the error as a string. Errors won't be that frequent, so the cost of that is okay.
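
A minimal sketch of that final shape (member and function names as discussed above; the exact code in the landed patch may differ):

// Errors stored as plain strings, keyed by the index of the failed instruction.
llvm::DenseMap<uint64_t, std::string> m_errors;

llvm::Error DecodedThread::GetErrorByInstructionIndex(uint64_t insn_index) {
  auto it = m_errors.find(insn_index);
  if (it == m_errors.end())
    return llvm::Error::success();
  // Build a fresh llvm::Error from the stored message on each call.
  return llvm::createStringError(llvm::inconvertibleErrorCode(), "%s",
                                 it->second.c_str());
}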

wallace accepted this revision.Mar 26 2022, 11:03 AM
zrthxn commandeered this revision.Mar 26 2022, 11:04 AM
zrthxn removed a reviewer: zrthxn.
jj10306 accepted this revision.Mar 26 2022, 11:14 AM
This revision is now accepted and ready to land.Mar 26 2022, 11:14 AM