Refactored (but not yet tested) version of the instruction decoding code
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
---|---|---|
181 | you need to have something like std::unordered_map<uint64_t, llvm::Error> m_errors; that way, you'll be able to quickly look for the error associated with an instruction index. The IntelPTInstruction int his case, instead of storing the Error, can just store one bit of information has_error = true/false; |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
---|---|---|
181 | nit: from https://llvm.org/docs/CodingStandards.html#c-standard-library Don't forget to update CalculateApproximateMemoryUsage() as well! Also, besides for being inline with the coding standards I linked above, using llvm::DenseMap here has the actual advantage that it exposes its approximate size via getMemorySize(), whereas there is no easy way to get the size of std::map. | |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
114 |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
---|---|---|
181 | DenseMap looks interesting we should try that. Yes i will update the mem calculation and make it more accurate so some refactor will be needed. That'll be the next small patch once this works. |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
---|---|---|
147 | Return a reference here to avoid potential expensive copy when returning. Something else to consider is if we need/want an API exposing all the entire error map or if something like: |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
---|---|---|
147 | Yea I did think about that. In my opinion returning a reference would be good since with that you can get the size and error at each index and we don't need to have functions for each operation which would have long clunky names making the code less readable... |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp | ||
---|---|---|
102–104 | remove this | |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
147 | ahh!! DenseMap is for sure the way to go. The getMemorySize() method will help us a good deal. Thanks @jj10306 Besides that, we shouldn't expose publicly the internal representation of an object whose data structure might change. For example, sooner than later we might prefer to store the errors in a different way. So, to prevent breaking callers if we change this data structure, let's go for what Jakob proposes llvm::Error GetErrorByInstructionIndex(uint64_t insn_index) const which returns Error::success() in case the instruction is not an actual error | |
149 | Omit the thread, because you are appending to a DecodedThread. In any case, it's redundant to mention where you are appending to | |
152 | you should be able to omit the insn_index by using m_instructions.size() + m_errors.size() as the index | |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
103–105 | update the documentation | |
107 | don't call it thread, call it decoded_thread. The thread is the one passed as parameter | |
107 | use make_shared here and return a DecodedThreadSP (alias for shared_ptr<DecodedThread>). That way, you'll avoid having to invoke shared_from_this() below | |
119 | remove this | |
145 | This will create a copy of the IntelPTInstruction before storing it in the vector. Instead, you should use the same semantics as vector::emplace_back(), which uses paratemer packs/variadic templates. You can even rename Append to Emplace in this case | |
175–207 | see below. We don't need to pass the raw_buffer size because it can be gotten from the buffer itself. Also, return the DecodedThreadSP directly | |
211–224 | Now we don't need to return the raw_trace_size as an out parameter because we are creating the decoded thread deeper in the stack. | |
214–215 | we don't need raw_trace_size to be an out parameter anymore. We can directly use it deep inside our decoding logic | |
249 | remove raw_trace_size as an out parameter | |
255–256 | the trace_size doesn't need to be passed |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
---|---|---|
145 | Yea I was doing that before, the idea was to send those variadic args to emplace_back but that wasnt working. I introduced this to avoid having 2 Appends since we already have 2 constructors which fulfill that requirement, and I can change this to std::move to avoid copies if thats a concern |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
---|---|---|
145 | Yes, use the same pattern than emolace_back uses with a parameter pack |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp | ||
---|---|---|
102–104 | Errors can only be copied, that's why we need to create a new instance of the error that is a copy of the original one. We can draw inspiration from IntelPTInstruction::ToError(), which can now be deleted | |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
143–146 | we can make the documentation clearer | |
150 | this has to use the new parameter pack semantics, so that you can pass either {pt_insn} or {pt_insn, timestamp} without having to create copies of the IntelPTInstruction class | |
152 | ||
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
107 | make this a shared pointer since the beginning. Use make_shared here |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
---|---|---|
145 | ideally you should be able to do decoded_thread.AppendInstruction({insn}) here |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
---|---|---|
211 | don't use expected. The DecodedThread object already can store errors. Just return a DecodedThreadSP and assign it a single error and return it. If you need to return the failed DecodedThread before you know the size of the buffer, just pass 0 as size | |
214 | remove the *, because we will make DecodeLiveThread not return an Expected |
there are many comments from the previous versions of this diff that you didn't apply. Go through all of them first :)
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
---|---|---|
85–87 | delete it. We don't want to leave old code as comments | |
117 | now that you changed this, could show share in the description of this diff the difference in byte size between the old and new code when tracing the same number of instructions? | |
143–146 | you didn't apply these changes | |
147 | or GetErrorByInstructionIndex | |
152 | same here |
Resolved many runtime errors.
One small thing still remains, with post mortem decoder
much closer! I'm glad you are starting to understand the patterns we use for this kind of code
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp | ||
---|---|---|
13 | delete this | |
14 | you don't need to import it. Maybe your c++ vscode extension has been autoimporting this one, in which case it's better to disable that feature. | |
128–131 | in order to have correct formatting all along, you need to use git clang-format: https://llvm.org/docs/Contributing.html#format-patches follow that guide. Whenever you are going to submit a patch, first run git clang-format and it will format your code correctly, so that you never again have to lose time doing that. It can even format comments | |
129–130 | here you also need to ask for the size of the DenseMap | |
136–139 | emplace will prevent unnecessary copies and also doesn't need you to pass a pair | |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
43–46 | delete commented code | |
72 | Pass the error by const reference, because we don't modify it | |
135–137 | ||
146 | ||
151 | remove this one. We need the version that accepts a parameter pack, as we discussed offline | |
153–154 | same here | |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
108–109 | the magic of make_shared is that it uses parameter packs, so that it only constructs the object right where it'll store it, thus preventing unnecessary copies | |
175 | don't pass the process, because we can get it from the thread_sp object | |
199–201 | if the compiler complains mentioning that the Process pointer you are passing is const, you can do an unsafe cast that removes the const qualifier and write a comment here. I hope you don't need it anyway | |
206–207 | use make_shared instead of shared_from_this. That will be much more performant | |
210 | don't pass the process, it's not necessary anymore | |
214–218 | same here | |
245–246 | just return a direct DecodedThreadSP that holds any errors you might find here, same like LiveThreadDecoder::DoDecode() | |
256–261 | this will become simpler once DecodeTraceFile returns directly a DecodedThreadSP |
Before refactor
thread #1: tid = 37275 Raw trace size: 4 KiB Total number of instructions: 21 Total approximate memory usage: 5.38 KiB
After refactor
(lldb) thread trace dump info Trace technology: intel-pt thread #1: tid = 13690 Raw trace size: 4 KiB Total number of instructions: 20 Total approximate memory usage: 5.34 KiB
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp | ||
---|---|---|
38 | Feels kinda weird that the err param isn't being used? Not sure if it would be preferred but another option would be to make the default constructor construct an error instruction. | |
41 | Is this boolean necessary? In the case of an error, the other two fields also indicate an error so this doesn't seem to add much information. | |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
152 | Should we be using std::forward() here? Same question for the AppendError function | |
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp | ||
77 | nit: should we update this to use the error map? I don't think there's a significant difference performance wise, but the code would be a little cleaner imo and consistent with how GetError() works. |
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp | ||
---|---|---|
77 | That would sort of look like this I think if (m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos).isA<ErrorSuccess>()) return false; else true; |
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp | ||
---|---|---|
77 | What about |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp | ||
---|---|---|
25 | ||
101–107 | let's improve this method a bit | |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h | ||
85–87 | good | |
151 | just call it args or instruction_args instead of __args | |
152 | the goal is to avoid creating explicitly a IntelPTInstruction, so you should be able to achieve something like m_instructions.emplace_back(args...); | |
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp | ||
199–201 | delete this | |
lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp | ||
122–123 | print it in bytes instead |
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp | ||
---|---|---|
77 | I don't think that's a good idea. The problem is calling GetErrorByInstructionIndex is that you then have an Error object that you need to consume. There's also the cost of creating this object even if you just want to know if there's an error or not and you don't want to do anything with the actual error message. It's better then to create the Error object only when needed. |
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp | ||
---|---|---|
25 | This is needed at DecodedThread.h:66, not here |
Test program info, (llvm-project/lldb/test/API/commands/trace/intelpt-trace/a.out)
(lldb) thread trace dump info Trace technology: intel-pt thread #1: tid = 3842849 Raw trace size: 4 KiB Total number of instructions: 22 Total approximate memory usage: 6.38 KiB Average memory usage per instruction: 296 bytes
- make tests pass
- simplified the error handling. In fact, using Error objects might be too expensive and potentially provides little
value in the API, because the user needs to consume the Error forcefully. Besides that, once we expose this python,
the error will be a plain string, therefore, I'm now storing the error as a string. Error won't be that frequent,
so the cost of that is okay.
delete commented code