This is an archive of the discontinued LLVM Phabricator instance.

[intel-pt][trace] Implement a "get supported trace type" packet
ClosedPublic

Authored by wallace on Oct 30 2020, 11:47 AM.

Details

Summary

Depends on D89283.

The goal of this packet (jTraceGetSupportedType) is to be able to query the gdb-server for the tracing technology that can work for the current debuggeer, which can make the user experience simpler but allowing the user to simply type

thread trace start

to start tracing the current thread without even telling the debugger to use "intel-pt", for example. Similarly, thread trace start [args...] would accept args beloging to the working trace type.

Also, if the user typed

help thread trace start

We could directly show the help information of the trace type that is supported for the target, or mention instead that no tracing is supported, if that's the case.

I added some simple tests, besides, when I ran this on my machine with intel-pt support, I got

$ process plugin packet send "jTraceSupportedType"                                                                                        
  packet: jTraceSupportedType
response: {"description":"Intel Processor Trace","pluginName":"intel-pt"}

On a machine without intel-pt support, I got

$ process plugin packet send "jTraceSupportedType"                                                                                        
  packet: jTraceSupportedType
response: E00;

Diff Detail

Unit TestsFailed

Event Timeline

wallace created this revision.Oct 30 2020, 11:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 30 2020, 11:47 AM
wallace requested review of this revision.Oct 30 2020, 11:47 AM
wallace edited the summary of this revision. (Show Details)Oct 30 2020, 11:55 AM
DavidSpickett added inline comments.
lldb/docs/lldb-gdb-remote.txt
245

Saying lldb-enumerations.h might save someone some grepping time, also saying that it should be in decimal.

Can you clarify what TraceType means? Is it:
The way the trace is generated/stored, like on chip vs off chip
The thing you're tracging, e.g. instructions or power usage
Specific trace versions e.g intel pt, arm coresight (pretty sure it's not this one)

If there's the possibility of one system having multiple types then the packet should at least have the format to hold them, even though we only read the first one.

Which your code already does since:
1,2;
Would stop parsing at the ',' anyway, but worth noting if it's relevant.

254

What is the AAAAA for?

labath added a subscriber: labath.Nov 2 2020, 2:10 AM
labath added inline comments.
lldb/docs/lldb-gdb-remote.txt
245

The j in these trace packets was supposed to refer to the fact they are json-formatted. Using json for this is overkill, but that means I'm now torn between wanting to preserve the j consistency and the jTrace consistency. If this were returning a list of types, using json for that might not be totally unreasonable, and would let us avoid this dilemma :P.

254

That's the extended (textual) error message -- lldb and gdb have independently invented incompatible formats for that (I share a blame it that :/).

This style of presentation is consistent with the other jTrace packets, though I am not sure it's that helpful really -- we support the extended error on any response packet and we don't document that always. (Also the error message will only be sent if the client actually signals support for it.)

(In fact, when I look at the implementation, I see that it will _never_ send the textual error message.)

wallace updated this revision to Diff 302402.Nov 2 2020, 2:01 PM
wallace edited the summary of this revision. (Show Details)

Thanks for the feedback! Some changes:

  • I'm no longer showing the AAAAA in the documentation, as @labath pointed out that I was not using it, and, in fact, there's no need to use it.
  • I'm now using a json array to represent the return value of the packet. It's going to be a 0 or 1 element array for, probably, many years, but it makes the packet more robust and can use the j prefix sensically.
  • I improved the documentation, addressing @DavidSpickett questions. In short, a TraceType corresponds to a high-level Trace plugin that can provide LLDB with a list of instructions executed by a given program.
  • I renamed eTraceTypeProcessorTrace to eTraceTypeIntelProcessorTrace (notice the Intel word) to disambiguate.
clayborg requested changes to this revision.Nov 2 2020, 2:21 PM

So I know Intel had gone and done some tracing support in an external kind of way before. This patch is preparing to use this functionality by asking for the supported trace types. I am not a fan of a global enumeration in lldb-enumerations.h controlling the tracing type and having to have GDB servers have to respond using these LLDB specific enumeration values. It would be nice if we can query the GDB server for the supported trace types and have them respond with a string for the tracing type. See my inlined comments for what was thinking. I also think that a GDB server should respond with a single kind of supported tracing, see inlined comments for more details.

lldb/docs/lldb-gdb-remote.txt
245

I'm not a fan of using the enumeration in lldb-enumerations.h here if we can avoid it. See my comment below about giving a name to the tracing types.

247–248

So my main question is: should a lldb-server be able to return multiple tracing types? I would vote to always respond with the best tracing that is available and ensure that lldb-server can only enable on kind of trace. I know IntelPT can be supported along with some other format that only saves that few branches (ABL?), but if IntelPT is available, should we respond with only IntelPT being available? It would be also nice to have a name for each returned tracing type instead of just a number? Maybe making the response a dictionary of tracing integer type to name would be nice?:

{ 1: "intel-pt", 2: "abl" }

I would rather we return just a single entry here if possible. If we can, this might be better as a dictionary response:

{ "name": "intel-pt", "trace-type": 1 }
lldb/include/lldb/Host/common/NativeProcessProtocol.h
396–400

So here we are returning only 1 tracing type? That is fine if we switch the reply to the jTraceSupportedTypes to be a single value, otherwise this would need to be an array. Not a fan of having to use enumerations from lldb-enumerations.h, so I would rather return a ConstString with the plug-in name if possible.

lldb/include/lldb/Target/Process.h
2554

Again only one is fine if we switch the jTraceSupportedTypes to return one value.

lldb/include/lldb/lldb-enumerations.h
776

enumerations in this header file cannot change as they are part of the API for LLDB, so we can't rename this. You can add a comment letting people know about this, but you can't rename this.

This revision now requires changes to proceed.Nov 2 2020, 2:21 PM
wallace updated this revision to Diff 302439.Nov 2 2020, 5:38 PM

After a sync up with Greg, I've followed his recommedations and did the following changes:

  • Kept the packet as a simple packet returning one single trace type. However, now the response is a json object {"pluginName": <name>, "description": <text>}. This makes it very explicit that the name returned should match a Trace plug-in name, making future implementations simpler.

We analyzed the existing jTrace packets, which were done several years ago by Intel and we realized that they are not generic enough for the way we are implementing Trace plugin. Those packets are too intel-pt specific and we'll be patching them to make them more generic. For example, the "metabuffersize" is something specific to intel pt.

We also don't like the fact that it's using a public enum TraceType. We are now trying to use instead a string name, which will be more user-friendly and will help connect different gdb-servers and Trace plug-ins.

  • Removed the changes to the TraceType enum and the old intel pt plugin.
wallace edited the summary of this revision. (Show Details)Nov 2 2020, 5:40 PM
wallace edited the summary of this revision. (Show Details)
wallace edited the summary of this revision. (Show Details)Nov 2 2020, 5:44 PM
wallace marked 8 inline comments as done.Nov 2 2020, 5:51 PM

See inlined comments. Just some questions on wether we are going to reuse the other existing "tTrace*" packets, or if we are going to make new ones.

lldb/docs/lldb-gdb-remote.txt
249

Can't IntelPT exist on a machine but not be enabled? If so I would suggest adding a few more key values:

"enabled": <boolean>,
"enableInstructions": <string>

"enabled" would say if this tracing mechanism is currently installed and if it is enabled or not.
"enableInstructions" could clarify what you would have to do to enable this tracing feature, like run a "sudo" command, or enable a kernel module and reboot, etc.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
3462

So are we going to reuse all of the other "jTrace*" packets and possibly expand their usage? If so, then this name is good. If we are going to make new packets for tracing then "jLLDBTraceSupportedType" might make more sense and all commands we would add would start with "jLLDBTrace".

wallace marked an inline comment as done.Nov 2 2020, 10:35 PM
wallace added inline comments.
lldb/docs/lldb-gdb-remote.txt
249

I was running some tests and checking the documentation on the kernel, and Intel PT is either available or not (from the OS perspective). There's no configuration to do, which leaves this case simple.

However, there are other situations like what you describe. If you want to use Intel PT on a VM, then you need to enable the corresponding capabilities on your VM software. This case would benefit from the fields you propose. Right now I wouldn't attempt to solve that, so I prefer not to add new fields. However, in the future, when that's needed, we can add these fields to the packet as it's a flexible json object.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
3462

I'm glad you mention this. I'm thinking about using a completely new set of packets and document the existing ones as deprecated.

The old packets include:

  • jTraceMetaRead -> it's confusing and not really useful for the current implementation. Even the external intel-pt plugin doesn't make use of it. It exposes the data gotten when invoking the SYS_perf_event_open syscall, which is too perf+linux only. It can be useful for getting more information out of the kernel, like some clock tracing, but nothing useful as of now.
  • jTraceConfigRead -> instead of having it, jTraceStart could return that config once tracing has been enabled. That would avoid one round trip, as decoding can't happen anyway without this information.
  • jTraceStop -> it expects a trace ID (which is intel-pt only in the current implementation), and an optional thread id. It could be simpler by just asking a list of thread ids, or none and then stop the entire process.
  • jTraceStart -> it's expecting fields like metabuffersize, which don't make sense if we want to generalize this. It also returns a trace ID (which is intel-pt only). It could instead receive a trace-plugin name, a list of thread ids, a process id, and then a set of json params. Also, as mentioned above, it could return the jTraceConfig, making the api simpler.

Overall, I'd prefer to implement new packets in a very generic way. So I'll take your jLLDB prefix.

Btw, We could implement a generic data query packet

  • jLLDBTraceQueryData -> which would receive a generic data query packet that would be handled by each trace type. It could return trace buffers, or the metadata from the jTraceMetaRead packet, or any plugin-specific data.

That would leave us with jLLDBTraceStart, jLLDBTraceStop, jLLDBTraceSupportedType and jLLDBTraceQueryData

wallace updated this revision to Diff 302468.Nov 2 2020, 10:55 PM
  • Renamed the packet to jLLDBTraceSupportedType following Greg's suggestion
  • Also, marked the old trace packets as deprecated, to discourage their usage. I doubt anyone is using them, as in March I discovered that the external intel-pt plugin was broken for a long time, due to a wrong python setup and a broken decoder. I fixed it in https://reviews.llvm.org/D76872, and by doing that I noticed that no one was using it. I tried to contact the Intel folks that worked on that feature in 2017, but it seems they are not working on that any more. Interestingly, the gdb intel-pt plugin has been a little bit active these years.
labath added inline comments.Nov 3 2020, 2:03 AM
lldb/docs/lldb-gdb-remote.txt
247

Having lldb-server match the lldb trace plugin name seems a bit backward to me. I think it'd be more natural if the server reports a name for the technology it uses and then the client picks a plugin to go with that technology.

I think it's fine if the process of "picking a plugin" is implemented by matching the GetPluginName() string to the value returned here, so I don't think that the code needs to change in anyway (except maybe droping the word "plugin" from the field name?). My quibble is about how this is framed in the documentation -- I'd say this should be the source of truth and trace plugin should reference it instead of the other way around. (One reason for that could be that we can change the plugin matching logic in the client any time we want, but changing the packet format brings up compatibility concerns.)

247–248

If we're expecting that lldb-server will be able to capture multiple kinds of traces (on the same processor architecture), then (strongly) believe lldb-server should be able to return/enumerate all of them. I don't think lldb-server should be in the business of deciding what the user wants to do (like, he may want deliberately use the "inferior" technology so he can compare the performance/features of the two trace types, or whatever). Defaulting to the most advanced format is of course fine, but that could be done by listing that format first (or giving them an explicit priority, etc.).

That said, there is still a big if on that statement -- tracing technologies are complex beasts, so I'd be surprised if a single processor supports two technologies with feature sets that overlap well enough that they may be thrown into the same bag. However, I don't know much about processor traces and Greg's comment seems to imply that they do exist.

(It's also worth noting that it'd be fairly easy to extend this packed to (optionally) return a list of these dictionaries if that becomes needed.)

lldb/include/lldb/Host/common/NativeProcessProtocol.h
398

We now have an UnimplementedError class you can return here.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
3462

Given the state the old intel plugin, I wouldn't be surprised if it was completely unused. As such I don't have a problem with new packets (if the old ones don't fit). However, I also don't want to carry the excess compatibility baggage for too long either. I'd say we should just delete all the old code as soon your implementation reaches rough feature parity (which shouldn't take long, as the old code didn't do much iirc).

wallace updated this revision to Diff 302630.Nov 3 2020, 11:09 AM
  • Use name instead of pluginName and UnimplementedError, as @labath suggested
  • Still keep the packet returning one single trace technology. It'll be a long time until two technologies collide, but if that ever happens, we can extend the packet to return a list or a dictionary. As of now, the only technology that could potentially collide with intel-pt is LBR, which traces only the last 32 branches and is pretty much defunct in new applications, as Intel PT is actually the modern version of LBR. Arm has Coresight, Windows also has intel-pt. AMD has IBS. For now, I'm in favor of keeping it simple.
labath added a comment.Nov 4 2020, 2:21 AM

Just some layering remarks. Other than that, I think this is fine.

lldb/include/lldb/Host/common/NativeProcessProtocol.h
17

You shouldn't include lldb/Target from here. We already have Utility/TraceOptions.h. Maybe TraceTypeInfo could go there instead?

lldb/include/lldb/Target/Process.h
2546–2547

This code shouldn't refer the the gdb packets. Theoretically we could have other process plugins supporting this.

lldb/include/lldb/Target/Trace.h
184–185

Neither should this.

lldb/source/Plugins/Process/Linux/ProcessorTrace.cpp
60

the empty check is superfluous.

wallace updated this revision to Diff 303031.Nov 4 2020, 9:38 PM
wallace marked 10 inline comments as done.

address issues

clayborg added inline comments.Nov 5 2020, 4:40 PM
lldb/docs/lldb-gdb-remote.txt
248

Should this specify that this name needs to match the name of the LLDB plug-in?

lldb/include/lldb/Utility/StringExtractorGDBRemote.h
165–170

Mark as deprecated?

170

Maybe add a space to separate the above deprecated stuff from this new stuff

lldb/include/lldb/Utility/TraceOptions.h
21 ↗(On Diff #303031)

We should specify that this needs to match the LLDB trace plug-in name on the header doc.

lldb/include/lldb/lldb-enumerations.h
772

Mark as deprecated?

wallace updated this revision to Diff 303313.Nov 5 2020, 7:16 PM
wallace marked 5 inline comments as done.

Address all comments, specially improved the documentation requested by @clayborg.

So down to just one question about the packet definition about if we should have an error code in JSON or if an error code even makes sense. Other than that this LGTM

lldb/docs/lldb-gdb-remote.txt
262

I know the deprecated trace packets allows an error code to be returned, but since we already have JSON being used here, I would be a shame to not return an error in the JSON with a string that is human readable instead of a EXX where XX are two hex digits. can we say the response is either:

{"name": <name>, "description", <description>}

or

{"error": <error-string>}

Or just have no error code? What would the error be able to tell us?

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
3462

I agree Pavel, lets remove the old code when we can.

labath added inline comments.Nov 10 2020, 5:23 AM
lldb/docs/lldb-gdb-remote.txt
262

An error response might be suitable in case the client sends this packet without a running process.

I think we just stick to the established error code format, which already contains provisions for returning textual errors. I don't believe any of our existing json packets use json-encoded errors...

wallace updated this revision to Diff 304232.Nov 10 2020, 9:23 AM

Added a test that accounts for a custom error message from the gdb-server.
I decided to follow @labath's approach of using the existing error code format (;AAAAAAA). It works well and the GDBClient classes already have good support for that, specifically StringExtractorGDBRemote.h

clayborg accepted this revision.Nov 10 2020, 4:35 PM

LGTM with the AAAAAAAA error string stuff included so we can get a nice string error message. Pavel, do you have anything?

This revision is now accepted and ready to land.Nov 10 2020, 4:35 PM
labath accepted this revision.Nov 11 2020, 1:17 AM

I didn't check all the details, but seems ok to me.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
3465–3468

Use early exits to avoid nesting.

This revision was landed with ongoing or failed builds.Nov 11 2020, 10:36 AM
This revision was automatically updated to reflect the committed changes.

@stella.stamenova I'm working on a fix right now.