This is an archive of the discontinued LLVM Phabricator instance.

Implementation of Intel(R) Processor Trace support for Linux
ClosedPublic

Authored by ravitheja on May 30 2017, 5:33 AM.

Details

Summary

This patch implements support for Intel(R) Processor Trace
in lldb server. The changes have support for
starting/stopping and reading the trace data. The code
is only available on Linux versions where the perf
attributes for aux buffers are available.

The patch also consists of Unit tests for testing the
core buffer reading function.

Event Timeline

ravitheja created this revision.May 30 2017, 5:33 AM
ravitheja edited the summary of this revision. (Show Details)May 30 2017, 5:36 AM
ravitheja edited the summary of this revision. (Show Details)
labath edited edge metadata.May 31 2017, 2:39 AM

First batch of comments from me, I think I will have some more once I gain more insight into this. The main one is about the constructor/initialize, destructor/destroy duality, which we should abolish. The rest is mostly stylistic.

source/Plugins/Process/Linux/NativeProcessLinux.cpp
615

if(traceMonitor)

616

As far as, I can tell, every call to StartProcessorTracing is followed by setUserId. Can we move the logic of setting the id inside the StartProcessorTracing function

2482

Please use llvm file API for this (see how getProcFile works -- it's just a very thin wrapper around the llvm functions)

2509

Please use LLDB_LOG (here and everywhere else in this patch)

2846

You are calling this function with uid == m_pt_process_uid, which means this parameter is redundant, and leads to redundant sanity checks. Please remove it.

2964

You are modifying the config, but the caller is ignoring these modifications. If you don't need these, we could remove the modifications, make the config argument const and end up with a much cleaner interface.

3055

I don't see a getProcFile overload with this signature (although I am not opposed to adding one). Are you sure this compiles?

3063

Is this scope necessary? I find it just confuses the reader...

3069

add: columns[1] = columns[1].trim();
Then you can skip calling trim everywhere else.

3074

columns[0].contains(foo)

3099

return here. Then you can avoid duplicating the check outside the loop.

source/Plugins/Process/Linux/NativeProcessLinux.h
157

How about void ReadCyclicBuffer(ArrayRef<uint8_t> buffer, size_t position, MutableArrayRef<uint8_t> &dest)

161

This seems suspicious. How did you come to choose this number?

264

Please move this into a separate file. NativeProcessLinux.cpp is big enough as it is.

278

This would be much cleaner, if we could avoid creating an "invalid trace" object. The way that's usually done is to have a factory function, returning say llvm::Expected<std::unique_ptr<T>> (or even llvm::Expected<T> if you are able to make your type movable easily). The factory function then does all the operations that can fail and returns an error in this case. Once it has everything ready, it just calls the constructor, forwarding all the necessary arguments. The constructor can perform any additional non-fallable initialization, if necessary.

This way all the code inside and around the object can assume that once it has an object of this type, it is a valid object. We don't have many examples of this pattern, but you can for example look at MinidumpParser::Create or the code in (D32930) for guidance. I am also working on a patch to switch NativeProcess creation to use this pattern.

295

I believe we decided to call this "trace ID"

unittests/CMakeLists.txt
77

Please put this under unittests/Process/Linux (to match unittests/Process/gdb-remote)

unittests/Linux/CMakeLists.txt
1 ↗(On Diff #100699)

Delete this. Just include the files as "Plugins/Process/Linux/XXX.h"

clayborg resigned from this revision.May 31 2017, 10:13 AM

I trust Pavel to review this since it is in the Linux native plugins mostly.

zturner added inline comments.May 31 2017, 11:02 AM
source/Plugins/Process/Linux/NativeProcessLinux.cpp
725–729

This function returns a Status. Can't we assume that traceMonitor will be valid if and only if the returned Status is a success condition? And if it's not a success condition, shouldn't you log the error?

2516–2521

Please use a range based for here.

2625

Can you do an early return here if the condition is not true?

2626

And here

2637–2641

Nothing specifically wrong with this, but it's implicitly convertible, so if you like it you can simply just pass these strings into the AddIntegerItem function as string literals.

2716

std::make_pair(thread, traceMonitor) might allow this to fit on one line. m_processor_trace_monitor.emplace(thread, traceMonitor); almost definitely would.

2743

Early return here.

2744

Who is responsible for freeing this memory?

2793–2794

Seems like this would be more straightforward to just say:

if (trace_type != eTraceTypeProcessorTrace)
  return NativeProcessProtocol::StartTrace(config, error);
2869–2870

Bunch of opportunities in this function for early returning on error conditions.

2875–2876

Range based for with an inverted conditional and early continue inside the loop.

2930–2937

Delete and replace callsites with llvm::Log2_64

2939–2942

What happens if you call this function twice in a row? Or from different threads? Is that something you care about?

3008–3009

Perhaps you can use llvm::MemoryBuffer here? It mmaps internally

3045

It looks to me like vendor_id param could be a StringRef&.

source/Plugins/Process/Linux/NativeProcessLinux.h
157

Better yet, drop the position argument entirely. ReadCyclicBuffer(src, N, dest) is equivalent to ReadCyclicBuffer(src.drop_front(N), dest);

266–267

Instead of having

void *m_mmap_data;
void *m_mmap_aux;
void *m_mmap_base;

and then doing some ugly casting every time someone calls getAuxBufferSize or getDataBufferSize, instead just write:

MutableArrayRef<uint8_t> m_mmap_data;
MutableArrayRef<uint8_t> m_mmap_aux;

and initializing these array refs up front using the size from the header. This way you don't have to worry about anyone using the buffers incorrectly, and the ReadPerfTraceAux(size_t offset) function no longer needs to return a Status, but instead it can just return MutableArrayRef<uint8_t> since nothing can go wrong.

344–345

Can you use an llvm::DenseMap here?

347–349

And an llvm::DenseSet here?

zturner added inline comments.May 31 2017, 11:02 AM
source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
1318–1320
for (const auto &I : buf)
  response.PutHex8(I);
ravitheja added inline comments.Jun 7 2017, 7:44 AM
source/Plugins/Process/Linux/NativeProcessLinux.h
157

So there are couple of things i would like to mention ->

  1. The data and aux buffers allocated are cyclic in nature, so we need to consider the cyc_start or starting index in order to correctly read them.
  2. The function ReadCyclicBuffer is very generic and is capable of reading a certain chunk of memory starting from a certain offset.
  3. Now I could have kept this function in the private domain so that the cyc_start need not be passed everytime, but then I also wanted to unit test this function.

Now keeping these in mind my questions are the following ->
@zturner @labath I need the offset and cyc_start , both are required so the position argument won't suffice ?

Please correct me if i am wrong.

161

Sometime ago I asked in the devlist about the error codes in lldb and got the answer that they were completely arbitrary, so 0x23 has no special meaning. The question that I would like to ask is do u think these error codes should be here ? coz in https://reviews.llvm.org/D33035 it was suggested by @clayborg that tools working on top of the SB API's should only receive Error Strings and not codes.

Currently anyone who would encounter these codes would have to refer here for their meaning. Also the remote packets don't allow me to send Error Strings instead of error codes.

266–267

As mentioned in my previous comments, the m_mmap_data and m_mmap_aux are cyclic buffers and unfortunately the kernel keeps overwriting the buffer in a cyclic manner. The last position written by the kernel can only be obtained by inspecting the data_head and aux_head fields in the perf_event_mmap_page structure (pointed by m_mmap_base).

Because of this scenario you see the ugly type casting. Now regarding the casting in getAuxBufferSize and getDataBufferSize I did not want to store the buffer sizes as even if store them since they are not the biggest contributors to the total type casting ugliness. plus if the correct sizes are already store in the perf_event_mmap_page structure why store them myself ?.

Now regarding ReadPerfTraceAux(size_t offset) , i still need the size argument coz what if the user does not want to read the complete data from offset to the end and just wants a chunk in between the offset and the end ?

ravitheja added inline comments.Jun 7 2017, 7:51 AM
source/Plugins/Process/Linux/NativeProcessLinux.cpp
3008–3009

In my case, I only want to allocate it using mmap with the options i use here, with
llvm::MemoryBuffer I see no option to force it to only use mmap and with the options i need ?

labath added inline comments.Jun 7 2017, 8:27 AM
source/Plugins/Process/Linux/NativeProcessLinux.h
157

I believe zachary did not understand what the function does. The position is indeed necessary. However, I believe the prototype I suggested will work for you.

As for the function itself, I think it is way over-engineered. More than half of the function is sanity-checking, and more than half of the tests are tests of the sanity checks.

With ArrayRef, the function could be implemented as:

auto remaining = buffer.drop_front(position);
if(remaining.size() > dest.size())
  std::copy(remaining.begin(), dest.size(), dest.begin());
else
  std::copy_n(buffer.begin(), remaining.size() - dest.size(), 
    std::copy(remaining.begin(), remaining.end(), dest.begin());

(this will be slightly more complicated if you need to handle the dest.size() > buffer.size() case, which I am not sure if you need)

161

If they're completely arbitrary, then how about starting with number one?

Not being able to send complex error messages across the gdb-protocol is a bit of a drag. I would not be opposed to adding such a facility, as I wished for that a couple of times in past. If you wish to do that, then we should start a separate discussion.

266–267

So, if this refers to a structure of type perf_event_mmap_page, why let the type of this be perf_event_mmap_page *? That way you can have just one cast, when you initialize the member, and not each time you access it.

ravitheja updated this revision to Diff 102540.Jun 14 2017, 6:48 AM
ravitheja marked 29 inline comments as done.

Changes suggested in previous round of feedback.

ravitheja added inline comments.Jun 14 2017, 6:49 AM
source/Plugins/Process/Linux/NativeProcessLinux.cpp
2846

Well I wanted this function to be capable of being called from other contexts as well, but i never ended up using that way.

2939–2942

What happens if you call this function twice in a row

Second call will fail,

Or from different threads

If called for different threads, then they will start to be traced, ofcourse assuming they are not being traced.

3055

Yes sorry I forgot to add this new file.

source/Plugins/Process/Linux/NativeProcessLinux.h
266–267

The thing is this structure is only available from a certain version of the perf_event.h. Although I have put macro guards everywhere I use this structure. But on previous Linux versions it won't be there, so I was afraid if i would run into issues.

Second round of comments. I still see a lot of places with redundant checks for what appear to be operational invariants. I'd like to get those cleared up to make the code flow better.

source/Plugins/Process/Linux/NativeProcessLinux.cpp
611

Every call to AddThread is followed by this block. Is there a reason we cannot just move it inside the AddThread function?

621

LLDB_LOG will already print the function name (same thing everywhere else you use __FUNCTION__), so you don't need to write it out yourself. Also the AsCString is unnecessary, as Status has a pretty-printer.

2527

unused variable

2620

As of r305462 you can write error = traceMonitor.takeError(), so you shouldn't need the extra variable.

2637

Should you delete the iterator from the map after this? In fact, the mere act of deleting the iterator should probably be what triggers the trace stop, in line with RAII principles.

2671

This is dead code. The caller has already checked this condition. This should be at most an assertion.

source/Plugins/Process/Linux/NativeProcessLinux.h
264

ErrorOr<T> is a better way to return a value *or* an error. Although, in this case, it sounds like you're really only interested in the found-or-not-found distinction, so you could just drop the Status argument and let a null return value signify an error.

278

I'd like to downgrade these to unique pointers to ProcessTraceMonitor. There's no reason for these to ever outlive or escape the process instance, so it's natural to say they are strongly owned by it. In other places where you use ProcessorTraceMonitorSP you can just use regular pointers or references (depending on whether they can be null or not).

283

I am confused about the purpose of this member variable. As far as I can tell, it will always contain *either* all of the threads in the process *or* none of them. Is there a situation where this is not the case?

source/Plugins/Process/Linux/ProcessorTrace.cpp
159

eErrorTypePOSIX is used for errno error values. Please don't try to pass your invented error codes as these.

322

dead code

413

Just put llvm_unreachable here. There shouldn't be any way to construct this object if tracing is not supported.

481

Dead code? The construction of the object shouldn't have succeeded if you failed to allocate the memory or initialize the trace.

source/Plugins/Process/Linux/ProcessorTrace.h
21

All the other files in the folder are in an additional process_linux namespace. This file should go there as well.

71

Make this MutableArrayRef<uint8_t>. Then you don't need the ugly cast when calling readcyclicbuffer, nor the GetAuxBufferSize function (as it just becomes m_mmap_aux.size()). Same for m_mmap_data.

80

Instead of global variables (we may want to debug more than one process eventually), we could have a member variable std::pair<user_id_t, TraceOptions> m_process_trace; (you can have a special class for that if you don't like .first and .second everywhere).

unittests/Linux/ProcessorTraceTest.cpp
1 ↗(On Diff #102540)

You seem to have forgotten to delete this.

unittests/Process/Linux/ProcessorTraceTest.cpp
42

I don't understand why you're testing these. Why would anyone create an ArrayRef pointing to null? If you get handed an array ref, you should be free to assume it points to valid data.

ravitheja added inline comments.Jun 16 2017, 12:26 AM
source/Plugins/Process/Linux/NativeProcessLinux.cpp
611

Yes that could be done.

source/Plugins/Process/Linux/NativeProcessLinux.h
283

Yes there can be situations like that, for e.g if a user starts tracing on an individual thread followed by tracing the whole process (meaning in two separate start trace calls) then this Set will contain the thread ids of all the threads except the one on which individual tracing was started. This can be extended further.

unittests/Process/Linux/ProcessorTraceTest.cpp
42

Well I see that ArrayRef can be constructed with a nullptr and a positive size, like

ArrayRef(nullptr, 5);

hence I added the that check . If u want I can remove these checks ?

ravitheja added inline comments.Jun 16 2017, 12:28 AM
source/Plugins/Process/Linux/ProcessorTrace.cpp
159

Yes I did not want to use eErrorTypePOSIX but when transitioning from Status to llvm::Error, the m_code is only retained for eErrorTypePOSIX else its not retained.

labath added inline comments.Jun 16 2017, 2:17 AM
source/Plugins/Process/Linux/NativeProcessLinux.h
283

Interesting... Is that actually useful for something? I find the semantic of "trace all threads in the process except those that happen to be traced already" a bit confusing.

If you have a use for it in mind then fine, but otherwise, I'd like go for the simpler option of failing the request to trace the whole process if some individual threads are traced already. "you can either trace the whole process *OR* any number of individual threads" sounds much easier to understand. Otherwise, you'd have to think about corner cases like "What if the individual thread trace is stopped but the process trace request remains?" Do you then automatically start tracing the remaining thread based on the previous "whole process trace" request, and so on...

source/Plugins/Process/Linux/ProcessorTrace.cpp
159

That's a good point. When I wrote the conversion function, there was no use case for this -- I think you're the first person who actually want's to use the error codes in some meaningful way.

What is your further plan for these error codes? Judging by the state of the D33035 you won't be able to use them to display the error messages to the user?

If you still want to preserve the error codes, we can definitely make this happen. Actually, llvm::Error makes this even easier, as it allows you to define new error categories in a distributed way. Frankly, I think the your use of the "generic" error category with custom error codes is a bit of a hack. I think the intended usage of the Status class was to define your own ErrorType enum value and tag your errors with that (but that does not scale well to many different sources of error codes).

unittests/Process/Linux/ProcessorTraceTest.cpp
42

Yes, please. That was never the intention of array ref -- it should be used in a way that in always refers to valid memory. Making that assumption will allow you to cut the size of your code in half (both the test and the implementation)

Just to make things clear, I will explain a use case

Suppose if we are debugging an application where the main thread spawns a second new thread ->

int main() {
int i = 0;   // user starts tracing on main thread -> gets traceid 1
.....  // Some statements
i++; // Here he starts tracing the whole process -> gets traceid 2.
        // Now traceid=2 represents the tracing instance on the 
        // whole process, meaning all new threads spawned in 
        //  the process will be traced with this id. Although the 
        //main thread is already being traced hence 
        // will not be a part of traceid =2
std::thread (thread_function) // New thread spawned, this will be traced with id =2
.....  // Some statements
        // To obtain the trace data or stop the trace on main thread
       //  the user can simply work with trace id =1 and not specify the
      // thread id anywhere.
        // For threads under the process trace id hood, the thread id needs to be specified for
        // obtaining the trace data or if the tracing needs to be stopped only for that thread.
       // Now if the tracing on the process is switched off then the tracing on the main thread
      // is not affected.
}

Now the set of threads

m_pt_traced_thread_group

will track threads that are being traced under the process trace id. Hence when a thread exits,
we need to update this set accordingly.

source/Plugins/Process/Linux/NativeProcessLinux.cpp
2637

Yes, but the thing is i have to update the Set of threads

m_pt_traced_thread_group

which is tracking how many threads are being traced under the process trace id. The StopTrace calls StopProcessorTracingOnThread eventually which deletes the element from the various containers. I can directly call StopProcessorTracingOnThread instead of StopTrace.

ravitheja added inline comments.Jun 16 2017, 6:30 AM
source/Plugins/Process/Linux/NativeProcessLinux.h
283

Yes thats what will happen. All threads spawned will be traced unless the tracing on the whole process is stopped.

ravitheja added inline comments.Jun 16 2017, 6:37 AM
source/Plugins/Process/Linux/ProcessorTrace.cpp
159

My plan is to perhaps implement a way to pass error strings along with the error packets, so that the tool in D33035 can directly use those strings. I guess then I can just use the eErrorTypeGeneric .

So keeping that in mind I can just remove the error codes and replace them everywhere with the strings ? Although the tool will not work unless the error strings are transported .

Thanks for spelling that out. However, it still does not sound like a convincing use case to me. Why would the user start to trace just one thread, only to later change his mind and trace the whole process instead. I'm not saying that can't happen, but it seems like something that we should discourage rather than try to support. The messaging I'd give to the user is: "you should trace the whole process unless you have a good reason not to". In case of a single threaded application, it won't make a difference, and if that app suddenly turns multithreaded, it will be traced as well. If he happens to have a single-thread trace running and later chooses to have a full process traced, he can always cancel that one and initiate a full trace instead.

Just to make things clear, I will explain a use case

Suppose if we are debugging an application where the main thread spawns a second new thread ->

int main() {
int i = 0;   // user starts tracing on main thread -> gets traceid 1
.....  // Some statements
i++; // Here he starts tracing the whole process -> gets traceid 2.
        // Now traceid=2 represents the tracing instance on the 
        // whole process, meaning all new threads spawned in 
        //  the process will be traced with this id. Although the 
        //main thread is already being traced hence 
        // will not be a part of traceid =2

Yes, and this is what makes it confusing to me. At this point trace 2 will not be tracing anything, even though in the comments and the code you use the terms like "process trace" and "trace all threads".

Another thing I find confusing is what would happen if the events were reversed:

  1. Start tracing the whole process
  2. Ok. Traceid = 1
  3. Start tracing thread 47
  4. Error. Thread 47 is already traced. (???)

So now tracing a single thread is forbidden if you already have a process trace running, but starting a process trace is ok even if there is a thread trace already. This seems more complicated to grasp than "you cannot trace the full process as you already trace one of it's threads".

source/Plugins/Process/Linux/NativeProcessLinux.cpp
2637

That makes it slightly better, but I still find extremely verbose -- I have to read through 50 lines of error checks, only to execute two lines of code:

m_pt_traced_thread_group.erase(thread);
m_processor_trace_monitor.erase(iter);

It looks like there is some problem with abstractions there.

source/Plugins/Process/Linux/ProcessorTrace.cpp
159

Ok, then let's drop the error enum, and use just strings for now(?) For what it's worth, at least it makes this consisted with the rest of the code base.

Although a bit confusing, there is more flexibility for the user.I must also point out that the trace buffer available is not unlimited and there can be situations where a user might simultaneously want to trace newly spawned threads with a smaller buffer and trace an individual thread with perhaps a bigger buffer size. Tracing all threads is definitely important coz the user might not want to separately start tracing on each new thread. Now the current design might be a bit confusing but I am willing to document it well with examples and uses cases.

source/Plugins/Process/Linux/NativeProcessLinux.h
278

Hi, I don't see the advantage of changing to unique pointers ? coz when the process dies they will be destroyed anyhow, plus using shared pointers makes it easier for functions operating with the ProcessTraceMonitor to work.

ravitheja added a comment.EditedJun 19 2017, 4:18 AM

Start tracing the whole process
Ok. Traceid = 1
Start tracing thread 47
Error. Thread 47 is already traced. (???)

The error is because its already being traced, so if someone wants the trace log, its already available and the user need not do anything extra.

I think if we should first sort out the functionality ?

Although a bit confusing, there is more flexibility for the user.I must also point out that the trace buffer available is not unlimited and there can be situations where a user might simultaneously want to trace newly spawned threads with a smaller buffer and trace an individual thread with perhaps a bigger buffer size. Tracing all threads is definitely important coz the user might not want to separately start tracing on each new thread. Now the current design might be a bit confusing but I am willing to document it well with examples and uses cases.

I agree that having the ability to trace new threads from the moment they spawn is important. The confusion arises from the fact that we are treating "trace all threads" and "trace new threads" as a single concept. I think the cleanest solution for me would be to split those up, so that we would have three operations: (1) trace a specific thread, (2) trace new threads, (3) trace all threads. Then (1) and (2) can be combined in arbitrary ways, and (3) is mutually exclusive with anything else. It could be I am over-designing this though. I'd like to hear the opinion of anyone else that is reading this.

source/Plugins/Process/Linux/NativeProcessLinux.h
278

It makes it clear that the Process is the owner of these objects (and not for example "sharing" them with anyone else). Plus you should use the simplest tool that gets the job done and unique_ptr is definitely simpler. So I'd reverse the question: If there is no need for using shared_ptr, why do it?

I disagree with the statement that it makes it harder for the functions to work. Please provide an example.

ravitheja added inline comments.Jun 19 2017, 4:53 AM
source/Plugins/Process/Linux/NativeProcessLinux.h
278

Ok I was thinking that working with shared_pointers was a bit cleaner approach than working with references or raw pointers, but in this case won't make much of a difference.

I will make them unique pointers. No problem.

Although a bit confusing, there is more flexibility for the user.I must also point out that the trace buffer available is not unlimited and there can be situations where a user might simultaneously want to trace newly spawned threads with a smaller buffer and trace an individual thread with perhaps a bigger buffer size. Tracing all threads is definitely important coz the user might not want to separately start tracing on each new thread. Now the current design might be a bit confusing but I am willing to document it well with examples and uses cases.

I agree that having the ability to trace new threads from the moment they spawn is important. The confusion arises from the fact that we are treating "trace all threads" and "trace new threads" as a single concept. I think the cleanest solution for me would be to split those up, so that we would have three operations: (1) trace a specific thread, (2) trace new threads, (3) trace all threads. Then (1) and (2) can be combined in arbitrary ways, and (3) is mutually exclusive with anything else. It could be I am over-designing this though. I'd like to hear the opinion of anyone else that is reading this.

Well that's the whole thread group idea. Currently we have only one thread group i.e the process group (all existing un traced threads + newly spawned ones).
With separate "trace all threads" and "trace new threads", it will be multiple thread groups. For e.g

Lets say an application spawns 10 threads, now here we can end up with 9 thread groups in the worst case if the user calls "trace all threads" after each
new spawned thread.

Now I wanted to avoid having multiple thread groups coz the implementation will become more complex.

Well that's the whole thread group idea. Currently we have only one thread group i.e the process group (all existing un traced threads + newly spawned ones).
With separate "trace all threads" and "trace new threads", it will be multiple thread groups. For e.g

Lets say an application spawns 10 threads, now here we can end up with 9 thread groups in the worst case if the user calls "trace all threads" after each
new spawned thread.

Now I wanted to avoid having multiple thread groups coz the implementation will become more complex.

That's not exactly how I imagined it. Calling "trace all threads" could still only be done once. You can only combine tracing a single thread and tracing newly created threads. So you could end up with ~10 trace instances, if you start tracing threads one by one, but the user would probably give up after the first two, and just start tracing all future threads in once instance.

However, this talk of groups has interested me. If we phrase it in terms of groups, I think the current behavior makes some sense -- the user can create a "group trace", and this group would contain all current untraced threads, as well as any threads that are created in the future. If noone else has any opinion on this, then I guess we can go with this. We should just make sure to document it and avoid function names like "trace_ALL_threads" and similar. :)

ravitheja updated this revision to Diff 103355.Jun 21 2017, 4:50 AM

Changes for last feedback.

I like the direction this is going in, but I see places for more cleanup. The main three items are:

  • making destruction cleaner
  • avoiding global variables
  • making ReadCyclicBuffer readable

the rest are just general nits.

source/Plugins/Process/Linux/NativeProcessLinux.cpp
2551

return process_config.takeError() is two lines shorter

2568

This is still called trace*ALL*threads

2640

the formatting of the bracket is wrong

2726

s/UID/traceid/

source/Plugins/Process/Linux/NativeProcessLinux.h
114

having default argument values on virtual functions is dangerous. Please remove these. AFAIK, noone is going to be calling these through a NativeProcessLinux pointer anyway.

source/Plugins/Process/Linux/ProcessorTrace.cpp
33

Are these functions still useful?

129

You can consider return ret.getError(), which will probably give you a more detailed error message.

148

drop AsCString()

243

drop c_str()

307

I'd like this work to be done in the destructor. Just like nobody should see a constructed-but-not-yet-initialized object, so the destroyed-but-not-destructed state should not exist. Then you don't need to worry about setting the state of the object to some sane values.

In fact, if you utilize std::unique_ptr capabilities, then you may not even need a destructor at all. Just define a class

class munmap_delete {
  size_t m_length;

public:
  munmap_delete(size_t length) : m_length(length) {}
  void operator()(void *ptr) { munmap(ptr, m_length); }
};

and then you can just declare `
std::unique_ptr<perf_event_mmap_page, munmap_delete> mmap_base(mmap_result, munmap_delete(mmap_size))

This way, the allocation and deallocation code are close together, and you don't need to write yourself // we allocated extra page reminders.

Since we store m_aux in a ArrayRef already, this would require duplicating the pointer in the unique_ptr member. I am fine with that, but if you want to avoid it you can just have a getAux() member function which sythesizes the arrayref on demand.

358

ReadCyclicBuffer will handle the case of a zero-sized buffer gracefully, right? You don't need to check the size at every level.

438

buffer = buffer.drop_back(bytes_remaining) ?

475

I think there are waaaaaay too many nested conditionals here. Let's try this implementation for a change:

SmallVector<ArrayRef<uint8_t>, 2> parts{ src.slice(src_cyc_index), src.drop_back(src_cyc_index)};
if (offset >= parts[0].size()) {
  offset -= parts[0].size();
  parts.drop_front();
}
parts[0] = parts[0].drop_front(offset);

That takes care of offset. Now on to the copy.

dst = dst.take_front(size_to_read);
auto next = dst.begin();
for(auto part: parts) {
  size_t chunk_size = std::min(size_to_read, part.size());
  next = std::copy_n(part.begin(), chunk_size, next);
  size_to_read -= chunk_size;
}

Much easier to reason about and less than half the size of the original implementation.

source/Plugins/Process/Linux/ProcessorTrace.h
80

I guess I wasn't too clear about this, but I meant a member variable in the Process class. This solves the "multiple process" issue, but creates a couple of other ones. All the call sites are now more complicated, and you have to worry about managing the lifetime of the entries in this map. If this was a Process member variable, those would be handled automatically (and it makes more sense, since it is the process class that owns these traces instances).

ravitheja marked 5 inline comments as done.Jun 22 2017, 8:11 AM
ravitheja added inline comments.
source/Plugins/Process/Linux/ProcessorTrace.cpp
307

I can get the desired effect, by moving this function to the destructor, but we won't be able to catch munmap errors. I guess logging should be fine ? right now we were able to report errors in munmap.

ravitheja added inline comments.Jun 22 2017, 8:20 AM
source/Plugins/Process/Linux/ProcessorTrace.h
80

Ok sorry now I understand what u meant.

labath added inline comments.Jun 22 2017, 8:25 AM
source/Plugins/Process/Linux/ProcessorTrace.cpp
307

They only way I see that munmap can fail is if you pass it a garbage argument.

On success, munmap() returns 0, on failure -1, and errno is set (probably to EINVAL). :D

That is a programmer error, so I would be fine with even asserting on it, but logging and ignoring is fine as well. It's not like you can do anything better at that point anyway -- you're already ignoring the return value of Destroy...

ravitheja updated this revision to Diff 103943.Jun 26 2017, 7:15 AM
ravitheja marked 6 inline comments as done.

More changes from past feedback.

Should I start looking at this or you have more changes planned? I still see manual memory management in the Destroy function...

BTW, what's your testing strategy for this?

source/Plugins/Process/Linux/ProcessorTrace.cpp
394

Isn't the drop-back expression equivalent to src.take_front(src_cyc_index)?

Yes you can start looking at it. The thing with munmap stuff is the following, you basically don't want to give the user an opportunity to have an uninitialized or instances which have been destroyed but not deleted.
So I removed the Destroy function from the public space. Now the user need not worry about Destroy function.

Regarding the testing strategy, we have tests at two levels, one at the SB API level and the other at the tool level.

ravitheja added inline comments.Jun 26 2017, 8:01 AM
source/Plugins/Process/Linux/ProcessorTrace.cpp
394

The problem with that is I don't see the documentation of some functions and I have to dig in the code.
Which is why I did not used it.

Yes you can start looking at it. The thing with munmap stuff is the following, you basically don't want to give the user an opportunity to have an uninitialized or instances which have been destroyed but not deleted.
So I removed the Destroy function from the public space. Now the user need not worry about Destroy function.

Ok, so that was one of my reasons for doing this. The other one is internal code cleanlyness -- it's much easier to verify that the code is healthy by just looking at it when the initialization and deinitialization is close together. unique_ptr allows you to do that. In this code

if ((result = mmap(..., size)) != MAP_FAILED)
  ptr_up = std::unique_ptr(result, mmap_deleter(size));

the init and deinit code is on adjecant line, and it's guaranteed than memory will be freed. Here:

first_function() {
ptr = mmap(..., size);
ref=ArrayRef(ptr, size-1);
...
}

...

second_function() {
  ...
  // Remember to increment size by one
  munmap(ref.data(), ref.size()+1);
  ... 
}

it's not so obvious because the code is far apart and you need to carry state around. To verify correctness I need to look at the two pieces of code, and then find all possible code paths between them.

Regarding the testing strategy, we have tests at two levels, one at the SB API level and the other at the tool level.

Cool, are you going to put some of those tests in this patch?

source/Plugins/Process/Linux/ProcessorTrace.cpp
394

That's fine, there are a lot of these APIs and it's not possible to remember all of them. However, now that you know about the function, wouldn't you agree that it's cleaner to use it?

Regarding the testing strategy, we have tests at two levels, one at the SB API level and the other at the tool level.

Cool, are you going to put some of those tests in this patch?

The problem with uploading tests is that they need minimum Broadwell machine and a newer Linux kernel (> 4.2 something )

Yes you can start looking at it. The thing with munmap stuff is the following, you basically don't want to give the user an opportunity to have an uninitialized or instances which have been destroyed but not deleted.
So I removed the Destroy function from the public space. Now the user need not worry about Destroy function.

Ok, so that was one of my reasons for doing this. The other one is internal code cleanlyness -- it's much easier to verify that the code is healthy by just looking at it when the initialization and deinitialization is close together. unique_ptr allows you to do that. In this code

if ((result = mmap(..., size)) != MAP_FAILED)
  ptr_up = std::unique_ptr(result, mmap_deleter(size));

the init and deinit code is on adjecant line, and it's guaranteed than memory will be freed. Here:

first_function() {
ptr = mmap(..., size);
ref=ArrayRef(ptr, size-1);
...
}

...

second_function() {
  ...
  // Remember to increment size by one
  munmap(ref.data(), ref.size()+1);
  ... 
}

it's not so obvious because the code is far apart and you need to carry state around. To verify correctness I need to look at the two pieces of code, and then find all possible code paths between them.

Ok I will write it like that. Please tell me if u want more changes, I will do them all together for the next patch.

source/Plugins/Process/Linux/ProcessorTrace.cpp
394

Ok I will use that.

ravitheja updated this revision to Diff 104160.Jun 27 2017, 7:20 AM

Removing Destroy() function.

ravitheja marked 2 inline comments as done.Jun 27 2017, 7:26 AM
ravitheja marked an inline comment as done.
labath accepted this revision.Jun 27 2017, 9:25 AM

I still see some room for improvement, but some of those require some infrastructure improvements, like being able to construct llvm::StringError (or equivalent easily), which I guess will have to wait until I fix those. I think this patch has gone on long enough.

Regarding the testing strategy, we have tests at two levels, one at the SB API level and the other at the tool level.

Cool, are you going to put some of those tests in this patch?

The problem with uploading tests is that they need minimum Broadwell machine and a newer Linux kernel (> 4.2 something )

Some kind of test is better than nothing, so I would still recommend you go ahead with the test. This wouldn't be the first piece of functionality without a proper test, but I hope you realize it is in your interest to have the functionality tested as much as possible -- people will be refactoring this code, and code around it -- if it's not tested, it will be broken sooner or later. In any case, I am going to leave that up to you.

This revision is now accepted and ready to land.Jun 27 2017, 9:25 AM
clayborg accepted this revision.Jun 27 2017, 9:42 AM
ravitheja closed this revision.Jun 28 2017, 12:58 AM