
Add a new tool for parallel safe bisection, "llvm-bisectd".
Needs Revision · Public

Authored by aemerson on Nov 2 2021, 10:17 AM.

Details

Summary

Excerpt from Bisection.md document:

Introduction

The llvm-bisectd tool allows LLVM developers to rapidly bisect miscompiles in
clang or other tools running in parallel.

Bisection as a general debugging technique can be done in multiple ways. We can
bisect across the *time* dimension, which usually means that we're bisecting
commits made to LLVM. We could instead bisect across the dimension of the LLVM
codebase itself, disabling some optimizations and leaving others enabled, to
narrow down the configuration that reproduces the issue. We can also bisect in
the dimension of the target program being compiled, e.g. compiling some parts
with a known good configuration to narrow down the problematic location in the
program. The llvm-bisectd tool is intended to help with this last approach to
debugging: finding the place where a bug is introduced. It does so with the aim
of being minimally intrusive to the build system of the target program.

High level design

The bisection process with llvm-bisectd uses a client/server model, where all
the state about the bisection is maintained by the llvm-bisectd daemon. The
compilation tools (e.g. clang) send requests and get responses back telling
them what to do. As a developer, debugging using this methodology is intended
to be simple, with the daemon taking care of most of the complexity.
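The request/response flow above can be modeled in a few lines. The following is an illustrative in-process sketch, not the actual daemon (which communicates over sockets); all class and method names are hypothetical:

```python
# Hypothetical in-process model of the llvm-bisectd flow: all bisection state
# lives in the daemon, and compiler processes just ask what to do and obey.

class BisectDaemon:
    """Holds all bisection state; tracks decision points in discovery order."""

    def __init__(self):
        self.keys = []             # unique decision points, in discovery order
        self.enabled_limit = None  # None => enable everything (baseline run)

    def decide(self, key):
        """Answer a compiler's request: should the transform at `key` run?"""
        if key not in self.keys:
            self.keys.append(key)
        if self.enabled_limit is None:
            return True
        return self.keys.index(key) < self.enabled_limit


class CompilerProcess:
    """Stands in for one of many parallel clang invocations."""

    def __init__(self, daemon, tu):
        self.daemon, self.tu = daemon, tu

    def maybe_optimize(self, function):
        # The key combines TU and function so parallel compiles don't collide.
        return self.daemon.decide(f"{self.tu}:{function}")
```

A driver would then rerun the build with successively narrower `enabled_limit` values until the failing decision point is isolated.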

[End excerpt]

Diff Detail

Event Timeline

aemerson created this revision. Nov 2 2021, 10:17 AM
aemerson requested review of this revision. Nov 2 2021, 10:17 AM

D113031 contains an example of a client for this in GlobalISel.

paquette added inline comments. Nov 2 2021, 10:43 AM
llvm/docs/Bisection.md
41

In the GISel example patch, you disambiguate further using the target. Should you mention that here?

77

"into a vector" seems like an implementation detail

119

Should bisector support be only included in assert/debug builds?

Hi Amara,

I am kind of repeating what I said in https://reviews.llvm.org/D113031, but putting it here for better visibility.

I think the thing that drives the bisection shouldn't trickle down into the optimizations themselves; instead I would rather have this information encoded in the IR itself (like a generalization of optnone).

For instance, for us a daemon based approach doesn't work at all because we are running a JIT compiler that runs in its own sandbox and we cannot query an external process from it.

Internally, we developed a bisect tool that annotates the IR upfront (to make an analogy with clang, this is as if clang added a bunch of "bisect" attributes to the IR) before sending it for compilation, instead of having the backend query a bisection client.
Note: technically we didn't modify clang; we instead had a specific pass at the beginning of the LLVM pipeline that would make all the bisection decisions, but also perform some transformations to isolate the bugs (like creating functions out of basic blocks, splitting the blocks to make the functions smaller, and blocking inlining.)
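A minimal sketch of that annotate-upfront scheme (all names hypothetical; the real tool works on LLVM IR attributes, this just models the decision step):

```python
# Sketch: a step at the head of the pipeline bakes all bisection decisions
# into the IR as attributes, so the backend never queries an external process.

def annotate_module(functions, disable_plan):
    """Map each function to the attributes the current bisection plan wants:
    'optnone'/'noinline' for functions left unoptimized, none otherwise."""
    return {
        name: ["optnone", "noinline"] if name in disable_plan else []
        for name in functions
    }
```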

Cheers,
-Quentin

How does that work when you have parallel builds? When multiple clang processes are running simultaneously, and you want to bisect to a specific translation unit, and then within that TU to a specific point in the module, don't you need some co-ordination?

Let me just step back a little bit and say that now that I think about what we did, having something that answers "should I run in this instance" is desirable, the implementation doesn't really matter. We did it with function attributes, but having a bisect client API like you're introducing is fine.
My only complaint is that the client interface should not have remote in the name :P.

From an abstraction level, we need two things:

  1. Something that tells if an optimization needs to run (the remote bisect client here)
  2. Something that drives the on/off of the optimizations based on the previous state (here your daemon)

The way we did that was:
For #1 we added annotations in the IR
For #2 we implemented the driver directly in our JIT daemon

Essentially, that boils down to something that formulates a plan and something that executes the plan. At one point I was thinking that formulating the plan could be changing the pass pipeline (don't insert what you don't want to run), but that looked like too much work :).

How does that work when you have parallel builds?

Each module is assigned an ID and the bisect plan and previous state are mapped to this ID.
The ID is saved in the module metadata, but for now we didn't use it, since all we needed was added to the IR via annotations (i.e., we didn't need to come up with a key to ask for information about a specific pass). In that regard, your approach is more general.
For the ID, we used a hash of the module before the bisect annotations were added, i.e., as long as you don't change the front-end, the IDs are stable between runs.

To summarize:
Compute the module ID -> add annotation based on past information -> run the backend (at this point, the backend runs by itself.)
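That summary might be sketched as follows; the exact hashing scheme is an assumption (the text only says the ID is a hash of the pre-annotation module):

```python
# Sketch of the module-ID step: hash the IR *before* bisect annotations are
# added, so IDs are stable across runs as long as the front-end is unchanged.

import hashlib

def module_id(pre_annotation_ir):
    """Stable per-module key derived from the unannotated module text."""
    return hashlib.sha256(pre_annotation_ir.encode()).hexdigest()[:16]
```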

When multiple clang processes are running simultaneously, and you want to bisect to a specific translation unit, and then within that TU to a specific point in the module, don't you need some co-ordination?

At a high level here is what the driver was doing:

  • Bisect optnone on each module
  • Find the module(s) that create the problem (the minimal set of modules that needs optimizations turned on)
  • Do the same on each function (the minimal set may involve more than one function)
  • Try to "outline" the basic blocks of each problematic function and do the same process on the newly created functions
  • Split the problematic basic blocks to make them smaller and continue
  • When you're happy with the size of the basic blocks, start bisecting the optimizations on the problematic functions (possibly basic-block extracted). Right now we were only bisecting a handful of optimizations because the final diff with the basic block splitting usually made the faulty optimization easy to find by hand.
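One narrowing step from the outline above can be sketched as a classic prefix bisection; `is_bad` stands in for the user-supplied verdict script, and both names are hypothetical:

```python
# Sketch of one driver step: assuming a single culprit, find the item whose
# optimization triggers the bug. is_bad(enabled) is True when the build with
# only `enabled` optimized reproduces the failure (like a git-bisect script).

def bisect_one(items, is_bad):
    lo, hi = 0, len(items)
    # Invariant: enabling items[:lo] is good, enabling items[:hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(set(items[:mid])):
            hi = mid
        else:
            lo = mid
    return items[hi - 1]  # first item whose inclusion turns the build bad
```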

The way it worked is all that state was saved in a file <shaderID>-bisect-info. You could bootstrap the process by populating the file by hand, i.e., by telling the JIT process which module you want to bisect.

As far as bisecting to a specific point in the TU, we were always going all the way down to the executable, then you had to supply a script that tells whether or not the program is working. That's similar to what git bisect is doing (if the script returns 0, the program works; if it returns 1, it doesn't). In your script you could check for whatever you want (a specific sequence of asm, the executable producing some results, etc.)

Note: The pass I was talking about in my previous reply, which we insert in the LLVM pipeline, generates all the information to start the bisect process (e.g., the shader ID, the list of all the functions, the list of all basic blocks). The driver was then using this information to tell that pass to add some annotations on some functions (e.g., optnone, noinline, etc.), but also to split some basic blocks and outline them (and attach some annotations on them).

Cheers,
-Quentin

Ok I think I sort of understand your flow now. I agree that it doesn't sound like our approaches are really conflicting. The remote bisection client code could certainly be hidden behind a more generic interface, and for your approach we could select an implementation that just queries the function attributes instead. For the bisection co-ordination with the files I'm not sure how that impacts this tooling, if it all. One reason I went for a daemon was that for some build systems, persistent files across builds are difficult to keep due to build sandboxing (sockets themselves need some workarounds to work with sandboxes).

I'll see what kind of abstraction I can come up with for the client in this patch, but it won't be tested since your tooling isn't upstream. I'm also guessing that you'd want to avoid using string keys for the function-attribute implementation?

I would like to note that there's some rudimentary support
for a previous generation of this scattered throughout the codebase,
namely DebugCounters for bugpoint.

I don't really have an opinion on the proposal at large,
but I think it may be important to not just introduce yet another variant
of dealing with the same issue, but only have a single good modern way.

Yes, I've seen DebugCounter. It's useful when you've already identified the file where something is going wrong, and you can enable the counters and pass counter values to opt. It doesn't, however, work across multiple TUs or support parallel debugging, so it's not solving the same problem.

I'm also guessing that you'd want to avoid using string keys for the function-attribute implementation?

Yeah, the string keys are not ideal since it's difficult to know what we're dealing with at that point. If I were to switch to this API, I would need to know the module ID (that we insert beforehand), the function, and the instance of the optimization (e.g., is it the first invocation of InstCombine, the second, etc.) to read the right annotation.
That may make it difficult to have something general here.

Perhaps we could give the bisect client the context of what is being optimized (e.g., the module for module passes, the module and the function for function passes (or just the function since the module can be found from the function), the module, the function and the loop for loop passes, etc.) and where in the pipeline we are (e.g., the pass ID) and hide the formation of a string key within the bisect client API.

The bonus side effect, is that the bisect client could handle the multi instances of passes by itself (e.g., assuming the module as a unique ID set just once, we could use this ID and the passID to know which instance of the pass is running: the bisect client could keep a count of how many times it saw that pair.)
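That instance-counting idea might look like this sketch (all names hypothetical):

```python
# Sketch: the bisect client derives which invocation of a pass it is seeing
# by counting (module ID, pass ID) pairs itself, so no string key needs to
# encode the instance number.

from collections import Counter

class BisectClient:
    def __init__(self):
        self.seen = Counter()

    def key_for(self, module_id, pass_id):
        """Return a unique (module, pass, instance) key for this query."""
        instance = self.seen[(module_id, pass_id)]
        self.seen[(module_id, pass_id)] += 1
        return (module_id, pass_id, instance)
```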

Note: For context, for us the path of a file is not a good discriminant because we compile directly in memory (the front-end generates the IR in memory and doesn't write it to a file), so all modules have the same (empty) path. Therefore, in that context, the string keys as shown in the GISel pass (https://reviews.llvm.org/D113031) would just yield the same key over and over across different modules for all functions with the same name (like main).

Cheers,
-Quentin

Perhaps it'd be worth considering the overlap here, and maybe avoiding it: What would it be like if this new thing /only/ bisected at the granularity of a whole compilation action (ie: one invocation of clang), rather than specific optimizations? (possibly even only at the "whole compiler" granularity - using two compilers - and choosing between one or the other for each compilation)

Then once the specific file has been identified, use the existing sub-file granularity tools (a wrapper script could help cover both of these for the user so it wasn't a bunch more manual work).

I think that might help prevent overlap of functionality/re-implementing similar/the same functionality through different mechanisms for reducing specific optimization applications?
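The wrapper idea could be sketched as follows; this is an illustration of the suggestion, not anything in the patch, and all compiler names are made up:

```python
# Sketch: a compiler-agnostic wrapper keyed on the output file that routes
# each compilation action to a known-good or suspect compiler, with no
# changes to the compilers themselves.

def wrap(argv, suspect_outputs, good="clang-good", bad="clang-suspect"):
    """Pick a compiler for this action based on its -o argument."""
    out = argv[argv.index("-o") + 1] if "-o" in argv else None
    compiler = bad if out in suspect_outputs else good
    return [compiler] + argv
```

A bisection driver would shrink `suspect_outputs` step by step until the fewest actions needing the bad compiler are found.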

(also: probably an add-on for the future: Have you considered not requiring a clean build for each bisection step? Would it be feasible to have the bisect tool delete (or produce instructions the user can copy/paste, if preferred) specific output files it knows it's going to change - then letting the user rerun an incremental build that will cause those files to be regenerated and allowing the bisect tool to intercept their building to adjust how they're built - it seems like this only requires a reliable build system (so it could be optional, so it could be turned off in cases where a build system doesn't reliably check for existing outputs) and could significantly improve performance of such a tool?)

I think that restricting it to only allow bisection across translation units would be artificially constraining the functionality to not step on the toes of other features. That in itself doesn't seem the right approach, because the simplest thing right now is to support arbitrary bisection granularities. If we constrained it to only allow bisection down to TUs, then we haven't really simplified or re-used any code, all we've done is to make the user experience worse.

I'll re-iterate that the purpose of using a daemon and not the existing file-level granularity tools like DebugCounter is to support parallelism and minimal intrusion into the build system. That's the guiding principle behind this patch. If other tools use file-granularity bisection using simple counters, then they should stick with those features, they're solving different problems.

(also: probably an add-on for the future: Have you considered not requiring a clean build for each bisection step? Would it be feasible to have the bisect tool delete (or produce instructions the user can copy/paste, if preferred) specific output files it knows it's going to change - then letting the user rerun an incremental build that will cause those files to be regenerated and allowing the bisect tool to intercept their building to adjust how they're built - it seems like this only requires a reliable build system (so it could be optional, so it could be turned off in cases where a build system doesn't reliably check for existing outputs) and could significantly improve performance of such a tool?)

This is a really nice idea that I hadn't considered. If we can improve the API so that bisection daemon knows the semantics of the keys, then it should in theory be simple to have the daemon touch those translation unit sources to trigger the build system to rebuild. Deleting the object files seems a bit harder, since we'd have to propagate the argument to -o through to the bisection point, whereas the TU input is available from the IR module.
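The touch-to-rebuild idea might be sketched as follows (assuming a build system that rebuilds a TU whenever its source mtime is newer than the object file; the function name is hypothetical):

```python
# Sketch: bump mtimes on the TUs whose bisection decisions changed so an
# incremental build regenerates only their objects, instead of deleting
# output files (which would require knowing each action's -o argument).

import os

def retrigger_rebuild(sources):
    """Set atime/mtime to 'now' on each changed translation unit source."""
    for path in sources:
        os.utime(path, None)
```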

Part of the direction is that it could influence/change the design of the feature, possibly to simplify it. For instance, this could be implemented as a compiler wrapper (it could then even be compiler agnostic, maybe even tool agnostic/reusable for other tooling investigations), for instance doing an A/B compiler comparison: the wrapper could do the network query to check the key (keys would be either input or output files, probably output files, since input files could be ambiguous: the same input file might be rebuilt into multiple output targets with different arguments) and then run either one compiler or the other, or otherwise massage the compiler arguments (disable an optimization, etc).

I'd worry the input could be ambiguous (same file built with different -D flags, admittedly not /really/ common), but the output is /probably/ less ambiguous (the build system's unlikely to rebuild/replace a given output).

Tyker added a subscriber: Tyker. Nov 26 2021, 2:00 PM
compnerd requested changes to this revision. Dec 4 2021, 11:54 AM
compnerd added inline comments.
llvm/lib/Support/RemoteBisectorClient.cpp
23

This doesn't cover the non-Unix path, where you need to add an include for WinSock2.h, Windows.h, and possibly WinSock.h.

37

Can you please introduce some wrappers for all the BSD socket functions? On Windows, GetAddrInfoW would be preferable over getaddrinfo. Additionally, where do you initialize the sockets library? (Yes, that is not a thing on Linux/macOS, but on Windows, before you can use any socket function, you need to invoke WSAStartup.) This needs a proper hook point.

llvm/tools/llvm-bisectd/CMakeLists.txt
20

Please add a case for Windows, where you need to link against Ws2_32.

llvm/tools/llvm-bisectd/llvm-bisectd.cpp
119

I don't know if there is a struct linger available on Windows; it may need to be spelt LINGER.

187

GetAddrInfoW should be preferred over getaddrinfo on Windows IIRC

This revision now requires changes to proceed. Dec 4 2021, 11:54 AM
arsenm added inline comments. Dec 4 2021, 12:24 PM
llvm/lib/Support/Bisector.cpp
51–53

Merge these into one LLVM_DEBUG() block?

llvm/lib/Support/RemoteBisectorClient.cpp
40–42

Can you directly use Twine to produce the full error message for report_fatal_error?

59–62

Ditto

65–67

Ditto

ychen added a subscriber: ychen. Dec 4 2021, 2:01 PM

Thanks for the feedback folks. To be honest I don't have the time right now to discuss and redesign the whole thing with David (have some parental leave coming up as well). If anyone else wants to pick this up and continue it feel free to do so, I published the patches to help other people with their debugging problems, but unless someone else picks this up and works to reach a consensus on design, it will have to lie unmaintained as a patch for Q1/Q2 next year at least.

How does this differ from opt-bisect?

Herald added a project: Restricted Project. Nov 16 2022, 3:36 PM

It's cross-process/whole-build. Imagine opt-bisect, but the action tracking is for a whole build. My hope was that maybe we could do something more coarse-grained on the build level (without the need for the compiler itself to be communicating with a service) in a wrapper - eg: good/bad compiler (or flag set) and each action gets one or the other, bisect down to the fewest actions that need the bad compiler, then use opt-bisect within a single action, holding others constant, etc. So we didn't have two different ways of doing fine-grained bisection.

Right, there are two schools of thought on this. I think they're both valuable: the coarse-grained wrapper approach for when you don't want to modify the compiler and just want to find a failing object file, and this one, which relies on existing clients coded into the compiler.

I've been using an internal version of this tool over the past year, and it's proven invaluable in rapidly bisecting miscompiles. It was also useful for finding a miscompile in the entirety of Chromium, but it could do with improvements to avoid having to do an entire rebuild on each bisection step. Maybe there's a more holistic, higher-level design that would subsume other bisection techniques, but I don't see the effort being worth it myself.

Matt added a subscriber: Matt. Tue, Jan 17, 11:01 AM