In r219544 (2014-10-10) I changed the default disassembly format to more closely resemble gdb's disassembly format. After living with this format for a few months, I've found obvious shortcomings with C++ and Objective-C programs, and I want to try a new approach.
Originally, lldb's disassembly would display the Module & Function/Symbol name on a line by itself when a new function/symbol began. Each line of assembly would then display the file/load address followed by the opcode, operands, and comments (e.g. showing the target of a branch insn). Branches to the same function would have a comment listing the full function name plus an offset. Note that the address column did not display the offset, just raw addresses, so you had to compare the full address of the branch target with the disassembly output to find the target of the branch. When the branch target was in inlined code, lldb would print all of the inlined functions in the comment field (on separate lines).
In October I changed this to more closely resemble gdb's output: each line has the file/load address, the function name, the offset into the function ("+35"), opcode, operands, and comment. Comments on branches to the same function behaved as before, but inlined functions were no longer included. I try to elide function argument types (e.g. from a demangled C++ name), but with templated methods the name can still be enormous.
This style of disassembly looks pretty good for short C function names. For example:
(lldb) disass -c 20
   0x7fff94fbe188 <mach_msg_trap>: movq %rcx, %r10
   0x7fff94fbe18b <mach_msg_trap+3>: movl $0x100001f, %eax
   0x7fff94fbe190 <mach_msg_trap+8>: syscall
-> 0x7fff94fbe192 <mach_msg_trap+10>: retq
   0x7fff94fbe193 <mach_msg_trap+11>: nop
   0x7fff94fbe194 <mach_msg_overwrite_trap>: movq %rcx, %r10
but as soon as you get a hefty C++ name in there, it becomes very messy:
0x107915454 <CommandObjectBreakpointList::DoExecute+68>: jne 0x107915481 ; CommandObjectBreakpointList::DoExecute + 113 at CommandObjectBreakpoint.cpp:1420
Or, an extreme example that I found in lldb with 30 seconds of looking (function name only) -
std::__1::function<std::__1::shared_ptr<lldb_private::TypeSummaryImpl> (lldb_private::ValueObject&)>::function<CommandObjectTypeSummary::CommandObjectTypeSummary(lldb_private::CommandInterpreter&)::'lambda'(lldb_private::ValueObject&)>
I want to go with a hybrid approach between these two styles. When there is a new symbol, we print the full module + function name on a line by itself. On each assembly line, we print the file/load address, the offset into the function in angle brackets, the opcode, and the operands. In the comments, branches to the SAME function follow the <+36> style. An example:
(lldb) disass
LLDB`CommandObjectBreakpointList::DoExecute:
   0x107915410 <+0>:   pushq %rbp
   0x107915411 <+1>:   movq %rsp, %rbp
   0x107915414 <+4>:   subq $0x170, %rsp
   0x10791541b <+11>:  movq %rdi, -0x20(%rbp)
   0x10791541f <+15>:  movq %rsi, -0x28(%rbp)
   0x107915423 <+19>:  movq %rdx, -0x30(%rbp)
   0x107915427 <+23>:  movq -0x20(%rbp), %rdx
-> 0x10791542b <+27>:  movq %rdx, %rsi
   0x10791542e <+30>:  movb 0x165(%rdx), %al
   0x107915434 <+36>:  andb $0x1, %al
   0x107915436 <+38>:  movq %rsi, %rdi
   0x107915439 <+41>:  movzbl %al, %esi
   0x10791543c <+44>:  movq %rdx, -0xf8(%rbp)
   0x107915443 <+51>:  callq 0x107d87bb0 ; lldb_private::CommandObject::GetSelectedOrDummyTarget at CommandObject.cpp:1045
   0x107915448 <+56>:  movq %rax, -0x38(%rbp)
   0x10791544c <+60>:  cmpq $0x0, -0x38(%rbp)
   0x107915454 <+68>:  jne 0x107915481 ; <+113> at CommandObjectBreakpoint.cpp:1420
   0x10791545a <+74>:  leaq 0xf54d21(%rip), %rsi ; "Invalid target. No current target or breakpoints."
   0x107915461 <+81>:  movq -0x30(%rbp), %rdi
   0x107915465 <+85>:  callq 0x107d93640 ; lldb_private::CommandReturnObject::AppendError at CommandReturnObject.cpp:135
   0x10791546a <+90>:  movl $0x1, %esi
   0x10791546f <+95>:  movq -0x30(%rbp), %rdi
   0x107915473 <+99>:  callq 0x107d93760 ; lldb_private::CommandReturnObject::SetStatus at CommandReturnObject.cpp:172
   0x107915478 <+104>: movb $0x1, -0x11(%rbp)
   0x10791547c <+108>: jmp 0x1079158bd ; <+1197> at CommandObjectBreakpoint.cpp:1470
   0x107915481 <+113>: movq -0x38(%rbp), %rdi
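
To make the rules of the hybrid style concrete, here is a rough sketch in Python (not lldb's actual C++ implementation). The Insn record, the format_hybrid function, and the function size passed to it are all hypothetical names and values chosen for illustration; the sketch also leaves out the "at file:line" part of the comment and the "->" pc marker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    """Hypothetical instruction record for this sketch; not an lldb type."""
    addr: int                             # file/load address
    text: str                             # opcode plus operands
    branch_target: Optional[int] = None   # resolved branch target, if any


def format_hybrid(module: str, func: str, start: int, size: int,
                  insns: List[Insn]) -> List[str]:
    """Render one function: a header line, then address + <+offset> per line."""
    # The full module + function name appears once, when the symbol begins.
    lines = [f"{module}`{func}:"]
    for insn in insns:
        # Each line carries the address and the decimal offset into the function.
        line = f"0x{insn.addr:x} <+{insn.addr - start}>: {insn.text}"
        target = insn.branch_target
        # Only branches that stay inside this function get the short <+N> comment.
        if target is not None and start <= target < start + size:
            line += f" ; <+{target - start}>"
        lines.append(line)
    return lines


if __name__ == "__main__":
    # Addresses are taken from the example above; the size 0x500 is an assumption.
    demo = [
        Insn(0x107915454, "jne 0x107915481", branch_target=0x107915481),
        Insn(0x10791547c, "jmp 0x1079158bd", branch_target=0x1079158bd),
    ]
    for out_line in format_hybrid("LLDB", "CommandObjectBreakpointList::DoExecute",
                                  0x107915410, 0x500, demo):
        print(out_line)
```

The point of the sketch is that the offset in the address column and the offset in the branch comment are computed the same way (address minus function start), so a branch target can be found by matching <+N> values instead of comparing full addresses.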
The main drawback of this new arrangement is that you may be looking at a long series of instructions and forget the name of the function/method. You'll need to scroll back to the beginning of the disassembly to find the function's name. Minor details include making two passes over the instruction list to find the maximum length of the address component and padding all the lines so the opcodes line up. For instance,
(lldb) disass -c 30 -n mach_msg_trap
libsystem_kernel.dylib`mach_msg_trap:
   0x7fff94fbe188 <+0>:  movq %rcx, %r10
   0x7fff94fbe18b <+3>:  movl $0x100001f, %eax
   0x7fff94fbe190 <+8>:  syscall
   0x7fff94fbe192 <+10>: retq
   0x7fff94fbe193 <+11>: nop

dyld`mach_msg_trap:
   0x7fff6a867210 <+0>:  movq %rcx, %r10
   0x7fff6a867213 <+3>:  movl $0x100001f, %eax
   0x7fff6a867218 <+8>:  syscall
   0x7fff6a86721a <+10>: retq
   0x7fff6a86721b <+11>: nop
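
The two-pass padding mentioned above could look roughly like the following Python sketch. The function name pad_address_column and the (address, offset, text) tuple layout are assumptions made for this illustration; lldb's own code is C++ and is organized differently.

```python
from typing import List, Tuple


def pad_address_column(rows: List[Tuple[int, int, str]]) -> List[str]:
    """rows holds (address, offset, instruction text) tuples for one listing."""
    # Pass 1: build each address component and record the widest one.
    prefixes = [f"0x{addr:x} <+{offset}>:" for addr, offset, _ in rows]
    width = max((len(p) for p in prefixes), default=0)
    # Pass 2: pad every address component to that width so the opcodes line up.
    return [f"{prefix.ljust(width)} {text}"
            for prefix, (_, _, text) in zip(prefixes, rows)]


if __name__ == "__main__":
    # Rows taken from the libsystem_kernel.dylib`mach_msg_trap example.
    rows = [
        (0x7fff94fbe188, 0, "movq %rcx, %r10"),
        (0x7fff94fbe190, 8, "syscall"),
        (0x7fff94fbe192, 10, "retq"),
    ]
    for out_line in pad_address_column(rows):
        print(out_line)
```

The width has to be known before any line is printed, which is why a single pass is not enough: the widest offset (here <+10>) may come late in the listing.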
The disassembly format can be overridden by the 'disassembly-format' setting if people have specific preferences. But I think this new hybrid style of disassembly will work the best as a default given the kinds of method names we see with OO languages.
Comments? I'd like to land this in a couple days if no one feels strongly about it.