Page MenuHomePhabricator

Initial Assembly profiler for mips64

Authored by bhushan on Feb 17 2015, 3:45 AM.



This is initial implementation of assembly profiler which only scans prologue/epilogue assembly instructions to create CFI instructions.

Diff Detail


Event Timeline

bhushan updated this revision to Diff 20072.Feb 17 2015, 3:45 AM
bhushan retitled this revision from to Initial Assembly profiler for mips64.
bhushan updated this object.
bhushan edited the test plan for this revision. (Show Details)
bhushan added reviewers: jasonmolenda, clayborg.
bhushan set the repository for this revision to rL LLVM.
bhushan added subscribers: Unknown Object (MLST), dsanders, mohit.bhakkad.
jasonmolenda accepted this revision.Feb 18 2015, 7:59 PM
jasonmolenda edited edge metadata.

This looks pretty good, thanks for putting this together. I apologize for not having time to get back to you about this patch earlier.

I made a few in-line comments, they are simple changes. Please feel free to commit once you've addressed them if you have commit access. I didn't review the patch for correct unwinding of mips64 stack frames, I'll take your word that part is correct :) but in general it looks good.

193 ↗(On Diff #20072)

Only mips64 is supported by this code right now. Do you think there will be mips32 support added later? The conditionalized code to set up the register numbers (e.g. m_machine_ra_regnum) or get the number of registers defined in this ctor are only doing one thing right now with mips64 support. If you want to leave this conditionalized code in place so mips32 is easier to add in the future, that's fine.

295 ↗(On Diff #20072)

Doesn't the mips64 ISA have a fixed instruction length? It seems like instruction_length() should always return 4 - I don't think I'd even define the method if it were me.

For that matter, the only reason the x86 assembly profiler creates an llvm disassembler is to get the instruction length. I don't think the m_disasm_context ivar is needed in your profiler.

369 ↗(On Diff #20072)

I'm sure you copied this same bad format from UnwindAssembly-x86.cpp but please put the return type on a line by itself to be consistent with the lldb coding style. UnwindAssembly-x86.cpp was written early in lldb's bringup and the code style is a little off - I've been cleaning it up slowly over time when I notice mistakes but I'm sure these are still in that source file. So,

AssemblyParse_mips::add_imm_sp_pattern_p (int& amount)

(and the same for the rest of the methods also doing this)

418 ↗(On Diff #20072)

We prefer a space between the "if" and the expression - this also comes up in mov_fp_to_sp_pattern_p().

451 ↗(On Diff #20072)

If you didn't have a function bounds you cap it to 512 bytes in the ctor. You might not want to set a PlanValidAddressRange() here. I don't remember if the unwinder actually makes use of the valid address range or not -- but it's conceivable that someone could be stopped at byte 516 in the function and the unwinder wouldn't use your unwind plan. (or worse, the unwinder would start to use the offset in the future and break the mips64 unwinds.)

This revision is now accepted and ready to land.Feb 18 2015, 7:59 PM

I would like to ask for some tests before this gets committed.

Writing unwinder test cases is tricky. The *basic* functionality is tested by the entire testsuite any time they try to step-over or backtrace. But testing corner cases of tricky unwind situations involves either capturing core files of tricky situations or hand-writing OS/arch-specific assembly routines. It's not impossible but it's something we've never done until now.

(I personally have a dozen-or-so core files saved from cases where I had an unwinder bug - but they're not something I can share publicly and moreover, they're GIGANTIC.)

Does this work make any tests pass that didn't pass before?

For what it's worth, an example of a test file for this code would be a driver program written in C for simplicity which calls under1() written in assembly, which in turn calls under2(). lldb would stop in under2(), go up the stack to under1() and try to print a register that was preserved.

Some utterly invalid syntax for a test file would look like:

int main()




mov r16, 1
call under2


sd r16(sp)       # I have no idea if that is right -- save contents of r16 on the local stack frame
mov r16, 5
call puts          ## whatever, stop at a breakpoint here


If lldb could select frame 1 and see that r16 has the value 1, that's a simple test case.

But this test could only be run on a mips64 system.

I want to add test cases like this for x86 but it might take some fiddling to get assembly that would work on all the Unix x86 systems at least (nevermind Windows! ;) and I've never gotten around to it. I should start with at least simple x86 unwinder test cases that work correctly on Mac as a step 0.

I should start with at least simple x86 unwinder test cases that work

correctly on Mac as a step 0.

That would be incredibly helpful. I've asked ARM/Linaro, Imagination and
Intel for help on LLDB. Your examples in one architecture could be
templates for examples in other architectures.

If you don't have time for that, just putting together a doc of things for
them to check would be incredibly valuable.

I agree that unwinding is probably a hard thing to write tests for. But
right now it has basically 0 coverage, so I really hope we can improve
that. I'll be the first to admit that I am *not* an expert in this area,
but just looking at the patch description "This is initial implementation
of assembly profiler which only scans prologue/epilogue assembly
instructions to create CFI instructions" I feel like this is something that
would be a great candidate for unit tests. Since a unit test is written in
C++, it need not be written against the public API. You could just feed
the profiler sequences of bytes that represent prologues / epilogues, and
verify that the CFI instructions are correct. (I hope that even makes
sense, again I'm not an expert on this, just trying to make guesses based
on the patch description).

You wouldn't even need to have an input binary to make this work. Just
encode some prologues and epilogues into the unit test, and verify that the
profiler does the right thing.

OK, I see what you're thinking of. I was looking at this from a dynamic point of view with an actually running process. We need a live process for most of the unwinder stuff to work correctly -- you don't have any information about registers until you can get a RegisterContext from a Thread. That means you can only run the tests on a matching architecture system. (e.g. see the ctor in this patch where it gets a RegisterContext to look up registers by name).

We'd need also need to add a feature like "show me the unwind info for this block of bytes". What we really need is an SB API for representing unwind information and an API to get the unwind information for a block of bytes. Otherwise our test file has to do something like parse the output of "image show-unwind" in some form.

Given that these unit tests could only be run on the mips64 system itself, we don't get anything from adding all these facilities - we could do the hand-written assembly approach and test behaviors with the higher level lldb cmds.

dsanders added inline comments.Feb 19 2015, 2:20 AM
295 ↗(On Diff #20072)

That's right, instructions are always 4 bytes for MIPS32 and MIPS64. However, the microMIPS extension is a mixture of 2-byte and 4-byte instructions and can be mixed with MIPS32/MIPS64. I don't think we want to worry about microMIPS this early though.

Thanks jasonmolenda for your review and comments.
I will do the corrections as suggested by you.
My comments are inlined.

193 ↗(On Diff #20072)

Yes, mips32 support will be added later.

295 ↗(On Diff #20072)

Yes, mips64 only ISA has fixed instruction length, however this function will be useful when micromips support will be added. micromips has mix sized instructions (2 and 4 bytes).

369 ↗(On Diff #20072)

I will correct this.

418 ↗(On Diff #20072)

I will correct this.

451 ↗(On Diff #20072)

Ok, I will remove this code (line 451).

A change recently went in to the LLDB Coding Standard to remove the requirement that a space be inserted between function names and argument lists. So "foo ()" becomes foo(). It would be nice if new code going forward could adhere to this. If you want to do this very easily, then you can just run clang-format on your patch. While your patch is staged in git, just run "git clang-format" (src\llvm\tools\clang\tools\clang-format\git-clang-format will need to be in your PATH, and have +x permission)

clayborg requested changes to this revision.Feb 19 2015, 11:16 AM
clayborg edited edge metadata.

Patch looks good, but for anything that is playing with instructions, I would prefer if you subclass EmulateInstruction and make an instruction emulator for any instructions you plan on pulling apart. It would be nice to create an EmulateInstructionMIPS64 class that inherits from EmulateInstruction (see the lldb/source/Plugins/Instruction/ARM/EmulateInstructionARM.cpp for an example). We might have the need to parse/emulate MIPS64 instructions in other places and placing that parsing code into EmulateInstructionMIPS64 ensure others can re-use your instruction parser for in other places in the code.

See the comments at the top of EmulateInstruction.h for more details and let me know if you have any questions.

This revision now requires changes to proceed.Feb 19 2015, 11:16 AM

I don't want to necessarily disagree with Greg but his suggestion about implementing an instruction emulation approach is a pretty big change to this patch. There are two approaches to creating an unwind plan based on an instruction stream: Hard-code knowledge about a small set of instructions that appear in function epilogues/prologues and use a disassembler to step over the unknown instructions, or have an instruction emulator that knows the behavior of them (well enough) so it can model the relevant instructions.

For arm & arm64, lldb uses instruction emulation. This was aided by the ARM instruction xml files which allow for much of the emulation code to be generated.

The emulation approach allows for great flexibility, but it is (obviously) a lot more work. The emulation approach can win when unusual function prologue instructions appear -- with the hard-coded instruction analyzer you need to know all of the different instructions that the compiler or hand-written assembly may use to manipulate the stack pointer, frame pointer, the caller's instruction and any saved registers.

I don't think we should reject this patch in favor of writing this in the emulation style - that's likely to be a big ask.

Regardless of this patch, is instruction emulation going to yield the best
results for all architectures?

When you're talking about compiler generated code, a hand written assembly parser is easy. The complication comes when the compiler changes how it expresses prologues -- or hand written assembly routines.

A simple example for x86_64, you can save a register to the stack with either

push %rbx


movq %rax, -0x10(%rbp)

(using at&t assembly syntax). If your compiler only does the pushes and that's all you handle, your profiler will fail as soon as it sees the alternate instruction to do the same.

You see the assembly profilers like UnwindAssembly-x86 fall over if you don't think up all the instructions that might occur in prologue/epilogues. That's the risk of them.

I just looked into the existing arm & arm64 emulation routines and I think I mis-stated what Greg suggested. The arm instruction emulation (see source/Plugins/Instruction/ARM) is a full instruction emulation from the ARM xml instruction description files. It could be a big undertaking.

The arm64 instruction emulation (see source/Plugins/Instruction/ARM64) is a hand written emulation of only a dozen different instructions - I hadn't looked into it in detail until now. I think Greg would prefer that the MIPS assembly profiler be written in a similar style to the ARM64 instruction emulation routine.

The Instruction plugins are run by the UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp class. This class uses the Instruction subclass to classify the instruction stream and it generates the UnwindPlan from there.

For that matter, you could argue that UnwindAssembly-x86 could be rewritten in the same way.

I'm less against Greg's suggestion given all of this. It's separating the instruction recognition logic from the UnwindPlan creation logic and seems like a good direction to head in. I haven't done this for the x86 profiler yet myself so I feel bad about asking someone else to jump in here. The "full emulation of the entire ISA is more flexible" argument doesn't hold for something like the EmulateInstructionARM64 class but there is the separation of logic to argue in favor of it.

So here are my unsolicited thoughts on an issue I have a tenuous grasp of.

I'm going to go out on a limb and say that there isn't a whole lot of hand
written assembly written for MIPS in applications today. If this gets MIPS
on par with x86 support, maybe it's good enough for now? It sounds like
instruction emulation will give the best results but it sounds like it
might be overkill right now, especially if they're just trying to bring it

That said, it they are closing on clean tests and they want the best
results in all cases, I think instruction emulation is the way to go.

Am I missing something?


I think you're making the same mistake I made initially when Greg proposed "instruction emulation". This can mean two things:

  1. Full emulation of the entire instruction set by the Instruction plugin, with flags to indicate instructions that are relevant to prologue/epilogue analysis.
  2. Emulation a half dozen instructions used in prologues/epilogues, ignoring all the others and assuming they don't do anything related to unwinding.

In both cases, the EmulateInstruction plugin marks up attributes about the instructions ("This instruction just saved a register on the stack", "this instruction just changed the stack pointer"). Another architecture-independent class, UnwindAssemblyInstEmulation, uses these attributes and generates an UnwindPlan.

Greg's suggestion to write the MIPS assembly profiler is suggesting approach #2 above. He isn't looking for full ISA emulation like we happen to do for ARM. The ARM64 assembly profiler is a better example of what he wants here.

The UnwindAssembly-x86 class mixes the architecture specific instruction matching and the UnwindPlan generation together. He wants to see a separation of the architecture specific bits and the architecture independent bits, that's all.

The only hiccup to this, my only reluctance to push for this wholeheartedly, is that UnwindAssemblyInstEmulation *should* be architecture independent but to-date it has only been used for arm & arm64. It may have assumptions about the way those architectures set up/tear down stack frames.

It would be interesting to rewrite UnwindAssembly-x86 in the instruction emulation style to shake out any arm assumptions in UnwindAssemblyInstEmulation -- and when I say write it in instruction-emulation style, I don't mean adding any additional instructions, just the dozen insns it knows about today. But it'll be a little while before I have free time to do that myself.

FWIW, I was looking for something exactly like this, but with mips32 support, so I took this and built on top of it somewhat: I tried making more use of the disassembler, since I don't want to have to put assumptions about instruction encoding in here (which also made it easy to support mips32). The only problem I encountered was that is not a public LLVM header, but that problem seems surmountable (I'll start a thread on the mailing list). Not arguing against the instruction emulation approach of course, but wanted to post my version (which is working well for me) for those interested.

bhushan updated this revision to Diff 21561.Mar 10 2015, 3:50 AM
bhushan edited edge metadata.
bhushan added a subscriber: jaydeep.

This patch implements MIPS64 assembly profiler using EmulateInstruction class.

This looks good to me. Greg?

99 ↗(On Diff #21561)


clayborg accepted this revision.Mar 10 2015, 3:53 PM
clayborg edited edge metadata.

Looks great. Thanks for taking the time to do it right.

This revision is now accepted and ready to land.Mar 10 2015, 3:53 PM
Closed by commit rL232619: Initial Assembly profiler for mips64 (authored by bhushan.attarde). · Explain WhyMar 18 2015, 2:24 AM
This revision was automatically updated to reflect the committed changes.
labath added a subscriber: labath.Mar 18 2015, 3:01 AM
labath added inline comments.

I got a warning here. I presume you meant !(n == 29) here. I've fixed it, but please check.

flackr added a subscriber: flackr.Mar 18 2015, 9:00 AM

FYI, this seems to have broken the xcode build for me:

Undefined symbols for architecture x86_64:

"EmulateInstructionMIPS64::Initialize()", referenced from:
    lldb_private::Initialize() in liblldb-core.a(lldb.o)
"EmulateInstructionMIPS64::Terminate()", referenced from:
    lldb_private::Terminate() in liblldb-core.a(lldb.o)

ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)


I'm assuming it just needs an xcodeproj update.

FYI, this seems to have broken the xcode build for me:

Undefined symbols for architecture x86_64:

"EmulateInstructionMIPS64::Initialize()", referenced from:
    lldb_private::Initialize() in liblldb-core.a(lldb.o)
"EmulateInstructionMIPS64::Terminate()", referenced from:
    lldb_private::Terminate() in liblldb-core.a(lldb.o)

ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)


I'm assuming it just needs an xcodeproj update.

I put up a review to fix it: