This is an archive of the discontinued LLVM Phabricator instance.

[Disassembler] Simplify MCInst predicates
Needs ReviewPublic

Authored by vsk on Oct 18 2019, 6:00 PM.

Details

Reviewers
jasonmolenda
ab
Summary

DisassemblerLLVMC exposes a few MCInst predicates (e.g. HasDelaySlot).
Group the logic for evaluating the existing predicates together, to prep
for adding new ones.

This is NFC-ish: with this change, the existing predicates will return
the conservative defaults when the disassembler is unavailable instead
of false, but I'm not sure that really matters.

Diff Detail

Event Timeline

vsk created this revision.Oct 18 2019, 6:00 PM
vsk added a comment.Oct 19 2019, 11:01 AM

Hm, this patch is bugging me.

It looks a bit like instructions are still decoded multiple times in different ways (e.g. in the Decode and CalculateMnemonicOperandsAndComment methods, which both modify m_opcode). Any ideas on whether/how to consolidate these?

In D69210#1715679, @vsk wrote:

Hm, this patch is bugging me.

It looks a bit like instructions are still decoded multiple times in different ways (e.g. in the Decode and CalculateMnemonicOperandsAndComment methods, which both modify m_opcode). Any ideas on whether/how to consolidate these?

I am all for anything that will improve efficiency. This class has evolved over time where we started with just the "CalculateMnemonicOperandsAndComment" and then many other features (can branch, etc) were later built into the class. I don't believe instructions are kept around for long so they typically serve one of two purposes:

  • disassembly of instruction stream where only CalculateMnemonicOperandsAndComment is needed
  • inspection of multiple instructions for stepping looking at can branch and other information requests

So I am not sure the decoded multiple times in different ways is really important unless we do have a costly client that does both CalculateMnemonicOperandsAndComment and inspecting of instruction attributes (can branch, etc). Again, these objects are created, used and discarded currently AFAIK.

lldb/source/Plugins/Disassembler/llvm/DisassemblerLLVMC.cpp
87

Why init this to eLazyBoolYes?

vsk marked an inline comment as done.Oct 21 2019, 11:14 AM
In D69210#1715679, @vsk wrote:

Hm, this patch is bugging me.

It looks a bit like instructions are still decoded multiple times in different ways (e.g. in the Decode and CalculateMnemonicOperandsAndComment methods, which both modify m_opcode). Any ideas on whether/how to consolidate these?

I am all for anything that will improve efficiency. This class has evolved over time where we started with just the "CalculateMnemonicOperandsAndComment" and then many other features (can branch, etc) were later built into the class. I don't believe instructions are kept around for long so they typically serve one of two purposes:

  • disassembly of instruction stream where only CalculateMnemonicOperandsAndComment is needed
  • inspection of multiple instructions for stepping looking at can branch and other information requests

So I am not sure the decoded multiple times in different ways is really important unless we do have a costly client that does both CalculateMnemonicOperandsAndComment and inspecting of instruction attributes (can branch, etc). Again, these objects are created, used and discarded currently AFAIK.

Thanks for your comment Greg. Let me try and restate the issue I see as my concern isn't about performance.

It looks like Decode and CalculateMnemonicOperandsAndComment mutate m_opcode in different ways. Separately, the predicates read m_opcode. So I'm not sure whether/in-which-order the mutating methods need to be run before the predicates can safely be called. I'd like to consolidate all the code that mutates m_opcode in one place, to make the predicates always safe to call. Does that seem reasonable? Or am I overthinking something?

lldb/source/Plugins/Disassembler/llvm/DisassemblerLLVMC.cpp
87

I believe this preserves the existing behavior of the class. InstructionLLVMC conservatively says that instructions can branch, in the absence of information.

In D69210#1717042, @vsk wrote:
In D69210#1715679, @vsk wrote:

Hm, this patch is bugging me.

It looks a bit like instructions are still decoded multiple times in different ways (e.g. in the Decode and CalculateMnemonicOperandsAndComment methods, which both modify m_opcode). Any ideas on whether/how to consolidate these?

I am all for anything that will improve efficiency. This class has evolved over time where we started with just the "CalculateMnemonicOperandsAndComment" and then many other features (can branch, etc) were later built into the class. I don't believe instructions are kept around for long so they typically serve one of two purposes:

  • disassembly of instruction stream where only CalculateMnemonicOperandsAndComment is needed
  • inspection of multiple instructions for stepping looking at can branch and other information requests

So I am not sure the decoded multiple times in different ways is really important unless we do have a costly client that does both CalculateMnemonicOperandsAndComment and inspecting of instruction attributes (can branch, etc). Again, these objects are created, used and discarded currently AFAIK.

Thanks for your comment Greg. Let me try and restate the issue I see as my concern isn't about performance.

It looks like Decode and CalculateMnemonicOperandsAndComment mutate m_opcode in different ways. Separately, the predicates read m_opcode. So I'm not sure whether/in-which-order the mutating methods need to be run before the predicates can safely be called. I'd like to consolidate all the code that mutates m_opcode in one place, to make the predicates always safe to call. Does that seem reasonable? Or am I overthinking something?

It seems that CalculateMnemonicOperandsAndComment only mutates m_opcode when the instruction size returned by:

size_t inst_size = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);

is zero. It also is unclear to me that the mutating calls in CalculateMnemonicOperandsAndComment really do anything? They decode a value from the data, then then put them back into m_opcode? Also, if "inst_size" is zero in decode:

const size_t inst_size =
    mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
if (inst_size == 0)
  m_opcode.Clear();
else {
  m_opcode.SetOpcodeBytes(opcode_data, inst_size);
  m_is_valid = true;
}

Then m_opcode.Clear() is called and m_opcode won't contain anything, so I am guess only architectures with fixed opcode sizes will be able to show ".long" or any of that kind of stuff? And only those will trigger mutating the opcode value in CalculateMnemonicOperandsAndComment?

vsk added a comment.Oct 24 2019, 1:44 PM

I don't think it should be necessary to read the class in its entirety to understand when m_opcode is safe to use. However, as I'm not sure how the disassembler is called in to, I don't think it's a good idea to refactor the whole thing right away.

Let's start with this simple change to drop some redundant code, and maybe revisit things later?