This is an archive of the discontinued LLVM Phabricator instance.

Differential D69210

[Disassembler] Simplify MCInst predicates
Needs Review, Public

Authored by vsk on Oct 18 2019, 6:00 PM.

Details

Reviewers

jasonmolenda
ab

Summary

DisassemblerLLVMC exposes a few MCInst predicates (e.g. HasDelaySlot).
Group the logic for evaluating the existing predicates together, to prep
for adding new ones.

This is NFC-ish: with this change, the existing predicates will return
the conservative defaults when the disassembler is unavailable instead
of false, but I'm not sure that really matters.
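To make the "NFC-ish" caveat concrete, here is a simplified model of the before and after behavior. It is only a sketch: ToyInstruction, DecodedInfo, and the disasm_available flag are hypothetical stand-ins for InstructionLLVMC and the real disassembler, the LazyBool enum is a local copy for illustration, and the m_opcode.GetData() check is left out.

```cpp
#include <cassert>
#include <optional>

// Local stand-in for the tri-state cache value used by the predicates.
enum LazyBool { eLazyBoolCalculate = -1, eLazyBoolNo = 0, eLazyBoolYes = 1 };

// Hypothetical summary of what a successful decode reports.
struct DecodedInfo {
  bool can_branch;
  bool has_delay_slot;
  bool is_call;
};

class ToyInstruction {
public:
  // disasm_available == false models a missing disassembler; an empty
  // `decoded` models bytes that could not be decoded.
  ToyInstruction(bool disasm_available, std::optional<DecodedInfo> decoded)
      : m_disasm_available(disasm_available), m_decoded(decoded) {}

  // Pre-patch shape: with no disassembler the cache stays at
  // eLazyBoolCalculate, so the predicate answers false.
  bool DoesBranchOld() {
    if (m_old_does_branch == eLazyBoolCalculate && m_disasm_available) {
      if (!m_decoded)
        m_old_does_branch = eLazyBoolYes; // undecodable: assume it may branch
      else
        m_old_does_branch = m_decoded->can_branch ? eLazyBoolYes : eLazyBoolNo;
    }
    return m_old_does_branch == eLazyBoolYes;
  }

  // Post-patch shape: every predicate funnels into one visit.
  bool DoesBranch() {
    if (m_does_branch == eLazyBoolCalculate)
      Visit();
    return m_does_branch == eLazyBoolYes;
  }
  bool HasDelaySlot() {
    if (m_has_delay_slot == eLazyBoolCalculate)
      Visit();
    return m_has_delay_slot == eLazyBoolYes;
  }
  bool IsCall() {
    if (m_is_call == eLazyBoolCalculate)
      Visit();
    return m_is_call == eLazyBoolYes;
  }

private:
  void Visit() {
    // Conservative defaults first, overwritten only on a successful decode.
    m_does_branch = eLazyBoolYes;
    m_has_delay_slot = eLazyBoolNo;
    m_is_call = eLazyBoolNo;
    if (m_disasm_available && m_decoded) {
      m_does_branch = m_decoded->can_branch ? eLazyBoolYes : eLazyBoolNo;
      m_has_delay_slot = m_decoded->has_delay_slot ? eLazyBoolYes : eLazyBoolNo;
      m_is_call = m_decoded->is_call ? eLazyBoolYes : eLazyBoolNo;
    }
    // After one visit nothing is left to calculate.
    assert(m_does_branch != eLazyBoolCalculate);
  }

  bool m_disasm_available;
  std::optional<DecodedInfo> m_decoded;
  LazyBool m_old_does_branch = eLazyBoolCalculate;
  LazyBool m_does_branch = eLazyBoolCalculate;
  LazyBool m_has_delay_slot = eLazyBoolCalculate;
  LazyBool m_is_call = eLazyBoolCalculate;
};
```

In this model the only observable difference is DoesBranch with no disassembler: the old shape answers false, the new shape answers true. HasDelaySlot and IsCall default to false either way.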

Event Timeline

vsk created this revision. Oct 18 2019, 6:00 PM

Hm, this patch is bugging me.

It looks a bit like instructions are still decoded multiple times in different ways (e.g. in the Decode and CalculateMnemonicOperandsAndComment methods, which both modify m_opcode). Any ideas on whether/how to consolidate these?

I am all for anything that will improve efficiency. This class has evolved over time: we started with just CalculateMnemonicOperandsAndComment, and then many other features (can branch, etc.) were later built into the class. I don't believe instructions are kept around for long, so they typically serve one of two purposes:

- disassembly of an instruction stream, where only CalculateMnemonicOperandsAndComment is needed
- inspection of multiple instructions for stepping, looking at can branch and other information requests

So I am not sure that decoding multiple times in different ways is really important, unless we have a costly client that does both CalculateMnemonicOperandsAndComment and inspection of instruction attributes (can branch, etc.). Again, these objects are currently created, used, and discarded, AFAIK.

lldb/source/Plugins/Disassembler/llvm/DisassemblerLLVMC.cpp, line 87:
Why init this to eLazyBoolYes?

Thanks for your comment, Greg. Let me try to restate the issue I see, as my concern isn't about performance.

It looks like Decode and CalculateMnemonicOperandsAndComment mutate m_opcode in different ways. Separately, the predicates read m_opcode. So I'm not sure whether/in-which-order the mutating methods need to be run before the predicates can safely be called. I'd like to consolidate all the code that mutates m_opcode in one place, to make the predicates always safe to call. Does that seem reasonable? Or am I overthinking something?
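One possible shape for that consolidation, purely as an illustration and not what this patch does: decode once at construction, keep the result immutable, and make the predicates const readers with the same conservative defaults. Everything here (DecodedState, ToyDecode, DecodedOnceInstruction, and the 0xE8 rule) is a hypothetical stand-in, not LLDB or LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical snapshot of everything learned from decoding once.
struct DecodedState {
  std::vector<uint8_t> opcode_bytes;
  bool can_branch;
  bool has_delay_slot;
  bool is_call;
};

// Toy decoder (an assumption for this sketch): empty input is undecodable,
// and a first byte of 0xE8 is treated as a call that also branches.
inline std::optional<DecodedState>
ToyDecode(const std::vector<uint8_t> &bytes) {
  if (bytes.empty())
    return std::nullopt;
  const bool is_call = bytes[0] == 0xE8;
  return DecodedState{bytes, /*can_branch=*/is_call,
                      /*has_delay_slot=*/false, is_call};
}

class DecodedOnceInstruction {
public:
  // The constructor is the only place that writes the decoded state.
  explicit DecodedOnceInstruction(const std::vector<uint8_t> &bytes)
      : m_state(ToyDecode(bytes)) {}

  bool IsValid() const { return m_state.has_value(); }

  // Predicates never mutate, so they are safe to call in any order.
  // Without a decoded state they fall back to the conservative answers.
  bool DoesBranch() const { return m_state ? m_state->can_branch : true; }
  bool HasDelaySlot() const {
    return m_state ? m_state->has_delay_slot : false;
  }
  bool IsCall() const { return m_state ? m_state->is_call : false; }

  const std::vector<uint8_t> &OpcodeBytes() const {
    assert(IsValid() && "no opcode bytes for an undecodable instruction");
    return m_state->opcode_bytes;
  }

private:
  const std::optional<DecodedState> m_state;
};
```

The trade-off in this sketch is that decoding cost is paid up front even for clients that never ask a predicate, which may or may not matter given how short-lived these objects are said to be.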

lldb/source/Plugins/Disassembler/llvm/DisassemblerLLVMC.cpp, line 87:
I believe this preserves the existing behavior of the class. InstructionLLVMC conservatively says that instructions can branch, in the absence of information.

It seems that CalculateMnemonicOperandsAndComment only mutates m_opcode when the instruction size returned by:

size_t inst_size = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);

is zero. It is also unclear to me whether the mutating calls in CalculateMnemonicOperandsAndComment really do anything. They decode a value from the data, then put it back into m_opcode? Also, if "inst_size" is zero in Decode:

const size_t inst_size =
    mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
if (inst_size == 0)
  m_opcode.Clear();
else {
  m_opcode.SetOpcodeBytes(opcode_data, inst_size);
  m_is_valid = true;
}

Then m_opcode.Clear() is called and m_opcode won't contain anything, so I am guessing only architectures with fixed opcode sizes will be able to show ".long" or any of that kind of stuff? And only those will trigger mutating the opcode value in CalculateMnemonicOperandsAndComment?

I don't think it should be necessary to read the class in its entirety to understand when m_opcode is safe to use. However, as I'm not sure how the disassembler is called into, I don't think it's a good idea to refactor the whole thing right away.

Let's start with this simple change to drop some redundant code, and maybe revisit things later?

Revision Contents

Path: lldb/source/Plugins/Disassembler/llvm/DisassemblerLLVMC.cpp
Size: 126 lines

Diff 225728

lldb/source/Plugins/Disassembler/llvm/DisassemblerLLVMC.cpp

[72 lines not shown; enclosing context: private:]

   std::unique_ptr<llvm::MCSubtargetInfo> m_subtarget_info_up;
   std::unique_ptr<llvm::MCAsmInfo> m_asm_info_up;
   std::unique_ptr<llvm::MCContext> m_context_up;
   std::unique_ptr<llvm::MCDisassembler> m_disasm_up;
   std::unique_ptr<llvm::MCInstPrinter> m_instr_printer_up;
 };
 
 class InstructionLLVMC : public lldb_private::Instruction {
+private:
+  void VisitInstruction() {
+    // Be conservative. If we didn't understand the instruction, say it:
+    //   - Might branch
+    //   - Does not have a delay slot
+    //   - Is not a call
+    m_does_branch = eLazyBoolYes;
       [inline comment] clayborg: Why init this to eLazyBoolYes?
       [inline comment] vsk: I believe this preserves the existing behavior of the class. InstructionLLVMC conservatively says that instructions can branch, in the absence of information.
+    m_has_delay_slot = eLazyBoolNo;
+    m_is_call = eLazyBoolNo;
+
+    DisassemblerScope disasm(*this);
+    if (!disasm)
+      return;
+
+    DataExtractor data;
+    if (!m_opcode.GetData(data))
+      return;
+
+    bool is_alternate_isa;
+    lldb::addr_t pc = m_address.GetFileAddress();
+
+    DisassemblerLLVMC::MCDisasmInstance *mc_disasm_ptr =
+        GetDisasmToUse(is_alternate_isa, disasm);
+    const uint8_t *opcode_data = data.GetDataStart();
+    const size_t opcode_data_len = data.GetByteSize();
+    llvm::MCInst inst;
+    const size_t inst_size =
+        mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
+    if (inst_size == 0)
+      return;
+
+    m_does_branch = mc_disasm_ptr->CanBranch(inst) ? eLazyBoolYes : eLazyBoolNo;
+    m_has_delay_slot =
+        mc_disasm_ptr->HasDelaySlot(inst) ? eLazyBoolYes : eLazyBoolNo;
+    m_is_call = mc_disasm_ptr->IsCall(inst) ? eLazyBoolYes : eLazyBoolNo;
+  }
+
 public:
   InstructionLLVMC(DisassemblerLLVMC &disasm,
                    const lldb_private::Address &address,
                    AddressClass addr_class)
       : Instruction(address, addr_class),
         m_disasm_wp(std::static_pointer_cast<DisassemblerLLVMC>(
             disasm.shared_from_this())),
         m_does_branch(eLazyBoolCalculate), m_has_delay_slot(eLazyBoolCalculate),
         m_is_call(eLazyBoolCalculate), m_is_valid(false),
         m_using_file_addr(false) {}
 
   ~InstructionLLVMC() override = default;
 
   bool DoesBranch() override {
-    if (m_does_branch == eLazyBoolCalculate) {
-      DisassemblerScope disasm(*this);
-      if (disasm) {
-        DataExtractor data;
-        if (m_opcode.GetData(data)) {
-          bool is_alternate_isa;
-          lldb::addr_t pc = m_address.GetFileAddress();
-
-          DisassemblerLLVMC::MCDisasmInstance *mc_disasm_ptr =
-              GetDisasmToUse(is_alternate_isa, disasm);
-          const uint8_t *opcode_data = data.GetDataStart();
-          const size_t opcode_data_len = data.GetByteSize();
-          llvm::MCInst inst;
-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          // Be conservative, if we didn't understand the instruction, say it
-          // might branch...
-          if (inst_size == 0)
-            m_does_branch = eLazyBoolYes;
-          else {
-            const bool can_branch = mc_disasm_ptr->CanBranch(inst);
-            if (can_branch)
-              m_does_branch = eLazyBoolYes;
-            else
-              m_does_branch = eLazyBoolNo;
-          }
-        }
-      }
-    }
+    if (m_does_branch == eLazyBoolCalculate)
+      VisitInstruction();
     return m_does_branch == eLazyBoolYes;
   }
 
   bool HasDelaySlot() override {
-    if (m_has_delay_slot == eLazyBoolCalculate) {
-      DisassemblerScope disasm(*this);
-      if (disasm) {
-        DataExtractor data;
-        if (m_opcode.GetData(data)) {
-          bool is_alternate_isa;
-          lldb::addr_t pc = m_address.GetFileAddress();
-
-          DisassemblerLLVMC::MCDisasmInstance *mc_disasm_ptr =
-              GetDisasmToUse(is_alternate_isa, disasm);
-          const uint8_t *opcode_data = data.GetDataStart();
-          const size_t opcode_data_len = data.GetByteSize();
-          llvm::MCInst inst;
-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          // if we didn't understand the instruction, say it doesn't have a
-          // delay slot...
-          if (inst_size == 0)
-            m_has_delay_slot = eLazyBoolNo;
-          else {
-            const bool has_delay_slot = mc_disasm_ptr->HasDelaySlot(inst);
-            if (has_delay_slot)
-              m_has_delay_slot = eLazyBoolYes;
-            else
-              m_has_delay_slot = eLazyBoolNo;
-          }
-        }
-      }
-    }
+    if (m_has_delay_slot == eLazyBoolCalculate)
+      VisitInstruction();
     return m_has_delay_slot == eLazyBoolYes;
   }
 
   DisassemblerLLVMC::MCDisasmInstance *GetDisasmToUse(bool &is_alternate_isa) {
     DisassemblerScope disasm(*this);
     return GetDisasmToUse(is_alternate_isa, disasm);
   }

[695 lines not shown; enclosing context: if (Log *log =]

       log->PutString(ss.GetString());
     }
 
     return true;
   }
 
   bool IsCall() override {
-    if (m_is_call == eLazyBoolCalculate) {
-      DisassemblerScope disasm(*this);
-      if (disasm) {
-        DataExtractor data;
-        if (m_opcode.GetData(data)) {
-          bool is_alternate_isa;
-          lldb::addr_t pc = m_address.GetFileAddress();
-
-          DisassemblerLLVMC::MCDisasmInstance *mc_disasm_ptr =
-              GetDisasmToUse(is_alternate_isa, disasm);
-          const uint8_t *opcode_data = data.GetDataStart();
-          const size_t opcode_data_len = data.GetByteSize();
-          llvm::MCInst inst;
-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          if (inst_size == 0) {
-            m_is_call = eLazyBoolNo;
-          } else {
-            if (mc_disasm_ptr->IsCall(inst))
-              m_is_call = eLazyBoolYes;
-            else
-              m_is_call = eLazyBoolNo;
-          }
-        }
-      }
-    }
+    if (m_is_call == eLazyBoolCalculate)
+      VisitInstruction();
     return m_is_call == eLazyBoolYes;
   }
 
 protected:
   std::weak_ptr<DisassemblerLLVMC> m_disasm_wp;
   LazyBool m_does_branch;
   LazyBool m_has_delay_slot;
   LazyBool m_is_call;

[533 lines not shown]