This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] llvm-objdump: disassembling amdgcn object file
ClosedPublic

Authored by vpykhtin on Feb 8 2016, 10:47 AM.


Details

Reviewers

• tstellarAMD
Bigcheese

Commits

rGde04805e9f05: [AMDGPU] llvm-objdump: Minimal HSA Code Object disassembler support.
rGbd90c60afb86: [AMDGPU] llvm-objdump: Minimal HSA Code Object disassembler support.
rL265645: [AMDGPU] llvm-objdump: Minimal HSA Code Object disassembler support.
rL265550: [AMDGPU] llvm-objdump: Minimal HSA Code Object disassembler support.

Summary

Minimal ISA dumper for amdgcn object file.

A kernel symbol in the .hsatext ELF section refers to an amd_kernel_code_t runtime control structure (256 bytes) followed by the instructions, so the structure is skipped.
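
A minimal sketch of that skip, assuming the 256-byte size stated above and a kernel symbol whose value is a byte offset into the section. The helper name firstInstructionOffset is hypothetical and does not appear in the diff.

```cpp
#include <cassert>
#include <cstddef>

// Size of amd_kernel_code_t as stated in the summary.
constexpr std::size_t KernelCodeTSize = 256;

// Hypothetical helper: returns the offset of the first instruction byte for a
// kernel symbol at SymbolOffset in a section of SectionSize bytes.
// Returns SectionSize (an empty code range) if the header does not fit.
std::size_t firstInstructionOffset(std::size_t SymbolOffset,
                                   std::size_t SectionSize) {
  assert(SymbolOffset <= SectionSize && "symbol lies outside the section");
  if (SectionSize - SymbolOffset < KernelCodeTSize)
    return SectionSize;
  return SymbolOffset + KernelCodeTSize;
}
```

For a kernel symbol at offset 512 in a 1024-byte section this yields 768, so disassembly would start there rather than at the symbol itself.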

Diff Detail

Event Timeline

vpykhtin updated this revision to Diff 47224.Feb 8 2016, 10:47 AM

vpykhtin retitled this revision from to [AMDGPU] llvm-objdump, MCDisassembler: disassembling .hsatext section of HSA Code Object (draft).

vpykhtin updated this object.

vpykhtin added a reviewer: • tstellarAMD.

vpykhtin set the repository for this revision to rL LLVM.

vpykhtin added subscribers: nhaustov, SamWot.

Herald added a subscriber: aemerson. · Feb 8 2016, 10:47 AM

Is there some reason why you can't parse the code object from the AMDGPU implementation of getInstruction()?

In D16998#346610, @tstellarAMD wrote:

Is there some reason why you can't parse the code object from the AMDGPU implementation of getInstruction()?

MCDisassembler::getInstruction is declared as follows:

/// Returns the disassembly of a single instruction.
///
/// \param Instr    - An MCInst to populate with the contents of the
///                   instruction.
/// \param Size     - A value to populate with the size of the instruction, or
///                   the number of bytes consumed while attempting to decode
///                   an invalid instruction.
/// ...
virtual DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
                                    ArrayRef<uint8_t> Bytes, uint64_t Address,
                                    raw_ostream &VStream,
                                    raw_ostream &CStream) const = 0;

It is supposed to return a filled MCInst as its result, together with the number of bytes consumed.
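
The contract described above (a success flag plus a consumed-byte count) can be illustrated with a toy decode loop. This is only a sketch: toyGetInstruction and its encoding are hypothetical stand-ins, not the real MCDisassembler API. The fallback of skipping one 4-byte word on a zero-size failure mirrors what printInstructions in this diff does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for getInstruction. Toy encoding: a first byte of
// 0x01 starts a 4-byte instruction, 0x02 starts an 8-byte instruction, and
// anything else is invalid and consumes nothing.
bool toyGetInstruction(const std::vector<uint8_t> &Bytes, std::size_t Offset,
                       std::size_t &Size) {
  Size = 0;
  if (Offset >= Bytes.size())
    return false;
  const std::size_t Len =
      Bytes[Offset] == 0x01 ? 4 : Bytes[Offset] == 0x02 ? 8 : 0;
  if (Len == 0 || Bytes.size() - Offset < Len)
    return false;
  Size = Len;
  return true;
}

// Walks the byte stream and returns the start offsets of the instructions
// that decoded successfully.
std::vector<std::size_t> decodeAll(const std::vector<uint8_t> &Bytes) {
  std::vector<std::size_t> Starts;
  std::size_t Offset = 0;
  while (Offset < Bytes.size()) {
    std::size_t Size = 0;
    if (toyGetInstruction(Bytes, Offset, Size))
      Starts.push_back(Offset);
    else if (Size == 0)
      Size = 4; // skip one word so the loop always makes progress
    assert(Size > 0);
    Offset += Size;
  }
  return Starts;
}
```

Nothing in this loop can represent a non-instruction blob such as amd_kernel_code_t, which is the limitation the rest of this comment discusses.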

The current DisassembleObject just passes what it considers 'code' to the disassembler, so inside getInstruction we face the following:

We need to know whether we're actually inside an amd_kernel_code_t. Yes, we can analyze Address, but llvm-mc calls the disassembler on a raw stream of bytes and we need to take care of that too.
amd_kernel_code_t cannot be stored in an MCInst, but it is desirable to dump it too as part of the disassembly.
It would be a hack to try to reposition the reader by specifying a negative size in the case when amd_kernel_code_t is actually before amd_kernel_code_t.
We need to process all code symbols in the .hsatext section to find where every kernel ends, as the kernel size is not specified by design.
DisassembleObject knows nothing about ELF::STT_AMDGPU_HSA_KERNEL.

Hi,

In the loop that iterates over symbols and emits them, there is already some target specific code for AARCH64.
I would recommend adding a callback to MCDisassembler called something like disassembleSymbolStart() and call that function for the AMDGPU target, and then move the disassembly code for the amd_kernel_code_t into the AMDGPU backend.
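
The callback suggested above might look roughly like the following sketch. Only the name disassembleSymbolStart comes from the comment; the classes DisassemblerSketch and AMDGPUSketch, the signature, and the return convention are hypothetical and are not part of the real MCDisassembler interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical interface, standing in for a target disassembler.
struct DisassemblerSketch {
  virtual ~DisassemblerSketch() = default;
  // Called by the generic tool at the start of each symbol. Returns how many
  // leading bytes the target handled itself; 0 means nothing special.
  virtual std::size_t disassembleSymbolStart(const uint8_t *Bytes,
                                             std::size_t Size) const {
    (void)Bytes;
    (void)Size;
    return 0;
  }
};

// Hypothetical AMDGPU-like target: claims a 256-byte header if it fits.
struct AMDGPUSketch : DisassemblerSketch {
  std::size_t disassembleSymbolStart(const uint8_t *Bytes,
                                     std::size_t Size) const override {
    (void)Bytes;
    constexpr std::size_t HeaderSize = 256;
    return Size >= HeaderSize ? HeaderSize : 0;
  }
};

// Generic driver code: contains no target-specific knowledge.
std::size_t instructionStart(const DisassemblerSketch &D, const uint8_t *Bytes,
                             std::size_t Size) {
  const std::size_t Skipped = D.disassembleSymbolStart(Bytes, Size);
  assert(Skipped <= Size);
  return Skipped;
}
```

The point of the design is that the generic loop only sees a byte count, so the tool would not need to link against any particular target.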

In D16998#347911, @tstellarAMD wrote:

Hi,

In the loop that iterates over symbols and emits them, there is already some target specific code for AARCH64.
I would recommend adding a callback to MCDisassembler called something like disassembleSymbolStart() and call that function for the AMDGPU target, and then move the disassembly code for the amd_kernel_code_t into the AMDGPU backend.

For now I just moved all my code to the AMDGPUDisassembler library (llvm-objdump already links with it) and call DisassembleHSACodeObject at the beginning of llvm-objdump's DisassembleObject if Obj->getArch() == Triple::amdgcn. This still leaves llvm-objdump slightly dependent on the AMDGPU target, but I now think it's not worth touching the MCDisassembler interface until we know all the requirements. It's likely that we will want to disassemble the HSA code object completely in AMDGPU code, without depending on llvm-objdump.

AMDGPU-dependent code moved to the AMDGPUDisassembler library.
Disassembly output slightly enhanced so that it can be assembled again. A lot of TODOs remain, though.
Code bytes that cannot be disassembled are now skipped cleanly.

Herald added a subscriber: arsenm. · Feb 11 2016, 9:49 AM

vpykhtin added parent revisions: D17150: [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields, D17144: [AMDGPU] add AMDGPU target support to ELFObjectFile.h header.Feb 11 2016, 9:50 AM

vpykhtin added a parent revision: D16723: [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target..Feb 11 2016, 10:00 AM

arsenm added inline comments.Feb 11 2016, 11:47 AM

lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp
114	Would this be useful in the MCContext header?
151–153	This is dead code
159	The triple should be amdgcn-unknown-amdhsa. Why do you need to hardcode this here?
167–168	I don't think this should be able to happen. These checks should maybe all be asserts
209	Put declarations on separate lines
226	Single quotes
229	Single quotes
tools/llvm-objdump/llvm-objdump.cpp
834	Function should start with lowercase letter

vpykhtin added inline comments.Feb 11 2016, 12:23 PM

lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp
114	Yep, I would move it there.
151–153	Well, when NDEBUG is defined there is no DebugFlag, but it is tested below: if (!DisAsm->getInstruction(Inst, EatenBytesNum, Code.slice(Index), Index, DebugFlag ? dbgs() : nulls(), nulls())) { I just don't like interleaving code with macros a lot.
159	Probably no need to hardcode it, but I haven't yet found how to construct it properly.
167–168	ok.

vpykhtin added a project: Restricted Project.Feb 13 2016, 3:41 AM

• tstellarAMD added inline comments.Feb 24 2016, 6:48 PM

tools/llvm-objdump/llvm-objdump.cpp
840	You aren't going to be able to call target specific functions from a common tool like this. The only way this is going to work is with some kind of generic callback.

nhaustov added subscribers: ruiu, • rafael.Mar 2 2016, 1:54 AM

nhaustov added inline comments.Mar 9 2016, 5:36 AM

tools/llvm-objdump/llvm-objdump.cpp
840	llvm-objdump actually has custom code for AArch64 that tries to determine whether the bytes being disassembled are code or data: // AArch64 ELF binaries can interleave data and text in the // same section. We rely on the markers introduced to // understand what we need to dump. ... To me, the proposed code in a separate file looks cleaner and does not clutter llvm-objdump.cpp. Anyway, another question is whether it is really only AMDGPU that needs custom handling of parts of the .text segment. How do other platforms handle metadata needed for code startup? Any pointers? One reason to keep amd_kernel_code_t (metadata needed for kernel startup) close to the actual ISA is efficiency and cache concerns. We can probably still change it in a new format.

HSACodeObjectDisassembler class introduced
ELFNote support added
Target ISA determined based on amdgpu_hsa_note_isa_t ELF note (spec)
printing for assembler directives
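
As a rough illustration of the note lookup listed above, the sketch below walks a note section laid out as in the ELFNote struct of this diff: three 32-bit fields (namesz, descsz, type), then the name and the descriptor, each padded to 4 bytes. The function names are hypothetical, and the fields are assumed to be little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a 32-bit little-endian value byte by byte (host-independent).
static uint32_t readLE32(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

static std::size_t alignTo4(std::size_t N) { return (N + 3) & ~std::size_t(3); }

// Returns the byte offset of the descriptor of the first note with the given
// type, or -1 if there is no such note or the section is malformed.
long findNoteDesc(const std::vector<uint8_t> &Sec, uint32_t Type) {
  std::size_t Off = 0;
  while (Sec.size() - Off >= 12) { // 12 bytes = fixed note header
    const uint32_t NameSz = readLE32(&Sec[Off]);
    const uint32_t DescSz = readLE32(&Sec[Off + 4]);
    const uint32_t NoteType = readLE32(&Sec[Off + 8]);
    if (NameSz > Sec.size() || DescSz > Sec.size())
      return -1; // sizes cannot exceed the section
    const std::size_t DescOff = Off + 12 + alignTo4(NameSz);
    const std::size_t Next = DescOff + alignTo4(DescSz);
    if (Next > Sec.size())
      return -1; // note runs past the end of the section
    if (NoteType == Type)
      return long(DescOff);
    Off = Next;
  }
  return -1;
}
```

With NT_AMDGPU_HSA_ISA defined as 3 in the diff, findNoteDesc(Sec, 3) would locate the amdgpu_hsa_note_isa_t payload from which the CPU name is derived.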

Herald added a subscriber: rampitec. · Mar 9 2016, 11:14 AM

Testcase?

Added binary test

How was the test created? What is a ".co"?

In D16998#371942, @rafael wrote:

How was the test created? What is a ".co"?

".co" means CodeObject. It was created using a utility outside the LLVM tree. It is in the HSA Code Object v1.0 format, which is slightly different from what the AMDGPU assembler currently emits. The main differences are that the code symbol has zero size and that amd_kernel_code_t (the runtime control structure) and the ISA can be non-adjacent. The disassembler is supposed to accept both versions.

lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp
114	Let's do it in separate change later.

Adding quote from summary so it can be seen in notification emails:

Problem: this change contains a direct call from llvm-objdump to disassembleHSACodeObject, which is actually located in the AMDGPU library. This is incorrect, as it requires llvm-objdump to always link with the AMDGPU target. There should be a target callback somewhere that allows custom object file disassembly. One thought was to add such a callback (something like disassembleObjectFile(ObjFile, OutStream)) to the MCDisassembler interface, but that is too late: MCDisassembler is created once the target ISA specification is already known (it depends on MCSubtargetInfo), whereas here the target ISA spec has yet to be read from the object file.

In other words, this is an Elf_Rel file? Why do you need a new extension?

Right, Elf_Rel. Disassembling this format requires features different from what llvm-objdump currently offers, and these are really target-specific. In particular, we would like to print all custom directives and structures so that the output can be fed directly to the assembler, for example the amd_kernel_code_t runtime control structure definitions. Dumping this structure without an llvm-objdump->AMDGPU link dependency would require copy-pasting the target code. I'm not sure it's possible to create a sufficient number of target callbacks to accomplish this.

Another issue is with the HSA Code Object 1.0 format itself: a zero-sized kernel code symbol requires preprocessing the whole .hsatext section to find the start markers of every item in the section in order to calculate the kernel code size. This particular issue is fixed in what the AMDGPU assembler currently emits, but we would like the disassembler to accept old files too.
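
The start-marker preprocessing described above can be sketched as follows, loosely following getStartMarkers and getKernelCode in this diff: collect every kernel symbol offset and every code start, add the section size as a sentinel, sort, and take the upper bound of a code start as its end. KernelSketch and the function names are hypothetical simplifications.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one kernel in the code section.
struct KernelSketch {
  uint64_t SymbolOffset;    // where amd_kernel_code_t begins
  uint64_t CodeEntryOffset; // code start relative to SymbolOffset
};

// Builds the sorted list of every entity's start, with the section size
// appended as a final sentinel.
std::vector<uint64_t> buildStartMarkers(const std::vector<KernelSketch> &Kernels,
                                        uint64_t SectionSize) {
  std::vector<uint64_t> Markers;
  for (const KernelSketch &K : Kernels) {
    Markers.push_back(K.SymbolOffset);
    Markers.push_back(K.SymbolOffset + K.CodeEntryOffset);
  }
  Markers.push_back(SectionSize);
  std::sort(Markers.begin(), Markers.end());
  return Markers;
}

// The code that begins at CodeStart ends at the next marker strictly after it.
uint64_t codeEnd(const std::vector<uint64_t> &Markers, uint64_t CodeStart) {
  auto It = std::upper_bound(Markers.begin(), Markers.end(), CodeStart);
  assert(It != Markers.end() && "code start at or past the end of the section");
  return *It;
}
```

For two kernels at offsets 0 and 512 with 256-byte headers in a 1024-byte section, the markers are 0, 256, 512, 768 and 1024, so the first kernel's code spans 256 to 512 and the second's spans 768 to 1024.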

Made minimal possible change.

Limitations:

accepts only object files produced by the current AMDGPU assembler (no support for older formats)
only ISA dumping (no directives (notes), amd_kernel_code_t, etc.)
mcpu must be specified explicitly when running llvm-objdump (no autodetection)

Added test using llvm-mc | llvm-objdump check.

Bigcheese requested changes to this revision.Mar 17 2016, 1:36 PM

Bigcheese added a reviewer: Bigcheese.

Bigcheese added a subscriber: Bigcheese.

Bigcheese added inline comments.

tools/llvm-objdump/llvm-objdump.cpp
395	This should be using an endian-specific type: either llvm::support::ulittle32_t or ubig32_t.
1061	Same with the endianness.

This revision now requires changes to proceed.Mar 17 2016, 1:36 PM

Thanks for the review!

Updated the diff with issues fixed.

ping.

lgtm

This revision is now accepted and ready to land.Apr 5 2016, 4:19 PM

Closed by commit rL265550: [AMDGPU] llvm-objdump: Minimal HSA Code Object disassembler support. (authored by vpykhtin). · Apr 6 2016, 9:00 AM

This revision was automatically updated to reflect the committed changes.

I reverted it for now: instruction bytes are printed in big-endian order on ppc despite the use of support::ulittle32_t. This needs to be fixed.
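
For illustration only, one host-independent way to print little-endian instruction words is to assemble each word from its bytes explicitly. The sketch below assumes 32-bit words and the " %08X" style used by the listing code in this diff. The names loadLE32 and formatWords are hypothetical, and this is not taken from the fix that was eventually committed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Builds a 32-bit value from four bytes stored in little-endian order.
// The result does not depend on the byte order of the host.
uint32_t loadLE32(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

// Formats consecutive little-endian 32-bit words as " XXXXXXXX" groups.
std::string formatWords(const std::vector<uint8_t> &Bytes) {
  assert(Bytes.size() % 4 == 0 && "expected whole 32-bit words");
  std::string Out;
  for (std::size_t I = 0; I < Bytes.size(); I += 4) {
    char Buf[16];
    std::snprintf(Buf, sizeof(Buf), " %08X",
                  static_cast<unsigned>(loadLE32(&Bytes[I])));
    Out += Buf;
  }
  return Out;
}
```

The bytes 7E 00 80 BF format as " BF80007E" on both little-endian and big-endian hosts, which is the property the ppc failure was missing.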

Revision Contents

Path	Size
lib/Target/AMDGPU/Disassembler/CMakeLists.txt	2 lines
lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp	524 lines
tools/llvm-objdump/llvm-objdump.cpp	12 lines

Diff 50161

lib/Target/AMDGPU/Disassembler/CMakeLists.txt

	include_directories( ${CMAKE_CURRENT_BINARY_DIR}/.. ${CMAKE_CURRENT_SOURCE_DIR}/.. )			include_directories( ${CMAKE_CURRENT_BINARY_DIR}/.. ${CMAKE_CURRENT_SOURCE_DIR}/.. )

	add_llvm_library(LLVMAMDGPUDisassembler			add_llvm_library(LLVMAMDGPUDisassembler
	AMDGPUDisassembler.cpp			AMDGPUDisassembler.cpp
				HSAObjCodeDisassembler.cpp
	)			)

	add_dependencies(LLVMAMDGPUDisassembler AMDGPUCommonTableGen)			add_dependencies(LLVMAMDGPUDisassembler AMDGPUCommonTableGen)
				add_dependencies(LLVMAMDGPUDisassembler LLVMAMDGPUUtils)

lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp

This file was added.

				//===--------------------- HSAObjCodeDisassembler.cpp ---------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file - disassembly of HSA Code Object file.
				//
				//===----------------------------------------------------------------------===//

				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
				#include "Utils/AMDKernelCodeTUtils.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCContext.h"
				#include <llvm/MC/MCDisassembler/MCDisassembler.h>
				#include <llvm/MC/MCInst.h>
				#include <llvm/MC/MCInstPrinter.h>
				#include <llvm/MC/MCInstrInfo.h>
				#include <llvm/MC/MCObjectFileInfo.h>
				#include "llvm/MC/MCRegisterInfo.h"
				#include <llvm/Object/ELFObjectFile.h>
				#include <llvm/Support/TargetRegistry.h>


				using namespace llvm;
				using namespace object;


				// AMD GPU Note Type Enumeration Values.
				#define NT_AMDGPU_HSA_CODE_OBJECT_VERSION 1
				#define NT_AMDGPU_HSA_HSAIL 2
				#define NT_AMDGPU_HSA_ISA 3
				#define NT_AMDGPU_HSA_PRODUCER 4
				#define NT_AMDGPU_HSA_PRODUCER_OPTIONS 5
				#define NT_AMDGPU_HSA_EXTENSION 6
				#define NT_AMDGPU_HSA_HLDEBUG_DEBUG 101
				#define NT_AMDGPU_HSA_HLDEBUG_TARGET 102

				LLVM_PACKED_START

				typedef struct amdgpu_hsa_note_code_object_version_s {
				uint32_t major_version;
				uint32_t minor_version;
				} amdgpu_hsa_note_code_object_version_t;

				typedef struct amdgpu_hsa_note_isa_s {
				uint16_t vendor_name_size;
				uint16_t architecture_name_size;
				uint32_t major;
				uint32_t minor;
				uint32_t stepping;
				char vendor_and_architecture_name[1];
				} amdgpu_hsa_note_isa_t;

				LLVM_PACKED_END

				StringRef getVendorName(const amdgpu_hsa_note_isa_t& ISA) {
				return StringRef(ISA.vendor_and_architecture_name,
				ISA.vendor_name_size - 1);
				}
				StringRef getArchName(const amdgpu_hsa_note_isa_t& ISA) {
				return StringRef(ISA.vendor_and_architecture_name + ISA.vendor_name_size,
				ISA.architecture_name_size - 1);
				}

				template <typename T>
				static ArrayRef<T> trimTrailingZeroes(ArrayRef<T> A, size_t Limit) {
				const auto SizeLimit = (Limit < A.size()) ? (A.size() - Limit) : 0;
				while (A.size() > SizeLimit && !A.back())
				A = A.drop_back();
				return A;
				}

				// TODO: Move this to ArrayRef.h
				template <typename NewT, typename OldT>
				ArrayRef<NewT> makeArrayRef(ArrayRef<OldT> Ref) {
				const auto NumBytes = Ref.size() * sizeof(OldT);
				assert(0 == (NumBytes % sizeof(NewT)));
				return makeArrayRef((const NewT*)Ref.data(), NumBytes / sizeof(NewT));
				}

				// TODO: Move this to elf headers
				struct ELFNote {
				uint32_t namesz;
				uint32_t descsz;
				uint32_t type;

				ELFNote() = delete;
				ELFNote(const ELFNote&) = delete;
				void operator=(const ELFNote&) = delete;

				enum { ALIGN = 4 };

				const char* name() const {
				return reinterpret_cast<const char*>(this) + sizeof(*this);
				}
				const char* desc() const {
				return name() + alignTo(namesz, ALIGN);
				}
				template <typename D>
				ErrorOr<const D&> as() const {
				if (descsz < sizeof(D))
				return make_error_code(object_error::parse_failed);
				return *reinterpret_cast<const D*>(desc());
				}
				size_t size() const {
				return sizeof(*this) + alignTo(namesz, ALIGN) + descsz;
				}
				};

				const ELFNote* getNext(const ELFNote& N) {
				return reinterpret_cast<const ELFNote*>(
				N.desc() + alignTo(N.descsz, ELFNote::ALIGN));
				}

				// TODO: move this template somewhere to include/object
				template <typename Item>
				class const_varsize_item_iterator :
				std::iterator<std::forward_iterator_tag, const Item, void> {
				ArrayRef<uint8_t> Ref;

				const Item *item() const {
				return reinterpret_cast<const Item*>(Ref.data());
				}
				size_t getItemPadSize() const {
				assert(Ref.size() >= sizeof(Item));
				return (const uint8_t*)getNext(*item()) - (const uint8_t*)item();
				}

				public:
				const_varsize_item_iterator() {}
				const_varsize_item_iterator(ArrayRef<uint8_t> Ref_) : Ref(Ref_) {}

				bool valid() const {
				return Ref.size() >= sizeof(Item) && Ref.size() >= getItemPadSize();
				}

				ErrorOr<const Item&> operator*() const {
				if (!valid())
				return make_error_code(object_error::parse_failed);
				return *item();
				}

				bool operator==(const const_varsize_item_iterator &Other) const {
				return (Ref.size() == Other.Ref.size()) &&
				(Ref.empty() || Ref.data() == Other.Ref.data());
				}

				bool operator!=(const const_varsize_item_iterator &Other) const {
				return !(*this == Other);
				}

				const_varsize_item_iterator &operator++() { // preincrement
				Ref = Ref.slice(Ref.size() > sizeof(Item) ?
				(std::min)(getItemPadSize(), Ref.size()) :
				Ref.size());
				return *this;
				}
				};


				class HSACodeObject {
				public:
				typedef ELF64LEObjectFile MyELF;
				typedef MyELF::Elf_Sym Elf_Sym;
				const MyELF& Obj;

				HSACodeObject(const MyELF *Obj_) : Obj(*Obj_) {}

				auto symbols() const -> decltype(Obj.symbols()) { return Obj.symbols(); }

				const Elf_Sym* getELFSymbol(SymbolRef Symbol) const {
				return Obj.getSymbol(Symbol.getRawDataRefImpl());
				}

				ArrayRef<uint8_t> getSectionContentsAsArray(uint32_t SecIdx) const {
				auto ELF = Obj.getELFFile();
				// TODO: check ErrorOr
				return ELF->getSectionContentsAsArray<uint8_t>(ELF->getSection(SecIdx));
				}

				int getSectionIdx(StringRef SecName) const;

				typedef const_varsize_item_iterator<ELFNote> const_elf_note_iterator;

				iterator_range<const_elf_note_iterator> notes() const {
				const int Idx = getSectionIdx(".note");
				return Idx >= 0 ? notes(Idx) :
				make_range(const_elf_note_iterator(), const_elf_note_iterator());
				}

				iterator_range<const_elf_note_iterator> notes(int SecIdx) const {
				const auto SecData = getSectionContentsAsArray(SecIdx);
				return make_range(const_elf_note_iterator(SecData),
				const_elf_note_iterator());
				}

				const Elf_Sym* toKernelSym(SymbolRef Symbol) const {
				auto ElfSym = getELFSymbol(Symbol);
				return (ElfSym->getType() == ELF::STT_AMDGPU_HSA_KERNEL) ?
				ElfSym : nullptr;
				}

				const amd_kernel_code_t* getAMDKernelCodeT(const Elf_Sym* ElfSym) const {
				assert(ElfSym->getType() == ELF::STT_AMDGPU_HSA_KERNEL);
				const auto SecBytes = getSectionContentsAsArray(ElfSym->st_shndx);
				const uint64_t Ofs = ElfSym->getValue();
				return reinterpret_cast<const amd_kernel_code_t*>(SecBytes.data() + Ofs);
				}

				uint64_t getKernelStartOffset(const Elf_Sym* ElfSym) const {
				assert(ElfSym->getType() == ELF::STT_AMDGPU_HSA_KERNEL);
				return ElfSym->getValue() +
				getAMDKernelCodeT(ElfSym)->kernel_code_entry_byte_offset;
				}

				ArrayRef<uint32_t> getKernelCode(const Elf_Sym* ElfSym) const;

				private:
				mutable SmallVector<uint64_t, 8> StartMarkers;

				const decltype(StartMarkers)& getStartMarkers() const;
				};

				int HSACodeObject::getSectionIdx(StringRef SecName) const {
				auto ELF = Obj.getELFFile();
				int Idx = 0;
				for (auto S : ELF->sections()) {
				// TODO: handle ELF error
				if (*ELF->getSectionName(&S) == SecName)
				return Idx;
				++Idx;
				}
				return -1;
				}

				ArrayRef<uint32_t>
				HSACodeObject::getKernelCode(const Elf_Sym* ElfSym) const {
				assert(ElfSym->getType() == ELF::STT_AMDGPU_HSA_KERNEL);

				auto &M = getStartMarkers();
				// TODO: check CodeStart/CodeEnd alignment
				const uint64_t CodeStart = getKernelStartOffset(ElfSym);
				const uint64_t CodeEnd = *std::upper_bound(M.begin(), M.end(), CodeStart);

				auto SecBytes = getSectionContentsAsArray(ElfSym->st_shndx);
				return makeArrayRef<uint32_t>(SecBytes.slice(CodeStart,
				CodeEnd - CodeStart));
				}

				// StartMarkers is a sorted array of every entity's start in the code section.
				// We determine an entity's end using an upper bound on this array because
				// there is no kernel code size specified by design.
				const decltype(HSACodeObject::StartMarkers)&
				HSACodeObject::getStartMarkers() const {
				if (!StartMarkers.empty()) return StartMarkers;

				const int HsaTextSecIdx = getSectionIdx(".hsatext");
				if (HsaTextSecIdx < 0) return StartMarkers;

				for (auto &Symbol : Obj.symbols()) {
				auto ElfSym = getELFSymbol(Symbol);
				if (ElfSym->getType() != ELF::STT_AMDGPU_HSA_KERNEL ||
				ElfSym->st_shndx != HsaTextSecIdx)
				continue;
				StartMarkers.push_back(ElfSym->getValue());
				StartMarkers.push_back(getKernelStartOffset(ElfSym));
				}
				StartMarkers.push_back(getSectionContentsAsArray(HsaTextSecIdx).size());
				array_pod_sort(StartMarkers.begin(), StartMarkers.end());
				return StartMarkers;
				}

				/////////////////////////////////////
				// TODO: move this to MCContext

				class OwningMCContext : MCContext {
				std::unique_ptr<const MCRegisterInfo> MRI;
				std::unique_ptr<const MCAsmInfo> AsmInfo;

				OwningMCContext(decltype(MRI) &&MRI_,
				decltype(AsmInfo) &&AsmInfo_,
				const MCObjectFileInfo *MOFI,
				const SourceMgr *Mgr,
				bool DoAutoReset)
				: MCContext(AsmInfo_.get(), MRI_.get(), MOFI, Mgr, DoAutoReset)
				, MRI(std::move(MRI_))
				, AsmInfo(std::move(AsmInfo_)) {}

				friend std::unique_ptr<MCContext> createMCContext(
				const Target *TheTarget,
				StringRef TripleName,
				const MCObjectFileInfo *MOFI,
				const SourceMgr *Mgr,
				bool DoAutoReset);
				};

				std::unique_ptr<MCContext> createMCContext(const Target *TheTarget,
				StringRef TripleName,
				const MCObjectFileInfo *MOFI = nullptr,
				const SourceMgr *Mgr = nullptr,
				bool DoAutoReset = true) {
				decltype(OwningMCContext::MRI)
				MRI(TheTarget->createMCRegInfo(TripleName));
				if (!MRI)
				report_fatal_error("error: no register info for target " + TripleName);

				decltype(OwningMCContext::AsmInfo)
				AsmInfo(TheTarget->createMCAsmInfo(*MRI, TripleName));
				if (!AsmInfo)
				report_fatal_error("error: no assembly info for target " + TripleName);

				return std::unique_ptr<MCContext>(new OwningMCContext(std::move(MRI),
				std::move(AsmInfo),
				MOFI, Mgr, DoAutoReset));
				}

				//
				///////////////////////////////////////////

				class HSACodeObjectDisassembler {
				HSACodeObject Obj;
				raw_ostream &OS;
				raw_ostream &ES;

				mutable std::unique_ptr<MCContext> Ctx;
				mutable std::unique_ptr<const MCSubtargetInfo> STI;
				mutable std::unique_ptr<MCDisassembler> DisAsm;
				mutable std::unique_ptr<const MCInstrInfo> MII;
				mutable std::unique_ptr<MCInstPrinter> IP;

				const amdgpu_hsa_note_isa_s* findISANote() const;
				void init(const amdgpu_hsa_note_isa_s*) const;
				void printHeader() const;
				void printInstructions(ArrayRef<uint32_t> Code, uint64_t Address) const;

				void print(const amd_kernel_code_t* C) const;
				void print(const amdgpu_hsa_note_code_object_version_t&) const;
				void print(const amdgpu_hsa_note_isa_t&) const;

				template <typename T>
				void print(const ErrorOr<T>& S, StringRef ErrStr) const;

				public:
				HSACodeObjectDisassembler(const ELF64LEObjectFile *Obj_,
				raw_ostream &OS_,
				raw_ostream &ES_)
				: Obj(Obj_), OS(OS_), ES(ES_) {}

				void print() const;
				};

				const amdgpu_hsa_note_isa_s* HSACodeObjectDisassembler::findISANote() const {
				for (const auto &Note : Obj.notes()) {
				if (!Note) break;
				if (Note->type == NT_AMDGPU_HSA_ISA) {
				const auto ISANote = Note->as<amdgpu_hsa_note_isa_s>();
				return ISANote ? &ISANote.get() : nullptr;
				}
				}
				return nullptr;
				}

				static StringRef getCPUName(const amdgpu_hsa_note_isa_s* ISA) {
				if (ISA)
				switch (ISA->major) {
				case 7: return "kaveri";
				case 8:
				switch (ISA->minor) {
				case 0: return ISA->stepping > 0 ? "carrizo" : "kaveri";
				case 1: return "stoney";
				}
				break;
				case 9: return "stoney";
				}
				return "";
				}

				void HSACodeObjectDisassembler::init(const amdgpu_hsa_note_isa_s* ISA) const{
				const Target * const TheTarget = &TheGCNTarget;
				const StringRef TripleName = "amdgcn-unknown-amdhsa";

				Ctx = createMCContext(TheTarget, TripleName);

				STI.reset(TheTarget->createMCSubtargetInfo(TripleName, getCPUName(ISA), ""));
				DisAsm.reset(TheTarget->createMCDisassembler(*STI, *Ctx));

				MII.reset(TheTarget->createMCInstrInfo());
				IP.reset(TheTarget->createMCInstPrinter(Triple(TripleName),
				Ctx->getAsmInfo()->getAssemblerDialect(),
				*Ctx->getAsmInfo(), *MII, *Ctx->getRegisterInfo()));
				}

				void HSACodeObjectDisassembler::print() const {
				printHeader();
				init(findISANote());
				OS << "\n.hsatext\n\n";
				for (auto Symbol : Obj.symbols()) {
				auto KernelSym = Obj.toKernelSym(Symbol);
				if (!KernelSym) continue;

				OS << ".amdgpu_hsa_kernel " << *Symbol.getName() << "\n"
				<< *Symbol.getName() << ":\n";

				const auto KC = Obj.getAMDKernelCodeT(KernelSym);
				print(KC);

				auto Code = trimTrailingZeroes(Obj.getKernelCode(KernelSym), 256/4);
				if (!Code.empty()) {
				printInstructions(Code, KernelSym->getValue() +
				KC->kernel_code_entry_byte_offset);
				OS << '\n';
				}
				}
				}

				template <typename T>
				void HSACodeObjectDisassembler::print(const ErrorOr<T>& S, StringRef ErrStr) const {
				if (!S) {
				ES << "failed to read " << ErrStr;
				return;
				}
				print(*S);
				}

				void HSACodeObjectDisassembler::printHeader() const {
				for (const auto &Note : Obj.notes()) {
				if (!Note) break;
				switch (Note->type) {
				case NT_AMDGPU_HSA_CODE_OBJECT_VERSION:
				print(Note->as<amdgpu_hsa_note_code_object_version_t>(),
				"amdgpu_hsa_note_code_object_version_t");
				break;
				case NT_AMDGPU_HSA_ISA:
				print(Note->as<amdgpu_hsa_note_isa_s>(),
				"amdgpu_hsa_note_isa_s");
				break;
				default: break;
				}
				}
				}

				void HSACodeObjectDisassembler::print(
				const amdgpu_hsa_note_code_object_version_t& V) const {
				OS << ".hsa_code_object_version " << V.major_version
				<< "," << V.minor_version << '\n';
				}

				void HSACodeObjectDisassembler::print(const amdgpu_hsa_note_isa_t& ISA) const {
				OS << ".hsa_code_object_isa "
				<< ISA.major
				<< "," << ISA.minor
				<< "," << ISA.stepping
				<< ",\"" << getVendorName(ISA)
				<< "\",\"" << getArchName(ISA) << "\"\n";
				}

				void HSACodeObjectDisassembler::print(const amd_kernel_code_t* C) const {
				OS << " .amd_kernel_code_t\n";
				dumpAmdKernelCode(C, OS, " ");
				OS << " .end_amd_kernel_code_t\n";
				}

				void HSACodeObjectDisassembler::printInstructions(ArrayRef<uint32_t> Code,
				uint64_t Address) const {
				#ifdef NDEBUG
				const bool DebugFlag = false;
				#endif
				OS << "// Disassembly:\n";
				SmallString<40> InstStr, CommentStr;
				uint64_t Index = 0;
				while (Index < Code.size()) {
				InstStr.clear();
				raw_svector_ostream IS(InstStr);
				CommentStr.clear();
				raw_svector_ostream CS(CommentStr);

				MCInst Inst;
				uint64_t EatenBytesNum = 0;
				if (DisAsm->getInstruction(Inst, EatenBytesNum,
				makeArrayRef<uint8_t>(Code.slice(Index)),
				Address,
				DebugFlag ? dbgs() : nulls(),
				CS)) {
				IP->printInst(&Inst, IS, "", DisAsm->getSubtargetInfo());
				} else {
				IS << "\t// unrecognized instruction ";
				if (EatenBytesNum == 0)
				EatenBytesNum = 4;
				}
				assert(0 == EatenBytesNum % 4);

				OS << left_justify(IS.str(), 60) << format("// %012X:", Address);
				for (auto D : Code.slice(Index, EatenBytesNum / 4))
				OS << format(" %08X", D);

				if (!CS.str().empty())
				OS << " // " << CS.str();

				OS << '\n';
				OS.flush();

				Address += EatenBytesNum;
				Index += EatenBytesNum / 4;
				}
				}

				namespace llvm {

				// called from llvm-objdump
				void disassembleHSACodeObject(const ObjectFile *Obj,
				raw_ostream &OS,
				raw_ostream &ES) {
				if (Obj->getArch() != Triple::amdgcn) return;
				auto CObj = static_cast<const ELF64LEObjectFile*>(Obj);
				HSACodeObjectDisassembler(CObj, OS, ES).print();
				}

				} // end namespace llvm

tools/llvm-objdump/llvm-objdump.cpp

[... 386 lines not shown ...]
}		}

template <class ELFT>		template <class ELFT>
static std::error_code getRelocationValueString(const ELFObjectFile<ELFT> *Obj,		static std::error_code getRelocationValueString(const ELFObjectFile<ELFT> *Obj,
const RelocationRef &RelRef,		const RelocationRef &RelRef,
SmallVectorImpl<char> &Result) {		SmallVectorImpl<char> &Result) {
DataRefImpl Rel = RelRef.getRawDataRefImpl();		DataRefImpl Rel = RelRef.getRawDataRefImpl();

typedef typename ELFObjectFile<ELFT>::Elf_Sym Elf_Sym;		typedef typename ELFObjectFile<ELFT>::Elf_Sym Elf_Sym;
typedef typename ELFObjectFile<ELFT>::Elf_Shdr Elf_Shdr;		typedef typename ELFObjectFile<ELFT>::Elf_Shdr Elf_Shdr;
typedef typename ELFObjectFile<ELFT>::Elf_Rela Elf_Rela;		typedef typename ELFObjectFile<ELFT>::Elf_Rela Elf_Rela;

const ELFFile<ELFT> &EF = *Obj->getELFFile();		const ELFFile<ELFT> &EF = *Obj->getELFFile();

ErrorOr<const Elf_Shdr *> SecOrErr = EF.getSection(Rel.d.a);		ErrorOr<const Elf_Shdr *> SecOrErr = EF.getSection(Rel.d.a);
if (std::error_code EC = SecOrErr.getError())		if (std::error_code EC = SecOrErr.getError())
return EC;		return EC;
[... 422 lines not shown ...]
	if (Type == MachO::X86_64_RELOC_UNSIGNED && Rel.d.a > 0) {
if (PrevType == MachO::X86_64_RELOC_SUBTRACTOR)		if (PrevType == MachO::X86_64_RELOC_SUBTRACTOR)
return true;		return true;
}		}
}		}

return false;		return false;
}		}

		namespace llvm {
		void disassembleHSACodeObject(const ObjectFile *Obj,
		raw_ostream &OS,
		raw_ostream &ES);
		}

static void DisassembleObject(const ObjectFile *Obj, bool InlineRelocs) {		static void DisassembleObject(const ObjectFile *Obj, bool InlineRelocs) {

		if (Obj->getArch() == Triple::amdgcn) {
		disassembleHSACodeObject(Obj, outs(), errs());
		return;
		}

const Target *TheTarget = getTarget(Obj);		const Target *TheTarget = getTarget(Obj);

// Package up features to be passed to target/subtarget		// Package up features to be passed to target/subtarget
std::string FeaturesStr;		std::string FeaturesStr;
if (MAttrs.size()) {		if (MAttrs.size()) {
SubtargetFeatures Features;		SubtargetFeatures Features;
for (unsigned i = 0; i != MAttrs.size(); ++i)		for (unsigned i = 0; i != MAttrs.size(); ++i)
Features.AddFeature(MAttrs[i]);		Features.AddFeature(MAttrs[i]);
[... 198 lines not shown ...]
	for (unsigned si = 0, se = Symbols.size(); si != se; ++si) {
outs() << '\n' << Symbols[si].second << ":\n";		outs() << '\n' << Symbols[si].second << ":\n";

#ifndef NDEBUG		#ifndef NDEBUG
raw_ostream &DebugOut = DebugFlag ? dbgs() : nulls();		raw_ostream &DebugOut = DebugFlag ? dbgs() : nulls();
#else		#else
raw_ostream &DebugOut = nulls();		raw_ostream &DebugOut = nulls();
#endif		#endif

for (Index = Start; Index < End; Index += Size) {		for (Index = Start; Index < End; Index += Size) {
MCInst Inst;		MCInst Inst;

// AArch64 ELF binaries can interleave data and text in the		// AArch64 ELF binaries can interleave data and text in the
// same section. We rely on the markers introduced to		// same section. We rely on the markers introduced to
// understand what we need to dump.		// understand what we need to dump.
if (Obj->isELF() && Obj->getArch() == Triple::aarch64) {		if (Obj->isELF() && Obj->getArch() == Triple::aarch64) {
uint64_t Stride = 0;		uint64_t Stride = 0;

[... 622 lines not shown ...]