This is an archive of the discontinued LLVM Phabricator instance.


Differential D83725

[llvm-mc] Add --doc-id=<id> to support multiple documents in a file
AbandonedPublic

Authored by MaskRay on Jul 13 2020, 3:32 PM.


Details

Reviewers

dblaikie
echristo
grimar
jhenderson

Summary

llvm-mc tests tend to either (a) be split into multiple files or (b) combine too
much in one file when people don't like splitting.

yaml2obj supports --docnum=<num> to allow multiple documents in a file.
Combining tests prudently can improve readability. This patch adds a similar
--doc-id=<id> to llvm-mc.

Usage: llvm-mc --doc-id=aa %s

#-- aa
test aa
#-- bb
test bb

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	470 ms	linux > AddressSanitizer-x86_64-linux.TestCases/Linux::Unknown Unit Message ("")
	460 ms	linux > SanitizerCommon-asan-x86_64-Linux.Linux::Unknown Unit Message ("")
	270 ms	linux > SanitizerCommon-lsan-x86_64-Linux.Linux::Unknown Unit Message ("")
	350 ms	linux > SanitizerCommon-msan-x86_64-Linux.Linux::Unknown Unit Message ("")
	420 ms	linux > SanitizerCommon-tsan-x86_64-Linux.Linux::Unknown Unit Message ("")
		View Full Test Results (6 Failed)

Event Timeline

MaskRay created this revision. Jul 13 2020, 3:32 PM

Herald added a project: Restricted Project. Jul 13 2020, 3:32 PM

Herald added a subscriber: llvm-commits.

yaml2obj supports --docnum=<num> to allow multiple documents in a file. Combining tests prudently can improve readability. This patch adds a similar --doc-id=<id> to llvm-mc.

Presumably "doc" makes sense in the yaml2obj context - perhaps "obj" would make more sense in llvm-mc

But I'm not sure it's the best direction - we have lots of input files (testing IR, testing C++, etc) where we don't have this ability to have multiple distinct inputs in one file & it seems to work OK? Is there something different about llvm-mc that motivates this feature here especially?

Harbormaster failed remote builds in B64062: Diff 277593! Jul 13 2020, 4:41 PM

In D83725#2148760, @dblaikie wrote:

yaml2obj supports --docnum=<num> to allow multiple documents in a file. Combining tests prudently can improve readability. This patch adds a similar --doc-id=<id> to llvm-mc.

I am open to an alternative to "doc". llvm-mc supports both assembly and disassembly (both are in textual forms). I'd expect "obj" to refer to a binary form input, which isn't the case for llvm-mc.

Presumably "doc" makes sense in the yaml2obj context - perhaps "obj" would make more sense in llvm-mc

But I'm not sure it's the best direction - we have lots of input files (testing IR, testing C++, etc) where we don't have this ability to have multiple distinct inputs in one file & it seems to work OK? Is there something different about llvm-mc that motivates this feature here especially?

If we want to test local things (how IR is lowered, how instructions are assembled, how bytes are disassembled), whatever the containing function is named does not matter. Say, we have a function foo, and we want to test a variant of it, we can just add another function bar, duplicating the content in foo and adding some variance.

However, this works poorly for some "global resources". For example, if we want to test the content of .strtab, the order of section headers, etc. We can't say: the first 3 input lines test the case of 3 symbols, the last 4 input lines test the case with 4 symbols - .symtab is a singleton, if you want to test 4 symbols, the 3 symbol case is not tested any more.

What we currently do is to add another test file, with some variation to the first test file. It is not rare for me to diff two test files and find the differences. It'd be easier if two similar things are tested in one file, with a comment explaining the differences.

In D83725#2148813, @MaskRay wrote:

In D83725#2148760, @dblaikie wrote:

yaml2obj supports --docnum=<num> to allow multiple documents in a file. Combining tests prudently can improve readability. This patch adds a similar --doc-id=<id> to llvm-mc.

I am open to an alternative to "doc". llvm-mc supports both assembly and disassembly (both are in textual forms). I'd expect "obj" to refer to a binary form input, which isn't the case for llvm-mc.

Fair point - 'asm', perhaps? (think that's probably fine/close enough for the disassembly that people aren't going to be too confused by it)

Presumably "doc" makes sense in the yaml2obj context - perhaps "obj" would make more sense in llvm-mc

But I'm not sure it's the best direction - we have lots of input files (testing IR, testing C++, etc) where we don't have this ability to have multiple distinct inputs in one file & it seems to work OK? Is there something different about llvm-mc that motivates this feature here especially?

If we want to test local things (how IR is lowered, how instructions are assembled, how bytes are disassembled), whatever the containing function is named does not matter. Say, we have a function foo, and we want to test a variant of it, we can just add another function bar, duplicating the content in foo and adding some variance.

However, this works poorly for some "global resources". For example, if we want to test the content of .strtab, the order of section headers, etc. We can't say: the first 3 input lines test the case of 3 symbols, the last 4 input lines test the case with 4 symbols - .symtab is a singleton, if you want to test 4 symbols, the 3 symbol case is not tested any more.

What we currently do is to add another test file, with some variation to the first test file. It is not rare for me to diff two test files and find the differences. It'd be easier if two similar things are tested in one file, with a comment explaining the differences.

Fair enough - thanks for the explanation! I'll probably leave it to some other folks who work more directly with llvm-mc than I do to weigh in on the design here, then.

To give an example, let's discuss MC/ELF/debug-md5.s and MC/ELF/debug-md5-err.s. There are two things worth noting:

a) The error cases are in a separate file. Once you add an invalid .file, assembling the whole file fails, so every RUN line has to be RUN: not llvm-mc %s. Thus, people tend to place good cases in one file (so that they can test with RUN: llvm-mc %s) and bad cases in another file. For some features, having both good and bad tests in one file may improve readability.
b) .debug_line is a global resource. Whenever we add a (valid) .file, we contribute an entry to the global resource. If we want to test some characteristics when include_directories[0] is A and other characteristics when include_directories[0] is B, we have to use another file.

Would it make sense to split this into a separate utility, so you could use (eg)

# RUN: extract bb %s | llvm-mc - 2>&1 | FileCheck %s --check-prefix=BB

in general, for any tool that can read from stdin? Or even

# RUN: extract bb %s -o %t.bb
# RUN: llvm-mc %t.bb 2>&1 | FileCheck %t.bb

... to consider only the CHECK lines in the extracted region?

In D83725#2148926, @rsmith wrote:
Would it make sense to split this into a separate utility, so you could use (eg)
# RUN: extract bb %s | llvm-mc - 2>&1 | FileCheck %s --check-prefix=BB
in general, for any tool that can read from stdin? Or even
# RUN: extract bb %s -o %t.bb
# RUN: llvm-mc %t.bb 2>&1 | FileCheck %t.bb
... to consider only the CHECK lines in the extracted region?

Teach the extract utility about comment markers of common file extensions (.s, .ll, .c, .cpp)? (To make editors happy, the separator should be a comment in that file (e.g. # ; // --- etc))

(I considered an in-utility option first because the syntax is the simplest. extract bb %s -o %t.bb is a bit long but I can accept it)

In D83725#2148934, @MaskRay wrote:

In D83725#2148926, @rsmith wrote:

Would it make sense to split this into a separate utility[...]?

Teach the extract utility about comment markers of common file extensions (.s, .ll, .c, .cpp)? (To make editors happy, the separator should be a comment in that file (e.g. # ; // --- etc))

I was thinking that you could scan the source file for lines containing your separator (eg ---) and assume the comment marker is whatever comes before it. That seems to work well enough for lit (looking for RUN: etc, preceded by anything).
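
The scanning idea above can be sketched in standalone C++17 as follows. This is only an illustration under stated assumptions, not the code of any LLVM tool: the function name splitParts, the separator text "--- " and the map-based return type are all hypothetical. A line counts as a separator when it contains "--- <name>", and whatever precedes "---" is treated as the comment marker and ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Hypothetical helper: split Input into named parts. Any line containing
// "--- <name>" starts part <name>; the text before "---" (e.g. "#", ";",
// "//") is assumed to be the comment marker and is not inspected.
std::map<std::string, std::string> splitParts(std::string_view Input) {
  constexpr std::string_view Sep = "--- ";
  std::map<std::string, std::string> Parts;
  std::string *Current = nullptr; // part currently being filled, if any

  for (std::size_t Pos = 0; Pos < Input.size();) {
    std::size_t Eol = Input.find('\n', Pos);
    if (Eol == std::string_view::npos)
      Eol = Input.size();
    std::string_view Line = Input.substr(Pos, Eol - Pos);
    Pos = Eol + 1;

    std::size_t At = Line.find(Sep);
    if (At != std::string_view::npos && At + Sep.size() < Line.size()) {
      // Separator line: the rest of the line is the part name.
      Current = &Parts[std::string(Line.substr(At + Sep.size()))];
      continue;
    }
    if (Current) { // lines before the first separator are dropped
      Current->append(Line);
      Current->push_back('\n');
    }
  }
  return Parts;
}

// Usage example mixing two comment styles in one input.
void checkSplitParts() {
  auto Parts = splitParts("; --- aa\ndefine void @f()\n// --- bb\nint x;\n");
  assert(Parts.size() == 2);
  assert(Parts["aa"] == "define void @f()\n");
  assert(Parts["bb"] == "int x;\n");
}
```

One limitation of this sketch is that any content line that happens to contain "--- " followed by text is also taken as a separator, and a repeated name appends to the existing part instead of being reported.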

Just chiming in to say +1 to something that achieves the end goal of being able to have multiple assembly files in the same input. A similar motivation I sometimes run into is wanting multiple input files to a tool, e.g. LLD or llvm-readobj, for testing some behaviour. I end up doing one of three things in this situation: 1) adding a separate file in the "Inputs" directory - this is not great because the test input is far away from the actual test (i.e. not in the same file), making it harder to follow; 2) echoing the second and later inputs to separate files at runtime - this is not great because it has a runtime cost; 3) a recent one I've used occasionally is the .if directive to generate different outputs from the same input - this isn't great as it can be somewhat confusing/hard to read at times.

I don't mind which approach is taken (separate tool/addition to llvm-mc), as long as it is simple to use and reasonably efficient.

llvm/tools/llvm-mc/llvm-mc.cpp
53–54	This mentions `---` but the code uses `--`. I'd actually prefer `---` probably, but it's minor, so happy to go with a different approach.

I like the aim of this patch and personally I'd slightly prefer to build this functionality into llvm-mc rather than have a utility,
because it might make test cases shorter and a bit simpler.

One point from me: yaml2obj by default has --docnum=0, i.e. you can either specify --docnum or omit it and get the same result.
But llvm-mc should not have such behavior. So I just want to ensure that when no --doc-id is requested, llvm-mc leaves the input
untouched. I see that is what this patch implements already, but there is no test for this case?

llvm/tools/llvm-mc/llvm-mc.cpp
317	Consider making this more generic Expected<std::unique_ptr<MemoryBuffer>> extractDocWithID(const MemoryBuffer& Buf, unsigned ID).
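
A side-effect-free variant along the lines suggested above might look like the following standalone C++17 sketch. It deliberately avoids LLVM types so that it compiles on its own: DocResult and extractDoc are hypothetical names, std::optional plus an error string stands in for Expected<...>, and the id is kept as a string (as in the patch) rather than the unsigned ID of the suggestion. The error messages mirror the ones checked in doc-id.s, and an empty id returns the input unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical result type: Text is set on success, Error on failure.
struct DocResult {
  std::optional<std::string> Text;
  std::string Error;
};

// Return the lines following "#-- <Id>" up to the next "#-- " line or EOF.
// Errors are reported to the caller instead of calling exit(1).
DocResult extractDoc(std::string_view Input, std::string_view Id) {
  if (Id.empty()) // no id requested: leave the input untouched
    return {std::string(Input), ""};

  constexpr std::string_view Marker = "#-- ";
  std::optional<std::string> Doc;
  bool InDoc = false;

  for (std::size_t Pos = 0; Pos < Input.size();) {
    std::size_t Eol = Input.find('\n', Pos);
    if (Eol == std::string_view::npos)
      Eol = Input.size();
    std::string_view Line = Input.substr(Pos, Eol - Pos);
    Pos = Eol + 1;

    if (Line.substr(0, Marker.size()) == Marker) {
      if (Line.substr(Marker.size()) != Id) {
        InDoc = false; // a different document starts here
      } else if (Doc) {
        return {std::nullopt,
                "'#-- " + std::string(Id) + "' occurred more than once"};
      } else {
        Doc.emplace();
        InDoc = true;
      }
      continue;
    }
    if (InDoc) {
      Doc->append(Line);
      Doc->push_back('\n');
    }
  }

  if (!Doc)
    return {std::nullopt, "'#-- " + std::string(Id) + "' not found"};
  return {std::move(*Doc), ""};
}

// Usage example based on the input shown in the summary.
void checkExtractDoc() {
  const std::string In = "#-- aa\ntest aa\n#-- bb\ntest bb\n";
  assert(extractDoc(In, "aa").Text == "test aa\n");
  assert(extractDoc(In, "bb").Text == "test bb\n");
  assert(extractDoc(In, "zz").Error == "'#-- zz' not found");
}
```

Returning the error lets the caller decide how to report it, which is the main point of the suggestion; unlike the patch, this sketch scans the whole input even after the requested document has ended so that a later duplicate is still detected.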

Wouldn't using .ifdef directives with --defsym on the command line work equally well? At least for .s files run through llvm-mc.
There are examples of this in the lit tests already, easy enough to find.

In D83725#2150173, @probinson wrote:

Wouldn't using .ifdef directives with --defsym on the command line work equally well? At least for .s files run through llvm-mc.
There are examples of this in the lit tests already, easy enough to find.

I somehow forgot this. .ifdef ERR + llvm-mc --defsym=ERR=1 seems convenient enough (e.g. D83751) so I am dropping this patch.

Created D83834 for the standalone utility 'extract'.

MaskRay abandoned this revision. Jul 14 2020, 5:34 PM

Revision Contents

	Path	Size
	llvm/test/tools/llvm-mc/doc-id-disassemble.s	15 lines
	llvm/test/tools/llvm-mc/doc-id.s	26 lines
	llvm/tools/llvm-mc/llvm-mc.cpp	41 lines

Diff 277593

llvm/test/tools/llvm-mc/doc-id-disassemble.s

This file was added.

# REQUIRES: x86-registered-target
# RUN: llvm-mc -triple=x86_64 --disassemble --doc-id=aa %s | \
# RUN:   FileCheck %s --check-prefix=AA --implicit-check-not=retq

# AA: nop

# RUN: llvm-mc -triple=x86_64 --disassemble --doc-id=bb %s | \
# RUN:   FileCheck %s --check-prefix=BB --implicit-check-not=nop

# BB: retq

#-- aa
0x90
#-- bb
0xc3

llvm/test/tools/llvm-mc/doc-id.s

This file was added.

# RUN: llvm-mc --doc-id=aa --preserve-comments %s 2>&1 | FileCheck %s --check-prefix=AA --implicit-check-not=warning:

# AA: {{.*}}.s:1:1: warning: aa
# AA: # Comments are preserved.

# RUN: llvm-mc --doc-id=bb %s 2>&1 | FileCheck %s --check-prefix=BB --implicit-check-not=warning:

# BB: {{.*}}.s:2:1: warning: bb

# RUN: not llvm-mc --doc-id=dup %s 2>&1 | FileCheck %s --check-prefix=DUP

# DUP: error: {{.*}}.s: '#-- dup' occurred more than once

# RUN: not llvm-mc --doc-id=not_exist %s 2>&1 | FileCheck %s --check-prefix=NOT_EXIST

# NOT_EXIST: error: {{.*}}.s: '#-- not_exist' not found

#-- aa
.warning "aa"
# Comments are preserved.
#-- bb

.warning "bb"
#-- dup
.warning "dup"
#-- dup

llvm/tools/llvm-mc/llvm-mc.cpp

[... 26 unchanged lines hidden ...]
  #include "llvm/MC/MCSubtargetInfo.h"
  #include "llvm/MC/MCTargetOptionsCommandFlags.h"
  #include "llvm/Support/CommandLine.h"
  #include "llvm/Support/Compression.h"
  #include "llvm/Support/FileUtilities.h"
  #include "llvm/Support/FormattedStream.h"
  #include "llvm/Support/Host.h"
  #include "llvm/Support/InitLLVM.h"
+ #include "llvm/Support/LineIterator.h"
  #include "llvm/Support/MemoryBuffer.h"
+ #include "llvm/Support/SmallVectorMemoryBuffer.h"
  #include "llvm/Support/SourceMgr.h"
  #include "llvm/Support/TargetRegistry.h"
  #include "llvm/Support/TargetSelect.h"
  #include "llvm/Support/ToolOutputFile.h"
  #include "llvm/Support/WithColor.h"

  using namespace llvm;

  static mc::RegisterMCTargetOptionsFlags MOF;

  static cl::opt<std::string>
  InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));

+ static cl::opt<std::string> DocId(
+     "doc-id",
+     cl::desc("Treat input as separated by \"--- \", feed the part specified "
+              "by \"--- <id>\" to assembler/disassembler"),
+     cl::value_desc("id"));
+
  static cl::opt<std::string> OutputFilename("o", cl::desc("Output filename"),
                                             cl::value_desc("filename"),
                                             cl::init("-"));

  static cl::opt<std::string> SplitDwarfFile("split-dwarf-file",
                                             cl::desc("DWO output filename"),
                                             cl::value_desc("filename"));

[... 244 unchanged lines hidden; the context below is the end of static int AssembleInput(const char *ProgName, const Target *TheTarget, ...) ...]
    Parser->setTargetParser(*TAP);
    Parser->getLexer().setLexMasmIntegers(LexMasmIntegers);

    int Res = Parser->Run(NoInitialTextSection);

    return Res;
  }

+ static void handleDocId(std::unique_ptr<MemoryBuffer> &Buf, StringRef ProgName,
+                         StringRef FileName) {
+   const char *DocBegin = nullptr, *DocEnd = nullptr;
+   for (line_iterator I(*Buf, /*SkipBlanks=*/false, '\0'); !I.is_at_eof();) {
+     StringRef Line = *I++;
+     if (!Line.consume_front("#-- "))
+       continue;
+     if (Line == DocId.getValue()) {
+       if (DocBegin) {
+         WithColor::error(errs(), ProgName)
+             << FileName << ": '#-- " << Line << "' occurred more than once\n";
+         exit(1);
+       }
+       if (I.is_at_eof())
+         break;
+       DocBegin = I->data();
+     } else if (DocBegin && !DocEnd) {
+       DocEnd = Line.data();
+     }
+   }
+   if (!DocBegin) {
+     WithColor::error(errs(), ProgName)
+         << FileName << ": '#-- " << DocId << "' not found\n";
+     exit(1);
+   }
+   if (!DocEnd)
+     DocEnd = Buf->getBufferEnd();
+   Buf.reset(new SmallVectorMemoryBuffer(SmallVector<char, 0>(DocBegin, DocEnd),
+                                         Buf->getBufferIdentifier()));
+ }

  int main(int argc, char **argv) {
    InitLLVM X(argc, argv);

    // Initialize targets and assembly printers/parsers.
    llvm::InitializeAllTargetInfos();
    llvm::InitializeAllTargetMCs();
    llvm::InitializeAllAsmParsers();
    llvm::InitializeAllDisassemblers();
[... 17 unchanged lines hidden ...]

    ErrorOr<std::unique_ptr<MemoryBuffer>> BufferPtr =
        MemoryBuffer::getFileOrSTDIN(InputFilename);
    if (std::error_code EC = BufferPtr.getError()) {
      WithColor::error(errs(), ProgName)
          << InputFilename << ": " << EC.message() << '\n';
      return 1;
    }
+   if (!DocId.empty())
+     handleDocId(*BufferPtr, ProgName, InputFilename);
    MemoryBuffer *Buffer = BufferPtr->get();

    SourceMgr SrcMgr;

    // Tell SrcMgr about this buffer, which is what the parser will pick up.
    SrcMgr.AddNewSourceBuffer(std::move(*BufferPtr), SMLoc());

    // Record the location of the include directories so that the lexer can find
[... 206 unchanged lines hidden ...]