This is an archive of the discontinued LLVM Phabricator instance.


Differential D39845

[TableGen] Give the option of tolerating duplicate register names
ClosedPublic

Authored by asb on Nov 9 2017, 7:22 AM.

Details

Reviewers

stoklund
kparzysz
jyknight
venkatra

Commits

rGd590c85753be: [TableGen] Give the option of tolerating duplicate register names
rL320018: [TableGen] Give the option of tolerating duplicate register names

Summary

A number of architectures re-use the same register names (e.g. for both 32-bit FPRs and 64-bit FPRs). They are currently unable to use the tablegen'erated MatchRegisterName and MatchRegisterAltName, as tablegen (when built with asserts enabled) will fail.

When the AllowDuplicateRegisterNames bit in AsmParser is set, duplicate register names are tolerated. A backend can then coerce registers to the desired register class by, for instance, implementing validateTargetOperandClass.

At least the in-tree Sparc backend could benefit from this, as does RISC-V (single and double precision floating point registers).
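To illustrate the coercion mentioned in the summary, here is a minimal standalone sketch. It is not code from this patch or from D39895: the register ids, the operand classes, the RegOperand type and both function names are hypothetical stand-ins for what TableGen and a real asm parser would provide. It only shows the shape of the idea: the name lookup yields the 32-bit alias, and a hook modeled on the role of validateTargetOperandClass rewrites it when a 64-bit register is wanted.

```cpp
// Hypothetical register ids; a real backend gets these from TableGen.
enum RegId : unsigned { NoReg = 0, F0_32, F0_64, F1_32, F1_64 };

// Hypothetical operand classes standing in for generated match class kinds.
enum OperandClass { FPR32Class, FPR64Class };

enum MatchResult { Match_Success, Match_InvalidOperand };

// Hypothetical parsed register operand.
struct RegOperand {
  unsigned RegNum;
};

// Map a 32-bit FPR id to the 64-bit register that shares its name.
// Returns NoReg when R is not a 32-bit FPR.
unsigned convertFPR32ToFPR64(unsigned R) {
  switch (R) {
  case F0_32: return F0_64;
  case F1_32: return F1_64;
  default:    return NoReg;
  }
}

// Called only after the default class check has failed. If the matcher wants
// a 64-bit FPR and the operand holds the 32-bit alias, rewrite it in place.
MatchResult validateOperandClass(RegOperand &Op, OperandClass Kind) {
  if (Kind != FPR64Class)
    return Match_InvalidOperand;
  unsigned Wide = convertFPR32ToFPR64(Op.RegNum);
  if (Wide == NoReg)
    return Match_InvalidOperand;
  Op.RegNum = Wide;
  return Match_Success;
}
```

With these definitions, an operand holding F1_32 that is validated against FPR64Class ends up holding F1_64.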

Diff Detail

Repository: rL LLVM

Event Timeline

asb created this revision. Nov 9 2017, 7:22 AM

Herald added a subscriber: fedor.sergeev. Nov 9 2017, 7:22 AM

asb added a child revision: D39895: [RISCV] MC layer support for the standard RV32D instruction set extension. Nov 10 2017, 3:39 AM

sdardis added a subscriber: sdardis. Nov 10 2017, 3:54 AM

Could you please add some tests?

lib/TableGen/StringMatcher.cpp, line 53 (on Diff #122246): While you are at it, could you also please fix the spacing here? std::string Indent(IndentCount * 2 + 4, ' ');

This looks fine. Do you plan to add a test case?

Updated to add a TableGen test for the generation of MatchRegisterName and MatchRegisterAltName when AllowDuplicateRegisterNames is set.

This looks reasonable to me. The only comment I have is that the patch does not make it clear what numeric register id exactly will be returned if there are several of them matching a given string. Currently, the numeric match is unique, but once we make it non-unique, the code in a target with "reused" names will need to handle getting a "wrong" register id. Let's take the testcase as an example---the user would need to write something like this:

if (Want32Bit) {
  // If we got a 64-bit register, map it to the 32-bit counterpart.
  switch (Reg) {
    case R0_64:
      Reg = R0_32;
      break;
    case R1_64:
      ...
  }
} else if (Want64Bit) {
  // If we got a 32-bit register, map it to the 64-bit counterpart.
  switch (Reg) {
    case R0_32:
      Reg = R0_64;
      break;
    case R1_32:
      ...
  }
}

This may be somewhat inconvenient, so I'm wondering whether some extra support for this could be implemented as well. What I have in mind is something like:

if (Want64Bit)
 Reg = givenAnyRepresentativeGiveMeTheCorrespondingRegisterThatIWant(Reg, IWant64BitRegister);

The only issue is how to specify the "IWant64BitRegister" part. This will be used in the asm parser, so it has to work with the MC layer alone (so, no TargetRegisterInfo, etc.), so register classes probably wouldn't work. (At first I thought about something like getCounterpartFromRegClass(Reg, MyTarget::R64RegClass)). Maybe adding some kind of an extra flag to "Register" would work?

let Namespace = "Arch" in {
class ArchReg<string n, list <string> alt, list <RegAltNameIndex> altidx, string disambiguator>
    : Register<n, disambiguator> {
  let AltNames = alt;
  let RegAltNameIndices = altidx;
}

foreach i = 0-3 in {
  def R#i#_32 : ArchReg<"r"#i, ["x"#i], [ABIRegAltName], "32 bit">;
  def R#i#_64 : ArchReg<"r"#i, ["x"#i], [ABIRegAltName], "64 bit">;
}
} // Namespace = "Arch"

Then have TableGen generate a function like

unsigned disambiguate(unsigned Reg, StringRef Str) {
  for (R in possible candidates)
    if (R.Disambiguator == Str)   // This is some fake syntax, but you get the idea.
      return R.Id;
}

And in the asm parser, use it as

if (Want64Bit)
  Reg = disambiguate(Reg, "64 bit");

It wouldn't have to be string-based, this is just an illustration.

All of this would only apply to targets that set the flag "AllowDuplicateRegisterNames". For all other targets, the disambiguating flag could be unset and ignored, and such a function wouldn't be generated.
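For readers who want the "fake syntax" above in compilable form, here is a rough sketch of what such a generated disambiguate function could look like. Nothing here exists in TableGen: the RegInfo table, the string tags and the register ids are all assumptions made up for illustration, and a real generator would more likely emit a switch than a linear scan.

```cpp
#include <cstring>

// Hypothetical register ids, as TableGen might number them.
enum RegId : unsigned { NoReg = 0, R0_32, R0_64, R1_32, R1_64 };

// Hypothetical per-register record carrying the proposed disambiguator tag.
struct RegInfo {
  unsigned Id;
  const char *AsmName;
  const char *Disambiguator;
};

static const RegInfo RegTable[] = {
    {R0_32, "r0", "32 bit"}, {R0_64, "r0", "64 bit"},
    {R1_32, "r1", "32 bit"}, {R1_64, "r1", "64 bit"},
};

// Given any register id, return the id of the register that shares its
// assembly name and carries the tag Tag. Returns NoReg if there is none.
unsigned disambiguate(unsigned Reg, const char *Tag) {
  const char *Name = nullptr;
  for (const RegInfo &R : RegTable)
    if (R.Id == Reg)
      Name = R.AsmName;
  if (!Name)
    return NoReg;
  for (const RegInfo &R : RegTable)
    if (std::strcmp(R.AsmName, Name) == 0 &&
        std::strcmp(R.Disambiguator, Tag) == 0)
      return R.Id;
  return NoReg;
}
```

The lookup works from either alias, so the caller does not need to know which id the name matcher happened to return.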

Just to add to my last comment---this patch is fine as is, but it would be nice to make the selection of the correct register easier.

Hi Krzysztof, I agree that further tablegen support for selecting aliased registers could be helpful. For what it's worth, you can see how this sort of case is handled for RV32D parsing in D39895. My main concern is creating false generality: adding a new tablegen feature that seems general-purpose but actually has a very very narrow set of sensible uses.

In my case, a tablegen'erated conversion function based on the FPR32 reg being a subregister of the FPR64 reg would be sufficient, but this may not work for aliased register names in other contexts.

Makes sense. LGTM.

When you commit, could you add an explicit clarification to the comment in Target.td that MatchRegisterName and MatchRegisterAltName can return any id associated with the name (i.e. that there are no guarantees as to which specific numeric id will be returned in case of multiple matches)?

This revision is now accepted and ready to land. Dec 5 2017, 7:56 AM

@kparzysz: that's a good point. In my use of this, I actually have been assuming that the returned id is predictable. Currently, register ids seem to be assigned based on sorting register def names, so R0_32 is always preferred to R0_64 if they alias. It could be worth documenting and testing this? Or do you think it's better to leave it undefined and update my RV32D patch so it's coded more defensively? (It will at least currently assert if an unexpected register id is returned.)
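The observed behaviour can be modelled in a few lines. This is only a toy model, under the assumption stated in the comment above that ids are assigned from 1 upward in sorted def-name order and that the first entry wins among duplicates. It is not how TableGen is implemented: the real StringMatcher emits nested switch statements and calls report_fatal_error rather than throwing. The function names are made up for the sketch.

```cpp
#include <algorithm>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// (def name, assembly name) for one register.
using RegDef = std::pair<std::string, std::string>;

// Assumption: ids start at 1 and follow sorted def-name order. Among
// registers sharing an assembly name, the first one seen is kept.
std::map<std::string, unsigned> buildNameTable(std::vector<RegDef> Defs,
                                               bool IgnoreDuplicates) {
  std::sort(Defs.begin(), Defs.end());
  std::map<std::string, unsigned> Table;
  unsigned NextId = 1;
  for (const RegDef &D : Defs) {
    unsigned Id = NextId++;
    bool Inserted = Table.emplace(D.second, Id).second;
    if (!Inserted && !IgnoreDuplicates)
      throw std::runtime_error("Had duplicate keys to match on");
  }
  return Table;
}

// The eight registers from the test case: R0_32, R0_64, ..., R3_64.
std::vector<RegDef> testRegisters() {
  std::vector<RegDef> Defs;
  for (int I = 0; I < 4; ++I) {
    std::string N = std::to_string(I);
    Defs.push_back({"R" + N + "_32", "r" + N});
    Defs.push_back({"R" + N + "_64", "r" + N});
  }
  return Defs;
}
```

Under that assumption, the table maps r0, r1, r2 and r3 to 1, 3, 5 and 7, which agrees with the values in the test's CHECK lines.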

I think that leaving the returned value unspecified is better, at least for now. Otherwise, we'd need to invent a way to identify the "preferred" register, and then it would need to be preserved by any future modifications to the MatchRegisterName et al. functions.

In D39845#945646, @kparzysz wrote:

I think that leaving the returned value unspecified is better, at least for now. Otherwise, we'd need to invent a way to identify the "preferred" register, and then it would need to be preserved by any future modifications to the MatchRegisterName et al. functions.

I don't think passing a "preferred" register class (or similar) to MatchRegisterName is ever really going to work, at least not without some fairly invasive changes to the asm parser to allow that information to be threaded through. Coercing a register later on is much easier. D39895 currently relies on the fact that MatchRegisterName and MatchRegisterAltName prefer to return the F*_32 registers because they sort before F*_64. It would be pretty easy to check the current behaviour in a test, thus ensuring it doesn't change. But for now, let's leave it undefined as you suggest.

Sorry, I wasn't clear in my last comment. This comment doesn't add any new information, it's just a clarification.

By "preferred" I meant having some kind of an implicit ordering of the registers, and out of several candidates, the "preferred" one would, for example, be the first register in that order. The order could be based on the order in which they were defined in the .td file, or lexicographic sorting of their symbolic names (not assembly names), etc. My argument was that there is no intuitively clear way to establish such an ordering, and if we came up with something, then we'd need to make sure that MatchRegisterName returned a value in accordance with that order (which seems like an unnecessary constraint).
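Since the returned id is left unspecified, target code is safest when it accepts either alias. A small sketch of that defensive style follows. The register ids, the Alias table and the coerce function are hypothetical and not taken from D39895 or any in-tree backend.

```cpp
// Hypothetical register ids; a real backend gets these from TableGen.
enum RegId : unsigned { NoReg = 0, F0_32, F0_64, F1_32, F1_64 };

enum class Width { W32, W64 };

// Each entry pairs the two registers that share one assembly name.
struct Alias {
  unsigned Narrow;
  unsigned Wide;
};

static const Alias Aliases[] = {{F0_32, F0_64}, {F1_32, F1_64}};

// Return the register of the requested width that aliases Reg. This works
// whichever of the two ids the name matcher returned. Returns NoReg if Reg
// is not part of any alias pair.
unsigned coerce(unsigned Reg, Width Want) {
  for (const Alias &A : Aliases)
    if (Reg == A.Narrow || Reg == A.Wide)
      return Want == Width::W32 ? A.Narrow : A.Wide;
  return NoReg;
}
```

Because the lookup is symmetric, a later change to which id MatchRegisterName prefers would not affect callers written this way.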

Closed by commit rL320018: [TableGen] Give the option of tolerating duplicate register names (authored by asb). Dec 7 2017, 1:52 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/TableGen/StringMatcher.h	7 lines
llvm/trunk/include/llvm/Target/Target.td	8 lines
llvm/trunk/lib/TableGen/StringMatcher.cpp	25 lines
llvm/trunk/test/TableGen/AllowDuplicateRegisterNames.td	86 lines
llvm/trunk/utils/TableGen/AsmMatcherEmitter.cpp	8 lines

Diff 125907

llvm/trunk/include/llvm/TableGen/StringMatcher.h

Show All 37 Lines	private:
const std::vector<StringPair> &Matches;		const std::vector<StringPair> &Matches;
raw_ostream &OS;		raw_ostream &OS;

public:		public:
StringMatcher(StringRef strVariableName,		StringMatcher(StringRef strVariableName,
const std::vector<StringPair> &matches, raw_ostream &os)		const std::vector<StringPair> &matches, raw_ostream &os)
: StrVariableName(strVariableName), Matches(matches), OS(os) {}		: StrVariableName(strVariableName), Matches(matches), OS(os) {}

void Emit(unsigned Indent = 0) const;		void Emit(unsigned Indent = 0, bool IgnoreDuplicates = false) const;

private:		private:
bool EmitStringMatcherForChar(const std::vector<const StringPair*> &Matches,		bool EmitStringMatcherForChar(const std::vector<const StringPair *> &Matches,
unsigned CharNo, unsigned IndentCount) const;		unsigned CharNo, unsigned IndentCount,
		bool IgnoreDuplicates) const;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TABLEGEN_STRINGMATCHER_H		#endif // LLVM_TABLEGEN_STRINGMATCHER_H

llvm/trunk/include/llvm/Target/Target.td

Show First 20 Lines • Show All 1,168 Lines • ▼ Show 20 Lines	class AsmParser {
// Set to true if the target needs a generated 'alternative register name'		// Set to true if the target needs a generated 'alternative register name'
// matcher.		// matcher.
//		//
// This generates a function which can be used to lookup registers from		// This generates a function which can be used to lookup registers from
// their aliases. This function will fail when called on targets where		// their aliases. This function will fail when called on targets where
// several registers share the same alias (i.e. not a 1:1 mapping).		// several registers share the same alias (i.e. not a 1:1 mapping).
bit ShouldEmitMatchRegisterAltName = 0;		bit ShouldEmitMatchRegisterAltName = 0;

		// Set to true if MatchRegisterName and MatchRegisterAltName functions
		// should be generated even if there are duplicate register names. The
		// target is responsible for coercing aliased registers as necessary
		// (e.g. in validateTargetOperandClass), and there are no guarantees about
		// which numeric register identifier will be returned in the case of
		// multiple matches.
		bit AllowDuplicateRegisterNames = 0;

// HasMnemonicFirst - Set to false if target instructions don't always		// HasMnemonicFirst - Set to false if target instructions don't always
// start with a mnemonic as the first token.		// start with a mnemonic as the first token.
bit HasMnemonicFirst = 1;		bit HasMnemonicFirst = 1;

// ReportMultipleNearMisses -		// ReportMultipleNearMisses -
// When 0, the assembly matcher reports an error for one encoding or operand		// When 0, the assembly matcher reports an error for one encoding or operand
// that did not match the parsed instruction.		// that did not match the parsed instruction.
// When 1, the assmebly matcher returns a list of encodings that were close		// When 1, the assmebly matcher returns a list of encodings that were close

llvm/trunk/lib/TableGen/StringMatcher.cpp

Show All 40 Lines	FindFirstNonCommonLetter(const std::vector<const
return Matches[0]->first.size();		return Matches[0]->first.size();
}		}

/// EmitStringMatcherForChar - Given a set of strings that are known to be the		/// EmitStringMatcherForChar - Given a set of strings that are known to be the
/// same length and whose characters leading up to CharNo are the same, emit		/// same length and whose characters leading up to CharNo are the same, emit
/// code to verify that CharNo and later are the same.		/// code to verify that CharNo and later are the same.
///		///
/// \return - True if control can leave the emitted code fragment.		/// \return - True if control can leave the emitted code fragment.
bool StringMatcher::		bool StringMatcher::EmitStringMatcherForChar(
EmitStringMatcherForChar(const std::vector<const StringPair*> &Matches,		const std::vector<const StringPair *> &Matches, unsigned CharNo,
unsigned CharNo, unsigned IndentCount) const {		unsigned IndentCount, bool IgnoreDuplicates) const {
assert(!Matches.empty() && "Must have at least one string to match!");		assert(!Matches.empty() && "Must have at least one string to match!");
std::string Indent(IndentCount*2+4, ' ');		std::string Indent(IndentCount * 2 + 4, ' ');

// If we have verified that the entire string matches, we're done: output the		// If we have verified that the entire string matches, we're done: output the
// matching code.		// matching code.
if (CharNo == Matches[0]->first.size()) {		if (CharNo == Matches[0]->first.size()) {
assert(Matches.size() == 1 && "Had duplicate keys to match on");		if (Matches.size() > 1 && !IgnoreDuplicates)
		report_fatal_error("Had duplicate keys to match on");

// If the to-execute code has \n's in it, indent each subsequent line.		// If the to-execute code has \n's in it, indent each subsequent line.
StringRef Code = Matches[0]->second;		StringRef Code = Matches[0]->second;

std::pair<StringRef, StringRef> Split = Code.split('\n');		std::pair<StringRef, StringRef> Split = Code.split('\n');
OS << Indent << Split.first << "\t // \"" << Matches[0]->first << "\"\n";		OS << Indent << Split.first << "\t // \"" << Matches[0]->first << "\"\n";

Code = Split.second;		Code = Split.second;
while (!Code.empty()) {		while (!Code.empty()) {
Show All 27 Lines	if (MatchesByLetter.size() == 1) {
} else {		} else {
// Do the comparison with if memcmp(Str.data()+1, "foo", 3).		// Do the comparison with if memcmp(Str.data()+1, "foo", 3).
// FIXME: Need to escape general strings.		// FIXME: Need to escape general strings.
OS << Indent << "if (memcmp(" << StrVariableName << ".data()+" << CharNo		OS << Indent << "if (memcmp(" << StrVariableName << ".data()+" << CharNo
<< ", \"" << Matches[0]->first.substr(CharNo, NumChars) << "\", "		<< ", \"" << Matches[0]->first.substr(CharNo, NumChars) << "\", "
<< NumChars << ") != 0)\n";		<< NumChars << ") != 0)\n";
OS << Indent << " break;\n";		OS << Indent << " break;\n";
}		}

return EmitStringMatcherForChar(Matches, FirstNonCommonLetter, IndentCount);		return EmitStringMatcherForChar(Matches, FirstNonCommonLetter, IndentCount,
		IgnoreDuplicates);
}		}

// Otherwise, we have multiple possible things, emit a switch on the		// Otherwise, we have multiple possible things, emit a switch on the
// character.		// character.
OS << Indent << "switch (" << StrVariableName << "[" << CharNo << "]) {\n";		OS << Indent << "switch (" << StrVariableName << "[" << CharNo << "]) {\n";
OS << Indent << "default: break;\n";		OS << Indent << "default: break;\n";

for (std::map<char, std::vector<const StringPair*>>::iterator LI =		for (std::map<char, std::vector<const StringPair*>>::iterator LI =
MatchesByLetter.begin(), E = MatchesByLetter.end(); LI != E; ++LI) {		MatchesByLetter.begin(), E = MatchesByLetter.end(); LI != E; ++LI) {
// TODO: escape hard stuff (like \n) if we ever care about it.		// TODO: escape hard stuff (like \n) if we ever care about it.
OS << Indent << "case '" << LI->first << "':\t // "		OS << Indent << "case '" << LI->first << "':\t // "
<< LI->second.size() << " string";		<< LI->second.size() << " string";
if (LI->second.size() != 1) OS << 's';		if (LI->second.size() != 1) OS << 's';
OS << " to match.\n";		OS << " to match.\n";
if (EmitStringMatcherForChar(LI->second, CharNo+1, IndentCount+1))		if (EmitStringMatcherForChar(LI->second, CharNo + 1, IndentCount + 1,
		IgnoreDuplicates))
OS << Indent << " break;\n";		OS << Indent << " break;\n";
}		}

OS << Indent << "}\n";		OS << Indent << "}\n";
return true;		return true;
}		}

/// Emit - Top level entry point.		/// Emit - Top level entry point.
///		///
void StringMatcher::Emit(unsigned Indent) const {		void StringMatcher::Emit(unsigned Indent, bool IgnoreDuplicates) const {
// If nothing to match, just fall through.		// If nothing to match, just fall through.
if (Matches.empty()) return;		if (Matches.empty()) return;

// First level categorization: group strings by length.		// First level categorization: group strings by length.
std::map<unsigned, std::vector<const StringPair*>> MatchesByLength;		std::map<unsigned, std::vector<const StringPair*>> MatchesByLength;

for (unsigned i = 0, e = Matches.size(); i != e; ++i)		for (unsigned i = 0, e = Matches.size(); i != e; ++i)
MatchesByLength[Matches[i].first.size()].push_back(&Matches[i]);		MatchesByLength[Matches[i].first.size()].push_back(&Matches[i]);

// Output a switch statement on length and categorize the elements within each		// Output a switch statement on length and categorize the elements within each
// bin.		// bin.
OS.indent(Indent*2+2) << "switch (" << StrVariableName << ".size()) {\n";		OS.indent(Indent*2+2) << "switch (" << StrVariableName << ".size()) {\n";
OS.indent(Indent*2+2) << "default: break;\n";		OS.indent(Indent*2+2) << "default: break;\n";

for (std::map<unsigned, std::vector<const StringPair*>>::iterator LI =		for (std::map<unsigned, std::vector<const StringPair*>>::iterator LI =
MatchesByLength.begin(), E = MatchesByLength.end(); LI != E; ++LI) {		MatchesByLength.begin(), E = MatchesByLength.end(); LI != E; ++LI) {
OS.indent(Indent*2+2) << "case " << LI->first << ":\t // "		OS.indent(Indent*2+2) << "case " << LI->first << ":\t // "
<< LI->second.size()		<< LI->second.size()
<< " string" << (LI->second.size() == 1 ? "" : "s") << " to match.\n";		<< " string" << (LI->second.size() == 1 ? "" : "s") << " to match.\n";
if (EmitStringMatcherForChar(LI->second, 0, Indent))		if (EmitStringMatcherForChar(LI->second, 0, Indent, IgnoreDuplicates))
OS.indent(Indent*2+4) << "break;\n";		OS.indent(Indent*2+4) << "break;\n";
}		}

OS.indent(Indent*2+2) << "}\n";		OS.indent(Indent*2+2) << "}\n";
}		}

llvm/trunk/test/TableGen/AllowDuplicateRegisterNames.td

				// RUN: llvm-tblgen -gen-asm-matcher -I %p/../../include %s | FileCheck %s

				// Check that MatchRegisterName and MatchRegisterAltName are generated
				// correctly when multiple registers are defined with the same name and
				// AllowDuplicateRegisterNames is set.

				include "llvm/Target/Target.td"

				def ArchInstrInfo : InstrInfo;

				def ArchAsmParser : AsmParser {
				let AllowDuplicateRegisterNames = 1;
				let ShouldEmitMatchRegisterAltName = 1;
				}

				def Arch : Target {
				let InstructionSet = ArchInstrInfo;
				let AssemblyParsers = [ArchAsmParser];
				}

				let Namespace = "Arch" in {
				class ArchReg<string n, list <string> alt, list <RegAltNameIndex> altidx>
				: Register<n> {
				let AltNames = alt;
				let RegAltNameIndices = altidx;
				}

				def ABIRegAltName : RegAltNameIndex;

				foreach i = 0-3 in {
				def R#i#_32 : ArchReg<"r"#i, ["x"#i], [ABIRegAltName]>;
				def R#i#_64 : ArchReg<"r"#i, ["x"#i], [ABIRegAltName]>;
				}
				} // Namespace = "Arch"

				def GPR32 : RegisterClass<"Arch", [i32], 32, (add
				(sequence "R%u_32", 0, 3)
				)>;

				def GPR64 : RegisterClass<"Arch", [i64], 64, (add
				(sequence "R%u_64", 0, 3)
				)>;

				// CHECK: static unsigned MatchRegisterName(StringRef Name) {
				// CHECK: switch (Name.size()) {
				// CHECK: default: break;
				// CHECK: case 2: // 8 strings to match.
				// CHECK: if (Name[0] != 'r')
				// CHECK: break;
				// CHECK: switch (Name[1]) {
				// CHECK: default: break;
				// CHECK: case '0': // 2 strings to match.
				// CHECK: return 1; // "r0"
				// CHECK: case '1': // 2 strings to match.
				// CHECK: return 3; // "r1"
				// CHECK: case '2': // 2 strings to match.
				// CHECK: return 5; // "r2"
				// CHECK: case '3': // 2 strings to match.
				// CHECK: return 7; // "r3"
				// CHECK: }
				// CHECK: break;
				// CHECK: }
				// CHECK: return 0;
				// CHECK: }

				// CHECK: static unsigned MatchRegisterAltName(StringRef Name) {
				// CHECK: switch (Name.size()) {
				// CHECK: default: break;
				// CHECK: case 2: // 8 strings to match.
				// CHECK: if (Name[0] != 'x')
				// CHECK: break;
				// CHECK: switch (Name[1]) {
				// CHECK: default: break;
				// CHECK: case '0': // 2 strings to match.
				// CHECK: return 1; // "x0"
				// CHECK: case '1': // 2 strings to match.
				// CHECK: return 3; // "x1"
				// CHECK: case '2': // 2 strings to match.
				// CHECK: return 5; // "x2"
				// CHECK: case '3': // 2 strings to match.
				// CHECK: return 7; // "x3"
				// CHECK: }
				// CHECK: break;
				// CHECK: }
				// CHECK: return 0;
				// CHECK: }

llvm/trunk/utils/TableGen/AsmMatcherEmitter.cpp

Show First 20 Lines • Show All 2,432 Lines • ▼ Show 20 Lines	if (Reg.TheDef->getValueAsString("AsmName").empty())
continue;		continue;

Matches.emplace_back(Reg.TheDef->getValueAsString("AsmName"),		Matches.emplace_back(Reg.TheDef->getValueAsString("AsmName"),
"return " + utostr(Reg.EnumValue) + ";");		"return " + utostr(Reg.EnumValue) + ";");
}		}

OS << "static unsigned MatchRegisterName(StringRef Name) {\n";		OS << "static unsigned MatchRegisterName(StringRef Name) {\n";

StringMatcher("Name", Matches, OS).Emit();		bool IgnoreDuplicates =
		AsmParser->getValueAsBit("AllowDuplicateRegisterNames");
		StringMatcher("Name", Matches, OS).Emit(0, IgnoreDuplicates);

OS << " return 0;\n";		OS << " return 0;\n";
OS << "}\n\n";		OS << "}\n\n";
}		}

/// Emit the function to match a string to the target		/// Emit the function to match a string to the target
/// specific register enum.		/// specific register enum.
static void emitMatchRegisterAltName(CodeGenTarget &Target, Record *AsmParser,		static void emitMatchRegisterAltName(CodeGenTarget &Target, Record *AsmParser,
Show All 14 Lines	for (auto AltName : AltNames) {

Matches.emplace_back(AltName,		Matches.emplace_back(AltName,
"return " + utostr(Reg.EnumValue) + ";");		"return " + utostr(Reg.EnumValue) + ";");
}		}
}		}

OS << "static unsigned MatchRegisterAltName(StringRef Name) {\n";		OS << "static unsigned MatchRegisterAltName(StringRef Name) {\n";

StringMatcher("Name", Matches, OS).Emit();		bool IgnoreDuplicates =
		AsmParser->getValueAsBit("AllowDuplicateRegisterNames");
		StringMatcher("Name", Matches, OS).Emit(0, IgnoreDuplicates);

OS << " return 0;\n";		OS << " return 0;\n";
OS << "}\n\n";		OS << "}\n\n";
}		}

/// emitOperandDiagnosticTypes - Emit the operand matching diagnostic types.		/// emitOperandDiagnosticTypes - Emit the operand matching diagnostic types.
static void emitOperandDiagnosticTypes(AsmMatcherInfo &Info, raw_ostream &OS) {		static void emitOperandDiagnosticTypes(AsmMatcherInfo &Info, raw_ostream &OS) {
// Get the set of diagnostic types from all of the operand classes.		// Get the set of diagnostic types from all of the operand classes.
