- This patch adds in support to determine whether a particular label is valid for the hlasm variant
- The label syntax being checked is that of an ordinary HLASM symbol (Reference, Chapter 2 (Coding and Structure) - Terms, Literals and Expressions - Terms - Symbols - Ordinary Symbol)
- To achieve this, the virtual function isLabel defined in MCTargetAsmParser.h is made use of
- The isLabel function is overridden in SystemZAsmParser for the hlasm variant, and the syntax is checked appropriately
- Things remain unchanged for the att variant
- Further patches will add in support to emit the label. These future patches will make use of this isLabel function
Details
Diff Detail
Event Timeline
llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp | ||
---|---|---|
1609 | Checking the reverse condition and doing the quick exit first reduces the indentation level of almost the entire function body. Please consider it. | |
1610 | Missing period at the end of the sentence. | |
1611 | Missing "label" after HLASM? | |
1614 | Please fix the spelling for "alphanumeric". Also, I don't think the hyphen should be used here. If the "62" is a reference to the possible length, then the sentence is missing "up to". Also, by "alphabetic", you mean (aside from _) 'considered alphabetic in the "C" locale'? | |
1616 | Let's stick with "alphabetic"? Also, add the period to end the sentence. | |
1617 | This is redundant with point 1 unless if some more subtle point is being made. | |
1618–1620 | Please fix various typos, missing commas, and formatting issues. | |
1623 | It might not hurt to check that the string isn't empty somehow. | |
1630 | Is this supposed to be a call to the locale-sensitive check? LLVM has an isAlpha function. | |
1635 | s/an alphabet/alphabetic/; | |
1637 | Minor nit: Use prefix ++. | |
1638 | See my comment about isalpha versus isAlpha. |
llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmInfo.h | ||
---|---|---|
17 | Hmm, if we move this to a more "public" location we might want to rename the type to something like SystemZAsmDialect to avoid potential conflicts. |
Thank you reviewers for your feedback! I will address them shortly.
An additional point:
I checked the V1R6 Reference manual again and I seem to have forgotten to include a check for "National Characters" ($,@ and #). I would like to remedy this by adding two private inline functions isHLASMAlpha and isHLASMAlnum which encapsulates the checks for what is an "HLASM" alphabet, and use them in the overridden isLabel routine. Is this acceptable?
llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp | ||
---|---|---|
1614 | Yes. By "alphabetic", aside from "_" it is what is considered alphabetic in the "C" locale (a character in the range 'a' to 'z' or 'A' to 'Z') | |
1630 | As mentioned above, the intent of the call to isalpha is to check whether the character is alphabetic according to the c-locale. If it is preferred to use the isAlpha function provided in StringExtras.h I can use it. | |
1638 | Same comment as above. If it is preferred to use the LLVM isAlnum I can use that. |
This raises an issue having to do with the encoding used for string literals. It seems that the XL compiler processes the string like it would for emission into the object file and then the assembly is parsed as IBM-1047.
For example:
%:pragma filetag("IBM-1143") %:include <stdio.h> int main(void) ??< unsigned a = 0; %:pragma convert("IBM-1026") __asm__( "LAB??/x5B??/x7B??/x7C AFI %0,C'A' " : "+r"(a) : ); printf("%02x??/n", a); ??>
parses okay because 0x5B, 0x7B, and 0x7C are the encodings for the "National Characters" in IBM-1047.
llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp | ||
---|---|---|
1614 | I think adding (in the "C" locale) to the comment would help clarify that. | |
1630 | I believe isAlpha is preferred (because isalpha will give an answer based on the current locale). | |
1638 | Same response: isAlnum is a better match in terms of intent. |
- Fixed grammatical errors in various comments and in some cases, made them more descriptive.
- Added support to check for national characters ($,@,#) along with "_". To achieve this, introduced two new private inline functions called isHLASMAlpha and isHLASMAlnum
llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp | ||
---|---|---|
1623 | I would assume the parsing support which will eventually make use of this routine, will ensure that we don't call it if its empty. But like you mentioned, there's no harm in checking if it isn't empty. I've added the check. |
Thank you for bringing this up Hubert!
For now, We don’t have a firm decision on whether we will implement/support pragma convert. I understand that if it is supported, it will be a problem since the character values for the national characters will be different based on the encoding. At a future point, when we have more information, we can come back and review whether certain applicable functions would need to take the encoding into account, or, whether it can be designed in a way such that for example, everything is encoded in UTF-8 and the emission/translation is handled elsewhere (preferably much earlier than the respective backend). For now, I'm assuming the current implementation is fine.
And the current implementation here expects UTF-8 (or that's what the internal character encoding is assumed to be anyway). There's a patch for -fexec-charset and the string being parsed here went into the IR in what form?
And the current implementation here expects UTF-8 (or that's what the internal character encoding is assumed to be anyway). There's a patch for -fexec-charset and the string being parsed here went into the IR in what form?
Currently, the asm string goes into the IR in ASCII, irrespective of the -fexec-charset option.
LGTM.
The mentioned issue with the encoding does not affect this change. It's nevertheless a very valuable hint, as there are some implications with char/string literals inside an inline asm string, but this is not part of this change.
Numeric escapes will be a problem (because any usage of such in HLASM inline assembly is not likely to be encoding for UTF-8). I don't know how common such usage is.
Yes, the defined semantics associated with constructs like CE'^' were designed to work from a baseline where the HLASM source is in EBCDIC. With XL, inline assembly just doesn't work with -qASCII without also having the associated strings "converted" to EBCDIC by user intervention of some sort. So now there's a need to convert only some of the characters from the inline assembly to EBCDIC from UTF-8 at some point (which needs to be one where we know what EBCDIC encoding to convert to). If there's no EBCDIC encoding to convert to (a -qASCII case), then an error seems reasonable.
but this is not part of this change.
Agreed.
Checking the reverse condition and doing the quick exit first reduces the indentation level of almost the entire function body. Please consider it.