This is an archive of the discontinued LLVM Phabricator instance.

[MC] Parse directives with arguments as macro arguments
AbandonedPublic

Authored by jcai19 on Mar 27 2020, 5:34 PM.

Details

Summary

Currently when expanding an macro that takes directives as arguments,
the integrated assembler treats the name of a directive and its
arguments separated by space as separate arguments to the macro instead
of one argument. For example, the following sample code will fail to
assemble since .section and .foo are treated as two arguments to the
alternative_inst macro.

.macro alternative_insn insn
\insn
+.endm

+alternative_insn .section .foo

This change fixes such issue by identifying directives used as macro
arguments and include their arguments as part of the macro argument,
such as https://github.com/ClangBuiltLinux/linux/issues/939.

Diff Detail

Event Timeline

jcai19 created this revision.Mar 27 2020, 5:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2020, 5:34 PM
jcai19 updated this revision to Diff 253264.Mar 27 2020, 5:40 PM

Update summary to include the bug reported.

jcai19 edited the summary of this revision. (Show Details)
jcai19 added subscribers: manojgupta, llozano.
jcai19 updated this revision to Diff 253267.Mar 27 2020, 5:57 PM

Remove a blank line.

Thanks for the patch!

llvm/lib/MC/MCParser/AsmParser.cpp
2635

Is this the most precise way to check for directives? I see DirectiveKind, and DirectiveKindMap, maybe there's a way to verify that the Tok is a valid directive?

jcai19 marked an inline comment as done.Mar 27 2020, 6:04 PM
jcai19 added inline comments.
llvm/lib/MC/MCParser/AsmParser.cpp
2635

Right that's a good point I'll investigate further.

Harbormaster failed remote builds in B50755: Diff 253261!
Harbormaster failed remote builds in B50758: Diff 253267!
MaskRay requested changes to this revision.Mar 27 2020, 11:48 PM

This change looks more like a hack rather than a proper solution. It does not even improve GNU as compatibility. With this patch, MC will accept alternative_insn .section .foo, .section .bar, however, this is rejected by GNU as.

GNU as preprocesses a line with do_scrub_chars. For the input .inst (0), do_scrub_chars will delete the space before ( and make it .inst(0).
macro_expand will read .inst(0) as an actual parameter (the process is rather hacky: .init is tokenized first but then get_any_string reads .inst(0))

This change just makes the code even harder to understand, I think. I don't know how to fix the problem in general. We probably should work around it in the Linux kernel. One approach is to take ihalip's suggestion on https://github.com/ClangBuiltLinux/linux/issues/939#issuecomment-601776123. Another is to quote actual parameters with double quotes, i.e. alternative_insn ".section .foo", ".section .bar"

This revision now requires changes to proceed.Mar 27 2020, 11:48 PM
jcai19 updated this revision to Diff 253461.Mar 29 2020, 3:02 PM

Change how to identify directives.

jcai19 marked an inline comment as done.Mar 29 2020, 3:20 PM

This patch is not for GNU-compatibility. GNU's handling of spaces between directives and their arguments when expanding macros are confusing IMO -- it only supports selective directives. For example, it supports .inst (XXX) for aarch64, but not .section .foo or .space 0xf on x86 as you mentioned. For .section, if works if you do not have any space between .section and the argument, and surround it with parenthesis, but them will also be included as part of the section name, e.g. alternative_insn .section(.foo), .section(.bar) will work on gcc, but the object file will have (.foo) and (.bar) sections. I think we should either unconditionally support all of such directives, or not at all. I am fine with either option, and would happily close this if one of the workarounds is accepted into Linux.

llvm/lib/MC/MCParser/AsmParser.cpp
2635

DirectiveKindMap does not include all the directives, such .section used in this example or .inst for aarch64.

This patch is not for GNU-compatibility. GNU's handling of spaces between directives and their arguments when expanding macros are confusing IMO -- it only supports selective directives. For example, it supports .inst (XXX) for aarch64, but not .section .foo or .space 0xf on x86 as you mentioned. For .section, if works if you do not have any space between .section and the argument, and surround it with parenthesis, but them will also be included as part of the section name, e.g. alternative_insn .section(.foo), .section(.bar) will work on gcc, but the object file will have (.foo) and (.bar) sections. I think we should either unconditionally support all of such directives, or not at all. I am fine with either option, and would happily close this if one of the workarounds is accepted into Linux.

You may not follow my comment. It is not that GNU as special cases certain directives, but rather it has a subtle do_scrub_chars rule.

.macro alternative_insn insn1, insn2
\insn1
\insn2
.endm

alternative_insn .section .foo, .section .bar       # Error: too many positional arguments
alternative_insn ".section .foo", ".section .bar"  # accepted

arch/arm64/include/asm/sysreg.h __emit_inst(x) really is likely a use of an unintentional GNU as "feature". The right solution is simply to quote the macro arguments. The rule as implemented by this patch can just make the clang vs GNU as differences more confusing.

MaskRay requested changes to this revision.Mar 29 2020, 10:31 PM
This revision now requires changes to proceed.Mar 29 2020, 10:31 PM
jcai19 abandoned this revision.Apr 1 2020, 6:00 PM

This patch is motived by the issue exposed by https://github.com/ClangBuiltLinux/linux/issues/939. It is however believed to better fix the issue on Linux instead, so close this revision.