This matches the behavior of GNU assembler which supports symbolic expressions in absolute expressions used in assembly directives.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
What happens if you try to print the assembly back out? I.e., if you run
llvm-mc with filetype=obj?
Cheers,
Rafael
That would currently trigger an error error: expected absolute expression since MCAsmStreamer doesn't currently have any layout information. Is this something you would like to support as well?
Yes, I think supporting this is necessary, since "clang -S foo.c" requires exactly that (on platforms that use the integrated assembler).
I think to make that work, everything in AsmParser that currently uses parseAbsoluteExpression/evaluateAsAbsolute will need to be switched to parsing an arbitrary expression, and passing it on to MCStreamer as-is without evaluating. MCObjectStreamer can then assert that you've only given it an absolute expression, and evaluate it. MCAsmStreamer would just emit the expression as-is for the final assembler to deal with.
For many directives, that wouldn't be too bad. ".zero", for example. You can just add an MCStreamer::EmitFill taking two expressions, instead of two integers, and change AsmParser to call that version.
For others, that actually do something with the value other than error-check it and pass it through, it'll be somewhat more complex.
That was the alternative design that I was originally considering but I wanted to start with something simpler. I'll give it a try and see how complicated it's going to be.
If we are deferring on error reporting past the current compile, we should
emit line number directives for the user.
niravd added a subscriber: niravd.
Repository:
rL LLVM
BTW, I'd certainly recommend trying to hande only a subset of the directives. E.g. supporting the align/fill types of things, but avoiding really complex beasts ".if", ".rept", and such.
I have updated the implementation to follow the proposed approach. So far I've only implemented support for .fill, .skip, .space and .zero. I was also considering supporting .align, but this directive is a little more special due to various restrictions on various platforms and I'm not sure if the symbolic expression are actually being used in combination with .align. I have also restricted the symbolic expression support to the first argument which is the size, it might be possible to support them in other arguments as well (GNU assembler does), but again I'm not sure how useful that would be.
include/llvm/MC/MCStreamer.h | ||
---|---|---|
591 ↗ | (On Diff #58170) | Please rename to match casing of other Emit functions. (EmitSpace). |
602 ↗ | (On Diff #58170) | Why isn't this named EmitFill? It's of a distinct type from the other calls and is only for dealing with Fill directives and the distinction seems more confusing than multiple variants of EmitFill. |
605 ↗ | (On Diff #58170) | Save as above. |
lib/MC/MCAsmStreamer.cpp | ||
828 ↗ | (On Diff #58170) | Since we cannot guarantee a lack of errors at this point we should emit line directives before and after this so that we preserve the error message location with respect to the original file. |
lib/MC/MCParser/AsmParser.cpp | ||
2749 ↗ | (On Diff #58170) | Please decompose this into parseExpression and evaluateAsAbsolute to distinguish expression structure errors (should throw an error here) from value errors which we may defer. |
include/llvm/MC/MCStreamer.h | ||
---|---|---|
602 ↗ | (On Diff #58170) | EmitFill is used for emitting "fill bytes" in case of directives like .space and .zero while .fill requires a special handling. I could rename both emitSpace and emitRepeatedValue to EmitFill if you prefer since it's distinct type from the other one, I just wanted to avoid confusion, but I feel like it's going to be confusing in either case. |
lib/MC/MCAsmStreamer.cpp | ||
---|---|---|
823 ↗ | (On Diff #58170) | You don't need to this in the asm streamer. My suggestion would be to just make this a variant of emitFill. We would have one overload that takes a number and one that takes an expression. The implementations would be:
|
lib/MC/MCAsmStreamer.cpp | ||
---|---|---|
823 ↗ | (On Diff #58170) | In case of MCAsmStreamer, where would we evaluate the MCExpr expression then? Does it mean we should always print out the expression even if it's absolute? |
Should this be limited only to the .space case or .fill as well? The
.fill implementation currently emits multiple directives in the constant
case, we could instead just emit .fill directive so it's the same both in
absolute and symbol case, but that means we'd have to change the current
behavior, is that fine with you?
The updated patch implements the requested changes. The biggest difference is that we no longer perform evaluation for .fill and .skip/.space/.zero directives and even in the absolute case defer to the symbol version following Rafael's suggestion.
include/llvm/MC/MCStreamer.h | ||
---|---|---|
591 ↗ | (On Diff #58170) | I'm following the new convention rather than existing casing. I can rename the remaining Emit* functions in a separate patch. |
lib/MC/MCAsmStreamer.cpp | ||
828 ↗ | (On Diff #58170) | This should no longer be necessary as the error handling case below has been fixed. |
lib/MC/MCParser/AsmParser.cpp | ||
2749 ↗ | (On Diff #58170) | This was a leftover from the previous version, I've reverted this case to the original version. |
lib/MC/MCAsmStreamer.cpp | ||
---|---|---|
828 ↗ | (On Diff #58379) | I believe it still is. Consider the case where the symbolic expression emitted for NumBytes is eventually evaluated to a non-positive number or is invalid? Errors in the compilation of the output assembly would point to location in the output asm file not the original as it should. |
LGTM with nits.
I think the interface can be simplified further, but that should be a follow up. What I think should be done in another patch is to make the versions that take an integer be non virtual and just create and forward to the version that takes an expression.
include/llvm/MC/MCObjectStreamer.h | ||
---|---|---|
144 ↗ | (On Diff #58379) | NumBytes can be a reference, no? |
include/llvm/MC/MCStreamer.h | ||
584 ↗ | (On Diff #58379) | Size is not a parameter of this version, nor is Value. |
595 ↗ | (On Diff #58379) | This don't match the signature. |
lib/MC/MCAsmStreamer.cpp | ||
828 ↗ | (On Diff #58379) | This is the asm streamer, so it will just print a negative number in the .s. |
831 ↗ | (On Diff #58379) | git-clang-format the patch |
lib/MC/MCAsmStreamer.cpp | ||
---|---|---|
828 ↗ | (On Diff #58379) | That's not the case that I'm concerned about. Let's say it's an expression (TEST1 - TEST0 - 4) for which we don't know should cause an error until we generate an object. With this we will to output this to the file without checking. If it's a problem, we will absolutely error when we attempt to generate an object but we've lost the original location of the error and we'll instead get the location from the new file which we do not gaurantee to be the same. This breaks the property that generating an assembly and then making an object file has the same output as just generating the object file directly. |
That's not the case that I'm concerned about.
Let's say it's an expression (TEST1 - TEST0 - 4) for which we don't know should cause an error until we generate an object. With this we will to output this to the file without checking. If it's a problem, we will absolutely error when we attempt to generate an object but we've lost the original location of the error and we'll instead get the location from the new file which we do not gaurantee to be the same.
For the sake of argument lets say it is clang creating that expression.
If someone is running "clang -S test.c && clang -c test.s" we will
report an error in the .s. There is no way around it: the .s doesn't
have a way to cary the location for use in the error message.
If someone is running "clang -c test.c" we will not use the asm streamer.
This breaks the property that generating an assembly and then making an object file has the same output as just generating the object file directly.
If there is no error, that property is not broken.
If there is an error, that property doesn't exist already. It is
perfectly possible to create a .s that will not assemble.
If there is the desire to change that, what I think should happen is
for -S to create and discard an object file in memory. Without
creating an actual assembler we will never know for sure if a .s can
or cannot be assembled, but that is independent from this change.
Cheers,
Rafael
Just testing phabricator. (it seems to have ignored the previous responses
in this thread; not sure why)
I've addressed issues pointed by Rafael and added FIXME comments about the location directives.
The formating still looks funny. There are returns in the same line as
ifs, which I don't think clang format produces.
The formatting should be fixed now, seems like git-clang-format picked up a wrong style when I ran it before.