This is an archive of the discontinued LLVM Phabricator instance.

[MC] Support symbolic expressions in assembly directives
ClosedPublic

Authored by phosek on May 17 2016, 2:48 PM.

Details

Summary

This matches the behavior of GNU assembler which supports symbolic expressions in absolute expressions used in assembly directives.

Diff Detail

Repository
rL LLVM

Event Timeline

phosek updated this revision to Diff 57524.May 17 2016, 2:48 PM
phosek retitled this revision from to [MC] Support symbolic expressions in assembly directives.
phosek updated this object.
phosek set the repository for this revision to rL LLVM.
phosek added a subscriber: llvm-commits.

What happens if you try to print the assembly back out? I.e., if you run
llvm-mc with filetype=obj?

Cheers,
Rafael

I mean filetype=asm :-)

That would currently trigger an error error: expected absolute expression since MCAsmStreamer doesn't currently have any layout information. Is this something you would like to support as well?

That would currently trigger an error error: expected absolute expression since MCAsmStreamer doesn't currently have any layout information. Is this something you would like to support as well?

Yes, I think supporting this is necessary, since "clang -S foo.c" requires exactly that (on platforms that use the integrated assembler).

I think to make that work, everything in AsmParser that currently uses parseAbsoluteExpression/evaluateAsAbsolute will need to be switched to parsing an arbitrary expression, and passing it on to MCStreamer as-is without evaluating. MCObjectStreamer can then assert that you've only given it an absolute expression, and evaluate it. MCAsmStreamer would just emit the expression as-is for the final assembler to deal with.

For many directives, that wouldn't be too bad. ".zero", for example. You can just add an MCStreamer::EmitFill taking two expressions, instead of two integers, and change AsmParser to call that version.

For others, that actually do something with the value other than error-check it and pass it through, it'll be somewhat more complex.

That was the alternative design that I was originally considering but I wanted to start with something simpler. I'll give it a try and see how complicated it's going to be.

niravd added a subscriber: niravd.May 20 2016, 4:17 AM

If we are deferring on error reporting past the current compile, we should
emit line number directives for the user.
niravd added a subscriber: niravd.

Repository:

rL LLVM

http://reviews.llvm.org/D20337

That was the alternative design that I was originally considering but I wanted to start with something simpler. I'll give it a try and see how complicated it's going to be.

BTW, I'd certainly recommend trying to hande only a subset of the directives. E.g. supporting the align/fill types of things, but avoiding really complex beasts ".if", ".rept", and such.

phosek updated this revision to Diff 58170.May 23 2016, 4:48 PM

I have updated the implementation to follow the proposed approach. So far I've only implemented support for .fill, .skip, .space and .zero. I was also considering supporting .align, but this directive is a little more special due to various restrictions on various platforms and I'm not sure if the symbolic expression are actually being used in combination with .align. I have also restricted the symbolic expression support to the first argument which is the size, it might be possible to support them in other arguments as well (GNU assembler does), but again I'm not sure how useful that would be.

niravd added inline comments.May 23 2016, 7:54 PM
include/llvm/MC/MCStreamer.h
591

Please rename to match casing of other Emit functions. (EmitSpace).

602

Why isn't this named EmitFill? It's of a distinct type from the other calls and is only for dealing with Fill directives and the distinction seems more confusing than multiple variants of EmitFill.

605

Save as above.

lib/MC/MCAsmStreamer.cpp
828

Since we cannot guarantee a lack of errors at this point we should emit line directives before and after this so that we preserve the error message location with respect to the original file.

lib/MC/MCParser/AsmParser.cpp
2749

Please decompose this into parseExpression and evaluateAsAbsolute to distinguish expression structure errors (should throw an error here) from value errors which we may defer.

phosek added inline comments.May 23 2016, 8:00 PM
include/llvm/MC/MCStreamer.h
602

EmitFill is used for emitting "fill bytes" in case of directives like .space and .zero while .fill requires a special handling. I could rename both emitSpace and emitRepeatedValue to EmitFill if you prefer since it's distinct type from the other one, I just wanted to avoid confusion, but I feel like it's going to be confusing in either case.

rafael added inline comments.May 24 2016, 4:38 AM
lib/MC/MCAsmStreamer.cpp
823

You don't need to this in the asm streamer. My suggestion would be to just make this a variant of emitFill. We would have one overload that takes a number and one that takes an expression. The implementations would be:

  • In the asm streamer: the ones taking an MCExpr does the printing. The other one creates an MCExpr and forwards.
  • In the obj streamer: what you have now. Try to evaluate the expr and maybe fail. If it works, forward to the overload that takes a number.
phosek added inline comments.May 24 2016, 11:29 AM
lib/MC/MCAsmStreamer.cpp
823

In case of MCAsmStreamer, where would we evaluate the MCExpr expression then? Does it mean we should always print out the expression even if it's absolute?

phosek added a subscriber: phosek.May 24 2016, 6:02 PM

Should this be limited only to the .space case or .fill as well? The
.fill implementation currently emits multiple directives in the constant
case, we could instead just emit .fill directive so it's the same both in
absolute and symbol case, but that means we'd have to change the current
behavior, is that fine with you?

phosek updated this revision to Diff 58379.May 24 2016, 7:32 PM

The updated patch implements the requested changes. The biggest difference is that we no longer perform evaluation for .fill and .skip/.space/.zero directives and even in the absolute case defer to the symbol version following Rafael's suggestion.

phosek marked 6 inline comments as done.May 24 2016, 7:37 PM
phosek added inline comments.
include/llvm/MC/MCStreamer.h
591

I'm following the new convention rather than existing casing. I can rename the remaining Emit* functions in a separate patch.

lib/MC/MCAsmStreamer.cpp
828

This should no longer be necessary as the error handling case below has been fixed.

lib/MC/MCParser/AsmParser.cpp
2749

This was a leftover from the previous version, I've reverted this case to the original version.

phosek marked 8 inline comments as done.May 24 2016, 7:38 PM
niravd added inline comments.May 24 2016, 8:50 PM
lib/MC/MCAsmStreamer.cpp
828

I believe it still is. Consider the case where the symbolic expression emitted for NumBytes is eventually evaluated to a non-positive number or is invalid? Errors in the compilation of the output assembly would point to location in the output asm file not the original as it should.

rafael accepted this revision.May 25 2016, 6:32 AM
rafael added a reviewer: rafael.

LGTM with nits.

I think the interface can be simplified further, but that should be a follow up. What I think should be done in another patch is to make the versions that take an integer be non virtual and just create and forward to the version that takes an expression.

include/llvm/MC/MCObjectStreamer.h
144

NumBytes can be a reference, no?

include/llvm/MC/MCStreamer.h
584

Size is not a parameter of this version, nor is Value.

595

This don't match the signature.

lib/MC/MCAsmStreamer.cpp
828

This is the asm streamer, so it will just print a negative number in the .s.

831

git-clang-format the patch

This revision is now accepted and ready to land.May 25 2016, 6:32 AM
niravd added inline comments.May 25 2016, 7:03 AM
lib/MC/MCAsmStreamer.cpp
828

That's not the case that I'm concerned about.

Let's say it's an expression (TEST1 - TEST0 - 4) for which we don't know should cause an error until we generate an object. With this we will to output this to the file without checking. If it's a problem, we will absolutely error when we attempt to generate an object but we've lost the original location of the error and we'll instead get the location from the new file which we do not gaurantee to be the same.

This breaks the property that generating an assembly and then making an object file has the same output as just generating the object file directly.

That's not the case that I'm concerned about.

Let's say it's an expression (TEST1 - TEST0 - 4) for which we don't know should cause an error until we generate an object. With this we will to output this to the file without checking. If it's a problem, we will absolutely error when we attempt to generate an object but we've lost the original location of the error and we'll instead get the location from the new file which we do not gaurantee to be the same.

For the sake of argument lets say it is clang creating that expression.

If someone is running "clang -S test.c && clang -c test.s" we will
report an error in the .s. There is no way around it: the .s doesn't
have a way to cary the location for use in the error message.

If someone is running "clang -c test.c" we will not use the asm streamer.

This breaks the property that generating an assembly and then making an object file has the same output as just generating the object file directly.

If there is no error, that property is not broken.
If there is an error, that property doesn't exist already. It is
perfectly possible to create a .s that will not assemble.

If there is the desire to change that, what I think should happen is
for -S to create and discard an object file in memory. Without
creating an actual assembler we will never know for sure if a .s can
or cannot be assembled, but that is independent from this change.

Cheers,
Rafael

Just testing phabricator. (it seems to have ignored the previous responses
in this thread; not sure why)

phosek updated this revision to Diff 58445.May 25 2016, 9:53 AM
phosek edited edge metadata.
phosek marked 4 inline comments as done.

I've addressed issues pointed by Rafael and added FIXME comments about the location directives.

The formating still looks funny. There are returns in the same line as
ifs, which I don't think clang format produces.

phosek updated this revision to Diff 58493.May 25 2016, 1:36 PM

The formatting should be fixed now, seems like git-clang-format picked up a wrong style when I ran it before.

This revision was automatically updated to reflect the committed changes.