This is an archive of the discontinued LLVM Phabricator instance.

[ms] [llvm-ml] Add initial MASM STRUCT/UNION support
ClosedPublic

Authored by epastor on Feb 27 2020, 2:15 PM.

Details

Summary

Add support for user-defined types to MasmParser, including initialization and field access.

Known issues:

  • Omitted entry initializers (e.g., <,0>) do not work consistently for nested structs/arrays.
  • Size checking/inference for values with known types is not yet implemented.
  • Some ml64.exe syntaxes for accessing STRUCT fields are not recognized.
    • [<register>.<struct name>].<field>
    • [<register>[<struct name>.<field>]]
    • (<struct name> PTR [<register>]).<field>
    • [<variable>.<struct name>].<field>
    • (<struct name> PTR <variable>).<field>

Diff Detail

Event Timeline

epastor created this revision.Feb 27 2020, 2:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2020, 2:15 PM
epastor updated this revision to Diff 247376.Feb 28 2020, 3:25 PM

Switch away from std::variant to allow compilation without C++17

epastor updated this revision to Diff 247496.Mar 1 2020, 5:32 AM

Address remaining C++17-only usage (non-void emplace_back), and fix clang-tidy errors.

epastor updated this revision to Diff 247497.Mar 1 2020, 5:36 AM

Formatting fixup

epastor updated this revision to Diff 247500.Mar 1 2020, 5:51 AM

Rebasing to HEAD

rnk added a subscriber: rnk.Mar 2 2020, 10:23 AM
rnk added inline comments.
llvm/lib/MC/MCParser/MasmParser.cpp
167

I think this union is creating a fair amount of code complexity. The more clang-y code pattern would be to do something like:

  • Use a BumpPtrAllocator for memory allocation to make these all owned by the asm parser, there should be one already somewhere
  • Use raw pointers to point between these classes, avoids need for unique_ptr or other ownership management
  • Set up the usual class hierarchy: class FieldInitializer { FieldType FT; ... }; class IntFieldInit : public FieldInitializer { ... }; class RealFieldInit : public FieldInitializer { ... }; class StructFieldInit : public FieldInitializer { ... };

Then the parseFieldInitializer variants could return a nullable pointer to indicate parsing success or failure. WDYT?

The drawback of this approach is that objects in BumpPtrAllocators are not destroyed, so they cannot own external resources. Most vectors can just become ArrayRefs pointing to arrays in the same allocator. The main thing I see that this would affect is the StringMap for fields and APInt for int fields. So, I guess I'm not sure it's worth a big rewrite, but it's something to consider.

epastor marked an inline comment as done.Apr 22 2020, 7:43 AM
epastor added inline comments.
llvm/lib/MC/MCParser/MasmParser.cpp
167

I agree it would might be worth the change - but the rewrite looks to be pretty big, and complicated due to several of the members in these objects. Let's move forward in a full review for now, and discuss the right answer as we go.

epastor updated this revision to Diff 259288.Apr 22 2020, 7:51 AM

Rebase to HEAD

Sorry it took a while. This is a good chunk of work, thanks for doing it. Smaller changes are faster to review :)

As a general point, before sending out a change I'd suggest doing a detailed read by yourself to find things like typos in comments, notice things like needless duplication (sometimes, it's the right tradeoff), etc.

llvm/include/llvm/MC/MCParser/MCAsmParser.h
173

nit: LookUpFieldOffset. "Lookup" is a noun, "look up" is the verb.

llvm/lib/MC/MCParser/MasmParser.cpp
116

Can you add comments what each means? Since REAL is an APInt below, I guess it's also an integer type?

123

Should this be a SmallVector?

(The clang approach would probably to allocate field infos right after the struct info so that it's in one slab, but probably not worth it here.)

167

Sounds good to me, but maybe add a FIXME with rnk's suggestion.

211

s/integral values/struct values/

...but why pass in a FieldType at all if only one value is valid? Why not just

FieldInitializer(SmallVector<const MCExpr *, 1> &&Values) : FT(FT_INTEGRAL) {
  new (&IntInfo) IntFieldInfo(Values);
}

?

219

same

225

same

233

same

357

can you use llvm::alignTo() for this?

403

sentence case

406

same

3527

This is identical to the else you added in the previous function as far as I can tell. Can you change this to be less copy-pasty? (extract function, or make one of the two dispatch to the other, or what have you)

3540

this is very nested. maybe try this structure:

if lcurly
  return parseFieldInitializerCurly()
if less
  return parseFieldInitializeLess
if ...

i.e. add helper functions, with comments. If you find yourself at nesting depth 6, that's often a sign that extracting a function might be a good idea. I find "every part of a function should be at the same level of abstraction" a good rule of thumb fairly often.

3605

This looks very similar to the previous function.
extract the common stuff into helper functions.

3669

same

llvm/test/tools/llvm-ml/struct.test
10

can you use some variation in keyword and identifier case to add test coverage for this all being case-insensitive?

epastor updated this revision to Diff 262110.May 5 2020, 7:49 AM

Address comments; details in inline responses.

Sorry it took a while. This is a good chunk of work, thanks for doing it. Smaller changes are faster to review :)

As a general point, before sending out a change I'd suggest doing a detailed read by yourself to find things like typos in comments, notice things like needless duplication (sometimes, it's the right tradeoff), etc.

With apologies... I did do a detailed read. I didn't spot the things you did. I suppose that's what review is for, right?

epastor marked 21 inline comments as done.May 5 2020, 7:59 AM
epastor added inline comments.
llvm/include/llvm/MC/MCParser/MCAsmParser.h
173

Changed. However... there are several functions in other files that use "lookup" rather than "lookUp" in a verb context. I'd been trying to match those.

llvm/lib/MC/MCParser/MasmParser.cpp
123

It can't be a SmallVector, since FieldInfo is an incomplete type.

211

Good point; fixed!

357

That is EXTREMELY useful, and I didn't realize it existed! Thanks!

3527

Thanks for highlighting this; I'd missed an easy way to clean up the repetition.

3540

Thank you very much for highlighting this. I'd been very aware of it as a problem at the time, but failed to find a good way to clean it up. With your prompting and fresh eyes, I've been able to find a much cleaner approach.

epastor updated this revision to Diff 262118.May 5 2020, 8:08 AM
epastor marked 5 inline comments as done.

Rebase against HEAD

Sorry, I looked through this a while ago but forgot to hit "Submit" :(

This is looking pretty good. Only the test coverage / LengthOf==1 questions below left really.

It's weird that both parsing and emission are in the same (giant) class, but that's not new at least.

llvm/include/llvm/MC/MCParser/MCAsmParser.h
173

Ah, local consistency is a good reason. No need to change it next time when I say something like that (but no need to change it back either)

llvm/lib/MC/MCParser/MasmParser.cpp
320

could you add comments what Type and LengthOf mean? Offset and SizeOf are self-explanatory, I can guess at LengthOf, but Type makes me come up empty.

3343

This is again identical to the added block in the previous function; extract a addFieldToCurrentStruct(IdVal, Size) or addIntegralField (for consistency with the Real version below) function. (Is Name / NameLocnot used here in the StructInProgress case?)

3624

We're in a LengthOf == 1 block. How could we hit this? Does anything in here modify Field?

3631

We're in a LengthOf == 1 block. How could we hit this? Does anything in here modify Field?

3687

What if it's something else? This is user-controlled and could have arbitrary contents, no?

epastor updated this revision to Diff 264008.May 14 2020, 8:50 AM
epastor marked 7 inline comments as done.

Addressing feedback, and rebasing to HEAD.

epastor added inline comments.May 14 2020, 8:54 AM
llvm/lib/MC/MCParser/MasmParser.cpp
3343

Good point. Done.

3624

Sorry, copy/paste error. Fixed.

3687

Yes, but in that case parseToken() will correctly fail due to not finding LessGreater... in which case we'll return "true" to pass the error upwards.

parseToken() is an imperative and returns true if it fails, unlike parseOptionalToken() which is an attempt and returns true if it succeeds.

Thanks, the code looks close to done to me now. Can you add a tests that covers most of the error cases as well?

llvm/lib/MC/MCParser/MasmParser.cpp
3323

.

3457

.

…also in many other places. curl https://reviews.llvm.org/file/data/mrzekpri33kb4kddo3tr/PHID-FILE-ejca6nujwdhbsxqgces2/D75306.diff | grep '^\+.*// [A-Z].*[^.]$' shows some of them.

3698

llvm style says no else after return/continue/break (https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code) (just put the else's body's contents right after the if and dedent)

epastor updated this revision to Diff 274814.Jul 1 2020, 7:52 AM

Rebase to HEAD

epastor updated this revision to Diff 275719.Jul 6 2020, 8:09 AM

Improve error handling and add tests

Also fix comment formatting, and address other feedback.

epastor marked an inline comment as done.Jul 6 2020, 8:13 AM

Thanks, Nico! I've added test coverage for many (most?) of the error cases as well.

thakis accepted this revision.Jul 7 2020, 10:54 AM
thakis added inline comments.
llvm/test/tools/llvm-ml/struct_errors.test
10 ↗(On Diff #275719)

This diag could probably add "expected at most %d elements, got %d" at the end

48 ↗(On Diff #275719)

this could maybe say "alignment must be power of two, got %d" or similar (I realize it's an existing diag)

This revision is now accepted and ready to land.Jul 7 2020, 10:54 AM
epastor updated this revision to Diff 276185.Jul 7 2020, 12:43 PM

Improve error messages and fix a missing OnFailure case

This revision was automatically updated to reflect the committed changes.
uabelho added a subscriber: uabelho.Jul 8 2020, 1:58 AM
uabelho added inline comments.
llvm/lib/MC/MCParser/MasmParser.cpp
3675

gcc (7.4.0) warns with
../lib/MC/MCParser/MasmParser.cpp:3676:1: warning: control reaches end of non-void function [-Wreturn-type]
here.
It can be silenced with an llvm_unreachable if that's acceptable.

3828

gcc 7.4.0 warning here too
../lib/MC/MCParser/MasmParser.cpp:3829:1: warning: control reaches end of non-void function [-Wreturn-type]

3830

gcc warns that this method is unused. Will it be used or can it perhaps be removed?

3907

gcc 7.4.0 warning here too
../lib/MC/MCParser/MasmParser.cpp:3908:1: warning: control reaches end of non-void function [-Wreturn-type]