This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] Add -D k=v to preprocess the input YAML
ClosedPublic

Authored by MaskRay on Jan 31 2020, 5:23 PM.

Details

Summary

Examples:

yaml2obj -D MACHINE=EM_386 a.yaml -o a.o
yaml2obj -D MACHINE=0x1234 a.yaml -o a.o

where a.yaml contains:

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2MSB
  Type:    ET_REL
  Machine: [[MACHINE]]

Diff Detail

Event Timeline

MaskRay created this revision.Jan 31 2020, 5:23 PM
MaskRay updated this revision to Diff 241854.Jan 31 2020, 5:27 PM

drop a redundant test

Unit tests: fail. 62372 tests passed, 2 failed and 839 were skipped.

failed: LLVM.tools/yaml2obj/ELF/unused-overridden.yaml
failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_class/try_lock.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 5 warnings. 0 of them are added as review comments below (why?).

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Unit tests: pass. 62374 tests passed, 0 failed and 839 were skipped.

clang-tidy: fail. clang-tidy found 0 errors and 5 warnings. 0 of them are added as review comments below (why?).

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

MaskRay updated this revision to Diff 241864.Jan 31 2020, 11:52 PM
MaskRay edited the summary of this revision. (Show Details)

Improve tests

Unit tests: fail. 62379 tests passed, 1 failed and 839 were skipped.

failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_recursive/try_lock.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 5 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

grimar added a comment.EditedFeb 3 2020, 1:34 AM

Sorry, but I'd probably prefer to have a different approach.
Something that works like a preprocessor. We could have:

--- !ELF
FileHeader:
  [[FOO]]:   ELFCLASS32
  Data:    ELFDATA2LSB
  Type:    ET_REL
  Machine: [[MACHINE]]

And then have a code that replaces `[[MACHINE]]', '[[FOO]]' strings in the input data
with something that is specified with -D option. e.g.

yaml2obj -DMACHINE="XXX" -DFOO="Class"

gives

--- !ELF
FileHeader:
  Class:   ELFCLASS32
  Data:    ELFDATA2LSB
  Type:    ET_REL
  Machine: XXX

It is much more generic (can be used for all targets, fields, etc), and it should be much simpler to implement.
As a bonus we will be able to modify any part of the YAML. It also makes inputs to be more clear,
because currently it is not obvious that some part is overridable.

And I do not have any concerns about the fact we modify the input YAML description here,
because the usage of such construction is limited to our test cases
and we should not met issues with it. When there is no -D option given, no preprocessing should happen. Should be safe enough.

I mostly agree with @grimar, although I don't think it's necessary to support arbitrary Key names, only Value names. Should the proposal (in whatever form it takes) go up on the mailing lists as an RFC? It feels like a useful feature that more people might want to talk about.

MaskRay updated this revision to Diff 242158.Feb 3 2020, 12:26 PM
MaskRay retitled this revision from [yaml2obj] Support enumeration() with context and add -D e_machine= to override the value in YAML to [yaml2obj] Add -D k=v to preprocess the input YAML.
MaskRay edited the summary of this revision. (Show Details)

Repurpose.

Adopt grimar's suggestion

Unit tests: fail. 62422 tests passed, 1 failed and 845 were skipped.

failed: LLVM.Analysis/ConstantFolding/vscale-shufflevector.ll

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

grimar added a comment.Feb 4 2020, 1:13 AM

I am going to debug this. My first impression is that the direction is right,
but it needs a bit of polishing.
Will add more comments soon.

llvm/test/tools/yaml2obj/ELF/emachine.yaml
1

This is a refactoring change, right? I'd leave such changes for a follow-up patch(es).

llvm/test/tools/yaml2obj/preprocess.yaml
10 ↗(On Diff #242158)

It is probably usefull to add cases with more than one -D:

-D FOO=x -D BAR=y
-D FOO=x -D FOO=y
llvm/tools/yaml2obj/yaml2obj.cpp
35

Seems the description should start from the upper case letter and perhaps I'd reword it somehow.
It is not a list it seems? It is a single redefinition as far I understand.

45

Perhaps Substitutions or may be just Defines? Up to you,
but -D option is closer to the latter word I think.

47

What is F stands for? I`d rename to Name and Value..

jhenderson added inline comments.Feb 4 2020, 1:24 AM
llvm/test/tools/yaml2obj/preprocess.yaml
1 ↗(On Diff #242158)

Perhaps also worth a test case that "-DNAME=..." is accepted, since that matches the compiler syntax, so would be a likely usage.

22 ↗(On Diff #242158)

I'm not sure this comment line adds anything. My immediate reaction was "what's this test got to do with FileCheck?"

llvm/tools/yaml2obj/yaml2obj.cpp
35

I'd change this help text to "Defined the specified macros to their specified definition. The syntax is <macro>=<definition>".

47

Let's use Macro and Definition

The "field/value" naming doesn't really make sense since the new functionality is much more general purpose.

50

This error should a) mention the switch name, and b) probably not use the word "override" since nothing is being overridden as such. How about "invalid syntax for -D: xxx" (where xxx is the input string). If you think you need extra context, I'd split "missing '=' from "missing macro name".

60

I != StringRef::npos

(@grimar got in with comments as I was writing mine, so sorry for any duplication!)

grimar added inline comments.Feb 4 2020, 3:37 AM
llvm/test/tools/yaml2obj/preprocess.yaml
26 ↗(On Diff #242158)

This does not work for me under windows.

echo -e 'a[[\n[[a\n[[a]\n[[a][[a]][[a]]' | cmp - %t.nosubst

I see the following when I run this line:

D:\Work3\LLVM\llvm-project\build\test\tools\yaml2obj\Output>echo -e 'a[[\n[[a\n[
[a]\n[[a][[a]][[a]]' | cmp - preprocess.yaml.tmp.nosubst
- preprocess.yaml.tmp.nosubst различаются: байт 1, строка 1

What means that byte 1 at the line 1 is different from the source. (My apologies, I do not know how to
switch to displaying errors in English for cmp).

I did not dig deeper because anyways I do not like this way of checking symbol names (sorry :),
so I'd ask you to rewrite in a more traditional way.

llvm/tools/yaml2obj/yaml2obj.cpp
53
Substitution[F] = V;

try_emplace is not needed to be used here, is't? StringRef is a cheap type for copying and
moving concept does not make sense at all I think.

75

I'd suggest something like the following

std::vector<std::pair<std::string, StringRef>> Substitution;
  for (StringRef Define : D) {
    StringRef F, V;
    std::tie(F, V) = Define.split('=');
    if (!Define.count('=') || F.empty()) {
      ErrHandler("bad override, missing field name: " + Define);
      return {};
    }
    Substitution.push_back({(Twine("[[") + F + "]]").str(), V});
  }

  std::string Document = Buf.str();
  if (Substitution.empty())
    return Document;

  for (std::pair<std::string, StringRef> &P : Substitution) {
    while (true) {
      auto I = Document.find(P.first);
      if (I == std::string::npos)
        break;
      Document = Document.replace(I, P.first.size(), P.second.str());
    }
  }
MaskRay updated this revision to Diff 242394.Feb 4 2020, 11:54 AM
MaskRay marked 12 inline comments as done.

Address review comments.

llvm/test/tools/yaml2obj/preprocess.yaml
26 ↗(On Diff #242158)

Switched to

echo -e '.........' > ...
diff -u ... %t.nosubst

Unfortunately we cannot use FileCheck because FileCheck does not like [[ which is not used as a FileCheck pattern.

llvm/tools/yaml2obj/yaml2obj.cpp
75

Maybe we should avoid quadratic complexity behavior, just in case that there are a long run of [[[[[[[[[[.

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

MaskRay updated this revision to Diff 242441.Feb 4 2020, 3:09 PM
MaskRay marked an inline comment as done.

Simplify with Buf.find_first_of

(OT: StringRef::find_first_of is O(n * 256) but the method can be imroved.)

MaskRay updated this revision to Diff 242442.Feb 4 2020, 3:12 PM

Simplify...

Unit tests: unknown.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Unit tests: unknown.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

grimar added inline comments.Feb 5 2020, 12:49 AM
llvm/test/tools/yaml2obj/preprocess.yaml
26 ↗(On Diff #242158)

Unfortunately we cannot use FileCheck because FileCheck does not like [[ which is not used as a FileCheck pattern.

Ah, I see. I did not know that. That is unfortunate behavior.

llvm/tools/yaml2obj/yaml2obj.cpp
75

just in case that there are a long run of [[[[[[[[[[.

It reminded me about D36307. I believe that [[[[[[ is a misuse of the feature and hence we probably should not care much.
The approach I suggested is just very simple and it is much easier to read. I think the common use case for this feature will be
"replace 1-2 defines in a short enough YAML", i.e. I am not sure we should think about an additional optimizations here.

Probably it is time to hear a third opinion though. @jhenderson, what do you think?

MaskRay marked an inline comment as done.Feb 5 2020, 1:02 AM
MaskRay added inline comments.
llvm/tools/yaml2obj/yaml2obj.cpp
75

Doing replace repeatedly may also have a non-convergence problem...

jhenderson added inline comments.Feb 5 2020, 6:27 AM
llvm/tools/yaml2obj/yaml2obj.cpp
75

Recursive replacement causes issues. For example -DFOO=[[FOO]] would result in an infinite loop. Convergence issues are only a problem if we try to do this. I personally don't think we should in the first version.

Another issue is what to do about variables that expand to other variables. One example is -D VAR1=FOO -D VAR2=[[VAR1]]BAR which will result in [[VAR2]] becoming [[VAR1]]BAR if specified in that order, or FOOBAR if VAR2 is defined first. I don't think we want this behaviour, as it is unstable and unobvious.

(Relatedly, I think we should have a test showing that variables that expand to a pattern that could be another variable aren't recursively expanded)

As for the quadratic complexity, I'm not too bothered by it in this case, as I doubt we'll ever get a YAML doc in a test where it really matters (and even if we do, there won't be many, so the overall slow-down in the testsuite will be minimal).

FWIW, I think we can probably forbid names containing '[' or ']' themselves.

MaskRay updated this revision to Diff 242668.Feb 5 2020, 9:39 AM

Improve a test

MaskRay marked an inline comment as done.Feb 5 2020, 9:42 AM
MaskRay added inline comments.
llvm/tools/yaml2obj/yaml2obj.cpp
75

Added the following to macro.yaml

# RUN: yaml2obj --docnum=2 -D a0='[[a1]]' -D a1='[[a0]]' %s | llvm-nm --just-symbol-name --no-sort - > %t.recur
# RUN: echo -e '[[a1]]\n[[a0]]' > %t0.recur
# RUN: diff -u %t0.recur %t.recur

This patch avoids quadratic complexity and non-convergence with very little code, so I think sticking with the current version is fine.

Unit tests: fail. 62504 tests passed, 8 failed and 844 were skipped.

failed: LLVM.tools/yaml2obj/help.test
failed: lld.ELF/compressed-input-alignment.test
failed: lld.ELF/invalid/bad-reloc-target.test
failed: lld.ELF/invalid/common-symbol-alignment.test
failed: lld.ELF/invalid/dynamic-section-broken.test
failed: lld.ELF/invalid/symtab-sh-info.s
failed: lld.ELF/mips-elf-flags-err.test
failed: lld.ELF/mips-fp-flags-err.test

clang-tidy: pass.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

OK. I have 2 non-strong comments/suggestions about the current code.

llvm/tools/yaml2obj/yaml2obj.cpp
45

Should we exit early for a general case?

if (D.empty())
  return Buf.str();

It might simplify things for those who want to debug the code.

65

Here is a little logical issue. If we have input like "[[AAA[",
then you still assign "AAA" to Macro and perform a lookup,
but at fact "AAA" is not a macro and you do not need to do a lookup for it.

MaskRay marked 2 inline comments as done.Feb 6 2020, 12:44 AM
MaskRay added inline comments.
llvm/tools/yaml2obj/yaml2obj.cpp
45

When an early exit is just for the fast path (for the common case), I prefer not to add them.

For example,

if (thing.empty())
  return;
for (... : thing) {
  do something
}

I think if (thing.empty()) is unnecessary, unless the code is a critical path and the early return thing has significant performance improvement.

65

Yes. The if can be split to avoid a lookup. It will add two lines. I can do that.

grimar added inline comments.Feb 6 2020, 1:00 AM
llvm/tools/yaml2obj/yaml2obj.cpp
45

I see what you mean, though here the situation is a bit different probably: too many code is involved, i.e. it is not a trivial loop from your example. it is somehting much more. I do not insist here though, so up to you.

65

I am not concerned about the perfomance at all here, but I'd do that for code readability as it is a bit too noticable and IMO might look like a possible issue for a new readers.

MaskRay updated this revision to Diff 242907.Feb 6 2020, 7:59 AM

Split an if

MaskRay marked 2 inline comments as done.Feb 6 2020, 7:59 AM
grimar accepted this revision.Feb 6 2020, 11:38 PM

LGTM, thanks! Please wait for @jhenderson too.

This revision is now accepted and ready to land.Feb 6 2020, 11:38 PM
jhenderson accepted this revision.Feb 7 2020, 1:11 AM

LGTM too!

This revision was automatically updated to reflect the committed changes.