This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - Add `ELFYAML::YAMLIntUInt` to fix how we parse a relocation `Addend` key.
ClosedPublic

Authored by grimar on Mar 3 2020, 7:40 AM.

Details

Summary

This patch makes Relocation::Addend to be ELFYAML::YAMLInt and not int64_t.

  1. For an 64-bit object any hex/decimal addends in the range [INT64_MIN, UINT64_MAX] is accepted.
  2. For an 32-bit object any hex/decimal addends in range [INT32_MIN, UINT32_MAX] is accepted.
  3. Negative hex numbers like -0xffffffff are not accepted.
  4. It is printed as decimal. I.e. obj2yaml will print something like "Addend: 125", this matches the current behavior.

This fixes all FIXMEs in relocation-addend.yaml.

Diff Detail

Event Timeline

grimar created this revision.Mar 3 2020, 7:40 AM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay added a comment.EditedMar 3 2020, 9:42 AM

For decimal values: a) 64-bit case: accept any [INT64MIN, INT64MAX] value. b) 32-bit case: accept any [INT32MIN, INT32MAX] value.

It should probably accept the union of int32_t and uint32_t, i.e. [-2147483648, 4294967295]. Examples: R_AARCH64_ABS32, R_AARCH64_PREL32, R_PPC64_ADDR32.

Not all relocation types behave this way, but we can treat all relocations the same way and be permissive here.

See https://github.com/llvm/llvm-project/blob/master/lld/ELF/Target.h#L223 checkIntUInt

llvm/lib/ObjectYAML/ELFYAML.cpp
998

Not very necessary, I think.

grimar marked an inline comment as done.EditedMar 4 2020, 12:18 AM

It should probably accept the union of int32_t and uint32_t, i.e. [-2147483648, 4294967295]. Examples: R_AARCH64_ABS32, R_AARCH64_PREL32, R_PPC64_ADDR32.

I think your suggestion is fine. I'll go this way if there will be no objections.

But what should we print (obj2yaml)? We print int64_t decimal currently.
Should we switch to printing a hex form then?

llvm/lib/ObjectYAML/ELFYAML.cpp
998

This syntax is just strange.
Imagine you do Addend: 0xFFFFFFFF, i.e. you mean -1.
If you write Addend: -0xFFFFFFFF, the result will be 1, or 0x00000001 in hex. The hex sequence is
very different because of that minus. I think it is a bad practice to use -0xFFFFFFFF form for Addend value
or any other key probably.

We either need a test case for this, or to restrict it explicitly probably.
(Doesn't seem we can just ignore it? And I do not think we want to support it).
I selected to restrict it.

MaskRay added a comment.EditedMar 4 2020, 8:39 AM

It should probably accept the union of int32_t and uint32_t, i.e. [-2147483648, 4294967295]. Examples: R_AARCH64_ABS32, R_AARCH64_PREL32, R_PPC64_ADDR32.

I think your suggestion is fine. I'll go this way if there will be no objections.

But what should we print (obj2yaml)? We print int64_t decimal currently.
Should we switch to printing a hex form then?

For most relocation types, the addend should be interpreted as a signed integer. We don't want obj2yaml to learn the semantics of relocation types, so letting obj2yaml unconditionally dump an addend as a signed integer should be fine. An addend (of a 32-bit object) greater than or equal to 0x80000000 is usually accepted as input, so yaml2obj should also accept them.

I just think that -0x1 and -0x5 don't look so strange (no need to reject), but I'd like to hear a third opinion.

My inclination is that yaml2obj should support both signed and unsigned variants of both hex and decimal numbers, if feasible. I think obj2yaml should print as unsigned hex numbers, since that is what is displayed by tools such as llvm-readelf. In other words, an addend of -4 would be printed by obj2yaml as "0xFFFFFFFFFFFFFFFC" in a 64-bit object, but could be specified as "-4", "-0x0000000000000004", or "0xFFFFFFFFFFFFFFFC" for yaml2obj.

grimar added a comment.Mar 5 2020, 6:34 AM

I am not strongly against the idea to print them as unsigned hex numbers, but FTR:

I think obj2yaml should print as unsigned hex numbers, since that is what is displayed by tools such as llvm-readelf.

But not GNU readelf. See "FIXME" in D75671: llvm-readelf -r displays decimal addends.

I am not strongly against the idea to print them as unsigned hex numbers, but FTR:

I think obj2yaml should print as unsigned hex numbers, since that is what is displayed by tools such as llvm-readelf.

But not GNU readelf. See "FIXME" in D75671: llvm-readelf -r displays decimal addends.

Apparently my memory is playing tricks on me. llvm-readobj displays unsigned hex.

My inclination is that yaml2obj should support both signed and unsigned variants of both hex and decimal numbers, if feasible. I think obj2yaml should print as unsigned hex numbers, since that is what is displayed by tools such as llvm-readelf. In other words, an addend of -4 would be printed by obj2yaml as "0xFFFFFFFFFFFFFFFC" in a 64-bit object, but could be specified as "-4", "-0x0000000000000004", or "0xFFFFFFFFFFFFFFFC" for yaml2obj.

Seems @MaskRay's answer https://reviews.llvm.org/D75527#1907129 responds this.

I think we can start with fixing of hex values accepting. The current patch does not change the obj2yaml's output. We can change/not-change it later.

grimar added a comment.Mar 6 2020, 5:38 AM

My inclination is that yaml2obj should support both signed and unsigned variants of both hex and decimal numbers, if feasible.

I'd like to clarify what you mean.

Are you ok to support Addend: -4294967295? (4294967295 == 0xFFFFFFFF) for 32 bit ELF
and Addend: -18446744073709551615 (18446744073709551615 == 0xFFFFFFFFFFFFFFFF) for 64-bit one?
(it is the same as Addend: 1)

Looks like that because we are not going to restrict
Addend: -0xFFFFFFFF or Addend: -0xFFFFFFFFFFFFFFFF it seems.

If so, then this patch can be a bit simpler, we can probably check - and always read value as unsigned int64.
I.e. both:

  • [-18446744073709551615, 18446744073709551615]
  • [ -0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF]
StringRef ScalarTraits<ELFYAML::YAMLInt>::input(StringRef Scalar, void *Ctx,
                                                ELFYAML::YAMLInt &Val) {
  const bool Is64 = static_cast<ELFYAML::Object *>(Ctx)->Header.Class ==
                    ELFYAML::ELF_ELFCLASS(ELF::ELFCLASS64);
  StringRef ErrMsg = "invalid number";

  bool IsNegative = Scalar.front() == '-';
  if (IsNegative)
    Scalar = Scalar.drop_front();

  unsigned long long UInt;
  if (getAsUnsignedInteger(Scalar, /*Radix=*/0, UInt))
    return ErrMsg;

  // For a 32-bit target we allow values in a range of a uint32_t.
  if (!Is64 && (UInt > UINT32_MAX))
    return ErrMsg;

  Val = IsNegative ? (-UInt) : UInt;
  return "";
}

Did you mean something like that?

My inclination is that yaml2obj should support both signed and unsigned variants of both hex and decimal numbers, if feasible.

I'd like to clarify what you mean.

Are you ok to support Addend: -4294967295? (4294967295 == 0xFFFFFFFF) for 32 bit ELF
and Addend: -18446744073709551615 (18446744073709551615 == 0xFFFFFFFFFFFFFFFF) for 64-bit one?
(it is the same as Addend: 1)

Looks like that because we are not going to restrict
Addend: -0xFFFFFFFF or Addend: -0xFFFFFFFFFFFFFFFF it seems.

If so, then this patch can be a bit simpler, we can probably check - and always read value as unsigned int64.
I.e. both:

  • [-18446744073709551615, 18446744073709551615]
  • [ -0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF]
StringRef ScalarTraits<ELFYAML::YAMLInt>::input(StringRef Scalar, void *Ctx,
                                                ELFYAML::YAMLInt &Val) {
  const bool Is64 = static_cast<ELFYAML::Object *>(Ctx)->Header.Class ==
                    ELFYAML::ELF_ELFCLASS(ELF::ELFCLASS64);
  StringRef ErrMsg = "invalid number";

  bool IsNegative = Scalar.front() == '-';
  if (IsNegative)
    Scalar = Scalar.drop_front();

  unsigned long long UInt;
  if (getAsUnsignedInteger(Scalar, /*Radix=*/0, UInt))
    return ErrMsg;

  // For a 32-bit target we allow values in a range of a uint32_t.
  if (!Is64 && (UInt > UINT32_MAX))
    return ErrMsg;

  Val = IsNegative ? (-UInt) : UInt;
  return "";
}

Did you mean something like that?

Not looked at the code too critically, but that more or less looks like what I'd expect. In hex land, I expect a given hex number to correspond identically to the bytes written. In other words 0xffffffffffffffff would be written with the byte values 0xff 0xff etc, and have an effective value of -1. I think therefore it's logical to cause a '-' sign to negate the number. In other words, -0xffffffffffffffff would end up as 1. If somebody instead writes a decimal number, I think it can just be written as you'd expect (i.e. it's positive or negative, depending on the presence of the '-' sign). The difficulty is what to do about overflow detection, I guess. To keep things simple, I think it's best to do what you suggest in the quoted post. If I'm not mistaken, it's what the "natural" overflow behaviour would be for C++, so probably isn't too surprising.

grimar updated this revision to Diff 249394.Mar 10 2020, 8:27 AM
grimar edited the summary of this revision. (Show Details)
  • Updated implementation as discussed.

@MaskRay, are you OK with this approach?

jhenderson accepted this revision.Mar 13 2020, 2:24 AM

LGTM, with a couple of suggestions.

llvm/test/tools/yaml2obj/ELF/relocation-addend.yaml
135

Maybe these should use 'G' instead of 'Q', since that's the first invalid value?

138

Maybe also worth testing '-' on its own, and '--1234'.

This revision is now accepted and ready to land.Mar 13 2020, 2:24 AM
grimar updated this revision to Diff 250171.Mar 13 2020, 3:46 AM
grimar marked 3 inline comments as done.
  • Addressed review comments.
llvm/test/tools/yaml2obj/ELF/relocation-addend.yaml
138

Maybe also worth testing '-'

This helped to catch an assert, thanks! It turned out that on such input we have an empty Scalar
(because of the error state) in ScalarTraits<ELFYAML::YAMLInt>::input and that was not handled properly
before.

MaskRay added a comment.EditedMar 13 2020, 9:35 AM

For an 64-bit object any hex/decimal addends in range [-(2^64 - 1), 2^64 - 1)] is accepted.
For an 64-bit object any hex/decimal addends in range [-(2^32 - 1), 2^32 - 1)] is accepted.

My feeling is that the ranges should be:
[-(2^63 - 1), 2^64 - 1)]
[-(2^31 - 1), 2^32 - 1)]

It should probably accept the union of int32_t and uint32_t, i.e. [-2147483648, 4294967295]. Examples: R_AARCH64_ABS32, R_AARCH64_PREL32, R_PPC64_ADDR32.

For example, for an ELFCLASS32 object, an addend of -0xffffffff or -0x80000001 is invalid.

grimar added a comment.EditedMar 14 2020, 7:00 AM

For an 64-bit object any hex/decimal addends in range [-(2^64 - 1), 2^64 - 1)] is accepted.
For an 64-bit object any hex/decimal addends in range [-(2^32 - 1), 2^32 - 1)] is accepted.

My feeling is that the ranges should be:
[-(2^63 - 1), 2^64 - 1)]
[-(2^31 - 1), 2^32 - 1)]

It should probably accept the union of int32_t and uint32_t, i.e. [-2147483648, 4294967295]. Examples: R_AARCH64_ABS32, R_AARCH64_PREL32, R_PPC64_ADDR32.

But it does not work good for other relocations, isn't? Different relocations allows different ranges.
We are anyways not 100% correct with such approach until we don't handle each possible relocation individually.

Currently (this diff), YAMLInt is kind of universal type. It can be used not only for addends, but as a
type or a base for type for other keys probably.

For example, for an ELFCLASS32 object, an addend of -0xffffffff or -0x80000001 is invalid.

I have to say about what I think about negative hex numbers again, sorry.
Negitive hex numbers in YAML is something I suggested to ban in the first version of this diff.

I can read -0xffffffff as --1 or as -4294967296 or as a -0x00000000ffffffff at the same time.
But why actually we want to allow users to do this and think about values that hard?
Who will use negative hex numbers in a YAML tests and for what? I think it accepts less readable
tests and probably gives no any benefits. I think we should not focus too much on them (or just ban).

Speaking about decimal version of -0xffffffff: I have a test in this diff:

## Addend == -(2^32 - 1)
# RUN: yaml2obj --docnum=2 %s -o %t32.decimal.min -DADDEND=-4294967295
# RUN: llvm-readobj -r %t32.decimal.min | FileCheck %s --check-prefix=TEST -DADDEND=0x1
# RUN: yaml2obj --docnum=2 %s -o %t32.hex.min -DADDEND=-0xFFFFFFFF
# RUN: llvm-readobj -r %t32.hex.min | FileCheck %s --check-prefix=TEST -DADDEND=0x1

-0xffffffff is accepted for 32-bits platform and reads the same as -4294967295, i.e. as 0x1.

While I understand why you think that -0xffffffff` should be restricted in this case,
the current behavior looks acceptable for me in general.

I need this patch for D75671 which wants to do Addend: 0xffffffffffffffff
for example and fails:

YAML:17:17: error: invalid number
        Addend: 0xffffffffffffffff
                ^~~~~~~~~~~~~~~~~~
yaml2obj: error: failed to parse YAML input: Invalid argument

This is a problem I am trying to solve first of all.

MaskRay added a comment.EditedMar 14 2020, 8:10 AM

For an 64-bit object any hex/decimal addends in range [-(2^64 - 1), 2^64 - 1)] is accepted.
For an 64-bit object any hex/decimal addends in range [-(2^32 - 1), 2^32 - 1)] is accepted.

My feeling is that the ranges should be:
[-(2^63 - 1), 2^64 - 1)]
[-(2^31 - 1), 2^32 - 1)]

It should probably accept the union of int32_t and uint32_t, i.e. [-2147483648, 4294967295]. Examples: R_AARCH64_ABS32, R_AARCH64_PREL32, R_PPC64_ADDR32.

But it does not work good for other relocations, isn't? Different relocations allows different ranges.
We are anyways not 100% correct with such approach until we don't handle each possible relocation individually.

I agree. So we just take the union of the use cases of all relocation types, but the range [-0x80000000, 0xffffffff] does not have to be larger.

Currently (this diff), YAMLInt is kind of universal type. It can be used not only for addends, but as a
type or a base for type for other keys probably.

For example, for an ELFCLASS32 object, an addend of -0xffffffff or -0x80000001 is invalid.

I have to say about what I think about negative hex numbers again, sorry.
Negitive hex numbers in YAML is something I suggested to ban in the first version of this diff.

I can read -0xffffffff as --1 or as -4294967296 or as a -0x00000000ffffffff at the same time.
But why actually we want to allow users to do this and think about values that hard?
Who will use negative hex numbers in a YAML tests and for what? I think it accepts less readable
tests and probably gives no any benefits. I think we should not focus too much on them (or just ban).

Speaking about decimal version of -0xffffffff: I have a test in this diff:

## Addend == -(2^32 - 1)
# RUN: yaml2obj --docnum=2 %s -o %t32.decimal.min -DADDEND=-4294967295
# RUN: llvm-readobj -r %t32.decimal.min | FileCheck %s --check-prefix=TEST -DADDEND=0x1
# RUN: yaml2obj --docnum=2 %s -o %t32.hex.min -DADDEND=-0xFFFFFFFF
# RUN: llvm-readobj -r %t32.hex.min | FileCheck %s --check-prefix=TEST -DADDEND=0x1

-0xffffffff is accepted for 32-bits platform and reads the same as -4294967295, i.e. as 0x1.

While I understand why you think that -0xffffffff` should be restricted in this case,
the current behavior looks acceptable for me in general.

I need this patch for D75671 which wants to do Addend: 0xffffffffffffffff
for example and fails:

YAML:17:17: error: invalid number
        Addend: 0xffffffffffffffff
                ^~~~~~~~~~~~~~~~~~
yaml2obj: error: failed to parse YAML input: Invalid argument

This is a problem I am trying to solve first of all.

I do take inspiration from lld's checkIntUInt: D14957 D45051 D63690.

For example, for an ELFCLASS32 object, an addend of -0xffffffff or -0x80000001 is invalid.

I don't differentiate a decimal and a hexadecimal. Put it in another way, I think an addend of -4294967295 or -2147483649 is invalid. An addend is used to represent an address relative to a symbol or the load base by an offset. The offset can be as small as -2147483648 but it cannot be smaller than that.

Latest updates look good to me, but holding off another LGTM, given the range comments.

I think I can be persuaded that -0x12345678 should not be allowed. I guess very few people would write that, so emitting an error is probably sufficient. I've given it some more thought and I'm okay with rejecting ranges that can't be physically represented in the field too, although I'd allow the maximum/minimum possible values (i.e. INT64_MIN/UINT64_MAX, and possibly 32-bit equivalents). Basically, I'll defer to either of you two on this one. I don't have any strong preferences.

grimar updated this revision to Diff 250561.Mar 16 2020, 8:04 AM
grimar retitled this revision from [yaml2obj] - Add `ELFYAML::YAMLInt` to fix how we parse a relocation `Addend` key. to [yaml2obj] - Add `ELFYAML::YAMLIntUInt` to fix how we parse a relocation `Addend` key..
grimar edited the summary of this revision. (Show Details)
  • Reimplemented in according to latest discussion.
  • Renamed the new type to YAMLIntUInt.
grimar updated this revision to Diff 250564.Mar 16 2020, 8:10 AM
  • Removed 2 minor unrelated cleanup changes.
MaskRay accepted this revision.Mar 16 2020, 8:50 AM
grimar marked an inline comment as done.Mar 17 2020, 1:00 AM

@jhenderson, does this version LGTY?

llvm/lib/ObjectYAML/ELFYAML.cpp
996

I'll change this to "might want to use them".

jhenderson accepted this revision.Mar 17 2020, 3:01 AM

LGTM, with a couple of small nits.

llvm/lib/ObjectYAML/ELFYAML.cpp
996

Let's be more specific, since I think the real reason is that it is ambiguous what to do with them:

"We do not accept negative hex numbers because their meaning is ambiguous. For example, would -0xfffffffff mean 1 or INT32_MIN?"

llvm/test/tools/yaml2obj/ELF/relocation-addend.yaml
7–8

I don't think you need the i64/ui64 bits of the values.

Also, why not just -9223372036854775808?

72–73

Same as above.

grimar marked an inline comment as done.Mar 17 2020, 4:04 AM
grimar added inline comments.
llvm/test/tools/yaml2obj/ELF/relocation-addend.yaml
7–8

Also, why not just -9223372036854775808?

I've just took it from my stdint.h:

// These macros must exactly match those in the Windows SDK's intsafe.h.
#define INT8_MIN         (-127i8 - 1)
#define INT16_MIN        (-32767i16 - 1)
#define INT32_MIN        (-2147483647i32 - 1)
#define INT64_MIN        (-9223372036854775807i64 - 1)
#define INT8_MAX         127i8
#define INT16_MAX        32767i16
#define INT32_MAX        2147483647i32
#define INT64_MAX        9223372036854775807i64
#define UINT8_MAX        0xffui8
#define UINT16_MAX       0xffffui16
#define UINT32_MAX       0xffffffffui32
#define UINT64_MAX       0xffffffffffffffffui64

I do not mind to change it.

This revision was automatically updated to reflect the committed changes.
grimar marked 3 inline comments as done.