This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj] - Remove an excessive helper for printing dynamic tags.
ClosedPublic

Authored by grimar on Dec 23 2019, 5:18 AM.

Details

Summary

This removes the getTypeString from readeobj source because it
almost duplicates the existent method: ELFFile<ELFT>::getDynamicTagAsString.

Side effect: now it prints "<unknown:>0xHEXVALUE" instead of "(unknown)" for unknown values.
llvm-readelf before this patch printed:

0x0000000012345678 (unknown) 0x8765432187654321
0x000000006abcdef0 (unknown) 0x9988776655443322
0x0000000076543210 (unknown) 0x5555666677778888

and now it prints:

0x0000000012345678 (<unknown:>0x12345678) 0x8765432187654321
0x000000006abcdef0 (<unknown:>0x6abcdef0) 0x9988776655443322
0x0000000076543210 (<unknown:>0x76543210) 0x5555666677778888

GNU reaedlf prints different thing:

0x0000000012345678 (<unknown>: 12345678) 0x8765432187654321
0x000000006abcdef0 (Operating System specific: 6abcdef0) 0x9988776655443322
0x0000000076543210 (Processor Specific: 76543210) 0x5555666677778888

I am not sure we want to follow GNU here. Even if we do, it should be separate
patch probably. The new output looks better and closer to GNU anyways,
and the code is a bit simpler.

Diff Detail

Event Timeline

grimar created this revision.Dec 23 2019, 5:18 AM
Herald added a project: Restricted Project. · View Herald Transcript
grimar updated this revision to Diff 235130.Dec 23 2019, 5:22 AM
  • Removed a piece of code that belongs to a different change.
MaskRay accepted this revision.Dec 23 2019, 1:42 PM

std::string ELFFile<ELFT>::getDynamicTagAsString(uint64_t Type) const is used by llvm-objdump -p (private headers). It is indeed almost identical to the llvm-readelf -d function.

Regarding the behavior change (<unknown>) -> (<unknown:>0x12345678), I agree with you that it probably does not matter.

This revision is now accepted and ready to land.Dec 23 2019, 1:42 PM
This revision was automatically updated to reflect the committed changes.

I am not sure we want to follow GNU here. Even if we do, it should be separate patch probably. The new output looks better and closer to GNU anyways, and the code is a bit simpler.

FWIW, I'm not sure I see a benefit from diverging from GNU here. The only thing I can think of is the lack of '0x' potentially making the value confusing, but the value is actually listed in the first column anyway, so that point isn't exactly an issue.

I am not sure we want to follow GNU here. Even if we do, it should be separate patch probably. The new output looks better and closer to GNU anyways, and the code is a bit simpler.

FWIW, I'm not sure I see a benefit from diverging from GNU here. The only thing I can think of is the lack of '0x' potentially making the value confusing, but the value is actually listed in the first column anyway, so that point isn't exactly an issue.

See:

GNU prints:

0x0000000012345678 (<unknown>: 12345678) 0x8765432187654321
0x000000006abcdef0 (Operating System specific: 6abcdef0) 0x9988776655443322
0x0000000076543210 (Processor Specific: 76543210) 0x5555666677778888

We print:

0x0000000012345678 (<unknown:>0x12345678) 0x8765432187654321
0x000000006abcdef0 (<unknown:>0x6abcdef0) 0x9988776655443322
0x0000000076543210 (<unknown:>0x76543210) 0x5555666677778888

The benefit I see is that we do not need to care about printing "Operating System specific" or "Processor Specific" text
and doing actions to align columns after that. I.e. code that prints "<unknown:>" is just a bit simpler.
It is not clear how much useful to say that unknown value is "Operating System specific" or "Processor Specific".
(I am OK to copy GNU's behavior here if we decide we want it, though)

(Seems we need to change "<unknown:>" to "<unknown>: ", btw, it looks like a bug)

The only thing I can think of is the lack of '0x' potentially making the value confusing, but the value is actually listed in the first column anyway, so that point isn't exactly an issue.

I think having "0x" prefix for hex values in the output is always(?) a good thing. I think it is a GNU output bug and we probably should not be bug-compatible with GNU here.

The benefit I see is that we do not need to care about printing "Operating System specific" or "Processor Specific" text
and doing actions to align columns after that. I.e. code that prints "<unknown:>" is just a bit simpler.
It is not clear how much useful to say that unknown value is "Operating System specific" or "Processor Specific".
(I am OK to copy GNU's behavior here if we decide we want it, though)

As someone who works in a system with a lot of one or other of the OS/Processory specific tags, I would much rather see OS/Proc-specific than Unknown for those cases.

The only thing I can think of is the lack of '0x' potentially making the value confusing, but the value is actually listed in the first column anyway, so that point isn't exactly an issue.

I think having "0x" prefix for hex values in the output is always(?) a good thing. I think it is a GNU output bug and we probably should not be bug-compatible with GNU here.

I don't dispute having an 0x prefix is a good thing, but I don't think it's a bug. If you can persuade the GNU guys to change their behaviour, I'd be more than happy to support the 0x being added. However, it's quite common for it not to be present in various GNU tool outputs from my experience.

GNU output style really needs to be compatible with GNU readelf. That was one of my main takeaways having talked to various people at the BoF and round table at last year's Euro LLVM.

I think our goal is to make llvm-readobj -d and llvm-readelf -d consistent. It seems we have a preferable output in mind which is inconsistent with GNU readelf. I agree that omitting the value from the (unknown) column is better.

0x0000000012345678 (unknown) 0x8765432187654321

@grimar Can you send an email to the mailing list binutils@sourceware.org and ask whether they'd like to change?

I think our goal is to make llvm-readobj -d and llvm-readelf -d consistent. It seems we have a preferable output in mind which is inconsistent with GNU readelf. I agree that omitting the value from the (unknown) column is better.

0x0000000012345678 (unknown) 0x8765432187654321

@grimar Can you send an email to the mailing list binutils@sourceware.org and ask whether they'd like to change?

I'd like to clarify. GNU readelf currently prints:

0x0000000012345678 (<unknown>: 12345678) 0x8765432187654321
0x000000006abcdef0 (Operating System specific: 6abcdef0) 0x9988776655443322
0x0000000076543210 (Processor Specific: 76543210) 0x5555666677778888

Do you suggest to start using (unknown) everywhere?

0x0000000012345678 (unknown) 0x8765432187654321
0x000000006abcdef0 (unknown) 0x9988776655443322
0x0000000076543210 (unknown) 0x5555666677778888

Or do you want to keep Operating System specific and Processor Specific text?
(I think James wants to keep them).

0x0000000012345678 (unknown) 0x8765432187654321
0x000000006abcdef0 (Operating System specific) 0x9988776655443322
0x0000000076543210 (Processor Specific) 0x5555666677778888

Assuming we want to keep them, I am thinking about shortening and lower-casing:

0x0000000012345678 (unknown) 0x8765432187654321
0x000000006abcdef0 (OS specific) 0x9988776655443322
0x0000000076543210 (processor specific) 0x5555666677778888

I think our goal is to make llvm-readobj -d and llvm-readelf -d consistent.

FWIW, I don't see a massive need for this (we don't make, e.g. symbol output consistent), but I don't oppose attempting it as long as llvm-readelf output is consistent with GNU readelf.

MaskRay added a comment.EditedJan 14 2020, 11:40 AM

Assuming we want to keep them, I am thinking about shortening and lower-casing:

0x0000000012345678 (unknown) 0x8765432187654321
0x000000006abcdef0 (OS specific) 0x9988776655443322
0x0000000076543210 (processor specific) 0x5555666677778888

This looks good.

I think our goal is to make llvm-readobj -d and llvm-readelf -d consistent.

FWIW, I don't see a massive need for this (we don't make, e.g. symbol output consistent), but I don't oppose attempting it as long as llvm-readelf output is consistent with GNU readelf.

I was postulating the original intention of this patch: make things consistent in places where the exact llvm-readobj -d output does not matter that much / share code. In GNU readelf's output, the second column duplicates some information from the first column, so it is not ideal.

0x0000000012345678 (<unknown>: 12345678) 0x8765432187654321
0x000000006abcdef0 (Operating System specific: 6abcdef0) 0x9988776655443322
0x0000000076543210 (Processor Specific: 76543210) 0x5555666677778888

Assuming we want to keep them, I am thinking about shortening and lower-casing:

0x0000000012345678 (unknown) 0x8765432187654321
0x000000006abcdef0 (OS specific) 0x9988776655443322
0x0000000076543210 (processor specific) 0x5555666677778888

This looks good.

OK. Lets wait a day or two more to see if anyone else has comments/ideas/objections about this and I'll post the suggestion to GNU mail list then.