This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readelf] - Improve ELF type field dumping.
ClosedPublic

Authored by grimar on Dec 11 2020, 2:24 AM.

Details

Summary

This is related to https://bugs.llvm.org/show_bug.cgi?id=40868.

Currently we don't print OS Specific/`Processor Specific/<unknown>
prefixes when dumping the ELF file type. This is not consistent
with GNU readelf. The patch fixes it.

Also, this patch removes the types.test, because we already have
file-types.test, which tests more cases and this patch revealed that
we have such a duplicate.

Diff Detail

Event Timeline

grimar created this revision.Dec 11 2020, 2:24 AM
grimar requested review of this revision.Dec 11 2020, 2:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 11 2020, 2:24 AM
MaskRay added inline comments.Dec 11 2020, 9:29 AM
llvm/tools/llvm-readobj/ELFDumper.cpp
3532

I think the value is the main body and Processor Specific is a comment. If something needs to be placed in (), it should not be the value.

grimar added inline comments.Dec 11 2020, 11:32 PM
llvm/tools/llvm-readobj/ELFDumper.cpp
3532

I am printing exactly what GNU readelf prints here.

jhenderson accepted this revision.Dec 14 2020, 12:13 AM

Looks good to me.

This revision is now accepted and ready to land.Dec 14 2020, 12:13 AM
This revision was automatically updated to reflect the committed changes.

We've just run into an unexpected side-effect of this change, when this got merged into our downstream branch. We have a number of new ET_* values in the OS-specific range. In our downstream branch, we've added new entries to ElfObjectFileType so that these values are printed nicely as Type: OurCustomType. However, with this change, this has changed to Type: OS Specific: (OurCustomType) which isn't ideal. We can probably live with it, but I wonder if a small refactor might be better to allow us to have a clean downstream patch without really harming upstream code. My initial suggestion would be to change to something like this:

  if (!makeArrayRef(ElfObjectFileType).end() ==
             llvm::find_if(ElfObjectFileType,
                           [&](const EnumEntry<unsigned> &E) {
                             return E.Value == e.e_type;
                           })) {
    if (e.e_type >= ET_LOPROC) {
      Str = "Processor Specific: (" + Str + ")";
    } else if (e.e_type >= ET_LOOS) {
      Str = "OS Specific: (" + Str + ")";
    } else {
      Str = "<unknown>: " + Str;
    }
}

I think the overall code complexity is about the same. What do you think?

We've just run into an unexpected side-effect of this change, when this got merged into our downstream branch. We have a number of new ET_* values in the OS-specific range. In our downstream branch, we've added new entries to ElfObjectFileType so that these values are printed nicely as Type: OurCustomType. However, with this change, this has changed to Type: OS Specific: (OurCustomType) which isn't ideal. We can probably live with it, but I wonder if a small refactor might be better to allow us to have a clean downstream patch without really harming upstream code. My initial suggestion would be to change to something like this:

  if (!makeArrayRef(ElfObjectFileType).end() ==
             llvm::find_if(ElfObjectFileType,
                           [&](const EnumEntry<unsigned> &E) {
                             return E.Value == e.e_type;
                           })) {
    if (e.e_type >= ET_LOPROC) {
      Str = "Processor Specific: (" + Str + ")";
    } else if (e.e_type >= ET_LOOS) {
      Str = "OS Specific: (" + Str + ")";
    } else {
      Str = "<unknown>: " + Str;
    }
}

I think the overall code complexity is about the same. What do you think?

I guess it might be OK. It is probably a bit inconistent with GNU which does:

static char *
get_file_type (unsigned e_type)
{
  static char buff[32];

  switch (e_type)
    {
    case ET_NONE: return _("NONE (None)");
    case ET_REL:  return _("REL (Relocatable file)");
    case ET_EXEC: return _("EXEC (Executable file)");
    case ET_DYN:  return _("DYN (Shared object file)");
    case ET_CORE: return _("CORE (Core file)");

    default:
      if ((e_type >= ET_LOPROC) && (e_type <= ET_HIPROC))
	snprintf (buff, sizeof (buff), _("Processor Specific: (%x)"), e_type);
      else if ((e_type >= ET_LOOS) && (e_type <= ET_HIOS))
	snprintf (buff, sizeof (buff), _("OS Specific: (%x)"), e_type);
      else
	snprintf (buff, sizeof (buff), _("<unknown>: %x"), e_type);
      return buff;
    }
}

But at the same time it looks reasonable to print known/named types first.
I can prepare a patch, though seems there is no way to test such change.

Lets see what @MaskRay and/or others think too.

I guess it might be OK. It is probably a bit inconistent with GNU which does:

static char *
get_file_type (unsigned e_type)
{
  static char buff[32];

  switch (e_type)
    {
    case ET_NONE: return _("NONE (None)");
    case ET_REL:  return _("REL (Relocatable file)");
    case ET_EXEC: return _("EXEC (Executable file)");
    case ET_DYN:  return _("DYN (Shared object file)");
    case ET_CORE: return _("CORE (Core file)");

    default:
      if ((e_type >= ET_LOPROC) && (e_type <= ET_HIPROC))
	snprintf (buff, sizeof (buff), _("Processor Specific: (%x)"), e_type);
      else if ((e_type >= ET_LOOS) && (e_type <= ET_HIOS))
	snprintf (buff, sizeof (buff), _("OS Specific: (%x)"), e_type);
      else
	snprintf (buff, sizeof (buff), _("<unknown>: %x"), e_type);
      return buff;
    }
}

I'm not sure it is all that consistent actually - in the GNU case, the code immediately returns the print out of known types before falling back to unknown types, so if we had the same patch in GNU readelf, it'd likely be the same as the old output.

But at the same time it looks reasonable to print known/named types first.
I can prepare a patch, though seems there is no way to test such change.

Lets see what @MaskRay and/or others think too.

Thanks for considering it. FWIW, I agree testing might be a possible issue, but on the other hand, I think it could have easily happened that someone wrote my suggestion in the first place, and I don't think I'd have noticed any real lack of testing.

Also a very minor advantage of my approach is that if llvm-readelf were to ever gain support for known OS/Processor specific types, the code would be simpler (just add to the list), and the output possibly more consistent.

llvm/test/tools/llvm-readobj/ELF/file-types.test