This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] [COFF] Subtract the image base for section virtual addresses
AbandonedPublic

Authored by mstorsjo on Nov 27 2018, 5:31 AM.

Details

Reviewers
rnk
zturner
ruiu
Summary

Serialized yaml versions of COFF executables have section addresses as absolute addresses, while the section header itself stores the relative virtual address. When writing fields back into an executable, subtract the image base.

Alternatively, if we want the yaml representation to be a more exact dump of the source file, we could change obj2yaml instead.

Diff Detail

Event Timeline

mstorsjo created this revision.Nov 27 2018, 5:31 AM
ruiu accepted this revision.Nov 27 2018, 8:19 AM

LGTM

This is perhaps my personal preference, but I think making it a relative address from ImageBase seems like a good idea, because by doing that we get smaller numbers that are easier to read.

This revision is now accepted and ready to land.Nov 27 2018, 8:19 AM

LGTM

This is perhaps my personal preference, but I think making it a relative address from ImageBase seems like a good idea, because by doing that we get smaller numbers that are easier to read.

So you mean that you'd prefer changing obj2yaml instead? Is there any specification on what the yaml format should be? Changing obj2yaml would make more sense as well, since then we'd write more exactly what the file contains, instead of the interpretation of it.

ruiu added a comment.Nov 27 2018, 10:37 AM

Oh, I misunderstood this patch. Sorry for the confusion. My preference was just what I wrote in the previous comment, but that's not a strong preference or anything, and it also makes sense to make values in a YAML file the same as the ones in the actual file, so this patch is fine to me.

YAML object file is an LLVM thing, and I don't think there's a specification as to what value should be written to a YAML file, and I don't think we can mechanically define it. YAML object file resembles actual binary object file, but YAML object file is slightly more abstract (e.g. it doesn't contain section offsets in a file or does not really specify any concrete file layout).

Let's ask @majnemer who implemented support for executables in yaml originially.

Maybe I'm misunderstanding something. In COFFYAML.cpp we have this code:

IO.mapOptional("VirtualAddress", Sec.Header.VirtualAddress, 0U);

So, we write out the virtual address exactly as it is in the binary header file. So why, when going the other direction, do we subtract the image base? I don't have a strong preference on whether we store the VA or the RVA (I have a mild preference for storing the value exactly as it is in the object file), but whatever we do, yaml2obj and obj2yaml at least need to agree with other, right?

Maybe I'm misunderstanding something. In COFFYAML.cpp we have this code:

IO.mapOptional("VirtualAddress", Sec.Header.VirtualAddress, 0U);

So, we write out the virtual address exactly as it is in the binary header file. So why, when going the other direction, do we subtract the image base? I don't have a strong preference on whether we store the VA or the RVA (I have a mild preference for storing the value exactly as it is in the object file), but whatever we do, yaml2obj and obj2yaml at least need to agree with other, right?

Also, in coff2yaml.cpp we have this code which initializes the above field:

const object::coff_section *COFFSection = Obj.getCOFFSection(ObjSection);
COFFYAML::Section NewYAMLSection;
...
NewYAMLSection.Header.VirtualAddress = ObjSection.getAddress();

So, the value we print in the yaml is indeed the exact value from the object file. If we change one but not the other, then it will regress the ability of object files to round-trip through yaml.

Maybe I'm misunderstanding something. In COFFYAML.cpp we have this code:

IO.mapOptional("VirtualAddress", Sec.Header.VirtualAddress, 0U);

So, we write out the virtual address exactly as it is in the binary header file. So why, when going the other direction, do we subtract the image base? I don't have a strong preference on whether we store the VA or the RVA (I have a mild preference for storing the value exactly as it is in the object file), but whatever we do, yaml2obj and obj2yaml at least need to agree with other, right?

Yes, obviously. And right now, obj2yaml/coff2yaml.cpp does NewYAMLSection.Header.VirtualAddress = ObjSection.getAddress();, where ObjSection.getAddress() calls down to this:

uint64_t COFFObjectFile::getSectionAddress(DataRefImpl Ref) const {
  const coff_section *Sec = toSec(Ref);
  uint64_t Result = Sec->VirtualAddress;
 
  // The section VirtualAddress does not include ImageBase, and we want to
  // return virtual addresses.
  Result += getImageBase();
  return Result;
}

So just writing NewYAMLSection.Header.VirtualAddress = COFFSection->VirtualAddress; would fix this. That's why I'd like to know @majnemer's original intent.

Maybe I'm misunderstanding something. In COFFYAML.cpp we have this code:

IO.mapOptional("VirtualAddress", Sec.Header.VirtualAddress, 0U);

So, we write out the virtual address exactly as it is in the binary header file. So why, when going the other direction, do we subtract the image base? I don't have a strong preference on whether we store the VA or the RVA (I have a mild preference for storing the value exactly as it is in the object file), but whatever we do, yaml2obj and obj2yaml at least need to agree with other, right?

Also, in coff2yaml.cpp we have this code which initializes the above field:

const object::coff_section *COFFSection = Obj.getCOFFSection(ObjSection);
COFFYAML::Section NewYAMLSection;
...
NewYAMLSection.Header.VirtualAddress = ObjSection.getAddress();

So, the value we print in the yaml is indeed the exact value from the object file. If we change one but not the other, then it will regress the ability of object files to round-trip through yaml.

Roundtripping executables is broken right now, which this patch fixes. Object files are unaffected by this patch. Just try it out for yourself.

zturner accepted this revision.Nov 29 2018, 9:20 AM

I see where the confusion came from now. Personally I'd prefer to have the YAML represent as closely to the underlying object format as possible, so I'd be in favor of writing the RVA in YAML.

mstorsjo abandoned this revision.Nov 29 2018, 12:59 PM

Solved the same issue with D54965 instead.