This is an archive of the discontinued LLVM Phabricator instance.

[ELF2] merge-string.s test fixed for win32 configuraton.
ClosedPublic

Authored by grimar on Oct 29 2015, 4:57 AM.

Details

Summary

I noticed that merge-string.s test fails under win32/vs2015 configuration. Patch fixes it, below are some details:

size_t is declared as:
typedef unsigned int size_t;
for that case.

When it was assigned to:
OutputOffset = Builder.add(Entry);
it value was 0xFFFFFFFF and not 0xFFFFFFFFFFFFFFFF like it supposed to be, but then it was added to offsets list:
S->Offsets.push_back(std::make_pair(Offset, OutputOffset));
where its value was converted to 0x00000000ffffffff since Offsets is a list of pairs: std::pair<uintX_t, uintX_t>

Later in code the next comparsion happened:
if (Base != uintX_t(-1)) //And it was not equal to -1 here as was expected

return Base + Addend; //This return worked when it should not.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 38724.Oct 29 2015, 4:57 AM
grimar retitled this revision from to [ELF2] merge-string.s test fixed for win32 configuraton..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
ruiu edited edge metadata.Oct 29 2015, 6:21 AM

I think we should change the type used for the pair to size_t instead of changing the type used here to uintX_t. StringTableBuilder cannot build a table that's larger than size_t's max, so converting size_t to a possibly larger type here does not make much sense.

In D14171#277730, @ruiu wrote:

I think we should change the type used for the pair to size_t instead of changing the type used here to uintX_t. StringTableBuilder cannot build a table that's larger than size_t's max, so converting size_t to a possibly larger type here does not make much sense.

As for me it is an issue of StringTableBuilder. Which uses template class DenseMap<StringRef, size_t> that has size_t as parameter for size type, but StringTableBuilder itself does not has template argument to set its size type from outside.

ruiu added a comment.Oct 29 2015, 7:29 AM

I don't think that's the design flaw of StringTableBuilder. StringTableBuilder creates a string table in memory, so naturally it cannot create a table that's larger than the virtual memory space. Thus size_t is the appropriate type.

In D14171#277782, @ruiu wrote:

I don't think that's the design flaw of StringTableBuilder. StringTableBuilder creates a string table in memory, so naturally it cannot create a table that's larger than the virtual memory space. Thus size_t is the appropriate type.

Ok, I really dont want argue here and will make that change. Just remember how not long time ago Bill Gates also was sure that 640kb is enough for everyone. Thus one day that change probably will be required anyways.

ruiu added a comment.Oct 29 2015, 7:46 AM

Well, that's a different story. size_t is the largest type you can handle on your computer and not artificially capped. We don't fundamentally expect that any data structure we create in the linker can be larger than that. Eventually everything is written using mmap'ed IO, so even the output file itself cannot be larger than 4G on 32-bit computers. If you want to create an extremely large executable, you need to use a 64-bit computer, but that's a reasonable restriction. 32-bit computers cannot execute binaries that's larger than their addressable memory space anyways.

Idea of virtual space is clear for me. I still can imagine implementations of StringTableBuilder where size_t will not be enough (like keeping strings in multiple files), but that is not real case, I agree.

One last though:
When using size_t instead of strict type like uint_x we have different behavior of linker depending on environment. And that can cause the fails of tests like we had. Using of strict types can help to avoid that. So I mean even uint32_t here can be better that size_t since behavior between 32x/64x configurations will be consistent. But at the same time I dont see the solid reasons to artifically cap to 32 bits variable. That returns to idea to have template size type for StringTableBuilder, because truncation of 64 bits size_t to something like uint32_t is still bad to have. Or If StringTableBuilder really does not need that (I would agree now) then using of uint_x (assuming its size >= size_t) is still looks like a solution for behavior consistency.

ruiu added a subscriber: ruiu.Oct 29 2015, 8:44 AM

The problem of the code is that assigned a size_t value to a uintX_t
variable, assuming that they are the same type. Use size_t consistently
when you represent an offset of an array on the *host* machine. This is not
a compromise or any artificial capping. You are confusing uintX_t, which is
size_t of the *target* machine, with size_t of the host machine.

(Or, are you suggesting that we should be able to cross-link >4GB
executables on a 32-bit computer?)

In D14171#277829, @ruiu wrote:

(Or, are you suggesting that we should be able to cross-link >4GB
executables on a 32-bit computer?)

Yes, I mean that using uintX_t (target) for such cases will keep execution logic consistent and gives ability for cross-linking.

Actually my primary idea that its good to have similiar behavior of linker on 32x/64x hosts. Cross-linking is not what I think is needed even if assume that it is really possible (what I think is not if we talking about 4gb+ executables).

ruiu added a comment.Oct 29 2015, 9:10 AM

That's a really huge redesigning. We are currently assuming that we can
read all inputs and keep them and all intermediate data in memory while
linking object files. In order to do that your wish to do, we have to write
extremely large intermediate results (such as a strong take >4GB) to disk
and read it back. We cannot use mmap IO anymore. Sections are no longer
representable as a pointer and the length. I don't think you really mean
that.
2015/10/29 8:50 "George Rimar" <grimar@accesssoftek.com>:

grimar added a comment.

In http://reviews.llvm.org/D14171#277829, @ruiu wrote:

(Or, are you suggesting that we should be able to cross-link >4GB
executables on a 32-bit computer?)

Yes, I mean that using uintX_t (target) for such cases will keep execution
logic consistent and gives ability for cross-linking.

http://reviews.llvm.org/D14171

Yes, I didn`t mean extreme cases like 4gb executables..
In short what I wanted to say is that storing size_t types is some kind of dangerous because behavior is depends on host type. So storing uint_x which is depends on target elf file is more safe and keeps behavior consistent here.
But honestly this issue really does not need so many attention.. let me just update the patch and lets just close it :)

grimar updated this revision to Diff 38745.Oct 29 2015, 10:10 AM
grimar edited edge metadata.

Review comment addressed.

ruiu added a comment.Oct 29 2015, 11:42 AM

LGTM. That's I think that's not super important but meaningful discussion.
:)
2015/10/29 10:10 "George Rimar" <grimar@accesssoftek.com>:

grimar updated this revision to Diff 38745.
grimar added a comment.

Review comment addressed.

http://reviews.llvm.org/D14171

Files:

ELF/InputSection.cpp
ELF/InputSection.h

Index: ELF/InputSection.h

  • ELF/InputSection.h

+++ ELF/InputSection.h
@@ -84,7 +84,7 @@

typedef typename llvm::object::ELFFile<ELFT>::Elf_Shdr Elf_Shdr;

public:

  • std::vector<std::pair<uintX_t, uintX_t>> Offsets;

+ std::vector<std::pair<uintX_t, size_t>> Offsets;

MergeInputSection(ObjectFile<ELFT> *F, const Elf_Shdr *Header);
static bool classof(const InputSectionBase<ELFT> *S);
// Translate an offset in the input section to an offset in the output

Index: ELF/InputSection.cpp

  • ELF/InputSection.cpp

+++ ELF/InputSection.cpp
@@ -164,17 +164,17 @@

// Find the element this offset points to.
auto I = std::upper_bound(
    this->Offsets.begin(), this->Offsets.end(), Offset,
  • (const uintX_t &A, const std::pair<uintX_t, uintX_t> &B) {

+ [](const uintX_t &A, const std::pair<uintX_t, size_t> &B) {

      return A < B.first;
    });
size_t End = I == this->Offsets.end() ? Data.size() : I->first;
--I;
uintX_t Start = I->first;

// Compute the Addend and if the Base is cached, return.
uintX_t Addend = Offset - Start;
  • uintX_t &Base = I->second;
  • if (Base != uintX_t(-1))

+ size_t &Base = I->second;
+ if (Base != size_t(-1))

  return Base + Addend;

// Map the base to the offset in the output section and cashe it.
This revision was automatically updated to reflect the committed changes.