Fixes assert in addLinkerModuleCoffGroup() when using by-ordinal imports only
Details
Diff Detail
Event Timeline
To be honest, I know almost nothing about PDB. I'm only changing this because I traced my own build problems to it (Xbox-specific, we're creating xbe-files from relocatable PE files including DWARF-4 debug info generated by clang - log can be found here.
This patch resolves the issues we've encountered when building on platforms that updated to LLD 9, but I don't know whether this is an appropriate (or correct) solution. I also don't know how I would construct a test case for this.
Thanks for investigating Stefan! It is a bit strange that a PartialSection is created with no chunks. Could you possibly build a Debug version of LLD and try to understand where this is coming from? What is the name and characteristics of that PartialSection? Try to conditionally break into Writer::createPartialSection() to see where it's being created. It'll help also reduce your case to a test. It's hard otherwise to judge if this is a real problem or something related to the usage (or inputs).
Thanks for taking the time to give me some suggestions.
I dug in a bit more, and it seems that the empty section is called ".idata$6". I caught its creation by adding an assert to createPartialSection(), and it's created by addSyntheticIdata(): add(".idata$6", idata.hints);. idea.hints is a std::vector of chunks, and its size is zero, so that's where this section originates from.
I then tried to look up where idata.hints is supposed to get populated, and it seems that the only place adding elements to it is IdataContents::create() in COFF/DLL.cpp. Now, if I'm understanding that method correctly, it's supposed to add chunks to the hints-vector for every function imported by name (as opposed to by ordinal)? If so, then that's probably why our project triggers the problem - we're producing binaries for the original Xbox, which doesn't have real support for dynamic linking, but we're using an import library (which imports by ordinals) for calling into the kernel (the kernel resolves these imports when loading an .xbe-file).
Inspecting that method somewhat further, I noticed that, while "lookups" and "addresses" get a terminating null entry, "hints" doesn't get one - changing the code to add one resolves the assert obviously, but after reading https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#the-idata-section it seems to me that there should not be such an entry for the hints-table. Is it allowed to have an empty hints table? What would the correct behavior for the linker be here?
If I'm correct, we're the only ones triggering the assert because we don't have a single import by name.
Can you give me some further feedback on my theories here?
Your explanation makes sense, a hint-less EXE would bind to a very specific set of runtime libraries, which never happens on Windows, but make sense in case of embedded development.
Indeed the hint table does not need null termination. I would assume the following in lld//COFF/Writer.cpp would solve your issue:
if (!idata.hints.empty()) add(".idata$6", idata.hints);
Remains to see how the modern (Windows 10) NT loader handles non-existing hint tables.
Could you possibly add a test please, with a mention explaining your use-case on Xbox? You will need to create a ordinal-only library in the test, then try linking a "hello world" with that library. Binary are best avoided in the tests (being able to reproduce the data for the test is another good thing).
Ensure all tests pass afterwards - ninja check-lld should do it in a MINGW32 shell (or in a regular cmd.exe if you have the GnuWin32 tools installed and in the %PATH%).
I updated the patch and included a test case. I ran the test case before and after the change to make sure it behaves and tests correctly.
So, you are creating an executable that has an import table that has only ordinals. Is my understanding correct? Indeed, that condition is rare and I've thought of that case before when I implemented this part.
I think your fix is correct. Does all tests still pass? Please run ninja check-all (or equivalent).
lld/test/COFF/imports-ordinal-only.s | ||
---|---|---|
6 | I'd dump the import table to verify that a correct import table is actually created in the resulting executable. |
So, you are creating an executable that has an import table that has only ordinals. Is my understanding correct? Indeed, that condition is rare and I've thought of that case before when I implemented this part.
Yes, that's what we were having issues with.
I think your fix is correct. Does all tests still pass? Please run ninja check-all (or equivalent).
I ran make check-lld (and now make check-all as well, just to be sure), and all tests passed.
lld/test/COFF/imports-ordinal-only.s | ||
---|---|---|
6 | I updated the test. Is this what you had in mind? |
LGTM
lld/test/COFF/imports-ordinal-only.s | ||
---|---|---|
6 | Yes, but please add --match-full-lines to FileCheck so that if Name exists in an actual output the test will fail. By default, if an actual output is a substring of a CHECK line, it will succeed. |
I updated the patch to add --match-full-lines as requested. Can you commit the patch for me, as I don't have commit access?
The test failed on my Linux box with the following message. Is that expected?
/usr/local/google/home/ruiu/llvm/lld/test/COFF/imports-ordinal-only.s:15:15: error: CHECK-NEXT: expected string not found in input
CHECK-NEXT: lookup 000020b4 time 00000000 fwd 00000000 name 000020c4 addr 000020bc
^
<stdin>:5:2: note: scanning from here
lookup 000020bc time 00000000 fwd 00000000 name 000020cc addr 000020c4
I'll run the test on Windows machine too.
It looks like the same test fails on Windows with the same output, so I'll fix it to match the actual output and submit. Does this sound OK?
The output on my Arch Linux machine matches the strings the test checks for - is it normal for these values to change? Should we ignore them?
Build reproducibility (a property that if you feed the exact same inputs to compilers or linkers you can get the exact same output) is important. Unfrotunately, by default, COFF has a timestamp field that changes virtually every time, but it looks like the difference between you and me is a difference how things are laid out in the file. That is not supposed to vary.
We can just ignore the line and check only the following three lines
- CHECK-NEXT: DLL Name: test.dll
- CHECK-NEXT: Hint/Ord Name
- CHECK-NEXT: 1
but I'd like to understand what is going on. Are you at the repo head?
but I'd like to understand what is going on. Are you at the repo head?
My copy was a few days old, but I just updated and the test still works for me.
I have a suspicion though - the path to the PDB gets embedded into the .rdata section, so that might shift things around depending on length of the path to the LLVM repo.
I'll update the patch to ignore the addresses.
Is it OK to drop -debug option from the lld-link command line? I think we don't need that option there.
Without the -debug parameter the test can be false positive as it won't reach the assert in the PDB code (in addLinkerModuleCoffGroup) then
lld/test/COFF/imports-ordinal-only.s | ||
---|---|---|
4 | A general comment (I meant to write this yesterday but I forgot); llvm-dlltool generally is a MinGW specific tool which can imply MinGW specific quirks. In this case I don't think there's anything specific which would introduce unexpected/unintended details, but I think it also might be possible to generate the implib with lld-link -def:ordinal-only-implib.def -implib:%t-implib.a |
lld/test/COFF/imports-ordinal-only.s | ||
---|---|---|
4 | I will make change as you suggested. Thanks! |
A general comment (I meant to write this yesterday but I forgot); llvm-dlltool generally is a MinGW specific tool which can imply MinGW specific quirks. In this case I don't think there's anything specific which would introduce unexpected/unintended details, but I think it also might be possible to generate the implib with lld-link -def:ordinal-only-implib.def -implib:%t-implib.a