This is an archive of the discontinued LLVM Phabricator instance.

PDB support of function-level linking and splitted functions
ClosedPublic

Authored by aleksandr.urakov on Jun 4 2018, 5:25 AM.

Details

Summary

The patch adds support of splitted functions (when MSVC is used with PGO) and function-level linking feature.

SymbolFilePDB::ParseCompileUnitLineTable function relies on fact that ranges of compiled source files in the binary are continuous and don't intersect each other. The function creates LineSequence for each file and inserts it into LineTable, and implementation of last one relies on continuity of the sequence. But it's not always true when function-level linking enabled, e.g. in added input test file test-pdb-function-level-linking.exe there is xstring's stdbasic_string_char_stdchar_traits_charstdallocator_char_____max_size (.00454820) between test-pdb-function-level-linking.cpp's foo (.00454770) and main (.004548F0).

To fix the problem this patch renews the sequence on each address gap.

Diff Detail

Event Timeline

asmith accepted this revision.Jun 5 2018, 10:21 AM

LGTM - will commit for you

This revision is now accepted and ready to land.Jun 5 2018, 10:21 AM
asmith closed this revision.Jun 5 2018, 10:23 AM

Thank you!

But it looks like the binary files (*.pdb and *.exe) have not been committed properly (they have zero size).

labath added a subscriber: labath.Jun 6 2018, 2:30 AM

I have reverted this because of the broken tests.

However, I have to also ask: isn't there better way to test this? (one that does not depend on opaque checked-in binaries). On linux, I could check-in a .s file which has the line table exactly as I want it and then have the test assert that. Is there some suitable intermediate form here?

It seems that lld tests use some yaml form to store pdb's. Is there any chance we could do the same?

asmith added a comment.Jun 6 2018, 7:14 AM

Thank you!

But it looks like the binary files (*.pdb and *.exe) have not been committed properly (they have zero size).

I've seen that issue before. I think you have to specify --binary when generating the diff.

git diff --no-prefix --binary branch master > patch.file

I think you have to specify --binary when generating the diff.

I have specified --binary when I was generating the patch, and there is encoded binaries in the patch file, may be they was somehow cutted out during an upload to the Phabricator... I have attached the original patch to this message.

The problem is that Phabricator doesn't preserve binary in diffs

asmith added a comment.Jun 6 2018, 2:20 PM

I have reverted this because of the broken tests.

However, I have to also ask: isn't there better way to test this? (one that does not depend on opaque checked-in binaries). On linux, I could check-in a .s file which has the line table exactly as I want it and then have the test assert that. Is there some suitable intermediate form here?

It seems that lld tests use some yaml form to store pdb's. Is there any chance we could do the same?

I wish these binaries didn’t exist too. Im not sure if the lld approach works because I think we need the VS generated PDBs for function level testing. Maybe Zachary as a better idea.

Do you just need a pdb, or does it really need to be a vs pdb? lld can
generate high quality pdbs now. So it might be possible to use lld to link
and produce a pdb when you run the test.

Pavel’s suggestion is equally viable, you can dump a pdb to yaml and
convert it back to a pdb at test time.

The real problem is the exe. It’s harder to generate exes at test time
because we have to ensure that dependent libraries are present on the
system.

If it has to be an msvc generated pdb, can you elaborate on why? Tbh I’m
not really against checking in pdbs. Exes I’d like to find a way to avoid
checking in wherever possible though. And even then, sometimes I don’t have
any better ideas other than compile and link before running the test

labath added a comment.Jun 7 2018, 2:49 AM

Do you just need a pdb, or does it really need to be a vs pdb? lld can
generate high quality pdbs now. So it might be possible to use lld to link
and produce a pdb when you run the test.

Pavel’s suggestion is equally viable, you can dump a pdb to yaml and
convert it back to a pdb at test time.

The real problem is the exe. It’s harder to generate exes at test time
because we have to ensure that dependent libraries are present on the
system.

If it has to be an msvc generated pdb, can you elaborate on why? Tbh I’m
not really against checking in pdbs. Exes I’d like to find a way to avoid
checking in wherever possible though. And even then, sometimes I don’t have
any better ideas other than compile and link before running the test

I guess that part that makes generating the exe during test tricky is the feedback loop needed for PGO. I don't know enough about the windows ecosystem to tell if there is a different way to generate these kinds of split line tables (on linux I can think of a several).

I suppose the reason you can't do the same yaml2obj trick on the .exe is because yaml2obj does not support serializing exe's yet?

unittests/SymbolFile/PDB/Inputs/test-pdb-splitted-function.cpp
8–9 ↗(On Diff #149727)

Could we shrink the size of these binaries by not using c and c++ standard library features?

11–27 ↗(On Diff #149727)

Could you clang-format this patch?

unittests/SymbolFile/PDB/SymbolFilePDBTests.cpp
375–398 ↗(On Diff #149727)

I think these kinds of checks would be a nice fit for a lldb-test symbols --verify option.

Yes, it does really need to be a VS PDB. Function-level linking is a linker's feature and I can't found support of this feature in lld, so I think module addresses inconsistency will not be reproducible with lld.

I've tried to convert a VS PDB to YAML and back in the next way:

llvm-pdbutil.exe pdb2yaml -all test-pdb-function-level-linking.pdb > test-pdb-function-level-linking.yaml
llvm-pdbutil.exe yaml2pdb test-pdb-function-level-linking.yaml

and I have put it near the executable, but it seems that the PDB was broken - nor lldb, nor IDA can load it, but I didn't investigated the problem in detail yet.

As for executable binaries, I think the key problem is that SymbolFilePDBTests has two responsibilities:

  • Test the retrieving of data from PDB. All tests find a PDB location from the executable binary (e.g. TestAbilitiesForPDB checks that we have found a correct symbol file) and then somehow retrieve data from the PDB (e.g. via DIA interface). As far as I understand, this part is not tested somewhere else. This part requires original executable and PDB binaries.
  • Test the processing of retrieved data (e.g. building the line table), and this is exactly our case. Now this part requires binaries only in VerifyLineEntry function. It uses a module object to convert a virtual address to an Address structure. The structure is used by the line table to a find corresponding entry, but I think that we can introduce a new function for this purpose, which would work with simple file address. So we can avoid using of the module object in VerifyLineEntry function, and make this part independent of binaries.

So if we will split testing of this parts and will find a way to load symbol information from some intermediate form, I think it will be possible to reduce binaries usage. But it requires some more time and investigation.

I suppose the reason you can't do the same yaml2obj trick on the .exe is because yaml2obj does not support serializing exe's yet?

Yes, it's so.

Could we shrink the size of these binaries by not using c and c++ standard library features?

The problem is that it is hard to reproduce splitting of a compiled source file on little executables, so I haven't found better way to reproduce it than using standard library.

Could you clang-format this patch?

Yes, sure, I'll update patch.

I think these kinds of checks would be a nice fit for a lldb-test symbols --verify option.

Can you explain this in detail, please?

Formatted test sources corresponding to LLVM rules.

labath added a comment.Jun 7 2018, 3:20 AM

I think these kinds of checks would be a nice fit for a lldb-test symbols --verify option.

Can you explain this in detail, please?

We have an lldb-test executable, which we use for writing tests (see tests in lit/SymbolFile). It's symbols subcommand currently just dumps their contents and then we match that.

Your checks don't have any assumptions about the input hardcoded in them. This may be both good and bad, but anyway, my idea was to add a --verify flag to lldb-test (similar to how llvm-dwarfdump --verify works) which would run these checks (and any other we can think of later). Then, the test would simply consist of running lldb-test symbols --verify your.exe and asserting that it returns success.

We have an lldb-test executable, which we use for writing tests (see tests in lit/SymbolFile). It's symbols subcommand currently just dumps their contents and then we match that.

Your checks don't have any assumptions about the input hardcoded in them. This may be both good and bad, but anyway, my idea was to add a --verify flag to lldb-test (similar to how llvm-dwarfdump --verify works) which would run these checks (and any other we can think of later). Then, the test would simply consist of running lldb-test symbols --verify your.exe and asserting that it returns success.

Thank you, I'll implement this. Will be it better to create a different review for this?

labath added a comment.Jun 7 2018, 3:31 AM

I don't think that's necessary. The actual fix is really simple and this whole discussion is just about figuring out what to do with tests.

labath added a comment.Jun 7 2018, 3:37 AM

Btw, would it be possible to use the /order directive to achive what you want? (/order:function_from_first_file,function_from_second_file,another_function_from_first_file)

Btw, would it be possible to use the /order directive to achive what you want? (/order:function_from_first_file,function_from_second_file,another_function_from_first_file)

Yes, thanks for pointing to this!

Use /ORDER switch to reduce test binaries size.

The implementation of lldb-test symbols -verify option.

Function level linking is a compiler feature, not a linker feature. It's enabled via the /Gy option in the compiler and causes the compiler to put each function into its own COMDAT. The linker can then use this to discard more functions (e.g. during /OPT:REF).

clang-cl also supports the /Gy option, right?

Yes, you are right. Sorry, I didn't know about such option in clang and googling by 'clang function-level linking' didn't show me any relevant information.

But Pavel have prompted me Microsoft linker's option '/ORDER', and it is always possible to reproduce the problem with it. Without that option it was very difficult to reproduce. Has lld some analog?

As a general rule, lld-link is command line compatible with MSVC and
clang-cl is command line compatible with cl. So, the /order option should
work exactly the same with lld-link as it does with link.

labath added a comment.Jun 7 2018, 7:27 AM

Thank you for implementing the lldb-test extension. Now that we have that, and the /order, we should be able to get rid of the binaries for the function-level-linking test. You should be able to rewrite it into something like this:

lit/SymbolFile/PDB/function-level-linking.cpp:

// REQUIRES: windows lld
// 
// RUN: clang-cl /c /Zi /Gy %s /o %t
// RUN: lld-link /debug:full /nodefaultlib /entry:main order:@%S/Inputs/function-level-linking.ord %t.obj /out:%t
// RUN: lldb-test symbols -verify %t

#include "function-level-linking.h" // You'll probably need to adjust the include path for this to work.

int foo() {
  return 0;
}

int main() {
  return foo() + bar() + baz();
}

As for the second test, does it actually bring anything new to the table? As far as I can tell, the line table parsing code does not actually use any information about functions. Combined with the fact that we don't know how to write the test without checked in binaries, maybe we could just drop it ?

As a general rule, lld-link is command line compatible with MSVC and
clang-cl is command line compatible with cl. So, the /order option should
work exactly the same with lld-link as it does with link.

It seeems that it is not implemented:

lld-link /debug /nodefaultlib /entry:main /order:@test-pdb-function-level-linking.ord test-pdb-function-level-linking.obj

C:\LLVM\bin\lld-link.EXE: error: could not open /order:@test-pdb-function-level-linking.ord: no such file or directory

lld-link /? also don't list such option.

labath added a comment.Jun 7 2018, 7:34 AM

As a general rule, lld-link is command line compatible with MSVC and
clang-cl is command line compatible with cl. So, the /order option should
work exactly the same with lld-link as it does with link.

It seeems that it is not implemented:

lld-link /debug /nodefaultlib /entry:main /order:@test-pdb-function-level-linking.ord test-pdb-function-level-linking.obj

C:\LLVM\bin\lld-link.EXE: error: could not open /order:@test-pdb-function-level-linking.ord: no such file or directory

lld-link /? also don't list such option.

That's strange. I've learned of this option by browsing the lld-link /? output (I didn't check whether it works though). Are you sure you are on the latest master branch of llvm?

Thank you for implementing the lldb-test extension. Now that we have that, and the /order, we should be able to get rid of the binaries for the function-level-linking test. You should be able to rewrite it into something like this:

lit/SymbolFile/PDB/function-level-linking.cpp:

// REQUIRES: windows lld
// 
// RUN: clang-cl /c /Zi /Gy %s /o %t
// RUN: lld-link /debug:full /nodefaultlib /entry:main order:@%S/Inputs/function-level-linking.ord %t.obj /out:%t
// RUN: lldb-test symbols -verify %t

#include "function-level-linking.h" // You'll probably need to adjust the include path for this to work.

int foo() {
  return 0;
}

int main() {
  return foo() + bar() + baz();
}

As for the second test, does it actually bring anything new to the table? As far as I can tell, the line table parsing code does not actually use any information about functions. Combined with the fact that we don't know how to write the test without checked in binaries, maybe we could just drop it ?

Yes, you are right, the second test does not bring anything new to the table. I have made it just by Leonard Mosescu suggestion. I think we can drop it.

As for /ORDER switch, I have tried it on the release version. Thanks for the point, I'll try the master version.

Avoid binaries in tests.

I replaced all binaries by a lit-based test, as you told. The only thing, I have made it more like other PDB tests.

asmith added a comment.Jun 7 2018, 7:48 PM

Thanks! Much better without the binaries.

Thank you for explanations!

labath added a comment.Jun 8 2018, 1:27 AM

Thank you for your patience. I am very happy with the final result here.

I personally like to keep number of files in a test minimal (hence I inlined the test into the cpp file), but overall that's not really important.

Doesn't the LIT based test drop the split-function case (originally
produced with PGO)?

Sorry for being late to the party, but it seems beneficial to have both LIT
*and* checked in binaries since in general they are complementary: checking
against freshly built binaries only covers a matching set of toolchain
components (in particular it's hard to cover the cross-targeting scenarios).

Other than the inconvenience with Phabricator, is there a reason not to
include the original tests as well? The size of the binaries?

Doesn't the LIT based test drop the split-function case (originally
produced with PGO)?

Sorry for being late to the party, but it seems beneficial to have both LIT
*and* checked in binaries since in general they are complementary: checking
against freshly built binaries only covers a matching set of toolchain
components (in particular it's hard to cover the cross-targeting scenarios).

Other than the inconvenience with Phabricator, is there a reason not to
include the original tests as well? The size of the binaries?

At least when it comes to checked in executables, we trigger virus scanners sometimes which is pretty annoying. I don't mind a checked in PDB, but they get pretty big sometimes and you have to go out of your way to make them small by specifying things like /nodefaultlib. But if you can get PDBs below about 200k, then checking them in might not be so bad. But I'd like to avoid checking in executables.

lemo added a comment.Jun 8 2018, 10:10 AM

I agree, checked in binaries are not always pretty. But some coverage
depends checked in binaries (or at very least is dramatically harder to get
the same thing from source)

Are we saying that sacrificing coverage to keep tests smaller should be the
default trade off?