This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Move duplicate symbol check after input file parsing
ClosedPublic

Authored by MaskRay on Feb 15 2022, 6:13 PM.

Details

Summary

https://discourse.llvm.org/t/parallel-input-file-parsing/60164

To decouple symbol initialization and section initialization, Defined::section
assignment should be postponed after input file parsing. To avoid spurious
duplicate definition error due to two definitions in COMDAT groups of the same
signature, we should postpone the duplicate symbol check.

The function is called postScan instead of a more specific name like
checkDuplicateSymbols, because we may merge Symbol::mergeProperties into
postScan. It is placed after compileBitcodeFiles to apply to ET_REL files
produced by LTO. This causes minor diagnostic regression
for skipLinkedOutput configurations: ld.lld --thinlto-index-only a.bc b.o
(bitcode definition prevails) won't detect duplicate symbol error. I think this
is an acceptable compromise. The important cases where (a) both files are
bitcode or (b) --thinlto-index-only is unused are still detected.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 15 2022, 6:13 PM
MaskRay requested review of this revision.Feb 15 2022, 6:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2022, 6:13 PM
ikudrin added inline comments.Feb 16 2022, 7:37 AM
lld/ELF/Driver.cpp
2291

In D119909, this loop is moved later in the method. It would be better to place it in the right position from the beginning.

2291–2308

In D119909, a helper function initializeLocalSymbols() is added for a similar purpose. It would be better to do the same for ObjFile<T>::postParse().

2309–2310
  • Why no parallel execution for bitcode files?
  • Shouldn't this loop be placed later in the function to detect duplicate symbols if some bitcode files are parsed after this point? They may define an --entry symbol, as an example.
MaskRay updated this revision to Diff 409313.Feb 16 2022, 10:15 AM
MaskRay edited the summary of this revision. (Show Details)

Finalize postParse position.

This means we need to sacrifice a bit for a duplicate symbol error in this patch.

MaskRay marked 3 inline comments as done.Feb 16 2022, 10:21 AM
MaskRay added inline comments.
lld/ELF/Driver.cpp
2309–2310

Added. Because I thought the code size may not justify the very little performance improvement... With LTO, symbol resolution takes a very tiny time.
That said, the code of parallelForEach has been significantly lowered by D117510, so perhaps just simply add it even if the gain is very low.

--entry is a good point. I added a similar --undefined-glob test to lto/duplicated.ll

MaskRay marked an inline comment as done.Feb 16 2022, 10:22 AM
ikudrin added inline comments.Feb 17 2022, 2:14 AM
lld/ELF/Driver.cpp
2432

If this is moved next to the loop for bitcode files, it should detect all the errors and avoid not-that-helpful error messages like

ld.lld: error: duplicate symbol: f
>>> defined at ld-temp.o
>>>            lto.tmp:(f)
>>> defined at ...test\ELF\lto\Output\duplicated.ll.tmp/b.o:(.text+0x0)

If I understand it right, the object files for LTO cannot introduce new duplicate symbols, no?

As for future extensions of the postParse() method, which should also be applied to the LTO-produced object files, we can have an additional list of them and run postParse() only for them, separately, after compileBitcodeFiles().

2432

Why not add a function like postParseObjectFile() instead of the lambda?

lld/ELF/Symbols.cpp
653–657

The code should be removed, not commented out.

MaskRay updated this revision to Diff 410024.Feb 18 2022, 2:11 PM
MaskRay marked 3 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

Add postParseObjectFile
Use two postParseObjectFile loops

MaskRay updated this revision to Diff 410028.Feb 18 2022, 2:17 PM
MaskRay edited the summary of this revision. (Show Details)

rebase

MaskRay added inline comments.Feb 18 2022, 2:24 PM
lld/ELF/Driver.cpp
2432

If I understand it right, the object files for LTO cannot introduce new duplicate symbols, no?

Ideally, yes. Due to some properties like isUsedInRegularObj, it needs more thoughts whether the loop can be omitted. Perhaps do this omission after the changes are proven to be stable.

It looks like the patch was somehow intermixed with D119909. git cannot apply it on ToT.

MaskRay updated this revision to Diff 410439.EditedFeb 21 2022, 8:58 PM
This comment has been deleted.
MaskRay updated this revision to Diff 410440.Feb 21 2022, 9:01 PM

Restore the previous diff.
(the update to D119909 accidentally changed this Differential)

ikudrin accepted this revision.Feb 22 2022, 1:48 AM

LGTM.

lld/ELF/Driver.cpp
2432

Maybe define newObjectFiles in this patch so that the lines are not immediately changed in the subsequent patch (D119909)?

This revision is now accepted and ready to land.Feb 22 2022, 1:48 AM
MaskRay updated this revision to Diff 410574.Feb 22 2022, 10:00 AM
MaskRay marked an inline comment as done.

comment

This revision was landed with ongoing or failed builds.Feb 22 2022, 10:08 AM
This revision was automatically updated to reflect the committed changes.
ormris removed a subscriber: ormris.Feb 22 2022, 10:09 AM