This is an archive of the discontinued LLVM Phabricator instance.

Add GUID-ValueID map and original name to ThinLTO summary.
Needs ReviewPublic

Authored by haojiewang on Jul 9 2017, 5:58 PM.

Details

Summary

For now ThinLTO summary file has the same size of IR bitcode file, while it should be much smaller than that, since only the ModuleSummaryIndex, IRSymtab and Strtab should be written in it. GUID-ValueID map and original name are required in ThinLink phase, which are not included in module summary. If we want to reduce the summary file size, we should add these records into module summary, so in the next patch, we can do the reduction based on this change.

Event Timeline

haojiewang created this revision.Jul 9 2017, 5:58 PM

We need the GUID-ValueID map in ThinLink phase because in summary we only record the ValueID to identify every record. To get the mapping between this ValueID and GlobalValue's GUID, we need this GUID-ValueID map. In the revision before, we get this map by Value Symbol Table(VST), but there is some redundancy information in this block. In the future revision, only SummaryIndex, IRSymtab and Strtab will be recorded in summary file to reduce the size of it, so we should also record this GUID-ValueID map to pass this information to ThinLink phase without VST block.

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

tejohnson edited edge metadata.Jul 10 2017, 7:08 AM

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

Right - I think we don't need the IRSymtab or strtab with this approach since the valueID->GUID correspondence is being recorded directly. We had talked about emitting the value IDs via the VST and the names via the symtab/strtab, but that doesn't work well (it's not that the VST has redundant info, as Haojie indicated in his comment, but rather that it doesn't have the right info - it no longer maps the value id to the name or even directly to a symtab entry). This issue just affects the summary description of this patch since the actual minimization is being left for a follow-on - Haojie, can you update the summary description of the patch?

lib/Bitcode/Reader/BitcodeReader.cpp
5225

Error string needs update (not just a combined record, but any summary record now)

lib/Bitcode/Writer/BitcodeWriter.cpp
133

This isn't quite right - we are still only generating the value id in this class for guids from indirect call profiles. I would suggest going back to the old description and leaving the map as-is. More on that below.

321

Needs a function description. Note that this needs to be called before we write out the summaries, since we write out the GUID->ValueID records first.

However, I don't think you need to do this at all. There is really no reason to fill up the map with the other GUID->valueId mappings since they can be obtained on the fly when we write the FS_VALUE_GUID records.

I'd suggest instead changing the code that writes the FS_VALUE_GUID records such that it not only walks the map (for the value Ids we have to synthesize here for indirect call profiles), but also does the below Module traversals to emit the rest of them (without ever adding them to the map).

3407

the original name's GUID actually.

lib/Bitcode/Writer/BitcodeWriter.cpp
3408

Please add here *why* this is done, other the comment is fairly not helpful: it just tells me wha the function is doing.
I think the reason is "This is needed because promoting internal symbol to global requires generating a new GUID from the original name".
That said, unless there is another reason I missed, we could also immediately instead store two GUID for internal symbols: the local and the global.
Also, alternatively and even more simpler I think, we could just always use the global GUID for internal symbols, whether they're promoted or not.

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

We need the GUID because it is needed when we build the combined index in thin link phase. We need ValueId is because the size of GUID is too large so that we do not want to write it everywhere, so we set a ValueId which is indexing from 0. And we should also set a GUID to ValueId map to record the mapping between them.
We need the original name because it is needed in FunctionImport. So we should pass is to thin link phase.

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

Right - I think we don't need the IRSymtab or strtab with this approach since the valueID->GUID correspondence is being recorded directly. We had talked about emitting the value IDs via the VST and the names via the symtab/strtab, but that doesn't work well (it's not that the VST has redundant info, as Haojie indicated in his comment, but rather that it doesn't have the right info - it no longer maps the value id to the name or even directly to a symtab entry). This issue just affects the summary description of this patch since the actual minimization is being left for a follow-on - Haojie, can you update the summary description of the patch?

I think the VST is still holding the right info now. If I understand the code correctly, VST will record the mapping between ValueId and ValueName. Then in thin link phase, GUID and OriginalNameGUID will be computed by ValueName along with linkage and module path, so we can get the ValueID to GUID map, and also the value of OriginalGUID. But the ValueName may be too large(since it is a string), so we can replace it with GUID and OriginalNameGUID(if it is needed).

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

Right - I think we don't need the IRSymtab or strtab with this approach since the valueID->GUID correspondence is being recorded directly. We had talked about emitting the value IDs via the VST and the names via the symtab/strtab, but that doesn't work well (it's not that the VST has redundant info, as Haojie indicated in his comment, but rather that it doesn't have the right info - it no longer maps the value id to the name or even directly to a symtab entry). This issue just affects the summary description of this patch since the actual minimization is being left for a follow-on - Haojie, can you update the summary description of the patch?

I think the VST is still holding the right info now. If I understand the code correctly, VST will record the mapping between ValueId and ValueName. Then in thin link phase, GUID and OriginalNameGUID will be computed by ValueName along with linkage and module path, so we can get the ValueID to GUID map, and also the value of OriginalGUID. But the ValueName may be too large(since it is a string), so we can replace it with GUID and OriginalNameGUID(if it is needed).

Just talked to Haojie offline, but for everyone else following along - the function level VST (for instructions, bbs, etc) still includes the name, but the global VST (for the global values we will have in the index) does not.

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

We need the GUID because it is needed when we build the combined index in thin link phase. We need ValueId is because the size of GUID is too large so that we do not want to write it everywhere, so we set a ValueId which is indexing from 0. And we should also set a GUID to ValueId map to record the mapping between them.
We need the original name because it is needed in FunctionImport. So we should pass is to thin link phase.

You manage to not really answer fully my question here. But I think infer it from the code in my previous inline comment, it is "we need the name to recompute a GUID for internal symbol promoted to global", correct?

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

We need the GUID because it is needed when we build the combined index in thin link phase. We need ValueId is because the size of GUID is too large so that we do not want to write it everywhere, so we set a ValueId which is indexing from 0. And we should also set a GUID to ValueId map to record the mapping between them.
We need the original name because it is needed in FunctionImport. So we should pass is to thin link phase.

You manage to not really answer fully my question here. But I think infer it from the code in my previous inline comment, it is "we need the name to recompute a GUID for internal symbol promoted to global", correct?

Just talked to Teresa, the FunctionImport seems no longer need the original name(but seems that there are still some code in FunctionImport that refer original name), but PGO will use this information.

The context and the motivation is *still* not clear to me. To begin why do we need the *names* when doing the thin link?

We need the GUID because it is needed when we build the combined index in thin link phase. We need ValueId is because the size of GUID is too large so that we do not want to write it everywhere, so we set a ValueId which is indexing from 0. And we should also set a GUID to ValueId map to record the mapping between them.
We need the original name because it is needed in FunctionImport. So we should pass is to thin link phase.

You manage to not really answer fully my question here. But I think infer it from the code in my previous inline comment, it is "we need the name to recompute a GUID for internal symbol promoted to global", correct?

Just talked to Teresa, the FunctionImport seems no longer need the original name(but seems that there are still some code in FunctionImport that refer original name), but PGO will use this information.

To clarify, I believe Sample PGO is using the original name (as opposed to instrumentation PGO). Dehao?

I am having trouble figuring out how we originally used this information, it was added here D19454 (I believe for the old workaround of not importing from modules with non-renamable values as discussed in D18298, which we now have a better fix for).

pcc added a subscriber: pcc.Jul 10 2017, 12:20 PM
pcc added a comment.Jul 10 2017, 12:30 PM

the global VST (for the global values we will have in the index) does not.

Not quite, global VSTs include a reference to the symbol name in the strtab.

But I think infer it from the code in my previous inline comment, it is "we need the name to recompute a GUID for internal symbol promoted to global", correct?

Names of globals are also necessary for symbol resolution.

Updating D35189. Leave the GUIDToValueIDMap just for indirect call; change the codes that writes the FS_VALUE_GUID records to Module traversals in writePerModuleGlobalValueSummary function; modify some comments.