This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
include/llvm/IR/
-
llvm/
-
IR/
47
ThinLTOInfo.h
-
lib/IR/
-
IR/
-
CMakeLists.txt
8
ThinLTOInfo.cpp

Differential D11721

[ThinLTO] Data structures for holding ThinLTO function index/summary
AbandonedPublic

Authored by tejohnson on Aug 3 2015, 8:22 AM.

Download Raw Diff

Details

Reviewers: None

Summary

This patch adds support for the main data structures needed for holding the ThinLTO function index/summary that is used to enable importing.

Related RFCs:
RFC: ThinLTO File Format (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-August/088947.html)
RFC: ThinLTO File API and Data Structures (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-August/088951.html)

Diff Detail

Event Timeline

tejohnson updated this revision to Diff 31233.Aug 3 2015, 8:22 AM

tejohnson retitled this revision from to [ThinLTO] Data structures for holding ThinLTO function index/summary.

tejohnson updated this object.

tejohnson added a subscriber: llvm-commits.Aug 3 2015, 10:02 AM

jevinskie added a subscriber: jevinskie.Aug 11 2015, 2:24 PM

Pinging since the first one didn't get sent to llvm-commits (I added the subscriber after uploading, which apparently doesn't work well).

At a very high level, the parallel data structures being introduced here really bother me. We already have a materialization mechanism for lazily loading functions into the module. I know that we fairly eagerly materialize today, but have you considered a) materializing less eagerly and b) merging your index structures into the materialization mechanisms?

(Small comments inline, but by no means a complete review.)

include/llvm/IR/ThinLTOInfo.h
2	Meta comment: This shouldn't be ThinLTO specific. If you want to propose adding an index to the IR, that's fine, but it needs to be extension used by everything, not something specific to ThinLTO.
26	Until this is filled in, the patch isn't really worth reviewing...
43	This shouldn't be a public flag...
47	Why isn't the function linkage enough here?

In D11721#223084, @reames wrote:

At a very high level, the parallel data structures being introduced here really bother me. We already have a materialization mechanism for lazily loading functions into the module. I know that we fairly eagerly materialize today, but have you considered a) materializing less eagerly and b) merging your index structures into the materialization mechanisms?

By parallel assume you mean the fact that this is outside the Module structure, and that the associated object file interfaces in D11723 are separate from the IRObjectFile data structures?

The reasons are discussed a bit in the RFC, but it is buried in there, so I copied the motivation here, which essentially boils down to the fact that the module's ThinLTO information is not needed when parsing the rest of the module and vice versa:

"In order to save time and memory in the plugin step, we don’t want to parse the entire bitcode for a module during the plugin step (which only needs to read the function index/summary for merging into a combined index/summary). We also don’t need to parse the ThinLTO information out of the module’s IR when constructing the Module object during the backend compile step (the ThinLTO importing step will read the combined index file, not the module’s own ThinLTO index).

Therefore any data structure created to encapsulate ThinLTO function index and summary information should be independent of LTOModule and IRObjectFile, as those structures are created when we read/parse the module’s normal (non-ThinLTO) bitcode, and result in the creation of a Module object. "

Let me know if I misinterpreted your concern.

(Small comments inline, but by no means a complete review.)

include/llvm/IR/ThinLTOInfo.h
2	I thought the information here was very ThinLTO specific. It isn't used when parsing the rest of the module, but is meant to hold information required for making ThinLTO importing decisions and doing the importing. Do you think some of this is useful in other cases?
26	I'll go ahead and add a field I was using in my prototype implementation, specifically the function's instruction count. This data structure will evolve as we tune the importing heuristics.
43	None of these should be actually, looks like I accidentally left out the "private" at the top.
47	Because we don't want to parse all of the IR when building the combined index during the plugin step. See the ThinLTOFunctionSummaryIndex::mergeFrom() implementation below where this is used to do renaming of the functions in the combined index, to disambiguate local functions from different modules.

tejohnson mentioned this in D11722: [ThinLTO] Bitcode reading/writing support for ThinLTO function summary/index.Aug 12 2015, 2:32 PM

tejohnson updated this object.Aug 12 2015, 2:36 PM

Addressed some review comments: Added the instruction count, which is the baseline metric used in ThinLTO importing decisions, to the ThinLTOFunctionSummary class. Also made ThinLTOFunctionInfo class member data private.

davidxl added a subscriber: davidxl.Aug 13 2015, 11:12 AM

davidxl added inline comments.

include/llvm/IR/ThinLTOInfo.h
48	This field 'ModulePath' belongs to ThinLTOFunctionSummaryIndex class (when used before combining).
52	Is there a 1-1 mapping from ThinLTOFunctionInfo to FunctionSummary? If yes, can ThinLTOFunctionInfo inherite from FunctionSummary?
164	why passing by value?
lib/IR/ThinLTOInfo.cpp
45	The ModPath should be read once from 'Other' outside the loop.
57	Is it using the same naming scheme when local function is static promoted? If yes, it might be better to have a common helper function such as "globalizeThinLTOLocalName", or "getGlobalThinLTONameForLocal" ..

davidxl added inline comments.Aug 13 2015, 10:49 PM

include/llvm/IR/ThinLTOInfo.h
117	Why making a reverse map from stringPath to module id instead of a map from module id to module path? In the combined summary index, ThinLTOFunctionInfo object should contain a module id field. From that module id, the module path can be obtained.
125	I think it is better to have a ModulePath field in this class -- this field will only be used for per-module SummaryIndex.

mehdi_amini added a subscriber: mehdi_amini.Aug 14 2015, 4:47 PM

Will update the patch as noted in responses.

But we need to decide on the best way to represent the module path strings. See my responses below for why I have the current organization and let me know if you think otherwise. The two best choices I see are:

(current patch) Use ThinLTOModulePathStringTable to own the module path strings, which are referenced via StringRefs saved in the per-function info. The module ID saved in this table is just used for consistent renaming. This is also a little easier when renaming the module's own strings, since it doesn't require saving the ThinLTO module ID on the Module object, we can just lookup via the Module's path/name already saved there.
Change ThinLTOModulePathStringTable to be map from ID to string (needs to be a std::string since there won't be a structure owning the string memory anymore). Save module id in per-function info (instead of string ref), use it to lookup the module path from the ThinLTOModulePathStringTable when we want to do importing. Need to find the ThinLTO module ID associated with the current module's path when reading in the module string table and save it on Module object for later renaming. Save a module path on the Index in the per-module case and use it when building the combined index.

include/llvm/IR/ThinLTOInfo.h
48	But it can't be on the ThinLTOFunctionSummaryIndex class in the combined index. I could have a ModulePath on the ThinLTOFunctionSummaryIndex class just for the case of the per-module index, but it seemed confusing to have it in two places (and which one is used depends on the context). Since StringRefs are small and fast to construct (just a pointer to a char * and a length), and we need one here anyway for the combined case, it seemed clearer to me to always keep it in the per-function info. For the per-module case this would be populated when we read in the per-module ThinLTO information from the IR.
52	When creating the combined index it is not 1-1. Another reason why I have a pointer here instead of inheriting or combining the two is to be more efficient in the case where the function summary information is read lazily during importing. I.e. initially only the ThinLTO symtab and module paths are read, and each function's summary is only read when considering importing that function. In fact, I should move the BitcodeIndex and ModulePath into the ThinLTOFunctionSummary object for the same reason - they are only populated when the function summary is read/parsed. Will make that change to the patch.
117	The string table map is not queried directly, as described in the associated RFC it is just for owning the module strings, since the string is duplicated when inserting to the StringMap. We don't need to save the module ID and use it to look up in the string table since it is fast and fairly efficient just to save the StringRef directly. I will update the comments in this file to reflect some of the info in the RFC and make it hopefully a lot clearer.
125	In the combined index we need a module path in the function info, as noted earlier. I think it is confusing to have module paths in multiple places used in different contexts.
164	Will change to pass by ref.
lib/IR/ThinLTOInfo.cpp
45	Relates to the earlier discussion on where to save the module path.
57	Agreed. Will add a helper function.

davidxl added inline comments.Aug 17 2015, 2:17 PM

include/llvm/IR/ThinLTOInfo.h
27	Should also include body bitcode offset info, right?
48	ok.
52	Makes sense. I think it is better to add more comments before the FunctionSummary declaration to indicate this enables lazy reading of summary from combined summary index file during importing so that it is more memory efficient. Also document that ThinLTOFunctionInfo should only include minimal info required to locate the summary etc.
117	ok.
125	ok

I am going to upload a new patch now that addresses the past few rounds of comments.

I'm also going to upload a new patch D11723 in a few minutes that uses the mergeFrom interface to create a combined index in the gold plugin.

include/llvm/IR/ThinLTOInfo.h
27	Yes, that is going to move here in the new patch I am about to upload here.
52	Added some comments in the new patch.

Address review comments.

tejohnson mentioned this in D11723: [ThinLTO] ThinLTO object file interfaces.Aug 17 2015, 9:38 PM

Add accessor method needed by new D11722 patch.

Hi Teresa,

I finally managed to find time to finish reading the related RFC. I haven't understand all the use-cases that drove some design decisions (like why ThinLTOFunctionSummary and ThinLTOFunctionInfo are two separate entities). I'll re-read the RFC tomorrow and I might become more clear (if you have a pointer where it is discussed, I'm interested).

I left some inline comments, hopefully you'll find some useful :)

Mehdi

include/llvm/IR/ThinLTOInfo.h
2	I agree with Philip, I'm not sure it needs to be "LTO". Now it is not a blocker to me, it shouldn't be a big deal to introduce it here and refactor it later. At least it won't be coupled to anything else.
25	I'll never say enough how I really enjoyed the job you did with documenting your stuff with RFCs on the mailing list. Now I hope this information (in the RFCs) will be nicely translated in the code as well. Since you'll introduce a lot of new classes, what about trying to adopt off-hand a format that will result is an nice doxygen? Especially in stuff that goes the public header. Just a hint to help: we turned on "autobrief" on in LLVM in May, so the first line is interpreted as a the "brief description" unless you add the keyword \brief (then is it the first sentence that is matched). / Do the thing with the stuff. / / This does whatever, using \c param1 for something and returning / some sort of result. or / \brief This function does X. It caches some values or something, / and is complicated in some minor way. (Now there are no strong requirements of sticking with the doxygen syntax that I'm aware of, and it won't prevent your patches from being integrated).
80	Is seems to me from reading the implementation that unless you call `recordParsedInfo()` with `nullptr` you always have the invariant `Parsing == !FunctionSummary`. So it seems that this flag is useless?
98	This implies that `ThinLTOFunctionInfo` owns its `FunctionSummary`, so is there a reason not to use `std::unique_ptr`?
102	This is scary to read with respect to what I see in the destructor, how are you avoiding double deletion?
132	Can you provide some information about what happens in the case of COMDAT functions? (if you did on the mailing I'd appreciate a pointer to the RFC thread)
161	Why do you need this declaration?
164	Not clear to me how a default copy constructor can involve "std::swap"?
168	I believe the usual pattern is to take by value instead of an R-value ref, unless I missed something.
188	Since `ThinLTOFunctionInfo` owns the `FunctionSummary`, it sound it shouldn't be copied but moved, you'll run into double free. (unless I misunderstood something about the `ThinLTOFunctionInfo` above, in which case forget this comment)
191	I don't think you gain anything by doing this test which involves two queries in the map (lookup + insert/update). What about just doing: FunctionMap.lookup(FuncName).push_back(std::move(Info));
lib/IR/ThinLTOInfo.cpp
24	I'm just starting to understand `ThinLTOFunctionSummaryIndex` and some "weirdness" associated. Do you build a `ThinLTOFunctionSummaryIndex` for each module and only then you combine them together using this `mergeFrom()` method? Looks like a missing abstraction level, some concepts are mixed together, which would explain why you are using a static `NextModuleId` here, which sounds fishy to me. Also this assumes that the `Other` will have an empty ModuleMap, i.e. all the functions come from the same module. (unless I misunderstood what is going on here)
44	It seems to me that this supposed that `ThinLTOFunctionSummary` has been parsed. If the `ModulePath` field was stored in the `ThinLTOFunctionInfo`, this wouldn't be an issue. But since I don't fully understand the reason for the separation of these two classes (I'm probably missing a use-case).

Hi Mehdi,

Thanks for the comments! Responses below. I am not going to update the patch just yet as I am working on implementing the bitcode reading/writing for the format proposed in the latest round of D11722 comments, including the records so that I can do better end to end testing, and there are going to be a few more changes here as a result. I tried to explain the separation of the summary and info structs and some of the other issues below. Let me know if that doesn't clarify the organization.

BTW, since there are now so many RFCs and patches floating around, I put together a page that points to all of them (and I will try to keep this up to date!). See https://sites.google.com/site/llvmthinlto/.

Thanks
Teresa

include/llvm/IR/ThinLTOInfo.h
2	I'll probably go ahead and remove the ThinLTO part of the name, as I am planning to do for the bitcode sections. Some of it is still pretty ThinLTO-specific but I will note that in the comments instead of the filename and data structure names.
25	Thanks for the tip. I will change the comments to be nice for doxygen.
80	Yeah, I have removed that flag in the new version of the patch I am working on. It was previously more interesting when I had the bitcode index and module path in this structure, but I have moved all of that to the function summary (and am moving the local function flag as well now).
98	I was trying to be clever about sharing the pointer in the combined index creation phase, but this is not going to work as is anyway. I will probably change this to a unique_ptr as you suggest, and just duplicate when it is merged into the combined index.
102	Yeah I already hit an issue with this in some follow on work I am doing to populate these during bitcode writing, as this struct is being copied around and was duplicating the pointer leading to double-frees. I am changing this.
132	There are a couple mentions in the RFCs, relating to why this is a vector and also relating to how they are handled from a linkage handling aspect. The former can be found in the file API and data structures RFC (http://lists.llvm.org/pipermail/llvm-dev/2015-August/088920.html): "There may be more than one ThinLTOFunctionInfo for a given function name in the combined function index/summary map due to COMDATs. While the plugin step that creates the combined function map may decide to select one representative COMDAT instance, we don’t want the design to preclude holding multiple as it may be advantageous to import a particular COMDAT from different modules in different backend instances." The latter can be found in Section 3 of the symbol linkage RFC (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088025.html)
161	Don't, removing it.
164	I don't think this is right now, looks like StringMap copy constructor calls std::move. I was trying to reason through what would happen if using the default copy constructor. But I think it is better to simply disable the copy constructor as I don't think we should need that.
168	Will fix.
188	Yep, I am changing this.
191	Good idea, will fix.
lib/IR/ThinLTOInfo.cpp
24	I'll try to document this better with comments. Let me explain here and see if this makes more sense. This function is only used during the phase-2 linker step, when all we are doing is taking the per-module function indexes (pulled out of the bitcode IR .o files created by -fthinlto -c), and merging them into a global combined index that will be emitted as a separate file (for use during importing in the phase-3 parallel backends). The per-module indexes don't have a module string table in bitcode, as the index is from a single module. So here we need to build the ModulePathStringTable in "this" index which is the combined index being built. See the gold-plugin.cpp call to mergeFrom() in the latest D11723 patch for how this is used. When we build the module path string table for the combined index we need to assign a unique ID to each module, which is what NextModuleId is for. It doesn't technically need to be a static member, but I felt that better reflected the fact that this is only used for one index - the combined index. Let me know if that makes sense or whether some changes to the data structure would better reflect this.
44	Yes, when combining the per-module indexes the function summary needs to have been fully parsed. Although since the per-module index doesn't actually have the module path string table, in the per-module summaries that field is populated once we parse a per-module index (from the module's identifier), not from the bitcode itself. The reason for keeping the function summary separate from the function info struct is when reading the combined index during importing we want to support lazy reading of the summaries (since we won't need most of them).

Obsolete. New patch is http://reviews.llvm.org/D13107

Revision Contents

Path

Size

include/

llvm/

IR/

ThinLTOInfo.h

223 lines

lib/

IR/

CMakeLists.txt

1 line

ThinLTOInfo.cpp

65 lines

Diff 32486

include/llvm/IR/ThinLTOInfo.h

This file was added.

				//===-- llvm/ThinLTOInfo.h - ThinLTO Index and Summary ----------- C++ --===//
				//
				reamesUnsubmitted Not Done Reply Inline Actions Meta comment: This shouldn't be ThinLTO specific. If you want to propose adding an index to the IR, that's fine, but it needs to be extension used by everything, not something specific to ThinLTO. reames: Meta comment: This shouldn't be ThinLTO specific. If you want to propose adding an index to…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions I thought the information here was very ThinLTO specific. It isn't used when parsing the rest of the module, but is meant to hold information required for making ThinLTO importing decisions and doing the importing. Do you think some of this is useful in other cases? tejohnson: I thought the information here was very ThinLTO specific. It isn't used when parsing the rest…
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions I agree with Philip, I'm not sure it needs to be "LTO". Now it is not a blocker to me, it shouldn't be a big deal to introduce it here and refactor it later. At least it won't be coupled to anything else. mehdi_amini: I agree with Philip, I'm not sure it needs to be "LTO". Now it is not a blocker to me, it…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions I'll probably go ahead and remove the ThinLTO part of the name, as I am planning to do for the bitcode sections. Some of it is still pretty ThinLTO-specific but I will note that in the comments instead of the filename and data structure names. tejohnson: I'll probably go ahead and remove the ThinLTO part of the name, as I am planning to do for the…
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				/// @file
				/// ThinLtoInfo.h This file contains the declarations the classes that hold
				// the ThinLTO function index and summary.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_IR_THINLTOINFO_H
				#define LLVM_IR_THINLTOINFO_H

				#include "llvm/ADT/SmallString.h"
				#include "llvm/ADT/StringMap.h"
				#include "llvm/Support/MemoryBuffer.h"

				namespace llvm {

				// Function summary information to aid decisions and implementation of
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions I'll never say enough how I really enjoyed the job you did with documenting your stuff with RFCs on the mailing list. Now I hope this information (in the RFCs) will be nicely translated in the code as well. Since you'll introduce a lot of new classes, what about trying to adopt off-hand a format that will result is an nice doxygen? Especially in stuff that goes the public header. Just a hint to help: we turned on "autobrief" on in LLVM in May, so the first line is interpreted as a the "brief description" unless you add the keyword \brief (then is it the first sentence that is matched). / Do the thing with the stuff. / / This does whatever, using \c param1 for something and returning / some sort of result. or / \brief This function does X. It caches some values or something, / and is complicated in some minor way. (Now there are no strong requirements of sticking with the doxygen syntax that I'm aware of, and it won't prevent your patches from being integrated). mehdi_amini: I'll never say enough how I really enjoyed the job you did with documenting your stuff with…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Thanks for the tip. I will change the comments to be nice for doxygen. tejohnson: Thanks for the tip. I will change the comments to be nice for doxygen.
				// importing. This is a separate class from ThinLTOFunctionInfo to
				reamesUnsubmitted Not Done Reply Inline Actions Until this is filled in, the patch isn't really worth reviewing... reames: Until this is filled in, the patch isn't really worth reviewing...
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions I'll go ahead and add a field I was using in my prototype implementation, specifically the function's instruction count. This data structure will evolve as we tune the importing heuristics. tejohnson: I'll go ahead and add a field I was using in my prototype implementation, specifically the…
				// enable lazy reading of this function summary information from the
				davidxlUnsubmitted Not Done Reply Inline Actions Should also include body bitcode offset info, right? davidxl: Should also include body bitcode offset info, right?
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Yes, that is going to move here in the new patch I am about to upload here. tejohnson: Yes, that is going to move here in the new patch I am about to upload here.
				// combined index file during imporing.
				class ThinLTOFunctionSummary {
				private:
				// Path of module containing function IR, used to locate module when
				// importing this function.
				StringRef ModulePath;
				// Index of function within bitcode module block in its module, used
				// to quickly import this function.
				uint64_t BitcodeIndex;

				// The rest of the information is used to help decide whether importing
				// is likely to be profitable.
				// Other information will be added as the importing is tuned, such
				// as hotness (when profile available), and other function characteristics.

				// Number of instructions (ignoring debug instructions, e.g.) computed
				reamesUnsubmitted Not Done Reply Inline Actions This shouldn't be a public flag... reames: This shouldn't be a public flag...
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions None of these should be actually, looks like I accidentally left out the "private" at the top. tejohnson: None of these should be actually, looks like I accidentally left out the "private" at the top.
				// during the initial compile step when the function index is first built.
				unsigned InstCount;

				public:
				reamesUnsubmitted Not Done Reply Inline Actions Why isn't the function linkage enough here? reames: Why isn't the function linkage enough here?
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Because we don't want to parse all of the IR when building the combined index during the plugin step. See the ThinLTOFunctionSummaryIndex::mergeFrom() implementation below where this is used to do renaming of the functions in the combined index, to disambiguate local functions from different modules. tejohnson: Because we don't want to parse all of the IR when building the combined index during the plugin…
				ThinLTOFunctionSummary(StringRef ModPath, uint64_t BitcodeIdx,
				davidxlUnsubmitted Not Done Reply Inline Actions This field 'ModulePath' belongs to ThinLTOFunctionSummaryIndex class (when used before combining). davidxl: This field 'ModulePath' belongs to ThinLTOFunctionSummaryIndex class (when used before…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions But it can't be on the ThinLTOFunctionSummaryIndex class in the combined index. I could have a ModulePath on the ThinLTOFunctionSummaryIndex class just for the case of the per-module index, but it seemed confusing to have it in two places (and which one is used depends on the context). Since StringRefs are small and fast to construct (just a pointer to a char * and a length), and we need one here anyway for the combined case, it seemed clearer to me to always keep it in the per-function info. For the per-module case this would be populated when we read in the per-module ThinLTO information from the IR. tejohnson: But it can't be on the ThinLTOFunctionSummaryIndex class in the combined index. I could have a…
				davidxlUnsubmitted Not Done Reply Inline Actions ok. davidxl: ok.
				unsigned NumInsts) :
				ModulePath(ModPath),
				BitcodeIndex(BitcodeIdx),
				InstCount(NumInsts) { }
				davidxlUnsubmitted Not Done Reply Inline Actions Is there a 1-1 mapping from ThinLTOFunctionInfo to FunctionSummary? If yes, can ThinLTOFunctionInfo inherite from FunctionSummary? davidxl: Is there a 1-1 mapping from ThinLTOFunctionInfo to FunctionSummary? If yes, can…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions When creating the combined index it is not 1-1. Another reason why I have a pointer here instead of inheriting or combining the two is to be more efficient in the case where the function summary information is read lazily during importing. I.e. initially only the ThinLTO symtab and module paths are read, and each function's summary is only read when considering importing that function. In fact, I should move the BitcodeIndex and ModulePath into the ThinLTOFunctionSummary object for the same reason - they are only populated when the function summary is read/parsed. Will make that change to the patch. tejohnson: When creating the combined index it is not 1-1. Another reason why I have a pointer here…
				davidxlUnsubmitted Not Done Reply Inline Actions Makes sense. I think it is better to add more comments before the FunctionSummary declaration to indicate this enables lazy reading of summary from combined summary index file during importing so that it is more memory efficient. Also document that ThinLTOFunctionInfo should only include minimal info required to locate the summary etc. davidxl: Makes sense. I think it is better to add more comments before the FunctionSummary declaration…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Added some comments in the new patch. tejohnson: Added some comments in the new patch.

				StringRef modulePath() const {
				return ModulePath;
				}

				uint64_t bitcodeIndex() const {
				return BitcodeIndex;
				}

				unsigned instCount() {
				return InstCount;
				}
				};

				// Class to hold pointer to function summary and information required
				// for parsing it. After parsing of the ThinLTO symtab, this initially
				// holds the offset into the corresponding ThinLTO function summary bitcode
				// block. After parsing the associated summary information from the summary
				// block the FunctionSummary is populated and stored here.
				class ThinLTOFunctionInfo {
				private:
				// Function summary information used to help make importing decisions.
				ThinLTOFunctionSummary *FunctionSummary;
				// Used during bitcode parsing and writing to hold the offset of the
				// corresponding function summary in the function summary section.
				uint64_t FunctionSummarySecOffset;
				// True if we are still parsing into this instance.
				bool Parsing;
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Is seems to me from reading the implementation that unless you call `recordParsedInfo()` with `nullptr` you always have the invariant `Parsing == !FunctionSummary`. So it seems that this flag is useless? mehdi_amini: Is seems to me from reading the implementation that unless you call `recordParsedInfo()` with…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Yeah, I have removed that flag in the new version of the patch I am working on. It was previously more interesting when I had the bitcode index and module path in this structure, but I have moved all of that to the function summary (and am moving the local function flag as well now). tejohnson: Yeah, I have removed that flag in the new version of the patch I am working on. It was…
				// Used to flag functions that have local linkage types and need to
				// have module identifier appended before placing into the combined
				// index, to disambiguiate from other functions with the same name.
				bool IsLocalFunction;

				public:
				// Constructed during parsing
				ThinLTOFunctionInfo(uint64_t SecOffset) :
				FunctionSummary(nullptr),
				FunctionSummarySecOffset(SecOffset),
				Parsing(true),
				IsLocalFunction(false) { }

				~ThinLTOFunctionInfo() {
				// Support sharing FunctionSummary object between per-module and
				// combined indexes.
				if (FunctionSummary)
				delete FunctionSummary;
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions This implies that `ThinLTOFunctionInfo` owns its `FunctionSummary`, so is there a reason not to use `std::unique_ptr`? mehdi_amini: This implies that `ThinLTOFunctionInfo` owns its `FunctionSummary`, so is there a reason not to…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions I was trying to be clever about sharing the pointer in the combined index creation phase, but this is not going to work as is anyway. I will probably change this to a unique_ptr as you suggest, and just duplicate when it is merged into the combined index. tejohnson: I was trying to be clever about sharing the pointer in the combined index creation phase, but…
				}

				// Use the default copy constructor (when creating combined index),
				// which will cause the same ThinLTOFunctionSummary object to be shared.
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions This is scary to read with respect to what I see in the destructor, how are you avoiding double deletion? mehdi_amini: This is scary to read with respect to what I see in the destructor, how are you avoiding double…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Yeah I already hit an issue with this in some follow on work I am doing to populate these during bitcode writing, as this struct is being copied around and was duplicating the pointer leading to double-frees. I am changing this. tejohnson: Yeah I already hit an issue with this in some follow on work I am doing to populate these…

				// Record the function information parsed out of the function
				// summary block.
				void recordParsedInfo(ThinLTOFunctionSummary *FuncSummary) {
				assert(Parsing);
				Parsing = false;
				FunctionSummary = FuncSummary;
				}

				ThinLTOFunctionSummary *functionSummary() const {
				assert(!Parsing);
				return FunctionSummary;
				}

				void isLocalFunction(bool Local) {
				davidxlUnsubmitted Not Done Reply Inline Actions Why making a reverse map from stringPath to module id instead of a map from module id to module path? In the combined summary index, ThinLTOFunctionInfo object should contain a module id field. From that module id, the module path can be obtained. davidxl: Why making a reverse map from stringPath to module id instead of a map from module id to module…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions The string table map is not queried directly, as described in the associated RFC it is just for owning the module strings, since the string is duplicated when inserting to the StringMap. We don't need to save the module ID and use it to look up in the string table since it is fast and fairly efficient just to save the StringRef directly. I will update the comments in this file to reflect some of the info in the RFC and make it hopefully a lot clearer. tejohnson: The string table map is not queried directly, as described in the associated RFC it is just for…
				davidxlUnsubmitted Not Done Reply Inline Actions ok. davidxl: ok.
				IsLocalFunction = Local;
				}
				bool isLocalFunction() const {
				return IsLocalFunction;
				}

				uint64_t functionSummarySecOffset() {
				return FunctionSummarySecOffset;
				davidxlUnsubmitted Not Done Reply Inline Actions I think it is better to have a ModulePath field in this class -- this field will only be used for per-module SummaryIndex. davidxl: I think it is better to have a ModulePath field in this class -- this field will only be used…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions In the combined index we need a module path in the function info, as noted earlier. I think it is confusing to have module paths in multiple places used in different contexts. tejohnson: In the combined index we need a module path in the function info, as noted earlier. I think it…
				davidxlUnsubmitted Not Done Reply Inline Actions ok davidxl: ok
				}
				};

				// List of function info structures for a particular function name held
				// in the ThinLTOFunctionMap. Requires a vector in the case of multiple
				// COMDAT functions of the same name.
				typedef std::vector<ThinLTOFunctionInfo> ThinLTOFunctionInfoList;
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Can you provide some information about what happens in the case of COMDAT functions? (if you did on the mailing I'd appreciate a pointer to the RFC thread) mehdi_amini: Can you provide some information about what happens in the case of COMDAT functions? (if you…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions There are a couple mentions in the RFCs, relating to why this is a vector and also relating to how they are handled from a linkage handling aspect. The former can be found in the file API and data structures RFC (http://lists.llvm.org/pipermail/llvm-dev/2015-August/088920.html): "There may be more than one ThinLTOFunctionInfo for a given function name in the combined function index/summary map due to COMDATs. While the plugin step that creates the combined function map may decide to select one representative COMDAT instance, we don’t want the design to preclude holding multiple as it may be advantageous to import a particular COMDAT from different modules in different backend instances." The latter can be found in Section 3 of the symbol linkage RFC (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088025.html) tejohnson: There are a couple mentions in the RFCs, relating to why this is a vector and also relating to…
				// Map from function name to corresponding function info structures.
				typedef StringMap<ThinLTOFunctionInfoList> ThinLTOFunctionMap;

				// String table to hold/own module path strings, which additionally holds the
				// module ID assigned to each module during the plugin step. The StringMap
				// makes a copy of and owns inserted strings.
				typedef StringMap<uint64_t> ThinLTOModulePathStringTable;

				// Class to hold module path string table and ThinLTOFunctionMap,
				// and encapsulate methods for operating on them.
				class ThinLTOFunctionSummaryIndex {
				public:
				typedef ThinLTOFunctionMap::const_iterator const_funcinfo_iterator;

				private:
				// Map from function name to list of function information instances
				// for functions of that name (may be duplicates in the COMDAT case, e.g.).
				ThinLTOFunctionMap FunctionMap;

				// Holds strings for combined index, mapping to the corresponding module ID.
				ThinLTOModulePathStringTable ModulePathStringTable;

				// Used for assigning module ids when per-module index structures merged
				// to form combined index, necessary for consistent renaming of promoted
				// static (local) variables.
				static uint64_t NextModuleId;

				public:
				ThinLTOFunctionSummaryIndex() { }
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Why do you need this declaration? mehdi_amini: Why do you need this declaration?
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Don't, removing it. tejohnson: Don't, removing it.

				// Use default copy constructor. For the StringMaps this will
				// call std::swap during the assignment.
				davidxlUnsubmitted Not Done Reply Inline Actions why passing by value? davidxl: why passing by value?
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Will change to pass by ref. tejohnson: Will change to pass by ref.
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Not clear to me how a default copy constructor can involve "std::swap"? mehdi_amini: Not clear to me how a default copy constructor can involve "std::swap"?
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions I don't think this is right now, looks like StringMap copy constructor calls std::move. I was trying to reason through what would happen if using the default copy constructor. But I think it is better to simply disable the copy constructor as I don't think we should need that. tejohnson: I don't think this is right now, looks like StringMap copy constructor calls std::move. I was…

				// Use when the ThinLTOFunctionMap has already been created
				// (e.g. from a native object wrapped bitcode file's symtab).
				void setFunctionMap(ThinLTOFunctionMap &&FuncMap) {
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions I believe the usual pattern is to take by value instead of an R-value ref, unless I missed something. mehdi_amini: I believe the usual pattern is to take by value instead of an R-value ref, unless I missed…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Will fix. tejohnson: Will fix.
				FunctionMap = std::move(FuncMap);
				}

				const_funcinfo_iterator funcinfo_begin() const {
				return FunctionMap.begin();
				}
				const_funcinfo_iterator funcinfo_end() const {
				return FunctionMap.end();
				}

				// Return the list of function info objects for a given function.
				ThinLTOFunctionInfoList getFunctionInfoList(StringRef FuncName) const {
				return FunctionMap.lookup(FuncName);
				}

				// Add a function info for a function of the given name.
				void addFunctionInfo(StringRef FuncName,
				ThinLTOFunctionInfo &Info) {
				if (FunctionMap.count(FuncName))
				FunctionMap.lookup(FuncName).push_back(Info);
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Since `ThinLTOFunctionInfo` owns the `FunctionSummary`, it sound it shouldn't be copied but moved, you'll run into double free. (unless I misunderstood something about the `ThinLTOFunctionInfo` above, in which case forget this comment) mehdi_amini: Since `ThinLTOFunctionInfo` owns the `FunctionSummary`, it sound it shouldn't be copied but…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Yep, I am changing this. tejohnson: Yep, I am changing this.
				else
				FunctionMap.insert(std::make_pair(FuncName,
				ThinLTOFunctionInfoList(1, Info)));
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions I don't think you gain anything by doing this test which involves two queries in the map (lookup + insert/update). What about just doing: FunctionMap.lookup(FuncName).push_back(std::move(Info)); mehdi_amini: I don't think you gain anything by doing this test which involves two queries in the map…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Good idea, will fix. tejohnson: Good idea, will fix.
				}

				// Add a new module path, mapped to the given module Id, and return StringRef
				// owned by string table map.
				StringRef addModulePath(StringRef ModPath, uint64_t ModId) {
				return ModulePathStringTable.insert(
				std::make_pair(ModPath, ModId)).first->first();
				}

				// Iterator to allow writer to walk through table during emission.
				iterator_range<StringMap<uint64_t>::const_iterator>
				modPathStringEntries() const {
				return llvm::make_range(ModulePathStringTable.begin(),
				ModulePathStringTable.end());
				}

				// Used to create the combined function index/summary from multiple
				// per-module instances.
				void mergeFrom(ThinLTOFunctionSummaryIndex *Other);

				static std::string getGlobalThinLTONameForLocal(StringRef Name,
				uint64_t ModId) {
				SmallString<256> NewName(Name);
				NewName += ".llvm.";
				NewName += ModId;
				return NewName.str();
				}
				};

				} // End llvm namespace

				#endif

lib/IR/CMakeLists.txt

Show All 33 Lines	add_llvm_library(LLVMCore
Metadata.cpp		Metadata.cpp
MetadataTracking.cpp		MetadataTracking.cpp
Module.cpp		Module.cpp
Operator.cpp		Operator.cpp
Pass.cpp		Pass.cpp
PassManager.cpp		PassManager.cpp
PassRegistry.cpp		PassRegistry.cpp
Statepoint.cpp		Statepoint.cpp
		ThinLTOInfo.cpp
Type.cpp		Type.cpp
TypeFinder.cpp		TypeFinder.cpp
Use.cpp		Use.cpp
User.cpp		User.cpp
Value.cpp		Value.cpp
ValueSymbolTable.cpp		ValueSymbolTable.cpp
ValueTypes.cpp		ValueTypes.cpp
Verifier.cpp		Verifier.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/IR		${LLVM_MAIN_INCLUDE_DIR}/llvm/IR
)		)

add_dependencies(LLVMCore intrinsics_gen)		add_dependencies(LLVMCore intrinsics_gen)

lib/IR/ThinLTOInfo.cpp

This file was added.

				//===-- ThinLTOInfo.cpp - ThinLTO Index and Summary -----------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements ThinLTO function index and summary classes for the
				// IR library.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/IR/ThinLTOInfo.h"
				#include "llvm/ADT/StringMap.h"
				using namespace llvm;

				uint64_t ThinLTOFunctionSummaryIndex::NextModuleId = 0;

				// Used to create the combined function index/summary from multiple
				// per-module instances.
				void ThinLTOFunctionSummaryIndex::mergeFrom(
				ThinLTOFunctionSummaryIndex *Other) {
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions I'm just starting to understand `ThinLTOFunctionSummaryIndex` and some "weirdness" associated. Do you build a `ThinLTOFunctionSummaryIndex` for each module and only then you combine them together using this `mergeFrom()` method? Looks like a missing abstraction level, some concepts are mixed together, which would explain why you are using a static `NextModuleId` here, which sounds fishy to me. Also this assumes that the `Other` will have an empty ModuleMap, i.e. all the functions come from the same module. (unless I misunderstood what is going on here) mehdi_amini: I'm just starting to understand `ThinLTOFunctionSummaryIndex` and some "weirdness" associated.
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions I'll try to document this better with comments. Let me explain here and see if this makes more sense. This function is only used during the phase-2 linker step, when all we are doing is taking the per-module function indexes (pulled out of the bitcode IR .o files created by -fthinlto -c), and merging them into a global combined index that will be emitted as a separate file (for use during importing in the phase-3 parallel backends). The per-module indexes don't have a module string table in bitcode, as the index is from a single module. So here we need to build the ModulePathStringTable in "this" index which is the combined index being built. See the gold-plugin.cpp call to mergeFrom() in the latest D11723 patch for how this is used. When we build the module path string table for the combined index we need to assign a unique ID to each module, which is what NextModuleId is for. It doesn't technically need to be a static member, but I felt that better reflected the fact that this is only used for one index - the combined index. Let me know if that makes sense or whether some changes to the data structure would better reflect this. tejohnson: I'll try to document this better with comments. Let me explain here and see if this makes more…

				++NextModuleId;

				const_funcinfo_iterator OtherFuncInfoLists = Other->funcinfo_begin();
				StringRef ModPath;
				for (; OtherFuncInfoLists != Other->funcinfo_end(); OtherFuncInfoLists++) {
				StringRef FuncName = OtherFuncInfoLists->getKey();
				ThinLTOFunctionInfoList List = OtherFuncInfoLists->second;

				// Assert that the func info list only has one entry.
				assert(List.size() == 1);
				// Note the module path string ref is copied and still owned by the
				// original per-module ThinLTOFunctionInfo. This should be ok, we
				// just need to keep those around until the combined index is written.
				ThinLTOFunctionInfo Info = List.front();

				// Get the module path string ref for this module if we haven't already
				// saved a reference to it.
				if (ModPath.empty()) {
				ModPath = Info.functionSummary()->modulePath();
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions It seems to me that this supposed that `ThinLTOFunctionSummary` has been parsed. If the `ModulePath` field was stored in the `ThinLTOFunctionInfo`, this wouldn't be an issue. But since I don't fully understand the reason for the separation of these two classes (I'm probably missing a use-case). mehdi_amini: It seems to me that this supposed that `ThinLTOFunctionSummary ` has been parsed. If the…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Yes, when combining the per-module indexes the function summary needs to have been fully parsed. Although since the per-module index doesn't actually have the module path string table, in the per-module summaries that field is populated once we parse a per-module index (from the module's identifier), not from the bitcode itself. The reason for keeping the function summary separate from the function info struct is when reading the combined index during importing we want to support lazy reading of the summaries (since we won't need most of them). tejohnson: Yes, when combining the per-module indexes the function summary needs to have been fully parsed.
				addModulePath(ModPath, NextModuleId);
				davidxlUnsubmitted Not Done Reply Inline Actions The ModPath should be read once from 'Other' outside the loop. davidxl: The ModPath should be read once from 'Other' outside the loop.
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Relates to the earlier discussion on where to save the module path. tejohnson: Relates to the earlier discussion on where to save the module path.
				}
				else
				assert(ModPath == Info.functionSummary()->modulePath());

				// If it is a local function, rename it.
				if (Info.isLocalFunction()) {
				// Any local functions are virtually renamed when being added to the
				// combined index map, to disambiguate from other functions with
				// the same name. The symbol table created for the compiled index
				// file should contain the renamed symbols.
				FuncName = ThinLTOFunctionSummaryIndex::getGlobalThinLTONameForLocal(
				FuncName, NextModuleId);
				davidxlUnsubmitted Not Done Reply Inline Actions Is it using the same naming scheme when local function is static promoted? If yes, it might be better to have a common helper function such as "globalizeThinLTOLocalName", or "getGlobalThinLTONameForLocal" .. davidxl: Is it using the same naming scheme when local function is static promoted? If yes, it might be…
				tejohnsonAuthorUnsubmitted Not Done Reply Inline Actions Agreed. Will add a helper function. tejohnson: Agreed. Will add a helper function.
				}

				// Add func info to existing list.
				// There may be duplicates when combining ThinLTOFunctionMap entries,
				// due to COMDAT functions. Any local functions are virtually renamed
				addFunctionInfo(FuncName, Info);
				}
				}