This is an archive of the discontinued LLVM Phabricator instance.

Encode duplication factor from loop vectorization and loop unrolling to discriminator.
ClosedPublic

Authored by danielcdh on Nov 8 2016, 1:52 PM.

Details

Summary

This patch starts the implementation as discussed in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html

When an optimization duplicates code in a way that scales down the execution count of a basic block, we record the duplication factor as part of the discriminator so that the offline processing tool can find the duplication factor and compute the accurate execution frequency of the corresponding source code. Two important optimizations that fall into this category are loop vectorization and loop unrolling. This patch records the duplication factor for these 2 optimizations.

The recording is guarded by a flag, encode-duplication-in-discriminators, which is off by default.
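As a rough sketch of the idea (the bit widths and helper names below are hypothetical, chosen only for illustration; the real layout lives in include/llvm/IR/DebugInfoMetadata.h), a fixed-width scheme simply packs the duplication factor into bits above the base discriminator:

// Hypothetical fixed-width packing: base discriminator in the low bits,
// duplication factor in the bits above it. Widths are illustrative only.
#include <cassert>
#include <cstdint>

constexpr unsigned BaseDiscriminatorBits = 8;
constexpr unsigned DuplicationFactorBits = 12;

uint32_t encodeDiscriminator(uint32_t Base, uint32_t DupFactor) {
  assert(Base < (1u << BaseDiscriminatorBits) && "base discriminator too large");
  assert(DupFactor < (1u << DuplicationFactorBits) && "duplication factor too large");
  return Base | (DupFactor << BaseDiscriminatorBits);
}

uint32_t getBaseDiscriminator(uint32_t D) {
  return D & ((1u << BaseDiscriminatorBits) - 1);
}

uint32_t getDuplicationFactor(uint32_t D) {
  return (D >> BaseDiscriminatorBits) & ((1u << DuplicationFactorBits) - 1);
}

A loop vectorized by factor VF and interleaved by factor UF would plausibly record DupFactor = VF * UF, and the offline tool scales the sampled count back up by that factor.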

Event Timeline

danielcdh updated this revision to Diff 77252.Nov 8 2016, 1:52 PM
danielcdh retitled this revision from to Encode duplication factor from loop vectorization and loop unrolling to discriminator..
danielcdh updated this object.
danielcdh added a subscriber: llvm-commits.
aprantl added inline comments.Nov 15 2016, 3:13 PM
include/llvm/IR/DebugInfoMetadata.h
1772–1773
  1. This magic formula needs to be documented somewhere.
  2. Is there anything preventing / detecting potential collisions here?
hfinkel added inline comments.Nov 15 2016, 3:37 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
876

I apologize for rehashing this, but now I'm confused again. We have two situations:

  1. Loop is unrolled (or code is otherwise duplicated). In this case, the discriminator value of each copy must be different (so that the relevant counts are summed)
  2. Instruction is vectorized. In this case we need a discriminator value with a duplication factor (so that the counts are multiplied by the duplication factor because each vector instruction represents DF scalar instructions).

And both situations can obviously be combined. It seems like, in general, we're trying to take a shortcut here: instead of giving each instruction from an unrolled loop a different discriminator (so that all of the relevant counts will be summed), we're giving them all the same discriminator with a duplication factor. This will work in some cases, but not if some of the loop iterations (or instructions therein) are executed conditionally. I don't see why this is worthwhile. Can you please explain?

danielcdh added inline comments.Nov 15 2016, 4:23 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
876

You are right. For an unrolled loop, the most accurate way would be to assign a distinct discriminator to each cloned body and use "sum" to compute the source count. We are taking a shortcut here: assign all copies one discriminator (with a duplication factor), and use max*dup_factor to compute the source count. This works fine when all cloned copies have uniform behavior and thus similar execution counts; otherwise it will overcount the source.

So the con of using a duplication factor for an unrolled loop is less accuracy. The pro is that it saves on line table size.

Maybe a better solution would be: if there is control flow inside the unrolled loop, then use a distinct discriminator for each clone; otherwise use a duplication factor.

In this patch, I only added the logic for the duplication factor. I will add the logic to assign distinct discriminators to clones in another patch. How about I add a FIXME here and address loop-unroll-with-control-flow in that patch?
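To make the tradeoff concrete, here is a minimal sketch (not from the patch; the function names are made up) of the two recovery rules an offline tool could apply:

// Sketch only: recovering a source-level count from profile samples.
#include <cstdint>
#include <vector>

// Distinct-discriminator scheme: each cloned copy has its own location, so
// the per-copy counts are simply summed.
uint64_t sourceCountBySum(const std::vector<uint64_t> &PerCopyCounts) {
  uint64_t Sum = 0;
  for (uint64_t C : PerCopyCounts)
    Sum += C;
  return Sum;
}

// Duplication-factor scheme: all copies share one location, so the tool only
// sees one (max) sample for it and multiplies by the recorded factor. This
// overcounts when some copies execute less often than others.
uint64_t sourceCountByDupFactor(uint64_t MaxSample, uint64_t DupFactor) {
  return MaxSample * DupFactor;
}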

hfinkel added inline comments.Nov 21 2016, 2:40 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
876

So the con of using a duplication factor for an unrolled loop is less accuracy. The pro is that it saves on line table size.

Can you explain again why this saves on line-table size?

In this patch, I only added the logic for the duplication factor. I will add the logic to assign distinct discriminators to clones in another patch. How about I add a FIXME here and address loop-unroll-with-control-flow in that patch?

That's fine by me.

Thanks for the review.

Any further comments on the proposed encoding before it's ready to land?

Thanks,
Dehao

lib/Transforms/Vectorize/LoopVectorize.cpp
876

For the following case:

#1 for (i = 0; i < 4; i++)
#2 a[i] = b[i];

If we use a duplication factor, the line table for the expanded code will be:

a[0] = b[0]; line 2, discriminator 0x400
a[1] = b[1]; line 2, discriminator 0x400
a[2] = b[2]; line 2, discriminator 0x400
a[3] = b[3]; line 2, discriminator 0x400

They all share the same location, so there is no new line table entry for the cloned code.

But with distinct discriminators, the expanded code will be:

a[0] = b[0]; line 2, discriminator 0x10000
a[1] = b[1]; line 2, discriminator 0x20000
a[2] = b[2]; line 2, discriminator 0x30000
a[3] = b[3]; line 2, discriminator 0x40000

There are 2 issues:

  1. each copy has a distinct location, so we need 4 entries in the debug line table.
  2. the encoding for a distinct discriminator is longer than for a duplication factor.
hfinkel edited edge metadata.Nov 21 2016, 3:34 PM

Thanks for the review.

Any further comments on the proposed encoding before it's ready to land?

Could we be smarter about the encoding so that the distinct-copies case did not take up so much space? It seems like the design right now gives a fixed number of low bits to the duplication factor. Could we use a variable number of bits? For example, because of the underlying LEB128 encoding, we could use some of the lowest-order bits as indicators, or better, use a variable-length code:

low: < duplication factor > < copy id > : high

where each factor is a variable-length code (e.g. a prefix code: https://en.wikipedia.org/wiki/Prefix_code). For example, the Fibonacci code (https://en.wikipedia.org/wiki/Fibonacci_coding) is a well-known code which is good when the encoded numbers are likely to be small. In short, we might encode:

bitreverse( Fibonacci(copy-id + 1) | Fibonacci(dup-factor + 1) )

In this case, if either field is empty, then we waste only 2 bits encoding that field. Small numbers take only a small number of bits. What do you think?
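To make the Fibonacci suggestion concrete, here is a minimal sketch (illustration only, not proposed code) of the code's key property, that small values get short codewords terminated by a "11" pattern:

// Fibonacci coding: write N >= 1 as a sum of non-consecutive Fibonacci
// numbers (Zeckendorf form), emit the bits LSB-first, then append a final
// 1-bit so the codeword ends in the unique "11" pattern.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

std::string fibonacciEncode(uint64_t N) {
  assert(N >= 1 && "Fibonacci coding is defined for positive integers");
  // Fibonacci numbers F(2)=1, F(3)=2, F(4)=3, ... not exceeding N.
  std::vector<uint64_t> Fib{1};
  for (uint64_t A = 1, B = 2; B <= N;) {
    Fib.push_back(B);
    uint64_t Next = A + B;
    A = B;
    B = Next;
  }
  // Greedily take the largest remaining Fibonacci number (Zeckendorf).
  std::string Bits(Fib.size(), '0');
  for (size_t I = Fib.size(); I-- > 0;)
    if (Fib[I] <= N) {
      Bits[I] = '1';
      N -= Fib[I];
    }
  return Bits + '1';
}

// fibonacciEncode(1) == "11", fibonacciEncode(2) == "011",
// fibonacciEncode(4) == "1011": small values cost few bits, and codeword
// length grows roughly logarithmically with the value.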


Thanks for the review.

Any further comments on the proposed encoding before it's ready to land?

Could we be smarter about the encoding so that the distinct-copies case did not take up so much space? It seems like the design right now gives a fixed number of low bits to the duplication factor. Could we use a variable number of bits? For example, because of the underlying LEB128 encoding, we could use some of the lowest-order bits as indicators, or better, use a variable-length code:

low: < duplication factor > < copy id > : high

where each factor is a variable-length code (e.g. a prefix code: https://en.wikipedia.org/wiki/Prefix_code). For example, the Fibonacci code (https://en.wikipedia.org/wiki/Fibonacci_coding) is a well-known code which is good when the encoded numbers are likely to be small. In short, we might encode:

bitreverse( Fibonacci(copy-id + 1) | Fibonacci(dup-factor + 1) )

In this case, if either field is empty, then we waste only 2 bits encoding that field. Small numbers take only a small number of bits. What do you think?

Thanks for the suggestion. I spent some time studying the prefix-based encoding, which is a great idea. But when it is applied here, with Fibonacci encoding, 3 bits can only represent numbers up to 3, 4 bits can represent up to 5, and 7 bits can represent up to 21. This does not seem enough, as the duplication factor usually needs 5+ bits, making it hard to fit multiple pieces into a 1-byte ULEB128.

Another approach, as you mentioned, is to use the lower bits to encode which info the discriminator represents, so that when only 1 piece of info is available, we can encode it more efficiently into one byte. This is a great idea. I did some experiments to explore its potential: when assigning the duplication factor, I always remove the original discriminator and put the DP (duplication factor) in the lower 7 bits so that it always fits into 1 byte. The debug_line size increase compared with the current patch is shown below:

447.dealII 8.01% 6.16%
453.povray 6.50% 5.04%
482.sphinx3 7.54% 5.74%
470.lbm 0.00% 0.00%
444.namd 6.19% 4.89%
433.milc 23.12% 17.63%
450.soplex 2.66% 1.99%
445.gobmk 7.51% 5.02%
471.omnetpp 0.52% 0.36%
458.sjeng 10.31% 7.37%
473.astar 5.44% 4.26%
456.hmmer 9.74% 7.50%
401.bzip2 9.01% 6.33%
462.libquantum 10.79% 8.36%
403.gcc 2.74% 1.77%
464.h264ref 29.62% 21.14%
483.xalancbmk 1.42% 1.12%
429.mcf 9.55% 7.45%
400.perlbench 1.96% 1.21%
mean 7.81% 5.84%

The first column of data represents the current implementation. The second column is the experiment (fitting the DP into one byte), which represents the upper bound of any encoding.

From the data, it looks like the improvement is marginal. So I'm wondering whether it justifies the added complexity compared with fixed-width encoding?

Thanks,
Dehao

anemet added a subscriber: anemet.Dec 1 2016, 10:56 AM

Thanks for the review.

Any further comments on the proposed encoding before it's ready to land?

Could we be smarter about the encoding so that the distinct-copies case did not take up so much space? It seems like the design right now gives a fixed number of low bits to the duplication factor. Could we use a variable number of bits? For example, because of the underlying LEB128 encoding, we could use some of the lowest-order bits as indicators, or better, use a variable-length code:

low: < duplication factor > < copy id > : high

where each factor is a variable-length code (e.g. a prefix code: https://en.wikipedia.org/wiki/Prefix_code). For example, the Fibonacci code (https://en.wikipedia.org/wiki/Fibonacci_coding) is a well-known code which is good when the encoded numbers are likely to be small. In short, we might encode:

bitreverse( Fibonacci(copy-id + 1) | Fibonacci(dup-factor + 1) )

In this case, if either field is empty, then we waste only 2 bits encoding that field. Small numbers take only a small number of bits. What do you think?

Thanks for the suggestion. I spent some time studying the prefix-based encoding, which is a great idea. But when it is applied here, with Fibonacci encoding, 3 bits can only represent numbers up to 3, 4 bits can represent up to 5, and 7 bits can represent up to 21. This does not seem enough, as the duplication factor usually needs 5+ bits, making it hard to fit multiple pieces into a 1-byte ULEB128.

Another approach, as you mentioned, is to use the lower bits to encode which info the discriminator represents, so that when only 1 piece of info is available, we can encode it more efficiently into one byte. This is a great idea. I did some experiments to explore its potential: when assigning the duplication factor, I always remove the original discriminator and put the DP (duplication factor) in the lower 7 bits so that it always fits into 1 byte. The debug_line size increase compared with the current patch is shown below:

447.dealII 8.01% 6.16%
453.povray 6.50% 5.04%
482.sphinx3 7.54% 5.74%
470.lbm 0.00% 0.00%
444.namd 6.19% 4.89%
433.milc 23.12% 17.63%
450.soplex 2.66% 1.99%
445.gobmk 7.51% 5.02%
471.omnetpp 0.52% 0.36%
458.sjeng 10.31% 7.37%
473.astar 5.44% 4.26%
456.hmmer 9.74% 7.50%
401.bzip2 9.01% 6.33%
462.libquantum 10.79% 8.36%
403.gcc 2.74% 1.77%
464.h264ref 29.62% 21.14%
483.xalancbmk 1.42% 1.12%
429.mcf 9.55% 7.45%
400.perlbench 1.96% 1.21%
mean 7.81% 5.84%

The first column of data represents the current implementation. The second column is the experiment (fitting the DP into one byte), which represents the upper bound of any encoding.

From the data, it looks like the improvement is marginal. So I'm wondering whether it justifies the added complexity compared with fixed-width encoding?

Thanks for doing those experiments! Regarding the second form, many of those changes are on the order of a few percent -- that seems worthwhile. Can you post the patch?


danielcdh updated this revision to Diff 80015.Dec 1 2016, 6:31 PM
danielcdh edited edge metadata.

Updated the encoding algorithm to use the lowest bit to represent whether BaseDiscriminator/DuplicationFactor is available.

With this change, the debug info size impact is shown in the last column below:

447.dealII 8.01% 6.16% 6.92%
453.povray 6.50% 5.04% 5.38%
482.sphinx3 7.54% 5.74% 6.22%
470.lbm 0.00% 0.00% 0.00%
444.namd 6.19% 4.89% 5.14%
433.milc 23.12% 17.63% 19.96%
450.soplex 2.66% 1.99% 2.25%
445.gobmk 7.51% 5.02% 6.22%
471.omnetpp 0.52% 0.36% 0.45%
458.sjeng 10.31% 7.37% 8.38%
473.astar 5.44% 4.26% 4.46%
456.hmmer 9.74% 7.50% 8.36%
401.bzip2 9.01% 6.33% 7.72%
462.libquantum 10.79% 8.36% 8.75%
403.gcc 2.74% 1.77% 2.28%
464.h264ref 29.62% 21.14% 27.37%
483.xalancbmk 1.42% 1.12% 1.16%
429.mcf 9.55% 7.45% 7.69%
400.perlbench 1.96% 1.21% 1.73%
mean 7.81% 5.84% 6.68%

Updated the encoding algorithm to use the lowest bit to represent whether BaseDiscriminator/DuplicationFactor is available.

With this change, the debug info size impact is shown in the last column below:

What are the other columns?

Updated the encoding algorithm to use the lowest bit to represent whether BaseDiscriminator/DuplicationFactor is available.

With this change, the debug info size impact is shown in the last column below:

What are the other columns?

They are copied from the previous experiment:

The first column of data represents the current implementation. The second column is the experiment (fitting the DP into one byte), which represents the upper bound of any encoding.

hfinkel added inline comments.Dec 3 2016, 6:53 PM
include/llvm/IR/DebugInfoMetadata.h
1803

Please move the comments that explain the encoding to above DILocation::getBaseDiscriminator and write some text to explain what is going on.

lib/Transforms/Utils/AddDiscriminators.cpp
192–193

Please add a utility function for the encoding.

lib/Transforms/Vectorize/LoopVectorize.cpp
218

Based on one of the other threads, I suppose we're going to add some command-line flag to enable/disable this? That being the case, we'll read this setting from some function attribute string instead of using a cl::opt.

danielcdh added inline comments.Dec 5 2016, 9:48 AM
lib/Transforms/Utils/AddDiscriminators.cpp
192–193

Thanks for the comment!

I agree with all the comments, but before I address them and move forward with this encoding, let's discuss more whether we want to use the lower bits to indicate the discriminator type. Comparing with fixed-position encoding (i.e. the DP always consumes 2 ULEB128 bytes):

Pros:

  • saves ~1% debug line table size (on average)

Cons:

  • added complexity to the clang code and also the offline profile creation code
  • limited representation scope: with the new algorithm we need 1 extra bit to represent the profile type, which means we need to limit the useful bits to 6 in order to fit into a 1-byte ULEB128. As a result, the maximum raw discriminator will be 63 instead of 127.

Comments?
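A minimal sketch of the tag-bit layout being debated (hypothetical, for illustration only): bit 0 says which field the payload carries, leaving 6 payload bits that still fit in a single ULEB128 byte, which is where the 63-vs-127 limit above comes from.

// Hypothetical single-field encoding with a 1-bit type tag.
#include <cstdint>

enum FieldKind : uint32_t {
  BaseDiscriminatorField = 0,
  DuplicationFactorField = 1,
};

uint32_t encodeField(FieldKind Kind, uint32_t Value) {
  return (Value << 1) | static_cast<uint32_t>(Kind);
}

FieldKind decodeFieldKind(uint32_t Encoded) {
  return static_cast<FieldKind>(Encoded & 1);
}

uint32_t decodeFieldValue(uint32_t Encoded) { return Encoded >> 1; }

// encodeField(BaseDiscriminatorField, 63) == 126, which ULEB128 emits as a
// single byte; a value of 64 or more spills into a second byte.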

lib/Transforms/Vectorize/LoopVectorize.cpp
218

Yes, we will guard all the logic by checking the profile-debug flag once https://reviews.llvm.org/D25435 is submitted. I'm not sure whether it will be a function attribute string or simply a flag passed down from the frontend.

Meanwhile, I think it is worthwhile to have a flag to control whether we encode duplication in the discriminator, so that we have the option to fall back to the old discriminator assignment algorithm. We can remove the flag later if we find it useless.

hfinkel added inline comments.Dec 7 2016, 12:02 PM
lib/Transforms/Utils/AddDiscriminators.cpp
192–193

saves ~1% debug line table size (on average)

Yes, but this 1% is 20% or more of the increase. More importantly for me, it reduces the relative expense (in terms of binary-size increase) of using distinct location tags vs. using duplication factors.

added complexity to the clang code and also the offline profile creation code

If this patch is any indication, then the extra complexity seems minor.

192–193

limited representation scope: with the new algorithm we need 1 extra bit to represent the profile type, which means we need to limit the useful bits to 6 in order to fit into a 1-byte ULEB128. As a result, the maximum raw discriminator will be 63 instead of 127.

I don't like the fixed bit limit here. How about using the high bit to indicate that there are more bits in the discriminator (that's how ULEB128 itself works, right?)?
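For reference, here is the ULEB128 continuation-bit behavior alluded to above (standard DWARF encoding, sketched for illustration; not code from the patch):

// Standard ULEB128: each output byte carries 7 payload bits, and the high
// bit of a byte is set when more bytes follow.
#include <cstdint>
#include <vector>

std::vector<uint8_t> encodeULEB128(uint64_t Value) {
  std::vector<uint8_t> Bytes;
  do {
    uint8_t Byte = Value & 0x7f; // low 7 bits of the remaining value
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;              // continuation bit: more bytes follow
    Bytes.push_back(Byte);
  } while (Value != 0);
  return Bytes;
}

// encodeULEB128(63)  -> {0x3f}        (one byte)
// encodeULEB128(128) -> {0x80, 0x01}  (two bytes)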

lib/Transforms/Vectorize/LoopVectorize.cpp
218

Yes, we will guard all the logic by checking the profile-debug flag once https://reviews.llvm.org/D25435 is submitted. I'm not sure whether it will be a function attribute string or simply a flag passed down from the frontend.

I think it needs to be an attribute. Otherwise, it won't work correctly with LTO.

danielcdh updated this revision to Diff 81115.Dec 12 2016, 11:32 AM
danielcdh marked 2 inline comments as done.

update the encoding, add comments, update test.

lib/Transforms/Vectorize/LoopVectorize.cpp
218

I did not realize there was an LTO issue, but let's address that problem in https://reviews.llvm.org/D25435.

ping...

Thanks,
Dehao

ping...

Thanks,
Dehao

ping...

Thanks,
Dehao

hfinkel accepted this revision.Jan 24 2017, 3:14 PM

I apologize for my delay in getting back to this. Thanks for updating this with a more-flexible encoding scheme. It looks like in the future, should we desire, we can extend the number of fields (and/or their size). This LGTM. We can fix up replacing/supplementing the cl::opt with a function attribute in a follow-up.

include/llvm/IR/DebugInfoMetadata.h
1321

We should say that, for example, the AddDiscriminators pass will do this.

lib/Transforms/Utils/AddDiscriminators.cpp
100

I'm assuming that this is for testing, correct? We'll need to add a function attribute to really control this.

This revision is now accepted and ready to land.Jan 24 2017, 3:14 PM
danielcdh added inline comments.Jan 24 2017, 4:08 PM
lib/Transforms/Utils/AddDiscriminators.cpp
100

This is for testing purposes, and it will be "true" by default when -fdebug-info-profiling is set.

I'm not sure why we would need this as a function attribute. I would expect all binaries to be built with this either on or off. Or am I missing something?

hfinkel added inline comments.Jan 24 2017, 5:22 PM
lib/Transforms/Utils/AddDiscriminators.cpp
100

Because we shouldn't use command-line options to communicate between the frontend and the backend. Some legacy aside, a function attribute is the strongly preferred mechanism. It is also necessary for this to work correctly with LTO.

danielcdh added inline comments.Jan 24 2017, 5:37 PM
lib/Transforms/Utils/AddDiscriminators.cpp
100

How about using TargetOption.DebugInfoForProfiling to decide, as we check it in lib/CodeGen/AsmPrinter/DwarfDebug.cpp?

hfinkel added inline comments.
lib/Transforms/Utils/AddDiscriminators.cpp
100

I believe that's deprecated too (because it also does not work with LTO). @echristo , can you comment on this (i.e. do we need to use a function-attribute here)?

danielcdh marked an inline comment as done.Feb 1 2017, 4:53 PM
danielcdh added inline comments.
lib/Transforms/Utils/AddDiscriminators.cpp
100

Removed the flag; the code now checks the CompileUnit flag (debugInfoForProfiling) to decide whether we want to encode or not.

danielcdh updated this revision to Diff 86750.Feb 1 2017, 4:53 PM

Rebase, remove the flag, and use the CompileUnit flag to decide whether we want to encode the duplication factor.

PTAL

Rebase, remove the flag, and use the CompileUnit flag to decide whether we want to encode the duplication factor.

PTAL

Makes sense to me, please go ahead.

Could you please add a note about debugInfoForProfiling to the LangRef section on DICompileUnit?

danielcdh updated this revision to Diff 88024.Feb 10 2017, 11:17 AM

update LangRef

hfinkel added inline comments.Feb 10 2017, 11:58 AM
docs/LangRef.rst
4004

debugInfoForProfiling is not a 'tuple containing debug info to be emitted'. Instead of adding it to that list, how about adding another sentence like:

The ``debugInfoForProfiling:`` field is a boolean indicating whether or not line-table discriminators are updated to provide more-accurate profiling results.
danielcdh closed this revision.Feb 10 2017, 1:20 PM