This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE
ClosedPublic

Authored by lebedev.ri on Feb 6 2020, 12:25 PM.

Details

Summary

Currently, we only have nice exploration for the LEA instruction;
for the rest, we rely on randomizeUnsetVariables()
to sometimes generate something interesting.
While that works, its coverage isn't very reliable :)

Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.

We can do that by taking the existing randomizeMCOperand()
and turning it on its head - instead of relying on it to randomly fill in
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as many InstructionTemplate
combinations of these possible values as needed/possible.

Of course, that requires invasive changes: we no longer pass just the
naked Instruction, but sometimes a partially-filled InstructionTemplate.

As can be seen from the test, this allows us to explore
X86::OperandType::OPERAND_COND_CODE for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.

Thoughts?

Diff Detail

Event Timeline

lebedev.ri created this revision.Feb 6 2020, 12:25 PM

First, thank you for the patch, this is going in the right direction.

Now, stepping back a bit, there are many dimensions that we'd like to explore:

  • argument values (which is what you started here),
  • register selection (registers of a class are not strictly equivalent [1])
  • snippet generation (we always select the same pattern but exploring them would help [2])
  • repetition mode (to see the impact of the decoder)

Code-wise, this means it would be much better to have a global sampler object responsible for how to explore these dimensions, rather than the greedy approach we're heading toward.
Now, I understand this is a substantial redesign and I'm not asking you to do it, but I just wanted to share what I believe is the right direction for lowering code complexity in the long run.


[1] LEA is known to produce different latencies when using EBP, RBP, or R13 as base registers
[2] for instance

XOR EAX, EAX, EAX

is self-dependent, but it's also a zero idiom; if we'd also executed the back-to-back pattern, we would have learned something new:

XOR EBX, EAX, EAX
XOR EAX, EBX, EBX
llvm/tools/llvm-exegesis/lib/SnippetGenerator.cpp
91

I don't think it's worth mentioning the error here.
Maybe replace the whole comment with

We reached the number of allowed configs; return early.
llvm/tools/llvm-exegesis/lib/X86/Target.cpp
749

This function deserves some documentation

775

This comment should be at the function level (possibly at the function declaration)

First, thank you for the patch, this is going in the right direction.

Thank you for taking a look!

Now stepping back a bit there are many dimensions that we'd like to explore:

Yeah, i suspect as much :)

  • argument values (which is what you started here),

Ack.
I have started with this because it is my current itch;
while i'm aware of the others, this seemed the most straightforward.

  • register selection

Right. Currently unset registers are mostly picked randomly, within constraints.

(registers of a class are not strictly equivalent [1])

I think that currently can't be expressed in sched models - is that planned to change,
or do we just want to know when we fail to model things?

  • snippet generation (we always select the same pattern but exploring them would help [2])

which is roughly what SerialSnippetGenerator::generateCodeTemplates()/appendCodeTemplates() does,
but somewhat more general, correct?
This is not very useful until the analysis learns to deal with serially-chained instructions (D60000, stuck)

  • repetition mode (to see the impact of the decoder)

This appears to currently be handled via the -repetition-mode switch (D68125).
I'm unfamiliar with that.
Does this really have to be accounted for (as a dimension) in the greedy approach?

Code wise this means it would be much better to have a global sampler object
responsible for how to explore these dimensions rather than the greedy approach we're heading to.
Now I understand this is a substantial redesign and I'm not asking you to do it
but I just wanted to share what I believe is the right direction to lower code complexity in the long run.

For context, we kind-of already explore condcodes/registers by producing them randomly;
so if we run a lot of benchmarks, we're bound to explore them,
though without good coverage/reproducibility/repeatability.
Which is why i'm starting with this patch - even greedy is better than what we currently have.

So it's not so much that I don't want to redesign; rather,
I'm not sure i currently fully grasp the idea behind the "global sampler object".
How would that work?


[1] LEA is known to produce different latencies when using EBP, RBP, or R13 as base registers

Yeah, i saw that on bdver2 too in rG76fcf900d58826d9f21c0dd7f02b61b4d59c9193.

[2] for instance

XOR EAX, EAX, EAX

is self-dependent, but it's also a zero idiom; if we'd also executed the back-to-back pattern, we would have learned something new:

XOR EBX, EAX, EAX
XOR EAX, EBX, EBX

Right. For latency we see

llvm/tools/llvm-exegesis/lib/X86/Target.cpp
749

Yeah, and unit tests. This function ended up being *too* smart,
although it is the best version i was able to come up with so far.

lebedev.ri updated this revision to Diff 243278.Feb 7 2020, 1:42 PM
lebedev.ri marked 4 inline comments as done.

Thank you for taking a look!
Addressed review notes, other than the redesign into "global sampler object".

lebedev.ri updated this revision to Diff 243356.Feb 8 2020, 2:39 AM

Fall back to the baseline generateInstructionVariants() instead of using the recursive generator if we aren't performing any exploration after all.

Rewrite the combination generator - it is more straightforward to model it as a
mixed-radix number, where each digit may have a different base.
This seems more understandable to me, and involves no recursion.

CombinationGenerator::performGeneration(): simplify loops, NFC.
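
For illustration, here is a minimal, self-contained sketch of that mixed-radix counting idea. This is not the actual CombinationGenerator from the patch; the names are illustrative, and each variable's choice list is assumed to be non-empty:

#include <cstddef>
#include <functional>
#include <vector>

void enumerateCombinations(
    const std::vector<std::vector<int>> &ChoicesPerVariable,
    const std::function<bool(const std::vector<int> &)> &Callback) {
  if (ChoicesPerVariable.empty())
    return;
  // The current combination is a mixed-radix "number": digit I indexes into
  // ChoicesPerVariable[I], whose size is that digit's base.
  std::vector<size_t> Digits(ChoicesPerVariable.size(), 0);
  while (true) {
    // Materialize and emit the current combination.
    std::vector<int> Current;
    for (size_t I = 0; I != Digits.size(); ++I)
      Current.push_back(ChoicesPerVariable[I][Digits[I]]);
    if (!Callback(Current)) // The callback may stop early (config limit).
      return;
    // "Increment" the number: bump the least-significant digit, carrying
    // leftwards whenever a digit reaches its base.
    size_t I = Digits.size();
    while (I != 0) {
      --I;
      if (++Digits[I] != ChoicesPerVariable[I].size())
        break;       // No carry; we have a fresh combination.
      Digits[I] = 0; // Carry into the next digit.
      if (I == 0)
        return;      // Wrapped around: all combinations were visited.
    }
  }
}

For choice lists {0, 1} and {2, 3}, this emits {0,2}, {0,3}, {1,2}, {1,3} - the same variants checked in the unit test discussed further down.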

(registers of a class are not strictly equivalent [1])

I think that currently can't be expressed in sched models - is that planned to change,
or do we just want to know when we fail to model things?

Not in the sched models but in the target: X86FixupLEAs.

  • snippet generation (we always select the same pattern but exploring them would help [2])

which is roughly what SerialSnippetGenerator::generateCodeTemplates()/appendCodeTemplates() does,
but somewhat more general, correct?

yes

This is not very useful until the analysis learns to deal with serially-chained instructions (D60000, stuck)

yes, @orodley is likely to work on this.

  • repetition mode (to see the impact of the decoder)

This appears to currently be handled via the -repetition-mode switch (D68125).

yes

Does this really have to be accounted for (as a dimension) in the greedy approach?

@courbet and I believe that two measures are better than one, and that they can demonstrate the impact of decoding the instruction.

So it's not so much that I don't want to redesign; rather,
I'm not sure i currently fully grasp the idea behind the "global sampler object".
How would that work?

So the GlobalSampler™ would be instantiated early (in main). It would generate a single number (random or manually selected) which would be mapped to N dimensions (via something like a Hilbert curve or Z-curve). The value would be passed down to each function that needs to randomize a choice, so they can deterministically select values.
Does this make sense?
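
For illustration, a minimal sketch of the Z-curve (Morton order) mapping being described; the GlobalSampler itself is hypothetical and all names here are made up (NumDims is assumed to be greater than zero):

#include <cstdint>
#include <vector>

std::vector<uint64_t> decodeZCurve(uint64_t Index, unsigned NumDims) {
  // Bit B of the index becomes bit (B / NumDims) of dimension (B % NumDims),
  // so a single number advances through all dimensions "evenly" and the
  // choice in each dimension stays deterministic.
  std::vector<uint64_t> Coords(NumDims, 0);
  for (unsigned Bit = 0; Bit != 64; ++Bit)
    Coords[Bit % NumDims] |= ((Index >> Bit) & 1ULL) << (Bit / NumDims);
  return Coords;
}

Each coordinate would then be reduced modulo the number of choices in its dimension (register, condition code, repetition mode, ...).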

gchatelet added inline comments.Feb 11 2020, 6:49 AM
llvm/tools/llvm-exegesis/lib/SnippetGenerator.h
119

This is the idea, but it should generate the values on the fly, because the space to explore might be too big to enumerate all the possibilities.

Sorry if it's a bit hand-wavy for now; I haven't sorted out the details, but we should be able to use space-filling curves to that end (article).

lebedev.ri added a comment.EditedFeb 11 2020, 7:03 AM

Let me know how to proceed.

I know this doesn't scale - i have designed it this way
because i don't really expect it to be used for exhaustive sweeping,
but only for exploring a few operands, because that is the real problem i have :]

Is the traditional iterative approach not acceptable for this tool,
and is the intention to only accept the One True Solution,
even if it potentially takes ages before it's there?

Let me know how to proceed.

I know this doesn't scale - i have designed it this way
because i don't really expect it to be used for exhaustive sweeping,
but only for exploring a few operands, because that is the real problem i have :]

Is the traditional iterative approach not acceptable for this tool,
and is the intention to only accept the One True Solution,
even if it potentially takes ages before it's there?

I'm confused by your answer.

I am not against this patch; I just want to explain where I'd like the tool to go in the long run.
And since you responded to my comment with a code change and the creation of the CombinationGenerator, I thought you were trying to implement what I suggested. Hence my answers.

If you don't intend to implement it, that's fine.

I've added a few comments.

llvm/tools/llvm-exegesis/lib/X86/Target.cpp
788

The formatting is weird here; maybe extract the lambda out of the constructor.

llvm/unittests/tools/llvm-exegesis/SnippetGeneratorTest.cpp
38

https://github.com/google/googletest/blob/master/googlemock/docs/cheat_sheet.md#container-matchers

You should be able to write

ASSERT_THAT(Variants, ElementsAreArray(ExpectedVariants));

or

ASSERT_THAT(Variants, ElementsAreArray({
      {0, 2},
      {0, 3},
      {1, 2},
      {1, 3},
  }));
59

ditto here and below

187
ASSERT_THAT(Variants, IsEmpty());

Let me know how to proceed.

I know this doesn't scale - i have designed it this way
because i don't really expect it to be used for exhaustive sweeping,
but only for exploring a few operands, because that is the real problem i have :]

Is the traditional iterative approach not acceptable for this tool,
and is the intention to only accept the One True Solution,
even if it potentially takes ages before it's there?

I'm confused by your answer.

That is because i'm confused by the feedback so far :)
That might be due to differences in dialogue structure from what is usual in LLVM.
(Sometimes, elsewhere, this kind of cautious, non-committal feedback
has meant a polite 'no, thank you' without actually stating as much.)

I've added a few comments.

lebedev.ri marked 3 inline comments as done.

Addressed review notes.

gchatelet added inline comments.Feb 12 2020, 12:59 AM
llvm/unittests/tools/llvm-exegesis/SnippetGeneratorTest.cpp
179

It's not really empty then. Maybe rename it to Singleton.

lebedev.ri marked 3 inline comments as done.

Addressed review notes - renamed one test to Singleton.

lebedev.ri added inline comments.Feb 12 2020, 1:09 AM
llvm/tools/llvm-exegesis/lib/X86/Target.cpp
788

Hm, i reran clang-format on this, and this time it produced a different result. Is this better?

llvm/unittests/tools/llvm-exegesis/SnippetGeneratorTest.cpp
179

By empty i meant that the "exploration" space is empty - there is only a single possibility.

187

But it's not empty; it contains a single element, 0?

This revision is now accepted and ready to land.Feb 12 2020, 1:27 AM

LGTM

Okay, thank you for the review.
It shouldn't be impossible to rip this out once more generic handling is in place.

This revision was automatically updated to reflect the committed changes.
  • repetition mode (to see the impact of the decoder)

This appears to currently be handled via the -repetition-mode switch (D68125).

yes

Does this really have to be accounted for (as a dimension) in the greedy approach?

@courbet and I believe that two measures are better than one, and that they can demonstrate the impact of decoding the instruction.

Oh, now i see what you mean. At least for CMOV, i'm seeing wildly different results:

           Latency  RThroughput
duplicate  1        0.8
loop       2        0.6

where latency=1 seems correct, and i'd expect the throughput to be close to 1/2 (since there are two execution units).

So i would personally guess that --repetition-mode= shouldn't even be another measurement;
instead, much like what is already done by running the whole snippet a few times
and denoising the result (not averaging!), the same should be done here - run both modes and take the minimum:
https://github.com/llvm/llvm-project/blob/c55cf4afa9161bb4413b7ca9933d553327f5f069/llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp#L38-L39

What are your thoughts on this?

...

Or, should the default be loop for RThroughput measurements?
I didn't check how much difference there is for other instructions.
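
For illustration, a minimal sketch of the "run both, take the minimum" aggregation being proposed here. The helper is hypothetical, though the linked LatencyBenchmarkRunner computes a minimum over repeated measurements in a similar spirit:

#include <algorithm>
#include <cassert>
#include <vector>

// Benchmark the snippet under both repetition modes and keep the
// per-measurement minimum, on the theory that noise and decoder effects
// only ever inflate the numbers.
std::vector<double> denoiseByMin(const std::vector<double> &DuplicateMode,
                                 const std::vector<double> &LoopMode) {
  assert(DuplicateMode.size() == LoopMode.size());
  std::vector<double> Result(DuplicateMode.size());
  for (size_t I = 0; I != Result.size(); ++I)
    Result[I] = std::min(DuplicateMode[I], LoopMode[I]);
  return Result;
}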

What are your thoughts on this?

up

So i would personally guess that --repetition-mode= shouldn't even be another measurement; instead, much like what is already done by running the whole snippet a few times and denoising the result (not averaging!), the same should be done here - run both modes and take the minimum.

What are your thoughts on this?

Well, that depends on how you see llvm-exegesis. It was originally designed to understand the behavior of the CPU.
To me, --repetition-mode is interesting for exploring sensitivity to instruction decoding.
If you mix everything and take the minimum, you lose information about the root cause: you deduce that the minimum latency is X, but that only holds if the instruction is already decoded and run in a tight loop.
I think it's important to keep this information separate. You can always take the minimum between the two during analysis, though.

lebedev.ri added a comment.EditedFeb 26 2020, 4:35 AM

Thank you for replying!

Well, that depends on how you see llvm-exegesis. It was originally designed to understand the behavior of the CPU.
To me, --repetition-mode is interesting for exploring sensitivity to instruction decoding.
If you mix everything and take the minimum, you lose information about the root cause: you deduce that the minimum latency is X, but that only holds if the instruction is already decoded and run in a tight loop.
I think it's important to keep this information separate.

I do agree in principle.

You can always take the minimum between the two during analysis, though.

I think this has the same-ish basic issue as merging results from different --mode=s - what if there is more than one result?
Currently, we form PerInstructionStats, which accumulates min/max/averages; that is nice when results end up being noisy.
How would we aggregate (by taking the min) results from different --repetition-mode=s without sacrificing that?
Pick all the --repetition-mode=duplicate results and all the --repetition-mode=loop results and zipper-merge each pair?

Thank you for replying!

Sure, my apologies for the lag. I'm extra busy right now.

I think this has the same-ish basic issue as merging results from different --mode=s - what if there is more than one result?
Currently, we form PerInstructionStats, which accumulates min/max/averages; that is nice when results end up being noisy.
How would we aggregate (by taking the min) results from different --repetition-mode=s without sacrificing that?
Pick all the --repetition-mode=duplicate results and all the --repetition-mode=loop results and zipper-merge each pair?

Right now you can simply run the benchmark twice (once with the loop and once with the duplicate repetition mode), concatenate the two outputs, and run them through the analysis.
We currently don't consider the flag to be part of the instruction, so any discrepancy between the two would be reflected in the PerInstructionStats, increasing the confidence interval.
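
For illustration, a sketch of the kind of per-instruction aggregation described here. The real PerInstructionStats lives in llvm-exegesis' analysis code and differs in detail; this illustrative version just shows how mixing duplicate- and loop-mode results with a discrepancy widens the interval:

#include <algorithm>
#include <limits>

struct MinMaxAvg {
  double Min = std::numeric_limits<double>::infinity();
  double Max = -std::numeric_limits<double>::infinity();
  double Sum = 0.0;
  unsigned Count = 0;

  // Each new measurement widens the min/max range and updates the mean.
  void push(double Value) {
    Min = std::min(Min, Value);
    Max = std::max(Max, Value);
    Sum += Value;
    ++Count;
  }
  double avg() const { return Count ? Sum / Count : 0.0; }
};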

Right now you can simply run the benchmark twice (once with the loop and once with the duplicate repetition mode), concatenate the two outputs, and run them through the analysis.
We currently don't consider the flag to be part of the instruction, so any discrepancy between the two would be reflected in the PerInstructionStats, increasing the confidence interval.

Right, indeed, which is why i'm asking this question in the first place :)
Because, at least for me, that is the opposite of the behavior i'd expect/want in
analysis mode - in that case, one of the repetition modes produces obviously-wrong(C) results,
so i'm asking whether it would be reasonable to fix that up during analysis via such a zipper merge.

Right, indeed, which is why i'm asking this question in the first place :)
Because, at least for me, that is the opposite of the behavior i'd expect/want in analysis mode - in that case, one of the repetition modes produces obviously-wrong(C) results, so i'm asking whether it would be reasonable to fix that up during analysis via such a zipper merge.

I fail to see what the expected behavior is from your point of view. What should the tool do, in your view?

I fail to see what the expected behavior is from your point of view. What should the tool do, in your view?

I think we may be going in circles here.
Please refer to my original comment https://reviews.llvm.org/D74156#1873657, where i say that

much like what is already done by running the whole snippet a few times
and denoising the result (not averaging!), the same should be done here - run both (edit: repetition modes) and take the minimum

To that you reply in https://reviews.llvm.org/D74156#1893104

I think it's important to keep this information separate. You can always take the minimum between the two during analysis, though.

To which i agree and reply in https://reviews.llvm.org/D74156#1893156 with

How would we aggregate (by taking the min) results from different --repetition-mode=s without sacrificing that?
Pick all the --repetition-mode=duplicate results and all the --repetition-mode=loop results and zipper-merge each pair?

and then https://reviews.llvm.org/D74156#1893192 with

Because, at least for me, that (edit: the current behavior of always averaging) is the opposite of the behavior i'd expect/want in
analysis mode - in that case, one of the repetition modes produces obviously-wrong(C) results,
so i'm asking whether it would be reasonable to fix that up during analysis via such a zipper merge.

And, unless i'm missing the point, you then do a 180° turn on your original point from https://reviews.llvm.org/D74156#1893104, and say that no, the analysis should do no such minimum:

I fail to see what the expected behavior is from your point of view. What should the tool do, in your view?

... did that explanation of the question i'm asking make any sense?

... did that explanation of the question i'm asking make any sense?

Thx for digging into the conversation!
Ok, it makes more sense now.

I discussed it a bit with @courbet:

  • We want the analysis tool to stay simple, so we'd rather not make it aware of the repetition mode.
  • We'd like to still be able to select either repetition mode, to dig into special cases.

So we could add a third, min, repetition mode that would run both and take the minimum. It could be the default option.
Would you have some time to look at what it would take to add this third mode?
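
For illustration, a hypothetical sketch of what such a min mode could look like - the enum, helpers, and plumbing here are made up, and the actual follow-up landed as D76921 (see below):

#include <algorithm>

// Placeholder stand-ins for the real duplicate/loop snippet runners.
double measureWithDuplicatedSnippet() { return 1.0; }
double measureWithLoopedSnippet() { return 2.0; }

enum class RepetitionModeE { Duplicate, Loop, Min };

double measure(RepetitionModeE Mode) {
  switch (Mode) {
  case RepetitionModeE::Duplicate:
    return measureWithDuplicatedSnippet();
  case RepetitionModeE::Loop:
    return measureWithLoopedSnippet();
  case RepetitionModeE::Min:
    // Run both underlying modes and report the smaller value, so the
    // analysis side never has to know about repetition modes.
    return std::min(measureWithDuplicatedSnippet(),
                    measureWithLoopedSnippet());
  }
  return 0.0; // Unreachable with a well-formed Mode.
}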

... did that explanation of the question i'm asking make any sense?

Thx for digging into the conversation!
Ok, it makes more sense now.

Awesome! :)

I discussed it a bit with @courbet:

  • We want the analysis tool to stay simple, so we'd rather not make it aware of the repetition mode.
  • We'd like to still be able to select either repetition mode, to dig into special cases.

So we could add a third, min, repetition mode that would run both and take the minimum. It could be the default option.
Would you have some time to look at what it would take to add this third mode?

If that is the preferred direction, then i will try to take a look :)
Thank you for your patience.

Would you have some time to look at what it would take to add this third mode?

Posted D76921, thanks.