This is an archive of the discontinued LLVM Phabricator instance.

[LICM][BPF] Disable hoistMinMax() optimization for BPF target
Needs RevisionPublic

Authored by yonghong-song on Mar 28 2023, 1:37 PM.

Details

Summary

Recently, we hit a bpf verification failure due to llvm17 patch D143726 which
tries to simplify (X < A && X < B) into (X < MIN(A, B)) if MIN(A, B) is
loop-invariant.

The current bpf verifier is not able to handle the newly-generated code pattern.
Alexei pushed a workaround at
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit?id=3c2611bac08a834697be918ac357eaff2e47d5b3
to temporarily fix the issue.

Although we could improve the verifier for this case, if users use llvm17 with
old kernels the problem still exists. So ideally the issue should be fixed
in llvm.

Add a flag in LICM to control the MinMaxHoisting optimization. The default
is true, and the BPF backend turns it off. This resolves the issue.

Diff Detail

Event Timeline

yonghong-song created this revision.Mar 28 2023, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 1:37 PM
yonghong-song requested review of this revision.Mar 28 2023, 1:37 PM
anakryiko added inline comments.Mar 28 2023, 3:50 PM
llvm/lib/Transforms/Scalar/LICM.cpp
998

instead of hard-coding "isBPFTarget" checks, would it be possible to instead have a target-specific "isOptimizationEnabled()" callback, which different architectures could override? For such optional optimizations we could have an enum specifying what kind of transformation/optimization it is, and then the BPF target could decide whether the optimization should or should not be done.

This would keep the optimization code target-agnostic: we'd just have optional optimizations and the corresponding generic checks before trying to perform them.

mkazantsev requested changes to this revision.EditedMar 28 2023, 9:37 PM

If you have a target that gets confused by a correct transform, please add a cl::opt flag controlling it (true by default) and set it to false for your target.
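A rough sketch of what such a flag could look like (the name and wording here are illustrative guesses, not the actual declaration from the updated patch):

```cpp
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Hypothetical option; true by default, a target could force it off.
static cl::opt<bool> EnableMinMaxHoisting(
    "licm-min-max-hoisting", cl::init(true), cl::Hidden,
    cl::desc("Enable hoisting of loop-invariant min/max expressions in LICM"));
```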

This revision now requires changes to proceed.Mar 28 2023, 9:37 PM
nikic added a comment.EditedMar 29 2023, 12:06 AM

Would it be fair to say that from the perspective of the BPF verifier, x < y && x < z is generally preferred over x < min(y, z), because the verifier does not know that x < min(y, z) implies x < y etc? If so, I think it would make the most sense to add an undo transform in the BPF backend.

Would it be fair to say that from the perspective of the BPF verifier, x < y && x < z is generally preferred over x < min(y, z), because the verifier does not know that x < min(y, z) implies x < y etc?

Yes. The BPF verifier isn't able to track complicated relationships between registers, so such transformations can break otherwise correct code.

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

fhahn added a subscriber: fhahn.Mar 29 2023, 10:00 AM

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

One goal of LICM is to target-independently canonicalise the IR to enable other IR optimizations, so IMO it would be preferable to avoid disabling parts via target hooks. Also, the currently proposed hook is not really scalable, e.g. what if another target wants to opt out as well?

Note, this is not the only transformation that hurts BPF verification, @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and very convoluted solution. So we are asking if we can have a simpler and more reliable opt-out for some small set of transformations.

One goal of LICM is to target-independently canonicalise the IR to enable other IR optimizations, so IMO it would be preferable to avoid disabling parts via target hooks. Also, the currently proposed hook is not really scalable, e.g. what if another target wants to opt out as well?

I agree that we shouldn't do an "isBPFTarget()" check like the one proposed in the current version. See my comment below, where I proposed having a more generic "isTransformEnabled(enum transform_id)" method, which could be overridden by different backends. Then BPF would have a simple override in BPFTTIImpl:

bool isTransformEnabled(transform_id id) {
  switch (id) {
  case transform_id::HOIST_MIN_MAX:
    return false;
  default:
    return true;
  }
}

while most other backends would just keep the default "return true;" implementation.

Note that we don't have to enumerate all possible transforms in enum transform_id, only those that are optional.

Does it make sense?

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

Just blocking a transform is generally insufficient, because it does not handle the case where the code uses the problematic pattern in the first place. If the input code already contained the x < min(y, z) pattern, it would fail the verifier, and converting it into x < y && x < z would be necessary to make it pass.

If certain canonicalization transforms are undesirable for a specific target, this is always addressed by adding backend undo transforms, not by disabling canonicalizations. Use of TTI hooks inside canonicalization passes is generally rejected as a matter of policy -- you need a very good case for that (e.g. an undo transform being impossible for some reason).

anakryiko added a comment.EditedMar 29 2023, 12:07 PM

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

Just blocking a transform is generally insufficient, because it does not handle the case where the code uses the problematic pattern in the first place. If the input code already contained the x < min(y, z) pattern, it would fail the verifier, and converting it into x < y && x < z would be necessary to make it pass.

This is normally not a problem, because people writing BPF code are aware of such limitations and write their input C code in a way that passes BPF verification (e.g., no one would write x < min(y,z); it is always written as x < y && x < z, where either y or z is a constant, which makes the code verifiable by the BPF verifier in the kernel).

So the issue here is that Clang transforms the original code into something that's not verifiable, with the user (programmer) having no control. And that's what we are trying to address.

We have or had similar issues with a few other transformations. E.g., if I remember correctly, typical (and, if not transformed, correctly verifiable) code:

if (x >= y && x < z) { /* use x */ }

would be transformed to

x' = x - y;
if (x' < (z - y)) { /* use **original** x */ }

which breaks user code. Users have to do all sorts of tricks just to prevent this transformation.

Another one that constantly causes problems is the compiler transforming this:

struct some_type *t = ...;
if (t != NULL)
  x = t->some_field

to reordered low-level assembly that does offset adjustment before the NULL check, again breaking BPF verification for no good reason (as far as users are concerned). Roughly in the following manner (pseudo C code, of course, just to demonstrate):

struct some_type *t = ...;
int *field_ptr = &t->some_field; /* this is too early, verifier rejects this */
if (t != NULL)
    x = *field_ptr

If certain canonicalization transforms are undesirable for a specific target, this is always addressed by adding backend undo transforms, not by disabling canonicalizations. Use of TTI hooks inside canonicalization passes is generally rejected as a matter of policy -- you need a very good case for that (e.g. an undo transform being impossible for some reason).

As I said, I'm not a compiler expert and I don't know LLVM/Clang internals, but this pattern matching + undoing of the transform seems very error-prone and likely to always leave some corner cases uncovered. So while I understand the overall desire to minimize the list of optional transforms, it still seems much cleaner, and would result in much simpler overall LLVM/Clang code, to just prevent the transform rather than do and then undo it.

This is a very significant and recurring problem. People keep complaining and reporting very confusing (to them) issues, only due to some Clang transformations that just hurt the BPF target without adding any benefit.

It's commonly described as "fighting with the verifier", even though it's really mostly "fighting with the compiler". I know it's a bit hard to empathize without having experienced this over and over personally or while helping a multitude of users, but this is a really big pain point. Any help from the LLVM community in making this go away (or at least reducing it) would be greatly appreciated by the entire BPF community. Thanks!

I will try @mkazantsev's approach of adding a flag to control the optimization, which the target can override. I think there is some precedent for this approach. Let me do a little more research and resubmit the patch.

yonghong-song edited the summary of this revision. (Show Details)
  • Add a flag to control MinMaxHoisting optimization. The default is true, and BPF backend turns off this optimization.
xbolva00 added inline comments.
llvm/lib/Target/BPF/BPFTargetTransformInfo.h
40

Extra comment why this optimization is disabled?

yonghong-song added inline comments.Mar 30 2023, 12:58 PM
llvm/lib/Target/BPF/BPFTargetTransformInfo.h
40

Sure. Will do.

yonghong-song edited the summary of this revision. (Show Details)Mar 30 2023, 2:07 PM
yonghong-song edited the summary of this revision. (Show Details)Mar 30 2023, 2:15 PM
  • Add a comment to explain why bpf backend wants to disable MinMaxHoisting optimization.
mkazantsev added a comment.EditedApr 3 2023, 4:09 AM

I'm fine with adding the option. However I think this patch is not solving the problem you have.

LICM is not the only pass that may potentially do it. For example, InstCombine may end up doing something similar. As people above have said, this is not a scalable approach. In general, any construct that is legal from LLVM's perspective may appear in your code. You cannot rely on the fact that everybody knows what is or is not legal for this particular backend; assume that they will do anything that complies with the LLVM LangRef, and if a backend wants to limit LLVM, it is its job to make sure that all its requirements are met.

With this transform you are lucky: it's isolated and there is a clear place to add a check. In general, that might not be the case.

I think you still need a patch that will undo the transform for this backend.

llvm/lib/Transforms/Scalar/LICM.cpp
998

Might as well do it inside hoistMinMax:

if (!EnableMinMaxHoisting)
  return false;

It's more reliable in case someone else wants to call it from another place.

llvm/test/CodeGen/BPF/licm-minmax-hoisted.ll
2

Can you instead add a dedicated test for LICM? For that, pass -print-before=licm -print-module-scope, and dump the IR before LICM. You can also use the bugpoint or llvm-reduce tools to make it smaller. It's not clear where exactly the problem in your test is supposed to happen.

caused the verifier to stop recognizing such loop as bounded.

So it should be better to teach the verifier / loop analysis / scev / xyz about this pattern (a user can still write it directly).

fhahn added a comment.Apr 3 2023, 4:31 AM

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

nikic added a comment.Apr 3 2023, 4:42 AM

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

Agreed. In some ways this is even worse than the TTI hook because now you're modifying a cl::opt option from TTI, which is not how cl::opt is supposed to be used. This means that for example if you use BPF and some other target in the same process, this option is going to leak across them, not to mention that the modification is probably not thread-safe.

The only proper way to address this is via an undo transform.

ast added a comment.Apr 3 2023, 3:32 PM

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

Agreed. In some ways this is even worse than the TTI hook because now you're modifying a cl::opt option from TTI, which is not how cl::opt is supposed to be used. This means that for example if you use BPF and some other target in the same process, this option is going to leak across them, not to mention that the modification is probably not thread-safe.

The only proper way to address this is via an undo transform.

It's a valid LLVM IR, but not a profitable optimization.
The backend has all the rights to tell optimizer that certain transformations are unprofitable and should not be applied.
Consider BPF CPU as a smart CPU. Unlike traditional CPUs that do what compilers tell them, BPF CPU understands the code it is going to execute.
In this particular case we're dealing with:
for (i; i < var && i < const_val; i++)

BPF CPU is smart enough to understand that 'i' will not be bigger than const_val. It can check that array[i] is within bounds and so on.
Imagine a smart x86 CPU that could infer similar information from analyzing the asm code and do HW prefetch of array[0 ... const_val].
The problem with the transformation:
min_var = min(var, const_val);
for (i; i < min_var; i++)
is that the CPU no longer considers 'i' to be bounded.
It causes the BPF CPU to lose information about 'i' and would hurt performance of the hypothetical smart x86 CPU.

Backends absolutely need knobs to control profitability of transformations. Undoing optimization in the backend is not practical. In many cases it's not possible to undo.

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

Agreed. In some ways this is even worse than the TTI hook because now you're modifying a cl::opt option from TTI, which is not how cl::opt is supposed to be used. This means that for example if you use BPF and some other target in the same process, this option is going to leak across them, not to mention that the modification is probably not thread-safe.

The only proper way to address this is via an undo transform.

[...]

Backends absolutely need knobs to control profitability of transformations. Undoing optimization in the backend is not practical. In many cases it's not possible to undo.

I think the idea here is that not all transformations are generically good for all backends. There is a cost model for instructions; could there be something similar for transformations as well? Then BPF could mark the cost here somewhat high, noting it doesn't actually provide value?

  • separately checking EnableMinMaxHoisting and update the test with -print-before=licm and -print-after=licm to clearly show the expected ir

A cost model could be an option. I just updated the version based on the previous 'flag'-based approach. Also thanks @ast for the clarification: for some transformations, due to the BPF verifier's uniqueness (it verifies code before executing it), the transformation might be unprofitable for bpf, so it would be great if we had a simple way to declare a particular transformation not profitable.

mkazantsev added inline comments.Apr 5 2023, 3:51 AM
llvm/lib/Transforms/Scalar/LICM.cpp
998

I meant doing this inside hoistMinMax, so that it works for all its callers.

mkazantsev added a comment.EditedApr 5 2023, 3:56 AM

for (i; i < var && i < const_val; i++)

BPF CPU is smart enough to understand that 'i' will not be bigger than const_val. It can check that array[i] is within bounds and so on.

I don't see how this transform is a problem for in-bounds checks. Do you have a particular example where we can remove a bound check with the old form but cannot do so with the new form? IndVars should handle both, and if it doesn't, it's a bug we should fix. I also don't see how this kind of issue is related to any particular backend. Does it eliminate range checks in codegen rather than relying on high-level opts? Why?

Here is example showing that at least IndVars is well-aware about smin semantics and can remove bound checks for them: https://godbolt.org/z/7cjzfKvf9

It's a valid LLVM IR, but not a profitable optimization.

I disagree. The transform in question hoists computations out of the loop, so that we only do one comparison instead of two on every iteration. If you call it not profitable, please give a real proof. I don't understand your speculations about a hypothetical x86 CPU that removes bound checks, but only in one form.

mkazantsev requested changes to this revision.EditedApr 5 2023, 4:23 AM

Reading through the discussion, I now think that we should not go with the option. The reason why performance degraded is not well understood.

If the claim is "this is not a profitable transform in general", I want to see a proof. The transform removes a number of checks on every loop iteration, so it is surprising to hear that.

If the claim is "it is inhibiting some other optimizations (for example, bound check elimination)" then I'd like to see an exact example. I made some experiments with it and am now pretty sure that in many cases IndVars, IRCE and other passes responsible for range check elimination do know what smin is and can work with it as well as with two separate checks. Maybe there is a case where they don't support it, and in that case I'd like to see the test and fix it. It would be a good general improvement too.

Finally, if the claim is "this backend works badly with it for some low-level/hardware reasons" - OK, then the right way is to undo the transform. The optimizer is not committed to knowing about such backend issues. I also have doubts that it's the case.

So please, if you want to move ahead, find the reason for your performance degradation and describe it here. Maybe we have just exposed a missing case for IndVarSimplify or something, and it's an opportunity to fix it.

This revision now requires changes to proceed.Apr 5 2023, 4:23 AM
ast added a comment.Apr 5 2023, 9:39 AM

Reading through the discussion, I now think that we should not go with the option. The reason why performance degraded is not well understood.

If the claim is "this is not a profitable transform in general", I want to see a proof. The transform removes a number of checks on every loop iteration, so it is surprising to hear that.

There was no such claim. It's unprofitable for BPF.

It would be a good general improvement too.

'general improvement' is true only for a subset of CPUs. It's not an improvement for BPF "CPU".

The optimizer is not committed to knowing about such backend issues.

This is a false statement. The optimizer has to work hand in hand with the backend.
Several backends have loop insns, and those backends go out of their way to pattern-match loops in the backend.
When the optimizer does something "smart", claiming it's valid llvm IR, it may screw up the analysis, and valid C programs would get rejected because of the optimizer.
This case is not loop related, but similar in the sense that the optimizer hurts a backend.

So please, if you want to move ahead, find the reason for your performance degradation and describe it here.

Performance degradation on a hypothetical x86 cpu was an illustration of the problem. The real problem is the unprofitable transformation.
Undoing it in the backend is fragile. It may or may not work, and users writing programs in C will have no idea why their valid C code broke.
Therefore the backend has to stop the optimizer from performing unprofitable transformations.
This is not the only unprofitable transformation. See Andrii's examples of other transformations that we need to disable specifically for the BPF backend.

xbolva00 added a comment.EditedApr 5 2023, 9:58 AM

Undoing it in the backend is fragile.

Can you explain why? Or do it in CodeGenPrepare? Why is it fragile?

As a bonus you will handle the situation when there is < min(...) directly in your code.

Backends must follow some rules given by the design of LLVM. LLVM should not be full of ugly (but quickly done) hacks to make backends happy, and so far we have seen no proof that an undo transform is not possible.

fhahn added a comment.Apr 5 2023, 10:05 AM

I think the idea here is that not all transformations are generically good for all backends. There is a cost model for instructions; could there be something similar for transformations as well? Then BPF could mark the cost here somewhat high, noting it doesn't actually provide value?

Agreed that some transformations are driven by backend cost decisions. But there are also canonicalization transforms where one of the main goals is transforming the IR into a form that's easier for other passes & analyses to handle. LICM at least partly performs general canonicalization, which is completely target-independent. Another example is that LICM completely ignores register pressure (i.e. target properties) when hoisting code (IIUC), relying on the backend to sink it back if needed.

ast added a comment.Apr 5 2023, 10:08 AM

Undoing it in the backend is fragile.

Can you explain why? Or do it in CodeGenPrepare? Why is it fragile?

As a bonus you will handle the situation when there is < min(...) directly in your code.

It's not a bonus. People don't write such code now because the verifier doesn't recognize it.

Backends must follow some rules given by the design of LLVM. LLVM should not be full of ugly (but quickly done) hacks to make backends happy, and so far we have seen no proof that an undo transform is not possible.

Ugly is relative. You're arguing that a single-line boolean check makes the optimizer ugly while suggesting adding hundreds of lines of fragile pattern-matching code to the backend. Makes zero sense.

btw, hexagon might be another example where the hoistMinMax transformation may hurt performance.
See HexagonHardwareLoops.cpp and computeCount().
The backend has to know the constant end value of the loop to emit a HW loop.
The hoistMinMax "optimization" hurts this analysis, since the backend is no longer able to determine the upper bound.
It's similar for BPF, but in our case it's the kernel verifier that has to know the upper bound.
My limited understanding of the hexagon arch is that the inability to use a hw loop is not the end of the world, just slower performance.
Whereas for the BPF backend it's full breakage: because of the "optimization", valid C programs get rejected by the kernel.

nikic requested changes to this revision.Apr 5 2023, 12:38 PM

I'm rejecting this request for a TTI policy exception for the BPF backend in LICM. I've reviewed the discussion, but have found no grounds to support such an exception.

The discussion seems to be based around an assumption that backends should have a right to opt-out of certain transforms -- however, this assumption is simply incorrect for target-independent canonicalization passes like LICM. An exception would require some special circumstance, e.g. the inability to perform a backend undo transform because the transform is lossy, becomes lossy in conjunction with other transforms, or is not reversible without undue effort for other reasons. However, no evidence that this is the case here has been presented.

In fact, the assumptions underlying this discussion make me especially unwilling to grant an exception, because it seems clear that such an exception would be taken as precedent to make similar changes in other places. While I don't care particularly strongly about maintaining canonicality for this particular (relatively exotic) transform, extending this to certain other areas would be much more problematic. For example, the discussion mentions the InstCombine range check canonicalization as another motivating example, and that one is an important part of canonical IR form. As such, I would rather nip this entire direction in the bud.

@nikic @mkazantsev I created another diff https://reviews.llvm.org/D147968 which uses a different approach (through TTI) and has more description of why we want to use TTI hooks. Could you take a look there as well? Thanks!