This is an archive of the discontinued LLVM Phabricator instance.

[LICM][BPF] Disable hoistMinMax() optimization for BPF target
Needs RevisionPublic

Authored by yonghong-song on Mar 28 2023, 1:37 PM.

Details

Summary

Recently, we hit a bpf verification failure due to llvm17 patch D143726 which
tries to simplify (X < A && X < B) into (X < MIN(A, B)) if MIN(A, B) is
loop-invariant.

The current bpf verifier is not able to handle the newly-generated code pattern.
Alexei pushed a workaround at
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit?id=3c2611bac08a834697be918ac357eaff2e47d5b3
to temporarily fix the issue.

Although we could improve the verifier for this case, if users use llvm17 with
old kernels the problem still exists. So ideally the issue should be fixed
in llvm.

Add a flag in LICM to control the MinMaxHoisting optimization. The default
is true, and the BPF backend turns it off. This resolves the issue.

Diff Detail

Event Timeline

yonghong-song created this revision.Mar 28 2023, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 1:37 PM
yonghong-song requested review of this revision.Mar 28 2023, 1:37 PM
anakryiko added inline comments.Mar 28 2023, 3:50 PM
llvm/lib/Transforms/Scalar/LICM.cpp
998

instead of hard-coding "isBPFTarget" checks, would it be possible to instead have a target-specific "isOptimizationEnabled()" callback, which different architectures could override? For such optional optimizations we could have an enum specifying what kind of transformation/optimization it is, and then the BPF target could decide whether the optimization should or should not be done.

This would keep the optimization code target-agnostic: we'd just have optional optimizations and the corresponding generic checks before trying to perform them.

mkazantsev requested changes to this revision.EditedMar 28 2023, 9:37 PM

If you have a target that gets confused by a correct transform, please add a cl::opt flag controlling it (true by default) and set it to false for your target.
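A rough sketch of what such a flag could look like (the name and wording here are illustrative guesses, not the actual declaration from the updated patch):

```cpp
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Hypothetical option; true by default, a target could force it off.
static cl::opt<bool> EnableMinMaxHoisting(
    "licm-min-max-hoisting", cl::init(true), cl::Hidden,
    cl::desc("Enable hoisting of loop-invariant min/max expressions in LICM"));
```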

This revision now requires changes to proceed.Mar 28 2023, 9:37 PM
nikic added a comment.EditedMar 29 2023, 12:06 AM

Would it be fair to say that from the perspective of the BPF verifier, x < y && x < z is generally preferred over x < min(y, z), because the verifier does not know that x < min(y, z) implies x < y etc? If so, I think it would make the most sense to add an undo transform in the BPF backend.

Would it be fair to say that from the perspective of the BPF verifier, x < y && x < z is generally preferred over x < min(y, z), because the verifier does not know that x < min(y, z) implies x < y etc?

Yes. The BPF verifier isn't able to track complicated relationships between registers, so such transformations can break otherwise correct code.

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

fhahn added a subscriber: fhahn.Mar 29 2023, 10:00 AM

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

One goal of LICM is to target-independently canonicalise the IR to enable other IR optimizations, so IMO it would be preferable to avoid disabling parts via target hooks. Also, the currently proposed hook is not really scalable, e.g. what if another target wants to opt out as well?

Note, this is not the only transformation that hurts BPF verification, @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and very convoluted solution. So we are asking if we can have a simpler and more reliable opt-out for some small set of transformations.

One goal of LICM is to target-independently canonicalise the IR to enable other IR optimizations, so IMO it would be preferable to avoid disabling parts via target hooks. Also, the currently proposed hook is not really scalable, e.g. what if another target wants to opt out as well?

I agree that we shouldn't do an "isBPFTarget()" check like the one proposed in the current version. See my comment below, where I proposed having a more generic "isTransformEnabled(enum transform_id)" method, which could be overridden by different backends. Then BPF would have a simple override in BPFTTIImpl:

bool isTransformEnabled(transform_id id) {
  switch (id) {
  case transform_id::HOIST_MIN_MAX:
    return false;
  default:
    return true;
  }
}

while most other backends would just keep the default "return true;" implementation.

Note that we don't have to enumerate all possible transforms in enum transform_id, only those that are optional.

Does it make sense?

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

Just blocking a transform is generally insufficient, because it does not handle the case where the code uses the problematic pattern in the first place. If the input code already contained the x < min(y, z) pattern, it would fail the verifier, and converting it into x < y && x < z would be necessary to make it pass.

If certain canonicalization transforms are undesirable for a specific target, this is always addressed by adding backend undo transforms, not by disabling canonicalizations. Use of TTI hooks inside canonicalization passes is generally rejected as a matter of policy -- you need a very good case for that (e.g. an undo transform being impossible for some reason).

anakryiko added a comment.EditedMar 29 2023, 12:07 PM

If so, I think it would make the most sense to add an undo transform in the BPF backend.

Genuine curiosity (and keep in mind I'm no compiler expert at all), but why would doing a transformation and then custom-undoing it in the BPF backend specifically be preferable to allowing backends to turn off some known-to-hurt transformations in the first place? It seems more complicated and error-prone to do the dance of undoing, rather than have a generic interface to prevent the transformation in the first place.

Note, this is not the only transformation that hurts BPF verification. @yonghong-song previously added "undo transformation" logic, but it seems like a not very sustainable and rather convoluted solution. So we are asking whether we can have a simpler and more reliable opt-out for a small set of transformations.

Just blocking a transform is generally insufficient, because it does not handle the case where the code uses the problematic pattern in the first place. If the input code already contained the x < min(y, z) pattern, it would fail the verifier, and converting it into x < y && x < z would be necessary to make it pass.

This is normally not a problem, because people writing BPF code are aware of such limitations and write their input C code in a way that passes BPF verification (e.g., no one would write x < min(y,z); it is always written as x < y && x < z, where either y or z is a constant, which makes the code verifiable by the BPF verifier in the kernel).

So the issue here is that Clang transforms the original code into something that's not verifiable, with the user (programmer) having no control. And that's what we are trying to address.

We have or had similar issues with a few other transformations. E.g., if I remember correctly, typical (and, if not transformed, correctly verifiable) code:

if (x >= y && x < z) { /* use x */ }

would be transformed to

x' = x - y;
if (x' < (z - y)) { /* use **original** x */ }

which breaks user code. Users have to do all sorts of tricks just to prevent this transformation.

Another one that constantly causes problems is the compiler transforming this:

struct some_type *t = ...;
if (t != NULL)
  x = t->some_field

to reordered low-level assembly that does offset adjustment before the NULL check, again breaking BPF verification for no good reason (as far as users are concerned). Roughly in the following manner (pseudo C code, of course, just to demonstrate):

struct some_type *t = ...;
int *field_ptr = &t->some_field; /* this is too early, verifier rejects this */
if (t != NULL)
    x = *field_ptr

If certain canonicalization transforms are undesirable for a specific target, this is always addressed by adding backend undo transforms, not by disabling canonicalizations. Use of TTI hooks inside canonicalization passes is generally rejected as a matter of policy -- you need a very good case for that (e.g. an undo transform being impossible for some reason).

As I said, I'm not a compiler expert and I don't know LLVM/Clang internals, but this pattern matching + undoing of the transform seems very error-prone and likely to always leave some corner cases uncovered. So while I understand the overall desire to minimize the list of optional transforms, it still seems much cleaner, and would result in much simpler overall LLVM/Clang code, to just prevent the transform rather than do and then undo it.

This is a very significant and recurring problem. People keep complaining and reporting very confusing (to them) issues, only due to some Clang transformations that just hurt the BPF target without adding any benefit.

It's commonly described as "fighting with the verifier", even though it's really mostly "fighting with the compiler". I know it's a bit hard to empathize without having experienced this over and over personally or while helping a multitude of users, but this is a really big pain point. Any help from the LLVM community in making this go away (or at least reducing it) would be greatly appreciated by the entire BPF community. Thanks!

I will try @mkazantsev's approach of adding a flag to control the optimization, which the target can override. I think there is some precedent for this approach. Let me do a little more research and resubmit the patch.

yonghong-song edited the summary of this revision. (Show Details)
  • Add a flag to control MinMaxHoisting optimization. The default is true, and BPF backend turns off this optimization.
xbolva00 added inline comments.
llvm/lib/Target/BPF/BPFTargetTransformInfo.h
40

Extra comment why this optimization is disabled?

yonghong-song added inline comments.Mar 30 2023, 12:58 PM
llvm/lib/Target/BPF/BPFTargetTransformInfo.h
40

Sure. Will do.

yonghong-song edited the summary of this revision. (Show Details)Mar 30 2023, 2:07 PM
yonghong-song edited the summary of this revision. (Show Details)Mar 30 2023, 2:15 PM
  • Add a comment to explain why bpf backend wants to disable MinMaxHoisting optimization.
mkazantsev added a comment.EditedApr 3 2023, 4:09 AM

I'm fine with adding the option. However I think this patch is not solving the problem you have.

LICM is not the only pass that may potentially do it. For example, InstCombine may end up doing something similar. As people above have said, this is not a scalable approach. In general, any construct that is legal from LLVM's perspective may appear in your code. You cannot rely on the fact that everybody knows what is or is not legal for this particular backend; assume that they will do anything that complies with the LLVM LangRef, and if a backend wants to limit LLVM, it is its job to make sure that all its requirements are met.

With this transform you are lucky: it's isolated and there is a clear place to add a check. In general, that might not be the case.

I think you still need a patch that will undo the transform for this backend.

llvm/lib/Transforms/Scalar/LICM.cpp
998

Might as well do it inside hoistMinMax:

if (!EnableMinMaxHoisting)
  return false;

It's more reliable in case someone else wants to call it from another place.

llvm/test/CodeGen/BPF/licm-minmax-hoisted.ll
2

Can you instead add a dedicated test for LICM? For that, pass -print-before=licm -print-module-scope, and dump the IR before LICM. You can also use the bugpoint or llvm-reduce tools to make it smaller. It's not clear where exactly the problem in your test is supposed to happen.

caused the verifier to stop recognizing such loop as bounded.

So it should be better to teach the verifier / loop analysis / scev / xyz about this pattern (a user can still write it directly).

fhahn added a comment.Apr 3 2023, 4:31 AM

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

nikic added a comment.Apr 3 2023, 4:42 AM

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

Agreed. In some ways this is even worse than the TTI hook because now you're modifying a cl::opt option from TTI, which is not how cl::opt is supposed to be used. This means that for example if you use BPF and some other target in the same process, this option is going to leak across them, not to mention that the modification is probably not thread-safe.

The only proper way to address this is via an undo transform.

ast added a comment.Apr 3 2023, 3:32 PM

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

Agreed. In some ways this is even worse than the TTI hook because now you're modifying a cl::opt option from TTI, which is not how cl::opt is supposed to be used. This means that for example if you use BPF and some other target in the same process, this option is going to leak across them, not to mention that the modification is probably not thread-safe.

The only proper way to address this is via an undo transform.

It's a valid LLVM IR, but not a profitable optimization.
The backend has all the rights to tell optimizer that certain transformations are unprofitable and should not be applied.
Consider BPF CPU as a smart CPU. Unlike traditional CPUs that do what compilers tell them, BPF CPU understands the code it is going to execute.
In this particular case we're dealing with:
for (i; i < var && i < const_val; i++)

BPF CPU is smart enough to understand that 'i' will not be bigger than const_val. It can check that array[i] is within bounds and so on.
Imagine a smart x86 CPU that could infer similar information from analyzing the asm code and do HW prefetch of array[0 ... const_val].
The problem with the transformation:
min_var = min(var, const_val);
for (i; i < min_var; i++)
is that the CPU no longer considers 'i' to be bounded.
It causes the BPF CPU to lose information about 'i' and would hurt performance of the hypothetical smart x86 CPU.

Backends absolutely need knobs to control profitability of transformations. Undoing optimization in the backend is not practical. In many cases it's not possible to undo.

I think adding a flag doesn't address the concerns that were raised about making this behaviour target-dependent in general. I don't feel super strongly about the issue, so if other people generally think we should add a way to disable this transform, this is fine with me.

But I think it would be better to undo the transform if needed. There is no guarantee that LICM is the only source of this pattern and it's generally considered that backends should support any form of valid LLVM IR.

Agreed. In some ways this is even worse than the TTI hook because now you're modifying a cl::opt option from TTI, which is not how cl::opt is supposed to be used. This means that for example if you use BPF and some other target in the same process, this option is going to leak across them, not to mention that the modification is probably not thread-safe.

The only proper way to address this is via an undo transform.

[...]

Backends absolutely need knobs to control profitability of transformations. Undoing optimization in the backend is not practical. In many cases it's not possible to undo.

I think the idea here is that not all transformations are generically good for all backends. There is a cost model for instructions; could there be something similar for transformations as well? Then BPF could mark the cost here somewhat high, noting it doesn't actually provide value?

  • separately checking EnableMinMaxHoisting and update the test with -print-before=licm and -print-after=licm to clearly show the expected ir

A cost model could be an option. I just updated the version based on the previous 'flag'-based approach. Also thanks @ast for the clarification: for some transformations, due to the BPF verifier's uniqueness (it verifies code before executing it), the transformation might be unprofitable for bpf, so it would be great if we had a simple way to declare a particular transformation not profitable.

mkazantsev added inline comments.Apr 5 2023, 3:51 AM
llvm/lib/Transforms/Scalar/LICM.cpp
998

I meant doing this inside hoistMinMax, so that it works for all its callers.

mkazantsev added a comment.EditedApr 5 2023, 3:56 AM

for (i; i < var && i < const_val; i++)

BPF CPU is smart enough to understand that 'i' will not be bigger than const_val. It can check that array[i] is within bounds and so on.

I don't see how this transform is a problem for in-bounds checks. Do you have a particular example where we can remove a bound check with the old form but cannot do so with the new form? IndVars should handle both, and if it doesn't, it's a bug we should fix. I also don't see how this kind of issue is related to any particular backend. Does it eliminate range checks in codegen rather than relying on high-level opts? Why?

Here is example showing that at least IndVars is well-aware about smin semantics and can remove bound checks for them: https://godbolt.org/z/7cjzfKvf9

It's a valid LLVM IR, but not a profitable optimization.

I disagree. The transform in question hoists computations out of the loop, so that we only do one comparison instead of two on every iteration. If you call it not profitable, please give a real proof. I don't understand your speculations about a hypothetical x86 CPU that removes bound checks, but only in one form.

mkazantsev requested changes to this revision.EditedApr 5 2023, 4:23 AM

Reading through the discussion, I now think that we should not go with the option. The reason why performance degraded is not well understood.

If the claim is "this is not a profitable transform in general", I want to see a proof. The transform removes a number of checks on every loop iteration, so it is surprising to hear that.

If the claim is "it is inhibiting some other optimizations (for example, bound check elimination)" then I'd like to see an exact example. I made some experiments with it and am now pretty sure that in many cases IndVars, IRCE and other passes responsible for range check elimination do know what smin is and can work with it as well as with two separate checks. Maybe there is a case where they don't support it, and in that case I'd like to see the test and fix it. It would be a good general improvement too.

Finally, if the claim is "this backend works badly with it for some low-level/hardware reasons" - OK, then the right way is to undo the transform. The optimizer is not committed to knowing about such backend issues. I also have doubts that it's the case.

So please, if you want to move ahead, find the reason for your performance degradation and describe it here. Maybe we have just exposed a missing case for IndVarSimplify or something, and it's an opportunity to fix it.

This revision now requires changes to proceed.Apr 5 2023, 4:23 AM
ast added a comment.Apr 5 2023, 9:39 AM

Reading through the discussion, I now think that we should not go with the option. The reason why performance degraded is not well understood.

If the claim is "this is not a profitable transform in general", I want to see a proof. The transform removes a number of checks on every loop iteration, so it is surprising to hear that.

There was no such claim. It's unprofitable for BPF.

It would be a good general improvement too.

'general improvement' is true only for a subset of CPUs. It's not an improvement for BPF "CPU".

The optimizer is not committed to knowing about such backend issues.

This is a false statement. The optimizer has to work hand in hand with the backend.
Several backends have loop insns, and those backends go out of their way to pattern-match loops in the backend.
When the optimizer does something "smart", claiming it's valid llvm IR, it may screw up the analysis, and valid C programs would get rejected because of the optimizer.
This case is not loop related, but similar in the sense that the optimizer hurts a backend.

So please, if you want to move ahead, find the reason for your performance degradation and describe it here.

Performance degradation on a hypothetical x86 cpu was an illustration of the problem. The real problem is the unprofitable transformation.
Undoing it in the backend is fragile. It may or may not work, and users writing programs in C will have no idea why their valid C code broke.
Therefore the backend has to stop the optimizer from performing unprofitable transformations.
This is not the only unprofitable transformation. See Andrii's examples of other transformations that we need to disable specifically for the BPF backend.

xbolva00 added a comment.EditedApr 5 2023, 9:58 AM

Undoing it in the backend is fragile.

Can you explain why? Or do it in CodeGenPrepare? Why is it fragile?

As a bonus you will handle the situation when there is < min(...) directly in your code.

Backends must follow some rules given by the design of LLVM. LLVM should not be full of ugly (but quickly done) hacks to make backends happy, and so far we have seen no proof that an undo transform is not possible.

fhahn added a comment.Apr 5 2023, 10:05 AM

I think the idea here is that not all transformations are generically good for all backends. There is a cost model for instructions; could there be something similar for transformations as well? Then BPF could mark the cost here somewhat high, noting it doesn't actually provide value?

Agreed that some transformations are driven by backend cost decisions. But there are also canonicalization transforms where one of the main goals is transforming the IR into a form that's easier for other passes & analyses to handle. LICM at least partly performs general canonicalization, which is completely target-independent. Another example is that LICM completely ignores register pressure (i.e. target properties) when hoisting code (IIUC), relying on the backend to sink it back if needed.

ast added a comment.Apr 5 2023, 10:08 AM

Undoing it in the backend is fragile.

Can you explain why? Or do it in CodeGenPrepare? Why is it fragile?

As a bonus you will handle the situation when there is < min(...) directly in your code.

It's not a bonus. People don't write such code now because the verifier doesn't recognize it.

Backends must follow some rules given by the design of LLVM. LLVM should not be full of ugly (but quickly done) hacks to make backends happy, and so far we have seen no proof that an undo transform is not possible.

Ugly is relative. You're arguing that a single-line boolean check makes the optimizer ugly while suggesting adding hundreds of lines of fragile pattern-matching code to the backend. Makes zero sense.

btw, hexagon might be another example where the hoistMinMax transformation may hurt performance.
See HexagonHardwareLoops.cpp and computeCount().
The backend has to know the constant end value of the loop to emit a HW loop.
The hoistMinMax "optimization" hurts this analysis, since the backend is no longer able to determine the upper bound.
It's similar for BPF, but in our case it's the kernel verifier that has to know the upper bound.
My limited understanding of the hexagon arch is that the inability to use a hw loop is not the end of the world, just slower performance.
Whereas for the BPF backend it's full breakage: because of the "optimization", valid C programs get rejected by the kernel.

nikic requested changes to this revision.Apr 5 2023, 12:38 PM

I'm rejecting this request for a TTI policy exception for the BPF backend in LICM. I've reviewed the discussion, but have found no grounds to support such an exception.

The discussion seems to be based around an assumption that backends should have a right to opt-out of certain transforms -- however, this assumption is simply incorrect for target-independent canonicalization passes like LICM. An exception would require some special circumstance, e.g. the inability to perform a backend undo transform because the transform is lossy, becomes lossy in conjunction with other transforms, or is not reversible without undue effort for other reasons. However, no evidence that this is the case here has been presented.

In fact, the assumptions underlying this discussion make me especially unwilling to grant an exception, because it seems clear that such an exception would be taken as precedent to make similar changes in other places. While I don't care particularly strongly about maintaining canonicality for this particular (relatively exotic) transform, extending this to certain other areas would be much more problematic. For example, the discussion mentions the InstCombine range check canonicalization as another motivating example, and that one is an important part of canonical IR form. As such, I would rather nip this entire direction in the bud.

@nikic @mkazantsev I created another diff https://reviews.llvm.org/D147968 which uses a different approach (through TTI) and has more description of why we want to use TTI hooks. Could you take a look there as well? Thanks!