Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[BPF] Adjust optimizations to generate kernel verifier friendly codes

Authored by yonghong-song on Jan 15 2020, 10:11 AM.



The InstCombine phase did the following optimization:

V >= Lo && V <  Hi --> V - Lo u<  Hi - Lo
V <  Lo || V >= Hi --> V - Lo u>= Hi - Lo

This generates code like below:  = V - Lo
ConstV = Hi - Lo
Cond = u< ConstV
if (Cond) ... 
... V is used ...

The current linux verifier is not able to handle such a sequence
and may reject the prog as it did not used refined value range
for V at the place of "V is used".

Previous attempt,, is to disable
this optimization under BPF target. But it violates the principle
that InstCombiner is target independent.

This patch implemented an IR pass to make the code more friendly
to the verifier. The IR pass will transform the undo the above
InstCombine optimization.

V - Lo u< Hi - Lo --> V >= Lo && V < Hi
V - Lo u>= Hi - Lo --> V < Lo || V >= Hi

In addition, the following two cases are also handled.

V - Lo u<= Hi - Lo --> V >= Lo && V <= Hi
V - Lo u> Hi - Lo --> V <  Lo || V > Hi

As a concrete example, the source code:

#pragma clang loop unroll(disable)
for (i = 0; i < 100; ++i) {
  ret = ext_test(value + off);
  if (ret <= 0 || ret > 7)
    return 0;
  off += ret & 7;

The assembly code with -mcpu=v3:

        r1 = value ll
        r1 += r6
        call ext_test
        if w0 s< 1 goto LBB0_4
        if w0 s> 7 goto LBB0_4
        r1 = w0
        r1 <<= 32
        r1 >>= 32
        r6 += r1
        w7 += -1
        if w7 != 0 goto LBB0_1

Diff Detail

Event Timeline

yonghong-song created this revision.Jan 15 2020, 10:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2020, 10:11 AM

The current kernel did not handle signed compare well for 32bit registers. I will need to make that change first in order to make this llvm change effective.

ast added a comment.Jan 15 2020, 1:02 PM

looks fine. could you run it against kernel selftests and check when it triggers? since verifier cannot handle the optimized code there should be almost no cases of reverse transformation.
And if there are then add them as .ll tests?

Indeed tried kernel selftests, revert the previous workaround like

diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
index d22e438198cf..608a06871572 100644
--- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
@@ -44,10 +44,7 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx)
        unsigned long tcp_mem[TCP_MEM_LOOPS] = {};
        char value[MAX_VALUE_STR_LEN];
        unsigned char i, off = 0;
-       /* a workaround to prevent compiler from generating
-        * codes verifier cannot handle yet.
-        */
-       volatile int ret;
+       int ret;
        if (ctx->write)
                return 0;

The optimization kicked in. The adjust-instcombine-1.ll is exactly for the above case. But kernel verifier needs some work to handle 32bit sign comparisons, which I will work on later.

The adjust-instcombine-2.ll is also derived from a real use case inside Facebook.

fix potential overflow issues

ast accepted this revision.Jan 16 2020, 10:24 AM
This revision is now accepted and ready to land.Jan 16 2020, 10:24 AM
yonghong-song edited the summary of this revision. (Show Details)

Added a machineinstr SSA target phase to add left/aright shift operations if the 32bit subregister is used for sle/slt/sge/sgt operations.

ast added inline comments.Jan 17 2020, 7:22 PM
78 ↗(On Diff #238927)

I'm not sure that is correct. Reading lower 32-bit doesn't clear upper bits. I think pattern matching would need to consider the origin of w0. It's not a guarantee that it came as return from a function.
I think teaching verifier to understand this is a better option.

yonghong-song marked an inline comment as done.Jan 17 2020, 7:31 PM
yonghong-song added inline comments.
78 ↗(On Diff #238927)

good point. the use case is from the function return and then use w0 after that. I want to try verifier because it is tricky to get it 100% right as so many things tangled together. But maybe in verifier we do want 32bit register tracking. The work may help other potential 32bit related issues.

I may not quite understand what the verifier is looking for here, but rather than trying to undo the optimization in 2 different places, would it be possible to implement this by having the InstructionSelector emit some kind of Pseudo Instruction for inputs to the comparison? For example, what would normally be selected to:

if w0 s< 1 goto LBB0_4

You could instead emit:

w0 = verify_range w0
if w0 s< 1 goto LBB0_4

verify_range would be a pseudo instruction that gets expanded to a real instruction either by ExpandPostRAPseudo or during MC lowering.

This seems like it would be more simple to implement and also more robust.

@tstellar Thanks for your comments. The MachineInstr SSA based optimization is really a ugly hack. Currently verifier is not able to verify the program correctly, i.e., rejects the program for certain patterns related subregister uses. The commit message provides more information. @ast suggested to look at whether we can improve kernel verifier and I will take a look there. If kernel can be enhanced, I will drop the second MachineInstr SSA pass.

yonghong-song edited the summary of this revision. (Show Details)

revert back to previous version where no hacking with machine IR optimization to add extra "<<" and "s>>". Will try to enhance verifier to do the work.

previous version has runtime failure with latest code base. Fixed the issue and reload the working code.

V >= Lo && V < Hi --> V - Lo u< Hi - Lo
V < Lo || V >= Hi --> V - Lo u>= Hi - Lo

Isn't the following possible as well?

V >= Lo && V <= Hi --> V - Lo u<= Hi - Lo
V < Lo || V > Hi --> V - Lo u> Hi - Lo

Should the code handle them as well?

V >= Lo && V < Hi --> V - Lo u< Hi - Lo
V < Lo || V >= Hi --> V - Lo u>= Hi - Lo

Isn't the following possible as well?

V >= Lo && V <= Hi --> V - Lo u<= Hi - Lo
V < Lo || V > Hi --> V - Lo u> Hi - Lo

Should the code handle them as well?

This is actually handled. If you have code like

if (v > 0) { if (v <= 16) ...}

The compiler will first transform it to

if (v >= 1) { if (v <= 17) ... }

and then do the above transformation.
With the pre-transformation, you limit the downword cases.

yonghong-song edited the summary of this revision. (Show Details)

removed MachineInstr AdjustOpt transformation since latest linux 32bit handling has been improved. No need to llvm to generate additional shifts. Also handled two more cases in IR AdjustOpt for

V - Lo u<= Hi - Lo --> V >= Lo && V <= Hi
V - Lo u> Hi - Lo --> V <  Lo || V > Hi
yonghong-song abandoned this revision.Aug 6 2020, 5:50 PM

Instead of undoing instcombine optimization, we will implement a BPF IR pass which is called before instcombine to modifying IR to prevent the optimization. So abandon this revision now.