This is an archive of the discontinued LLVM Phabricator instance.


Differential D126110

[BOLT] Fix AND evaluation bug in shrink wrapping
Closed · Public

Authored by rafauler on May 20 2022, 8:19 PM.


Details

Reviewers

Amir
maksfb
yota9
ayermolo
zr33

Commits

rGc09cd64e5c6d: [BOLT] Fix AND evaluation bug in shrink wrapping

Summary

Fix a bug where shrink wrapping would use wrong stack offsets
because the stack was being aligned with an AND instruction, which
makes the true offsets available only at runtime (we can't
statically determine where the stack elements are, so we must give up
in this case).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rafauler created this revision. May 20 2022, 8:19 PM

Herald added a project: Restricted Project. May 20 2022, 8:19 PM

Herald added a subscriber: pengfei.

rafauler requested review of this revision. May 20 2022, 8:19 PM

Herald added a project: Restricted Project. May 20 2022, 8:19 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B165635: Diff 431115. May 20 2022, 8:25 PM

I'm concerned that the fix is an overkill and doesn't address the root cause.
From what I understand, evaluateSimple can only evaluate a constant when given an instruction it understands and a pair of <reg, val> for use in evaluation. So the problem must be with the calling code – that the value of rsp is considered a constant while it's actually not (at least not in this case).

Even if it's not the case, I would like to keep the evaluation of and – it might be handy in other contexts – but restrict shrink-wrapping from using it.

In D126110#3529103, @Amir wrote:

I'm concerned that the fix is an overkill and doesn't address the root cause.
From what I understand, evaluateSimple can only evaluate a constant when given an instruction it understands and a pair of <reg, val> for use in evaluation. So the problem must be with the calling code – that the value of rsp is considered a constant while it's actually not (at least not in this case).

Even if it's not the case, I would like to keep the evaluation of and – it might be handy in other contexts – but restrict shrink-wrapping from using it.

That makes a lot of sense! Here's what I'm thinking. I would like to have two versions of this function to support this requirement. When we are evaluating offsets (such as stack offsets), only a subset of operations is supported. That's because most of the time we are operating without full knowledge of the actual operands; instead we work with offsets that are added to an unknown base (x + Offset, and we are doing transformations on Offset to reason about a specific property). In these cases, using operators that don't associate with the addition in (x + Offset), where x is the stack base, will break the analysis. In the case of the bug fixed here, "(x + Offset) AND Constant" is not the same as "x + (Offset AND Constant)" (no associativity), hence we can't support AND. But "(x + Offset) + Displacement + Index * Size", as in LEA, is the same as "x + (Offset + Displacement + Index * Size)", so it's fine to support LEA as long as we are only feeding Base as the input of the expression. ADD and SUB are supported as well.

But I don't think it's a good idea to add a boolean value such as "OnlyBaseOffsetAssociativeOperations=true" as a parameter to this function. Rather, I would prefer to break it into two functions: evaluateSimple, used for offsets, and evaluate(), which calls evaluateSimple and, if that fails, tries a set of additional operations that assume the operands are fully known at evaluation time. But at the moment, I can't find any places in our codebase that would be users of evaluate() (the more general evaluation function), and thus I'm less inclined to add it, as it would currently be dead code. Let me know if you have other ideas on how to move forward. Meanwhile I'll improve the comments on evaluateSimple to make its intended usage clear.
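
The associativity argument above can be checked numerically. The following is a minimal sketch, not BOLT code: the helper names and the sample base addresses are hypothetical, chosen only to show that the offset produced by an AND depends on the unknown base x, while an ADD/SUB-style update does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the offset (relative to the unknown base) that results
// from executing "and $Mask, %rsp" when RSP == Base + Offset.
int64_t offsetAfterAnd(uint64_t Base, int64_t Offset, uint64_t Mask) {
  uint64_t Aligned = (Base + static_cast<uint64_t>(Offset)) & Mask;
  return static_cast<int64_t>(Aligned - Base);
}

// What an evaluator computes if it applies AND to the offset alone,
// i.e. treats "(x + Offset) AND Mask" as "x + (Offset AND Mask)".
int64_t naiveAndOnOffset(int64_t Offset, uint64_t Mask) {
  return static_cast<int64_t>(static_cast<uint64_t>(Offset) & Mask);
}

// ADD/SUB-style update: the resulting offset is independent of Base.
int64_t offsetAfterAdd(uint64_t Base, int64_t Offset, int64_t Imm) {
  uint64_t Sum =
      Base + static_cast<uint64_t>(Offset) + static_cast<uint64_t>(Imm);
  return static_cast<int64_t>(Sum - Base);
}
```

With Offset = -24 (three 8-byte pushes) and Mask = 0xffffffffffffffe0, the naive evaluation yields -32. The true offset is -32 for an assumed base of 0x7ffc00001000 but -40 for an assumed base of 0x7ffc00001008, so no single static answer exists. Adding -32 (the `subq $0x20`) yields -56 for either base.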

Thank you, now I understand the problem and how this change addresses it.
I agree that removing the handling of AND and renaming the function to something like evaluateAssociative (simple is not self-explanatory) would be OK.
We can add a generic evaluate later on when the need arises.
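
The direction agreed on in this thread (keep only operations that are safe on a base-relative offset, and refuse AND) might look roughly like the toy model below. This is a sketch under stated assumptions: `ToyOp`, `ToyInst`, and `evaluateOffsetOnly` are hypothetical names invented for illustration, and nothing here uses the real MCInst or X86 opcode APIs. It only mirrors the bool-return-plus-output-parameter shape visible in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for instruction opcodes; not LLVM's X86 enumerators.
enum class ToyOp { Add, Sub, Lea, And };

struct ToyInst {
  ToyOp Op;
  int64_t Imm; // immediate for ADD/SUB/AND, displacement for LEA
};

// Updates an offset that is relative to an unknown base. Returns false when
// the operation cannot be applied to the offset alone, so the caller gives up.
bool evaluateOffsetOnly(const ToyInst &Inst, int64_t Input, int64_t &Output) {
  switch (Inst.Op) {
  case ToyOp::Add:
  case ToyOp::Lea:
    Output = Input + Inst.Imm;
    return true;
  case ToyOp::Sub:
    Output = Input - Inst.Imm;
    return true;
  case ToyOp::And:
    // The result depends on the runtime value of the base; not evaluable.
    return false;
  }
  return false;
}
```

A caller tracking the stack pointer would treat a `false` result as "offset unknown from here on", which is the behavior the new test expects (zero spills moved).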

Rename the evaluateSimple function and make its intended usage clear.

Thanks!

This revision is now accepted and ready to land. May 23 2022, 3:49 PM

Harbormaster completed remote builds in B165933: Diff 431504. May 23 2022, 3:52 PM

Fix nit in testcase and update assert message in ValidateInternalCalls

Harbormaster completed remote builds in B165936: Diff 431507. May 23 2022, 4:01 PM

Closed by commit rGc09cd64e5c6d: [BOLT] Fix AND evaluation bug in shrink wrapping (authored by rafauler). May 26 2022, 2:59 PM

This revision was automatically updated to reflect the committed changes.

rafauler added a commit: rGc09cd64e5c6d: [BOLT] Fix AND evaluation bug in shrink wrapping.

Revision Contents

Path                                        Size
bolt/lib/Target/X86/X86MCPlusBuilder.cpp    10 lines
bolt/test/X86/shrinkwrapping-and-rsp.s      55 lines

Diff 431115

bolt/lib/Target/X86/X86MCPlusBuilder.cpp

@@ auto getOperandVal = [&](MCPhysReg Reg) -> ErrorOr<int64_t> {
       return Input2.second;
     return make_error_code(errc::result_out_of_range);
   };

   switch (Inst.getOpcode()) {
   default:
     return false;

-  case X86::AND64ri32:
-  case X86::AND64ri8:
-    if (!Inst.getOperand(2).isImm())
-      return false;
-    if (ErrorOr<int64_t> InputVal =
-            getOperandVal(Inst.getOperand(1).getReg()))
-      Output = *InputVal & Inst.getOperand(2).getImm();
-    else
-      return false;
-    break;
   case X86::SUB64ri32:
   case X86::SUB64ri8:
     if (!Inst.getOperand(2).isImm())
       return false;
     if (ErrorOr<int64_t> InputVal =
             getOperandVal(Inst.getOperand(1).getReg()))
       Output = *InputVal - Inst.getOperand(2).getImm();
     else

bolt/test/X86/shrinkwrapping-and-rsp.s

This file was added.

# This checks that shrink wrapping does not attempt to access stack elements
# using RSP when the function is aligning RSP and changing offsets.

# REQUIRES: system-linux

# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown \
# RUN:   %s -o %t.o
# RUN: link_fdata %s %t.o %t.fdata
# RUN: llvm-strip --strip-unneeded %t.o
# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -nostdlib
# RUN: llvm-bolt %t.exe -o %t.out -data %t.fdata \
# RUN:   -frame-opt=all -simplify-conditional-tail-calls=false \
# RUN:   -eliminate-unreachable=false | FileCheck %s

# Here we have a function that aligns the stack in the prologue. Stack pointer
# analysis cannot infer offset positions after AND because that depends
# on the runtime value of the stack pointer of callee (whether it is misaligned
# or not).
  .globl _start
  .type _start, %function
_start:
  .cfi_startproc
# FDATA: 0 [unknown] 0 1 _start 0 0 1
  push  %rbp
  mov   %rsp, %rbp
  push  %rbx
  push  %r14
  and   $0xffffffffffffffe0,%rsp
  subq  $0x20, %rsp
b:  je  hot_path
# FDATA: 1 _start #b# 1 _start #hot_path# 0 1
cold_path:
  mov   %r14, %rdi
  #mov  %rbx, %rdi
  # Block push-pop mode by using an instruction that requires the
  # stack to be aligned to 128 bits (16 bytes). This will force the pass
  # to try to index stack elements by using RSP + offset directly, but
  # we do not know how to access individual elements of the stack because
  # of the alignment.
  movdqa  %xmm8, (%rsp)
  addq  $0x20, %rsp
  pop   %r14
  pop   %rbx
  pop   %rbp
  ret
hot_path:
  addq  $0x20, %rsp
  pop   %r14
  pop   %rbx
  pop   %rbp
  ret
  .cfi_endproc
  .size _start, .-_start

# CHECK: BOLT-INFO: Shrink wrapping moved 0 spills inserting load/stores and 0 spills inserting push/pops