Page MenuHomePhabricator

[GVN] Don't replace argument to @llvm.is.constant.*()
AbandonedPublic

Authored by jonpa on Oct 27 2020, 7:36 AM.

Details

Summary

When building the Linux kernel with clang on SystemZ, the use of __builtin_constant_p() lead to the error "error: invalid operand for inline asm constraint 'i'". The purpose of this function is to do a compile-time check to see if the argument is constant or not. If it is not, then it should evaluate to false and the (in this case) inline asm statement should be removed.

GVN optimized a conditional branch controlled by a compare with zero. It deduced that in a specific block, the value must be zero since the branch must be taken to reach there. This lead to the non-constant argument (also the compared value) passed to builtin_constant_p() being replaced with a zero in the successor block. This in turn resulted in the inline asm not being removed from the program and thus the error message of "i" being used incorrectly (there were multiple builtin_constant_p calls in the program).

This patch suggest one way of handling this, namely by excluding the llvm.is.constant intrinsic from being optimized by GVN and similar passes...

Diff Detail

Event Timeline

jonpa created this revision.Oct 27 2020, 7:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 27 2020, 7:36 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

Was semantics of LLVM's @llvm.is.constant defined in some document since the last time @llvm.is.constant patch?
This (especially the Local.cpp changes) seem rather heavy handed.

Was semantics of LLVM's @llvm.is.constant defined in some document since the last time @llvm.is.constant patch?

I see in the reference manual: "If its argument is known to be a manifest compile-time constant value, then the intrinsic will be converted to a constant true value. Otherwise, it will be converted to a constant false value..." So clearly, having GVN turn this into a run-time check is wrong.

This (especially the Local.cpp changes) seem rather heavy handed.

I am open to alternatives... I think it makes general sense to not optimize the argument to @llvm.is.constant. It might be possible to replace it as long as the new value falls into the same category as the old one of being compile-time constant or not, but that seems like unnecessary work to me. Unless it would help some later optimization somehow to produce better code?

This was my first idea of a handling to illustrate the problem, and it has been confirmed to resolve the problems in the original test case.

Since these methods by nature do not replace *all* uses, but only the dominated/non-local ones, it seems safe to me to exclude a user from the transformation. The caller is not expecting to always be able to remove the old value from the function.

Was semantics of LLVM's @llvm.is.constant defined in some document since the last time @llvm.is.constant patch?

I see in the reference manual: "If its argument is known to be a manifest compile-time constant value, then the intrinsic will be converted to a constant true value. Otherwise, it will be converted to a constant false value..." So clearly, having GVN turn this into a run-time check is wrong.

I'm sorry, i do not understand. Is the test case over-reduced?
In land.rhs BB, %shl *is* constant, and therefore the folding happens:

$ ./bin/opt -gvn /tmp/test.ll -debug
Args: ./bin/opt -gvn /tmp/test.ll -debug 
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.

GVN iteration: 0
GVN: load i32 %0 has unknown dependence
Replace dominated use of 'shl' as i32 0 in i32 0
GVN removed:   %2 = tail call i1 @llvm.is.constant.i32(i32 0)
GVN removed:   %phi.cast = zext i1 true to i32
GVN iteration: 1
GVN: load i32 %0 has unknown dependence

I think it makes general sense to not optimize the argument to @llvm.is.constant.

Sometimes, it may be necessary to optimize the argument to @llvm.is.constant so that you even recognize it is in fact constant. An example might be a constant argument propagated into an inlined function which uses @llvm.is.constant on the argument variable. (This seems to be a quite typical use of the predicate e.g. in kernel sources.)

Not sure if this would be affected by your proposed patch, but it's something to keep in mind.

jonpa added a comment.Oct 28 2020, 2:33 AM

I'm sorry, i do not understand. Is the test case over-reduced?
In land.rhs BB, %shl *is* constant, and therefore the folding happens:

IIUC, the idea of the intrinsic is to check if a value is *compile-time* constant. %shl is known to be 0 in %land.rhs given the icmp, but during runtime it actually can have any value. I see your point that this could actually be seen as a compile-time constant, but still the purpose here is to let the programmer use this as a predicate in a small function to generate different code depending on the invocation. In my example, the inline asm constraint "i" is to be used whenever the value is a constant (and legal for "i"). What happened was that a non-constant value (%shl) was used with is.constant(), which then (in a slightly more complicated test case) ended up as the value used with "i" in a block where it was true during runtime.

That is however illegal code and expected to be removed by this predicate - it is not OK to have an inline asm with "i" and a loaded value in a block just because an icmp proved it to be 0... :-)

I'm sorry, i do not understand. Is the test case over-reduced?
In land.rhs BB, %shl *is* constant, and therefore the folding happens:

IIUC, the idea of the intrinsic is to check if a value is *compile-time* constant. %shl is known to be 0 in %land.rhs given the icmp, but during runtime it actually can have any value.

In land.rhs it can only ever be 0, because in land.rhs, %shl got replaced with 0.

I think i'd like to see the bigger/less reduced test case..

I see your point that this could actually be seen as a compile-time constant, but still the purpose here is to let the programmer use this as a predicate in a small function to generate different code depending on the invocation. In my example, the inline asm constraint "i" is to be used whenever the value is a constant (and legal for "i"). What happened was that a non-constant value (%shl) was used with is.constant(), which then (in a slightly more complicated test case) ended up as the value used with "i" in a block where it was true during runtime.

That is however illegal code and expected to be removed by this predicate - it is not OK to have an inline asm with "i" and a loaded value in a block just because an icmp proved it to be 0... :-)

jonpa added a comment.Oct 28 2020, 2:43 AM

I think it makes general sense to not optimize the argument to @llvm.is.constant.

Sometimes, it may be necessary to optimize the argument to @llvm.is.constant so that you even recognize it is in fact constant. An example might be a constant argument propagated into an inlined function which uses @llvm.is.constant on the argument variable. (This seems to be a quite typical use of the predicate e.g. in kernel sources.)

Not sure if this would be affected by your proposed patch, but it's something to keep in mind.

That's a good point... Maybe that means that each optimizer really should take care of this on its own. Perhaps having the GVN pass (and probably others) pass 'true' to a new (default-false) argument to replaceDominatedUsesWith() to make this exception take effect would be better? That way the value would by default be replaced, but not for this particular type of optimization...

jonpa added a comment.Oct 28 2020, 2:53 AM

I'm sorry, i do not understand. Is the test case over-reduced?
In land.rhs BB, %shl *is* constant, and therefore the folding happens:

IIUC, the idea of the intrinsic is to check if a value is *compile-time* constant. %shl is known to be 0 in %land.rhs given the icmp, but during runtime it actually can have any value.

In land.rhs it can only ever be 0, because in land.rhs, %shl got replaced with 0.

I think i'd like to see the bigger/less reduced test case..

Sure, this is the one I was working with:

int a, b;
void c() {
  b = a << 12;
  if (__builtin_constant_p(b > -129 && b < 128 && __builtin_constant_p(b)) ?
        (b > -129 && b < 128 && __builtin_constant_p(b)) :
        ({ (b > -129 && b < 128 && __builtin_constant_p(b)) ? 1 : 0; }))
    asm("" : : "i"(b));
}

clang -O2 -c

error: invalid operand for inline asm constraint 'i'
    asm("" : : "i"(b));
joerg added a comment.Oct 28 2020, 5:39 AM

I completely agree with Roman that this patch is very undesirable since it has a high chance of breaking other reasonable uses of the intrinsic. In fact, I would say that the observed behavior here is perfectly reasonable under the definition of @lllvm.is.constant. The original code for LLVM at least should be just __builtin_constant_p(b) && b > -129 && b < 128).

lebedev.ri requested changes to this revision.Oct 28 2020, 7:41 AM
This revision now requires changes to proceed.Oct 28 2020, 7:41 AM
jonpa added a comment.Oct 29 2020, 4:19 AM

I completely agree with Roman that this patch is very undesirable since it has a high chance of breaking other reasonable uses of the intrinsic.

Would it help if the replaceXXXUsesWith() only excluded the intrinsic from this particular GVN optimization, and not all the time?

In fact, I would say that the observed behavior here is perfectly reasonable under the definition of @lllvm.is.constant. The original code for LLVM at least should be just __builtin_constant_p(b) && b > -129 && b < 128).

The example is reduced but still represents code in the Linux kernel compilable by GCC, but not by clang. Here is a bigger picture (still reduced, but not as much:

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((__no_instrument_function__))
__attribute__((__always_inline__)) void
__atomic_add_const(int val, int *ptr)
{
	asm volatile("asi"
		     "	%[ptr],%[val]\n"
		     "\n"
		     : [ptr] "+Q"(*ptr)
		     : [val] "i"(val)
		     : "cc", "memory");
}

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((__no_instrument_function__)) int
__atomic_add(int val, int *ptr)
{
	int old;
	asm volatile("laa"
		     "	%[old],%[val],%[ptr]\n"
		     "\n"
		     : [old] "=d"(old), [ptr] "+Q"(*ptr)
		     : [val] "d"(val)
		     : "cc", "memory");
	return old;
}

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((__no_instrument_function__)) void
atomic_add(int i, atomic_t *v)
{
	if ((__builtin_constant_p(
		     !!((i > -129) && (i < 128) && __builtin_constant_p(i))) ?
			   (!!((i > -129) && (i < 128) && __builtin_constant_p(i))) :
			   ({ (!!((i > -129) && (i < 128) && __builtin_constant_p(i))) ? 1 : 0; }))) {
		__atomic_add_const(i, &v->counter);
		return;
	}

	__atomic_add(i, &v->counter);
}

void tty_buffer_reset(long size);
void tty_buffer_alloc(atomic_t v, long size)
{
	size = ((((size)) + ((255))) & ~((255)));

	tty_buffer_reset(size);
	atomic_add(size, &v);
}

#define IB_SET_POST_CREDITS(v)	((v) << 16)
void rds_ib_advertise_credits(atomic_t v, unsigned int posted)
{
	atomic_add(IB_SET_POST_CREDITS(posted), &v);
}

/*
 * more cases like:
 * atomic64_add((u64)bo->num_pages << PAGE_SHIFT, &rdev->num_bytes_moved);
 */

As this shows, atomic_add_const() is supposed to be called only when atomic_add() is called with a compile-time constant. Note that __atomic_add_const() (after inlining etc) will not even compile if the argument is not const, since it uses the "i" inline asm constraint. So having a runtime check against 0 is not acceptable - it doesn't compile!

I wonder

  • what other use case of __builtin_constant_p() would make this runtime check desirable?
  • what other way than disabling this in GVN is there to make the test case compile?
joerg added a comment.Oct 29 2020, 4:45 AM

"It compiles with GCC" is not really helpful as argument because the observed behavior depends on a lot of internals. It is quite frankly impossible to be 100% identical to GCC's behavior. What LLVM guarantees is:
(1) Folding to false happens as late as reasonable possible. It happens most importantly after the first round of function and loop optimizations.
(2) Folding to true happens as part of constant folding etc as early as the condition is evaluated.
(3) The result is propagated to derived expression and those are folded recursively, including dead code elimination. This happens even for -O0.

That's why __builtin_constant_p(i) followed by the range check is fine as far as LLVM is concerned. With -O0, it fails as it doesn't have the help of SROA and the whole code branch is killed. Otherwise, i is a known constant and the range check is constant folded. The assembler constraint is verified using the same machinery anyway.

"It compiles with GCC" is not really helpful as argument because the observed behavior depends on a lot of internals. It is quite frankly impossible to be 100% identical to GCC's behavior. What LLVM guarantees is:
(1) Folding to false happens as late as reasonable possible. It happens most importantly after the first round of function and loop optimizations.
(2) Folding to true happens as part of constant folding etc as early as the condition is evaluated.
(3) The result is propagated to derived expression and those are folded recursively, including dead code elimination. This happens even for -O0.

That's why __builtin_constant_p(i) followed by the range check is fine as far as LLVM is concerned. With -O0, it fails as it doesn't have the help of SROA and the whole code branch is killed. Otherwise, i is a known constant and the range check is constant folded. The assembler constraint is verified using the same machinery anyway.

OK - I have tried a different approach at https://reviews.llvm.org/D91786...

jonpa abandoned this revision.Dec 8 2020, 8:49 AM