This is an archive of the discontinued LLVM Phabricator instance.

Differential D20145

Fix modulo compared to zero
ClosedPublic

Authored by chrisj on May 10 2016, 5:51 PM.

Download Raw Diff

Details

Reviewers

Meinersbur
grosser

Commits

rG5c527f9963cb: Fix modulo compared to zero.
rPLO271707: Fix modulo compared to zero.
rL271707: Fix modulo compared to zero.

Summary

In case of modulo compared to zero, we need to do signed modulo operation as unsigned can give different results based on whether the dividend is negative or not.

This addresses PR27707.

Diff Detail

Repository: rL LLVM

Event Timeline

chrisj updated this revision to Diff 56843.May 10 2016, 5:51 PM

chrisj retitled this revision from to Fix modulo compared to zero.

chrisj updated this object.

chrisj set the repository for this revision to rL LLVM.

chrisj updated this object.May 10 2016, 5:59 PM

chrisj added a reviewer: grosser.

chrisj added a project: Restricted Project.

chrisj added a subscriber: pollydev.

chrisj added a reviewer: Unknown Object (User).May 10 2016, 6:01 PM

Hi Chris,

thank you for directly providing a patch for the issues you find!

Regarding this patch, I wonder what exactly is the issue you are seeing. My feeling is this might just be a lack of documentation on the ast expression nodes isl is generating. Specifically, in case isl generates a node isl_ast_op_zdiv_r, it has proven that the node is only ever compared against zero. So we can replace srem with urem as long as for all operands where srem yields zero, urem also yields zero and for operands where srem yields a non-zero result, urem also yields a non-zero result. The code was written under the assumption that this is the case.

In http://llvm.org/PR27707 you give the example "For example if (a1 - b1) = -48 (0xFFFFFFFFFFFFFFD0), the modulo with 24 would be 16". This seems to be still correct. Even though the result computed in %pexp.zdiv_r = urem i64 %25, 24 is different, the result of the comparision is again true for both cases: %26 = icmp eq i64 %pexp.zdiv_r, 0.

Could you clarify if this problem has ever introduced misbehavior in programs compiled with Polly.

Best,
Tobias

Hi Tobi,

In http://llvm.org/PR27707 you give the example "For example if (a1 - b1) = -48 (0xFFFFFFFFFFFFFFD0), the modulo with 24 would be 16". This seems to be still correct. Even though the result computed in %pexp.zdiv_r = urem i64 %25, 24 is different, the result of the comparision is again true for both cases: %26 = icmp eq i64 %pexp.zdiv_r, 0.

For the mentioned example the signed remainder operation would result in 0 and while with unsigned remainder we get 16. Since isl_ast_op_zdiv_r is compared against zero, the polly.cond would take different paths based on the type of remainder operation.

We found that this caused some failures in tests where there were pointer iterations and the polly.cond had a check for modulo with a non-power of 2. Below is an example,

#include<stdio.h>
#include<stdlib.h>
typedef struct{
  int a;
  long long b;
  char c;
}A;
void   foo(A * a1, A * b1) {
  A *start;
  for (start = (void*) a1; start != b1; start++) {
    start->a += 2;
  }
}

A *a;
int main (){
  a = (A*) calloc (100, sizeof (A));
  foo (&a[0], &a[48]);
  if (a[0].a != 2)
    printf ("FAIL!\n");
  else
    printf ("PASS!\n");
  free (a);
  return 1;
}

Thanks,
Chris

Other examples:
-3 srem 3 = 0
-3 urem 3 = 0xFFFFFFFD rem 3 = 1

-1 srem 3 = -1
-1 urem 3 = 0xFFFFFFFF rem 3 = 0

lib/CodeGen/IslExprBuilder.cpp
281–287 ↗	(On Diff #56843)	This is usually the domain of InstCombine, although I couldn't find a match for such cases in the current InstCombineMulDivRem.cpp

msg-2474-198-1.txt259 BDownload
signature.asc219 BDownload

Hi Michael,

I synched up with Zino about the weekly polly discussion regarding this issue and to summarize,

Keep srem for isl_ast_op_zdiv_r
Have a llvm peep in InstCombine to optimize srem into urem

I will modify the patch to revert it back to srem for isl_ast_op_zdiv_r.

But wouldnt it be more easier to do the optimization in polly rather than pattern match in llvm, since in polly we know exactly that we are comparing against zero.

Thanks,
Chris

In D20145#434787, @chrisj wrote:

But wouldnt it be more easier to do the optimization in polly rather than pattern match in llvm, since in polly we know exactly that we are comparing against zero.

My argument is/was that the transformation is more general than only in that case. If the urem form is faster, we can optimize more code in InstCombine. It already does this transformation, eg. X srem Y -> X urem Y, iff X and Y don't have sign bit set.

The only additional work is to match whether the result is compared to zero, which we already know in zdiv_r. That shouldn't be a difficult check, In fact, InstCombine already checks where that result of an argument is used. The file InstCombineSimplifyDemanded.cpp is only about this. There is also a isKnownToBeAPowerOfTwo that does not only work on constants.

Please note that this is my subjective idea of separation of concerns and I don't have a strong opinion about it. If you find it too hard to get it into InstCombine, and a worthy transformation, just leave it as it is.

Hi Chris,

any update on this patch?

chrisj updated this revision to Diff 59151.May 31 2016, 4:43 PM

chrisj updated this object.

chrisj edited edge metadata.

Hi Tobi,

I pushed the patch. As per Michaels' suggestion I would do the transformation in llvm. In polly, I reverted it back to signed modulo for compare against zero for correctness.

Thanks,
Chris

Meinersbur accepted this revision.Jun 1 2016, 4:12 PM

Meinersbur added a reviewer: Meinersbur.

Meinersbur edited edge metadata.

This revision is now accepted and ready to land.Jun 1 2016, 4:12 PM

Closed by commit rL271707: Fix modulo compared to zero. (authored by Meinersbur). · Explain WhyJun 3 2016, 11:58 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

polly/

trunk/

lib/

CodeGen/

IslExprBuilder.cpp

2 lines

test/

Isl/

CodeGen/

pointer_rem.ll

48 lines

Diff 59594

polly/trunk/lib/CodeGen/IslExprBuilder.cpp

Show First 20 Lines • Show All 346 Lines • ▼ Show 20 Lines	case isl_ast_op_fdiv_q: { // Round towards -infty
Res = Builder.CreateSDiv(Dividend, RHS, "pexp.fdiv_q.4");		Res = Builder.CreateSDiv(Dividend, RHS, "pexp.fdiv_q.4");
break;		break;
}		}
case isl_ast_op_pdiv_r: // Dividend is non-negative		case isl_ast_op_pdiv_r: // Dividend is non-negative
Res = Builder.CreateURem(LHS, RHS, "pexp.pdiv_r");		Res = Builder.CreateURem(LHS, RHS, "pexp.pdiv_r");
break;		break;

case isl_ast_op_zdiv_r: // Result only compared against zero		case isl_ast_op_zdiv_r: // Result only compared against zero
Res = Builder.CreateURem(LHS, RHS, "pexp.zdiv_r");		Res = Builder.CreateSRem(LHS, RHS, "pexp.zdiv_r");
break;		break;
}		}

// TODO: We can truncate the result, if it fits into a smaller type. This can		// TODO: We can truncate the result, if it fits into a smaller type. This can
// help in cases where we have larger operands (e.g. i67) but the result is		// help in cases where we have larger operands (e.g. i67) but the result is
// known to fit into i64. Without the truncation, the larger i67 type may		// known to fit into i64. Without the truncation, the larger i67 type may
// force all subsequent operations to be performed on a non-native type.		// force all subsequent operations to be performed on a non-native type.
isl_ast_expr_free(Expr);		isl_ast_expr_free(Expr);
▲ Show 20 Lines • Show All 313 Lines • Show Last 20 Lines

polly/trunk/test/Isl/CodeGen/pointer_rem.ll

				; RUN: opt %loadPolly -polly-process-unprofitable -polly-code-generator=isl -polly-scops -polly-ast -analyze -S < %s \| FileCheck %s --check-prefix=AST
				; RUN: opt %loadPolly -polly-process-unprofitable -polly-code-generator=isl -polly-scops -polly-codegen -S < %s \| FileCheck %s --check-prefix=CODEGEN

				target datalayout = "e-m:e-i64:64-i128:128-n8:16:32:64-S128"
				target triple = "aarch64--linux-gnu"

				; This test is to ensure that for we generate signed remainder for
				; the polly.cond check.

				; AST: isl ast :: foo1
				; AST: if ((a1 - b1) % 24 == 0)

				; CODEGEN: define void @foo1
				; CODEGEN: polly.cond:
				; CODEGEN: %pexp.zdiv_r = srem {{.*}}, 24

				%struct.A = type { i32, i64, i8 }

				; Function Attrs: norecurse nounwind
				define void @foo1(%struct.A* %a1, %struct.A* readnone %b1) #0 {
				entry:
				br label %entry.split

				entry.split: ; preds = %entry
				%cmp4 = icmp eq %struct.A* %a1, %b1
				br i1 %cmp4, label %for.cond.cleanup, label %for.body.preheader

				for.body.preheader: ; preds = %entry.split
				br label %for.body

				for.cond.cleanup.loopexit: ; preds = %for.body
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry.split
				ret void

				for.body: ; preds = %for.body.preheader, %for.body
				%start.05 = phi %struct.A* [ %incdec.ptr, %for.body ], [ %a1, %for.body.preheader ]
				%a = getelementptr inbounds %struct.A, %struct.A* %start.05, i64 0, i32 0
				%0 = load i32, i32* %a, align 8
				%add = add nsw i32 %0, 1
				store i32 %add, i32* %a, align 8
				%incdec.ptr = getelementptr inbounds %struct.A, %struct.A* %start.05, i64 1
				%cmp = icmp eq %struct.A* %incdec.ptr, %b1
				br i1 %cmp, label %for.cond.cleanup.loopexit, label %for.body
				}

This is an archive of the discontinued LLVM Phabricator instance.

Fix modulo compared to zeroClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 59594

polly/trunk/lib/CodeGen/IslExprBuilder.cpp

polly/trunk/test/Isl/CodeGen/pointer_rem.ll

Fix modulo compared to zero
ClosedPublic