This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
@ggeorgakoudis Clang::parallel_for_codegen.cpp is failing on some of our bots:
https://lab.llvm.org/buildbot/#/builders/43/builds/5751
https://lab.llvm.org/buildbot/#/builders/107/builds/7585
The failing test line is:
// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -O1 -fopenmp-simd -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK10
It passes if you have the X86 backend enabled but fails if you do not. Our bots usually have only Arm or AArch64 enabled.
When you have the X86 backend:
; Function Attrs: nofree norecurse nosync nounwind mustprogress define void @_Z14static_chunkedPfS_S_S_(float* nocapture %a, float* nocapture readonly %b, float* nocapture readonly %c, float* nocapture readonly %d) local_unnamed_addr #1 { entry: %arrayidx.0 = getelementptr inbounds float, float* %b, i64 131071 %arrayidx2.0 = getelementptr inbounds float, float* %c, i64 131071 %arrayidx4.0 = getelementptr inbounds float, float* %d, i64 131071 %arrayidx7.0 = getelementptr inbounds float, float* %a, i64 131071 %indvars.iv.next.0 = add nuw nsw i64 131071, 127 br label %for.body for.cond.cleanup: ; preds = %for.body ret void for.body: ; preds = %for.body.for.body_crit_edge, %entry %indvars.iv.next.phi = phi i64 [ %indvars.iv.next.0, %entry ], [ %indvars.iv.next.1, %for.body.for.body_crit_edge ] %arrayidx7.phi = phi float* [ %arrayidx7.0, %entry ], [ %arrayidx7.1, %for.body.for.body_crit_edge ] %arrayidx4.phi = phi float* [ %arrayidx4.0, %entry ], [ %arrayidx4.1, %for.body.for.body_crit_edge ] %arrayidx2.phi = phi float* [ %arrayidx2.0, %entry ], [ %arrayidx2.1, %for.body.for.body_crit_edge ] %arrayidx.phi = phi float* [ %arrayidx.0, %entry ], [ %arrayidx.1, %for.body.for.body_crit_edge ] %0 = load float, float* %arrayidx.phi, align 4, !tbaa !2 %1 = load float, float* %arrayidx2.phi, align 4, !tbaa !2 %mul = fmul float %0, %1 %2 = load float, float* %arrayidx4.phi, align 4, !tbaa !2 %mul5 = fmul float %mul, %2 store float %mul5, float* %arrayidx7.phi, align 4, !tbaa !2 %3 = trunc i64 %indvars.iv.next.phi to i32 %cmp = icmp sgt i32 %3, -1 br i1 %cmp, label %for.body.for.body_crit_edge, label %for.cond.cleanup, !llvm.loop !10 for.body.for.body_crit_edge: ; preds = %for.body %arrayidx.1 = getelementptr inbounds float, float* %b, i64 %indvars.iv.next.phi %arrayidx2.1 = getelementptr inbounds float, float* %c, i64 %indvars.iv.next.phi %arrayidx4.1 = getelementptr inbounds float, float* %d, i64 %indvars.iv.next.phi %arrayidx7.1 = getelementptr inbounds float, float* %a, i64 %indvars.iv.next.phi %indvars.iv.next.1 = add nuw nsw i64 %indvars.iv.next.phi, 127 br label %for.body }
When you don't:
; Function Attrs: nofree norecurse nosync nounwind mustprogress define void @_Z14static_chunkedPfS_S_S_(float* nocapture %a, float* nocapture readonly %b, float* nocapture readonly %c, float* nocapture readonly %d) local_unnamed_addr #1 { entry: br label %for.body for.cond.cleanup: ; preds = %for.body ret void for.body: ; preds = %entry, %for.body %indvars.iv = phi i64 [ 131071, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv %0 = load float, float* %arrayidx, align 4, !tbaa !2 %arrayidx2 = getelementptr inbounds float, float* %c, i64 %indvars.iv %1 = load float, float* %arrayidx2, align 4, !tbaa !2 %mul = fmul float %0, %1 %arrayidx4 = getelementptr inbounds float, float* %d, i64 %indvars.iv %2 = load float, float* %arrayidx4, align 4, !tbaa !2 %mul5 = fmul float %mul, %2 %arrayidx7 = getelementptr inbounds float, float* %a, i64 %indvars.iv store float %mul5, float* %arrayidx7, align 4, !tbaa !2 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 127 %3 = trunc i64 %indvars.iv.next to i32 %cmp = icmp sgt i32 %3, -1 br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !10 }
This is one of a few differences.
Either way this seems like a bug given that clang usually doesn't require the llvm backend for a particular target. (which is why things like the target parser are always enabled) But I'm new to OpenMP so please correct me if not.
We can have that as a workaround sure.
Either way this seems like a bug given that clang usually doesn't require the llvm backend for a particular target.
I agree, something is amiss.
We will probably remove the fopenmp-simd check lines again now, but this is still something we might want to investigate, non-determinism has the tendency to come back and bite you.
Thanks David for the workaround. I agree with Johannes's comments below. It looks like a bug.
// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -O1 -fopenmp-simd -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK10
Is there a good reason to run this with -O1? Doing so makes it super sensitive to which llvm passes run and now this test fails for us now with x86 as well.
One of the O1 was introduced by 6e8248fdad5fc59306beb286a3089fe401460826 and I cannot really tell why we would need it.
@ABataev do you remember if the O1 was needed?
If not I'd suggest to remove the O1 run line or add -disable-llvm-optzns to the run line.
If it is needed, we can have a simple parallel for test with manual check lines.
Thanks. I'd like to remove the "REQUIRES: x86-registered-target", the -O1 for CHECK6,10, and regenerate the CHECK lines. Unfortunately I am seeing this mangling issue (https://bugs.llvm.org/show_bug.cgi?id=49767) when running the script. @ggeorgakoudis, how did you get past this to generate the CHECK lines in your change?
The workaround is to check if the assertion would trigger and just return from the function instead.