This is an archive of the discontinued LLVM Phabricator instance.

Solve 'Too many args to microtask' problem
ClosedPublic

Authored by pawosm01 on May 3 2016, 11:15 AM.

Details

Summary

This patch solves 'Too many args to microtask' problem which occurs
while executing lulesh2.0.3 benchmark on AArch64.

To solve this, I had to write an AArch64 assembly version of the
__kmp_invoke_microtask() function, similar to the x86 and x86_64
implementations.
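For context: on architectures without an assembly version, the invoker falls back to C code that dispatches on the argument count through a switch statement. Later in this review that switch is described as handling up to 15 parameters. An assembly version can instead push an arbitrary number of arguments at run time. The reduced model below is hypothetical (three cases instead of fifteen, invented names such as `invoke_microtask_model` and `outlined_add2`). It only illustrates why any count without a matching case ends in the error.

```c
#include <stdio.h>

/* Generic function-pointer type used only for storage; every call below
 * casts back to the exact prototype of the outlined function. */
typedef void (*generic_fn)(void);

/* Hypothetical outlined body that receives two captured pointers. */
static int model_sum;
static void outlined_add2(int *gtid, int *tid, void *a, void *b)
{
    (void)gtid;
    (void)tid;
    model_sum = *(int *)a + *(int *)b;
}

/* Dispatches on argc; this model has cases for 0..3 arguments only,
 * so any larger count is rejected with the runtime's error text. */
static int invoke_microtask_model(generic_fn fn, int gtid, int tid,
                                  int argc, void **argv)
{
    switch (argc) {
    case 0:
        ((void (*)(int *, int *))fn)(&gtid, &tid);
        break;
    case 1:
        ((void (*)(int *, int *, void *))fn)(&gtid, &tid, argv[0]);
        break;
    case 2:
        ((void (*)(int *, int *, void *, void *))fn)(&gtid, &tid, argv[0],
                                                     argv[1]);
        break;
    case 3:
        ((void (*)(int *, int *, void *, void *, void *))fn)(
            &gtid, &tid, argv[0], argv[1], argv[2]);
        break;
    default:
        fprintf(stderr, "Too many args to microtask: %d!\n", argc);
        return -1;
    }
    return 0;
}
```

Calling `invoke_microtask_model((generic_fn)outlined_add2, 0, 0, 2, args)` runs the body, while the same call with `argc` set to 4 prints the error and returns -1.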

Diff Detail

Repository
rL LLVM

Event Timeline

pawosm01 updated this revision to Diff 56032. May 3 2016, 11:15 AM
pawosm01 retitled this revision from to Solve 'Too many args to microtask' problem.
pawosm01 updated this object.
pawosm01 set the repository for this revision to rL LLVM.
pawosm01 added a subscriber: openmp-commits.
jcownie added a subscriber: jcownie. May 4 2016, 3:55 AM

On the microtask stuff: I have no objection to this, but I'm very surprised that it is needed, since I was under the impression that Clang/OpenMP only ever emits an outlined function for the parallel region that takes a single pointer argument, and then generates code that uses offsets from that to find all the actual arguments.

It also seems very odd that most code is OK, but LULESH fails. (I know some SpecOMP codes have a lot of references to shared variables...)

Have you tried test cases which simply pass large numbers of arguments?
(So, my concern here is that you may be fixing the wrong bug!)
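The single-pointer scheme described in this comment can be pictured with the hand-written sketch below. The struct, `outlined_region` and `fork_model` are hypothetical and are not Clang's actual generated code. With this layout the runtime always forwards exactly one argument, however many shared variables the region references.

```c
/* Hypothetical capture struct: one pointer per shared variable. */
struct captured_vars {
    int *a;
    double *b;
    long *c;
};

/* Outlined body: receives a single pointer no matter how many variables
 * the region references, and reaches each one through a struct field. */
static void outlined_region(int *gtid, int *tid, void *ctx_ptr)
{
    struct captured_vars *ctx = ctx_ptr;
    (void)gtid;
    (void)tid;
    *ctx->c = *ctx->a + (long)*ctx->b;
}

/* Hypothetical fork entry point: always passes exactly one argument. */
static void fork_model(void (*fn)(int *, int *, void *), void *ctx)
{
    int gtid = 0, tid = 0;
    fn(&gtid, &tid, ctx);
}
```

Adding a fourth shared variable here only adds a struct field. The call through `fork_model` keeps the same arity, so an argument-count limit in the runtime would never be reached.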

pawosm01 updated this revision to Diff 56508. May 7 2016, 12:22 PM

This needs to be rounded up
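The inline note above presumably concerns the stack area the assembly reserves for arguments that do not fit in registers. The sketch below follows the AAPCS64 rules: eight general-purpose argument registers (x0-x7), 8-byte stack slots for pointer arguments, and a 16-byte-aligned stack pointer. It assumes the outlined function is called as `pkfn(&gtid, &tid, argv[0], ..., argv[argc - 1])`. The helper name is hypothetical and this is not the code from the patch.

```c
#include <stddef.h>

#define AARCH64_GP_ARG_REGS 8   /* x0-x7 carry the first eight int/pointer args */
#define AARCH64_SLOT_BYTES 8    /* each stacked pointer argument takes 8 bytes */
#define AARCH64_STACK_ALIGN 16  /* SP must stay 16-byte aligned */

/* Hypothetical helper: bytes of outgoing stack space needed to call
 * pkfn(&gtid, &tid, argv[0], ..., argv[argc - 1]) on AArch64. */
static size_t microtask_stack_bytes(size_t argc)
{
    size_t total = argc + 2; /* &gtid and &tid come first */
    if (total <= AARCH64_GP_ARG_REGS)
        return 0;
    size_t raw = (total - AARCH64_GP_ARG_REGS) * AARCH64_SLOT_BYTES;
    /* Round up to the next multiple of 16 (a power of two). */
    return (raw + AARCH64_STACK_ALIGN - 1) &
           ~(size_t)(AARCH64_STACK_ALIGN - 1);
}
```

For the 17-argument LULESH region, 19 pointers are passed in total. Eleven of them go on the stack (88 bytes), which must be rounded up to 96 bytes to keep the stack pointer aligned.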


Hi James,

This LULESH benchmark doesn't do anything unusual. It doesn't do any explicit tasking, only implicit tasks from plain old 'parallel for' loops with 'firstprivate' clauses.
I added raise(11); at the place where "Too many args to microtask" is printed and started LULESH (built with -g) with 'ulimit -c unlimited' and 'OMP_NUM_THREADS=1'. When it crashed on the raised signal, I could analyse the dumped core file in gdb:
(gdb) bt
#0 0x000003ff8f981450 in raise () from /lib64/libpthread.so.0
#1 0x000003ff8fa298d8 in __kmp_invoke_microtask () from /home/pawosm01/llvm/lib/libomp.so
#2 0x000003ff8fa097f4 in __kmp_fork_call () from /home/pawosm01/llvm/lib/libomp.so
#3 0x000003ff8fa00048 in __kmpc_fork_call () from /home/pawosm01/llvm/lib/libomp.so
#4 0x00000000004036a0 in CalcEnergyForElems (p_new=0x3ef85120, e_new=<optimized out>, q_new=<optimized out>, bvc=<optimized out>, pbvc=<optimized out>, p_old=<optimized out>, e_old=<optimized out>, q_old=<optimized out>, compHalfStep=<optimized out>, vnewc=0x3ed53de0, work=<optimized out>, delvc=<optimized out>, e_cut=<optimized out>, q_cut=<optimized out>, emin=<optimized out>, qq_old=<optimized out>, ql_old=<optimized out>, rho0=<optimized out>, length=1056463064, regElemList=<optimized out>, compression=<optimized out>, pmin=<optimized out>, p_cut=<optimized out>, eosvmax=<optimized out>) at lulesh.cc:2145
#5 EvalEOSForElems (domain=..., vnewc=<optimized out>, numElemReg=<optimized out>, regElemList=<optimized out>, rep=<optimized out>) at lulesh.cc:2318
#6 ApplyMaterialPropertiesForElems (domain=..., vnew=<optimized out>) at lulesh.cc:2424
#7 LagrangeElements (domain=..., numElem=<optimized out>) at lulesh.cc:2463
#8 LagrangeLeapFrog (domain=...) at lulesh.cc:2656
#9 main (argc=<optimized out>, argv=<optimized out>) at lulesh.cc:2774

Line 2145 of lulesh.cc is the start of a parallel region:

#pragma omp parallel for firstprivate(length, rho0, emin, e_cut)

Inside it, 13 arrays defined outside the region are accessed, and the clause lists 4 firstprivate variables, also defined outside the region. That gives 17 variables, which matches the error message:

Running problem size 30^3 per domain until completion
Num processors: 1
Num threads: 6
Total number of elements: 27000

To run other sizes, use -s <integer>.
To run a fixed number of iterations, use -i <integer>.
To run a more or less balanced region set, use -b <integer>.
To change the relative costs of regions, use -c <integer>.
To print out progress, use -p
To write an output file for VisIt, use -v
See help (-h) for more options

Too many args to microtask: 17!
Too many args to microtask: 17!
Too many args to microtask: 17!
Too many args to microtask: 17!

I removed the uses of two of them (chosen arbitrarily as the least used ones) from the loop body, since the switch in __kmp_invoke_microtask() handles at most 15 parameters, and, as expected, the error message stopped appearing.
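The count of 17 follows if each captured variable (13 shared arrays plus 4 firstprivate scalars) reaches the runtime as its own trailing argument rather than through a single struct pointer. The model below assumes a variadic entry point that copies its trailing pointers into an array and then hands them to a C invoker limited to 15 cases. `fork_call_model` and both limits are hypothetical names, not libomp code.

```c
#include <stdarg.h>
#include <stdio.h>

#define MODEL_SWITCH_LIMIT 15   /* assumed: cases the C switch handles */
#define MODEL_MAX_COLLECT 64    /* assumed: size of the local argv buffer */

/* Hypothetical stand-in for a runtime entry point such as
 * __kmpc_fork_call(loc, argc, microtask, ...): each captured variable
 * arrives as its own trailing argument and is copied into argv. */
static int fork_call_model(int argc, ...)
{
    void *argv[MODEL_MAX_COLLECT];
    va_list ap;

    if (argc < 0 || argc > MODEL_MAX_COLLECT)
        return -1;
    va_start(ap, argc);
    for (int i = 0; i < argc; i++)
        argv[i] = va_arg(ap, void *);
    va_end(ap);

    /* A switch-based invoker cannot forward more than its case count. */
    if (argc > MODEL_SWITCH_LIMIT) {
        fprintf(stderr, "Too many args to microtask: %d!\n", argc);
        return -1;
    }
    (void)argv; /* a real runtime would now pass argv to the invoker */
    return argc;
}
```

With 15 trailing pointers the model accepts the call, and with 17 it reports the same message as the LULESH run. This is why dropping two variables from the loop body made the error disappear.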

Simple test case:

$ cat case16.c
#include <stdio.h>

int main()
{
    int i1 = 0;
    int i2 = 1;
    int i3 = 2;
    int i4 = 3;
    int i5 = 4;
    int i6 = 6;
    int i7 = 7;
    int i8 = 8;
    int i9 = 9;
    int i10 = 10;
    int i11 = 11;
    int i12 = 12;
    int i13 = 13;
    int i14 = 14;
    int i15 = 15;
    int i16 = 16;

    #pragma omp parallel for firstprivate(i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15, i16)
    for (int i = 0; i < i16; i++) {
        printf("%d\n", i + i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + i10 + i11 + i12 + i13 + i14 + i15 + i16);
    }

    return 0;
}

$ $HOME/llvm/bin/clang -Wall -O3 -fopenmp -std=c11 -o case16 case16.c -Wl,-rpath=$HOME/llvm/lib
$ OMP_NUM_THREADS=1 ./case16
Too many args to microtask: 16!
$

AndreyChurbanov accepted this revision. May 12 2016, 11:43 AM
AndreyChurbanov edited edge metadata.

LGTM.

It is still unclear, though, how this problem relates to the comment just above the "Too many args to microtask" error print in z_Linux_util.c, line 2579:

#if !(KMP_ARCH_X86 || KMP_ARCH_X86_64 || KMP_MIC)
we really only need the case with 1 argument, because CLANG always build
a struct of pointers to shared variables referenced in the outlined function

Probably something is wrong with compiler code generation for AArch64...

This revision is now accepted and ready to land. May 12 2016, 11:43 AM

Alexey,

Could you please comment on this? Is the comment inaccurate?

-Hal

----- Original Message -----

From: "Andrey Churbanov via Openmp-commits" <openmp-commits@lists.llvm.org>
To: "pawel osmialowski" <pawel.osmialowski@arm.com>, "jonathan l peyton" <jonathan.l.peyton@intel.com>, "andrey churbanov" <andrey.churbanov@intel.com>
Cc: openmp-commits@lists.llvm.org, "amara emerson" <amara.emerson@arm.com>
Sent: Thursday, May 12, 2016 1:43:41 PM
Subject: Re: [Openmp-commits] [PATCH] D19879: Solve 'Too many args to microtask' problem


Repository:

rL LLVM

http://reviews.llvm.org/D19879



This revision was automatically updated to reflect the committed changes.