This is an archive of the discontinued LLVM Phabricator instance.

Using MPI for Profiling Data Reduction
Needs ReviewPublic

Authored by grotdunst on Mar 9 2016, 2:44 PM.

Details

Summary

Distributed-memory systems, from small clusters to supercomputers, use MPI (http://www.mpi-forum.org/) to manage inter-process communication. On large systems, a single job might consist of millions of simultaneously-running processes. Even though millions of processes are rare, tens of thousands are now common. Profiling such applications using our current infrastructure is difficult because it requires generating, and then merging, many thousands of files. On the distributed file systems often used on these machines, creating so many files often leads to performance problems (in addition to being difficult to manage). This problem can be eliminated by teaching compiler-rt how to aggregate the profiling counters so that only one process writes out the combined profiling data for all processes. This patch adds this functionality whenever compiler-rt is compiled with an available mpi.h header.

Any well-formed MPI program calls MPI_Finalize to shut down its communication resources, making them unavailable by the time atexit handlers are called. By design, all MPI functions used by MPI applications are provided as weak symbols. The strong symbols are provided using PMPI_* names. Thus, MPI_Finalize, by default, is a weak function calling PMPI_Finalize. This allows tools to easily hook the MPI functions used by applications while easily retaining the ability to access the underlying implementation. This is utilized within the new file InstrProfilingReduce.c, where each processor’s counters are aggregated onto just one processor before terminating the parallel environment.

The order in which libraries are included on Clang’s command line was leveraged to ensure compatibility across all applications and computing environments. Programs that include MPI functions will be linked with the actual MPI library, since compiler-rt libraries are linked last. However, ones that do not will simply link using the stub function implementations in InstrProfilingStub.c.

To accomplish the goal of limiting the file-writing to one process (for applications that properly call MPI_Finalize), we needed to change the way that the truncation of existing files works. Currently, we use fopen to force the truncation of any pre-existing profiling output at application startup, and then later open the file to actually write the collected profiling data. When using MPI, we only want the process identified as MPI rank 0 to create any profiling outputs. This cannot be determined at startup, and so we need to delay the initial creation of the profiling outputs until after MPI_Finalize is called. To do this, we change the current behavior (at least on POSIX systems) so that existing profiling outputs are truncated only if they already exist (i.e. the file is opened without O_CREAT). If an output does not exist, we don't create an empty file, but rather, create the file later when it is opened for writing. This seems like a generally-beneficial change for all systems.

Lastly, the testing file instrprof-reduce.c was created to test that only processes assigned to rank 0 produce a profiling file when MPI_Finalize is called. Tests were also done to ensure that profiling counts increased proportionally with the number of processes involved in a program.

Diff Detail

Event Timeline

grotdunst updated this revision to Diff 50199.Mar 9 2016, 2:44 PM
grotdunst retitled this revision from to Using MPI for Profiling Data Reduction.
grotdunst updated this object.
grotdunst added a subscriber: llvm-commits.
davidxl edited edge metadata.Mar 9 2016, 3:12 PM
davidxl added a subscriber: davidxl.

Nice! This is the complement of the in-process profile merging (for single
node or distributed build with network file system) I plan to add. I will
get to the actual review shortly.

thanks,

David

davidxl added inline comments.Mar 9 2016, 3:55 PM
lib/profile/InstrProfilingFile.c
29

is fcntl.h needed here?

29

This seems like a good variable to have unconditionally. Put the declaration in InstrProfiling.h file.

96–97

The behavior divergence between windows and other platforms is not desirable. How about using

_open(Filename, _O_WRONLY | _O_TRUNC)
...

107

How about just calling truncate(..) method?

lib/profile/InstrProfilingReduce.c
15 ↗(On Diff #50199)

Define this variable unconditionally in InstrProfiling.c file.

24 ↗(On Diff #50199)

Notice that value profile data is not 'reduced' here. It can be non-trivial to do so, but please add a comment about the limitation.

30 ↗(On Diff #50199)

Should return be used for Rank !=0 cases too?

lib/profile/InstrProfilingStub.c
12 ↗(On Diff #50199)

guard these with #ifdef HAVE_MPI_H?

14 ↗(On Diff #50199)

Is the MPI library defining thse functions static or a DSO? If the former, then perhaps declare those stubs weak so that we don't rely on the link order of the runtime library

test/profile/instrprof-reduce.c
17

when HAVE_MPI_H is not defined, will this test case link? MPI_Finalize is not defined anywhere.

silvas added a subscriber: silvas.Mar 9 2016, 4:35 PM
silvas added inline comments.
lib/profile/InstrProfilingFile.c
84

Let's split this change into a separate patch for a focused discussion on the requirements and possible solutions.

lib/profile/InstrProfilingReduce.c
1 ↗(On Diff #50199)

Let's call this InstrProfilingMPI.c consistent with the comment.

lib/profile/InstrProfilingStub.c
14 ↗(On Diff #50199)

Why do we need these stubs?

hfinkel added inline comments.Mar 9 2016, 5:07 PM
lib/profile/InstrProfilingReduce.c
30 ↗(On Diff #50199)

It is used in both cases, however this is not visually obvious because the else statement has no braces.

Please add braces around the else body (for consistency with the if body).

lib/profile/InstrProfilingStub.c
14 ↗(On Diff #50199)

I'm not sure defining these as weak solves the dependency-on-link-order problem because, except for PMPI_Finalize, they're weak in the actual MPI library too.

Also, to be more specific, we're depending on the fact that the linker won't use any symbols from this object file because nothing in this object file is required (when there is an MPI implementation that also provides these symbols). This might be a link order effect, or it could just be the fact that the MPI library is required to resolve other symbols, I'm not sure which.

davidxl added inline comments.Mar 9 2016, 5:20 PM
lib/profile/InstrProfilingReduce.c
30 ↗(On Diff #50199)

Right -- my eyes certainly fooled me in this case.

grotdunst marked 10 inline comments as done.Mar 10 2016, 12:26 PM
grotdunst added inline comments.
lib/profile/InstrProfilingFile.c
29

It seems to be needed for file truncation in Windows. Therefore, this has been kept, but inside the _MSC_VER guard.

84

I decided to not separate the patches because file truncation and profiling data reduction are still logically connected. The __llvm_write_prof_data variable is used within this file, but its initialization needs to be kept in the new file that hooks to MPI_Finalize so that the new implementation and function hooks will be used by the linker.

lib/profile/InstrProfilingReduce.c
15 ↗(On Diff #50199)

This change was not made, because otherwise the linker will not be forced to use this file, thereby bringing in the expanded implementation of MPI_Finalize and the new stub functions.

test/profile/instrprof-reduce.c
17

A stub implementation of MPI_Finalize has now been included that will be used when HAVE_MPI_H is not defined.

grotdunst updated this revision to Diff 50336.Mar 10 2016, 12:49 PM
grotdunst edited edge metadata.
grotdunst marked an inline comment as done.

Addressed comments from yesterday's patch.

grotdunst updated this revision to Diff 50360.Mar 10 2016, 2:37 PM

Errors were found when putting stubs in same file as hook to MPI_Finalize. Putting them in their own file removed the errors.

davidxl added inline comments.Mar 10 2016, 3:39 PM
lib/profile/InstrProfiling.h
16

Make the name conform to the 'name space' convention:

--> __llvm_profile_write_data

lib/profile/InstrProfilingMPI.c
43

Why not put the stub def in the test case itself?

#ifndef HAVE_MPI_H
...
#endif

lib/profile/InstrProfilingStubs.c
14

what are the issues of merging this into InstrProfilingMPI.c?

test/profile/instrprof-reduce.c
16

With the stub definition for MPI_Finalize when HAVE_MPI_H is not defined, this test case will link, but the run will still fail -- so it will won't work.

I suggest adding the stub definition of MPI_Finalize like this:

... MPI_Finalize() {

 if (ThisRank != 0) {
  __llvm_write_prof_data = 0;
 }
return 0;

}

Any update on this patch?

No not in the past two weeks. I've had graduation and a job interview in
that time but I should be able to refocus on this today.

dexonsmith resigned from this revision.Oct 16 2020, 7:38 PM