MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/README
- This file was added.
HACCKernels: A Benchmark for HACC's Particle Force Kernels

The Hardware/Hybrid Accelerated Cosmology Code (HACC), a cosmology N-body code
framework, is designed to run efficiently on diverse computing architectures
and to scale to millions of cores and beyond. The gravitational force is the
only significant force between particles at cosmological scales, and, in HACC,
this force is divided into two components: a long-range component and a
short-range component. The long-range component is handled using a distributed
grid-based solver, and the short-range component is handled by more-direct
particle-particle computations. On many systems, a tree-based multipole
approximation is used to further reduce the computational complexity of the
short-range force. The innermost computation is a direct N^2 particle-particle
force calculation of the short-range part of the gravitational force. This
innermost calculation consumes most of the simulation time, is compute bound,
and is what this benchmark represents.
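
As a rough illustration of the direct N^2 particle-particle calculation
described above, the following C++ sketch accumulates the short-range
acceleration on one particle from all neighbors inside a cutoff radius. It is
not the benchmark's source: the Particles and Accel types, the function names,
the softening parameter, and the form of the force factor (a softened Newtonian
term minus a polynomial in r^2, evaluated with Horner's rule) are assumptions
made for illustration, and the polynomial coefficients are placeholders.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle storage (not the benchmark's layout).
struct Particles {
  std::vector<float> x, y, z, mass;
};

struct Accel {
  float ax, ay, az;
};

// Assumed short-range force factor: a softened 1/r^3 term minus a polynomial
// in r^2 that stands in for the part handled by the long-range solver.
// coeffs[k] multiplies (r^2)^k; the values are placeholders.
inline float shortRangeFactor(float r2, float soft2,
                              const std::vector<float> &coeffs) {
  float poly = 0.0f;
  for (std::size_t k = coeffs.size(); k-- > 0;)
    poly = poly * r2 + coeffs[k]; // Horner's rule, highest order first
  const float s = r2 + soft2;
  return 1.0f / (s * std::sqrt(s)) - poly;
}

// Direct sum over all particles j for a fixed particle i. Pairs at zero
// separation (including j == i) or at or beyond the cutoff are skipped.
Accel accumulate(const Particles &p, std::size_t i, float rmax2, float soft2,
                 const std::vector<float> &coeffs) {
  assert(i < p.x.size());
  Accel a{0.0f, 0.0f, 0.0f};
  for (std::size_t j = 0; j < p.x.size(); ++j) {
    const float dx = p.x[j] - p.x[i];
    const float dy = p.y[j] - p.y[i];
    const float dz = p.z[j] - p.z[i];
    const float r2 = dx * dx + dy * dy + dz * dz;
    if (r2 > 0.0f && r2 < rmax2) {
      const float f = p.mass[j] * shortRangeFactor(r2, soft2, coeffs);
      a.ax += f * dx;
      a.ay += f * dy;
      a.az += f * dz;
    }
  }
  return a;
}
```

Calling accumulate for every particle i gives the N^2 cost; a higher-order
kernel in this sketch simply means a longer coeffs vector.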

Because this innermost force calculation is algorithmically isolated from the
overall scale of the problem, the parameters don't need to be adjusted to
represent the workload on different machine scales (e.g., petascale or
exascale).

For more information on HACC, see:

  Salman Habib, et al. HACC: Simulating Sky Surveys on State-of-the-Art
  Supercomputing Architectures. New Astronomy, Volume 42, January 2016,
  pp. 49-65.
  http://doi.org/10.1016/j.newast.2015.06.003
  https://arxiv.org/abs/1410.2805

The benchmark can be compiled using cmake (or make directly using
Makefile.simple) and then run like this:

$ ./HACCKernels
Maximum OpenMP Threads: 1
Iterations: 2000
Gravity Short-Range-Force Kernel (4th Order): 26307.2 -122.385 -1369.32: 4.45269 s
Gravity Short-Range-Force Kernel (5th Order): 26297.5 -123.056 -1368.67: 4.51347 s
Gravity Short-Range-Force Kernel (6th Order): 26297.6 -123.225 -1368.66: 4.8256 s

The accumulated acceleration in each direction for all particles in the last
iteration, which is a function of the total number of iterations, is printed as
a diagnostic. It should be similar for all polynomial kernel orders.
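
One way to make "similar" concrete is a per-component relative comparison of
the printed triples. The sketch below is only an illustration: the Vec3 alias,
the similar helper, and the choice of tolerance are assumptions, not something
the benchmark defines.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;

// Hypothetical check: true if every component of a and b agrees to within
// relTol, measured relative to the larger magnitude of the two components.
bool similar(const Vec3 &a, const Vec3 &b, double relTol) {
  for (std::size_t d = 0; d < 3; ++d) {
    const double scale = std::max(std::fabs(a[d]), std::fabs(b[d]));
    if (std::fabs(a[d] - b[d]) > relTol * scale)
      return false;
  }
  return true;
}
```

With the sample output shown earlier, the 4th-, 5th-, and 6th-order triples
agree with one another to better than 1% in every component, so an assumed
tolerance of 0.01 would accept them.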

If you'd like the benchmark to display only deterministic output (i.e.,
omitting information on the number of threads, timing, and the like), then
define the preprocessor symbol VERIFICATION_OUTPUT_ONLY when compiling.
You can enable this option when configuring by passing
-DVERIFICATION_OUTPUT_ONLY=ON to cmake.
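
The effect of such a symbol can be sketched with a conditional-compilation
guard around the non-deterministic part of a result line. This is a minimal
illustration, not the benchmark's code: the formatResult helper and its
parameters are hypothetical, and only the symbol name VERIFICATION_OUTPUT_ONLY
comes from this README.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper that builds one result line. The acceleration
// components are always included; the timing is appended only when
// VERIFICATION_OUTPUT_ONLY is not defined at compile time.
std::string formatResult(const std::string &label, float ax, float ay,
                         float az, double seconds) {
  std::ostringstream os;
  os << label << ": " << ax << " " << ay << " " << az;
#ifndef VERIFICATION_OUTPUT_ONLY
  os << ": " << seconds << " s";
#else
  (void)seconds; // timing deliberately dropped for reproducible output
#endif
  return os.str();
}
```

Compiled without the symbol, the line ends with the elapsed time; compiled
with it defined, the same call yields only the label and the three values, so
output from repeated runs can be compared directly.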

Compared to the older HACCmk procurement benchmark
(https://asc.llnl.gov/CORAL-benchmarks/#haccmk), this benchmark:

 * More closely matches the parallelization scheme used by the production code.
 * Uses a more-realistic distribution of interaction-list lengths and
   out-of-bounds particles.
 * Includes 4th-, 5th-, and 6th-order kernels.

For more information, contact: Hal Finkel <hfinkel@anl.gov>