Page MenuHomePhabricator

[test-suite] Backprop kernel from Rodinia Benchmark

Authored by proton on Jun 11 2018, 12:57 PM.



Added Backprop kernel from Rodinia benchmark.
The kernel is modified such that Polly recognizes SCoPs in it.

With Polly: 0.40 sec
Without Polly: 0.81 sec

Diff Detail


Event Timeline

proton created this revision.Jun 11 2018, 12:57 PM

How long does the benchmark run with/without Polly?

Polly only optimizes part of the kernel. The loops containing exp calls are not. The swtich -ffast-math might help.

proton updated this revision to Diff 152379.Jun 21 2018, 2:34 PM

Added -ffast-math flag... Polly recognizes entire kernel but still no performance improvement with Polly.

Meinersbur set the repository for this revision to rOLDT svn-test-suite.
MatzeB requested changes to this revision.Jul 13 2018, 12:27 PM
  • Patches should be relative to the toplevel test-suite directory.
  • 94k of reference output seems a lot and smells like you may end up in the anti-pattern of spending most of the benchmark time in printing instead of doing calculations.
  • We currently tend to not add -ffast-math as part of the makefiles, but instead people choose it together with their other optimization flags.
  • How long does this benchmark run? (We aim for 0.5-1s on typical PC hardware); Bonus points if the duration can be changed from the commandline (no expections for matching reference outputs for different input size though).
  • srand()/rand() produce different output with different libc implementations!
This revision now requires changes to proceed.Jul 13 2018, 12:27 PM
MatzeB added inline comments.Jul 13 2018, 12:32 PM
13 ↗(On Diff #152379)

This ABS implementation doesn't work correctly for negative zeros, use fabs() which is likely to be faster anyway.

17–28 ↗(On Diff #152379)

Why not convert this to proper C99 so you can use the standardized restrict instead of the __restrict__ compiler extension. That would also allow you to declare variables on assignment instead of the noise early declarations.

153–158 ↗(On Diff #152379)

Do you need to cast the malloc results to restrict? I would expect the compiler to figure that out on it's own.

proton updated this revision to Diff 156504.Jul 20 2018, 9:01 AM

It runs for 1.1 sec for normal problem size and 0.6 sec for smaller
The main kernel function takes around 0.7-0.8 sec of 1.1 sec.
Polly detects scops here but runtime is still same


  • Removed -ffast-math flag by default
  • Clearer array declaration
  • Code Cleanup
  • differential wrt latest repository
  • Checksum of output compare instead of entire output compare
  • removed use of restrict keyword
  • added rand implemetation (glibc_compat_rand.c)
  • Added Rodinia License
proton updated this revision to Diff 156674.Jul 21 2018, 7:00 AM

The kernel runs for 0.79 sec and the complete program runs for 0.81 sec.

MatzeB accepted this revision.Aug 2 2018, 4:46 PM

test-suite integration LGTM

1–20 ↗(On Diff #156674)

this looks odd, copy&paste with smart indentation enabled?

2–8 ↗(On Diff #156674)

this looks odd, copy&paste with smart indentation enabled?

This revision is now accepted and ready to land.Aug 2 2018, 4:46 PM
proton updated this revision to Diff 158977.Aug 3 2018, 4:42 AM


  • Use glibc_compat_rand.c from Common folder of Rodinia
  • Use float * instead of the pointer to VLA in main.c
  • Changed CMakeLists.txt
Meinersbur accepted this revision.Aug 6 2018, 12:23 PM


16–18 ↗(On Diff #158977)

You could add static const restrict to those as well.

65 ↗(On Diff #158977)

[Nit] parenthesis not necessary around array name

proton updated this revision to Diff 159377.Aug 6 2018, 1:32 PM
proton edited the summary of this revision. (Show Details)

made changes requested by @Meinersbur

This revision was automatically updated to reflect the committed changes.