This is an archive of the discontinued LLVM Phabricator instance.

[flang] Add initial implementation for CPU_TIME
ClosedPublic

Authored by rovka on Jun 10 2021, 3:48 AM.

Details

Summary

Add an implementation for CPU_TIME based on std::clock(), which should
be available on all the platforms that we support.

Also add a test that's basically just a sanity check to make sure we
return positive values and that the value returned at the start of some
amount of work is larger than the one returned after the end.

Diff Detail

Event Timeline

rovka created this revision.Jun 10 2021, 3:48 AM
rovka requested review of this revision.Jun 10 2021, 3:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2021, 3:48 AM
rovka updated this revision to Diff 351358.Jun 11 2021, 1:12 AM

Added braces around if. NFCI.

jeanPerier accepted this revision.Jun 11 2021, 1:28 AM

LGTM, thanks !

flang/unittests/RuntimeGTest/Time.cpp
20

nit: other runtime test files also uses brace around single line for. Only lib/[Optmizer|Lower], and include/flang/[Optmizer|Lower] files are using llvm convention there.

This revision is now accepted and ready to land.Jun 11 2021, 1:28 AM

Please ensure that this code does not make the Fortran runtime library binary dependent on a C++ runtime library. We're trying to restrict dependences to C to make things easier for vendors and application writers to ship binaries without having to also ship one or more C++ environments alongside them.

rovka added inline comments.Jun 14 2021, 12:41 AM
flang/unittests/RuntimeGTest/Time.cpp
20

Derp, I missed that one...

Would be nice to add to the clang-tidy config, I'm trying to put together a patch for that.

This revision was automatically updated to reflect the committed changes.
rovka added a comment.EditedJun 14 2021, 12:59 AM

Please ensure that this code does not make the Fortran runtime library binary dependent on a C++ runtime library. We're trying to restrict dependences to C to make things easier for vendors and application writers to ship binaries without having to also ship one or more C++ environments alongside them.

Oops, sorry, I didn't see this comment before committing. FWIW that's why I've been using functionality from <ctime> and not std::chrono, but I'll double check and revert if we're inadvertently pulling in the C++ runtime.

rovka added a comment.Jun 14 2021, 1:11 AM

Tested with this:

> cat /tmp/cputime.c                                            
double _FortranACpuTime();

int main() {
  return _FortranACpuTime();
}
> clang /tmp/cputime.c libFortranRuntime.a -o /tmp/cputime
> gcc /tmp/cputime.c libFortranRuntime.a -o /tmp/cputime

Both seem to handle it just fine :)

Could you please address the build failure mentioned in https://reviews.llvm.org/rG57e85622bbdb2eb18cc03df2ea457019c58f6912#inline-6002:

This fails with msvc which apparently optimized the loop away:

[ RUN      ] TimeIntrinsics.CpuTime
C:\Users\buildbot-worker\minipc-ryzen-win\flang-x86_64-windows\llvm-project\flang\unittests\RuntimeGTest\Time.cpp(34): error: Expected: (end) > (start), actual: 2.000000e-03 vs 2.000000e-03
[  FAILED  ] TimeIntrinsics.CpuTime (1 ms)

I think assuming how long it takes to complete a volatile write is fragile.

Instead I suggest to test CpuTime by repeatedly calling it until it changes.

Could you please address the build failure mentioned in https://reviews.llvm.org/rG57e85622bbdb2eb18cc03df2ea457019c58f6912#inline-6002:

I am so sorry, I completely missed that thread! :(

This fails with msvc which apparently optimized the loop away:

[ RUN      ] TimeIntrinsics.CpuTime
C:\Users\buildbot-worker\minipc-ryzen-win\flang-x86_64-windows\llvm-project\flang\unittests\RuntimeGTest\Time.cpp(34): error: Expected: (end) > (start), actual: 2.000000e-03 vs 2.000000e-03
[  FAILED  ] TimeIntrinsics.CpuTime (1 ms)

I think assuming how long it takes to complete a volatile write is fragile.

Instead I suggest to test CpuTime by repeatedly calling it until it changes.

Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

rovka added a comment.Jun 18 2021, 1:15 AM

This fails with msvc which apparently optimized the loop away:

[ RUN      ] TimeIntrinsics.CpuTime
C:\Users\buildbot-worker\minipc-ryzen-win\flang-x86_64-windows\llvm-project\flang\unittests\RuntimeGTest\Time.cpp(34): error: Expected: (end) > (start), actual: 2.000000e-03 vs 2.000000e-03
[  FAILED  ] TimeIntrinsics.CpuTime (1 ms)

I think assuming how long it takes to complete a volatile write is fragile.

Instead I suggest to test CpuTime by repeatedly calling it until it changes.

Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).

It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.

Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

Correct. Change looks good, does not fail anymore.

Assuming that a look takes a minimum time to execute broke some famous program in the past. For instance, Windows 3.11, some old games, and every program compiled with Turbo Pascal.

But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).

It was just an assumption, I haven't checked the disassembly. Embedded (where volatile is important) is not really a domain for msvc so ignoring volatile looked like a reasonable explanation.

It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.

Here is the /Ox disassembly of LookBusy by msvc:

00007FF79D37F770  xor         eax,eax  
00007FF79D37F772  nop         dword ptr [rax]  
00007FF79D37F776  nop         word ptr [rax+rax]  
    x = i;
00007FF79D37F780  mov         dword ptr [x (07FF79DB49F50h)],eax  
00007FF79D37F786  inc         eax  
00007FF79D37F788  cmp         eax,100h  
00007FF79D37F78D  jl          LookBusy+10h (07FF79D37F780h)

That is, msvc actually did honor the volatile. The test actually also fails when compiling with gcc and running with WSL1 (Windows Subsystem for Linux), i.e. your clock resolution theory seems correct. Which makes the test platform-dependent.

IMHO a MicroBenchmark would be a better place (e.g. llvm-test-suite already has MicroBenchmarks, @naromero77 currently works on adding Fortran tests) than a regression test error. A too granular timer resolution could also mitigated by running the microbenchmark multiple times, as GoogleBenchmark does. If precision is important, std::chrono::high_resolution_clock would be the API to use, which maps to QueryPerformanceCounter on Windows.

rovka added a comment.Jun 21 2021, 1:25 AM

Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

Correct. Change looks good, does not fail anymore.

Assuming that a look takes a minimum time to execute broke some famous program in the past. For instance, Windows 3.11, some old games, and every program compiled with Turbo Pascal.

But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).

It was just an assumption, I haven't checked the disassembly. Embedded (where volatile is important) is not really a domain for msvc so ignoring volatile looked like a reasonable explanation.

It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.

Here is the /Ox disassembly of LookBusy by msvc:

00007FF79D37F770  xor         eax,eax  
00007FF79D37F772  nop         dword ptr [rax]  
00007FF79D37F776  nop         word ptr [rax+rax]  
    x = i;
00007FF79D37F780  mov         dword ptr [x (07FF79DB49F50h)],eax  
00007FF79D37F786  inc         eax  
00007FF79D37F788  cmp         eax,100h  
00007FF79D37F78D  jl          LookBusy+10h (07FF79D37F780h)

That is, msvc actually did honor the volatile. The test actually also fails when compiling with gcc and running with WSL1 (Windows Subsystem for Linux), i.e. your clock resolution theory seems correct. Which makes the test platform-dependent.

IMHO a MicroBenchmark would be a better place (e.g. llvm-test-suite already has MicroBenchmarks, @naromero77 currently works on adding Fortran tests) than a regression test error. A too granular timer resolution could also mitigated by running the microbenchmark multiple times, as GoogleBenchmark does. If precision is important, std::chrono::high_resolution_clock would be the API to use, which maps to QueryPerformanceCounter on Windows.

Unfortunately we can't use std::chrono for the implementation, since that would pull in the C++ runtime libraries. As far as C++ usage goes, we're limited to '#include <cstuff>'.

@rovka , I've been seeing intermittent failures running check-flang. I created an issue that has more information -- https://github.com/flang-compiler/f18-llvm-project/issues/930

rovka added a comment.Jul 27 2021, 3:05 AM

@rovka , I've been seeing intermittent failures running check-flang. I created an issue that has more information -- https://github.com/flang-compiler/f18-llvm-project/issues/930

Thanks, I'll have a look! (Sorry about the delay, I was out of office)