This is an archive of the discontinued LLVM Phabricator instance.

Differential D104019

[flang] Add initial implementation for CPU_TIME
ClosedPublic

Authored by rovka on Jun 10 2021, 3:48 AM.

Details

Reviewers

klausler
jeanPerier
sscalpone

Commits

rG57e85622bbdb: [flang] Add initial implementation for CPU_TIME

Summary

Add an implementation for CPU_TIME based on std::clock(), which should
be available on all the platforms that we support.

Also add a test that's basically just a sanity check to make sure we
return positive values and that the value returned at the start of some
amount of work is smaller than the one returned after the end.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rovka created this revision. Jun 10 2021, 3:48 AM

Herald added a reviewer: sscalpone. Jun 10 2021, 3:48 AM

Herald added subscribers: jdoerfert, mgorny.

rovka requested review of this revision. Jun 10 2021, 3:48 AM

Herald added a project: Restricted Project. Jun 10 2021, 3:48 AM

Herald added a subscriber: llvm-commits.

rovka added a child revision: D104020: [flang] Add POSIX implementation for CPU_TIME. Jun 10 2021, 4:11 AM

Harbormaster completed remote builds in B108584: Diff 351118. Jun 10 2021, 4:20 AM

Added braces around if. NFCI.

rovka mentioned this in D104020: [flang] Add POSIX implementation for CPU_TIME. Jun 11 2021, 1:14 AM

LGTM, thanks!

flang/unittests/RuntimeGTest/Time.cpp
20	nit: other runtime test files also use braces around single-line for loops. Only lib/[Optimizer|Lower] and include/flang/[Optimizer|Lower] files use the LLVM convention there.

This revision is now accepted and ready to land. Jun 11 2021, 1:28 AM

Harbormaster completed remote builds in B108757: Diff 351358. Jun 11 2021, 1:50 AM

Please ensure that this code does not make the Fortran runtime library binary dependent on a C++ runtime library. We're trying to restrict dependences to C to make things easier for vendors and application writers to ship binaries without having to also ship one or more C++ environments alongside them.

rovka added inline comments. Jun 14 2021, 12:41 AM

flang/unittests/RuntimeGTest/Time.cpp
20	Derp, I missed that one... Would be nice to add to the clang-tidy config, I'm trying to put together a patch for that.

Closed by commit rG57e85622bbdb: [flang] Add initial implementation for CPU_TIME (authored by rovka). Jun 14 2021, 12:52 AM

This revision was automatically updated to reflect the committed changes.

rovka added a commit: rG57e85622bbdb: [flang] Add initial implementation for CPU_TIME.

In D104019#2813777, @klausler wrote:

Please ensure that this code does not make the Fortran runtime library binary dependent on a C++ runtime library. We're trying to restrict dependences to C to make things easier for vendors and application writers to ship binaries without having to also ship one or more C++ environments alongside them.

Oops, sorry, I didn't see this comment before committing. FWIW that's why I've been using functionality from <ctime> and not std::chrono, but I'll double check and revert if we're inadvertently pulling in the C++ runtime.

Tested with this:

> cat /tmp/cputime.c                                            
double _FortranACpuTime();

int main() {
  return _FortranACpuTime();
}
> clang /tmp/cputime.c libFortranRuntime.a -o /tmp/cputime
> gcc /tmp/cputime.c libFortranRuntime.a -o /tmp/cputime

Both seem to handle it just fine :)

Could you please address the build failure mentioned in https://reviews.llvm.org/rG57e85622bbdb2eb18cc03df2ea457019c58f6912#inline-6002:

This fails with msvc which apparently optimized the loop away:

[ RUN      ] TimeIntrinsics.CpuTime
C:\Users\buildbot-worker\minipc-ryzen-win\flang-x86_64-windows\llvm-project\flang\unittests\RuntimeGTest\Time.cpp(34): error: Expected: (end) > (start), actual: 2.000000e-03 vs 2.000000e-03
[  FAILED  ] TimeIntrinsics.CpuTime (1 ms)

I think assuming how long it takes to complete a volatile write is fragile.

Instead I suggest to test CpuTime by repeatedly calling it until it changes.
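
The suggestion above can be sketched roughly as follows. This is only an illustration, not the change that was eventually committed: CpuTimeSketch is a hypothetical stand-in for RTNAME(CpuTime) built directly on std::clock(), and CpuTimeAdvances and CheckCpuTime are invented helper names. Nothing bounds the polling loop, so a clock that never advances would only be caught by the test harness timeout.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for RTNAME(CpuTime), assuming the std::clock()
// based implementation from this revision.
double CpuTimeSketch() {
  std::clock_t timestamp{std::clock()};
  if (timestamp != static_cast<std::clock_t>(-1)) {
    return static_cast<double>(timestamp) / CLOCKS_PER_SEC;
  }
  return -1.0; // negative value signals failure
}

// Polls the clock until it reports a value different from `start`.
// The polling itself burns CPU time, so no separate busy loop is needed.
bool CpuTimeAdvances(double start) {
  double end{start};
  while (end == start) {
    end = CpuTimeSketch();
  }
  return end > start;
}

// Smoke check in the spirit of the unit test: a valid start value,
// followed by a strictly larger value once the clock ticks.
void CheckCpuTime() {
  double start{CpuTimeSketch()};
  assert(start >= 0.0);
  assert(CpuTimeAdvances(start));
}
```

Compared with timing a fixed amount of work, this makes no assumption about how long any particular loop takes relative to the timer resolution.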

In D104019#2825717, @Meinersbur wrote:

Could you please address the build failure mentioned in https://reviews.llvm.org/rG57e85622bbdb2eb18cc03df2ea457019c58f6912#inline-6002:

I am so sorry, I completely missed that thread! :(

This fails with msvc which apparently optimized the loop away:
[ RUN      ] TimeIntrinsics.CpuTime
C:\Users\buildbot-worker\minipc-ryzen-win\flang-x86_64-windows\llvm-project\flang\unittests\RuntimeGTest\Time.cpp(34): error: Expected: (end) > (start), actual: 2.000000e-03 vs 2.000000e-03
[  FAILED  ] TimeIntrinsics.CpuTime (1 ms)
I think assuming how long it takes to complete a volatile write is fragile.

Instead I suggest to test CpuTime by repeatedly calling it until it changes.

Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

In D104019#2826366, @rovka wrote:
Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).

It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.

In D104019#2826366, @rovka wrote:

Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

Correct. Change looks good, does not fail anymore.

Assuming that a loop takes a minimum time to execute broke some famous programs in the past. For instance, Windows 3.11, some old games, and every program compiled with Turbo Pascal.

But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).

It was just an assumption, I haven't checked the disassembly. Embedded (where volatile is important) is not really a domain for msvc so ignoring volatile looked like a reasonable explanation.

It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.

Here is the /Ox disassembly of LookBusy by msvc:

00007FF79D37F770  xor         eax,eax  
00007FF79D37F772  nop         dword ptr [rax]  
00007FF79D37F776  nop         word ptr [rax+rax]  
    x = i;
00007FF79D37F780  mov         dword ptr [x (07FF79DB49F50h)],eax  
00007FF79D37F786  inc         eax  
00007FF79D37F788  cmp         eax,100h  
00007FF79D37F78D  jl          LookBusy+10h (07FF79D37F780h)

That is, msvc actually did honor the volatile. The test actually also fails when compiling with gcc and running with WSL1 (Windows Subsystem for Linux), i.e. your clock resolution theory seems correct. Which makes the test platform-dependent.

IMHO a MicroBenchmark would be a better place for this than a regression test error (e.g. llvm-test-suite already has MicroBenchmarks, and @naromero77 is currently working on adding Fortran tests). A too coarse timer resolution could also be mitigated by running the microbenchmark multiple times, as GoogleBenchmark does. If precision is important, std::chrono::high_resolution_clock would be the API to use, which maps to QueryPerformanceCounter on Windows.
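
One way a resolution check of the kind discussed here could look, whether it ends up in a unit test or a microbenchmark, is sketched below. All names (ObservedClockStep, ResolutionIsGoodEnough) and the idea of passing the threshold as a parameter are assumptions made for illustration; the thread does not settle on a definition of "good enough". The sketch measures a single observed increment of std::clock(), which on a coarse timer is the step that made the original test fail.

```cpp
#include <ctime>

// Returns one observed increment of std::clock() in seconds, or -1.0 if
// the clock is unavailable. Illustrative helper, not part of the runtime.
double ObservedClockStep() {
  const std::clock_t failure{static_cast<std::clock_t>(-1)};
  std::clock_t first{std::clock()};
  if (first == failure) {
    return -1.0;
  }
  // Spin until the clock ticks once, so measurement starts at a tick edge.
  std::clock_t edge{first};
  while (edge == first) {
    edge = std::clock();
  }
  if (edge == failure) {
    return -1.0;
  }
  // Spin until the next tick; the difference is one observed step.
  std::clock_t next{edge};
  while (next == edge) {
    next = std::clock();
  }
  if (next == failure) {
    return -1.0;
  }
  return static_cast<double>(next - edge) / CLOCKS_PER_SEC;
}

// `maxStepSeconds` is a caller-chosen threshold (an assumption here).
bool ResolutionIsGoodEnough(double maxStepSeconds) {
  double step{ObservedClockStep()};
  return step > 0.0 && step <= maxStepSeconds;
}
```

A single sample can overestimate the step if the process is descheduled between reads, so a real check would probably take the minimum over several samples.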

In D104019#2829152, @Meinersbur wrote:

IMHO a MicroBenchmark would be a better place for this than a regression test error (e.g. llvm-test-suite already has MicroBenchmarks, and @naromero77 is currently working on adding Fortran tests). A too coarse timer resolution could also be mitigated by running the microbenchmark multiple times, as GoogleBenchmark does. If precision is important, std::chrono::high_resolution_clock would be the API to use, which maps to QueryPerformanceCounter on Windows.

Unfortunately we can't use std::chrono for the implementation, since that would pull in the C++ runtime libraries. As far as C++ usage goes, we're limited to '#include <cstuff>'.
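
For illustration, a higher-resolution variant that stays within C-level APIs might look like the sketch below. It assumes a POSIX system that provides clock_gettime with CLOCK_PROCESS_CPUTIME_ID and falls back to std::clock() elsewhere. CpuTimeCOnly is a hypothetical name, and whether D104020 takes exactly this approach is not shown on this page.

```cpp
#include <ctime>
#include <time.h> // C header; declares clock_gettime on POSIX systems

// Hypothetical sketch: CPU time in seconds using only C library calls.
double CpuTimeCOnly() {
#if defined(CLOCK_PROCESS_CPUTIME_ID)
  // POSIX per-process CPU-time clock with nanosecond units.
  struct timespec ts;
  if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) == 0) {
    return static_cast<double>(ts.tv_sec) +
        static_cast<double>(ts.tv_nsec) * 1.0e-9;
  }
#endif
  // Portable fallback, as in this revision.
  std::clock_t timestamp{std::clock()};
  if (timestamp != static_cast<std::clock_t>(-1)) {
    return static_cast<double>(timestamp) / CLOCKS_PER_SEC;
  }
  // Return some negative value to represent failure.
  return -1.0;
}
```

Neither branch uses anything from the C++ standard library beyond the C headers, which is the constraint raised earlier in the review.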

@rovka , I've been seeing intermittent failures running check-flang. I created an issue that has more information -- https://github.com/flang-compiler/f18-llvm-project/issues/930

In D104019#2891564, @PeteSteinfeld wrote:

@rovka , I've been seeing intermittent failures running check-flang. I created an issue that has more information -- https://github.com/flang-compiler/f18-llvm-project/issues/930

Thanks, I'll have a look! (Sorry about the delay, I was out of office)

@rovka , sorry for not keeping this up to date. @schweitz fixed this with https://github.com/flang-compiler/f18-llvm-project/pull/944.

Revision Contents

Path	Size
flang/runtime/CMakeLists.txt	1 line
flang/runtime/time-intrinsic.cpp	32 lines
flang/unittests/RuntimeGTest/CMakeLists.txt	1 line
flang/unittests/RuntimeGTest/Time.cpp	35 lines

Diff 351791

flang/runtime/CMakeLists.txt

@@ add_flang_library(FortranRuntime
   numeric.cpp
   random.cpp
   reduction.cpp
   product.cpp
   stat.cpp
   stop.cpp
   sum.cpp
   terminator.cpp
+  time-intrinsic.cpp
   tools.cpp
   transformational.cpp
   type-code.cpp
   unit.cpp
   unit-map.cpp

   LINK_LIBS
   FortranDecimal
 )

flang/runtime/time-intrinsic.cpp

This file was added.

//===-- runtime/time-intrinsic.cpp ----------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

// Implements time-related intrinsic subroutines.

#include "time-intrinsic.h"

#include <ctime>

namespace Fortran::runtime {
extern "C" {

// CPU_TIME (Fortran 2018 16.9.57)
double RTNAME(CpuTime)() {
  // This is part of the c++ standard, so it should at least exist everywhere.
  // It probably does not have the best resolution, so we prefer other
  // platform-specific alternatives if they exist.
  std::clock_t timestamp{std::clock()};
  if (timestamp != std::clock_t{-1}) {
    return static_cast<double>(timestamp) / CLOCKS_PER_SEC;
  }

  // Return some negative value to represent failure.
  return -1.0;
}
} // extern "C"
} // namespace Fortran::runtime

flang/unittests/RuntimeGTest/CMakeLists.txt

 add_flang_unittest(FlangRuntimeTests
   CharacterTest.cpp
   CrashHandlerFixture.cpp
   Format.cpp
   ListInputTest.cpp
   Matmul.cpp
   MiscIntrinsic.cpp
   Namelist.cpp
   Numeric.cpp
   NumericalFormatTest.cpp
   Random.cpp
   Reduction.cpp
   RuntimeCrashTest.cpp
+  Time.cpp
   Transformational.cpp
 )

 target_link_libraries(FlangRuntimeTests
   PRIVATE
   FortranRuntime
 )

flang/unittests/RuntimeGTest/Time.cpp

This file was added.

//===-- flang/unittests/RuntimeGTest/Time.cpp -----------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "gtest/gtest.h"
#include "../../runtime/time-intrinsic.h"

using namespace Fortran::runtime;

volatile int x = 0;

void LookBusy() {
  // We're trying to track actual processor time, so sleeping is not an option.
  // Doing some writes to a volatile variable should do the trick.
  for (int i = 0; i < (1 << 8); ++i) {
    x = i;
  }
}

TEST(TimeIntrinsics, CpuTime) {
  // We can't really test that we get the "right" result for CPU_TIME, but we
  // can have a smoke test to see that we get something reasonable on the
  // platforms where we expect to support it.
  double start = RTNAME(CpuTime)();
  LookBusy();
  double end = RTNAME(CpuTime)();

  ASSERT_GE(start, 0.0);
  ASSERT_GT(end, 0.0);
  ASSERT_GT(end, start);
}