This is an archive of the discontinued LLVM Phabricator instance.

Add first microbenchmarks for matrix types extensions.
Changes Planned, Public

Authored by fhahn on Jul 13 2020, 10:12 AM.

Details

Summary

This patch adds an initial set of microbenchmarks for the matrix types
extension.

Event Timeline

fhahn created this revision. Jul 13 2020, 10:12 AM
paquette added inline comments. Jul 13 2020, 10:18 AM
MicroBenchmarks/MatrixTypes/main.cpp, line 147

Why 15 and 19?

fhahn marked an inline comment as done. Jul 13 2020, 10:27 AM
fhahn added inline comments.
MicroBenchmarks/MatrixTypes/main.cpp, line 147

No particular reason; it could just as well be 17 and 13, or a similar combination around the 16-element range. The intention is to also cover some cases where the number of elements isn't a power of 2, as well as more unusual shape combinations.
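To make the shape discussion concrete, here is a minimal plain-C++ sketch of a naive multiply over compile-time dimensions, instantiated with non-power-of-2 shapes such as a 15x19 matrix times a 19x13 one. These shapes are illustrative and not necessarily the ones in main.cpp. The `Matrix` struct and `multiply` function are hypothetical stand-ins: the actual benchmarks use the Clang matrix types extension (`__attribute__((matrix_type(R, C)))`, enabled with `-fenable-matrix`), which this sketch does not use. The column-major storage is an assumption of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size matrix, stored column-major:
// element (r, c) lives at data[c * Rows + r].
template <std::size_t Rows, std::size_t Cols>
struct Matrix {
  std::array<float, Rows * Cols> data{};

  float &at(std::size_t r, std::size_t c) {
    assert(r < Rows && c < Cols);  // debug-only bounds check
    return data[c * Rows + r];
  }
  float at(std::size_t r, std::size_t c) const {
    assert(r < Rows && c < Cols);
    return data[c * Rows + r];
  }
};

// Naive (R x K) * (K x C) multiply. With shapes like 15x19 and 19x13, none of
// the loop trip counts is a power of 2, so code that vectorizes or tiles these
// loops presumably also has to handle leftover (remainder) elements.
template <std::size_t R, std::size_t K, std::size_t C>
Matrix<R, C> multiply(const Matrix<R, K> &a, const Matrix<K, C> &b) {
  Matrix<R, C> result;
  for (std::size_t c = 0; c < C; ++c)
    for (std::size_t r = 0; r < R; ++r) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < K; ++k)
        sum += a.at(r, k) * b.at(k, c);
      result.at(r, c) = sum;
    }
  return result;
}
```

For example, `multiply(Matrix<15, 19>{}, Matrix<19, 13>{})` deduces `R = 15`, `K = 19`, `C = 13` and yields a `Matrix<15, 13>`; mismatched inner dimensions fail to compile.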

SjoerdMeijer added a comment (edited). Jul 21 2020, 2:22 AM

Looks decent as an initial commit to me. Two high-level questions:

  • I haven't looked at these MicroBenchmarks yet in the test-suite, but in general it would be convenient if a benchmark also did a correctness check. Do you think there would be any value in doing that here? If so, would that be easy to add?
  • In benchmarking, stable numbers are convenient. Since the input is randomly generated, I was wondering whether there could be timing differences depending on the inputs. But I guess not here?
fhahn planned changes to this revision. Jul 30 2020, 11:07 AM

> Looks decent as an initial commit to me. Two high-level questions:

>   • I haven't looked at these MicroBenchmarks yet in the test-suite, but in general it would be convenient if a benchmark also did a correctness check. Do you think there would be any value in doing that here? If so, would that be easy to add?

Agreed, that would indeed be convenient. Let me change that.
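One way such a correctness check could look, as a minimal sketch: compute a naive reference product and compare the optimized result against it element by element with a small tolerance. The names `referenceMultiply` and `matchesReference`, the column-major layout, and the default tolerance are all hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive reference multiply, column-major: (R x K) * (K x C) -> (R x C).
// Element (row, col) of an M-row matrix is stored at index col * M + row.
std::vector<float> referenceMultiply(const std::vector<float> &a,
                                     const std::vector<float> &b,
                                     std::size_t R, std::size_t K,
                                     std::size_t C) {
  assert(a.size() == R * K && b.size() == K * C);
  std::vector<float> result(R * C, 0.0f);
  for (std::size_t c = 0; c < C; ++c)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t r = 0; r < R; ++r)
        result[c * R + r] += a[k * R + r] * b[c * K + k];
  return result;
}

// Element-wise comparison. The bound is relative for |expected| >= 1 and
// absolute (relTol) below that. A tolerance is needed because optimized code
// may reassociate or contract (FMA) operations and round differently.
// Written as !(diff <= bound) so that NaNs count as mismatches.
bool matchesReference(const std::vector<float> &actual,
                      const std::vector<float> &expected,
                      float relTol = 1e-5f) {
  if (actual.size() != expected.size())
    return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    float diff = std::fabs(actual[i] - expected[i]);
    float bound = relTol * std::fmax(std::fabs(expected[i]), 1.0f);
    if (!(diff <= bound))
      return false;
  }
  return true;
}
```

In a benchmark, the reference result would typically be computed once, outside the timed region, and checked after the timed loop so the check itself does not skew the numbers.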

>   • In benchmarking, stable numbers are convenient. Since the input is randomly generated, I was wondering whether there could be timing differences depending on the inputs. But I guess not here?

I would not expect the actual FP values to impact the throughput or latency of the floating-point units, and I don't think the public Arm Cortex tuning guides say anything about that. From what I've seen so far on the devices I have access to, the numbers are relatively stable, although there are sometimes rather large swings for individual benchmarks (e.g. +100% runtime for a single benchmark). My working theory was that this is due to system noise. If there's a real issue, I think we can address it once it appears.
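If input-dependent timing ever does become a concern, one cheap mitigation is to generate the inputs from a fixed seed so that every run sees the same data. A minimal sketch; `makeInputs`, `checkReproducible`, the seed 42 and the [-1, 1) range are hypothetical choices. Note that the output sequence of `std::mt19937` is fully specified by the C++ standard, but `std::uniform_real_distribution` is not, so the generated values may differ between standard library implementations (while staying identical across runs of the same build).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fills n floats from a fixed-seed Mersenne Twister so
// that every benchmark run sees the same inputs.
std::vector<float> makeInputs(std::size_t n, std::uint32_t seed = 42) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
  std::vector<float> values(n);
  for (float &v : values)
    v = dist(gen);
  return values;
}

// Sanity check: two calls with the same seed must produce identical data.
void checkReproducible() {
  assert(makeInputs(16) == makeInputs(16));
}
```

Keeping the seed fixed separates genuine run-to-run noise from data-dependent effects: if swings persist with identical inputs, system noise becomes the more likely explanation.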

tschuett added a subscriber: tschuett (edited). Jul 30 2020, 11:47 AM

Feel free to ignore: "On Subnormal Floating Point and Abnormal Timing"
http://www.ieee-security.org/TC/SP2015/papers-archived/6949a623.pdf

A NaN matrix times a NaN matrix will be slow.
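Following up on the subnormal and NaN point, a benchmark could verify at setup time that its inputs contain neither subnormals nor NaNs, the values flagged in the comment above as potentially slow. A minimal sketch using `std::fpclassify`; `countSlowPathValues` and `checkInputs` are hypothetical names. How much slower such values are, if at all, depends on the CPU and on whether flush-to-zero modes are enabled, so this only guards against the possibility.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts subnormal and NaN values, which may take slower paths on some
// hardware. Zeros and normal finite values are not counted.
std::size_t countSlowPathValues(const std::vector<float> &values) {
  std::size_t count = 0;
  for (float v : values) {
    int cls = std::fpclassify(v);
    if (cls == FP_SUBNORMAL || cls == FP_NAN)
      ++count;
  }
  return count;
}

// Setup-time sanity check (active only when NDEBUG is not defined).
void checkInputs(const std::vector<float> &values) {
  assert(countSlowPathValues(values) == 0 &&
         "benchmark inputs contain subnormals or NaNs");
}
```

Note that clean inputs do not rule out special values appearing in intermediate results, although for moderate matrix sizes with inputs in a range like [-1, 1) that is unlikely.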