Page MenuHomePhabricator

Intel SVML calling conventions
Needs ReviewPublic

Authored by dvnagorny on May 22 2018, 1:55 AM.

Details

Summary

This patch fixes the problem with improper calls to SVML library as it has non-standard calling conventions.
So accordingly it has SVML calling conventions definitions and code to set CC to the vectorized calls.
As SVML provides several implementations for the math functions we also took into consideration fast attribute and select more fast implementation in such case.
This work is based on original Matt Masten's work.

Diff Detail

Event Timeline

dvnagorny created this revision.May 22 2018, 1:55 AM
lebedev.ri added inline comments.
include/llvm/IR/CallingConv.h
225

unneeded comment

lib/Analysis/TargetLibraryInfo.cpp
55

This memory allocation looks unfortunate :/

utils/TableGen/SVMLEmitter.cpp
64

This surely can be StringRef

dvnagorny added inline comments.May 22 2018, 2:23 AM
lib/Analysis/TargetLibraryInfo.cpp
55

Could you explain your comment more detailed please. svmlMangle() returns not a reference but the std::string.

lebedev.ri added inline comments.May 22 2018, 2:29 AM
lib/Analysis/TargetLibraryInfo.cpp
55

Not much to explain, that already summarizes it quite well.

It will return a std::string, not a reference,
so unless small string optimization within std::string
happens, every call to svmlMangle() will cause an allocation.

And this also caused getVectorizedFunction() to return
the std:;string, and so on.

dvnagorny added inline comments.May 22 2018, 2:36 AM
lib/Analysis/TargetLibraryInfo.cpp
55

I can't agree with you here. Beside of RVO we have move semantics for std::string now. So there shouldn't be no any extra memory allocations. Does LLVM coding standard imply anything special on move semantics?

lebedev.ri added inline comments.May 22 2018, 2:40 AM
lib/Analysis/TargetLibraryInfo.cpp
1497

Sure, here, move will happen.

1499–1500

But i don't see how it can happen here.

dvnagorny added inline comments.May 22 2018, 3:21 AM
lib/Analysis/TargetLibraryInfo.cpp
1499–1500

So as the name is dynamically constructed now it should be stored somewhere. I think it's good enough if it occures not more then once.

hfinkel added inline comments.
include/llvm/Analysis/TargetLibraryInfo.h
159–160

I think just "Ignored" is fine.

include/llvm/IR/SVML.td
54

What does this mean? Why would a _ha variant of floor, for example, be needed for vectorization? Or, to put it another way, how would a _ha variant of floor differ from the _ep version?

dvnagorny added inline comments.May 22 2018, 7:31 AM
include/llvm/Analysis/TargetLibraryInfo.h
159–160

Will fix, thank you.

include/llvm/IR/SVML.td
54

Really I don't expect any difference in behaviour, However I think that link-time weak aliases for these symbols are more preferable here. It will not require any additional logic in function name mangling code.

hfinkel added inline comments.May 22 2018, 8:40 AM
include/llvm/IR/SVML.td
54

Do these aliases exist currently? I'm concerned about the comment saying that we're disabling vectorization of floor, etc., and I'm prefer that we not disable vectorization unnecessarily.

craig.topper added inline comments.
lib/Analysis/TargetLibraryInfo.cpp
1485

This looks like it passes 80 columns.

1503

This should probably be std::string() now.

utils/TableGen/SVMLEmitter.cpp
52

Mark these 'const'?

57

This looks to be a dead variable in release builds.

63

I think this should be Records.getAllDerivedDefinitions("SvmlVariant")

dvnagorny added inline comments.May 22 2018, 9:15 AM
include/llvm/IR/SVML.td
54

Really this code will work when vector-library SVML option will be explicitly passed. So it doesn't disable any default vectorizations. On the other hand in the current LLVM state these function's vectorization isn't enabled yet. So I keep their vectorization the same. From SVML library side aliases aren't available yet however this library not the only potential source of aliases.
It's quite simple to provide them from the application code.

hfinkel added inline comments.May 22 2018, 10:57 AM
include/llvm/IR/SVML.td
54

Okay, but providing them in application code doesn't help autovectorization, does it? Passing -fveclib=SVML should enable autovectorization of calls to floor, etc. You're saying that it doesn't now, and so I can certainly see that it might be best to change that in a separate patch, but saying that we'll wait for aliases to appear in the library doesn't sound like a good plan (and doesn't help users of existing versions of the library). The vectorizer can call some version which is already there.

In short, would it make sense for the comment to read?

// TODO: SVML does not currently provide _ha varients of these fucnctions. We should call the _ep variants of these functions in all cases instead.
dvnagorny updated this revision to Diff 150612.Jun 9 2018, 2:25 AM
dvnagorny edited the summary of this revision. (Show Details)
hsaito added a subscriber: hsaito.Jul 2 2018, 5:35 PM
hsaito added inline comments.
lib/Analysis/TargetLibraryInfo.cpp
55

I have a problem for the use of "_ha" interface in the non-fast case. Unless the compiler is in a reasonably relaxed mode, I'd like vector computation and scalar computation to be bitwise-identical. "_ha" interface of SVML doesn't produce the bitwise identical result as scalar call. I'm sure there is a room for using "_ha" interface, but we need to carefully design how to enable it. By simply using SVML, I don't think the programmer gave the compiler a license to deviate from bitwise identical results. I may be too much of a paranoid about it, but there is a fair number of people who ask for bitwise identity.

dvnagorny added inline comments.Jul 4 2018, 4:02 AM
lib/Analysis/TargetLibraryInfo.cpp
55

Quite interesting problem here. AFAIK "_ha" interface provides "high accuracy" versions of functions. (1 ulp), "default" SVML interface provides less precise functions (probably 4ulp).
So it means that scalar implementation is lesser precise than vectorized one.
On the other hand I'm not sure if it is really possible to have differ libm implementations providing fully bitwise-idenctical results for the whole float or double domain within 4ulps precision.
As not exact units should differ by definition.