Page MenuHomePhabricator

[OpenCL] Allow disabling types and declarations associated with extensions
ClosedPublic

Authored by yaxunl on Jun 24 2016, 1:54 PM.

Details

Summary

Added a map to associate types and declarations with extensions.

Refactored existing diagnostic for disabled types associated with extensions and extended it to declarations for generic situation.

Fixed some bugs for types associated with extensions.

Allow users to use pragma to declare types and functions for supported extensions, e.g.

#pragma OPENCL EXTENSION the_new_extension_name : begin
// declare types and functions associated with the extension here
#pragma OPENCL EXTENSION the_new_extension_name : end

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl updated this revision to Diff 61829.Jun 24 2016, 1:54 PM
yaxunl retitled this revision from to [OpenCL] Allow user to add supported OpenCL extensions by pragma.
yaxunl updated this object.
yaxunl added a reviewer: Anastasia.
yaxunl updated this revision to Diff 69606.Aug 29 2016, 1:47 PM
yaxunl retitled this revision from [OpenCL] Allow user to add supported OpenCL extensions by pragma to [OpenCL] Allow disabling types and declarations associate with extensions.
yaxunl updated this object.

Added association of types and declarations with extensions and diagnostics. I would like to update opencl-c.h in a separate patch.

yaxunl retitled this revision from [OpenCL] Allow disabling types and declarations associate with extensions to [OpenCL] Allow disabling types and declarations associated with extensions.Aug 29 2016, 1:49 PM
yaxunl updated this object.
Anastasia requested changes to this revision.Sep 5 2016, 11:14 AM
Anastasia edited edge metadata.

Have you done any investigation regarding the compilation speed as this change adds expensive container lookups for all OpenCL declarations and function calls.

It would be nice to see some profiling information first before we go ahead with this solution...

include/clang/Basic/DiagnosticParseKinds.td
957 ↗(On Diff #69606)

Could you please add a test for this diagnostic too?

include/clang/Basic/DiagnosticSemaKinds.td
7998 ↗(On Diff #69606)

Could we unify with err_type_requires_extension?

include/clang/Basic/OpenCLImageTypes.def
20 ↗(On Diff #69606)

Any reason to use variadic macros here? Do we expect variable number of arguments?

include/clang/Basic/OpenCLOptions.h
18 ↗(On Diff #69606)

Are those two headers actually used here?

include/clang/Sema/Sema.h
8006 ↗(On Diff #69606)

are disabled -> being disabled

lib/Headers/opencl-c.h
15484 ↗(On Diff #69606)

Does this change belong here?

lib/Parse/ParsePragma.cpp
461 ↗(On Diff #69606)

Could we use PointerIntPair still but with 2 bits?

lib/Sema/Sema.cpp
1558 ↗(On Diff #69606)

Would this not be an error?

lib/Sema/SemaDecl.cpp
4797 ↗(On Diff #69606)

I am a bit concerned about the compilation speed since we will be performing expensive container lookups for each declaration here.

Have you done any profiling in this direction?

lib/Serialization/ASTReader.cpp
6937 ↗(On Diff #69606)

Is the comment no longer relevant?

This revision now requires changes to proceed.Sep 5 2016, 11:14 AM
yaxunl added a comment.Sep 8 2016, 8:01 AM

I did profiling with valgrind for the cost of checking disabled types and declarations. For a typical program, time spent in checking disabled types and declarations are less than 0.1%, so this cost can be ignored.

I did profiling with valgrind for the cost of checking disabled types and declarations. For a typical program, time spent in checking disabled types and declarations are less than 0.1%, so this cost can be ignored.

Could you please provide the profiled code example here please? As far as I understand this lookup will run for all the declarations and function calls. So we should make sure to profile various programs to see what the impact is with different use cases.

yaxunl marked 8 inline comments as done.Sep 8 2016, 11:46 AM

Uploaded the kernel used for profiling and the calgrind data. This kernel is dumped from the open source ray tracer Blender Cycles. It is a large kernel with lots of declarations.

include/clang/Basic/DiagnosticParseKinds.td
957 ↗(On Diff #69606)

added

include/clang/Basic/DiagnosticSemaKinds.td
7998 ↗(On Diff #69606)

these two error msgs have different format.

def err_type_requires_extension : Error<
 "use of type %0 requires %1 extension to be enabled">;

err_type_requires_extension has an extra type name argument.

lib/Headers/opencl-c.h
15484 ↗(On Diff #69606)

With this feature, use of image types associated with an extension requires enabling the extension. That's why this change.

lib/Parse/ParsePragma.cpp
461 ↗(On Diff #69606)

Using PointerIntPair with 2 bits requires IdentifierInfo to be aligned at 4 bytes, however this is not true. IdentifierInfo has no explicit alignment specifier, therefore it can only hold 1 bit in PointerIntPair.

lib/Sema/Sema.cpp
1558 ↗(On Diff #69606)

This function can be called grammatically to set extensions for a group of image types, e.g., the extensions associated with image types. It is more convenient to allow empty string here. If empty string is not allowed, it can be diagnosed before calling this function.

lib/Sema/SemaDecl.cpp
4797 ↗(On Diff #69606)

Profiling result shows cost of checking disabled types and declarations can be ignored for typical OpenCL kernels.

lib/Serialization/ASTReader.cpp
6937 ↗(On Diff #69606)

This change will initialize Sema with OpenCLExtensions imported from module/pch, so the original comment no longer applies.

yaxunl updated this revision to Diff 70738.Sep 8 2016, 12:34 PM
yaxunl edited edge metadata.
yaxunl marked 5 inline comments as done.

Revised by Anastasia's comments.

I have made an experiment with a simple kernel:

void foo1(void);
void foo2(void);
void foo3(void);
void foo4(void);
void foo5(void);
void foo6(void);
void foo7(void);
void foo8(void);
void foo9(void);
void foo10(void);

void test(){
  foo1();
  foo2();
  foo3();
  foo4();
  foo5();
  foo6();
  foo7();
  foo8();
  foo9();
  foo10();
}

I am using time utility of linux to measure the compile time running Clang in CL2.0 mode and average over 100 samples. It shows me around 7% overhead with your approach.

Even thought it doesn't seem to be large overhead in this test, however it would be undesirable to have any regressions in compilation time in the future Clang releases for OpenCL basic functionality. Our target should be constantly to improve on that unless we have a good reason not to. But in this case I don't feel like the reason is strong enough. Also it would be nice to study more use cases to see how the compilation time scales. Compilation time might be a concern for runtime compilations especially in embedded/constraint platform.

Based on this, one alternative I would like to propose - how about we use old mechanism for standard OpenCL extensions that come from the Spec and dynamic maps for vendor extensions? We can then have an early return to avoid the expensive checks if the maps of extensions are empty (OpenCLTypeExtMap, OpenCLDeclExtMap) or full check if they contain any elements. In this case if vendors are fine with having slight overhead of compilation time they can use the feature to add their extensions flexibly, otherwise they will have to stay with an old approach of extending Clang directly. What do you think about it? Any other proposals are welcome too!

I have made an experiment with a simple kernel:

void foo1(void);
void foo2(void);
void foo3(void);
void foo4(void);
void foo5(void);
void foo6(void);
void foo7(void);
void foo8(void);
void foo9(void);
void foo10(void);

void test(){
  foo1();
  foo2();
  foo3();
  foo4();
  foo5();
  foo6();
  foo7();
  foo8();
  foo9();
  foo10();
}

I am using time utility of linux to measure the compile time running Clang in CL2.0 mode and average over 100 samples. It shows me around 7% overhead with your approach.

Since the program is very small, it is difficult to measure the compilation time accurately. I compile the program 1000 times and measure its time in a script:

 $ cat run.sh
 run() {
 i=1
 while [[ $i -le 1000 ]]; do
   ./$1 -cc1 -emit-llvm tmp.cl
   i=$((i+1))
 done
 }
 
time -p run $1

Even so, I found large variations in the measured compilation time. I ran the script 10 times for clang before my change and I got the real time

real 8.96
real 9.01
real 8.99
real 9.07
real 9.03
real 8.99
real 9.03
real 9.01
real 8.99
real 9.01

and the average time is 9.009s.

For clang after my change, I got

real 9.06
real 9.09
real 9.10
real 9.03
real 9.05
real 9.17
real 9.08
real 9.08
real 9.07
real 9.08

And the average time is 9.081s.

The increase of compilation time is 0.8%. Considering this program consists mostly of function declarations, which emphasized the cost of evaluating disabled function declarations unrealistically. In real programs this increment in compilation time should be even smaller.

Since the increment of compilation time is less than 1% even for exaggerated cases, I think it is reasonable to accept the cost for the improved diagnostics for extensions.

If there are concerns that the newly added diagnostics may break applications which use builtin functions associated with an extension without enabling it, we can make the diagnostic msg a warning which can be turned off.

I have made an experiment with a simple kernel:

void foo1(void);
void foo2(void);
void foo3(void);
void foo4(void);
void foo5(void);
void foo6(void);
void foo7(void);
void foo8(void);
void foo9(void);
void foo10(void);

void test(){
  foo1();
  foo2();
  foo3();
  foo4();
  foo5();
  foo6();
  foo7();
  foo8();
  foo9();
  foo10();
}

I am using time utility of linux to measure the compile time running Clang in CL2.0 mode and average over 100 samples. It shows me around 7% overhead with your approach.

Since the program is very small, it is difficult to measure the compilation time accurately. I compile the program 1000 times and measure its time in a script:

 $ cat run.sh
 run() {
 i=1
 while [[ $i -le 1000 ]]; do
   ./$1 -cc1 -emit-llvm tmp.cl
   i=$((i+1))
 done
 }
 
time -p run $1

Even so, I found large variations in the measured compilation time. I ran the script 10 times for clang before my change and I got the real time

real 8.96
real 9.01
real 8.99
real 9.07
real 9.03
real 8.99
real 9.03
real 9.01
real 8.99
real 9.01

and the average time is 9.009s.

For clang after my change, I got

real 9.06
real 9.09
real 9.10
real 9.03
real 9.05
real 9.17
real 9.08
real 9.08
real 9.07
real 9.08

And the average time is 9.081s.

The increase of compilation time is 0.8%. Considering this program consists mostly of function declarations, which emphasized the cost of evaluating disabled function declarations unrealistically. In real programs this increment in compilation time should be even smaller.

Since the increment of compilation time is less than 1% even for exaggerated cases, I think it is reasonable to accept the cost for the improved diagnostics for extensions.

If there are concerns that the newly added diagnostics may break applications which use builtin functions associated with an extension without enabling it, we can make the diagnostic msg a warning which can be turned off.

Ok. Sure. I will profile a few more benchmarks and let you know.

Anastasia edited edge metadata.Sep 28 2016, 7:21 AM
Anastasia added a subscriber: svenvh.

Sam, I ran a few more tests and understood that the overhead mainly comes from extra initialization (in Sema::Initialize()). Therefore, it's more noticeable on a very small kernels. However, I agree we can probably neglect the overhead as it account for only a couple % of overall time max in Release mode. Sorry, for taking this long time.

include/clang/Basic/DiagnosticSemaKinds.td
7998 ↗(On Diff #70738)

I guess we could multiplex with select and pass an empty string as a first parameter to the diagnostic? Apart from that, the core part seems to be the same.

lib/Parse/ParsePragma.cpp
1429 ↗(On Diff #70738)

Not clear why register keyword is here too?

461 ↗(On Diff #69606)

Based on its structure with having a pointer member I think it's reasonable to assume at least 4 bytes alignment... Although this doesn't seem to affect the performance anyways.

lib/Sema/Sema.cpp
226 ↗(On Diff #70738)

Any reason for changing this too?

1558 ↗(On Diff #70738)

Could we then check for an empty string ExtStr as a first function statement instead?

test/Parser/opencl-atomics-cl20.cl
51 ↗(On Diff #70738)

Why this change?

test/SemaOpenCL/extension-begin.cl
27 ↗(On Diff #70738)

Could you please add case with typedef of a type from an extensions and also using it with qualifiers for the better coverage.

yaxunl marked 15 inline comments as done.Dec 6 2016, 8:21 AM

Sorry for the delay. I will update this patch.

lib/Parse/ParsePragma.cpp
1429 ↗(On Diff #70738)

this is old code when I use 'register' to register an extension. will remove it.

461 ↗(On Diff #69606)

You are right. The alignment of IdentifierInfo should be 8 on most systems.

However in IdentifierTable.h there is

// Provide PointerLikeTypeTraits for IdentifierInfo pointers, which
// are not guaranteed to be 8-byte aligned.
template<>
class PointerLikeTypeTraits<clang::IdentifierInfo*> {
public:
  static inline void *getAsVoidPointer(clang::IdentifierInfo* P) {
    return P;
  }
  static inline clang::IdentifierInfo *getFromVoidPointer(void *P) {
    return static_cast<clang::IdentifierInfo*>(P);
  }
  enum { NumLowBitsAvailable = 1 };
};

Due to the above code there is only 1 bit available for when using PointerIntPair with IdentifierInfo*. My guess is that alignment of IdentifierInfo is not 8 on certain systems so some one intentionally defined the above code to prevent using PointerIntPair with IdentifierInfo* for more than 1 bit.

lib/Sema/Sema.cpp
226 ↗(On Diff #70738)

To associate atomic long and double types with corresponding extensions so that when the extension is disabled, the associated types are also disabled.

1558 ↗(On Diff #70738)

will do

test/Parser/opencl-atomics-cl20.cl
51 ↗(On Diff #70738)

atomic_intptr_t etc. requires cl_khr_int64_extended_atomics only on 64 bit platforms.

This is a bug which was fixed by this patch.

test/SemaOpenCL/extension-begin.cl
27 ↗(On Diff #70738)

will do

yaxunl updated this revision to Diff 80422.Dec 6 2016, 8:24 AM
yaxunl marked 6 inline comments as done.

Bring the patch up to date.
Revised by Anastasia's comments.

Anastasia added inline comments.Dec 12 2016, 8:04 AM
include/clang/Basic/DiagnosticSemaKinds.td
8127 ↗(On Diff #80422)

Could we rename err_requires_extension -> err_opencl_requires_extension

lib/Parse/ParsePragma.cpp
461 ↗(On Diff #69606)

Not too critical, but it's certainly not 8 byte aligned for 32 bit architectures. It should although be at least 4 bytes aligned in my opinion.

This code seems to be old, and the comment is not very precise to be honest.

lib/Sema/Sema.cpp
1614 ↗(On Diff #80422)

this bit (up to line 1623) seems to be repeated in multiple places... wondering if we could use a helper function instead perhaps with the templates?

lib/Serialization/ASTReader.cpp
3170 ↗(On Diff #80422)

Can you put a comment explaining which element each line corresponds to?

yaxunl updated this revision to Diff 81142.Dec 12 2016, 2:23 PM
yaxunl marked 9 inline comments as done.

Revised by Anastasia's comments.

Anastasia added inline comments.Dec 13 2016, 9:01 AM
include/clang/Sema/Sema.h
8081 ↗(On Diff #81142)

Extsion -> extension

8120 ↗(On Diff #81142)

or declaration -> or a declaration

8121 ↗(On Diff #81142)

is disabled -> being disabled

lib/Serialization/ASTReader.cpp
3167 ↗(On Diff #81142)

Btw, OpenCLTypeExtMap and OpenCLTypeDeclMap don't have to be serialized?

test/Parser/opencl-atomics-cl20.cl
51 ↗(On Diff #70738)

The spec says:
"If the device address space is 64-bits, the data types atomic_intptr_t, atomic_uintptr_t,
atomic_size_t and atomic_ptrdiff_t are supported if the cl_khr_int64_base_atomics and
cl_khr_int64_extended_atomics extensions are supported."

This seems to be the same as long and double?

yaxunl marked 6 inline comments as done.Dec 14 2016, 2:17 PM
yaxunl added inline comments.
lib/Serialization/ASTReader.cpp
3167 ↗(On Diff #81142)

Yes. I will add it.

test/Parser/opencl-atomics-cl20.cl
51 ↗(On Diff #70738)

Yes. Use of long and double were correctly diagnosed if the corresponding extension was disabled but size_t/intptr_t/uintptr_t/ptrdiff_t was not. This patch fixes that.

yaxunl updated this revision to Diff 81471.Dec 14 2016, 2:21 PM
yaxunl marked 2 inline comments as done.

Added serialisation of types and declarations associated with extensions.
Brought the patch up to date.

Anastasia accepted this revision.Dec 16 2016, 6:44 AM
Anastasia edited edge metadata.

LGTM! Small nitpicks below can be done before committing. Also it would be nice to double check the compile time is still fine after the last rebase.

Thanks!

include/clang/Basic/OpenCLOptions.h
28 ↗(On Diff #81471)

Options -> Option

43 ↗(On Diff #81471)

Did you mean "and (optional) core feature?"

test/SemaOpenCL/extension-begin.cl
5 ↗(On Diff #81471)

Do we need -DHEADER_ONLY here?

This revision is now accepted and ready to land.Dec 16 2016, 6:44 AM
yaxunl marked 3 inline comments as done.Dec 16 2016, 10:45 AM

LGTM! Small nitpicks below can be done before committing. Also it would be nice to double check the compile time is still fine after the last rebase.

Thanks. I did the measurement again and did not see significant difference in compilation time before/after this change.

include/clang/Basic/OpenCLOptions.h
43 ↗(On Diff #81471)

My understanding is that once an extension becomes an (optional) core feature, it is no longer called an extension. So for for a specific OpenCL version, \p Ext is either is an extension, or an (optional) core feature. However this wording may be confusing. How about rewording as

Is supported as either an extension or an (optional) core feature for OpenCL version \p CLVer

test/SemaOpenCL/extension-begin.cl
5 ↗(On Diff #81471)

I will remove it.

This revision was automatically updated to reflect the committed changes.
yaxunl marked 2 inline comments as done.