Previously compilation of a few tests produced incorrect code. In them FP
pragmas were ignored and the resulting code contained constrained intrinsics
with rounding mode and/or exception behavior determined by command line
options only. Compiler creates correct AST for this tests, but FP options
were ignored in code generator because builder object was not set up properly.
To fix code generation builder object now is updated according to FP options
stored in CompoundStmt.
Details
- Reviewers
kpn rjmccall aaron.ballman
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thank you for looking into this! I happened to run into this same issue with #pragma float_control not behaving the way I'd expect. (FWIW, we also ran into an interesting issue where the floating point options were pushed but never popped in the TU were delayed template instantiation behaved differently than typical template instantiation.)
clang/test/CodeGen/pragma-fenv_access.cpp | ||
---|---|---|
36 | There are some extra test cases I'd like to see coverage for because there are some interesting edge cases to consider. template <typename Ty> float func1(Ty) { float f1 = 1.0f, f2 = 3.0f; return f1 + f2 * 2.0f; } #pragma float_control(precise, on, push) template float func1<int>(int); #pragma float_control(pop) #pragma float_control(precise, on, push) template <typename Ty> float func2(Ty) { float f1 = 1.0f, f2 = 3.0f; return f1 + f2 * 2.0f; } #pragma float_control(pop) template float func2<int>(int); void bar() { func1(1.1); func2(1.1); } This gets especially interesting when you think about delayed template instantiation as happens by default on Windows targets. Consider this code with the *driver level* -ffast-math flag enabled (not the cc1 option, which is different). I think that func1<int> SHOULD be precise, because the explicit instantiation is, while func1<double> SHOULD NOT be precise, because the definition is not. func2<int> SHOULD NOT be precise, because the explicit instantiation is not, while func2<double> SHOULD be precise, because the definition is. Partial specializations are a similar situation where the primary template and its related code made have different options. WDYT? |
clang/test/CodeGen/pragma-fenv_access.cpp | ||
---|---|---|
36 | Standard FP pragmas are defined only in C standard, so interaction of them with C++ specific features is actually implementation-defined. The cases presented in your example are reasonable solutions with one exception: IMO func2<int> should be precise, because its template is precise. It is equivalent to: template <typename Ty> float func2(Ty) { #pragma float_control(precise, on) float f1 = 1.0f, f2 = 3.0f; return f1 + f2 * 2.0f; } so instantiation of it would produce function with precise operations. Implementation of correct mechanism of the interaction requires substantial efforts and should be made in a separate patch, I think. In particular, we need to invent a way to associate a point of instantiation with the FPOptions in that point, so that delayed instantiation could be made with correct set of options. In this patch the change in SemaTemplateInstantiateDecl.cpp prevents from compiler crash. Without it codegen tries to create a call to constrained intrinsic in the function that do not have attribute StrictFP, because flag FEnvAccess is set at the end of translation unit in pragma-fenv_access.cpp. |
This property adheres to a function definition, so it seems to me that an explicit *instantiation* ought to preserve it from the instantiated template definition, but an explicit *specialization* ought to be independent.
i.e.
#pragma float_control(precise, on, push) template <typename Ty> float func2(Ty) { float f1 = 1.0f, f2 = 3.0f; return f1 + f2 * 2.0f; } #pragma float_control(pop) template float func2<int>(int); // precise template <> float func2<long>(long) { ... } // not precise
Interaction of pragmas and templates is implemented in https://reviews.llvm.org/D131143.
As specific interaction of FP pragmas and template instantiation was not supported, this patch is actual without additional changes.
There are some extra test cases I'd like to see coverage for because there are some interesting edge cases to consider.
This gets especially interesting when you think about delayed template instantiation as happens by default on Windows targets. Consider this code with the *driver level* -ffast-math flag enabled (not the cc1 option, which is different).
I think that func1<int> SHOULD be precise, because the explicit instantiation is, while func1<double> SHOULD NOT be precise, because the definition is not. func2<int> SHOULD NOT be precise, because the explicit instantiation is not, while func2<double> SHOULD be precise, because the definition is.
Partial specializations are a similar situation where the primary template and its related code made have different options.
WDYT?