|``sspstrong``|
|resulting function will have an ``sspstrong`` attribute.||resulting function will have an ``sspstrong`` attribute.|
|This attribute indicates that the function was called from a scope that||This attribute indicates that the function was called from a scope that|
|requires strict floating-point semantics. LLVM will not attempt any||requires strict floating-point semantics. LLVM will not attempt any|
|optimizations that require assumptions about the floating-point rounding||optimizations that require assumptions about the floating-point rounding|
|mode or that might alter the state of floating-point status flags that||mode or that might alter the state of floating-point status flags that|
|might otherwise be set or cleared by calling this function. LLVM will||might otherwise be set or cleared by calling this function. LLVM will|
|not introduce any new floating-point instructions that may trap.||not introduce any new floating-point instructions that may trap.|
|This indicates the denormal (subnormal) handling that may be assumed|
|for the default floating-point environment. This may be one of|
|``"ieee"``, ``"preserve-sign"``, or ``"positive-zero"``. If this|
|attribute is not specified, the default is ``"ieee"``. If the|
|mode is ``"preserve-sign"`` or ``"positive-zero"``, denormal|
|outputs may be flushed to zero by standard floating-point|
|operations. It is not mandated that flushing to zero occurs, but if|
|a denormal output is flushed to zero, it must respect the sign|
|mode. Not all targets support all modes. While this indicates the|
arsenm: On second thought, I think this may be too permissive. Based on the use in DAGCombiner, I think flushing of outputs is compulsory.
arsenm: It turns out the fast sqrt usage really cares about input denormals being implicitly treated as 0, not about output flushing (i.e. this only needs DAZ, not FTZ). I think being permissive on the output is OK, but if implicit input flushing is required then it's compulsory, and a target is responsible for inserting a flush of some kind if the using instruction isn't known to follow this mode.
Because of this, I do think it's necessary to treat this as two separate modes. I'm thinking of a comma-separated output-mode,input-mode pair, assuming input-mode=output-mode if the second half isn't specified, for compatibility with the existing attribute.
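The comma-separated form proposed in the comment above could be parsed roughly as follows. This is a minimal Python sketch of the proposal only, not LLVM's implementation: `parse_denormal_mode` and `VALID_MODES` are hypothetical names, and the fallback of the input mode to the output mode mirrors the compatibility rule described in the comment.

```python
from typing import Tuple

# Hypothetical model of the proposed "output-mode,input-mode" attribute value.
VALID_MODES = {"ieee", "preserve-sign", "positive-zero"}


def parse_denormal_mode(value: str) -> Tuple[str, str]:
    """Return (output_mode, input_mode) for an attribute string."""
    output_mode, sep, input_mode = value.partition(",")
    if not sep:
        # Only one mode given: assume the input mode equals the output mode,
        # which keeps the existing single-mode attribute strings valid.
        input_mode = output_mode
    for mode in (output_mode, input_mode):
        if mode not in VALID_MODES:
            raise ValueError(f"unknown denormal mode: {mode!r}")
    return output_mode, input_mode
```

Under this reading, a DAZ-only requirement such as the fast sqrt case would be written as `"ieee,preserve-sign"`, while an existing value like `"preserve-sign"` keeps its current meaning for both inputs and outputs.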
|expected floating point mode the function will be executed with,|
|this does not make any attempt to ensure the mode is|
|consistent. User or platform code is expected to set the floating|
scanon: Can you clarify this a little bit? I'd prefer something like "Same as `"denormal-fp-math"`, but only controls the behavior of the 32-bit float type."
|point mode appropriately before function entry.|
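The sign rule in the attribute text above can be illustrated with a small model. This is a hedged Python sketch, not LLVM code: `flush_denormal` is a hypothetical helper, it models the case where a flush actually happens (the text does not mandate one), and it works on doubles because Python floats are IEEE doubles.

```python
import math
import sys

# Smallest positive normal double; anything nonzero below this is denormal.
MIN_NORMAL = sys.float_info.min


def flush_denormal(x: float, mode: str) -> float:
    """Model the result when a denormal value is flushed under `mode`."""
    if mode not in ("ieee", "preserve-sign", "positive-zero"):
        raise ValueError(f"unknown denormal mode: {mode!r}")
    is_denormal = (
        not math.isnan(x) and x != 0.0 and abs(x) < MIN_NORMAL
    )
    if mode == "ieee" or not is_denormal:
        return x  # IEEE mode and non-denormal values are left untouched
    if mode == "preserve-sign":
        return math.copysign(0.0, x)  # zero carrying the sign of x
    return 0.0  # "positive-zero": always +0.0
```

For a negative denormal such as `-5e-324`, the model returns `-0.0` under `"preserve-sign"` and `+0.0` under `"positive-zero"`, which is the distinction the sign mode is meant to capture.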
andrew.w.kaylor: Can you document which targets do support the option? What happens if I try to use the option on a target where it is not supported?
arsenm: I'm not sure where to document this, or if/how/where to diagnose it. I don't think the high-level LangRef description is the right place to discuss specific target handling.
Currently it won't error or anything. Code checking the denormal mode will see the f32-specific mode, even if the target in the end isn't really going to respect it.
One problem is that this potentially requires coordination with other toolchain components. For AMDGPU, the compiler can directly tell the driver what FP mode to set on each entry point, but for x86 it requires linking in crtfastmath to set the default mode bits. If another target had a similar runtime environment requirement, I don't think we can be sure whether the attribute is correct.
andrew.w.kaylor: There is precedent for describing target-specific behavior in LangRef. It just doesn't seem useful to say that not all targets support the attribute without saying which ones do. We should also say what is expected if a target doesn't support the attribute. It seems reasonable for the function attribute to be silently ignored.
This is a point I'm interested in. I don't like the current crtfastmath.o handling. It feels almost accidental when FTZ works as expected. My understanding is that we link crtfastmath.o if we find it, but if not, everything just goes about its business. The Intel compiler injects code into main() to explicitly set the FTZ/DAZ control modes. That obviously has problems too, but it's at least consistent and predictable. As I understand it, crtfastmath.o sets these modes from a static initializer, but I'm not sure anything is done to determine the order of that initializer relative to others.
How does the compiler identify entry points for AMDGPU? And does it emit code to set FTZ based on the function attribute here?
arsenm: The entry points are identified by a specific calling convention. There's no real concept of main. Each kernel has an associated blob of metadata that the driver uses to set up various config registers on dispatch.
I don't think specially recognizing main in the compiler is fundamentally different from having it done in a static constructor. It's still a construct not associated with any particular function or anything.
andrew.w.kaylor: The problem with having it done in a static constructor is that you have no certainty of when it will be done relative to other static constructors. If it's in main, you can at least say that it's after all the static constructors (assuming main is your entry point).
cameron.mcinally: Yes and no. The linker should honor static constructor priorities. But, yeah, there's no guarantee that this constructor will run before other priority 101 constructors.
The performance penalty for setting denormal flushing in main could be significant (think C++). Also, there's precedent for using static constructors, like GCC's crtfastmath.o.
andrew.w.kaylor: Fair enough. I don't necessarily like how icc handles this. I don't have a problem with how gcc handles it. I just really don't like how LLVM does it. If we want to take the static constructor approach we should define our own, not depend on whether or not the GNU object file happens to be around.
Static initialization doesn't help for AMDGPU, and I suppose that's likely to be the case for any offload execution model. Since this patch is moving us toward a more consistent implementation, I'm wondering if we can define some general rules for how this is supposed to work, such as when the function attribute will result in injected instructions setting the control flags and when it won't.
arsenm: I think the most we can expect of this attribute is that it informs codegen of the expected FP denormal handling mode; it is not responsible for ensuring the mode will really be set. AMDGPU conceptually could have a separate set of attributes for setting the denormal FP mode, but since it would look identical, this gets a bonus usage for setting it for kernels. This doesn't protect you from calling functions in modules compiled with different attributes, so similar problems outside the view of the compiler still exist.
cameron.mcinally: > If we want to take the static constructor approach we should define our own, not depend on…
That's a good idea. There are subtle differences between targets in the GNU implementation. It would be good to standardize them.
|Same as ``"denormal-fp-math"``, but only controls the behavior of|
|the 32-bit float type (or vectors of 32-bit floats). If both|
|are present, this overrides ``"denormal-fp-math"``. Not all targets|
|support separately setting the denormal mode per type, and no|
|attempt is made to diagnose unsupported uses. Currently this|
|attribute is respected by the AMDGPU and NVPTX backends.|
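The override rule described above can be sketched as a lookup. This is an illustrative Python model, not the logic LLVM uses: `effective_denormal_mode` is a hypothetical helper, the attributes are represented as a plain dictionary, and the f32-specific key is assumed to be spelled `"denormal-fp-math-f32"`.

```python
from typing import Dict


def effective_denormal_mode(attrs: Dict[str, str], is_f32: bool) -> str:
    """Pick the denormal mode that applies to an operation.

    `attrs` is a hypothetical mapping of string function attributes.
    """
    # The f32-specific attribute wins for 32-bit float operations.
    if is_f32 and "denormal-fp-math-f32" in attrs:
        return attrs["denormal-fp-math-f32"]
    # Otherwise use the general attribute, defaulting to "ieee".
    return attrs.get("denormal-fp-math", "ieee")
```

With both attributes present, only 32-bit float operations see the f32 value; every other type keeps the general mode.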
|This attribute indicates that the function will delegate to some other||This attribute indicates that the function will delegate to some other|
|function with a tail call. The prototype of a thunk should not be used for||function with a tail call. The prototype of a thunk should not be used for|
|optimization purposes. The caller is expected to cast the thunk prototype to||optimization purposes. The caller is expected to cast the thunk prototype to|
|match the thunk target prototype.||match the thunk target prototype.|
|This attribute indicates that the ABI being targeted requires that||This attribute indicates that the ABI being targeted requires that|
|an unwind table entry be produced for this function even if we can||an unwind table entry be produced for this function even if we can|