
[RFC] Enable vectorization on Neon even without fast-math
Needs Review · Public

Authored by sanjoy on Mar 13 2019, 3:09 PM.

Details

Reviewers
rengolin
tra
Summary

This patch introduces an "ftz" (flush subnormals to zero) function attribute and
uses it to enable vectorization for ARM Neon when -ffast-math is not specified.
It would be nicer to encode FTZ as part of FastMathFlags, but we've run out of
space there.
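Background for why such an attribute matters (not stated in the patch itself): on 32-bit ARM, Neon single-precision arithmetic always flushes subnormal inputs and results to zero instead of following IEEE-754 gradual underflow, so moving a scalar loop onto Neon can change results unless the function permits FTZ. The C++ sketch below models the two behaviors in software. flush_to_zero, mul_ieee, mul_ftz and demo_ftz are hypothetical names for illustration, and the sketch assumes the host is not itself running in FTZ mode (for example, not built with -ffast-math).

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: models what a flush-to-zero FPU does to a
// single-precision value. Subnormals become a zero of the same sign;
// every other value passes through unchanged.
float flush_to_zero(float x) {
  if (std::fpclassify(x) == FP_SUBNORMAL)
    return std::copysign(0.0f, x);
  return x;
}

// Multiply with IEEE-754 gradual underflow, as scalar code compiled
// without fast-math is expected to behave.
float mul_ieee(float a, float b) { return a * b; }

// The same multiply as an FTZ unit performs it: subnormal inputs and
// subnormal results are both flushed to zero.
float mul_ftz(float a, float b) {
  return flush_to_zero(flush_to_zero(a) * flush_to_zero(b));
}

void demo_ftz() {
  // FLT_MIN is the smallest normal float; halving it lands in the
  // subnormal range, which is exactly where the two models disagree.
  assert(std::fpclassify(mul_ieee(FLT_MIN, 0.5f)) == FP_SUBNORMAL);
  assert(mul_ftz(FLT_MIN, 0.5f) == 0.0f);
  // Results in the normal range are identical under both models.
  assert(mul_ieee(2.0f, 3.0f) == mul_ftz(2.0f, 3.0f));
}
```

Only values near the bottom of the float range are affected, which is why an explicit opt-in such as an ftz attribute can be enough to make Neon vectorization acceptable without enabling the rest of fast-math.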

If this approach looks workable, I'll change the NVPTX backend to also use this
(backend-independent) ftz attribute instead of the custom "nvptx-f32ftz"
attribute. I'll also add an entry to the LangRef.


Event Timeline

sanjoy created this revision on Mar 13 2019, 3:09 PM.

langref part seems to be missing.

> It would be nicer to encode FTZ as part of FastMathFlags but we've run out of space there.

That recently happened with sanitizer, and backend (predicates? don't recall);
it shouldn't be too hard to fix properly.

> langref part seems to be missing.

I'll add it if/once reviewers are okay with this approach.

> That recently happened with sanitizer, and backend (predicates? don't recall),

By "That" you mean they too ran out of bits? In this specific case we're running out of bits in Value::SubclassOptionalData, and I don't see a simple fix for it that doesn't involve increasing the size of llvm::Value. I could steal a bit or two from, say, the UseList pointer, but I'm not sure that counts as simple.
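To make the space constraint concrete: in LLVM as of early 2019, Value::SubclassOptionalData is a 7-bit field, and the seven existing fast-math flags (reassoc, nnan, ninf, nsz, arcp, contract, afn) already fill it. The C++ sketch below restates that budget with static_asserts, then illustrates the "steal a bit from a pointer" idea: the low bits of a sufficiently aligned pointer are always zero, so they can carry a small tag. This is a standalone model, not LLVM's class layout; TaggedPtr and UseModel are hypothetical stand-ins, and LLVM's real utility for this technique is llvm::PointerIntPair.

```cpp
#include <cassert>
#include <cstdint>

// The seven fast-math flags of the time, one bit each (illustrative enum).
enum FMFBit : unsigned {
  Reassoc, NoNaNs, NoInfs, NoSignedZeros, AllowReciprocal, AllowContract,
  ApproxFunc,
  NumFMFBits // == 7
};

constexpr unsigned OptionalDataBits = 7; // width of SubclassOptionalData
static_assert(NumFMFBits == OptionalDataBits,
              "the existing flags already use every bit");
static_assert(NumFMFBits + 1 > OptionalDataBits,
              "a hypothetical eighth FTZ bit does not fit");

// If T is aligned to 2^Bits bytes, the low Bits bits of any valid T* are
// zero and can hold a tag. Assumes the usual flat address model, where a
// pointer -> uintptr_t -> pointer round trip preserves the value.
template <typename T, unsigned Bits> class TaggedPtr {
  static_assert(alignof(T) >= (1u << Bits),
                "not enough alignment to steal bits");
  static constexpr std::uintptr_t Mask = (std::uintptr_t{1} << Bits) - 1;
  std::uintptr_t Raw;

public:
  TaggedPtr(T *P, unsigned Tag) : Raw(reinterpret_cast<std::uintptr_t>(P)) {
    assert((Raw & Mask) == 0 && "pointer is not sufficiently aligned");
    assert(Tag <= Mask && "tag does not fit in the stolen bits");
    Raw |= Tag;
  }
  T *pointer() const { return reinterpret_cast<T *>(Raw & ~Mask); }
  unsigned tag() const { return static_cast<unsigned>(Raw & Mask); }
};

// Hypothetical stand-in for a use-list node.
struct alignas(8) UseModel {
  int Payload;
};

void demo_tagging() {
  UseModel U{42};
  TaggedPtr<UseModel, 2> TP(&U, 1); // e.g. bit 0 as a hypothetical FTZ flag
  assert(TP.pointer() == &U);
  assert(TP.tag() == 1);
  assert(TP.pointer()->Payload == 42);
}
```

The catch, and plausibly why it may not count as simple, is that every reader of the stolen pointer must now mask off the tag bits before dereferencing it.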

> langref part seems to be missing.

> I'll add it if/once reviewers are okay with this approach.

Actually, adding the langref changes to the patch helps reviewers understand the new semantics and makes the change atomic, guaranteeing that if the patch is reverted, so are the docs.

> That recently happened with sanitizer, and backend (predicates? don't recall),

> By "That" you mean they too ran out of bits? In this specific case we're running out of bits in Value::SubclassOptionalData, and I don't see a simple fix for it that doesn't involve increasing the size of llvm::Value. I could steal a bit or two from, say, the UseList pointer, but I'm not sure that counts as simple.

(Not for this patch, but) could you factor the flags out completely? It would trade space for computation and would probably need a new handler class, but it would be cleaner than having some flags in and others out.
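One possible reading of "factor the flags out completely" is a side table: per-value flags live in a map keyed by the value's address rather than in the value's header. The hypothetical FPFlagsTable below is a standalone C++ sketch, not an LLVM API, and its flag constants are made up for illustration. It removes the fixed bit budget at the cost of a hash lookup per query and the obligation to erase entries when a value is destroyed.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical side table for floating-point flags, keyed by the address
// of the value they describe.
class FPFlagsTable {
public:
  using Flags = std::uint32_t; // 32 flag bits instead of a 7-bit field

  void set(const void *V, Flags F) {
    if (F == 0)
      Table.erase(V); // keep the map sparse: "no entry" means "no flags"
    else
      Table[V] = F;
  }

  Flags get(const void *V) const {
    auto It = Table.find(V);
    return It == Table.end() ? 0 : It->second;
  }

  // Must run when a value is destroyed; otherwise a new value allocated at
  // the same address would inherit stale flags.
  void forget(const void *V) { Table.erase(V); }

private:
  std::unordered_map<const void *, Flags> Table;
};

// Hypothetical flag bits, including one for FTZ.
enum : FPFlagsTable::Flags { kNoNaNs = 1u << 0, kFTZ = 1u << 7 };

void demo_side_table() {
  FPFlagsTable T;
  int A = 0, B = 0; // stand-ins for two IR values
  T.set(&A, kNoNaNs | kFTZ);
  assert(T.get(&A) == (kNoNaNs | kFTZ));
  assert(T.get(&B) == 0); // untouched values report no flags
  T.forget(&A);
  assert(T.get(&A) == 0);
}
```

Because most values carry no flags, a sparse map keeps memory proportional to the flagged values only; the "handler class" mentioned above would be whatever owns a table like this and keeps it in sync with value lifetimes.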

I'm curious how we will generate these flags.

Is this the responsibility of the front-end, based on command-line options, language standards, or target-specific settings?

Or can some middle-end passes change them, too?

How do they propagate, and how do they merge with other fast-math flags?

I could not find a discussion about this on the list; it would be good to settle those questions before adding another flag, especially when we have already run out of space. :)
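On the merging question, one conservative policy (an assumption here, not something this review settles) is intersection: when two operations are combined, the result keeps only the flags both carried. Assuming each flag is a permission, such as "may flush subnormals", dropping one is always safe and merely forgoes an optimization. LLVM already applies this kind of intersection to per-instruction flags when it combines instructions (Instruction::andIRFlags). The C++ sketch below uses hypothetical bit constants, including an FTZ bit, to show the rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; each one is a permission the optimizer may use.
using FPFlags = std::uint8_t;
constexpr FPFlags NoNaNs = 1u << 0;
constexpr FPFlags NoInfs = 1u << 1;
constexpr FPFlags FTZ = 1u << 2; // hypothetical FTZ permission bit

// Conservative merge: the combined operation may only assume what both
// inputs allowed. Dropping a permission never makes the result incorrect.
constexpr FPFlags mergeFlags(FPFlags A, FPFlags B) {
  return static_cast<FPFlags>(A & B);
}

void demo_merge() {
  FPFlags A = NoNaNs | FTZ;    // e.g. from a function that allows FTZ
  FPFlags B = NoNaNs | NoInfs; // e.g. from one that does not
  assert(mergeFlags(A, B) == NoNaNs); // FTZ and NoInfs are both dropped
  assert(mergeFlags(A, A) == A);      // merging is idempotent
}
```

A union would be the unsafe alternative: it could grant FTZ to code whose author required IEEE subnormal behavior.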

> I'm curious how we will generate these flags.

> Is this the responsibility of the front-end, based on command-line options, language standards, or target-specific settings?

Yes.

> Or can some middle-end passes change them, too?

Yes, if they know what they're doing. :)

> How do they propagate, and how do they merge with other fast-math flags?

> I could not find a discussion about this on the list; it would be good to settle those questions before adding another flag, especially when we have already run out of space. :)

Okay, I'll start an RFC on llvm-dev to avoid having a long discussion here.