Page MenuHomePhabricator

[CodeGen][X86] Don't scalarize vector saturating add/sub

Authored by nikic on Jan 28 2019, 11:48 AM.



This matches what the cost model already does.

This relies on vector SADDO/SSUBO though. I think this works fine because those are expanded on all archs (for vectors), but is possibly not entirely reliable.

Diff Detail


Event Timeline

nikic created this revision.Jan 28 2019, 11:48 AM

This might be a little premature as we haven't got PR40442 yet.

5325 ↗(On Diff #183928)

You can keep this but just change to isInteger()?

nikic marked an inline comment as done.Feb 7 2019, 1:30 PM

The relevant part of PR40442 has been resolved now.

5325 ↗(On Diff #183928)

I removed those because I thought these asserts are now redundant with the specification in, or is that not enforced?

Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 1:30 PM
nikic updated this revision to Diff 186155.Feb 10 2019, 9:25 AM

Restore assertions (and move to towards top). Regenerate tests (changes in register allocation).

This revision is now accepted and ready to land.Feb 10 2019, 10:49 AM
This revision was automatically updated to reflect the committed changes.

Hi Nikita,

This commit seems to cause a crash running llc -O0 on the following IR:

target triple = "thumbv7k-apple-darwin"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     define hidden void @foo(<2 x i64> *%ptr) {                                                                                                                                                                                                                                                                                    entry:                                                                                                                                                                                                                                                                                                                          %0 = load <2 x i64>, <2 x i64>* %ptr, align 8                                                                                                                                                                                                                                                                                 %1 = call <2 x i64> @llvm.usub.sat.v2i64(<2 x i64> zeroinitializer, <2 x i64> %0)                                                                                                                                                                                                                                             %2 = bitcast i64* undef to <2 x i64>*                                                                                                                                                                                                                                                                                         store <2 x i64> %1, <2 x i64>* %2, align 8                                                                                                                                                                                                                                                                                    ret void                                                                                                                                                                                                                                                                                                                    }                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                              ; Function Attrs: nounwind readnone speculatable                                                                                                                                                                                                                                                                              declare <2 x i64> @llvm.usub.sat.v2i64(<2 x i64>, <2 x i64>)

Could you take a look?

nikic added a comment.May 8 2019, 10:56 AM

@aemerson Thanks for the report! From a quick look, looks like the usubsat expands to usubo, which is then expanded during op legalization, while it needs to be expanded during vector op legalization. We're doing that for umulo and smulo, but apparently not for uaddo/saddo and usubo/ssubo. I'll prepare a patch.