This patch adds enough information to the NVPTX TTI so that the SLP
vectorizer will fire.
This gets us to parity with NVCC on the Eigen benchmark suite. (Without these
changes, we're 30+% slower on many benchmarks.)
Differential D20605
[NVPTX] Allow load/store vectorization.
Authored by jlebar on May 24 2016, 4:09 PM.
Event Timeline
Add test checking load vectorization, and make vectorization of very small

We cannot currently vectorize loads across calls, unless those calls are vectorizable intrinsics. This seems sort of broken to me even outside the context of nvptx, because it means that if you do something like

  - load 4 floats
  - do a bunch of vectorizable math
  - call a non-vectorizable function on each element of your bundle
  - store the 4 results

then we won't vectorize this (except maybe the store). It seems to me that we should calculate the cost of vectorizing this if we un-vectorize the calls. I dunno if that would be a useful optimization anywhere other than on nvptx, though.

The case above is important because none of our intrinsics are vectorizable, so this is basically any kernel that doesn't do exclusively +-/*.

But I think I'm happy to look at that in a separate patch, since this is a substantial win as-is.
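
For concreteness, the load / math / call / store shape described in that comment might look roughly like the sketch below. This is plain C++ rather than a real CUDA kernel, and the names `opaque_call` and `process4` are made up for illustration; `opaque_call` just stands in for any function the vectorizer cannot widen.

```cpp
#include <cmath>

// Hypothetical stand-in for a non-vectorizable call (not from the patch).
// noinline (GCC/Clang attribute) keeps it an actual call in the IR.
__attribute__((noinline)) float opaque_call(float x) {
  return std::sqrt(x) + 1.0f;
}

// Hypothetical example function: handles one bundle of four adjacent floats.
void process4(const float *in, float *out) {
  // 1. Load 4 adjacent floats (a candidate for one vector load).
  float a = in[0], b = in[1], c = in[2], d = in[3];

  // 2. Vectorizable math: the same operations on every lane.
  a = a * 2.0f + 1.0f;
  b = b * 2.0f + 1.0f;
  c = c * 2.0f + 1.0f;
  d = d * 2.0f + 1.0f;

  // 3. One scalar call per element. Per the comment above, this is the
  //    step that currently keeps the surrounding code from vectorizing.
  a = opaque_call(a);
  b = opaque_call(b);
  c = opaque_call(c);
  d = opaque_call(d);

  // 4. Store the 4 results (a candidate for one vector store).
  out[0] = a;
  out[1] = b;
  out[2] = c;
  out[3] = d;
}
```

The suggestion in the comment would amount to costing steps 1, 2 and 4 as vector operations while treating step 3 as four scalar calls on extracted lanes.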

(Accidental dup comment removed; I cannot figure out phabricator)

Abandoning this in favor of D19501, which does a *much* better job at doing what we want.

Please comment on why 1 register.