Hacker News Re-Imagined

Vectorization Virtual Workshop

1 day ago

Created a post 29 points @tjalfi


@gen_greyface 6 hours

Replying to @tjalfi


this is a really good resource


@dragontamer 4 hours

Replying to @tjalfi

Hmm. It's a good resource, but I'm pretty bearish on autovectorization done this way.

After trying OpenCL / CUDA / ROCm style programming, it's clear that writing explicitly parallel code is in fact easier than expected (albeit with a ton of study involved... but I bet anyone can learn it if they put their mind to it).

If CPU-SIMD is really needed for some reason, I expect that the languages that will be most convenient are explicitly-parallel systems like OpenMP or ISPC.

In particular, look at these restrictions: https://cvw.cac.cornell.edu/vector/coding_vectorizable

> The loop must be countable at runtime.

> There should be a single control flow within the loop.

> The loop should not contain function calls.

These three restrictions are severe!! CUDA / OpenCL / ROCm allow all of these constructs. Control flow may have terrible performance in CUDA / OpenCL, but it's allowed, because it's convenient: if the programmer can't think of any way to solve a problem aside from a few more if-statements / switch-statements, then we should let them, even if it's inefficient.
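To make the contrast concrete, here's a minimal C sketch (function names are mine, purely illustrative) of a loop that satisfies all three rules next to one that breaks two of them:

```c
#include <stddef.h>

/* Stand-in for an out-of-line helper. In real code this would live in
 * another translation unit, making it opaque to the vectorizer; defined
 * here only so the example is self-contained. */
float transform(float v) { return v * v; }

/* Satisfies the rules: trip count known at loop entry, a single
 * control flow, no function calls -- a compiler can typically
 * autovectorize this. */
float sum_scaled(const float *x, size_t n, float a) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a * x[i];
    return s;
}

/* Breaks the rules: the early exit makes the loop non-countable,
 * and the (normally opaque) call blocks vectorization. */
float sum_until_negative(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (x[i] < 0.0f)
            break;                /* trip count unknown at entry */
        s += transform(x[i]);     /* function call inside the loop */
    }
    return s;
}
```

Both functions are perfectly ordinary C; nothing in the syntax warns you that only the first one will use the vector units.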

That's the thing. We know that SIMD-machines and languages designed for SIMD-machines can have dynamic loop counts and more than one if-statement (albeit with a branch divergence penalty). We also find it extremely convenient to decompose our problems into functions and sub-functions.


OpenMP, in contrast, looks like it's learning from OpenCL / CUDA. The #pragma omp parallel for simd construct is moving closer and closer to CUDA / OpenCL parity, allowing convenient "if" statements and dynamic loop counts.
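Something like this sketch (function name is mine) shows the idea: a runtime trip count and an if-statement inside the loop, with the pragma stating the parallelism explicitly rather than hoping the compiler proves it:

```c
#include <stddef.h>

/* Runtime trip count plus divergent control flow: the explicit pragma
 * asks the compiler to thread and vectorize anyway, masking off lanes
 * where the condition is false. Compile with -fopenmp (gcc/clang);
 * without it the pragma is ignored and the loop runs serially with
 * identical results. */
void clamp_scale(float *y, const float *x, size_t n, float a, float hi) {
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++) {
        float v = a * x[i];
        if (v > hi)        /* an if inside the loop: fine under the pragma */
            v = hi;
        y[i] = v;
    }
}
```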

CPU-SIMD is here to stay, and I think learning how to use it is very important. But autovectorization from the compiler (without any programmer assist) looks like a dead end. The compiler gets a LOT of help when the programmer states things in terms of "threadIdx.x" and other explicitly-SIMD concepts.
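A loose C analogy (not real CUDA; names are mine) of what that explicit style buys: the kernel body is written for one lane, indexed by a tid parameter standing in for threadIdx.x, and a launch loop plays the role the GPU's hardware scheduler plays. Independence across lanes is stated by construction, so the compiler doesn't have to prove it:

```c
#include <stddef.h>

/* One lane's worth of work, written against an explicit lane index.
 * On a GPU this would be the kernel body and `tid` would come from
 * threadIdx.x / blockIdx.x. */
static void saxpy_lane(size_t tid, float a, const float *x, float *y) {
    y[tid] = a * x[tid] + y[tid];   /* touches only element `tid` */
}

/* Stand-in for the grid launch: invoke every lane. On a GPU the
 * scheduler does this in parallel; here it's a plain serial loop. */
void saxpy_launch(size_t n, float a, const float *x, float *y) {
    for (size_t tid = 0; tid < n; tid++)
        saxpy_lane(tid, a, x, y);
}
```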

Besides, if the programmer is forced to learn all of these obscure rules / obscure programming methods (countable loops / only one control flow within the loop / etc. etc.), they're pretty much learning a sublanguage without any syntax to indicate that they've switched languages. Discoverability is really bad.

If I instead say "#pragma omp parallel for simd" before using some obscure OpenMP features, any C/C++ programmer these days will notice that something is weird and search on those terms before reading the rest of the for-loop.

A for-loop written in "autovectorized" style has no such indicator, no such "discoverability" to teach non-SIMD programmers what the hell is going on.


Kind of a shame, because this resource is excellently written and still worth a read IMO, even if I think the tech is a bit of a dead end.

