Interesting. It all seems very brittle, though. And that something has gone very wrong with our ecosystem of tools, languages, and processes when it becomes advisable to massage source until specific passes in a specific version LLVM don't mess things up for other passes.
Not picking on the OA in the slightest; just thinking in terms of holistic system design. If you know what you want to happen, and you are smart enough to introspect the behavior of the tool and decide that it didnt happen, you are more than smart enough to just write it correctly in the first place.
Perhaps that is unrealistic, perhaps there is a hidden iceberg of necessary but convolutive optimizations no human could realistically or legibly write. But ok, where do you really need to engage in this kind of optimization golf? Inlined functions?
Ok, what about this targeted language feature for a future-day Zig:
1. Write an ordinary zig function
2. Write inline assembly version of that function
3. Write a "comptime assert" that first compiles to second, which only "runs" for the relevant arch.
4. What should that assert mean? That the compiler just uses your assembly version instead, but _also_ uses existing compiler machinery or an external theorem prover to verify they "behave the same up to X", for customizable values of X
That has the right feel, maybe. You are "pinning" specific, vetted, optimizations without compromising the intent, readability, or correctness of your code. And easy iteration is possible, because a failing comptime assert will just dump the assembly; you can even start with an empty manual impl.
Having a way to assert that the compiler does what you expect can be helpful in large projects with many contributors of different skill level. Having something fail when a random change breaks autovectorization can save a lot of time profiling. When a compiler upgrade changes codegen I would also prefer an assertion telling me about it so that I can run relevant benchmarks to see whether it’s an improvement or not. Relying on whole-system benchmarks is difficult due to noise.
> our optimisation relies on having r < count, but if count is zero this condition cannot be met. [Since taking the remainder of a division by zero is undefined behaviour, the compiler would be well within its rights to assume that count != 0 and proceed with the transformation.]
Interestingly, even if you define division by zero to produce the remainder equal to the dividend, this optimization (replacing "(r + 1) % count" with "r + 1 == count ? 0 : r + 1") would still be legal.
BTW. On ARM and RISC-V, integer remainder from division by zero does indeed result in the dividend (and no trap), unless the compiler is doing something unusual.
Not picking on the OA in the slightest; just thinking in terms of holistic system design. If you know what you want to happen, and you are smart enough to introspect the behavior of the tool and decide that it didnt happen, you are more than smart enough to just write it correctly in the first place.
Perhaps that is unrealistic, perhaps there is a hidden iceberg of necessary but convolutive optimizations no human could realistically or legibly write. But ok, where do you really need to engage in this kind of optimization golf? Inlined functions?
Ok, what about this targeted language feature for a future-day Zig:
1. Write an ordinary zig function 2. Write inline assembly version of that function 3. Write a "comptime assert" that first compiles to second, which only "runs" for the relevant arch. 4. What should that assert mean? That the compiler just uses your assembly version instead, but _also_ uses existing compiler machinery or an external theorem prover to verify they "behave the same up to X", for customizable values of X
That has the right feel, maybe. You are "pinning" specific, vetted, optimizations without compromising the intent, readability, or correctness of your code. And easy iteration is possible, because a failing comptime assert will just dump the assembly; you can even start with an empty manual impl.
Interestingly, even if you define division by zero to produce the remainder equal to the dividend, this optimization (replacing "(r + 1) % count" with "r + 1 == count ? 0 : r + 1") would still be legal.
It'is legal!
But looks like a few more steps when compiled.