No Graphics API

(sebastianaaltonen.com)

676 points | by ryandrake 16 hours ago

29 comments

  • vblanco 16 hours ago
    This is a fantastic article that demonstrates how many parts of Vulkan and DX12 are no longer needed.

    I hope the IHVs have a look at it, because current DX12 seems semi-abandoned: it still doesn't support buffer pointers even though every GPU made in the last 10 (or more!) years can do pointers just fine. Meanwhile Vulkan won't do a 2.0 release that cleans things up, so it carries a lot of baggage and, especially, tons of drivers that don't implement the extensions that really improve things.
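
    For reference, "buffer pointers" are already exposed in Vulkan through VK_KHR_buffer_device_address (core since 1.2); a minimal host-side sketch, purely illustrative:

        #include <vulkan/vulkan.h>

        // The buffer must have been created with
        // VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT.
        VkDeviceAddress getBufferGpuPointer(VkDevice device, VkBuffer buffer) {
            VkBufferDeviceAddressInfo info{};
            info.sType  = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
            info.buffer = buffer;
            return vkGetBufferDeviceAddress(device, &info); // raw 64-bit GPU address
        }

        // That address can go into a push constant or another buffer and be
        // dereferenced in the shader (GL_EXT_buffer_reference), no descriptor needed.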

    If this API existed, you could emulate OpenGL on top of it faster than the current OpenGL-to-Vulkan layers, and something like SDL3 GPU would get a 3x/4x boost too.

    • pjmlp 15 hours ago
      DirectX documentation is in a bad state currently: you have Frank Luna's books, which don't cover the latest improvements, and then it's hunting through Learn, GitHub samples and reference docs.

      Vulkan is another mess. Even if there were a 2.0, how are devs supposed to actually use it, especially on Android, the biggest consumer Vulkan platform?

    • _bohm 12 hours ago
      I'm surprised he made no mention of the SDL3 GPU API since his proposed API has pretty significant overlap with it.
    • kllrnohj 12 hours ago
      No longer needed is a strong statement given how recent the GPU support is. It's unlikely anything could accept those minimum requirements today.

      But soon? Hopefully

      • jsheard 12 hours ago
        Those requirements more or less line up with the introduction of hardware raytracing, and some major titles are already treating that as a hard requirement, like the recent Doom and Indiana Jones games.
        • kllrnohj 11 hours ago
          Only if you're ignoring mobile entirely. One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.
          • flohofwoe 1 hour ago
            > One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.

            In hindsight it really would have been better to have a separate VulkanES which is specialized for mobile GPUs.

          • jsheard 11 hours ago
            Eh, I think the jury is still out on whether unifying desktop and mobile graphics APIs is really worth it. In practice Vulkan written to take full advantage of desktop GPUs is wildly incompatible with most mobile GPUs, so there's fragmentation between them regardless.
            • ablob 6 hours ago
              I feel like it's a win by default. I do like to write my own programs every now and then and recently there's been more and more graphics sprinkled into them. Being able to reuse those components and just render onto a target without changing anything else seems to be very useful here. This kind of seamless interoperability between platforms is very desirable in my book. I can't think of a better approach to achieve this than the graphics API itself.

              Also, nothing inherent blocks extensions by default. I feel like a reasonable core that can optionally do more, similar to CPU extensions (e.g. vector extensions), could be the way to go here.

            • kllrnohj 10 hours ago
              It's quite useful for things like skia or piet-gpu/vello or the general category of "things that use the GPU that aren't games" (image/video editors, effects pipelines, compute, etc etc etc)
              • Groxx 9 hours ago
                would it also apply to stuff like the Switch, and relatively high-end "mobile" gaming in general? (I'm not sure what those chips actually look like tho)

                there are also some arm laptops that just run Qualcomm chips, the same as some phones (tablets with a keyboard, basically, but a bit more "PC"-like due to running Windows).

                AFAICT the fusion seems likely to be an accurate prediction.

                • deliciousturkey 1 hour ago
                  Switch has its own API. The GPU also doesn't have limitations you'd associate with "mobile". In terms of architecture, it's a full desktop GPU with desktop-class features.
              • jsheard 10 hours ago
                I suppose that's true, yeah. I was focusing too much on games specifically.
            • 01HNNWZ0MV43FF 5 hours ago
              If the APIs aren't unified, the engines will be, since VR games will want to work on both standalone headsets and streaming headsets
        • tjpnz 11 hours ago
          Doom was able to drop it and is now Steam Deck verified.
          • nicolaslem 2 hours ago
            Little known fact, the Steam Deck has hardware ray tracing, it's just so weak as to be almost non-existent.
    • tadfisher 15 hours ago
      Isn't this all because PCI resizable BAR is not required to run any GPU besides Intel Arc? As in, maybe it's mostly down to Microsoft/Intel mandating reBAR in UEFI so we can start using stuff like bindless textures without thousands of support tickets and negative reviews.

      I think this puts a floor on supported hardware though, like Nvidia 30xx and Radeon 5xxx. And of course motherboard support is a crapshoot until 2020 or so.

      • vblanco 15 hours ago
        This is not really directly about resizable BAR; you could do mostly the same API without it. Resizable BAR simplifies it a little bit because you skip manual transfer operations, but it's not strictly required, as you can write things to a CPU-writeable buffer and then begin your frame with a transfer command.

        Bindless textures never needed any kind of resizable BAR; you have been able to use them since the early 2010s in OpenGL through an extension. Buffer pointers have never needed it either.
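
        For concreteness, the staging fallback described above looks roughly like this in Vulkan (a sketch; it assumes the buffers and the persistent mapping were created elsewhere, and with resizable BAR you would skip the copy and write straight into device-local, host-visible memory):

            #include <vulkan/vulkan.h>
            #include <cstring>

            void uploadPerFrameData(VkCommandBuffer cmd,
                                    void* mappedStagingPtr,    // host-visible, persistently mapped
                                    VkBuffer stagingBuffer,
                                    VkBuffer deviceLocalBuffer,
                                    const void* srcData, VkDeviceSize size) {
                std::memcpy(mappedStagingPtr, srcData, size);  // CPU writes into the staging buffer

                VkBufferCopy region{};
                region.srcOffset = 0;
                region.dstOffset = 0;
                region.size      = size;
                // The GPU copies it into fast device-local memory before the frame's draws.
                vkCmdCopyBuffer(cmd, stagingBuffer, deviceLocalBuffer, 1, &region);
            }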

    • PeterStuer 3 hours ago
      Still have some 1080s in gaming machines going strong. But as even Nvidia has retired support, I guess it is time to move on.
  • opminion 15 hours ago
    The article is missing this motivation paragraph, taken from the blog index:

    > Graphics APIs and shader languages have significantly increased in complexity over the past decade. It’s time to start discussing how to strip down the abstractions to simplify development, improve performance, and prepare for future GPU workloads.

    • stevage 12 hours ago
      Thanks, I had trouble figuring out what the article was about, lost in all the "here's how I used AI and had the article screened by industry insiders".
      • masspro 10 hours ago
        I read that whole (single) paragraph as “I made really, really, really sure I didn’t violate any NDAs by doing these things to confirm everything had a public source”
        • beAbU 2 hours ago
          This is literally the second paragraph in the article. There is no need for interpretation here.

          Unless the link of the article has changed since your comment?

      • yuriks 8 hours ago
        I was lost when it suddenly jumped from a long retrospective on GPUs to abruptly talking about "my allocator API" in the next paragraph with no segue or justification.
      • doctorpangloss 11 hours ago
        haha, instead of making them read an AI-coauthored blog post, which obviously, they didn't do, he could have asked them interesting questions like, "Do better graphics make better games?" or "If you could change anything about the platforms' technology, what would it be?"
    • alberth 15 hours ago
      Would this be analogous to NVMe?

      Meaning ... SSDs initially reused IDE/SATA interfaces, which had inherent bottlenecks because those standards were designed for spinning disks.

      To fully realize SSD performance, a new transport had to be built from the ground up, one that eliminated those legacy assumptions, constraints and complexities.

      • rnewme 15 hours ago
        ...and introduced new ones.
  • starkparker 11 hours ago
    > GPU hardware started to shift towards a generic SIMD design. SIMD units were now executing all the different shader types: vertex, pixel, geometry, hull, domain and compute. Today the framework has 16 different shader entry points. This adds a lot of API surface and makes composition difficult. As a result GLSL and HLSL still don’t have a flourishing library ecosystem ... despite 20 years of existence

    A lot of this post went over my head, but I've struggled enough with GLSL for this to be triggering. Learning gets brutal for the lack of middle ground between reinventing every shader every time and using an engine that abstracts shaders from the render pipeline. A lot of open-source projects that use shaders are either allergic to documenting them or are proud of how obtuse the code is. Shadertoy is about as good as it gets, and that's not a compliment.

    The only way I learned anything about shaders was from someone who already knew them well. They learned what they knew by spending a solid 7-8 years of their teenage/young adult years doing nearly nothing but GPU programming. There's probably something in between that doesn't involve giving up and using node-based tools, but in a couple decades of trying and failing to grasp it I've never found it.

  • bullen 19 minutes ago
    Personally I'm staying with OpenGL (ES) 3 for eternity.

    VAO is the last feature I was missing prior.

    Also the other cores will do useful gameplay work so one CPU core for the GPU is ok.

    4 CPU cores is also enough for eternity. 1GB shared RAM/VRAM too.

    Let's build something good on top of the hardware/OSes/APIs/languages we have now? 3588/linux/OpenGL/C+Java specifically!

    Hardware has peaked, only soft internal protocols can now evolve, I write mine inside TCP/HTTP.

  • pjmlp 15 hours ago
    I have followed Sebastian Aaltonen's work for quite a while now, so maybe I am a bit biased; this is, however, a great article.

    I also think that the way forward is to go back to software rendering; however, this time around those algorithms and data structures are actually hardware accelerated, as he points out.

    Note that this is already an ongoing trend in the VFX industry; about 5 years ago OTOY ported their OctaneRender to CUDA as the main rendering API.

    • gmueckl 13 hours ago
      There are tons of places within the GPU where dedicated fixed function hardware provides massive speedups within the relevant pipelines (rasterization, raytracing). The different shader types are designed to fit inbetween those stages. Abandoning this hardware would lead to a massive performance regression.
      • formerly_proven 12 hours ago
        Just consider the sheer number of computations offloaded to TMUs. Shaders would already do nothing but interpolate texels if you removed them.
      • efilife 13 hours ago
        Offtop, but sorry, I can't resist. "Inbetween" is not a word. I started seeing many people having trouble with prepositions lately, for some unknown reason.

        > “Inbetween” is never written as one word. If you have seen it written in this way before, it is a simple typo or misspelling. You should not use it in this way because it is not grammatically correct as the noun phrase or the adjective form. https://grammarhow.com/in-between-in-between-or-inbetween/

        • Antibabelic 3 hours ago
          "Offtop" is not a word. It's not in any English dictionary I could find and doesn't appear in any published literature.

          Matthew 7:3 "And why beholdest thou the mote that is in thy brother's eye, but considerest not the beam that is in thine own eye?"

          • Joker_vD 1 hour ago
            Oh, it's a transliteration of Russian "офтоп", which itself started as a borrowing of "off-topic" from English (but as a noun instead of an adjective/stative) and then went through some natural linguistic developments, namely loss of the hyphen and degemination, surface analysis of the trailing "-ic" as the Russian suffix "-ик" [0], and its subsequent removal to obtain the supposed "original, non-derived" form.

            [0] https://en.wiktionary.org/wiki/-%D0%B8%D0%BA#Russian

        • cracki 1 hour ago
          Your entire post does not once mention the form you call correct.

          If you intend for people to click the link, then you might just as well delete all the prose before it.

        • mikestorrent 5 hours ago
          Surely you mean "I've started seeing..." rather than "I started seeing..."?
        • dist-epoch 13 hours ago
          If enough people use it, it will become correct. This is how language evolves. BTW, there is no "official English language specification".

          And linguists think it would be a bad idea to have one:

          https://archive.nytimes.com/opinionator.blogs.nytimes.com/20...

    • mrec 14 hours ago
      Isn't this already happening to some degree? E.g. UE's Nanite uses a software rasterizer for small triangles, albeit running on the GPU via a compute shader.
      • jsheard 14 hours ago
        Things are kind of heading in two opposite directions at the moment. Early GPU rasterization was all done in fixed-function hardware, but then we got programmable shading, and then we started using compute shaders to feed the HW rasterizer, and then we started replacing the HW rasterizer itself with more compute (as in Nanite). The flexibility of doing whatever you want in software has gradually displaced the inflexible hardware units.

        Meanwhile GPU raytracing was a purely software affair until quite recently when fixed-function raytracing hardware arrived. It's fast but also opaque and inflexible, only exposed through high-level driver interfaces which hide most of the details, so you have to let Jensen take the wheel. There's nothing stopping someone from going back to software RT of course but the performance of hardware RT is hard to pass up for now, so that's mostly the way things are going even if it does have annoying limitations.

      • djmips 13 hours ago
        Why do you say 'albeit'? I think it's established that 'software rendering' can mean running on the GPU. That's what Octane is doing with CUDA in the comment you are replying to. But good callout on Nanite.
        • mrec 9 hours ago
          No good reason, I'm just very very old.
    • Q6T46nT668w6i3m 10 hours ago
      But they still rely on fixed functions for a handful of essential ops (e.g., intersection).
  • aarroyoc 15 hours ago
    Impressive post, so many details. I could only understand some parts of it, but I think this article will probably be a reference for future graphics APIs.

    I think it's fair to say that for most gamers, Vulkan/DX12 hasn't really been a net positive: the PSO problem affected many popular games, and while Vulkan has been trying to improve, WebGPU is tricky as it has its roots in the first versions of Vulkan.

    Perhaps it was a bad idea to go all-in on a low-level API that exposes many details when the hardware underneath is evolving so fast. Maybe CUDA, as the post says in some places, with its more generic computing support, is the right way after all.

    • erwincoumans 4 hours ago
      Yes, an amazing and detailed post, enjoyed all of it. In AI, it is common to use jit compilers (pytorch, jax, warp, triton, taichi, ...) that compile to cuda (or rocm, cpu, tpu, ...). You could write renderers like that, rasterizers or raytracers.

      For example: https://github.com/StafaH/mujoco_warp/blob/render_context/mu...

      (A new simple raytracer that compiles to cuda, used for robotics reinforcement learning, renders at up to 1 million fps at low resolution, 64x64, with textures, shadows)

  • delifue 9 hours ago
    This reminds me of Makimoto’s Wave:

    https://semiengineering.com/knowledge_centers/standards-laws...

    There is a constant cycle between domain-specific, hardware-hardcoded algorithm design and programmable, flexible design.

  • SunlitCat 9 hours ago
    This article already feels like it’s on the right track. DirectX 11 was perfectly fine, and DirectX 12 is great if you really want total control over the hardware, but I even remember some IHV saying that this level of control isn’t always a good thing.

    When you look at the DirectX 12 documentation and best-practice guides, you’re constantly warned that certain techniques may perform well on one GPU but poorly on another, and vice versa. That alone shows how fragile this approach can be.

    Which makes sense: GPU hardware keeps evolving and has become incredibly complex. Maybe graphics APIs should actually move further up the abstraction ladder again, to a point where you mainly upload models, textures, and a high-level description of what the scene and objects are supposed to do and how they relate to each other. The hardware (and its driver) could then decide what’s optimal and how to turn that into pixels on the screen.

    Yes, game engines and (to some extent) RHIs already do this, but having such an approach as a standardized, optional graphics API would be interesting. It would allow GPU vendors to adapt their drivers closely to their hardware, because they arguably know best what their hardware can do and how to do it efficiently.

    • canyp 7 hours ago
      > but I even remember some IHV saying that this level of control isn’t always a good thing.

      Because that control is only as good as you can master it, and not all game developers do well on that front. Just check out enhanced barriers in DX12 and all of the rules around them as an example. You almost need to train as a lawyer to digest that clusterfuck.

      > The hardware (and its driver) could then decide what’s optimal and how to turn that into pixels on the screen.

      We should go in the other direction: have a goddamn ISA you can target across architectures, like an x86 for GPUs (though ideally not that encumbered by licenses), and let people write code against it. Get rid of all the proprietary driver stack while you're at it.

  • alaingalvan 9 hours ago
    If you enjoyed history of GPUs section, there's a great book that goes into more detail by Jon Peddie titled "The History of the GPU - Steps to Invention", definitely worth a read.
  • wg0 12 hours ago
    Very well written but I can't understand much of this article.

    What would be one good primer to be able to comprehend all the design issues raised?

    • adrian17 12 hours ago
      IMO the minimum is to be able to read a “hello world / first triangle” example for any of the modern graphics APIs (OpenGL/WebGL doesn’t count, WebGPU does), and have a general understanding of each step performed (resource creation, pipeline setup, passing data to shaders, draws, synchronization). Also to understand where the pipeline explosion issue comes from.

      Bonus points if you then look at CUDA “hello world” and consider that it can do nontrivial work on the same hardware (sans fixed function accelerators) with much less boilerplate (and driver overhead).
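
      To make the "pipeline explosion" point concrete, here is a toy sketch (no real API, and the state dimensions are just examples) of why PSO counts multiply:

          #include <cstdint>
          #include <unordered_map>

          // Every piece of state baked into a pipeline state object adds a dimension,
          // so the number of compiled pipelines multiplies instead of adding.
          struct PipelineKey {
              uint32_t vertexShader;
              uint32_t pixelShader;
              uint32_t vertexLayout;        // e.g. position-only vs. full attributes
              uint32_t blendMode;           // opaque, alpha, additive, ...
              uint32_t renderTargetFormat;  // LDR, HDR, shadow map, ...
              bool operator==(const PipelineKey&) const = default;
          };

          struct KeyHash {
              size_t operator()(const PipelineKey& k) const {
                  size_t h = k.vertexShader;
                  h = h * 31 + k.pixelShader;
                  h = h * 31 + k.vertexLayout;
                  h = h * 31 + k.blendMode;
                  return h * 31 + k.renderTargetFormat;
              }
          };

          // 10 shader pairs x 3 layouts x 4 blend modes x 3 target formats = 360 pipelines,
          // each of which may trigger a driver-side compile the first time it is needed.
          std::unordered_map<PipelineKey, void* /* compiled pipeline */, KeyHash> pipelineCache;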

    • arduinomancer 11 hours ago
      To be honest there isn't really one; a lot of these concepts are advanced even for graphics programmers.
  • reactordev 15 hours ago
    I miss Mantle. It had its quirks but you felt as if you were literally programming hardware using a pretty straight forward API. The most fun I’ve had programming was for the Xbox 360.
    • djmips 13 hours ago
      You know what else is good like that? The Switch graphics API - designed by Nvidia and Nintendo. Easily the most straightforward of the console graphics APIs
      • reactordev 12 hours ago
        Yes but it’s so underpowered. I want RTX 5090 performance with 16 cores.
  • Bengalilol 14 hours ago
    After reading this article, I feel like I've witnessed a historic moment.
    • bogwog 10 hours ago
      Most of it went over my head, but there's so much knowledge and expertise on display here that it makes me proud that this person I've never met is out there proving that software development isn't entirely full of clowns.
      • ehaliewicz2 7 hours ago
        Seb is incredibly passionate about games and graphics programming. You can find old posts of his on various forums, talking about tricks for programming the PS2, PS3, Xbox 360, etc etc. He regularly posts demos he's working on, progress clips of various engines, etc, on twitter, after staying in the same area for 3 decades.

        I wish I still had this level of motivation :)

  • modeless 12 hours ago
    I don't understand this part:

    > Meshlet has no clear 1:1 lane to vertex mapping, there’s no straightforward way to run a partial mesh shader wave for selected triangles. This is the main reason mobile GPU vendors haven’t been keen to adapt the desktop centric mesh shader API designed by Nvidia and AMD. Vertex shaders are still important for mobile.

    I get that there's no mapping from vertex/triangle to tile until after the mesh shader runs. But even with vertex shaders there's also no mapping from vertex/triangle to tile until after the vertex shader runs. The binning of triangles to tiles has to happen after the vertex/mesh shader stage. So I don't understand why mesh shaders would be worse for mobile TBDR.

    I guess this is suggesting that TBDR implementations split the vertex shader into two parts, one that runs before binning and only calculates positions, and one that runs after and computes everything else. I guess this could be done but it sounds crazy to me, probably duplicating most of the work. And if that's the case why isn't there an extension allowing applications to explicitly separate position and attribute calculations for better efficiency? (Maybe there is?)

    Edit: I found docs on Intel's site about this. I think I understand now. https://www.intel.com/content/www/us/en/developer/articles/g...

    Yes, you have to execute the vertex shader twice, which is extra work. But if your main constraint is memory bandwidth, not FLOPS, then I guess it can be better to throw away the entire output of the vertex shader except the position, rather than save all the output in memory and read it back later during rasterization. At rasterization time when the vertex shader is executed again, you only shade the triangles that actually went into your tile, and the vertex shader outputs stay in local cache and never hit main memory. And this doesn't work with mesh shaders because you can't pick a subset of the mesh's triangles to shade.

    It does seem like there ought to be an extension to add separate position-only and attribute-only vertex shaders. But it wouldn't help the mesh shader situation.
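
    To illustrate the split being discussed, here is a toy CPU-side sketch of the two-phase idea (purely illustrative, not how any driver actually implements it):

        #include <cstdint>
        #include <vector>

        struct Vec4   { float x, y, z, w; };
        struct Vertex { Vec4 position; float uv[2]; float normal[3]; };

        // Phase 1: "position-only" shading, i.e. what dead code elimination would
        // leave behind. It runs once over everything and is used only to bin
        // triangles to tiles.
        Vec4 shadePositionOnly(const Vertex& v) { return v.position; }

        // Phase 2: the full vertex work, re-run later but only for the vertices of
        // triangles binned into the tile being rendered, so the outputs can stay
        // on-chip instead of round-tripping through main memory.
        struct FullOutput { Vec4 clipPos; float uv[2]; float normal[3]; };
        FullOutput shadeFull(const Vertex& v) {
            return { v.position, { v.uv[0], v.uv[1] },
                     { v.normal[0], v.normal[1], v.normal[2] } };
        }

        void renderTile(const std::vector<Vertex>& vertices,
                        const std::vector<uint32_t>& indicesInThisTile) {
            for (uint32_t idx : indicesInThisTile) {
                FullOutput out = shadeFull(vertices[idx]); // stays in on-chip storage
                (void)out;                                 // ...rasterize within the tile
            }
        }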

    • yuriks 8 hours ago
      I thought that the implication was that the shader compiler produces a second shader from the same source that went through a dead code elimination pass which maintains only the code necessary to calculate the position, ignoring other attributes.
      • modeless 7 hours ago
        Sure, but that only goes so far, especially when users aren't writing their shaders with the knowledge that this transform is going to be applied, or with any tools to verify that it's able to eliminate anything.
        • hrydgard 15 minutes ago
          Well, it is what is done on several tiler architectures, and it generally works just fine. Normally your computations of the position aren't really intertwined with the computation of the other outputs, so dead code elimination does a good job.
        • kasool 4 hours ago
          Why would it be difficult? There are explicit shader semantics to specify output position.

          In fact, Qualcomm's documentation spells this out: https://docs.qualcomm.com/nav/home/overview.html?product=160...

  • blakepelton 15 hours ago
    Great post, it brings back a lot of memories. Two additional factors that designers of these APIs consider are:

    * GPU virtualization (e.g., the D3D residency APIs), to allow many applications to share GPU resources (e.g., HBM).

    * Undefined behavior: how easy is it for applications to accidentally or intentionally take a dependency on undefined behavior? This can make it harder to translate this new API to an even newer API in the future.

  • klaussilveira 13 hours ago
    NVIDIA's NVRHI has been my favorite abstraction layer over the complexity that modern APIs bring.

    In particular, this fork: https://github.com/RobertBeckebans/nvrhi which adds some niceties and quality of life improvements.

  • qingcharles 10 hours ago
    I started my career writing software 3D renderers before switching to Direct3D in the later 90s. What I wonder is if all of this is going to just get completely washed away and made totally redundant by the incoming flood of hallucinated game rendering?

    Will it be possible to hallucinate the frame of a game at a similar speed to rendering it with a mesh and textures?

    We're already seeing the hybrid version of this where you render a lower res mesh and hallucinate the upscaled, more detailed, more realistic looking skin over the top.

    I wouldn't want to be in the game engine business right now :/

    • jsheard 10 hours ago
      You can't really do a whole lot of inference in 16ms on consumer hardware. Not to say that inference isn't useful in realtime graphics, DLSS has proven itself well enough, but that's a very small model laser-targeted at one specific problem and even that takes a few milliseconds to do its thing. Fitting behemoth generative models into those time constraints seems like an uphill battle.
    • 8n4vidtmkvmk 10 hours ago
      I just assumed hallucinated rendering was a stepping stone to training AGIs or something. No one is actually seriously trying to build games that way, are they? Seems horribly inefficient at best, and incoherent at worst.
  • vegabook 13 hours ago
    ironically, explaining that "we need a simpler API" takes a dense 69-page technical missive that would make the Khronos Vulkan tutorial blush.
    • Pannoniae 11 hours ago
      It's actually not that low-level! It doesn't really get into hardware specifics that much (other than showing what's possible across different HW) or stuff like what's optimal where.

      And it's quite a bit simpler than what we have in the "modern" GPU APIs atm.

    • mkoubaa 12 hours ago
      I don't understand why you think this is ironic
  • overgard 10 hours ago
    I'm kind of curious about something: most of my graphics experience has been OpenGL or WebGL (tiny bit of Vulkan) or big engines like Unreal or Unity. I've noticed over the years that the uptake of DX12 always seemed marginal, though (a lot of things stayed on D3D11 for a really long time). Is Direct3D 12 super awful to work with or something? I know it requires more resource management than 11, but so does Vulkan which doesn't seem to have the same issue.
    • canyp 7 hours ago
      Most AAA titles are on DX12 now. id Software is on Vulkan. E-sports titles remain largely in the DX11 camp.

      What the modern APIs give you is less CPU driver overhead and new functionality like ray tracing. If you're not CPU-bound to begin with and don't need those new features, then there's not much of a reason to switch. The modern APIs require way more management than the prior ones; memory management, CPU-GPU synchronization, avoiding resource hazards, etc.

      Many of those AAA games are also moving to UE5, which is basically DX12 under the hood (presumably it should have a Vulkan backend too, but I don't see it used much?)

      • kasool 4 hours ago
        UE5 has a fairly mature Vulkan backend but as you might guess is second class to DX12.
    • flohofwoe 1 hour ago
      > but so does Vulkan which doesn't seem to have the same issue

      Vulkan has the same issues (and more) as D3D12, you just don't hear much about it because there are hardly any games built directly on top of Vulkan. Vulkan is mainly useful as Proton backend on Linux.

  • ksec 15 hours ago
    I wonder why M$ stopped putting out new DirectX versions? DirectX 12 Ultimate or 12.1 or 12.2 is largely the same as DirectX 12.

    Or has the use of Middleware like Unreal Engine largely made them irrelevant? Or should EPIC put out a new Graphics API proposal?

    • pjmlp 15 hours ago
      That has always been the case, it is mostly FOSS circles that argue about APIs.

      Game developers create an RHI (rendering hardware interface) like the one discussed in the article, and get on with game development.

      Because the greatest innovations thus far have been ray tracing and mesh shaders, and they are still largely ignored, so why keep on pushing forward?

      • djmips 13 hours ago
        I disagree that ray tracing and mesh shaders are largely ignored - at least within AAA game engines they are leaned on quite a lot. Particularly ray tracing.
        • pjmlp 4 hours ago
          Game engines aren't games, or sales.
    • reactordev 15 hours ago
      Both-ish.

      Yes, the centralization of engines to Unreal, Unity, etc. means there's less interest in pushing the boundaries; they are still pushed, just on the GPU side.

      From a CPU API perspective, it’s very close to just plain old buffer mapping and go. We would need a hardware shift that would add something more to the pipeline than what we currently do. Like when tessellation shaders came about from geometry shader practices.

    • djmips 13 hours ago
      The frontier of graphics APIs might be the consoles and they don't get a bump until the hardware gets a bump and the console hardware is a little bit behind.
  • jdashg 12 hours ago
    And the GPU API cycle of life and death continues!

    I was an only-half-joking champion of ditching vertex attrib bindings when we were drafting WebGPU and WGSL, because it's a really nice simplification, but it was felt that would be too much of a departure from existing APIs. (Spending too many of our "Innovation Tokens" on something that would cause dev friction in the beginning)

    In WGSL we tried (for a while?) to build language features as "sugar" when we could. You don't have to guess what order or scope a `for` loop uses when we just spec how it desugars into a simpler, more explicit (but more verbose) core form/dialect of the language.
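
    Roughly that idea, rendered in C-style code (a loose illustration only; WGSL's actual core form spells the update out in a `continuing` block so a `continue` still executes it):

        #include <cstdio>

        // Sugared form:
        void sugared(int n) {
            for (int i = 0; i < n; i++) { std::printf("%d\n", i); }
        }

        // What it desugars to, spelled out: nothing to guess about ordering or scope.
        void desugared(int n) {
            {                              // the init gets its own scope
                int i = 0;
                while (true) {
                    if (!(i < n)) break;   // the condition check
                    std::printf("%d\n", i);
                    i++;                   // the update
                }
            }
        }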

    That said, this powerpoint-driven-development flex knocks this back a whole seriousness and earnestness tier and a half:

    > My prototype API fits in one screen: 150 lines of code. The blog post is titled “No Graphics API”. That’s obviously an impossible goal today, but we got close enough. WebGPU has a smaller feature set and features a ~2700 line API (Emscripten C header).

    Try to zoom out on the API and fit those *160* lines on one screen! My browser gives up at 30%, and I am still only seeing 127. This is just dishonesty, and we do not need more of this kind of puffery in the world.

    And yeah, it's shorter because it is a toy PoC, even if one I enjoyed seeing someone else's take on it. Among other things, the author pretty dishonestly elides the number of lines the enums would take up. (A texture/data format enum on one line? That's one whole additional Pinocchio right there!)

    I took WebGPU.webidl and did a quick pass through removing some of the biggest misses of this API (queries, timers, device loss, errors in general, shader introspection, feature detection) and some of the irrelevant parts (anything touching canvas, external textures), and immediately got it down to 241 declarations.

    This kind of dishonest puffery holds back an otherwise interesting article.

    • m-schuetz 6 hours ago
      Man, how I wish WebGPU hadn't gone all-in on the legacy Vulkan API model and had instead found a leaner approach to do the same thing. Even Vulkan has stopped doing pointless boilerplate like bindings and pipelines. Ditching vertex attrib bindings and going for programmable vertex fetching would have been nice.

      WebGPU could have also introduced CUDA's simple launch model to graphics APIs. Instead of all that insane binding boilerplate, just provide the bindings as launch args to the draw call, like draw(numTriangles, args), with args being something like {uniformBuffer, positions, uvs, samplers}, depending on whatever the shaders expect.
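
      A sketch of what such a launch-style draw could look like (every type and function here is hypothetical; nothing like it exists in WebGPU today):

          #include <cstdint>

          using GpuPtr        = uint64_t;  // raw GPU address, buffer-device-address style
          using TextureHandle = uint32_t;  // bindless texture index

          // Whatever this particular shader expects, gathered into one argument block.
          struct DrawArgs {
              GpuPtr        uniforms;
              GpuPtr        positions;
              GpuPtr        uvs;
              TextureHandle albedo;
          };

          // The draw call just takes the argument block, like a CUDA kernel launch takes
          // its parameters; no descriptor sets, no pipeline-layout plumbing.
          void draw(uint32_t numTriangles, const DrawArgs& args) {
              (void)numTriangles; (void)args;  // hypothetical: the driver binds args and launches
          }

          void drawMesh(uint32_t numTriangles, GpuPtr uniforms, GpuPtr positions,
                        GpuPtr uvs, TextureHandle albedo) {
              draw(numTriangles, DrawArgs{ uniforms, positions, uvs, albedo });
          }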

      • p_l 4 hours ago
        My understanding is that pipelines in Vulkan still matter if you target certain GPUs though.
        • m-schuetz 3 hours ago
          At some point, we need to let legacy hardware go. Also, WebGL did just fine without pipelines, despite being mapped to Vulkan and DirectX code under the hood. Meaning WebGPU could have also worked without pipelines just fine as well. The backends can then map to whatever they want, using modern code paths for modern GPUs.
          • p_l 3 hours ago
            Quoting things I've only heard about, because I don't do enough development in this area, but I recall reading that it impacted performance on pretty much every mobile chip (discounting Apple's, because there you go through a completely different API and they got to design the hardware together with the API).

            Among other things, that covers everything running on non-apple, non-nvidia ARM devices, including freshly bought.

          • flohofwoe 1 hour ago
            > Also, WebGL did just fine without pipelines, despite being mapped to Vulkan and DirectX code under the hood.

            ...at the cost of creating PSOs at random times which is an expensive operation :/

            • m-schuetz 1 hour ago
              No longer an issue with dynamic rendering and shader objects. And never was an issue with OpenGL. Static pipelines are an artificial problem that Vulkan imposed for no good reason, and which they reverted in recent years.
              • flohofwoe 37 minutes ago
                Going entirely back to the granular GL-style state soup would have significant 'usability problems'. It's too easy to accidentally leak incorrect state from a previous draw call.

                IMHO a small number of immutable state objects is the best middle ground (similar to D3D11 or Metal, but reshuffled like described in Seb's post).

                • m-schuetz 27 minutes ago
                  Not using static pipelines does not imply having to use a global state machine like OpenGL. You could also make an API that uses a struct for rasterizer configs and pass it as an argument to a multi draw call. I would have actually preferred that over all the individual setters in Vulkan's dynamic rendering approach.
    • xyzsparetimexyz 12 hours ago
      Who cares about dev friction in the beginning? That was a bad choice.
  • awolven 9 hours ago
    Is this going to materialize into a "thing"?
  • greggman65 14 hours ago
    This seems tangentially related?

    https://github.com/google/toucan

  • xyzsparetimexyz 12 hours ago
    This needs an index and introduction. It's also not super interesting to people in industry? Like yeah, it'd be nice if bindless textures were part of the API so you didn't need to create that global descriptor set. It'd be nice if you could just sample from pointers to textures, similar to how dereferencing buffer pointers works.
  • henning 16 hours ago
    This looks very similar to the SDL3 GPU API and other RHI libraries that have been created at first glance.
    • cyber_kinetist 7 hours ago
      If you look at the details you can clearly see SDL3_GPU is wildly different from this proposal, such as:

      - It doesn't expose raw GPU addresses; SDL3_GPU has buffer objects instead. Also you're much more limited in how you use buffers in SDL3 (e.g. no coherent buffers, and you're forced to use a transfer buffer if you want to do a CPU -> GPU upload)

      - In SDL3_GPU synchronization is done automatically, without the user specifying barriers (helped by a technique called cycling: https://moonside.games/posts/sdl-gpu-concepts-cycling/).

      - More modern features such as mesh shading are not exposed in SDL3_GPU, and keeps the traditional rendering pipeline as the main way to draw stuff. Also, bindless is a first class citizen in Aaltonen's proposal (and the main reason for the simplification of the API), while SDL3_GPU doesn't support it at all and instead opts for a traditional descriptor binding system.

      • Scaevolus 6 hours ago
        SDL3 is kind of the intersection of features found in DX12/Vulkan 1.0/Metal: if it's not easily supported in all of them, it's not in SDL3-- hence the lack of bindless support. That means you can run on nearly every device in the last 10-15 years.

        This "no api" proposal requires hardware from the last 5-10 years :)

        • cyber_kinetist 5 hours ago
          Yup you've actually pointed out the most important difference: SDL3 is designed to be compatible with the APIs and devices of the past (2010s), whereas this proposal is designed to be compatible with the newer 2020s batch of consumer devices.
  • thescriptkiddie 15 hours ago
    the article talks a lot about PSOs but never defines the term
    • flohofwoe 15 hours ago
      "Pipeline State Objects" (immutable state objects which define most of the rendering state needed for a draw/dispatch call). Tbf, it's a very common term in rendering since around 2015 when the modern 3D APIs showed up.
    • CrossVR 15 hours ago
      PSOs are Pipeline State Objects, they encapsulate the entire state of the rendering pipeline.
  • MaximilianEmel 15 hours ago
    I wonder if Valve might put out their own graphics API for SteamOS.
    • m-schuetz 15 hours ago
      Valve seems to be substantially responsible for the mess that is Vulkan. They were one of its pioneers from what I heard when chatting with Vulkan people.
      • jsheard 15 hours ago
        There's plenty of blame to go around, but if any one faction is responsible for the Vulkan mess it's the mobile GPU vendors and Khronos' willingness to compromise for their sake at every turn. Huge amounts of API surface was dedicated to accommodating limitations that only existed on mobile architectures, and earlier versions of Vulkan insisted on doing things the mobile way even if you knew your software was only ever going to run on desktop.

        Thankfully later versions have added escape hatches which bypass much of that unnecessary bureaucracy, but it was grim for a while, and all that early API cruft is still there to confuse newcomers.

      • pjmlp 15 hours ago
        Samsung and Google also have their share; see who gives most of the Vulkanised talks.
  • yieldcrv 15 hours ago
    what level of performance improvements would this represent?
    • vblanco 15 hours ago
      There is no implementation of it, but this is how I see it, at least comparing with how fully-extensioned Vulkan works, which uses a few similar mechanics.

      Per-drawcall cost goes to nanosecond scale, assuming you do drawcalls at all, of course. It also makes bindless and indirect rendering a bit easier, so you could drop CPU cost to near zero in a renderer.

      It would also largely mitigate shader-compilation hitches, thanks to having a split pipeline instead of a monolithic one.

      The simplification of barriers could improve performance a significant amount: currently, most engines that deal with Vulkan and DX12 need to keep track of individual texture layouts and transitions, and this removes that bookkeeping completely.
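
      For reference, the indirect path mentioned above already looks roughly like this in today's Vulkan (a sketch; it assumes an indirect buffer that the CPU or a culling compute shader has filled beforehand):

          #include <vulkan/vulkan.h>

          // One command recorded on the CPU, drawCount draws executed on the GPU.
          // The per-draw parameters live in indirectBuffer as packed
          // VkDrawIndexedIndirectCommand structs:
          // {indexCount, instanceCount, firstIndex, vertexOffset, firstInstance}.
          void recordIndirectDraws(VkCommandBuffer cmd, VkBuffer indirectBuffer,
                                   uint32_t drawCount) {
              vkCmdDrawIndexedIndirect(cmd, indirectBuffer, /*offset*/ 0, drawCount,
                                       sizeof(VkDrawIndexedIndirectCommand));
          }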

    • Pannoniae 11 hours ago
      Most of it has been said by the other replies and they're really good, adding a few things onto it:

      - Would lead to reduced memory usage on the driver side due to eliminating all the state tracking for "legacy" APIs and all the PSO/shader duplication for the "modern" APIs (who doesn't like using less memory? won't show up on a microbenchmark, but a reduced working set leads to globally increased performance in most cases, due to >cache hit%)

      - A much reduced cost per API operation. I don't just mean drawcalls but everything else too. And allowing more asynchrony without the "here's 5 types of fences and barriers" kind of mess. As the article says, you can choose between mostly implicit sync (OpenGL, DX11) and tracking all your resources yourself (Vulkan), then feeding all that data into an API which mostly ignores it. This one wouldn't really speed up existing applications, but would rather unlock new possibilities. For example, massively improving scene variety with cheap drawcalls and doing more procedural objects/materials instead of the standard PBR pipeline. Yes, drawindirect and friends exist, but they aren't exactly straightforward to use and require you to structure your problem in a specific way.

    • flohofwoe 15 hours ago
      It's mostly not about performance, but about getting rid of legacy cruft that still exists in modern 3D APIs to support older GPU architectures.
      • wbobeirne 13 hours ago
        Getting rid of cruft isn't really a goal in and of itself, it's a goal in service of other goals. If it's not about performance, what else would be accomplished?
        • flohofwoe 13 hours ago
          A simplified API means higher programmer productivity, higher robustness, simplified debugging and testing, and also less internal complexity in the driver. All this together may also result in slightly higher performance, but it's not the main goal. You might gain a couple hundred microseconds per frame as a side effect of the simpler code, but if your use case already perfectly fits the 'modern subset' of Vulkan or D3D12, the performance gains will be deep in 'diminishing returns area' and hardly noticeable in the frame rate. It's mostly about secondary effects by making the programmer's life easier on both sides of the API.

          The cost/compromise is dropping support for outdated GPUs.

    • modeless 14 hours ago
      It would likely reduce or eliminate the "compiling shaders" step many games now have on first run after an update, and the stutters many games have as new objects or effects come on screen for the first time.
    • m-schuetz 15 hours ago
      Probably mostly about quality of life. Legacy graphics APIs like Vulkan have abysmal developer UX for no good reason.
    • Ono-Sendai 11 hours ago
      Relative to what? Relative to modern OpenGL with good driver support, not much probably. The big win is due to the simplified API, which is helpful for application developers and also driver writers.
  • ginko 14 hours ago
    I mean sure, this should be nice and easy.

    But then game/engine devs want to pair a vertex shader that produces a UV coordinate and a normal with a pixel shader that only reads the UV coordinate (or neither, for shadow mapping), and they don't want to pay for the bandwidth of the unused vertex outputs (or the cost of calculating them).

    Or they want to be able to randomly enable any other pipeline stage, like tessellation or geometry shaders, and the same shader should just work without any performance overhead.

    • Pannoniae 6 hours ago
      A preprocessor step mostly solves this one. No one said that the shader source has to go into the GPU API 1:1.

      Basically do what most engines do - have preprocessor constants and use different paths based on what attributes you need.
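
      A tiny sketch of that approach (the defines and the helper are made up for illustration; the actual compile call depends on the shader toolchain):

          #include <string>

          // Prepend #defines to a single shader source and compile one variant per
          // feature combination; the shader wraps normal/UV work in #if HAS_NORMALS /
          // #if HAS_UVS so unused attributes are never computed or exported.
          std::string buildShaderVariant(const std::string& source,
                                         bool needsNormals, bool needsUVs) {
              std::string prologue;
              prologue += needsNormals ? "#define HAS_NORMALS 1\n" : "#define HAS_NORMALS 0\n";
              prologue += needsUVs     ? "#define HAS_UVS 1\n"     : "#define HAS_UVS 0\n";
              return prologue + source;
          }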

      I also don't see how separate pipeline stages conflict with this - you already have this functionality in existing APIs, where you can swap different stages individually. Some changes might need a fixup from the driver side, but nothing which can't be added in this proposed API's `gpuSetPipeline` implementation...

  • imdsm 38 minutes ago
    LLMs will eat this up