
OpenGL Multi-threading, what it is and what it means

Disclaimer: this information is all readily available elsewhere; I’m trying to condense it a bit and focus on a particular topic for discussion. I may gloss over or simplify some details, but the central ideas should apply.

There’s been a fair bit of talk recently about attempting to multi-thread OpenGL, so I thought I’d write a bit about what “multi-threading” an OpenGL game means, what’s normally done, and how it compares to multi-threading in Vulkan.

Some History

For those not aware, OpenGL is, in computing terms, old. It was designed before multiple CPU cores were even available to the general consumer, and long before just about every part of a graphics pipeline was programmable.
The central concept of OpenGL is a state machine. This has served it well for a long time, but a single OpenGL context (state machine) is driven by sequential inputs from the application: the application calls the API, and the OpenGL implementation reacts. State is changed, rendering commands are issued, resources are loaded, and so on.


State of OpenGL

Being state based, and because any API call has the potential to change state, multi-threaded access to a single OpenGL context is very difficult. Say one thread sets some piece of state "x" and expects it to stay that way, but another thread changes "x" to "x+1"; the first thread then carries on without knowing anything has changed. Concurrent access to one context is not permitted in most cases and can result in exactly that kind of undefined behaviour. There are some exceptions: separate contexts are allowed to share certain data, such as texture data and vertex buffers, but more on that later.
Another consequence of a state-based design is that OpenGL implementations must ensure the state is always valid. They must check that data is correctly bound, that values are in range, and that nothing will break the system. Up to this point, everything is still happening CPU-side. Only if everything checks out does the implementation generate commands that can be sent to the GPU itself.

Drivers can do some fancy things behind the scenes of course, but the end result, as presented to the application, is the same: accept an API command, modify and verify state, send hardware commands to the GPU.
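As a rough illustration (assuming the usual setup of a context and a function loader such as glad, and with the shader program, texture and vertex array created elsewhere), a typical draw boils down to a strictly ordered sequence like this:

    // Minimal sketch: each call below reads or mutates the context's state, and the
    // driver validates it in exactly this order before any GPU commands are generated.
    #include <glad/glad.h>   // assumed loader for modern GL function pointers

    void draw_one_object(GLuint program, GLuint texture, GLuint vao, GLsizei vertex_count)
    {
        glUseProgram(program);                       // state change: bound shader program
        glActiveTexture(GL_TEXTURE0);                // state change: active texture unit
        glBindTexture(GL_TEXTURE_2D, texture);       // state change: bound texture
        glBindVertexArray(vao);                      // state change: bound vertex array
        glDrawArrays(GL_TRIANGLES, 0, vertex_count); // validate all of the above, then emit GPU commands
    }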

Recent versions of OpenGL have helped cut out a lot of this overhead. The amount of state validation can be reduced, greatly shortening the path from API call to GPU command, but it’s still very much something that has to happen on a single thread.
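One concrete example of this direction is the GL_KHR_no_error extension, which lets an application request a context that skips most error checking entirely. A minimal sketch using GLFW (assuming GLFW 3.2 or newer and a driver that supports the extension):

    #include <GLFW/glfw3.h>

    int main()
    {
        if (!glfwInit())
            return 1;

        // Ask for a context with error checking disabled (GL_KHR_no_error).
        // Invalid calls become undefined behaviour instead of producing errors,
        // so this only makes sense once the renderer is known to be correct.
        glfwWindowHint(GLFW_CONTEXT_NO_ERROR, GLFW_TRUE);

        GLFWwindow* window = glfwCreateWindow(1280, 720, "no-error context", nullptr, nullptr);
        if (!window)
        {
            glfwTerminate();
            return 1;
        }

        glfwMakeContextCurrent(window);
        // ... render as usual; the driver spends less time validating each call ...

        glfwDestroyWindow(window);
        glfwTerminate();
        return 0;
    }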


Threading

So how can developers “multi-thread” OpenGL? It is possible to have multiple contexts in multiple threads, and to use them to load texture data, update vertex buffers, and possibly compile new shaders on different threads. The tricky part is that sharing this information between OpenGL contexts depends on the drivers behaving themselves, in addition to the application not doing (by accident or intent) anything strange, so it’s often unstable. It can be quite the adventure getting a game running with this approach, and the runtime improvements are often simply not worth the effort - it can even run worse if the drivers have to synchronise data between contexts frequently. For the curious, things like editors with multiple rendering windows do use multiple contexts, but that’s a different scenario - each window isn’t trying to interfere with the others while rendering, so multi-threading doesn’t normally come into play.
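To make the shared-context approach concrete, here is a minimal sketch of streaming a texture from a worker thread. It uses GLFW purely as an example windowing library, the function and variable names are invented, and error handling is omitted:

    #include <GLFW/glfw3.h>
    #include <thread>

    // Worker thread: makes the second, shared context current and uploads texture data.
    // The resulting texture name is visible to the main context because the two
    // contexts share resources.
    void texture_loader(GLFWwindow* shared_context)
    {
        glfwMakeContextCurrent(shared_context);

        GLuint texture = 0;
        glGenTextures(1, &texture);
        glBindTexture(GL_TEXTURE_2D, texture);
        // Real pixel data loaded from disk would replace the nullptr here.
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glFinish(); // make sure the upload is complete before the main context uses it
    }

    int main()
    {
        if (!glfwInit())
            return 1;

        GLFWwindow* main_window = glfwCreateWindow(1280, 720, "main", nullptr, nullptr);

        // Second, invisible window whose context shares resources with the main one.
        glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
        GLFWwindow* loader_window = glfwCreateWindow(1, 1, "loader", nullptr, main_window);

        glfwMakeContextCurrent(main_window);

        std::thread loader(texture_loader, loader_window);
        // ... the main thread keeps rendering with main_window's context ...
        loader.join();

        glfwTerminate();
        return 0;
    }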

This leads to the second approach to multi-threading OpenGL: developers don’t! If OpenGL works best with commands submitted sequentially on the thread where a context is active, then that’s simply the best thing to do. Nothing stops a game developer building their own queue of OpenGL API calls to perform, though, and building that queue is something that can be multi-threaded. To give an example, if a game has a big list of objects, there’s a fair amount of processing to do when deciding whether to draw each one: for each object, the game first decides whether it might be visible, and only tries to render it if it will actually be seen. The check for each object takes time, but each object can be processed independently. So the list can be split into multiple sub-lists, and each sub-list given to a separate thread to run visibility checks on. Each thread has its own rendering list to which objects that should be rendered are added. When done, each rendering list can be iterated over in turn and the objects submitted to OpenGL from a single thread. This is a very simple example, but there’s normally a fair amount of similar logic in deciding what to render. So it’s not multi-threading OpenGL, but rather multi-threading the decisions about how to use OpenGL.
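A rough sketch of that idea in C++ (the Object type, its visibility test, and the thread-count handling are all invented for illustration):

    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    // Hypothetical game object; a real one would hold transforms, meshes and so on.
    struct Object
    {
        bool potentially_visible() const { return true; } // placeholder frustum/occlusion test, CPU only
        void submit() const {}                             // placeholder for the actual OpenGL draw calls
    };

    // Run the visibility checks for one sub-list on its own thread, writing the
    // survivors into that thread's private render list. No OpenGL calls happen here.
    static void cull(const Object* objects, std::size_t count,
                     std::vector<const Object*>& render_list)
    {
        for (std::size_t i = 0; i < count; ++i)
            if (objects[i].potentially_visible())
                render_list.push_back(&objects[i]);
    }

    void render_scene(const std::vector<Object>& objects, unsigned thread_count)
    {
        std::vector<std::vector<const Object*>> render_lists(thread_count);
        std::vector<std::thread> workers;

        // Split the object list into roughly equal sub-lists, one per thread.
        const std::size_t chunk = (objects.size() + thread_count - 1) / thread_count;
        for (unsigned t = 0; t < thread_count; ++t)
        {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(objects.size(), begin + chunk);
            if (begin >= end)
                break;
            workers.emplace_back(cull, objects.data() + begin, end - begin,
                                 std::ref(render_lists[t]));
        }

        for (auto& w : workers)
            w.join();

        // Single-threaded OpenGL submission, in a predictable order.
        for (const auto& list : render_lists)
            for (const Object* object : list)
                object->submit();
    }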


Vulkan

Earlier, I mentioned that OpenGL verifies state information and then generates commands for the GPU.

Firstly, once a developer has finished making everything work, all that verification is still performed but isn’t actually required any more. It’s useful during development, but afterwards it’s (hopefully!) a waste of time. So even with a dedicated thread submitting commands to OpenGL, there’s considerable overhead before anything actually reaches the GPU. It would be nice if there were some way to pre-build lists of commands, known to be valid, ready to send to the GPU.
Secondly, as with the example above of a game splitting object visibility checks into multiple sub-lists, it would also be nice if multiple GPU command lists could be created on separate threads, and then submitted to the GPU in turn. They are separate after all, and don’t require GPU access to actually prepare.
This is essentially what Vulkan allows. There are some requirements: all the state must be known up front, and prepared well before it’s time to actually render something. The flip side is that there is much, much less driver overhead, and the API itself can be used from multiple threads. Actual submission of commands to the GPU is still done sequentially, in a single thread, however there’s very little overhead; all error checking has been done, and it’s just sending commands directly to the GPU (feeding the beast).
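A heavily trimmed sketch of that pattern with the Vulkan API (the device, queue and per-thread command pools are assumed to have been created already, the actual draw calls are omitted, and there is no error handling):

    #include <vulkan/vulkan.h>
    #include <thread>
    #include <vector>

    // Each worker records its own command buffer from its own VkCommandPool;
    // a command pool must not be used from multiple threads at the same time.
    static void record_commands(VkDevice device, VkCommandPool pool,
                                VkCommandBuffer* out_buffer)
    {
        VkCommandBufferAllocateInfo alloc_info = {};
        alloc_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
        alloc_info.commandPool = pool;
        alloc_info.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
        alloc_info.commandBufferCount = 1;
        vkAllocateCommandBuffers(device, &alloc_info, out_buffer);

        VkCommandBufferBeginInfo begin_info = {};
        begin_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
        vkBeginCommandBuffer(*out_buffer, &begin_info);
        // ... vkCmdBeginRenderPass / vkCmdBindPipeline / vkCmdDraw calls would go here ...
        vkEndCommandBuffer(*out_buffer);
    }

    // Build command buffers in parallel, then hand them to the queue from one thread.
    void build_and_submit(VkDevice device, VkQueue queue,
                          const std::vector<VkCommandPool>& per_thread_pools)
    {
        std::vector<VkCommandBuffer> buffers(per_thread_pools.size());
        std::vector<std::thread> workers;

        for (std::size_t i = 0; i < per_thread_pools.size(); ++i)
            workers.emplace_back(record_commands, device, per_thread_pools[i], &buffers[i]);
        for (auto& w : workers)
            w.join();

        // Single submission point: cheap, because the expensive work happened
        // while the buffers were being recorded on the worker threads.
        VkSubmitInfo submit_info = {};
        submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
        submit_info.commandBufferCount = static_cast<uint32_t>(buffers.size());
        submit_info.pCommandBuffers = buffers.data();
        vkQueueSubmit(queue, 1, &submit_info, VK_NULL_HANDLE);
    }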
There are other areas of Vulkan that lend themselves nicely to application level multi-threading, but I won’t cover them here. Suffice to say that Vulkan does not contain a central state machine, and instead tries to keep everything as isolated and contained as possible, meaning things like building a shader don’t block loading a texture, making multi-threaded designs easier to achieve.


Not Always Applicable

On a final note: when porting games, the way a game handles its data is not always compatible with the multi-threading ideas mentioned above, so they can’t be expected in every port. In addition, it might simply be easier in time and effort (not to mention testing and stability) to run things in a single thread anyway. Not as efficient, but possibly less error prone and quicker to get a port out the door.
tuubi Feb 10, 2017
Interesting stuff, mirv. <3 Maybe write about your adventures in Vulkan land at some point?
sarmad Feb 10, 2017
Thanks for the great explanation.
drmoth Feb 11, 2017
Thanks for the great article! Maybe add the GL_THREADED_OPTIMIZATION nvidia stuff to the main article too.
jnrivers Feb 11, 2017
Great information for the layman. Thanks.
Ray54 Feb 11, 2017
Well explained mirv. I knew parts of it, but you have brought the issues together very well. Single large state machine representations were all the rage back in the 1990's, so were used for Unix Workstation Graphics (OpenGL), telephone switches, etc. However, concurrency (e.g. multi-threading) was always a major problem, with deadlock, livelock and unreachable states. Is it now the case that the software tools (e.g. Vulkan validation layer and C++ debuggers) have improved so much that ordinary game writers can be expected to write and successfully debug complex concurrency issues?
etonbears Feb 11, 2017
Nice summary mirv. I tend towards the view that the most important multi-threading skill is in analyzing what you are trying to do and creating separable tasks. Once you have the tasks, you can design work queues, thread allocation, and any synchronization needs with more confidence.

I think, whether OpenGL, Vulkan or D3D12, I would still be inclined to allocate a single thread/work queue to do nothing but execute GPU commands, simply to ensure the submission order and avoid subtle bugs, leaving other threads to prepare data and execute API commands that do not cause GPU submission.

One of the most difficult things to do is to determine what benefit you will actually achieve. Multi-threading is not cost-free. Even CPUs that support hardware threading generally just allow a lower-cost switch between 2 thread contexts per core. Designing something that works well on anything from 2-16 hardware threads is not always trivial.

One intriguing aspect of OpenGL 4 that offers a potential alternative method for reducing draw call overhead ( a primary driver for Mantle/Vulkan/D3D12 and their multi-threading enhancements ) is MultiDrawIndirect. This allows you to construct a collection of draw commands as a data array on the GPU, which can then be invoked as a single draw call. This could lead to a dramatic reduction in draw call overhead in certain circumstances, but I'm not sure whether it is flexible enough for all uses. If anyone has used this on Linux, I would be interested to hear their views.
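In case it helps anyone, the drawing side looks roughly like this (a sketch only; the interesting part, generating the command array on the GPU with a compute shader, is omitted, and a loader such as glad is assumed):

    #include <glad/glad.h>

    // Layout of one indirect draw command, as defined by the OpenGL spec.
    struct DrawElementsIndirectCommand
    {
        GLuint count;          // number of indices
        GLuint instanceCount;  // number of instances
        GLuint firstIndex;     // offset into the bound element buffer
        GLint  baseVertex;     // added to each index
        GLuint baseInstance;   // first instance ID
    };

    // Assumes indirect_buffer already contains draw_count tightly packed commands,
    // written either on the CPU with glBufferData or on the GPU by a compute shader.
    void draw_indirect(GLuint indirect_buffer, GLsizei draw_count)
    {
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_buffer);
        glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                    nullptr,      // offset 0 into the bound indirect buffer
                                    draw_count,
                                    0);           // 0 stride = tightly packed commands
    }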
Shmerl Feb 12, 2017
Quote: "Actual submission of commands to the GPU is still done sequentially, in a single thread, however there’s very little overhead; all error checking has been done"

Is that true? From what I've read, modern GPUs support multiple queues for input (some for graphics, some for compute). I'm not sure what GPU is supposed to do with multiple queues for graphics for example, since in the end, rendered image is a single frame, but if they exist, it means it should be possible to feed them from multiple threads (one thread per GPU input queue). And Vulkan should support that.

Also, it's possible to have multiple GPUs working in parallel (Vulkan aims to support that), to increase computational power. You for sure don't want to have one thread feeding such hardware setup - it's going to be underutilized.


etonbears Feb 12, 2017
Quoting: Shmerl
Quote: "Actual submission of commands to the GPU is still done sequentially, in a single thread, however there’s very little overhead; all error checking has been done"

Is that true? From what I've read, modern GPUs support multiple queues for input (some for graphics, some for compute). I'm not sure what GPU is supposed to do with multiple queues for graphics for example, since in the end, rendered image is a single frame, but if they exist, it means it should be possible to feed them from multiple threads (one thread per GPU input queue). And Vulkan should support that.

Also, it's possible to have multiple GPUs working in parallel (Vulkan aims to support that), to increase computational power. You for sure don't want to have one thread feeding such hardware setup - it's going to be underutilized.

Yes, Vulkan uses a single thread for GPU submission. AFAIK, in hardware terms, the most common case where a single thread may cause throttling would be for an extremely powerful GPU with a relatively weak CPU. In such a case, you would be using a dedicated PCIe card for the GPU, and the need to use PCIe would enforce single-thread synchronization in the driver regardless of what you do higher up the software stack.

AMD's APUs and Intel integrated graphics are different in that they are monolithic silicon, and therefore might be expected to benefit from multi-threading; but as the GPU elements are relatively weak, it is probably not the case.

Either way, an application design is probably more robust if it explicitly synchronizes submission order through a single thread. Responsibility for explicit synchronization is the trade-off developers accept for the benefits of using Vulkan.

Multiple independently operating GPUs ( say, one for compute and another for graphics ) could clearly benefit from a submission thread per GPU, but if they are co-operating on the same tasks, you still need to synchronize submissions, so you would probably still want a single thread to do that work.
Shmerl Feb 12, 2017
I saw this topic mentioned in the Mantle document: https://www.amd.com/Documents/Mantle-Programming-Guide-and-API-Reference.pdf

Search for "GPU queue" there. Synchronization is also covered there with queue semaphores. So I assume Vulkan should have an analog of the same idea.

I can't find it now, but I saw someone asking a similar question in one of the Khronos Q&As, and they said Vulkan should support multiple parallel GPU queues.


Shmerl Feb 12, 2017
Vulkan doc has various chapters as well about devices and queues, and synchronization: https://www.khronos.org/registry/vulkan/specs/1.0/pdf/vkspec.pdf

But they avoid details about practical usage and threading in regards to that. I suppose some higher level articles should dive into that.

