---
title: The Graphics Pipeline
date: "April 20 - 2025"
---
<script>
import Image from "../Image.svelte"
import Note from "../Note.svelte"
import Tip from "../Tip.svelte"
</script>
Ever wondered how games put all that gore on your display? All that beauty is brought to life by
a process called **rendering**, and at the heart of it is the **graphics pipeline**.
In this article we'll dive deep into the intricate details of this powerful beast.
We'll cover all the terminology needed to understand each stage, and restate the important ideas along the way, so don't
worry if you don't fully grasp something at first. If you still have questions, feel free to contact me :)
So without further ado---
## Overview
Like any pipeline, the **graphics pipeline** is composed
of several **stages**, each of which can be a pipeline in itself or even parallelized.
Each stage takes some input (data and configuration) and generates some output data for the next stage.
<Note title="A coarse division of the graphics pipeline", type="diagram">
Application --> Geometry Processing --> Rasterization --> Pixel Processing --> Presentation
</Note>
Before the heavy rendering work starts on the <Tip text="GPU">Graphics Processing Unit</Tip>,
we simulate and update the world through **systems** such as the physics engine, game logic, networking, etc.,
during the **application** stage.
This stage mostly runs on the <Tip text="CPU">Central Processing Unit</Tip>,
which is extremely efficient at executing <Tip text="sequentially dependent logic">
A type of execution flow where the operations depend on the results of previous steps, limiting parallel execution.
In other words, **CPUs** are great at executing **branch-heavy** code, and **GPUs** are geared
towards executing a TON of **branch-less** or **branch-light** code in parallel. </Tip>.
The updated scene data is then prepped and fed to the **GPU** for **geometry processing**. Here
we figure out where everything ends up on our screen by doing lots of fancy matrix math.
We'll cover this stage in depth very soon so don't panic (yet).
Afterwards, the final geometric data is converted into <Tip text="pixels"> Pixel is shorthand for **picture element**; voxel is shorthand for **volumetric element**. </Tip>
and prepped for the **pixel processing** stage via a process called **rasterization**.
In other words, this stage converts a rather abstract, internal representation (geometry)
into something more concrete (pixels). It's called rasterization because the end product is a <Tip text="raster">Noun. A rectangular pattern of parallel scanning lines followed by the electron beam on a television screen or computer monitor. -- 1930s: from German Raster, literally screen, from Latin rastrum rake, from ras- scraped, from the verb radere. ---Oxford Languages</Tip> of pixels.
The **pixel processing** stage then uses the rasterized geometry data (pixel data) to do **lighting**, **texturing**,
and all the sweet gory details of a scene (like a murder scene).
This stage is often, but not always, the most computationally expensive.
A huge problem that a good rendering engine needs to solve is how to be **performant**. A great deal
of **optimization** can be done by **culling** the work we can deem unnecessary or redundant at each
stage before it's passed on to the next. More on **culling** later, don't worry (yet 🙂).
The pipeline will then serve (present) the output of the **pixel processing** stage, which is a **rendered image**,
to your pretty eyes 👁👄👁 using your <Tip text="display">Usually a monitor, but the technical term for it is
the target **surface**, which can be anything from a VR headset to some other crazy surface used for display purposes.</Tip>.
But to avoid drowning you in overviews, let's jump right into the gory details of the **geometry processing**
stage and have a recap afterwards!
## Surfaces
Ever been jump-scared by this sight in an <Tip text="FPS">First Person (Shooter) perspective</Tip>? Why are (the insides of) things rendered like that?
<Image
paths={["/images/boo.png"]}
/>
In order to display a (murder) scene,
we need to have a way of **representing** the **surface** of its composing objects (like corpses) in computer memory.
We only care about the **surface** since we won't be seeing the insides anyway---Not with that attitude.
At this stage, we only care about the **shape** or the **geometry** of the **surface**.
Texturing, lighting, and all the sweet gory details come at a much later stage once all the **geometry** has been processed.
But how do we represent surfaces in computer memory?
## Vertices
There are several ways to **represent** the surfaces of 3D objects for a computer to understand.
For instance, <Tip text="NURBS">
**Non-uniform rational basis spline** is a mathematical model using **basis splines** (B-splines) that is commonly used in computer graphics for representing curves and surfaces. It offers great flexibility and precision for handling both analytic (defined by common mathematical formulae) and modeled shapes. ---Wikipedia</Tip> surfaces are great for representing **curves**, offering the
**high precision** needed for <Tip text="CAD">Computer-Aided Design</Tip>. We could also do **ray-tracing** using fancy equations for
rendering **photo-realistic** images.
These are all great---ignoring the fact that they would take an eternity to process...
But what we need is a **performant** approach that can do this for an entire scene with
hundreds of thousands of objects (like a lot of corpses) in under a small fraction of a second. What we need is **polygonal modeling**.
**Polygonal modeling** enables us to do an exciting thing called **real-time rendering**. The idea is that we only need an
**approximation** of a surface to render it **realistically enough** for us to have some fun killing time!
We can achieve this approximation using a collection of **triangles**, **lines**, and **dots** (primitives),
which themselves are composed of a series of **vertices** (points in space).
<Image
paths={["/images/polygon_sphere.webp"]}
/>
A **vertex** is simply a point in space.
Once we get enough of these **points**, we can connect them to form **primitives** such as **triangles**, **lines**, and **dots**.
And once we connect enough of these **primitives** together, they form a **model** or a **mesh** (that we need for our corpse).
With some interesting models put together, we can compose a **scene** (like a murder scene :D).
<Image
paths={["/images/bunny.jpg"]}
/>
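To make this concrete, here's a minimal sketch of what a vertex and a triangle could look like in memory (plain C++, no particular graphics API; the names are just for illustration):
```cpp
#include <array>
#include <vector>

// A vertex: simply a point in 3D space.
// (Real vertices usually carry more attributes, e.g. normals and
// texture coordinates, but those come at a much later stage.)
struct Vertex
{
    float x, y, z;
};

// Connect three vertices and we get our favorite primitive: a triangle.
using Triangle = std::array<Vertex, 3>;

// Enough primitives connected together form a model/mesh.
using Mesh = std::vector<Triangle>;
```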
But let's not get ahead of ourselves. The primary type of **primitive** that we care about during **polygonal modeling**
is a **triangle**. But why not squares or polygons with a variable number of edges?
## Why Triangles?
In <Tip text="Euclidean geometry"> Developed by **Euclid** around 300 BCE, is based on five axioms. It describes properties of shapes, angles, and space using deductive reasoning. It remained the standard model of geometry for centuries until non-Euclidean geometries and general relativity showed its limits. It's still widely used in education, engineering, and **computer graphics**. ---Wikipedia </Tip>, triangles are always **planar** (they exist only in one plane),
any polygon composed of more than 3 points may break this rule, but why does polygons residing in one plane so important
to us?
<Image
paths={["/images/planar.jpg", "/images/non_planar_1.jpg", "/images/non_planar_2.png"]}
/>
When a polygon exists only in one plane, we can safely imply that **only one face** of it can be visible
at any one time. This enables us to utilize a huge optimization technique called **back-face culling**,
which means we avoid wasting a ton of **precious processing time** on the polygons that
we know won't be visible to us. We can safely **cull** the **back-faces** since we won't
be seeing the **back** of a polygon when it's part of a closed-off model.
We figure this out simply by using the **winding order** of the triangle to determine whether we're looking at
the back of the triangle or the front of it.
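As a rough sketch of the idea (real GPUs do this after projecting the triangle to screen space, and the front-face convention is configurable), the winding order falls out of the sign of the triangle's **signed area**:
```cpp
struct Vec2
{
    float x, y;
};

// Twice the signed area of a 2D triangle, via the cross product of
// the edges (b - a) and (c - a).
float signedAreaTimesTwo(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Assuming a counter-clockwise front-face convention (in a y-up
// coordinate system): a negative area means the vertices appear
// clockwise on screen, i.e. we're looking at the back face.
bool isBackFace(Vec2 a, Vec2 b, Vec2 c)
{
    return signedAreaTimesTwo(a, b, c) < 0.0f; // safe to cull!
}
```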
Triangles also have a very small **memory footprint**; for instance, when using the **triangle-strip** topology (more on this very soon), for each additional triangle after the first one, only **one extra vertex** is needed.
The most important attribute, in my opinion, is the **algorithmic simplicity**.
Any polygon or shape can be composed from a **set of triangles**; for instance, a rectangle is simply **two coplanar triangles**.
Also, it is a common practice in computer science to break down hard problems into simpler, smaller problems.
This will be a lot more convincing when we cover the **rasterization** stage :)
<Note title="Bonus point: evolution", type="info">
present-day **hardware** and **algorithms** have become **extremely efficient** at processing
triangles by doing operations such as sorting, rasterizing, etc, after eons of evolving around them.
</Note>
## Primitive Topology
So, we've got our set of vertices, but having a bunch of points floating around wouldn't make a scene very lively
(or gory); we need to form **triangles** out of them to compose **models** (corpses xd).
We communicate to the computer the <Tip text="topology"> The way in which constituent parts are interrelated or arranged. --mid 19th century: via German from Greek topos place + -logy. ---Oxford Languages </Tip>
of the primitives to be generated from our set of vertices by
configuring the **primitive topology** of the **input assembler**.
We'll get to the **input assembler** bit in a second, but first let's clarify the topology with some examples.
**Point list**:
When the topology is point list, each **consecutive vertex** defines a **single point** primitive, according to the equation:
<Note title="equation", type="math">
```math
p_i = \{ v_{i} \}
```
</Note>
**Line list**:
When the primitive topology is line list, each **consecutive pair of vertices** defines a **single line**, according to the equation:
<Note title="equation", type="math">
```math
p_i = \{ v_{2i},\ v_{2i+1} \}
```
</Note>
The number of primitives generated is equal to ⌊vertex_count / 2⌋.
**Line Strip**:
When the primitive topology is line strip, **one line** is defined by each **vertex and the following vertex**, according to the equation:
<Note title="equation", type="math">
```math
p_i = \{ v_i, v_{i+1} \}
```
</Note>
The number of primitives generated is equal to max(0, vertex_count - 1).
**Triangle list**:
When the primitive topology is triangle list, each **consecutive set of three vertices** defines a **single triangle**, according to the equation:
<Note title="equation", type="math">
```math
p_i = \{ v_{3i}, v_{3i+1}, v_{3i+2} \}
```
</Note>
The number of primitives generated is equal to ⌊vertex_count / 3⌋.
**Triangle strip**:
When the primitive topology is triangle strip, **one triangle** is defined by each **vertex and the two vertices that follow it**, according to the equation:
<Note title="equation", type="math">
```math
p_i = \{ v_i,\ v_{i + (1 + i \bmod 2)},\ v_{i + (2 - i \bmod 2)} \}
```
</Note>
The number of primitives generated is equal to max(0, vertex_count - 2).
**Triangle fan**:
When the primitive topology is triangle fan, triangles are defined **around a shared common vertex**, according to the equation:
<Note title="equation" type="math">
```math
p_i = \{ v_{i+1}, v_{i+2}, v_0 \}
```
</Note>
The number of primitives generated is equal to max(0, vertex_count - 2).
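To see the strip and fan equations in action, here's a small sketch that expands both topologies into plain triangle lists (the `i % 2` term in the strip formula flips every other triangle so all of them keep a consistent winding order):
```cpp
#include <cstdint>
#include <vector>

// Triangle strip: p_i = { v_i, v_{i + (1 + i mod 2)}, v_{i + (2 - i mod 2)} }
std::vector<uint32_t> expandTriangleStrip(uint32_t vertexCount)
{
    std::vector<uint32_t> indices;
    for (uint32_t i = 0; i + 2 < vertexCount; ++i) // max(0, vertexCount - 2) triangles
    {
        indices.push_back(i);
        indices.push_back(i + 1 + (i % 2));
        indices.push_back(i + 2 - (i % 2));
    }
    return indices;
}

// Triangle fan: p_i = { v_{i+1}, v_{i+2}, v_0 }
std::vector<uint32_t> expandTriangleFan(uint32_t vertexCount)
{
    std::vector<uint32_t> indices;
    for (uint32_t i = 0; i + 2 < vertexCount; ++i) // max(0, vertexCount - 2) triangles
    {
        indices.push_back(i + 1);
        indices.push_back(i + 2);
        indices.push_back(0);
    }
    return indices;
}
```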
There are also some topologies suffixed with the word **adjacency**, and a special type of primitive called the **patch** primitive.
But for the sake of simplicity, we won't get into them.
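Since we'll be leaning on the Vulkan docs anyway, here's roughly what configuring the primitive topology looks like there (a minimal sketch; other APIs expose the same knob under different names):
```cpp
#include <vulkan/vulkan.h>

// Configure the input assembler to treat every 3 consecutive vertices
// as a single triangle (the triangle list topology from above).
VkPipelineInputAssemblyStateCreateInfo inputAssembly{};
inputAssembly.sType    = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO;
inputAssembly.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;

// Primitive restart lets you cut a strip/fan mid-draw using a special
// index value; it's irrelevant for list topologies.
inputAssembly.primitiveRestartEnable = VK_FALSE;
```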
So what's next?
## Indices
Great, we've got our vertices, and we've figured out how to connect them. But there's one last thing we need
to understand before we can **assemble** our input using the **input assembler**: the **indices**.
In a typical mesh, most vertices are shared between several triangles. Instead of duplicating a shared vertex
for every triangle that touches it, we store each **unique vertex once** and describe the triangles as a list of
**indices** into that collection of vertices.
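As a minimal sketch (reusing the illustrative `Vertex` struct from earlier), here's a quad: built naively it would need 6 vertices, but with an index buffer we only store the 4 unique corners:
```cpp
#include <cstdint>
#include <vector>

struct Vertex
{
    float x, y, z;
};

// The 4 unique corners of a quad, each stored exactly once.
const std::vector<Vertex> vertices = {
    {-0.5f, -0.5f, 0.0f}, // 0: bottom-left
    { 0.5f, -0.5f, 0.0f}, // 1: bottom-right
    { 0.5f,  0.5f, 0.0f}, // 2: top-right
    {-0.5f,  0.5f, 0.0f}, // 3: top-left
};

// Two triangles referencing the shared corners by index,
// both wound counter-clockwise.
const std::vector<uint32_t> indices = {
    0, 1, 2, // first triangle
    2, 3, 0, // second triangle
};
```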
## Input Assembler
Every section before this one explained the terminology needed to grasp this;
sections colored in yellow are concrete pipeline stages where some code gets executed,
processing the data we feed in based on the configuration we set.
The **vertices** and **indices** are provided to this stage via something we call **buffers**.
So technically we have to provide **two** buffers here: a **vertex buffer** and an **index buffer**.
To give you yet another overview, this is the diagram of the **geometry processing** section of
our pipeline:
<Note title="Geometry Processing Pipeline", type="diagram">
Draw --> Input Assembler -> Vertex Shader -> Tessellation Control Shader -> Tessellation Primitive Generator -> Tessellation Evaluation Shader -> Geometry Shader -> Vertex Post-Processing -> ... Rasterization ...
</Note>
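In Vulkan terms, feeding the input assembler ends up looking something like this when recording a command buffer (a sketch only; `cmd`, `vertexBuffer`, and `indexBuffer` are assumed to have been created and filled elsewhere):
```cpp
// Bind the two buffers the input assembler reads from...
VkBuffer vertexBuffers[] = { vertexBuffer };
VkDeviceSize offsets[]   = { 0 };
vkCmdBindVertexBuffers(cmd, 0, 1, vertexBuffers, offsets);
vkCmdBindIndexBuffer(cmd, indexBuffer, 0, VK_INDEX_TYPE_UINT32);

// ...then kick off the pipeline: draw 6 indices (our quad) once.
vkCmdDrawIndexed(cmd, 6, 1, 0, 0, 0);
```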
## Coordinate System -- Local Space
## Coordinate System -- World Space
## Coordinate System -- View Space
## Coordinate System -- Clip Space
## Coordinate System -- Screen Space
## Vertex Shader
## Tessellation & Geometry Shaders
## Let's Recap!
## Rasterizer
## Pixel Shader
## Output Merger
## The Future
## Conclusion
## Sources
<Note title="Reviewers", type="review">
Mohammad Reza Nemati
</Note>
<Note title="Books", type="resource">
[Tomas Akenine Moller --- Real-Time Rendering 4th Edition](https://www.realtimerendering.com/intro.html)
[JoeyDeVriez --- LearnOpenGL - Hello Triangle](https://learnopengl.com/Getting-started/Hello-Triangle)
[JoeyDeVriez --- LearnOpenGL - Face Culling](https://learnopengl.com/Advanced-OpenGL/Face-culling)
</Note>
<Note title="Wikipedia", type="resource">
[Polygonal Modeling](https://en.wikipedia.org/wiki/Polygonal_modeling)
[Non-uniform Rational B-spline Surfaces](https://en.wikipedia.org/wiki/Non-uniform_rational_B-spline)
[Computer Aided Design (CAD)](https://en.wikipedia.org/wiki/Computer-aided_design)
[Rasterization](https://en.wikipedia.org/wiki/Rasterisation)
[Euclidean geometry](https://en.wikipedia.org/wiki/Euclidean_geometry)
</Note>
<Note title="Youtube", type="resource">
...
</Note>
<Note title="Stackoverflow", type="resource">
[Why do 3D engines primarily use triangles to draw surfaces?](https://stackoverflow.com/questions/6100528/why-do-3d-engines-primarily-use-triangles-to-draw-surfaces)
</Note>
<Note title="Vulakn Docs", type="resource">
[Drawing](https://docs.vulkan.org/spec/latest/chapters/drawing.html)
[Pipeline Diagram](https://docs.vulkan.org/spec/latest/_images/pipelinemesh.svg)
</Note>