Rendering in Tiles (Marathon) — Handmade Hero — Episode Guide

1:36 Reintroducing the Intel Architecture Code Analyzer

10:46 A long time ago, in RAD's source tree

13:00 blowtard, an analytical tool for the Xbox 360's PowerPC Tri-Core Xenon written by Casey

22:04 How IACA's output differs from Casey's stats in blowtard

30:56 Looking at how to get our cycle count down

32:21 Manually unroll the Fetch / Sample loop

36:30 Group by Sample

37:27 Use _mm_setr_ps as suggested by Fabian a long time ago

42:14 Taking a look at the total throughput count

43:18 Casey needs some more soya [sic] milk

44:17 Could we do a load once, and grab out the two values that we needed?

45:48 Explanation of possible texel loading optimisation

50:32 Figuring out how the compiler is loading the texel data

1:00:18 This is fine, then

1:01:01 We multiply by TexturePitch and sizeof(uint32) four-wide manually, which is stupid

1:02:06 Shift up FetchX_4x by 2, rather than multiply by sizeof(uint32)

1:03:40 Premultiply FetchY_4x by TexturePitch_4x

1:04:07 Give the compiler the wide stuff so that it can see it as wide

1:11:21 _mm_mul_epi32 does not do integer * integer
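
For context on the 1:11:21 entry: `_mm_mul_epi32` multiplies only the low signed 32-bit integer of each 64-bit element and returns two 64-bit products, so it is not a lane-wise 32-bit multiply. The lane-wise version, `_mm_mullo_epi32`, requires SSE4.1. The following is a minimal sketch, not the code from the stream, of one common way to get a lane-wise result on an SSE2-only target using `_mm_mul_epu32`. The helper name `MulLo32_SSE2` is hypothetical.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: lane-wise 32-bit multiply (low 32 bits of each
   product) built from SSE2 instructions only. The low 32 bits of a product
   are the same for signed and unsigned inputs, so the unsigned multiply
   _mm_mul_epu32 can be used. */
static __m128i MulLo32_SSE2(__m128i A, __m128i B)
{
    /* 64-bit products of lanes 0 and 2. */
    __m128i Even = _mm_mul_epu32(A, B);

    /* Shift lanes 1 and 3 down into positions 0 and 2, then multiply. */
    __m128i Odd = _mm_mul_epu32(_mm_srli_si128(A, 4), _mm_srli_si128(B, 4));

    /* Gather the low 32 bits of each product into the two bottom lanes. */
    __m128i EvenLo = _mm_shuffle_epi32(Even, _MM_SHUFFLE(0, 0, 2, 0));
    __m128i OddLo = _mm_shuffle_epi32(Odd, _MM_SHUFFLE(0, 0, 2, 0));

    /* Interleave back to lane order 0, 1, 2, 3. */
    return _mm_unpacklo_epi32(EvenLo, OddLo);
}
```

Whether this is faster than restructuring the address math to avoid the multiply depends on the target; the sketch only illustrates the difference between the two intrinsics.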

1:13:43 Port pressure (we're back to InterIteration)

1:17:46 Hyperthreading

1:27:22 Designing how to break up the renderer for multithreading to ease pressure on the caches

1:32:22 Divide the frame buffer into chunks that are sized appropriately for the cache

1:39:55 The plan for setting up the renderer

1:40:47 Implementation of interleaved scanlines, in readiness for hyperthreading

1:46:36 The logic of interleaved scanlines
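
As a rough illustration of the interleaving idea, here is a minimal sketch in which one pass touches only the even rows of a region and another pass touches only the odd rows, so two hyperthreads could share the same region without writing the same scanline. The function name `FillRowsInterleaved`, the `Even` flag, and the assumption that the row pitch equals `Width` are all hypothetical and not taken from the stream.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: fill rows [MinY, MaxY) of a Width-wide buffer,
   touching only even rows when Even is true and only odd rows otherwise.
   Assumes the row pitch in pixels equals Width. */
static void FillRowsInterleaved(uint32_t *Pixels, int Width,
                                int MinY, int MaxY,
                                bool Even, uint32_t Color)
{
    int WantedParity = Even ? 0 : 1;

    /* Bump the first row when its parity does not match the wanted set. */
    if ((MinY & 1) != WantedParity)
    {
        MinY += 1;
    }

    /* Step two rows at a time so the other set is never written. */
    for (int Y = MinY; Y < MaxY; Y += 2)
    {
        uint32_t *Row = Pixels + (size_t)Y * (size_t)Width;
        for (int X = 0; X < Width; ++X)
        {
            Row[X] = Color;
        }
    }
}
```

Calling it once with `Even` set and once without covers every row in the range exactly once.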

1:52:37 Updating compiler directives for folks who use LLVM

1:55:20 Implementation of frame buffer divisions, in readiness for multi-core processing

2:05:30 Go to Disassembly of DrawRectangleQuickly() in order to diagnose bogus cycle count

2:10:04 Frame buffer divisions, continued

2:20:50 Introduce GetClampedRectArea

2:22:12 Problematic thing: Our convention for rectangles before was that they did not include their final value
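
To make the rectangle convention concrete, here is a minimal sketch assuming half-open integer rectangles, where `MaxX` and `MaxY` are one past the last pixel covered. `GetClampedRectArea` is the name used in this guide, but its body here, the `rect2i` type, and the `Intersect` helper are assumptions for illustration only.

```c
/* Hypothetical half-open rectangle: covers X in [MinX, MaxX) and
   Y in [MinY, MaxY), so the Max values are not included. */
typedef struct rect2i
{
    int MinX, MinY;
    int MaxX, MaxY;
} rect2i;

/* Intersection of two half-open rectangles. The result may be inverted
   (Max <= Min) when the inputs do not overlap. */
static rect2i Intersect(rect2i A, rect2i B)
{
    rect2i R;
    R.MinX = (A.MinX > B.MinX) ? A.MinX : B.MinX;
    R.MinY = (A.MinY > B.MinY) ? A.MinY : B.MinY;
    R.MaxX = (A.MaxX < B.MaxX) ? A.MaxX : B.MaxX;
    R.MaxY = (A.MaxY < B.MaxY) ? A.MaxY : B.MaxY;
    return R;
}

/* Area in pixels, clamped to zero for empty or inverted rectangles. */
static int GetClampedRectArea(rect2i A)
{
    int Width = A.MaxX - A.MinX;
    int Height = A.MaxY - A.MinY;
    return (Width > 0 && Height > 0) ? (Width * Height) : 0;
}
```

With the half-open convention, width is simply `MaxX - MinX` with no `+ 1`, which is where off-by-one clipping errors tend to come from when the two conventions are mixed.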

2:27:33 Fix the cycle counter for DrawRectangleQuickly() again

2:29:42 A shortcut didn't work out. (!quote 297 + !quote 298)

2:30:56 Loft FillRect above the loop

2:36:34 Introduce PixelPxRow in order to keep PixelPx as a wide value rather than having to set it each time

2:39:50 Check IACA for performance difference and revert to setting PixelPx each time through the loop

2:43:28 Shuffle calculations around to figure out how the performance is affected, for good or ill

2:51:17 Thinking about that alignment problem

2:55:58 Align MinX and MaxX

3:00:18 Microsoft Visual Studio 2013 has stopped working

3:02:03 Dancing trees

3:03:03 Change our loads and stores to no longer be unaligned

3:04:05 Assess performance difference and revert back to the unaligned load and store instructions

3:05:12 Make sure that we actually always fill the real clip region and not write outside the clip region

3:07:10 Our options for filling the pixels

3:09:12 Implementation of alignment to the ending edge

3:16:48 Clip the leading edge

3:19:41 ClipMask
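
The general idea behind aligning a fill region to four-pixel groups and masking the edges can be sketched in scalar form. This is an illustration under stated assumptions, not the SIMD code from the stream: the helper names `AlignDown4`, `AlignUp4` and `LaneMask4` are hypothetical, coordinates are assumed non-negative, and the region is half-open, `[MinX, MaxX)`.

```c
/* Round X down to a multiple of 4 (assumes X >= 0). */
static int AlignDown4(int X)
{
    return X & ~3;
}

/* Round X up to a multiple of 4 (assumes X >= 0). */
static int AlignUp4(int X)
{
    return (X + 3) & ~3;
}

/* For the four-pixel group starting at GroupX, return a 4-bit mask with
   bit N set when pixel GroupX + N lies inside [MinX, MaxX). A SIMD
   version would hold the same information as all-ones or all-zeros
   lanes and use it to keep writes inside the clip region. */
static unsigned LaneMask4(int GroupX, int MinX, int MaxX)
{
    unsigned Mask = 0;
    for (int Lane = 0; Lane < 4; ++Lane)
    {
        int X = GroupX + Lane;
        if (X >= MinX && X < MaxX)
        {
            Mask |= 1u << Lane;
        }
    }
    return Mask;
}
```

For a region of `[5, 10)`, the loop would run over groups starting at `AlignDown4(5)` up to `AlignUp4(10)`, with only the first and last groups partially masked.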

3:21:33 Try setting StartupClipMask by using _mm_srli_si128

3:22:28 // TODO(casey): This is stupid.

3:26:10 Early-out the FillRect tests

3:30:01 Start passing ClipRect through to DrawRectangleQuickly

3:35:35 Moment of realisation, with introduction of the InvertedInfinityRectangle
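
One plausible reading of an "inverted infinity" rectangle is a starting value whose minimum is the largest representable integer and whose maximum is the smallest, so that the first union with any real rectangle snaps to that rectangle. The sketch below assumes that reading; the `rect2i` type, the `Union` helper and the body of `InvertedInfinityRectangle` are illustrative assumptions, not the code from the stream.

```c
#include <limits.h>

/* Hypothetical integer rectangle with Min and Max corners. */
typedef struct rect2i
{
    int MinX, MinY;
    int MaxX, MaxY;
} rect2i;

/* Start "inside out": any real bound is smaller than Min and larger
   than Max, so the first Union replaces every field. */
static rect2i InvertedInfinityRectangle(void)
{
    rect2i R;
    R.MinX = R.MinY = INT_MAX;
    R.MaxX = R.MaxY = -INT_MAX;
    return R;
}

/* Smallest rectangle containing both A and B. */
static rect2i Union(rect2i A, rect2i B)
{
    rect2i R;
    R.MinX = (A.MinX < B.MinX) ? A.MinX : B.MinX;
    R.MinY = (A.MinY < B.MinY) ? A.MinY : B.MinY;
    R.MaxX = (A.MaxX > B.MaxX) ? A.MaxX : B.MaxX;
    R.MaxY = (A.MaxY > B.MaxY) ? A.MaxY : B.MaxY;
    return R;
}
```

This avoids a special "is this the first rectangle" case when accumulating bounds in a loop.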

3:37:48 Temporarily adjust ClipRect in order to avoid a crash

3:39:24 Introduce TiledRenderGroupToOutput outside of the timer

3:43:57 Update DrawRectangle to take the clipping information

3:47:18 Update DrawRectangle{,Quickly} to use the Even / Odd information

3:49:20 Break the screen up into pieces and render them separately
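
A minimal sketch of splitting an output buffer into a grid of clip rectangles, each of which could be rendered independently, might look like the following. It assumes half-open rectangles and lets the last tile in each row and column absorb any remainder; the `rect2i` type and the `GetTileRect` helper are hypothetical names, not the code from the stream.

```c
/* Hypothetical half-open rectangle: covers [MinX, MaxX) x [MinY, MaxY). */
typedef struct rect2i
{
    int MinX, MinY;
    int MaxX, MaxY;
} rect2i;

/* Clip rectangle for tile (TileX, TileY) in a TileCountX by TileCountY
   grid over a Width by Height buffer. The last tile in each direction
   extends to the buffer edge so that no pixels are left uncovered when
   the size does not divide evenly. */
static rect2i GetTileRect(int Width, int Height,
                          int TileCountX, int TileCountY,
                          int TileX, int TileY)
{
    int TileWidth = Width / TileCountX;
    int TileHeight = Height / TileCountY;

    rect2i R;
    R.MinX = TileX * TileWidth;
    R.MinY = TileY * TileHeight;
    R.MaxX = (TileX == TileCountX - 1) ? Width : (R.MinX + TileWidth);
    R.MaxY = (TileY == TileCountY - 1) ? Height : (R.MinY + TileHeight);
    return R;
}
```

A caller would loop over every `(TileX, TileY)` pair and render the whole scene once per tile, clipped to that tile's rectangle.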

3:54:34 Stretch your legs, Casey

3:56:28 We can finally end the stream

3:57:07 Q&A

3:57:12 @rygorous a) your top and right clip is off-by-1!

3:59:54 @mmozeiko _mm_mullo_epi32 is SSE4 intrinsic

4:04:57 @mmozeiko Will you revert yesterday changes where you changed bilinear pixel unpacking code from float mul to int mul? It was faster with float mul.

4:05:24 @an0nymal How many more marathon streams will we have? I thoroughly enjoyed the 4+ hours today.

4:05:47 @quikligames You should give a big thanks to Rygorous for sticking around and trying to give you tips knowing full well that you wouldn't see them in chat

4:06:07 @mmozeiko would it be better to have tile sizes always divisible by 4 horizontally (or even 16 to be cache aligned), then there will be no need to deal with alignment and masking?

4:07:07 @rygorous (clip) one too few pixels. look at the edge of the screen.

4:09:20 @rygorous just pretty sure I saw glitchiness/off-by-1-pixel stuff near the edges but it might've been the video encoding

4:11:08 @mmozeiko (tile size %4) - not masking for textures, but ClipMask variable

4:13:30 @abnercoimbre Q: holy crap. our 1st marathon.

4:13:50 Time for Casey to go to bed, with closing remarks
