Preparing a Function for Optimization

1:31 Open things up and recap
2:48 DrawRectangleSlowly: Increase efficiency
3:33 Create DrawRectangleHopefullyQuickly
4:34 DrawRectangleHopefullyQuickly: Skip the preamble
5:42 Remove all unnecessary code
6:44 Look at what's happening
8:01 Make the edge testing code more explicit
9:49 Blackboard: See what's happening with these inner products
12:04 DrawRectangleHopefullyQuickly: Test U and V instead
13:12 Run the game
13:33 Make these U and V computations more efficient
14:40 Run the game and ensure that everything still blits fine
15:16 Continue pruning
18:02 Flatten the routine
19:55 Blow out v4 Blended into scalar form
21:18 Take a close look at the routine and precompute InvTexelA
23:35 Blow out v4 Dest and Texel into scalar form
25:30 Flatten BilinearSample and SRGBBilinearBlend
28:02 Assess our situation
28:55 Unpack and optimise the Lerps
33:57 Run the game and annotate the code
35:33 Flatten SRGB255ToLinear1
36:38 Flatten Unpack4x8
38:59 That's everything flattened
39:22 Note that the code is faster
40:58 We have a nasty problem with the unpackings
44:01 Blackboard: What is our "wide" strategy?
48:43 Set the stage for SIMD
50:45 Consider solidifying texture boundaries
51:53 Leave it for today
53:09 Q&A
53:28 braincruser: The way the code is written now you have a very long dependency chain (between instructions). Will you break down the code to remove it?
56:42 stelar7: Why did you write float instead of real32 this stream?
57:14 stelar7: Why use -O2 instead of -O3 or -Ofast (possibly with -fverbose-asm)?
58:06 garryjohanson: Do you ever use exclusive or operations to avoid pipeline stalls? If not, what do you use?
59:04 g3rain1: Aren't those square roots pretty expensive? [1]
1:03:31 andsz_: Will you make multiple SIMD backends? (SSE?/AVX/FMA versions)
1:04:04 davidthomas426: You could loft some of those variables out one more loop
1:04:58 waterlimon: How expensive is the float<>int conversion compared to the rest of the workload? [2]
1:05:40 davidthomas426: Since xAxis and yAxis are usually perpendicular, should we special case for that? In the same vein, should we special-case for axis-aligned?
1:06:56 waterlimon: Does the compiler do any automatic SSE optimization (or have option for it?)
1:09:01 stelar7: sqrt_ss vs sqrt_ps vs sqrt_pd? [3]
1:11:56 waterlimon: Would SSE allow doing sRGB using exponent 2.2 instead of approximating using one of 2, without a huge performance hit?
1:12:41 pseudonym73: The main reason why you don't get automatic SIMD is precise exceptions. You probably need to tell the compiler that you don't need them
1:14:44 waterlimon: What happens if "/arch:AVX2" switch is enabled?
1:15:26 Look at this AVX-512 stuff [4]
1:16:51 braincruser: FMA is fused multiply add
1:18:48 andsz_: Yeah, looks like different caps bits
1:19:23 Wrap things up
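
Several entries above describe "flattening" helpers such as Unpack4x8 and SRGB255ToLinear1 into scalar form before moving to SIMD. The following is a minimal sketch of what that kind of transformation could look like, not the episode's actual code. The helper bodies, the 0xAARRGGBB channel layout, and the names TexelToLinearFlat, NearlyEqual and CheckFlatMatchesHelpers are assumptions made for illustration. The exponent-2 approximation of sRGB is the one mentioned in the Q&A.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { float r, g, b, a; } v4;

/* Assumed layout: 0xAARRGGBB. Returns each channel as a float in 0..255. */
static v4 Unpack4x8(uint32_t Packed)
{
    v4 Result;
    Result.r = (float)((Packed >> 16) & 0xFF);
    Result.g = (float)((Packed >> 8) & 0xFF);
    Result.b = (float)((Packed >> 0) & 0xFF);
    Result.a = (float)((Packed >> 24) & 0xFF);
    return Result;
}

/* Assumed body: approximate sRGB-to-linear by squaring (exponent 2
   rather than 2.2). Alpha is only rescaled to 0..1. */
static v4 SRGB255ToLinear1(v4 C)
{
    float Inv255 = 1.0f / 255.0f;
    v4 Result;
    Result.r = (Inv255 * C.r) * (Inv255 * C.r);
    Result.g = (Inv255 * C.g) * (Inv255 * C.g);
    Result.b = (Inv255 * C.b) * (Inv255 * C.b);
    Result.a = Inv255 * C.a;
    return Result;
}

/* Flattened form (hypothetical name): the same arithmetic with no helper
   calls and no v4, so every value is a plain scalar. */
static void TexelToLinearFlat(uint32_t Packed,
                              float *R, float *G, float *B, float *A)
{
    float Inv255 = 1.0f / 255.0f;

    float TexelR = Inv255 * (float)((Packed >> 16) & 0xFF);
    float TexelG = Inv255 * (float)((Packed >> 8) & 0xFF);
    float TexelB = Inv255 * (float)((Packed >> 0) & 0xFF);
    float TexelA = Inv255 * (float)((Packed >> 24) & 0xFF);

    *R = TexelR * TexelR;
    *G = TexelG * TexelG;
    *B = TexelB * TexelB;
    *A = TexelA;
}

/* Absolute-difference comparison with a small tolerance. */
static int NearlyEqual(float X, float Y)
{
    float D = X - Y;
    if (D < 0.0f) { D = -D; }
    return D < 1e-6f;
}

/* Asserts that the flattened path agrees with the helper-based path. */
static void CheckFlatMatchesHelpers(uint32_t Packed)
{
    v4 Ref = SRGB255ToLinear1(Unpack4x8(Packed));
    float R, G, B, A;
    TexelToLinearFlat(Packed, &R, &G, &B, &A);
    assert(NearlyEqual(Ref.r, R));
    assert(NearlyEqual(Ref.g, G));
    assert(NearlyEqual(Ref.b, B));
    assert(NearlyEqual(Ref.a, A));
}
```

Calling CheckFlatMatchesHelpers on a few packed pixels is a cheap way to confirm that a flattening step did not change the arithmetic. With everything written as independent scalars, each line is a candidate for being replaced by one operation across several pixels at once.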