Continuing Streamlining the Raycaster — Handmade Hero — Episode Guide

0:01 Welcome to the stream

🗩

0:06 Determine to continue with optimisation

🏃

0:57 Recap yesterday's welding optimisation in GridRayCast()

📖

4:09 Consider optimisation potential of the SpecTexel loads / stores in GridRayCast()

📖

7:22 Illustrate the possibility of loading in the SpecTexel values and InvBlend at the outset

9:23 Seek easier optimisation opportunities in GridRayCast()

📖

11:43 Simplify out OcclusionN from GridRayCast()

12:27 Seek optimisation with OcclusionD and RayD in GridRayCast()

📖

18:48 Streamline the SignRayD and NormalXYZ computations in GridRayCast()

25:35 Reacquaint ourselves with the hit testing and shuffling code in GridRayCast()

📖

30:30 Streamline the Normal selection in GridRayCast()

34:46 Check out the port usage of various instructions, noting that we may get an AND for free¹

📖

40:23 Continue to streamline the Normal selection in GridRayCast(), introducing a NormalTable, before toggling back to the old code

48:12 Run successfully

🏃

48:31 Streamline the ProbeSampleNSingle usage in GridRayCast()

55:01 Run successfully, and consider unit testing the grid ray cast

🏃

56:49 Treat ProbeSampleNSingle wide in GridRayCast()

1:01:34 Run successfully

🏃

1:01:50 Treat OcclusionD wide in GridRayCast()

1:03:28 Run successfully

🏃

1:04:02 Finish streamlining the Normal selection in GridRayCast()

1:07:46 Run successfully

🏃

1:08:13 Temporarily try hard-setting the NormalIndex to 0 in GridRayCast()

1:08:27 We can't tell it's wrong

🏃

1:08:56 Let GridRayCast() set the computed NormalIndex and make a note to test this

1:09:36 hhlightprof total seconds elapsed: 4.534789

🏃

1:10:20 Simplify out tUpdateBlend in GridRayCast()

1:12:49 Augment light_atlas with StrideXYZ_4x and VoxelDim_4x

1:17:45 Run successfully

🏃

1:17:54 Make MakeLightAtlas() set the StrideXYZ and VoxelDim, for GridRayCast() to load out of that atlas, changing their format in light_atlas to be an array of 4

1:20:37 Run successfully

🏃

1:20:46 hhlightprof total seconds elapsed: 4.513986

🏃

1:22:09 Remove the old AABBRayCast()

1:24:42 Run successfully

🏃

1:24:51 Prepare lighting_box to pack down to 64 bits in total, propagating this change

1:28:29 Run successfully

🏃

1:28:38 Clean out the sprawl from FullCast()

1:36:20 Run successfully

🏃

1:36:25 Look into welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself

📖

1:39:21 hhlightprof total seconds elapsed: 4.511818

🏃

1:39:36 Extend GridRayCast() to operate on twice as many samples

1:40:44 Run successfully

🏃

1:40:46 hhlightprof total seconds elapsed: 4.394170

🏃

1:41:52 Toggle off the debug code in FullCast()

1:43:26 hhlightprof total seconds elapsed: 4.392245

🏃

1:43:41 Consider welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself

📖

1:45:57 Q&A

🗩

1:47:07 @mindmark42 Q: Yesterday you changed your SIMD extract functions to use shuffles instead. Could you explain again why that is better?

🗪

1:47:26 Extract vs Shuffle

🖌

1:56:14 "Semantic" Extraction

🖌

1:58:02 Unnecessary extract and cast, with thanks to mmozeiko

🖌

1:59:05 Shuffle

🖌

2:00:41 @3ygun Q: Is there such a thing as smooching too much and causing the compiler to bail before doing optimizations?

🗪

2:01:11 @billdstrong Q: Would we gain any speed by moving ahead 16 and doing 12 ops per pass?

🗪

2:01:40 Thank you, everyone
