Replacing rand() and Preparing for SIMD

0:06 Recap and set the stage for the day 🗩
1:38 Note that we're building in optimised mode 🗩
2:15 Run and see our output image 🏃
3:39 ray.cpp: Walk through the code 🗩
5:23 Consider two areas of optimisation: 1) Bounding Volume Hierarchy 🗩
6:57 2) Using better math operations 🗩
7:42 Step into RenderTile() and inspect the asm, noting down routines to improve
15:51 Check out PCG, A Family of Better Random Number Generators[1] with a recommendation to read the full paper[2] 📖
24:13 Check out the x86 SSE2 shift-left instructions[3] 📖
27:59 Read 6.3 - Specific Implementations[4] and the Xorshift wiki article[5] 📖
31:15 Introduce XOrShift32() from Wikipedia[6] with a check into doing this in 64-bit[7]
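
For reference, a minimal sketch of the 32-bit xorshift step as listed in the Wikipedia article, using the 13/17/5 shift triple. The random_series name appears later in this guide, but its single-field layout here is an assumption, and the episode's own code may differ in detail.

```cpp
#include <cstdint>

// Assumed layout: one 32-bit word of generator state.
struct random_series
{
    uint32_t State; // must be non-zero; a zero state stays zero forever
};

// One xorshift step (shifts 13, 17, 5). Returns the new state as the
// random value and stores it back for the next call.
inline uint32_t XOrShift32(random_series *Series)
{
    uint32_t X = Series->State;
    X ^= X << 13;
    X ^= X >> 17;
    X ^= X << 5;
    Series->State = X;
    return X;
}
```

The whole generator is three shifts and three XORs on a register, which is why it compares so well against a call into the C runtime's rand().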
37:45 Run our program to get a benchmark timing 🏃
38:59 Replace rand() with our new XOrShift32(), packing Entropy in the work_order struct
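
A sketch of how this replacement could look, assuming each work_order carries a 32-bit Entropy seed and that the random helpers take a random_series pointer. The struct layouts, SeriesFromOrder() and the 24-bit scaling are illustrative assumptions rather than the code written on stream.

```cpp
#include <cstdint>

// Hypothetical minimal layouts; the real structs hold more fields.
struct random_series { uint32_t State; };
struct work_order    { uint32_t Entropy; /* tile bounds etc. omitted */ };

// Same 13/17/5 xorshift step, repeated so this block stands alone.
inline uint32_t XOrShift32(random_series *Series)
{
    uint32_t X = Series->State;
    X ^= X << 13;
    X ^= X >> 17;
    X ^= X << 5;
    Series->State = X;
    return X;
}

// Uniform float in [0, 1): keep the top 24 bits, which a float holds
// exactly, then scale by 2^-24.
inline float RandomUnilateral(random_series *Series)
{
    return (float)(XOrShift32(Series) >> 8) * (1.0f / 16777216.0f);
}

// Uniform float in [-1, 1).
inline float RandomBilateral(random_series *Series)
{
    return -1.0f + 2.0f*RandomUnilateral(Series);
}

// Each work order starts its own series from its Entropy, so no
// generator state is shared between threads. Zero is bumped to 1
// because xorshift cannot leave the zero state.
inline random_series SeriesFromOrder(const work_order *Order)
{
    random_series Series = { Order->Entropy ? Order->Entropy : 1u };
    return Series;
}
```

Keeping the state in the work order rather than in a global is what removes the hidden shared state that rand() relies on.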
46:39 Run to see no obvious problems with our output, and note our dramatically improved performance 🏃
48:47 Step into the code and inspect the asm to see a lot of mulss instructions
51:30 Introduce CastSampleRays() to do some of the work of RenderTile()
58:21 Run to see that we lose some speed 🏃
59:12 Make RenderTile() only use a random_series in its inner loop
1:00:07 Run to see that that's a little bit better 🏃
1:01:07 Rename cast_result to cast_state, which contains both the input and output data
1:08:12 Run to see some busted imagery 🏃
1:08:58 Fix RenderTile() to correctly fill out the cast_state State
1:12:20 Run to see that that helps 🏃
1:13:05 Consider how to perform this ray casting wide 🗩
1:18:04 Transform CastSampleRays() to handle the notion of operating wide 🗩
1:19:06 Run to see that it runs roughly four times faster, and that the image now contains tile-boundary artifacts 🏃
1:21:08 Temporarily revert RandomUnilateral() to use rand()
1:21:38 Run to see no artifacts, and note that the XOrShift32() needs improving 🏃
1:22:46 Sketch in the code to enable CastSampleRays() to operate wide
1:33:17 Describe our current situation 🗩
1:34:11 Set up CastSampleRays() to let all rays in all lanes finish
1:38:21 Consider how to track the materials wide 🗩
1:40:08 Set up CastSampleRays() to track the materials wide and collate all the computations
1:52:29 Create ray_lane.h to #define the lanes, and introduce RandomBilateralLane(), various permutations of ConditionalAssign(), a Max(), MaskIsZeroed() and versions of HorizontalAdd()
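
To make the roles of these helpers concrete, here is a portable sketch that emulates four lanes with plain arrays. The function names come from the marker above; the array-backed lane_f32 and lane_u32 types, the fixed width of 4 and the exact signatures are assumptions, and a real SIMD build would presumably back the lanes with hardware vector types instead.

```cpp
#include <cstdint>

#define LANE_WIDTH 4 // assumed width for this sketch

// Array-backed stand-ins for the lane types.
struct lane_f32 { float    V[LANE_WIDTH]; };
struct lane_u32 { uint32_t V[LANE_WIDTH]; };

// Per lane: take Source where the mask is set, keep Dest otherwise.
inline void ConditionalAssign(lane_f32 *Dest, lane_u32 Mask, lane_f32 Source)
{
    for (int I = 0; I < LANE_WIDTH; ++I)
    {
        if (Mask.V[I]) { Dest->V[I] = Source.V[I]; }
    }
}

// Per-lane maximum.
inline lane_f32 Max(lane_f32 A, lane_f32 B)
{
    lane_f32 Result;
    for (int I = 0; I < LANE_WIDTH; ++I)
    {
        Result.V[I] = (A.V[I] > B.V[I]) ? A.V[I] : B.V[I];
    }
    return Result;
}

// True when no lane is still active, i.e. the bounce loop can stop.
inline bool MaskIsZeroed(lane_u32 Mask)
{
    for (int I = 0; I < LANE_WIDTH; ++I)
    {
        if (Mask.V[I]) { return false; }
    }
    return true;
}

// Collapse all lanes into one scalar sum, e.g. for the final colour.
inline float HorizontalAdd(lane_f32 A)
{
    float Sum = 0.0f;
    for (int I = 0; I < LANE_WIDTH; ++I) { Sum += A.V[I]; }
    return Sum;
}
```

Branches on a single ray become masked assignments across all lanes, and the per-pixel accumulation becomes a horizontal add once every lane has finished.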
2:03:39 Run and see totally busted imagery 🏃
2:04:23 Build in debug mode and on one core
2:05:40 Step into CastSampleRays() and inspect its values
2:05:56 Make CastSampleRays() set FilmX and FilmY to their centres
2:07:14 Step into CastSampleRays() and see that the State->Series and Order->Entropy are both 0
2:08:36 Make CastSampleRays() offset the Entropy and use different random series per ray
2:09:27 Step into CastSampleRays() and note that the ConditionalAssign() is wrong
2:10:44 Make ConditionalAssign() zero the Mask if there is nothing set in it
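
One plausible reading of this fix, sketched for a single-lane build where a comparison yields 0 or 1 rather than an all-bits mask. The lane_u32 typedef and the exact normalisation are assumptions; the point is only that a bitwise select needs the mask to be all ones or all zeros.

```cpp
#include <cstdint>

typedef uint32_t lane_u32; // assumed single-lane type

inline void ConditionalAssign(lane_u32 *Dest, lane_u32 Mask, lane_u32 Source)
{
    // Expand any non-zero mask to all ones and leave zero as zero.
    // Without this, a mask of 1 would only select the lowest bit.
    Mask = (Mask ? 0xFFFFFFFFu : 0u);
    *Dest = (~Mask & *Dest) | (Mask & Source);
}
```

With a raw mask of 1, selecting between 5 and 9 would give (~1 & 5) | (1 & 9) = 5 instead of 9, which is the kind of silent error that produces a busted image.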
2:11:20 Step into ConditionalAssign() to see that that is better
2:11:41 Run to see how the picture looks 🏃
2:13:24 View the image 🏃
2:13:49 Reduce the RayCount and increase the CoreCount
2:14:49 Investigate the summation
2:17:53 Make CastSampleRays() correctly set the LaneMask
2:18:35 Run and see a more correct image 🏃
2:18:52 Switch back to the optimised version, with more RaysPerPixel
2:19:09 Run to see that we're darker 🏃
2:20:13 Correctly set the LaneWidth
2:21:20 Run and see that the images are basically indistinguishable 🏃
2:22:12 Set up to support a constrained set of LANE_WIDTH values
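
A sketch of how a header might restrict LANE_WIDTH to a fixed set of values at compile time. The accepted widths (1, 4 and 8), the default of 1 and the array-backed wide types are assumptions for illustration; a real SIMD header would presumably map the wide cases to hardware vector types.

```cpp
#include <cstdint>

// Default to the scalar path when the build does not choose a width.
#if !defined(LANE_WIDTH)
#define LANE_WIDTH 1
#endif

#if (LANE_WIDTH == 1)
// Scalar path: a lane is just one value.
typedef float    lane_f32;
typedef uint32_t lane_u32;
#elif (LANE_WIDTH == 4) || (LANE_WIDTH == 8)
// Wide path: array-backed stand-ins for vector registers.
struct lane_f32 { float    V[LANE_WIDTH]; };
struct lane_u32 { uint32_t V[LANE_WIDTH]; };
#else
// Any other width is rejected at compile time.
#error LANE_WIDTH must be 1, 4 or 8
#endif
```

Because the rest of the code only talks to lane_f32 and lane_u32, switching width becomes a one-line change to the define.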
2:30:05 Run to see that XOrShift32() is actually fine 🏃
2:31:45 Do LANE_WIDTH==8 too
2:32:43 Q&A 🗩
2:33:46 yurasniper Q: How would one implement something like bloom effect in a raytracer? 🗪
2:39:46 Run our program to capture its performance statistics 🏃
2:42:07 macielda Q: Is the Halton 2,3 sequence a good way to generate sample positions? I've heard about some people using it. It is a low discrepancy series 🗪
2:43:11 Rename our image and stat files 🗹
2:44:30 vaualbus Q: When you learn this way of doing SIMD? I remember in Handmade Hero when we had optimized the renderer we use __m128 every way 🗪
2:46:05 macielda Q: What is your take on AA methods? I'm currently looking for one for my game. I see The Witness has MSAA option only (no FXAA, TXAA and friends)? 🗪
2:46:31 longboolean Q: Are there any machines with hardware RNG that just puts random values into a register with one instruction?[8] 🗪
2:48:44 pseudonym73 Q: G'day, long time no stream. Low-discrepancy sequences do exhibit blue noise behaviours if you do them right, but their main advantage is that you can access the quasi-random streams in an arbitrary order. Not really relevant yet. Also, you can do better than 2,3 Halton 🗪
2:49:37 macielda Q: Do shader languages expose things like "Conditional Assign"? 🗪
2:51:14 Ensure that everything is in good shape 🗹
2:52:14 Shut down 🗩