Modern x64 Architectures and the Cache — Handmade Chat — Episode Guide

4:55@vateferfout handmade_hero Hello, it's nice to finally be able to catch a stream live

🗪

5:11@ivereadthesequel handmade_hero Hey Casey, it's my birthday today and I'm glad to catch some Handmade Hero on it! Woo!

🗪

5:48Desire a sponsor for Handmade Hero

6:54@simpalaxy Q: You mentioned on Twitter being interested in a hiring process that includes an option for people to record themselves doing their normal programming work. Is there a way you think the industry could be led to feasibly adopt that?

🗪

9:26@ivereadthesequel handmade_hero Had you seen the RLM parody of those "nerdbox" services that send you some cheap stuff in a box every month?

🗪

10:33@culdevu Q: I started my first programming job a couple months ago and have been thrown into an unfamiliar codebase a couple of times now. I wouldn't have said so previously, but now I'd say that the hardest part of learning a new codebase is the threading. That stuff can get crazy if it's not thought out carefully beforehand. Thoughts?

🗪

19:30@blaster_junior Q: Can you explain cache misses and how to avoid them? I'm coming from the Java world and never had to think about that

🗪

22:07Modern Caches

🖌

25:32The structure and work of an x64 CPU

🖌

36:19x64 Scheduler out-of-order processing

🖌

38:25x64 Caches¹

🖌

47:23Cache misses, and how to avoid them

🖌

57:16IPC (Instructions Per Clock) vs. Cache Lines

🖌

1:08:35Intel's undocumented L1 ← L2 "fill" penalty

🖌

1:10:59Avoiding cache misses: 1) Learn cache sizes

🖌

1:12:58Avoiding cache misses: 2) Organize for the cache

🖌

1:20:16Avoiding cache misses: 3) Linear, simple access patterns (prefetching)

🖌

1:26:07x64 Line buffers

🖌

1:31:04Hyperthreading

🖌

1:31:58Measuring cache utilisation with VTune or perf

🖌

1:33:58@saidwho12 Q: Is there a way to transfer data to the cache manually?

🗪

1:34:39Intel's PREFETCH instructions²,³,⁴

📖

1:41:33Manual cache control in the Nintendo GameCube's Dolphin CPU and the Sony PlayStation 3

1:43:58@saidwho12 Is 64 bytes the cache line size on every CPU?

🗪

1:45:31@sapper123 Q: Have you tested MeowHash on the new Zen2 processors?

🗪

1:45:58@printf_armin handmade_hero This is also important for DMA memory

🗪

1:47:09@kkrabz handmade_hero How does the processor handle multiple programs running at the same time, in regards to the cache?

🗪

1:49:21The structure of a Zen CPU⁵

📖

1:53:12Yield and extreme ultraviolet lithography⁶,⁷

📖

1:55:06Chip fabrication⁸,⁹ and transistor density¹⁰

🖌

2:10:11Failure in chip fabrication due to the sheer precision of the process¹¹

📖

2:27:43Fabrication Yield and multiple cores

🖌

2:37:14Cache Between Cores: 1) NUMA (Non-Uniform Memory Access) architecture

🖌

2:43:38Cache Between Cores: 2) MESI (Modified, Exclusive, Shared, Invalid) protocol¹²

🖌

2:48:40@cultofrig Q: handmade_hero 64-byte lines are pervasive due to the burst size of DDR controllers. And yes, Arm also moved from 32-byte to 64-byte

🗪

2:49:25@rroohhh Q: The process for the silicon ingots is called Czochralski process

🗪

2:49:37@sanchopanzo handmade_hero Why don't they focus on increasing the cache sizes? Is there a hard limit to it or are they happy with the sizes as they are?

🗪

2:51:05@pythno Q: Isn't wavelength and size of electron kind of the same? Just different models?

🗪

2:51:17@printf_armin Q: Did have the privilege to wear one at Infineon. Really interesting experience

🗪

2:51:35@cubercaleb Q: If the probability of failure for one chip is P(F), and the probability of failure for two combined chips is denoted P(FC), then P(FC) = 2 * P(F) - P(F) * P(F)

🗪
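
The formula in this question is the inclusion-exclusion rule applied to two chips. A brief sketch, assuming the two chips fail independently with the same probability P(F); the event names F_1 and F_2 are labels introduced here for illustration and do not come from the stream:

```latex
\begin{align*}
P(FC) &= P(F_1 \cup F_2) \\
      &= P(F_1) + P(F_2) - P(F_1 \cap F_2) \\
      &= 2\,P(F) - P(F)^2 \\
      &= 1 - \bigl(1 - P(F)\bigr)^2
\end{align*}
```

For example, under these assumptions P(F) = 0.1 gives P(FC) = 0.2 - 0.01 = 0.19.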

2:51:56Failure probability of two combined chips

🖌

3:00:48@cubercaleb Q: It's the union of either chip being a dud, minus the intersection of both chips being a dud (since this is already accounted for in the union)

🗪

3:01:38@cubercaleb Q: I over simplified. It should just be Bayes' theorem and these events should be independent, so just multiply the failure rate

🗪

3:02:53Close this down
