In 2012, Japan's last major DRAM manufacturer went bankrupt. The company, Elpida, was once the pride of Japan's semiconductor industry, built on the accumulated technology of three giants: NEC, Hitachi, and Mitsubishi. Even with government intervention, it couldn't survive. With debts of 430 billion yen, it filed for bankruptcy protection and was subsequently acquired by an American company for 200 billion yen, then integrated and absorbed until the name disappeared. That American company was Micron Technology.
Intel built a DRAM business and withdrew. Texas Instruments built one and withdrew. Motorola built one and withdrew. Japan's entire semiconductor memory industry went from peak to collapse in less than twenty years. South Korea took the baton, with Samsung and SK Hynix sweeping the market through government subsidies and aggressive price wars, cornering every competitor. Micron survived, becoming the only company in the US capable of mass-producing advanced memory chips.
This company, headquartered in Boise, Idaho, has lived in the shadow of Nvidia and TSMC. It doesn't design GPUs or manufacture logic chips. But as AI pushed the world's thirst for computing power to its limits, a physical bottleneck neglected for decades suddenly became unavoidable: computing units spend longer waiting for data than they spend computing. This problem has no software solution, only a hardware one. And that hardware is precisely what Micron has been working on for forty years.
I. Physical and System Limitations of AI Computing
Revisiting the Memory Wall
In the current von Neumann architecture, GPU or TPU computing units and main memory are physically independent of each other.
The computing unit contains a small amount of SRAM (Static Random-Access Memory) as on-chip cache.
Model weights and input data are mainly stored in off-chip DRAM (Dynamic Random Access Memory).
Data must travel between the two physical structures as electrical signals, for example across an interposer. Take a large language model with 70 billion parameters: at FP16 precision (2 bytes per parameter), the weights alone require approximately 140GB of physical memory. Mainstream high-end AI accelerator cards currently carry between 80GB and 192GB of memory, so anything slightly larger must be split across multiple cards.
Over the past decade, chip computing power has grown exponentially. Memory bandwidth growth, however, is constrained by the number of physical pins, signal frequency, and heat dissipation limits, and has lagged far behind. When computation outpaces memory supply, computing units are forced to wait, and the utilization of expensive hardware drops significantly.
Training and inference are the two stages of AI. Training builds and refines the model and happens in the background. Inference generates results when a user calls the system and happens in the foreground.
Training is characterized by large-scale parallel processing. The same batch of data is reused repeatedly in the caches of the computing cores, so arithmetic intensity is high. The system is limited primarily by computation speed rather than memory. This is the compute-bound scenario, where Nvidia's computing power advantage is fully exploited.
Inference is a different story. Large language models generate text autoregressively: only one token is output at a time, and it then becomes input for the next step. To avoid recomputing the keys and values of earlier tokens at every step, the system maintains a KV cache in GPU memory that holds the key-value tensors of the sequence so far. With a context length of 4096, a single user request requires approximately 1.34GB of GPU memory.
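The 140GB and 1.34GB figures can be reproduced with a short back-of-envelope sketch. The layer count, number of KV heads, and head dimension below are assumptions modeled on a typical 70B model with grouped-query attention; the text does not state them, and other architectures will give different numbers.

```python
# Back-of-envelope memory sizing for LLM inference.
# The architecture numbers (80 layers, 8 KV heads, head dim 128) are
# ASSUMPTIONS modeled on a typical 70B model with grouped-query attention;
# they are not stated in the text.

GB = 1e9  # decimal gigabytes, matching the figures in the text


def weight_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the model weights (FP16 = 2 bytes each)."""
    return n_params * bytes_per_param


def kv_cache_bytes(context_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """KV cache for one request: one key and one value vector per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_len


def max_concurrent_requests(free_bytes: float, per_request_bytes: float) -> int:
    """How many full-length requests fit in the memory left after the weights."""
    return int(free_bytes // per_request_bytes)


if __name__ == "__main__":
    weights = weight_bytes(70e9)
    per_request = kv_cache_bytes(4096)
    print(f"weights:    {weights / GB:.0f} GB")
    print(f"KV/request: {per_request / GB:.2f} GB")
    # 20 GB of headroom is a hypothetical input for illustration.
    print(f"concurrent: {max_concurrent_requests(20 * GB, per_request)}")
```

Under these assumptions the cache grows linearly with context length, so doubling the context to 8192 tokens doubles the per-request footprint and halves the number of requests that fit in the same headroom.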
On a server with two 80GB A100 GPUs, deducting the 140GB of model weights leaves approximately 20GB for the KV cache, enough for a maximum of about 14 concurrent requests. Arithmetic intensity during inference is extremely low; the system is constrained almost entirely by memory bandwidth, making it a memory-bound task. What truly determines the throughput ceiling is the physical transfer rate of HBM.
Energy consumption is another key factor. Reading data from off-chip HBM costs approximately 10-20 pJ/bit, while a single FP16 floating-point operation requires only about 0.1 pJ. Moving one bit therefore costs 100 to 200 times the energy of the computation itself. In large-scale inference, if memory access patterns are not optimized, a significant share of a data center's power goes into bus transfers rather than actual logic operations. This is the physical driving force behind Micron's continued push on HBM technology.
II. Analysis of Micron's Core Semiconductor Technologies
First, what kind of company is Micron Technology?
Micron is an IDM (Integrated Device Manufacturer), handling everything from design and manufacturing to packaging. However, Micron's wafer fabs produce only one type of product: memory chips. It doesn't make CPUs or GPUs; it makes only DRAM and flash memory.
By product, Micron's revenue divides roughly into three parts: DRAM accounts for over 70%, NAND for 20-30%, and NOR flash for a small remainder. DRAM is what consumers know as memory modules; NAND is the core medium of solid-state drives (SSDs); NOR is found mainly in automotive electronics and industrial equipment, where it executes boot code quickly. Its profile is low, but it is irreplaceable.
By end market, Micron has four business units: Compute & Networking for data centers and servers, Mobile for smartphones, Storage for enterprise SSDs, and Embedded for automotive and industrial applications.
What role does Micron play in the AI supply chain?
Nvidia designs GPUs and TSMC manufactures them, so where does Micron fit in this chain?
In short, Nvidia's H100 and B200 GPUs are manufactured by TSMC; Micron is not involved in that step. However, a complete accelerator card capable of running large models requires more than computing cores. As explained earlier, the bottleneck in the inference stage is memory bandwidth, not computing power. Nvidia must therefore integrate high-bandwidth memory (HBM) tightly next to the GPU. These HBM stacks are manufactured by Micron (as well as SK Hynix and Samsung) and then mounted on the same silicon interposer as the GPU logic die using TSMC's CoWoS advanced packaging technology, forming a complete AI computing module.
Micron is a key component supplier. The GPU is the brain, and HBM is the ultra-high-speed data channel attached directly to it; both are indispensable.
This structure means that Micron's competitive logic is completely different from Nvidia's. Nvidia builds its moat on architecture and ecosystem, while Micron relies on continuous iteration of process technology and stacked packaging. Each generation of HBM bandwidth improvement is backed by more complex TSV (Through-Silicon Via) technology and higher stack counts, which keeps the barrier to entry high.
DRAM: The Infrastructure Hidden Behind the Computing Power Narrative
Before AI computing power comes a more fundamental question: where does the data come from, and how does it reach the computing core? The answer is DRAM (Dynamic Random Access Memory).
Let's start with personal computers. In traditional computers, DRAM is main memory, and it solves a speed mismatch problem.
Hard drives store a lot of data but read it slowly. CPUs calculate quickly but have nowhere to hold working data. The speed gap between the two is about three orders of magnitude. A CPU waiting on a hard drive is like being stuck behind a tractor on a highway.
DRAM solves this problem. When a user opens a program, the operating system copies its code and data from the hard drive into DRAM; the CPU then sends address commands directly to the DRAM and completes reads and writes with nanosecond-scale latency and tens of GB/s of bandwidth. The operating system kernel, the state of background processes, and everything else that is running live here in real time. All of it is lost when power is cut. The "dynamic" in the name refers to a related property: DRAM capacitors leak charge naturally and must be refreshed continuously to retain data. Physically, each DRAM memory cell consists of one transistor and one capacitor (1T1C).
In AI scenarios, the nature of the requirement changes.
The AI computing core has shifted from the CPU to the GPU. The form of DRAM has evolved accordingly: no longer just DDR modules plugged into the motherboard, but HBM (High Bandwidth Memory), which stacks multiple bare dies vertically using TSVs (Through-Silicon Vias) and is packaged on the same interposer as the GPU. The demand on DRAM has shifted from simply keeping the system running to relieving the computing bottleneck.
First, there's the loading of model weights. The parameters of large models are stored in physical memory as matrices, and all of them must reside in the HBM next to the computing core before inference begins. A model with 70 billion parameters requires approximately 140GB of storage for the weights alone in FP16 format.
Second, there's the dynamic use of the KV cache. When the model generates text, each output token references all previous context.
To avoid recomputing everything each time, the system caches this history in GPU memory; this is the KV cache. The longer the context, the larger the cache. Two A100 GPUs, after deducting the model weights, have only enough GPU memory left to serve a dozen or so users simultaneously. That is the actual concurrency limit of a server costing tens of thousands of dollars.
Consumption is even greater in training. Not only must the model parameters be stored, but the intermediate activations of every layer must also be retained so that weights can be updated during backpropagation. The commonly used Adam optimizer additionally stores two extra values for each parameter. Combined, the GPU memory used during training is usually three to four times that used during inference.
This brings us back to the memory wall. The computing power of GPU cores grows far faster than memory bandwidth. Arithmetic intensity during inference is extremely low, and the GPU spends much of its time idle, waiting for data. The bandwidth improvement of each HBM generation directly sets the upper limit on the throughput an AI inference server can actually deliver. This is the core value of DRAM in the AI era, and the underlying logic behind Micron's continuous investment in HBM R&D.
In the global DRAM market, Samsung, SK Hynix, and Micron together hold approximately 95% of the market. However, their strengths are completely different.
Process Advancement: Micron Leads the Way
In semiconductor manufacturing, process node (technology node) refers to the characteristic dimensions of the microscopic physical structure inside an integrated circuit.
When Micron is praised for leading the way in process advancement, it means that Micron is ahead of Samsung and SK Hynix in the engineering progress of shrinking the internal physical structure of DRAM chips and increasing the storage density per unit area.
In other words, more chips can be cut from a single wafer, the manufacturing cost per bit decreases, and gross margin is supported.
From 1-alpha to 1-beta and then to 1-gamma, Micron has usually been the first manufacturer to announce mass production of the next generation of high-density DRAM. Samsung has run into yield bottlenecks at nodes below 14nm, and its delivery pace has slowed significantly over the last two generations. SK Hynix's pace of process advancement is roughly equal to Micron's; the two are in the same tier.
HBM: Hynix's Home Ground
While process technology is Micron's strength, the HBM market is currently SK Hynix's home ground.
Hynix holds over 50% of the HBM market and is the initial exclusive supplier for Nvidia's highest-end GPUs. Its core technological advantage is its MR-MUF packaging process, which excels at heat dissipation and yield control when stacking many DRAM dies.
Micron, a latecomer, skipped HBM3 and went directly to HBM3E, using its energy efficiency advantage to enter Nvidia's supply chain. However, it uses TC-NCF packaging, which is harder to manufacture at high stack counts, leaving a significant gap in overall production capacity and market share compared to Hynix.
Samsung is a different story. During the HBM3 and HBM3E phases, Samsung's products failed to pass Nvidia's qualification testing in time because of heat dissipation and power consumption issues, and it missed the peak window of AI memory growth. It is now betting on a comeback in the HBM4 phase.
Energy Efficiency: Micron's Differentiated Approach
While Micron lags behind Hynix in HBM market share, its differentiator is power consumption. Public test data shows that Micron's HBM consumes 20% to 30% less power than competitors' products at the same data bandwidth. The figure may not look significant on a single GPU, but in a data center deploying tens of thousands of GPUs, it translates directly into electricity costs.
Power supply and cooling have now become bottlenecks for AI data center expansion, making energy efficiency increasingly important in procurement decisions.
The same logic extends to mobile devices. Micron's LPDDR5X, built on the 1-gamma process, reaches 9.6Gbps while reducing overall power consumption by 30%. For phones running local AI models, battery life is a metric users feel directly.
Scale: Samsung's Trump Card
Micron's overall production capacity is the smallest of the three. Without Samsung's absolute scale, Micron cannot rely on price wars and can only pursue a technology-premium strategy. This is why Micron must maintain its lead in process technology and energy efficiency; once that technological advantage disappears, it has no chance of winning on price.
Here's a brief summary of the three companies' positions.
Hynix capitalized on its HBM packaging technology to reap the biggest benefits from the AI memory boom; Samsung maintained its dominance in the conventional DRAM market through scale but faltered in HBM; Micron leads in process technology and energy efficiency and has the smallest production capacity, but has built certainty into its financial structure through technology premiums and early order lock-in.
NAND and NOR: Micron's Other Two Pieces of the Puzzle
In the global NAND market, Micron ranks fourth or fifth, with a market share consistently between 10% and 15%, behind Samsung, SK Hynix, Kioxia, and Western Digital.
NOR flash is a much smaller segment than NAND, and its low end is dominated by Taiwanese and mainland Chinese companies such as Macronix, Winbond, and GigaDevice. Micron has deliberately abandoned low-capacity consumer orders to focus on the high-end automotive and industrial markets.
Each NOR memory cell is connected directly to a bit line, a parallel structure that supports random addressing down to a single byte.
When a car's CPU powers on, it can execute boot code directly from the NOR chip over the memory bus, which is why a car's dashboard can light up within milliseconds.
In terms of bandwidth, Micron spearheaded the Octal xSPI interface standard, which uses 8 data lines and DDR signaling to push NOR read speeds to the 400MB/s level. Smart cockpit systems keep growing more complex, and this speed is crucial for fast cold starts.
Micron's automotive-grade NOR has been certified to ASIL-D, the highest automotive safety integrity level, and the chip integrates hardware ECC logic that corrects errors automatically in a very short time.
Industrial equipment and automobiles often have service lives exceeding ten years. With its own wafer fabs, Micron can commit to continuous supply for over a decade, something many competitors that rely on foundries cannot do.
The NAND and NOR businesses together form another revenue stream for Micron, independent of HBM. The former benefits from the data center boom through process leadership and product mix upgrades, while the latter locks in automotive customers through its irreplaceable physical characteristics and stringent certification requirements. Both follow the same logic: avoid price wars and earn premiums where performance and reliability requirements are highest.
As of this writing, Micron's stock price is approximately $600, with a price-to-earnings ratio of 21.44 and a market capitalization of approximately $650 billion.
The 12-month target prices given by mainstream Wall Street investment banks cluster between $400 and $675, with an average close to $500. By this standard, the current price sits above the average target but within the range.
Why a PE ratio of 21?
Over the past thirty years, memory chips have been a typical cyclical stock.
When the industry booms, everyone expands production; then come overcapacity, falling prices, and losses. The market has little confidence in this type of business and typically gives it a PE ratio of only 8 to 10.
Micron is now at 21, primarily because HBM has changed its revenue structure.
Previously, Micron produced standard DDR memory, and both its output and its selling price depended entirely on market conditions. HBM, by contrast, is built to order: non-cancellable long-term supply agreements are signed with customers like Nvidia before wafers are started, locking in both price and quantity. Reportedly, Micron's HBM capacity for 2026 is already sold out.
Under this model, Micron's future revenue is no longer a forecast but a contract. Wall Street's logic has changed accordingly: this is a company closer to an infrastructure supplier with stable contracts, which naturally earns a higher valuation multiple.
Another driving force is capital flows. Micron is the only company in the United States with large-scale advanced memory manufacturing capability. Against the backdrop of the CHIPS Act and policies promoting supply chain localization, US institutional investors allocating to AI hardware themes have poured funds into Micron, producing a real liquidity premium.
SK Hynix: Strongest Technology, Lowest Valuation
SK Hynix's PE ratio is 12.17, lower than Micron's, even though it holds over 50% of the HBM market and is a core supplier for Nvidia's high-end GPUs. There are two reasons.
First, South Korean listed companies have complex chaebol governance structures and low dividend payout and buyback rates, so profits often circulate inside the group and little is returned to minority shareholders. At the same profit level, South Korean companies are systematically valued at lower multiples than their US counterparts.
Second, there is geopolitical risk.
SK Hynix has approximately 40% of its conventional DRAM production capacity at its Wuxi plant in China. The US export ban on EUV equipment to China means this production line cannot be upgraded to advanced processes. In the future, the company will either bear the huge cost of relocating capacity or watch the asset gradually lose competitiveness. Wall Street has priced this potential cost directly into the valuation.
Samsung: A PE Ratio of 34.18 Is Not a High Premium, but a Collapse of the Denominator
Samsung Electronics' PE ratio of 34.18 rests on a completely different logic.
Samsung is not a pure memory company; it also runs a wafer foundry business and makes smartphones and display panels. The problem is that the foundry division has invested tens of billions of dollars to catch up with TSMC at the 3nm and 2nm nodes, but yields are low and the division is currently incurring huge losses. The group's overall net profit has shrunk significantly.
The stock price, however, has been supported by South Korean domestic funds and has not fallen sharply. With the numerator holding and the denominator shrinking, the PE ratio has been pushed up to 34.
The core logic behind Wall Street's target prices for Micron is highly consistent: a rising share of HBM in the product mix drives higher gross margins; long-term agreements lock in revenue certainty; the shift of capacity to HBM squeezes conventional DRAM supply, creating room for price increases across the entire product line; and capital expenditure enters its payback period after 1-gamma mass production, turning free cash flow from negative to positive.
Of course, a target price is a prediction based on current information and model assumptions, not a guarantee. The cyclical nature of the memory industry hasn't disappeared; it has only been partially smoothed by HBM's order structure. If the pace of AI infrastructure investment slows, or Samsung re-enters Nvidia's supply chain in the HBM4 phase, supply and demand will be repriced.
III. Advanced Packaging and Next-Generation AI Connectivity
The Standards for Good and Bad HBM
Every vendor claims its HBM is the best: Samsung says Samsung's is good, SK Hynix says SK Hynix's is good, Micron says Micron's is good. So are there any standards for judging the quality of HBM?
Three parameters truly matter.
The first is pin rate, or bandwidth.
HBM connects to the GPU through thousands of microbumps, each bump being a transmission channel. Pin rate measures how much data a single channel can transmit per second.
Physically, the digital values 0 and 1 correspond to different voltage states; for example, 1.1V represents 1 and 0V represents 0. Transmitting data means switching the voltage between these two states, which is called voltage level switching. A pin rate of 9.2Gbps means that the voltage on a metal bump tens of micrometers in diameter may have to flip precisely up to 9.2 billion times per second.
Through HBM3E, the HBM physical bus width is fixed at 1024 bits, so the total bandwidth per stack in GB/s is: pin rate (Gbps) × 1024 ÷ 8.
Micron's HBM3E is rated at 9.2Gbps, which works out to approximately 1.2TB/s of bandwidth per stack. SK Hynix's and Samsung's current flagship products typically range from 8.0 to 8.5Gbps.
Faster flipping means more data transferred, but the trade-off is a roughly linear increase in power consumption. Each flip essentially charges or discharges the parasitic capacitance of the wires, and that energy ends up as heat. Flipping too fast also distorts the signal waveform: before the voltage from the previous pulse has settled, the next one arrives, the receiver can no longer distinguish 0 from 1, and data transmission breaks down.
The second is energy efficiency, measured in pJ/bit.
This is the number of picojoules of energy consumed to transmit 1 bit of data; the lower the better.
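The first two parameters can be tied together with a little arithmetic: pin rate gives bandwidth, and bandwidth multiplied by energy per bit gives the power spent moving data. The sketch below uses the 1024-bit bus width from the text; the pJ/bit value in the demo is a hypothetical number chosen only for illustration, not a vendor specification.

```python
# Relating the first two HBM parameters: pin rate -> bandwidth -> transfer power.
# BUS_WIDTH_BITS follows the 1024-bit bus described in the text.
# The pJ/bit value in the demo is an ILLUSTRATIVE ASSUMPTION, not a vendor figure.

BUS_WIDTH_BITS = 1024


def stack_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth of one HBM stack in GB/s: pin rate x bus width / 8."""
    return pin_rate_gbps * bus_width_bits / 8


def transfer_power_watts(bandwidth_gb_s: float, pj_per_bit: float) -> float:
    """Power spent moving data at a sustained bandwidth and a given energy per bit."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12


if __name__ == "__main__":
    for rate in (8.0, 8.5, 9.2):
        print(f"{rate} Gbps -> {stack_bandwidth_gb_s(rate):.1f} GB/s per stack")
    bw = stack_bandwidth_gb_s(9.2)
    assumed_pj_per_bit = 4.0  # hypothetical value for illustration only
    print(f"{transfer_power_watts(bw, assumed_pj_per_bit):.1f} W per stack at full bandwidth")
```

Because power scales with both bandwidth and pJ/bit, a 30% improvement in energy per bit lowers transfer power by the same 30% at a given bandwidth, which is why the two figures are best read together.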
Energy efficiency matters because HBM and the GPU share one package, and the heat from both must be dissipated through it. If the HBM itself draws too much power, the thermal load on the whole package exceeds the thermal design limit, forcing the GPU to throttle and reducing real computing power.
Micron claims that its low-voltage design at the 1-beta process node achieves approximately 30% better energy efficiency than competitors. In data centers where a single GPU can draw 600 to 1000 watts, this difference translates directly into electricity and cooling costs.
The third is thermal resistance and packaging technology.
This is the most challenging part, and it is SK Hynix's true competitive advantage today.
The basic formula for thermal resistance is: temperature rise = power consumption × thermal resistance. At a fixed power consumption, the lower the thermal resistance, the lower the chip temperature.
HBM is a vertical stack of multiple DRAM dies. The logic die at the bottom generates the most heat, which must be conducted upward through the stack to be dissipated. The material filling the gaps between layers determines how efficient this heat path is.
The industry currently uses two main processes.
Micron and Samsung use TC-NCF (thermal compression with non-conductive film), in which a solid film is bonded under high temperature and pressure. The problem is that tiny air bubbles easily remain around the microbumps during bonding; air conducts heat poorly, so overall thermal resistance is relatively high.
SK Hynix uses MR-MUF (mass reflow molded underfill). Liquid epoxy resin is injected between the layers and fills all the gaps. After curing it is essentially free of air bubbles, resulting in significantly lower thermal resistance.
High thermal resistance has a cascading effect.
DRAM stores charge in microscopic capacitors, and the leakage rate rises exponentially with temperature. At excessively high temperatures, a charge that could normally be held for 64 milliseconds might leak away in just 32 milliseconds, forcing the memory controller to double the rate of refresh commands. DRAM cannot be read or written while it is being refreshed, so usable bandwidth drops.
The packaging process also sets the upper limit on the number of stacked layers. The physical height of the chip is strictly limited. Liquid fill packs the gaps more tightly, allowing more DRAM layers within the same height. This is also why yield pressure on the packaging process increases dramatically when HBM4 moves to 16-layer stacks. The more layers there are, the more the mismatches in mechanical stress and thermal expansion coefficient between layers are amplified. If any one die warps slightly, the entire module is ruined.
What to look for when reading manufacturer materials
When you see any HBM announcement, look for three things:
1) At what voltage was the nominal pin rate measured? A frequency achieved by raising the voltage is unusable in real data centers, because power consumption would exceed the thermal design limit.
2) Stack height and capacity per stack. Whether a 12-layer 36GB HBM4 can be mass-produced, and at what yield, says more than peak bandwidth figures.
3) Who will actually supply it? The final verification of all technical indicators is customer acceptance testing. SK Hynix almost monopolizes the HBM supply for Nvidia's H100; Micron entered the H200 supply chain with a combination of energy efficiency and bandwidth; Samsung failed to pass Nvidia's testing in time during the HBM3E stage due to overheating issues and is currently trying to catch up in the HBM4 stage.
The selection made by major customers is effectively a composite score of all the parameters above.
CXL: The Next Battleground for Memory
HBM solves the bandwidth problem within a single GPU. When AI clusters scale to hundreds or even thousands of GPUs, the question is no longer whether computation is fast enough, but whether memory can be allocated flexibly enough. CXL addresses this problem.
The first problem is a fundamental one in existing data center memory architectures: memory is physically bound to a server and cannot be shared across machines. One server running large-model inference overflows its KV cache and crashes with an error. Meanwhile, another server in the same data center runs lightweight tasks and leaves hundreds of GB of memory idle. That idle DRAM cannot be reallocated to where it is needed; the industry calls this memory stranding. The stranding rate in hyperscale data centers is typically between 20% and 30%. Given that memory accounts for over 40% of a server's BOM cost, this is real capital expenditure wasted.
The second problem is cache coherence. CPUs and GPUs each have their own private caches. When both hold copies of the same memory data and one modifies it without the other's knowledge, the other reads stale data. The previous solution was to force the cached data to be written back to DRAM and then read again, under software control. This takes several microseconds, during which the processor pipeline stalls. In AI systems that depend on nanosecond-level response, the stall can reduce system performance by more than 30%, and it also requires engineers to handle cross-chip data synchronization manually in code, which is highly error-prone.
The common root cause of these two problems is the limitation of the PCIe protocol.
PCIe was originally designed for I/O devices such as hard drives and network cards; it supports only block-style data transfer, not direct byte-level reads and writes, and has no built-in cache coherence mechanism.
CXL (Compute Express Link) is an open industry standard that rewrites the protocol logic on top of the PCIe physical layer, specifically targeting memory semantics and cache coherence.
For cache coherence, CXL relies on a hardware state machine that maintains it automatically. Each 64-byte cache line in the system carries a state: modified, exclusive, shared, or invalid. When the GPU needs to modify a piece of data, the request reaches the home agent on the CPU side. The home agent has a snoop filter that records which devices hold a copy of the line in their caches. If the CPU's L3 cache holds it, the hardware automatically sends an invalidation, forcing the CPU's copy into the invalid state so that the GPU can take exclusive ownership and perform the write. The entire process completes within a few to tens of nanoseconds, with no operating system involvement and no manual synchronization code from the programmer.
For the data transmission format, CXL drops PCIe's lengthy packet headers and adopts a fixed 256-byte FLIT format. Header overhead is minimal and the memory controller no longer needs complex boundary parsing, so data can be fed onto the bus continuously, like a pipeline. The latency of accessing remote CXL memory can theoretically be brought down to 170 to 250 nanoseconds, slower than local DDR5 but far below the microsecond-level latency of PCIe.
For memory sharing, CXL uses switches to group multiple memory modules into independent memory pools that no longer belong to any single server. Management software can dynamically map specific amounts of capacity from the pool to whichever compute nodes need it, at microsecond timescales.
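The pooling idea can be illustrated with a toy model in which capacity belongs to a shared pool and is mapped to nodes on demand. This is only a conceptual sketch: the class, method names, and sizes are hypothetical, and real CXL pooling is carried out by switch hardware and fabric management software rather than by an API like this one.

```python
# Toy model of pooled memory: capacity lives in a shared pool and is mapped
# to whichever node needs it. Every name here is hypothetical; real CXL
# pooling is handled by switches and fabric management software.

from dataclasses import dataclass, field


@dataclass
class MemoryPool:
    capacity_gb: int
    mapped_gb: dict = field(default_factory=dict)  # node name -> GB currently mapped

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.mapped_gb.values())

    def map_to(self, node: str, gb: int) -> bool:
        """Map `gb` of pooled capacity to `node`; refuse if the pool cannot cover it."""
        if gb <= 0 or gb > self.free_gb:
            return False
        self.mapped_gb[node] = self.mapped_gb.get(node, 0) + gb
        return True

    def release(self, node: str, gb: int) -> None:
        """Return up to `gb` of capacity from `node` to the pool."""
        held = self.mapped_gb.get(node, 0)
        remaining = held - min(gb, held)
        if remaining:
            self.mapped_gb[node] = remaining
        else:
            self.mapped_gb.pop(node, None)


if __name__ == "__main__":
    pool = MemoryPool(capacity_gb=1024)
    pool.map_to("server-a", 256)   # inference node whose KV cache is filling up
    pool.map_to("server-b", 32)    # lightweight node needs very little
    print(pool.free_gb)            # capacity still available to any node
    pool.release("server-a", 128)  # load drops, capacity returns to the pool
    print(pool.free_gb)
```

The point of the design is that no capacity is owned by a node permanently: whatever one node releases becomes immediately available to any other, which is what removes stranding.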
For example, if server A's KV cache is nearly full, it can be allocated a portion directly from the pool, putting to use memory that would otherwise sit idle on server B.

Micron's Industry Position with CXL

Micron has launched a CXL Type 3 memory expansion module, positioned as a pure memory expansion device and built on its own DDR5 process. Logically, it and HBM are products at two different tiers. HBM addresses the extreme bandwidth requirements for the hundreds of gigabytes that sit next to the GPU, with latency in the 20 nanosecond range. The CXL module addresses large-capacity expansion across nodes, with latency in the 250 nanosecond range and capacity reaching the terabyte level. The two are used together: frequently accessed hot data stays in local HBM, while cold data such as long-context historical KV cache and checkpoints is offloaded to the CXL memory pool. While the AI framework is computing layer N, it issues an instruction in advance to prefetch the cold data needed for layer N+1 from CXL memory to the local machine, using computation time to hide the physical latency of CXL. This avoids wasting expensive HBM capacity and enables extremely long context windows, such as those spanning millions of tokens.

From Micron's business perspective, CXL is a new entry point. SK Hynix has a clear first-mover advantage in the HBM market, where competition is fierce; the CXL memory expansion market is still in its early stages, customer lock-in has not yet been established, and Micron, as a pure-play memory manufacturer, carries no historical baggage in this area. Furthermore, CXL modules use standard DDR5 technology and do not need the complex stacked packaging of HBM, so they put less pressure on yield and production capacity. Stranded memory in data centers is a genuine waste of capital, and CXL pooling is currently the only feasible solution at the architectural level. This demand will not disappear.

IV. Industry Economics and Frontier Research: The Next Decade

Building an advanced DRAM wafer fab costs between $15 billion and $20 billion, with a single ASML EUV lithography machine costing over $200 million. Additional investment is required for the power supply and cooling systems. The equipment depreciation period is 5 years. If the whole investment is written off over those roughly 1,825 days, the fab incurs on the order of $8 million to $11 million in depreciation every day, whether or not there are orders or shipments. Equipment utilization must be kept above 95%. Once utilization declines, the manufacturing cost per bit rises sharply, because the same fixed depreciation is spread over fewer bits. This is why the memory industry is so cyclical. When demand declines, manufacturers cannot easily cut production; cutting production only worsens the cost structure, so they hold on and fight price wars. Micron has partially hedged this risk through long-term HBM orders, but the arithmetic of wafer fab depreciation remains unchanged.

Why is HBM expensive?

HBM manufacturing costs are several times higher than those of regular DDR5, because it involves vertically stacking multiple layers of DRAM dies. A defect in any layer renders the entire stack unusable. Assuming a single-die yield of 95% and an interlayer bonding yield of 99%, with N layers stacked, the total yield is:

$$Y_{\text{total}} = \left(Y_{\text{die}} \times Y_{\text{bond}}\right)^{N}$$

The overall yield of an 8-layer HBM3E is approximately 61%. The overall yield of a 12-layer HBM4 is approximately 48%. A 95% single-die yield already reflects a fairly mature process, yet at 12 layers more than half of the material is still scrapped at final test. Each layer multiplies rather than adds, so the losses compound.

This is why SK Hynix's MR-MUF (mass reflow molded underfill) liquid encapsulation process is commercially valuable: it directly improves interlayer bonding yield, meaning a higher $Y_{\text{bond}}$ in the formula. It is also why Micron must maximize the single-die yield ramp at the 1-gamma node: every percentage point gained in $Y_{\text{die}}$ is compounded across all 12 layers.
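The stacking arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes die yield and bond yield each apply once per layer, that is, total yield = (Y_die × Y_bond)^N; that form is an assumption, chosen because it reproduces the roughly 61% and 48% figures quoted above. The function name `stack_yield` is illustrative.

```python
def stack_yield(layers: int, die_yield: float = 0.95, bond_yield: float = 0.99) -> float:
    """Probability that every die and every bond in a `layers`-high stack is good.

    Assumes independent defects and one die-yield and one bond-yield factor per layer.
    """
    return (die_yield * bond_yield) ** layers


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(f"{n:2d} layers: {stack_yield(n):.1%}")

    # Sensitivity: one extra point of die yield, compounded over 12 layers.
    base = stack_yield(12)
    improved = stack_yield(12, die_yield=0.96)
    print(f"12 layers, die yield 95% -> 96%: {base:.1%} -> {improved:.1%}")
```

Under these assumptions, a single percentage point of die yield moves the 12-layer result by several points, which is why yield ramp matters more for stacked memory than for single-die DRAM.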
And why doesn't the price of HBM drop quickly as demand increases? Because capacity expansion takes time and yield ramp-up takes time; neither can be rushed.

In-Memory Computing: Proposed for Twenty Years, Why Hasn't It Arrived Yet?

HBM and CXL both address the data transport problem: one makes transport faster, the other makes the memory pool more flexible. But from an energy consumption perspective, transport itself is the problem. PIM (Processing-in-Memory) integrates the computing units directly into the DRAM: the data doesn't move, the computation happens in place, and only the results are transmitted.

This idea is theoretically very elegant, but it is stuck on a fundamental physical contradiction. DRAM transistors require low leakage current so that the capacitors can hold their charge. DRAM manufacturing processes therefore use transistors with high threshold voltages, which switch slowly but stably. Logic chips such as CPUs and GPUs require extremely fast switching to reach clock speeds of several GHz, which calls for low threshold voltages at the cost of high leakage current. These two requirements directly conflict. A processing unit built on a DRAM die would be an order of magnitude slower than a GPU. Worse, the heat generated during processing would warm adjacent capacitors, accelerating leakage and compromising data reliability.

So it is not that no one wants to build PIM; the physical requirements of the two manufacturing processes are inherently contradictory. The idea has been around for over twenty years, and there is still no large-scale commercial solution. The path currently explored by manufacturers like Micron is a fallback: instead of embedding computing units in the DRAM array, they integrate more AI computing power into the logic-layer base die at the bottom of the HBM (High Bandwidth Memory) stack.
The base die can be manufactured using TSMC's advanced logic process, bypassing the process constraints of the DRAM array. However, this is still far from truly data-stationary, in-place computation; it is closer to attaching a small GPU next to the memory than to the memory itself performing computations.

Micron's current business logic is therefore clear: leverage its 1-gamma process leadership to lower the cost per bit, use the high margins and capacity absorption of HBM to gain pricing power, and rely on long-term order lock-in to smooth out cyclical fluctuations. This logic is financially sound as long as AI infrastructure investment keeps growing.

However, longer-term structural problems remain unresolved. Planar DRAM scaling is approaching its physical limits, the yield penalty for 3D stacking compounds with every added layer, and the process contradictions of in-memory computing have no short-term breakthrough path. This means that Micron's future competition will depend less on leading at a single technology node and more on making fewer mistakes than its competitors across yield engineering, packaging processes, and system integration. A moat is never a single technology, but rather a comprehensive capability to manage all these physical constraints simultaneously. And this capability takes decades of manufacturing experience to develop.

In conclusion

While researching Micron Technology, I discussed this with my good friend @deepbluue: "Insufficient computing units lead to increased area, but larger areas affect yield. Then comes interconnection, which presents communication and transport issues. Then comes stacking, which presents heat dissipation problems, which in turn affect yield. It's a cycle. Ultimately, chip development boils down to materials science, and in the end, we rely on superconducting materials."
I asked an AI, and surprisingly, it replied:

In just a few dozen words, this accurately summarizes the "Whac-A-Mole" deadlock that the semiconductor industry, and AI chips in particular, has been trapped in over the past fifty years. Your reasoning is very rigorous; the first half is entirely in line with industry consensus. Only the final conclusion about "superconducting materials" has a fatal physical flaw when it comes to engineering application.

"Insufficient computing units lead to increased area, affecting yield": the reticle limit and the economic wall.
"Then comes interconnection, which presents communication and transport issues": RC delay and the memory wall.
"Then comes stacking, which presents heat dissipation problems, affecting yield": heat flux and the thermodynamic wall.

"Ultimately, chip development boils down to materials science" is absolutely correct. However, the hope lies not in superconductivity but in: 1) silicon photonics; 2) two-dimensional semiconductor materials; 3) a complete revolution in computing architecture (in-memory computing and neuromorphic computing).