By Lan Xi
The ten days or so since DeepSeek went viral have been the noisiest stretch in memory. To be honest, most of the commentary has the flavor of overtime work churned out to meet KPIs, people arguing over whether the thing is genuine or a fraud, and only a handful of pieces are worth keeping. Two podcasts, however, taught me a great deal, and I highly recommend both.
One is Zhang Xiaojun's episode with Pan Jiayi, a doctoral researcher at UC Berkeley's AI lab, who explains the DeepSeek paper sentence by sentence. Nearly three hours of high-density output will kill some brain cells, but the endorphin rush that follows is just as explosive.
The other is Ben Thompson's three-episode podcast series on DeepSeek, which adds up to a bit over an hour. Thompson founded the Stratechery newsletter and is one of the most technically literate analysts anywhere. He lives in Taipei year-round, and his close-range insight into China and Asia far exceeds that of his American counterparts.
Let's start with Zhang Xiaojun's episode. After reading DeepSeek's paper, the guest Pan Jiayi built a small-scale project reproducing the R1-Zero model, and it now has nearly 10,000 stars on GitHub.
This kind of knowledge relay is idealism projected onto the field of technology. As Flood Sung, a researcher at Moonshot AI, said, Kimi's reasoning model k1.5 was originally inspired by two videos OpenAI had released. Earlier still, when Google published "Attention Is All You Need", it was OpenAI that immediately recognized the Transformer's future. The free flow of knowledge is the precondition of all progress.
That is why Anthropic founder Dario Amodei's blockade rhetoric, in effect "science has no borders, but scientists have a motherland", disappointed so many people. In dismissing the competition, he was also defying basic common sense.
Back to the podcast itself. I'll try to pull out the key points for you, though I recommend listening to the original if you have time:
- OpenAI's o1 made a stunning debut while hiding its workings very deeply. OpenAI did not want other labs to crack the principle, which in effect posed a riddle to the industry, a bet that nobody would solve it quickly. DeepSeek-R1 was the first to find the answer, and the way it found the answer was rather beautiful;
- Open source offers more certainty than closed source, which helps greatly with growing talent and producing results. R1 essentially laid the entire technical route out in the open, so it does more to stimulate research investment than the trick-hiding o1;
- Although the AI industry is burning ever larger sums of money, the fact is that we have not gotten a next-generation model for nearly two years; the mainstream is still benchmarked against GPT-4, a rare sight in a market that prides itself on relentless innovation. Even leaving aside whether Scaling Laws have hit a wall, OpenAI's o1 was itself an attempt at a new technical line: using language models to teach AI to think;
- o1 restored a linear improvement in measured intelligence on benchmarks, which is impressive. Its technical report disclosed few details but touched every key point, such as the value of reinforcement learning. Pre-training and supervised fine-tuning hand the model correct answers to imitate, and over time it learns to copy the pattern; reinforcement learning instead has the model complete the task on its own, and you only tell it whether the result is right or wrong: if right, do more of that, and if wrong, do less;

- OpenAI discovered that reinforcement learning can make the model produce something close to human thinking, namely CoT (chain of thought). When a problem-solving step goes wrong, the model backtracks to the previous step and tries a new approach. None of this was taught by human researchers; it is an emergent ability that arose because the model was forced to complete tasks on its own. Later, when DeepSeek-R1 reproduced a similar "aha moment", o1's core fortress was effectively breached;
- The reasoning model is essentially the product of an economic calculation. Brute-forcing computing power might still produce an o1-like effect by the time you reach GPT-6, but that is not effort working a miracle so much as a miracle working a miracle: possible, just unnecessary. Model capability can be understood as training compute multiplied by inference compute. The former is already far too expensive while the latter is still very cheap, yet the multiplier effect is about the same, so the industry has begun taking the more cost-effective inference route;
- The release of o3-mini at the end of last month may have had little to do with DeepSeek-R1, but its price, cut to a third of o1-mini's, surely did. OpenAI believes ChatGPT's business model has a moat while selling APIs does not: an API is too replaceable. China has also been debating lately whether the ChatBot is a good business, and even DeepSeek clearly has not figured out how to absorb this wave of traffic. There may be a natural conflict between serving the consumer market and doing frontier research;
- In the eyes of technical experts, DeepSeek-R1-Zero is more beautiful than R1 because it involves less human intervention. The model worked out for itself, across thousands of reasoning steps, how to find the optimal solution, with little reliance on prior knowledge. But because it was never aligned, R1-Zero cannot be shipped to users; it outputs in a jumble of languages, for instance. So the R1 that won recognition in the mass market still leans on older methods such as distillation, fine-tuning, and even pre-implanted chains of thought;
- This touches on the gap between capability and performance: the most capable model is not necessarily the best performer, and vice versa. R1's standout performance owes much to where the manual effort was directed. R1 has no exclusive training corpus; everyone's corpus contains classical poetry and the like, so it cannot be that R1 simply knows more. The real reason is probably data annotation. DeepSeek is said to have recruited students from Peking University's Chinese department to annotate, which would markedly improve the reward function for literary expression, even though the industry generally avoids hiring humanities students. That Liang Wenfeng himself sometimes does annotation shows not just enthusiasm but that annotation work has long since reached the point of needing professional exam-takers to coach the AI. OpenAI likewise pays doctoral students 100 to 200 US dollars an hour to annotate for o1;
- Data, computing power, and algorithms are the three flywheels of the large-model industry, and this wave's main breakthroughs come from algorithms. DeepSeek-R1 exposed a misconception: the traditional emphasis on value functions may be a trap. A value function tends to judge every single step of the reasoning process, steering the model onto the right path at every detail. For example, while the model works on "what does 1+1 equal", the moment it hallucinates 1+1=3 the value function starts punishing it, a bit like electroshock therapy, never allowing a mistake;
- That algorithm is theoretically sound but also perfectionist. Not every question is as simple as 1+1, and when a long chain of thought runs to thousands of tokens, supervising every step makes the return on investment collapse. So DeepSeek made a decision that defied the ancestral teachings: it dropped the value function that had satisfied researchers' obsessive streak and scored only the final answer, leaving the model to figure out on its own how to reach that answer through correct steps. Even if a solution contains 1+1=3 along the way, there is no instant correction; instead, the model notices mid-reasoning that something is wrong, finds it cannot reach the right answer this way, and corrects itself;
- Algorithms are DeepSeek's biggest innovation for the whole industry, including how to distinguish whether a model is imitating or reasoning. I remember that after o1 came out, many people claimed a general-purpose model could also output chains of thought through prompting, but those models had no reasoning ability; they were merely imitating. Such a model still produced the answer the usual way and then, to satisfy the user's request, went back and wrote up a rationale derived from the answer. That is imitation, the meaningless act of shooting the arrow first and painting the target around it. DeepSeek also put a great deal of work into defeating reward hacking, the problem of models turning cunning: a model gradually guesses what kind of thinking earns rewards without ever truly understanding why it should think that way;
- For years the industry has been waiting for emergence. People used to believe that with enough knowledge a model would naturally evolve wisdom; after o1, reasoning looked like the critical springboard instead. DeepSeek's paper stresses which of R1-Zero's behaviors emerged autonomously rather than on human command. For example, once it realized that generating more tokens let it think more thoroughly and ultimately perform better, it began actively lengthening its chain of thought. In the human world this is instinct, since slow deliberation is of course more strategic than blitz chess, but watching a model draw that lesson on its own is genuinely surprising;
- DeepSeek-R1's training cost is probably somewhere between 100,000 and 1 million US dollars, less than V3's 6 million. Moreover, after open-sourcing, DeepSeek also demonstrated using R1 to distill other models and continuing reinforcement learning after distillation. The open-source community's support for DeepSeek is not without reason: it turned the ticket to AGI from a luxury good into a fast-moving consumer good, letting far more people come in and experiment;
- Kimi k1.5 was released at the same time as DeepSeek-R1, but because it is not open source and lacks international standing, its influence has been quite limited despite similar algorithmic contributions. Kimi, shaped by its consumer business, also emphasized using short chains of thought to approximate the effect of long ones, so k1.5 was rewarded for shorter reasoning. The intention was to serve users, who presumably dislike waiting after asking a question, but it seems to have somewhat backfired: much of DeepSeek-R1's viral material consists of gems users discovered and spread from its chain of thought, and people encountering reasoning models for the first time do not seem to mind the model's long-winded inefficiency;
- Data annotation is something the whole industry keeps quiet about, but it is only a transitional solution; a self-learning route like R1-Zero's is the ideal. For now OpenAI's moat remains deep: its web traffic hit an all-time high last month, and DeepSeek's popularity will objectively bring new users to the entire industry. Meta is the one left more uncomfortable. LLaMA 3 has no real innovation at the architecture level, and Meta did not anticipate DeepSeek's impact on the open-source market at all. Meta has a very strong talent pool, but its organizational structure failed to convert those resources into technical results.
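To make the answer-only scoring described above a bit more concrete, here is a small sketch. It is my own illustration, loosely modeled on the group-relative scoring idea from DeepSeek's paper rather than their actual code; the function names and numbers are all invented:

```python
# My own toy sketch, not DeepSeek's code: score only the final answer,
# then rank each sampled completion against its group, instead of
# letting a value function grade every intermediate step.

def outcome_reward(answer: str, correct: str) -> float:
    """Reward the final answer only; reasoning steps are never graded."""
    return 1.0 if answer.strip() == correct.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each completion relative to its group's mean reward,
    so no separate value network is needed as a baseline."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    if std == 0:  # all completions tied: no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Four sampled answers to "1+1=?": three wrong, one right.
answers = ["3", "3", "2", "11"]
rewards = [outcome_reward(a, "2") for a in answers]
advantages = group_relative_advantages(rewards)
# The lone correct completion gets a positive advantage ("do more of
# this"); the wrong ones get negative ones ("do less"), and nothing
# ever punished the intermediate 1+1=3 step directly.
```

In real training these advantages would weight a policy-gradient update; the point here is only that a correct answer stands out from its group without any per-step judge.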
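The "training compute multiplied by inference compute" framing from a few bullets up can also be put into toy numbers. These figures are invented purely for illustration, not measured costs:

```python
# Toy numbers (invented, not real costs) for the framing that
# capability scales roughly with training compute x inference compute.

TRAIN_COMPUTE = 1e25  # FLOPs sunk into pre-training (expensive to double)
INFER_COMPUTE = 1e10  # FLOPs spent thinking per query (cheap to double)

baseline = TRAIN_COMPUTE * INFER_COMPUTE

# Option A: double training compute, i.e. fund another pre-training run.
option_a = (2 * TRAIN_COMPUTE) * INFER_COMPUTE
# Option B: double inference compute, i.e. let the model think longer.
option_b = TRAIN_COMPUTE * (2 * INFER_COMPUTE)

# Both double the product, but option B's marginal cost is a few cents
# per query while option A's is another multi-million-dollar run.
assert option_a == option_b == 2 * baseline
```

The multiplier treats the two factors symmetrically, which is exactly why the industry moved toward the cheaper factor once training became the expensive one.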
Now for Ben Thompson's podcast. He cross-validates Pan Jiayi's judgments in many places, for instance that R1-Zero's technical highlight is stripping the HF (human feedback) out of RLHF, but he spends more of the discussion on geopolitical competition and the histories of the big companies. The narrative flows very smoothly:
- One of Silicon Valley's motives for overstating AI safety is that safety can be used to rationalize closed behavior. As far back as GPT-2, the stated goal was to prevent large language models from generating "deceptive and biased" content, but "deceptive and biased" is a far cry from the risk of human extinction. This is essentially a continuation of the culture war, and it rests on an assumption akin to "only when the granaries are full do people learn etiquette": American tech companies hold an absolute technological lead, so we can afford the distraction of debating whether AI is racially biased;
- Recall the righteous tone OpenAI took when it decided to hide o1's chain of thought: the raw chain might be unaligned and users might feel offended seeing it, so it would be cut from view. DeepSeek-R1 punctured that mysterious confidence in one stroke. Yes, Silicon Valley's lead in the AI industry is not so unshakable. And yes, an exposed chain of thought can become part of the user experience; seeing it makes people trust the model's thinking more;
- Reddit's former CEO argues that calling DeepSeek a Sputnik moment, after the first artificial satellite, which the Soviet Union launched ahead of the United States, is a forced political reading. He is more convinced this is Google's 2004 moment. That year, Google's IPO prospectus showed the world how distributed algorithms could stitch networks of commodity computers together and reach the optimal point on price and performance, unlike every other tech company of the day, which just kept buying ever more expensive machines and sat willingly at the most expensive end of the cost curve;
- DeepSeek open-sourced the R1 model and transparently explained how it was done, an enormous act of goodwill. If the Chinese company had wanted to stoke geopolitics, it would have kept its results secret. Google, likewise, drew the finish line for proprietary server vendors such as Sun by pushing competition down to the commodity level;
- OpenAI researcher Roon believes DeepSeek's workaround optimizations for the hobbled H800 chip (its engineers could not lean on Nvidia's CUDA conveniences and had to drop down to the lower-level PTX) set the wrong example, because the time sunk there can never be recovered, whereas American engineers can simply requisition H100s and carry on. Weakened hardware, in his view, cannot produce real innovation;

- If the Google of 2004 had taken Roon's advice and not "wasted" valuable researchers on building more economical data centers, American Internet companies today might well be renting Alibaba's cloud servers. Two decades of inflowing wealth have drained Silicon Valley of the drive to optimize infrastructure. Companies large and small have grown used to capital-intensive production, happily submitting budget forms in exchange for investment, even pledging Nvidia chips as loan collateral; how to deliver the most value with limited resources is nobody's concern;
- AI companies will all profess belief in the Jevons paradox, that cheaper computing creates more usage, yet their behavior in recent years has said otherwise: every company preferred research to cost control, until DeepSeek pushed the Jevons paradox right in front of everyone's eyes;
- That Nvidia the company grows more valuable and Nvidia the stock grows riskier can both be true. If DeepSeek achieved this much on heavily restricted chips, imagine the technological progress once they get full-strength computing resources. That is an inspiring revelation for the entire industry, but Nvidia's share price rests on the assumption that it is the sole supplier, and that assumption may now be falsified;
- Chinese and American tech companies judge the value of AI products very differently. China holds that differentiation lies in achieving a superior cost structure, consistent with its achievements in other industries; the United States holds that differentiation comes from the product itself and the higher margins built on it. But America should reflect on the mindset of winning by denying others the chance to innovate, for example by restricting Chinese companies' access to the chips AI research requires;
- However good Claude's reputation is in San Francisco, it cannot escape the native weakness of the API-selling model: it is simply too easy to replace. ChatGPT makes OpenAI, as a consumer technology company, far more resilient. In the long run, though, DeepSeek will benefit both those who sell AI and those who use it, and we should be grateful for such a generous gift.
Well, that's about it. I hope this bit of homework helps you better understand what DeepSeek's sudden fame really means for the AI industry.