Reproducing GPT-2

Published:

🧠💬 ChatGPT has become an integral part of my daily life. I grew increasingly curious about what goes on under the hood of the GPT models, so I spent my weekends 🛠️ building one from scratch, aiming to replicate OpenAI’s GPT-2 (124M) model.
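For context, the "124M" refers to the parameter count of the smallest GPT-2 model. Its published hyperparameters are well known, and a quick back-of-the-envelope sketch (assuming the language-model head is weight-tied to the token embeddings, as in GPT-2) recovers roughly that number:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Published GPT-2 (124M) hyperparameters
    block_size: int = 1024   # maximum context length
    vocab_size: int = 50257  # BPE vocabulary size
    n_layer: int = 12        # transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding dimension

cfg = GPTConfig()
# Rough weight count (ignoring biases and layernorm params):
# token + position embeddings, then per block:
# 4*n_embd^2 for attention (qkv + output proj)
# and 8*n_embd^2 for the 4x-expansion MLP.
params = (cfg.vocab_size * cfg.n_embd
          + cfg.block_size * cfg.n_embd
          + cfg.n_layer * (4 * cfg.n_embd**2 + 8 * cfg.n_embd**2))
print(f"{params / 1e6:.0f}M")  # prints "124M"
```

The count lands at about 124.3M, which is why the model is referred to as GPT-2 (124M).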

🎓📚 Andrej Karpathy’s lecture series is hands-down the best resource to learn about Deep Learning and LLMs.

🚀 After a lot of tinkering, I finally finished building my GPT model and began training it on a sample of internet text. The whole journey made me appreciate even more the incredible ingenuity and exceptional engineering that goes into OpenAI’s models. 🙌

🤖 I’m eagerly awaiting the launch of OpenAI’s open-source reasoning model, and now that I have a better grasp of how LLMs work, I’m excited to start applying them in my own projects! 🔍⚙️


📂 GitHub Repository: nano GPT-2

🎥 Neural Networks: Zero to Hero by Andrej Karpathy: Lecture Series

🧪 Finally training my model: Project Image