
Tuesday, June 11, 2024

Acutely Tuned & Karpathy Aware

In the world of AI there is a whole subfield dedicated to making sure AI systems work in the best interest of humanity. From my understanding this is called AI alignment or safety. My guess is that this field studies AI systems by putting them through a series of tests (in a sandbox) to ensure they behave in a responsible, aligned way. Other than that presumption, I'm not really sure what AI alignment or safety really entails.

The reason I bring it up in this post is that there was a very interesting report posted by Leopold Aschenbrenner¹, an AI safety researcher who was fired from OpenAI [1]. In summary, the report discusses the energy consumption and security needs of artificial general intelligence (AGI) and artificial superintelligence (ASI) systems, along with other aspects of national security. My view is that AI researchers don't really know whether current neural network architectures will enable AGI or ASI, or what will, but the analysis is quite thorough and detailed and raises points worth thinking about.

After skimming the report, what stood out to me is that we (the U.S.) would need to greatly improve our energy production and security. I do think Aschenbrenner's analysis brushes aside anticipated improvements in energy-efficient hardware for inference. Regardless, there is going to be a huge demand for compute from AI systems. The other concern is in the global arena: nations will be competing with one another, and dominance of the world order is on the table. While I'm not so sure the doom and gloom is so pressing, it is something that cannot be ignored.

I suggest reading it even if you're not really interested in these types of topics, because Leopold, whether you agree with him or not, seems to be very thorough and brilliant in thought.

Coding GPT-2 from Scratch

I've followed Andrej Karpathy for some time since the release of ChatGPT because he is one of the few people who has shown how to build these types of AI architectures from scratch². My plan is to follow Karpathy's 4-hour video and reproduce it using Julia's Flux.jl and Transformers.jl. I'm using Julia so that I avoid copying exactly the same PyTorch code that Karpathy is implementing. I was thinking about using JAX but decided to go with Flux.jl because it's more familiar to me at the moment.
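To give a flavor of the kind of component one ends up writing when building GPT-2 from scratch, here is a minimal sketch of causal (masked) self-attention, the core operation of the transformer block. This is my own NumPy illustration, not Karpathy's code or the Flux.jl version I plan to write; the function name and the single-head, no-bias simplifications are my assumptions.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention sketch (illustrative, not GPT-2 exact).

    x: (T, d) sequence of token embeddings; Wq, Wk, Wv: (d, d) projections.
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)                     # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)    # True above the diagonal
    scores[mask] = -np.inf                              # no attending to future tokens
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                  # (T, d) attended values
```

Because of the causal mask, the first token can only attend to itself, so its output is just its own value projection; the full model stacks many such blocks with multiple heads, layer norm, and MLPs.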

Footnotes


  1. I had never heard of this person, but he seems very well known in the AI community. I also find him to be very interesting.

  2. For whatever reason I am drawn to people who build things from first principles. Yes, this is not efficient for producing useful tools or applications, but in my experience it has been really hard to get to a 90% "expertise" level without building the darn thing from scratch. When I was learning finite-difference time-domain methods, molecular dynamics, or density functional theory, I always wrote a basic code from scratch to prove to myself that I knew how the guts worked.


References

[1] L. Aschenbrenner, Situational Awareness, (2024). https://situational-awareness.ai/leopold-aschenbrenner (accessed June 10, 2024).


