
Thursday, September 19, 2024

Some Updates

I haven't been writing many blog posts over the last two months; I've just been playing catch-up with daily life. However, since my last post I've come across a few tools that I wanted to share.

ML Potentials

The pace of new universal ML potentials for atomistic simulations is ramping up, with proprietary models currently leading the way. Much of the activity seems to follow the trend set by large language models (LLMs) 🤷‍♂️. Earlier this month, I spent an evening setting up the Orbital Materials pre-trained potential models to test some property predictions on specific systems and compare them to other physics-informed potentials. I haven't fully benchmarked this yet but plan to include it in my draft pre-print¹.
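
If you want to try the same thing, the ASE setup is only a few lines. The import paths and model name below are placeholders based on my reading of the orb-models repository, so check its README for the current API:

```python
# Quick property check with an Orbital pre-trained potential through ASE.
# The orb_models import paths and model name are assumptions; see the
# orb-models README for the actual API.
from ase.build import bulk
from ase.optimize import BFGS
from orb_models.forcefield import pretrained                 # assumed module path
from orb_models.forcefield.calculator import ORBCalculator   # assumed class name

orbff = pretrained.orb_v1()                       # load pre-trained weights
atoms = bulk("Cu", "fcc", a=3.6, cubic=True)      # small bulk test system
atoms.calc = ORBCalculator(orbff, device="cpu")   # any ASE workflow works from here

BFGS(atoms).run(fmax=0.01)                        # relax, then inspect properties
print(atoms.get_potential_energy(), atoms.get_volume())
```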

As I mentioned in a previous post, if you're looking for a good overall comparison of different ML potentials, check out the Matbench Discovery paper [1]. There is also a recent effort aimed more specifically at benchmarking for MD simulations [2], though it's still under development and hence sparse in detail.

The Orbital pre-trained model(s) seem to perform reasonably well, on par with the MACE-MP² model [3]. One thing I wanted to do was make the Orbital models' ASE calculator available for use with LAMMPS via a Python wrapper. Fortunately, the skeleton code for this had already been written by the AdvancedSoft Corp team for M3GNet, making it straightforward to implement a similar wrapper. I went ahead and did so; you can find it here. It seems to work correctly, but I have yet to fully stress-test the implementation. The benefit of having the LAMMPS interface is that you can now utilize many of the compute and fix commands available in LAMMPS.
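
For context, the core of such a wrapper is small: rebuild an ASE Atoms object from the positions, cell, and species that LAMMPS passes over, run the calculator, and hand energy and forces back. A calculator-agnostic sketch of that conversion (not the wrapper's exact code):

```python
# Sketch of the Python-side conversion a LAMMPS/ASE wrapper does on each
# call: LAMMPS data in, ASE calculator results out. Not the wrapper's code.
import numpy as np
from ase import Atoms

def evaluate(symbols, positions, cell, calculator):
    """Return energy (eV) and forces (eV/Angstrom) for one configuration."""
    atoms = Atoms(symbols=symbols,
                  positions=np.asarray(positions),
                  cell=np.asarray(cell),
                  pbc=True)
    atoms.calc = calculator  # any ASE-compatible ML potential
    return atoms.get_potential_energy(), atoms.get_forces()
```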

One thing I want to refactor in the Orbital LAMMPS wrapper is eliminating the use of the Python driver script. While this approach makes it easy to modify the ASE Calculator interface, I'm not sure it's the most efficient solution. It seems more robust to instantiate the Python class and call its methods directly from within the C++ code with pybind11, particularly for better memory management and reduced overhead.
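
Concretely, the Python side of that refactor could be little more than a small class that loads the calculator once and exposes a compute method for the embedded interpreter to call every timestep. The names below are hypothetical, not the current wrapper's interface:

```python
# Hypothetical bridge class a pybind11-embedded C++ pair style could
# instantiate once and call repeatedly; illustrative only.
import numpy as np
from ase import Atoms

class PotentialBridge:
    def __init__(self, calculator):
        self.calculator = calculator  # load once, reuse every timestep

    def compute(self, symbols, positions, cell):
        atoms = Atoms(symbols=symbols,
                      positions=np.asarray(positions),
                      cell=np.asarray(cell),
                      pbc=True)
        atoms.calc = self.calculator
        energy = float(atoms.get_potential_energy())
        forces = np.ascontiguousarray(atoms.get_forces(), dtype=np.float64)
        return energy, forces  # scalar plus an (N, 3) array for easy copying
```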

Turning Papers into Podcasts

Reading research papers is something I spend a lot of time on; it's part of being a scientist or engineer. The challenge is the sheer volume of papers being published, which makes it practically impossible to keep up by reading them all. You need ways to efficiently digest them and select the ones worth your time³. One of the best outcomes of the recent advances in generative AI and LLMs is the ability to consume information in one medium and convert it into another. In other words, you can now turn papers into podcasts or lectures and listen to them during your commute to school or work. I'm not referring to mere dictation (although you can do that too), but to using AI to distill the important details of a paper into a script and then having it narrated.

There are a few tools out there that do this. The two I've been experimenting with are PDF to Audio Converter [4] from the Buehler LAMM Group at MIT and PDF to Podcast⁴. So what kind of quality do you get? I'd say it's pretty impressive. Here's an audio lecture based on the tutorial paper "How to train a neural network potential" by Tokita and Behler [5]:

It definitely captures the university lecture style, although it may not convey all the nuances of the tutorial paper. Still, it’s a great way to decide if you want to dive deeper into the paper. I'm not sure how it handles figures or equations, but it would be interesting if it tried to describe them. If you prefer a conversational style (e.g., question-answer format), you can switch the profile.

Setting up PDF to Podcast locally is fairly straightforward, but you can also use the Hugging Face web app. You'll need your own OpenAI API key; the cost for the example above was about $0.30 USD using GPT-4o. You also get the transcript, so you can review it afterward if you want to find a specific detail.
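
To give a sense of what these tools do under the hood, here's a minimal two-step sketch using the OpenAI Python SDK: one chat completion to turn the extracted paper text into a lecture script, then a text-to-speech call to narrate it. This is not the PDF to Podcast pipeline itself (that adds PDF parsing, chunking, and the profile prompts), and the audio-saving helper may differ between SDK versions:

```python
# Minimal paper-to-audio sketch (not the PDF to Podcast codebase):
# chat completion for the script, then TTS for the narration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

paper_text = open("paper.txt").read()  # text already extracted from the PDF

# Step 1: distill the paper into a lecture-style script.
script = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Rewrite this paper as a ~10-minute university lecture."},
        {"role": "user", "content": paper_text},
    ],
).choices[0].message.content

# Step 2: narrate the script; the save helper may vary by SDK version.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=script)
speech.stream_to_file("lecture.mp3")
```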

My goal is to use this method to go through papers while I'm in the car, at least to determine if they are of interest and worth my limited time to read. After all, nothing beats reading a good paper in depth.

Footnotes


  1. Yes, I know it's kind of funny to say "draft pre-print", but I just haven't had the time to turn this into an arXiv-uploadable preprint yet.

  2. These are just the model parameters; the actual model code is here.

  3. Most papers are a waste of time and aren't really making an impact in any meaningful way.

  4. This was actually the code the Buehler LAMM group used for their project. 

References

[1] J. Riebesell, R. Goodall, A. Jain, K. Persson, A. Lee, Can machine learning identify stable crystals? [Preprint], Matbench Discovery. https://matbench-discovery.materialsproject.org/preprint.

[2] Y. Chiang, MLIP Arena, https://github.com/atomind-ai/mlip-arena.

[3] I. Batatia, D.P. Kovacs, G.N.C. Simm, C. Ortner, G. Csanyi, MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields, in: A.H. Oh, A. Agarwal, D. Belgrave, K. Cho (Eds.), Advances in Neural Information Processing Systems, 2022. https://openreview.net/forum?id=YPpSngE-ZU.

[4] A. Ghafarollahi, M.J. Buehler, SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning, (2024). https://arxiv.org/abs/2409.05556.

[5] A.M. Tokita, J. Behler, How to train a neural network potential, The Journal of Chemical Physics 159 (2023) 121501. https://doi.org/10.1063/5.0160326.


