
Thursday, April 27, 2023

Atomistic Calculations using GNN

Disclaimer

I'm still very much working through the M3GNet, CHGNet, and MACE papers. As a result, I may get some of the GNN concepts and descriptions wrong. Apologies in advance; I will provide updates as I continue learning.

I'm in the process of better understanding the recently released graph neural network (GNN) interatomic potentials. I'm primarily focused on the M3GNet and CHGNet potentials, which include 3-body effects by updating the GNN architecture [1,2]. CHGNet adds the ability to capture dynamic charge-transfer effects, although the predictive improvement over M3GNet/MEGNet seems marginal based on a recent Matbench Discovery preprint [3]. I will say that, in general, I'm fairly optimistic about this class of ML potential and its use as a first pass at MD simulations across the periodic table. For reference, here are some figures of merit for the GNN models from ref. [3]:

[Table: comparison of the GNN models using Matbench Discovery [3].]

Based on the results, it's clear that M3GNet and CHGNet perform better overall. CHGNet sits at the top and does especially well on the discovery acceleration factor (DAF), which measures how much more often the model identifies stable structures than a random/dummy model would.

So what allows these potentials to work in the first place? The answer is intricate GNNs. I'm going to do my best here to describe a GNN as I understand it. I'll probably get a lot of points wrong, but here's my go at it.

We first need to define what a graph is. A graph is a network with three major components: a set of vertices or nodes $v_i \in V$, a set of edges $e_{ij} \in E$ connecting pairs of nodes $v_i$ and $v_j$, and a global state $U$. Okay, this seems reasonable, and at first glance you can see how an atomic representation of a material maps onto a graph. The vertices/nodes are the atoms, the edges are the bonds, and the global state holds system properties like density, pressure, and temperature. There are, however, two additional needs for an atomic system represented by a graph:

  1. Atoms have defined features: mass, atomic number, charge, and valency.
  2. Bonds can be characterized by binding energies and lengths, but bond pairs also matter, because bond angles and dihedrals are characteristic of the chemistry.

So what do you do? You assign features to the vertices and edges corresponding to these quantities. If you break this down, you really have a single graph carrying a feature vector on each node and edge (or, equivalently, a stack of graphs with one scalar feature each). Great, I think, but now you need to feed this into a neural network, and for a potential you need it to predict energies, forces, and stresses. Additionally, physics is invariant to the orientation of the system: if we rotate the entire system by $\pi/2$ we shouldn't see anything change, so our NN needs to handle such inputs (i.e., if we rotate a molecule the prediction is the same). I'll touch on this invariance condition at the end.
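
To make the atoms-to-graph mapping a bit more concrete, here is a minimal sketch of how one might build node features and a distance-labeled edge list from atomic numbers and positions using a simple cutoff. The `build_graph` function and its choices are my own toy illustration, not how M3GNet or CHGNet actually construct their graphs (those handle periodic boundary conditions, among other things):

```python
import numpy as np

def build_graph(atomic_numbers, positions, cutoff=5.0):
    """Toy atoms-to-graph conversion: nodes carry the atomic number,
    edges connect all pairs within `cutoff` and carry the distance.
    No periodic boundary conditions are handled here."""
    n = len(atomic_numbers)
    node_features = np.asarray(atomic_numbers, dtype=float).reshape(n, 1)
    edges, edge_features = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = np.linalg.norm(positions[i] - positions[j])
            if r < cutoff:
                edges.append((i, j))
                edge_features.append([r])
    return node_features, np.array(edges), np.array(edge_features)

# Example: a stretched O2-like dimer
Z = [8, 8]
pos = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
nodes, edge_index, edge_feats = build_graph(Z, pos)
print(edge_index)   # [[0 1] [1 0]]
print(edge_feats)   # [[1.2] [1.2]]
```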

Alright, so we have some additional details about the graph of an atomic system and constraints on the NN. We may then want to ask about the feature dimensionality of the graph (or collection of graphs, if we want to think of it that way). This can easily become very large, so one may want to find a latent-space representation that is easier to train against the energies and forces. To do this we can use graph convolutions that repeatedly update the node and edge features and then pool them into a single graph-level representation. That representation can then be fed into a multi-layer perceptron network to make predictions for the energies, forces, and stresses.
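
Here is a very rough sketch, in plain NumPy, of the graph-convolution-plus-readout idea described above: a message is computed from each edge and its sender node, summed onto the receiving node, and the pooled node features are passed through a small MLP to produce an energy. The layer sizes, random weights, and `mlp` helper are made up for illustration; the real architectures are far richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Tiny two-layer perceptron with a tanh hidden layer."""
    return np.tanh(x @ w1) @ w2

def message_passing_step(nodes, edge_index, edge_feats, w1, w2):
    """Update each node by aggregating messages from its neighbors.
    A message is an MLP applied to (sender-node feature, edge feature)."""
    updated = np.zeros_like(nodes)
    for (i, j), e in zip(edge_index, edge_feats):
        msg = mlp(np.concatenate([nodes[i], e]), w1, w2)
        updated[j] += msg           # sum aggregation onto the receiver
    return nodes + updated          # residual-style node update

# Toy sizes: 1-d node features, 1-d edge features, hidden width 8
nodes = np.array([[8.0], [8.0]])              # e.g. atomic numbers
edge_index = np.array([[0, 1], [1, 0]])
edge_feats = np.array([[1.2], [1.2]])         # bond distances

w1 = rng.normal(size=(2, 8)); w2 = rng.normal(size=(8, 1))
nodes = message_passing_step(nodes, edge_index, edge_feats, w1, w2)

# Readout: pool node features into one graph vector, then predict an energy
w3 = rng.normal(size=(1, 8)); w4 = rng.normal(size=(8, 1))
energy = mlp(nodes.sum(axis=0), w3, w4)
print(energy)
```

In the actual models the features are much higher dimensional, several such layers are stacked, and the forces and stresses are obtained by differentiating the predicted energy with respect to the atomic positions and cell.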

So what do M3GNet and CHGNet do here? Based on how I understand the papers, they add something akin to an attention mechanism to the graph convolution. What is attention? I'm still learning this as well, but essentially it is a way to have the network consider what the local environment looks like and use that information to update the subsequent graph. In other words, if an edge corresponding to a bond is passed through the NN without knowing anything about the other edges or nodes, it will simply learn what it means to be that type of edge in the grand scheme of the network. However, if the edge is informed, via an attention-like mechanism (?), about its environment, then what it means to be an edge/bond is updated in that context.
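
Here is one generic way such an environment-informed edge update could look, using a softmax-weighted (attention-style) sum over the other edges around a bond. To be clear, this is a textbook-flavored attention sketch of my own; it is not the specific update rule used in M3GNet or CHGNet.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_edge_update(edge, neighbor_edges, w_score, w_value):
    """Update one edge feature with an attention-weighted sum over the
    other edges in its local environment (toy sketch only)."""
    # Score each neighboring edge against the edge being updated
    scores = np.array([edge @ w_score @ nb for nb in neighbor_edges])
    weights = softmax(scores)                       # attention weights
    context = (weights[:, None] * neighbor_edges).sum(axis=0)
    return np.tanh((edge + context) @ w_value)      # environment-informed edge

rng = np.random.default_rng(1)
edge = rng.normal(size=4)              # feature vector of the bond i-j
neighbors = rng.normal(size=(3, 4))    # feature vectors of nearby bonds
w_score = rng.normal(size=(4, 4))
w_value = rng.normal(size=(4, 4))
print(attention_edge_update(edge, neighbors, w_score, w_value))
```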

There is also a more physics-informed aspect to how the edge features (i.e., bonds) in the graph get updated. This is done by including many-body interactions: from sets of nodes and edges (e.g., the angles formed by triplets of atoms), the edge features can be updated. This is very similar to Tersoff-style interatomic potentials, which use a bond-order parameter to modify the pairwise interaction between atoms. In M3GNet this step seems to occur before the (attention-like) update in the graph convolution. Basically, the GNN takes in nodes and edges where the edge features represent the bond distance expanded in radial basis functions, and these are what get updated and passed on to the graph convolution layers. Again, this is what I think is going on, but I could be wrong.
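
To illustrate what expanding a bond distance in radial basis functions can look like, here is a small sketch using Gaussian basis functions, plus the three-body angle that many-body terms are built from. The Gaussian basis is a generic stand-in on my part; M3GNet uses its own smooth, cutoff-aware basis.

```python
import numpy as np

def gaussian_rbf(r, centers, width=0.5):
    """Expand a scalar distance r into a vector of Gaussian basis values;
    each component measures how close r is to one of the `centers`."""
    return np.exp(-((r - centers) ** 2) / (2.0 * width ** 2))

def bond_angle(pos_i, pos_j, pos_k):
    """Three-body information: the angle at atom j formed by neighbors i and k."""
    v1, v2 = pos_i - pos_j, pos_k - pos_j
    cos_t = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

centers = np.linspace(0.5, 5.0, 10)      # basis centers out to a 5 A cutoff
edge_feature = gaussian_rbf(1.2, centers)
print(edge_feature.round(3))             # peaked near the centers closest to 1.2

water_like = np.array([[0.96, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.24, 0.93, 0.0]])
print(bond_angle(*water_like))           # roughly 104-105 degrees
```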

If I get anything wrong here, please leave a comment to correct me. Also, on the topic of invariance, I believe graph CNNs can be structured so that they are invariant (or equivariant) to translations and rotations of the inputs [4].

References

[1] C. Chen, S.P. Ong, A universal graph deep learning interatomic potential for the periodic table, Nat Comput Sci. 2 (2022) 718–728. https://doi.org/10.1038/s43588-022-00349-3.

[2] B. Deng, P. Zhong, K. Jun, K. Han, C.J. Bartel, G. Ceder, CHGNet: Pretrained universal neural network potential for charge-informed atomistic modeling, (2023). https://doi.org/10.48550/arXiv.2302.14231.

[3] J. Riebesell, R. Goodall, A. Jain, K. Persson, A. Lee, Can machine learning identify stable crystals? Matbench Discovery [Preprint]. https://matbench-discovery.materialsproject.org/preprint.

[4] N. Keriven, G. Peyré, Universal Invariant and Equivariant Graph Neural Networks, (2019). https://doi.org/10.48550/arXiv.1905.04943.





Monday, April 24, 2023

A Wave of Information Overload

As a scientist and researcher, I constantly find myself in a state of awe and helplessness due to the vast ocean of information and knowledge that surrounds me. Today's world is evolving at breakneck speed, and I often feel like I'm struggling to stay afloat. This is particularly the case in science, engineering, and technology. It's a mixture of excitement, fear, and a deep-rooted curiosity that drives me forward as a continuous learner, but at times the weight of information overload can be overwhelming. It seems so easy to fall behind if you don't invest every ounce of effort into staying knowledgeable.

It is true that navigating this ever-changing landscape is both exhilarating and exhausting. Furthermore, as I delve into the realm of ML/AI tools like LLMs and generative AI, I'm amazed by the potential they offer in helping researchers like myself manage and digest the wealth of information available. I do believe these tools will change the pace and ease of research. However, as these technologies continue to advance, I can't help but feel a sense of inadequacy in keeping up with the myriad of developments and research papers published daily.

Despite the challenges, my passion for learning remains undiminished. The thrill of discovering something new or understanding a complex concept continues to propel me on my journey as a self-proclaimed student of Dirac. I wonder how Ph.D. students, post-docs, and early-career researchers feel. Do they share these sentiments as we collectively traverse an era characterized by rapid change and uncharted territory? What about more senior scientists and engineers? What do they think? Is it all hype? Have they seen this before? Then there are the powerhouse PIs/researchers who dominate in output and productivity. These individuals and groups are amazing, but how do they feel about this? And do they have any suggestions for the underdogs (i.e., me) on how to swim in the sea of information they produce?




Thursday, April 13, 2023

Consensus Chat LLMs

As I continue my foray into large language models (LLMs), exploring how they can be used and tuned for my areas of interest (materials science, physics, and chemistry), I have started contemplating how the rapid development of fine-tuned or use-case-specific LLMs by various groups and companies might eventually lead to an extraordinary comprehension of the human-constructed world. Currently, tools like ChatGPT and GPT-4 are capable of performing a wide range of intelligent tasks. However, in many instances, they don't quite meet our expectations or aren't able to produce exceptional results. For example, if you ask ChatGPT-4 to draw a car using scalable vector graphics, it will generate something that most would recognize as a car, complete with tires, trunk, and windows. Yet, it doesn't fine-tune the drawing the way a human would. Of course, you could pass this output to Stable Diffusion or another image-generative platform to get a better result, but I'm focusing on the text-, markup-, or code-generating LLMs here.

This brings me to my thoughts on domain-specific LLMs. I believe that eventually, we will have hundreds or thousands of highly intelligent domain-specific LLM systems. While this is undoubtedly exciting, what if we connect these systems so they can send queries to one another? Furthermore, what would happen if we invoke a consensus policy based on a human user's prompt? Let's consider a question containing a factoid (i.e., something taken as true by some but not verifiable), for example:

Based on our understanding of the universe, is the following statement true: 

"There is said to be an omnipresent entity that seeded the Big Bang, which brought forth our existence. This entity is believed to remain hidden indefinitely."

If you rely on faith to describe our existence, you might consider this factoid nearly true. However, if you are a staunch atheist, this statement would be an unverifiable claim and of little value.

For the sake of this blog, let's assume that these highly capable domain-specific LLMs have no guardrails limiting how they might answer the question. If these LLMs capture the distribution of human beliefs and behaviors, each one might provide a different response that either finds the factoid compelling or purely speculative. However, if we now have the LLMs query one another, what would happen? Would they reach a consensus that aligns with the majority of humanity's beliefs, which might be in favor of the factoid? I'm not certain, but it seems plausible that these hundreds or thousands of LLMs could eventually encompass nearly all of humanity's collective "wisdom" or abstract constructions of our behaviors and existence.
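
To make the "consensus policy" idea a touch more concrete, here is a toy sketch of querying several domain-specific models and taking a majority vote, with optional rounds in which each model sees the others' answers. Everything here, including the `ask_domain_expert` stub and its canned answers, is hypothetical and only meant to illustrate the control flow, not any real API.

```python
from collections import Counter

def ask_domain_expert(domain: str, prompt: str) -> str:
    """Hypothetical stub standing in for a call to a domain-specific LLM.
    A real system would route `prompt` to the model serving `domain`."""
    canned = {
        "cosmology": "speculative",
        "theology": "compelling",
        "philosophy": "speculative",
    }
    return canned.get(domain, "undecided")

def consensus(domains, prompt, rounds=1):
    """Collect each model's verdict and return the majority answer.
    Extra rounds let the models see each other's answers and revise."""
    votes = [ask_domain_expert(d, prompt) for d in domains]
    for _ in range(rounds - 1):
        follow_up = f"{prompt}\nOther models answered: {votes}"
        votes = [ask_domain_expert(d, follow_up) for d in domains]
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / len(votes)

verdict, agreement = consensus(
    ["cosmology", "theology", "philosophy"],
    "Is the omnipresent-entity statement true?",
)
print(verdict, agreement)   # 'speculative' wins with ~2/3 agreement
```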

I'm not entirely sure if what I'm proposing makes sense, but it seems possible that there could come a time when all these capable and impressive LLM systems are interconnected and start exhibiting collective behavior that surprises us. Alternatively, this could merely lead to a series of frustrating dialogues that produce dead-end responses. I eagerly anticipate discovering what the future holds for these advanced LLMs and their potential impact on our understanding of the world.




Thursday, April 6, 2023

Dual Numbers

A while back, when I was exploring writing a simple NN code to improve my understanding of neural networks and deep learning in general, I came across dual numbers. They're an extension of the real numbers, built much like the complex numbers but with a different defining rule. What makes them so interesting is that they can encode both a function value and its derivative in a single number. This means that we can use them to simplify the calculation of derivatives and solve complex problems efficiently.

So how does one think of dual numbers, and what's the difference between a dual number and a complex number? One way to think about dual numbers is that they consist of two parts: a real part and a dual (infinitesimal) part. The real part is just a regular real number, while the dual part is a multiple of a new number, often denoted $\epsilon$, that satisfies $\epsilon^2 = 0$ with $\epsilon \neq 0$ (compare $i^2 = -1$ for complex numbers). This means that every dual number can be written as $a + b\epsilon$, where $a$ and $b$ are real numbers.

What's most interesting is that the dual part provides the exact first derivative of a function evaluated at a particular point. By using the dual number representation of the function at that point, one can calculate both the function value and its derivative in one shot. One reason dual numbers have applications in deep learning is that the algebra of dual numbers automatically carries out the chain rule, so they can be used to compute derivatives of complicated functions involving multiple variables and interdependencies.

As an example, say I want to evaluate the function $f(x)=x^2+2x$ at $x=3$. Feeding the dual number $3+\epsilon$ through $f$ gives $f(3+\epsilon)=(3+\epsilon)^2+2(3+\epsilon)=9+6\epsilon+\epsilon^2+6+2\epsilon=15+8\epsilon$, using $\epsilon^2=0$. In general $f(x+\epsilon)=f(x)+f'(x)\epsilon$, where $f'(x)=\frac{df(x)}{dx}$, so the algebra alone tells us that $f(3)=15$ and $f'(3)=8$. As a check, differentiating by hand gives $f'(x)=2x+2$, so $f'(3)=2\cdot3+2=8$, and the dual number representation of $f$ at $x=3$ is indeed $15+8\epsilon$.

One of the benefits of dual numbers is that the derivative of the composition of two functions, $f(g(x))$, requires only the derivatives of the individual functions. Specifically, if $f(x)$ and $g(x)$ are two functions, then the dual number representation of their composition is $f(g(x)) + f'(g(x))g'(x)\epsilon$, which is exactly the chain rule. This is especially useful when dealing with complicated functions involving multiple variables and interdependencies.
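
As a rough illustration of how little machinery this requires, here is a minimal dual-number class in Python (my own toy sketch, separate from the Julia/Pluto example linked below) that reproduces the example above and gets the chain rule for free on compositions.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """A dual number a + b*eps with eps**2 == 0.
    `a` holds the value, `b` carries the derivative."""
    a: float          # real (value) part
    b: float = 0.0    # dual (derivative) part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1*eps)(a2 + b2*eps) = a1*a2 + (a1*b2 + b1*a2)*eps
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def f(x):
    return x * x + 2 * x       # f(x) = x^2 + 2x

def g(x):
    return 3 * x * x           # g(x) = 3x^2, so g'(x) = 6x

x = Dual(3.0, 1.0)             # seed the derivative part with b = 1
print(f(x))                    # Dual(a=15.0, b=8.0) -> f(3) and f'(3)
print(f(g(x)))                 # Dual(a=783.0, b=1008.0): chain rule for free
```

Overloading more operators (subtraction, division, powers, and the usual transcendental functions) is all that is needed to differentiate fairly general expressions this way.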

Dual numbers are a genuinely useful mathematical concept with practical applications in a wide range of fields. It's pretty cool that one can encode a function value and its derivative in a single number, which simplifies derivative calculations considerably. On my computational blog, I have an example using dual numbers to calculate the derivative of an interatomic potential: Dual Numbers Pluto blog.

