Blurbs

This is a collection of text snippets, code, thoughts, and blurbs that didn't quite make it as full posts, but that I wanted to collect somewhere.

Origin of the term analog in computing

I was confused by the term "analog" in electronics and where it originated with regard to signals and data. The name was chosen because an analog signal or datum is comparable to a continuously variable physical quantity. This is in contrast to digital systems, where signals and data are represented by discrete numerical values. Looking at the etymology, "analog" comes from the Greek word "analogos," meaning "proportionate."

The first electronic devices were analog: they operated by directly manipulating continuous signals, which allowed them to mimic, or be analogous to, the physical phenomena they were designed to measure, process, or transmit. For example, the voltage across a resistor in an electric circuit could be used to represent temperature, with changes in voltage corresponding directly to changes in temperature; the voltage signal is analogous to the temperature.

Attention as a Form of Message Passing
  • Transformers: In the Transformer architecture, the attention mechanism allows each token in the sequence to directly "attend" to every other token, essentially forming a fully connected graph where nodes represent tokens and edges represent the attention weights between them. This can be viewed as a form of message passing in which information (messages) from all tokens is aggregated at each token according to the attention weights; a minimal sketch follows this list. Multi-head attention further enriches this process by allowing multiple relationships (or message types) to be learned and aggregated in parallel.
  • Similarity to Graphs: This process is reminiscent of how information propagates in graphs, especially in graph neural networks (GNNs), where nodes aggregate information from their neighbors. The key difference lies in the structure of the graph. In a traditional GNN, the graph's structure is typically determined by the data (e.g., molecular structure) and is not fully connected. In contrast, the Transformer treats a sequence as a fully connected graph, at least within the attention mechanism.
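
To make the graph view concrete, here is a minimal single-head attention sketch in numpy, written as message passing on a fully connected token graph. The function name attention_message_passing, the shapes, and the random weight matrices are illustrative assumptions, not any particular library's API.

    import numpy as np

    def attention_message_passing(X, Wq, Wk, Wv):
        # Project tokens into queries, keys, and values.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Scaled dot-product scores: a dense, learned "adjacency" over tokens.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # Row-wise softmax: row i holds the weights with which token i
        # averages messages from every token in the sequence.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        # Aggregate messages (the values) along the weighted edges.
        return weights @ V

    # Illustrative shapes: 5 tokens, embedding dimension 8.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = attention_message_passing(X, Wq, Wk, Wv)  # shape (5, 8)

The softmax-normalized score matrix plays exactly the role of the adjacency matrix in a GNN, except that it is dense and learned rather than given by the data.
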
Message Passing in GNNs
The first thing you should be told is that a graph is essentially defined by its adjacency matrix, which means it's all just linear algebra. The features of a GNN can be tied to the diagonal (node) and off-diagonal (edge) elements. Therefore, message passing in a GNN with averaging aggregation is nothing more than the matrix product $\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}} \vec{x}$. Here $\mathbf{D}$ is the degree matrix, which is diagonal with elements $D_{ii} = \sum_{j} A_{ij}$ and normalizes the adjacency. For a convolution over the features, just right-multiply by a weight matrix: $\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}} \mathbf{X} \mathbf{W}$. As usual, domain people use language to make things sound more complicated than they are.
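
As a concrete sketch of the claim above, here is one normalized message-passing step in numpy; the toy adjacency matrix, features, and weight matrix are made-up values for illustration.

    import numpy as np

    # Toy undirected graph on 4 nodes (made-up adjacency for illustration).
    A = np.array([[0., 1., 1., 0.],
                  [1., 0., 1., 0.],
                  [1., 1., 0., 1.],
                  [0., 0., 1., 0.]])

    # Degree matrix: diagonal with D_ii = sum_j A_ij.
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))

    # Symmetrically normalized adjacency: D^{-1/2} A D^{-1/2}.
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt

    # One message-passing step with averaging aggregation: a matrix product.
    x = np.array([1., 2., 3., 4.])
    x_new = A_norm @ x

    # Graph convolution over features: right-multiply by a weight matrix W.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 3))   # 4 nodes, 3 features each
    W = rng.normal(size=(3, 3))
    X_new = A_norm @ X @ W        # shape (4, 3)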
