Thursday, June 20, 2024

Terminal Diary Galore

I had this idea the other day about tracking my entire life in the Linux terminal. My thinking is that I have probably typed hundreds of thousands, if not millions, of commands into the terminal, and when I try to recall them I end up doing a Google search¹ to reproduce something I've probably done hundreds to thousands of times. The idea of tracking my terminal input/output by recording it came about when I accidentally stumbled upon asciinema and thought, "well, this might be useful". At first I thought, "oh, I'll just use this to record workflows that are complicated and that I might forget", but then I realized I'll just forget which recording I need to watch.

Then I saw an interesting [post] by Miles Cranmer about how he is using a vector database to make finding and pointing others to GitHub discussions more streamlined. Something went off in my head: "Okay, I think I'm going to record my terminal sessions every time and feed them into a vector database." It seems straightforward enough to do.

Well, what use is a vector database really? I mean, it's just semantic search, and even if I get the retrieved text I'm most likely looking for, will I really remember what I wanted to do with it or what it did? This is where using an LLM would be a tremendous utility. I can basically have it re-teach me my terminal workflows² and what I did by adding in context, instruction, and descriptions.

So I'm going to try proving this out. I think the utility of this approach hinges on 2 components:

  1. It's easy to start recording and update the vector database.
  2. Querying an LLM to find what I'm looking for is straightforward.

For the vector database I could use something like Pinecone or Chroma, and then I need to regularly update it. Once that is done, I can get the embedding for the user input and query the vector database. This gives me some number of top retrieved results, but the goal should then be to use an instructed LLM to always output some useful script or set of steps. Here is an illustration of what I envision:

$ querylog "Show me how to install materials project api/pymatgen, then query the most stable copper oxide structure, and analyze the structure factor (i.e., radial distribution function)"

Which then outputs something like:

To achieve the desired workflow:

  1. In the terminal, create a Python venv and activate it:

        python -m venv .venv
        source .venv/bin/activate

  2. Install the Materials Project API and pymatgen:

        pip install -U mp-api pymatgen pymatgen-analysis-diffusion

  3. In Python, query the Materials Project for stable Cu-O structures:

        from mp_api.client import MPRester

        with MPRester(apikey) as mpr:
            # Query the database for all stable materials in the Cu-O system
            copper_oxide = mpr.summary.search(
                chemsys="Cu-O", is_stable=True, fields=["structure"]
            )
        structure = copper_oxide[0].structure

  4. Get the RDF:

        import matplotlib.pyplot as plt
        from pymatgen.analysis.diffusion.aimd.rdf import RadialDistributionFunctionFast

        # structures expects a list of structures
        rdf_func = RadialDistributionFunctionFast(
            structures=[structure], ngrid=101, rmax=10.0, sigma=0.2
        )
        r, rdf = rdf_func.get_rdf(["Cu", "O"], ["Cu", "O"])
        plt.plot(r, rdf)
        plt.show()

The example above is a bit generic, and your standard LLM would probably get this correct on its own, but if you had some idiosyncratic way of doing things you would need to prompt the LLM so that it knows those specific details. The idea here is to just find previous examples and have a generic LLM organize, format, and style them in an informative way. Let's see if I can get something working.
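The shape of the querylog step is simple enough to sketch end to end. In the real thing, Chroma or Pinecone would handle the embedding and nearest-neighbor search; below I swap in a toy bag-of-words similarity purely as a stand-in, and the prompt wording is my own guess at the instruction:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real setup would use Chroma's
    # default embedding model or a sentence-transformer.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def querylog(query, sessions, top_k=2):
    """Retrieve the top-k recorded sessions and wrap them in an LLM prompt."""
    q = embed(query)
    ranked = sorted(sessions, key=lambda s: cosine(q, embed(s)), reverse=True)
    context = "\n---\n".join(ranked[:top_k])
    return (
        "You are re-teaching me my own terminal workflows.\n"
        f"Relevant session transcripts:\n{context}\n\n"
        f"Request: {query}\n"
        "Answer with a numbered set of steps and commands."
    )
```

The returned string is what would be handed to the instructed LLM, so the model always has my actual past sessions as context rather than answering from scratch.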

Footnotes


  1. Now it's more common for me to use an LLM or something like Perplexity.

  2. A user on the asciinema site replied to a question post I made; their thinking was to use the shell's .history for capturing all my terminal IO. The issue is that it will not capture things that go on in Emacs or the Python interpreter. I really want to capture every IO that goes on in the terminal.


