OpenAI recently updated the image generation tool in ChatGPT and it's getting a lot of attention. I took some time to see what it could do, and it is quite impressive. I've actually updated the illustration sketch of Dirac for this blog (see upper right corner). What stands out is that it can produce almost any style of drawing or illustration you want and can reproduce text with close to perfect accuracy. I think this might be particularly useful for refining hand-drawn scientific sketches. In the early days of ChatGPT, I was able to get it to write TikZ code for drawing some lattices (see footnote 1). So in this post I wanted to explore how well the new image generation tool can refine hand-drawn scientific or technical sketches into something more professional looking.
Test Case 1:
The first image I sketched was an illustration of the charge polarization of an atom in an electric field. The sketch is based on the one in the book by R. LeSar [1]. Here is my hand-drawn sketch:
Figure 1. Hand-drawn sketch of charge polarization based on R. LeSar [1]
The following is the prompt that I gave ChatGPT 4o to refine the sketch:
Turn this hand drawn image illustrating polarization the model for an atom into a scientific sketch illustration with appropriate labels. Correct any mistakes.
Here is the result:
Figure 2. ChatGPT new image generation tool sketch of charge polarization
So for a first shot it's not too bad! It got almost everything correct. I wanted it to look a little more hand drawn, like an old-school scientific illustration, so I tried to be more specific and gave the following prompt:
Turn the original hand drawn image illustrating electron charge polarization model for an atom into a line-art sketch illustration with appropriate labels that looks like old school scientific illustrations. Correct any mistakes. Pay attention to detail.
Here is the result:
Figure 3. Specifying some other details and style.
So it's not entirely accurate now, but it looks like an old hand-drawn scientific illustration. We can also specify what needs to be edited by clicking on the image, brushing over regions of it, and re-prompting the LLM. After iterating two times to improve some details I was able to get the following, which I'm not sure is exactly what I want, but things look nice!
Figure 4. Final generated sketch of charge polarization
Test Case 2:
Now I want to try changing an existing diagram from a paper that I published on a hybrid quantum-classical computing algorithm [2] to see what it would look like in a more artistic style. Here is the original diagram:
Figure 5. Original diagram of hybrid quantum-classical computing algorithm (Fig. 1 in ref. [2])
Using the following prompt:
The figure attached is from a paper that describes a quantum-classical hybrid algorithm for calculating the eigenspectrum in quantum chemistry calculations. The figure illustrates the quantum circuit and gate operations for the algorithm. Can you redraw it in a calligraphy and watercolor approach. No need to include the caption.
We get the following result:
Figure 6. Artistic style hybrid quantum-classical computing algorithm
which looks kind of nice but has lots of errors in the qubit lines, especially in part c. I tried iterating a few times to fix these errors, but didn't have much luck. Then I tried to render the original figure in an ASCII art style to see what it would produce, but it just didn't work. Finally I tried a pencil and pen sketch style, and it gave me something that was pretty good:
Figure 7. Pencil and pen sketch style of diagram of hybrid quantum-classical computing algorithm
This could be a problem in scientific publishing
This brings up a somewhat controversial topic: people could start manipulating published figures or adding "data points" to existing plots using these types of image generation tools. Of course good digital forensics could catch this, but reviewers, or even just readers of papers on arXiv that contain such manipulations, could easily be fooled. I believe there is a need to require that the raw data be embedded in the figure file itself. For plots, the figure file would have .csv- or .json-like data embedded, and for images maybe .svg?
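To make the embedding idea a bit more concrete, here is a minimal sketch of one way it could look, assuming the figure is an SVG and the raw numbers are stored as JSON inside a `<metadata>` element. The function names, the `raw-data` id, and the JSON layout are all hypothetical choices for illustration and not an existing standard. Only the Python standard library is used.

```python
import json
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix


def make_svg_with_data(points, width=200, height=100):
    """Return SVG text for a simple line plot with its raw data embedded as JSON."""
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": str(width), "height": str(height)})

    # Store the raw numbers in a <metadata> element. The "raw-data" id and the
    # {"x": [...], "y": [...]} layout are assumptions made for this sketch.
    meta = ET.SubElement(svg, f"{{{SVG_NS}}}metadata", {"id": "raw-data"})
    meta.text = json.dumps({"x": [p[0] for p in points],
                            "y": [p[1] for p in points]})

    # Draw the same points as a polyline. Data units are used directly as
    # pixels, with y flipped so larger values appear higher in the image.
    coords = " ".join(f"{x},{height - y}" for x, y in points)
    ET.SubElement(svg, f"{{{SVG_NS}}}polyline",
                  {"points": coords, "fill": "none", "stroke": "black"})
    return ET.tostring(svg, encoding="unicode")


def read_embedded_data(svg_text):
    """Recover the embedded raw data from SVG text produced above."""
    root = ET.fromstring(svg_text)
    meta = root.find(f"{{{SVG_NS}}}metadata")
    return json.loads(meta.text)


points = [(0, 10), (50, 40), (100, 25), (150, 80)]
svg_text = make_svg_with_data(points)
recovered = read_embedded_data(svg_text)
```

A reviewer or script could then re-plot `recovered` and compare it against what the figure shows. Embedding alone does not stop someone from editing both the drawing and the data, so this would only be one piece of a provenance scheme.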
Test Case 3:
For this case study I want to take a paper [3] I've been reading, have it summarized, and then generate a diagram or infographic about the paper. This type of image generation could be very useful for social media or publicizing one's paper, kind of like an ideal visual abstract. I first had ChatGPT summarize the details of the paper I was interested in and then used the prompt below with the image generation tool to create an infographic.
Turn the summary below into a detailed infographic image for others to understand the steps and results. Don't over clutter or be too detailed, but remain factual and accurate.
`[SUMMARY OF PAPER]`
Figure 8. Infographic of methodology and results of a paper
As you can see in Figure 8, it's pretty awful. It looks cool, but there is so much nonsense and so many errors that it barely captures what the original paper was about. So I call this one a bust; a lot of improvements are needed to go from a paper text summary to a useful infographic (see footnote 2) that can be used for social media or publicizing the paper.
Summary
The only thing I found the new image generation tool good for is changing the style of an image that contains objects like people or animals. For scientific illustrations and plots, I think it still has a long way to go before it can be useful. What I, and I assume others, really want is to be able to draw a sketch of a scientific illustration and then turn it into a rendered image of a specific style with refinements. This would make the ideation or learning process much more effective because you could go from a crude idea to a polished visual representation. It could be that existing tools can already do this, probably so, but I'm not aware of them and was mainly exploring some cool capabilities of the ChatGPT image generation tool.
Footnotes
1. I forget whether this was with ChatGPT 3.5 or 4, but I'm sure 4o and other models would do this even better now. The question now is whether to use the image generation modality or the coding one to make such illustrations.
2. What may be even more useful in the long term is turning papers into presentation slide decks. This would let authors focus on the detailed manuscript without being distracted by distilling it down to slide format, although there is value in doing this yourself, as it forces you to refine your messaging and decide what is important in your work. The result would still require review and practice before actually presenting, say, at a conference.
References
[1] R. LeSar, Introduction to Computational Materials Science: Fundamentals to Applications. Cambridge: Cambridge University Press, 2013.
[2] P. Jouzdani, S. Bringuier, Hybrid Quantum-Classical Eigensolver without Variation or Parametric Gates, Quantum Reports 3 (2021) 137–152.
[3] J. Qi, et al., Robust training of machine learning interatomic potentials with dimensionality reduction and stratified sampling, npj Comput. Mater. 10 (2024) 1–11.
@misc{Bringuier_3APR2025,
title = {Trying Image Gen Tool for Scientific Illustration},
author = {Bringuier, Stefan},
year = 2025,
month = apr,
url = {https://www.diracs-student.blog/2025/04/}#
{trying-chatgpt-image-gen-tool-for.html},
note = {Accessed: 2025-04-04},
howpublished = {Dirac's Student [Blog]},
}