Study of Faithfulness in LLM Classification

Found a task where the LLM’s stated reasoning did not match the factors that actually determined its classifications. Studied ways of prompting the LLM to report its true reasoning.

LLMResults.pdf

Reading The Dictionary

A study of how dictionary learning scales with the size of the dictionary. Discovered many new interpretable features in transformers.

arabicfeature.jpeg

scalingplot.png

LeelaGo Mechanistic Interpretability

https://docs.google.com/presentation/d/1nSOL0pim1w7GKrexbSd_uINJ0wyVowovWW3W65SRstY/edit#slide=id.p

Studied how AIs make strategic decisions in the game of Go, using PyTorch. Worked on classifying the structures in the Leela Zero neural network responsible for atari, the threat to capture stones on the next move.

Automated Design of Quantum Experiments

(Image taken from Krenn et al.)

Analysed automated systems for designing quantum experiments. Ran computational experiments to determine the limits of these systems, then found simplified proofs for the resulting bounds.

http://hansgundlach.github.io/EssayCam.pdf