Found a task where the LLM's stated reasoning did not match the actual reasons behind its answers. Studied ways to prompt the LLM to give true information.
Studied how dictionary learning scales with the size of the dictionary. Discovered many new interpretable features in transformers.
https://docs.google.com/presentation/d/1nSOL0pim1w7GKrexbSd_uINJ0wyVowovWW3W65SRstY/edit#slide=id.p
Studied how AIs make strategic decisions in the game of Go (using PyTorch). Worked on classifying the structures in the Leela Zero neural network responsible for the strategic move of atari.
(Image taken from Krenn et al.)
Analysed quantum experiment design systems. Ran computational experiments to determine the limits of these systems, then found simplified proofs for these bounds.