I started my research journey in 2015, when I was a sophomore undergraduate majoring in physics at Tsinghua University. At that time, the field of deep learning was just beginning to bloom. The success of AlexNet in 2012 had sparked a wave of enthusiasm, and by 2015 we were seeing rapid advances in image classification, object detection, and natural language processing. New model architectures and training techniques were being proposed constantly, and the community was eager to explore the potential of deep learning across domains. There was also a direction of research parallel to deep learning: Spiking Neural Networks (SNNs). I was fascinated by the idea of mimicking the brain’s neural activity and decided to explore this area. As an undergraduate, I published my first paper on SNNs in 2017, proposing an STDP-based learning algorithm for SNNs that does not require gradient backpropagation. However, I soon realized that this direction was not going to lead to fundamental breakthroughs in AI, since the work in this area was trying to prove the equivalence between SNNs and ANNs rather than to propose a fundamentally new learning paradigm.

Starting in 2017, as I began my Ph.D. studies in Computer Science at Tsinghua University, I shifted my focus to the robustness of deep learning models, particularly in computer vision. The discovery of adversarial examples in 2014 had revealed a critical vulnerability in deep learning models, and I was intrigued by its implications. At that time, the field of adversarial examples was still in its infancy: many defenses against adversarial attacks were proposed, but almost all of them were soon broken by stronger attacks. I knew this was a fundamental problem, and I wanted to find a solution. A seemingly promising direction at the time was to use the brain as a source of inspiration, since the human visual system is known to be robust to adversarial perturbations. I spent a significant amount of time trying to understand the mechanisms of the human visual system and how they could be applied to improve the robustness of deep learning models. I talked to many neuroscientists and read a great deal of neuroscience literature. I proposed and experimented with various biologically inspired architectures and training methods, but none of them achieved significant improvements in robustness under strong attacks, just like most other methods in the field. I eventually realized that the problem was not with the specific methods, but with the entire research paradigm. Brain-inspired methods did contribute to our understanding of the problem, yet they captured surface appearances rather than the essence. Solving the robustness problem requires a fundamental shift in our understanding of how deep learning models learn and generalize, not just adding more “biological features” to the models.

Since adversarial examples in the digital world are something of a “toy setting” that does not directly cause real-world harm, in 2020 I began researching their counterpart in the physical world, turning from a theoretical problem to a more practical one. I published several papers on physical adversarial examples, which are designed to red-team the robustness of real-world deployed models, such as person detection models and autonomous driving models. I was fortunate that two of them were accepted as oral presentations at top-tier conferences. However, I knew this was still not the end of the story. Demonstrating the real-world harms of adversarial examples does not bring us closer to a solution; it just rings the alarm bell louder.

After the rise of Large Language Models (LLMs) in 2022, many researchers in related fields shifted their focus to the safety and security of LLMs, a more immediate and practical problem. So did I, beginning with my postdoctoral research at UC Berkeley. I have been researching and publishing papers on jailbreaking, prompt injection, and defense methods. The same pattern repeats: people propose defenses, and they are soon broken by stronger attacks. The problem is still the same: we are treating the symptoms rather than the root cause.

As a researcher, I am always looking for the next big challenge, and I am excited to continue exploring the frontiers of AI research. My ultimate goal is to contribute to the development of Artificial General Intelligence (AGI) that benefits humanity, and I view robustness as a necessary condition for it. I believe that the pursuit of robustness is not just about fixing vulnerabilities, but about probing the fundamental “knowledge and reasoning” capabilities of deep learning models.


Credit to Gemini for helping me write this essay.