For artificial intelligence to realize its potential — to relieve humans from mundane tasks, make life easier, and eventually invent entirely new solutions to our problems — computers will need to surpass us at two things that we humans do pretty well: see the world around us and understand our language.
“Learning to see and learning to read are the two main things we need for the computer to do to gain knowledge,” said Jen Rexford, chair of Princeton’s computer science department and the Gordon Y.S. Wu Professor in Engineering. “We call these fields computer vision and natural language processing. These two fields have evolved independently, but our faculty are bringing them together in interesting ways.”
In recent years, researchers at Princeton and beyond have made major strides in these two fields, opening up rapid progress across a variety of applications. “There’s been this huge transformation in the last decade,” said Olga Russakovsky, an assistant professor of computer science who works with computer vision. “We’re entering this second decade of things actually working.”
Seeing our world
Improving our ability to capture and analyze images is an essential part of bringing human, or even superhuman, visual abilities to machines such as cellphones, robots and health devices.
Felix Heide is one of the researchers who is developing AI methods to improve the computer’s eye, the camera. His goal is to help cameras evolve to the point where their vision capabilities match or surpass those of humans or animals.
“Cameras are a ubiquitous interface between the real world and machines,” said Heide, an assistant professor of computer science who works at the interface of AI, physics and optics.
Heide and collaborators at the University of Washington recently built a camera so small that it is about the size of a grain of salt. The device consists of more than a million nanoscale cylindrical posts that interact with light to produce an image. The camera combines image processing and software on the same computer chip.
The team used AI to optimize the shape and position of the posts so that they modulate light in whatever way yields the best picture once AI reconstructs and refines the recorded image. The team’s approach relies on a type of AI known as an artificial neural network, modeled after the neurons and connections of the brain, combined with a model of the physics of light transport. The “neurons” in the model are actually computational units called nodes, each of which takes in information, performs a calculation, and produces an output.
“Combining physical models with artificial neural networks is a new paradigm for designing cameras,” Heide said. “We’re able to use AI to open up an entirely different design space on the optical side.”
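The “node” described above can be illustrated in a few lines of Python. This is a generic textbook neuron, not code from Heide’s camera work; the weights, the bias and the sigmoid activation are arbitrary choices made only for illustration.

```python
import math

def node(inputs, weights, bias):
    """A single artificial neuron ("node")."""
    # Take in information: weight each input and add them up, plus a bias.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Perform a calculation: pass the sum through a sigmoid nonlinearity,
    # producing an output between 0 and 1.
    return 1.0 / (1.0 + math.exp(-total))

print(node([0.5, -1.0, 2.0], [0.8, 0.2, -0.5], bias=0.1))  # about 0.332
```

In a full network, many such nodes are wired together in layers, and training adjusts their weights and biases until the network’s outputs are useful.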
Future applications of such AI-driven cameras are very broad, Heide said. Placing thousands of such cameras in an array could turn entire surfaces into full-scene cameras. The tiny cameras could be built into ultra-thin endoscopes for medical diagnoses from within the body. With imaging and information processing combined in a single device, the cameras could be ideal for security applications.
AI is also helping us see objects that we’ve never seen before, such as individual proteins, which are life’s building blocks and sometimes the cause of diseases, including Alzheimer’s. Proteins are far too tiny to visualize in detail, even with the most powerful equipment. AI could change that.
Ellen Zhong, a new assistant professor of computer science, has developed machine-learning techniques to obtain three-dimensional structures of proteins. She works with images captured using cryo-electron microscopy (cryo-EM), a technique that first freezes the proteins to quell their vibrations and then images the sample with an electron microscope.
The resulting images contain a series of two-dimensional snapshots of the molecules from all directions. Researchers then use complex algorithms to synthesize the different views and stitch together the 3-D structure, which can reveal the positions of the atoms in these complex molecules.
Zhong uses machine learning to make sense of the patterns of complex data in cryo-EM images, helping researchers get closer than ever to accurate representations of proteins. But she doesn’t plan to stop there.
“One of the exciting future-looking areas of my research is being able to visualize full cells instead of single proteins,” said Zhong. “Right now, we can do 3-D reconstructions to visualize individual molecules, but that’s just such an isolated piece of the puzzle.”
Zhong is one of many researchers who believe that AI may be an important key to tackling the larger goal of understanding how individual proteins interact with each other within the cellular landscape. With a better understanding of these interactions, biologists could help create new therapeutics for a number of diseases involving protein malfunction.
AI is not only helping us see new things but also helping us communicate, through improvements in natural language processing. Such systems are behind the ability of computers to translate languages, convert speech into text, and answer spoken questions.
Helping computers understand us
Princeton’s Natural Language Processing group aims to make computers understand and use human language effectively. The group was started by two assistant professors of computer science, Danqi Chen and Karthik Narasimhan, and includes Sanjeev Arora, Princeton’s Charles C. Fitzmorris Professor in Computer Science.
Chen is working to develop machines that can access human knowledge through interactions with written and spoken language, and that have the power to comprehend, reason, and make decisions and judgments with little or no outside guidance.
“I study basic questions like how we should represent text in neural networks, how we should extract and encode information that is written in the text, and how we can retrieve relevant information and utilize it for downstream applications such as question answering and dialogue systems,” Chen said.
Over the past two to three years, the natural language processing field has been transformed by the introduction of large language models (LLMs), which have ushered in a new era of open-ended human-machine interaction via simple natural-language instructions. But these LLMs can contain hundreds of billions of parameters, making them roughly a thousand times larger than previous models.
Training these models comes at a massive financial and environmental cost, and therefore has been limited to only a handful of large corporations and well-funded research labs. “One of the major problems I am currently tackling is how to scale down these models and develop more efficient solutions for training and adapting these very large models,” Chen said.
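A back-of-the-envelope calculation shows why that scale drives cost. The sketch below assumes each parameter is stored as a 16-bit (2-byte) number and uses hypothetical round sizes of 200 million and 200 billion parameters, a thousandfold gap in line with the comparison above. It counts only the memory needed to hold the weights; training typically needs several times more for gradients and optimizer state.

```python
def weights_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Gigabytes (1 GB = 1e9 bytes) needed just to store a model's weights.

    Assumes 2 bytes per parameter (16-bit floating point) by default.
    """
    return num_params * bytes_per_param / 1e9

# Hypothetical model sizes, chosen for round numbers.
previous_model = weights_memory_gb(200e6)  # 200 million parameters
large_model = weights_memory_gb(200e9)     # 200 billion parameters
print(f"{previous_model:.1f} GB vs {large_model:.0f} GB")  # 0.4 GB vs 400 GB
```

Storing the weights in 32-bit precision (`bytes_per_param=4`) doubles both figures.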
Narasimhan is developing autonomous systems that can acquire language through interactions with their environments. He also wants to expand computers’ ability to take in textual information and use it to drive decision-making.
“Most of today’s natural language processing models focus on learning semantic representations from text alone, but deep understanding of natural language requires situational and contextual awareness for an AI system to resolve ambiguities, avoid misunderstandings and provide appropriate responses,” Narasimhan said. “Our lab focuses on embodied language understanding, with the goal of teaching machines to understand and use language in interactive, multi-modal environments.”
Narasimhan’s team also develops new methods to get computers to learn via a combination of “doing” and “reading,” just as humans do, as opposed to the “trial and error” approach of predominant AI paradigms such as reinforcement learning, a training method based on rewarding desired behaviors. Say, for example, you decide to take up tennis. You could just hit balls on the court every day without any outside input and improve slowly, but you would more likely get tips from the internet or verbal feedback from a coach to make faster progress.
“I imagine a not-too-distant future where AI systems can similarly use language as a way to receive distilled knowledge and guidance from human experiences through books, manuals and the internet,” Narasimhan said.
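Reinforcement learning’s trial and error can be seen in its simplest setting, the multi-armed bandit, where a learner repeatedly picks one of several actions and is rewarded for good choices. The Python sketch below is a generic textbook epsilon-greedy learner, not the method Narasimhan’s team uses; encoding “reading” as a `prior` of starting value guesses is a hypothetical simplification meant only to mirror the tennis analogy.

```python
import random

def run_bandit(true_rewards, steps=1000, epsilon=0.1, prior=None, seed=0):
    """Epsilon-greedy trial-and-error learning on a toy multi-armed bandit.

    Each action ("arm") pays a fixed, noise-free reward. `prior` is a
    hypothetical stand-in for advice read in a manual: starting guesses for
    each action's value. Without advice, every action starts at 0.
    """
    rng = random.Random(seed)
    n = len(true_rewards)
    values = list(prior) if prior is not None else [0.0] * n
    counts = [0] * n
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=lambda a: values[a])  # exploit best guess
        reward = true_rewards[action]
        counts[action] += 1
        # Running average of observed rewards; the first real observation
        # replaces the prior guess, so experience overrides the advice.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, total_reward

rewards = [0.2, 0.5, 0.9]  # arm 2 is best, but the learner doesn't know that
_, plain = run_bandit(rewards)                           # pure trial and error
_, advised = run_bandit(rewards, prior=[0.0, 0.0, 1.0])  # "read" that arm 2 looks good
print(plain < advised)  # True
```

Because both runs use the same random seed, they explore at exactly the same steps; the advised learner earns more simply because its non-exploratory choices go to the strongest action from the very first step.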
Over the last several years, Arora has been captivated by questions about how AI works, why some AI methods work better than others, and what happens when AI systems learn. He is interested in figuring out what is going on inside a neural network as it processes the world around it.
“My work is to understand, at a more rigorous and mathematical level, what is going on inside the training of the artificial neural net,” Arora said. “We say our goal is to open the black box.” This will help researchers understand the answers that the neural net gives, and may also lead to better training algorithms and more robust learners.
By understanding what happens when neural nets are running, Arora hopes to help engineers better plan and design their algorithms.
Making AI smarter
AI has caught up with humans in many ways, becoming as good as we are at recognizing familiar images, translating languages and converting text to speech. And AI can do these things faster than most humans can. But can AI really help people create and innovate?
In Ryan Adams’ laboratory, researchers are chasing the question of whether they can design new things using AI.
“We have generative models that synthesize new pictures and text,” explained Adams, a professor of computer science and director of Princeton’s undergraduate program in statistics and machine learning. “But we’re also working on how you can use AI to create new kinds of designs for real-world objects, for example, inventing new antibiotic molecules, new mechanical systems, or new materials. Even more than just design, we want AI to help us make these things, too.”
One recent innovation to come out of Adams’ research is the application of AI models to computer-aided design (CAD) tools. Adams and his team created AI software that is trained on human-designed CAD sketches and can suggest new designs on its own. “Think about using Microsoft Word and you misspell something and it autocorrects or it suggests new text,” explained Adams. “What if we could do that for design?”
Across the hall from his office, Adams has a laboratory space filled with machine tools, 3-D printers and laser cutters. It’s a highly physical setup, unlike most AI labs, where researchers do the bulk of their work behind computer screens. “We have some fun chaos,” he said.
One of the most impactful things Adams believes he and his colleagues are doing in their research is thinking deeply about the interaction between physics and AI. “Invention is about physical embodiment,” said Adams. “It’s about making things, and you can’t be blind to the physics behind it.”
Adji Bousso Dieng is also thinking deeply about the intersection of science and AI — but in a different way.
Dieng leads Vertaix, an interdisciplinary research lab at Princeton working at the intersection of AI and the natural sciences.
“We are looking at every step involved in the scientific discovery process and developing AI methods motivated by problems arising from that process,” said Dieng, an assistant professor of computer science.
One core part of that discovery process is ensuring that machine learning algorithms can generate solutions or outcomes that contain the diversity we see in the natural world. Dieng and her collaborator Dan Friedman, a Ph.D. student in the computer science department at Princeton, drew on definitions of diversity used in ecology to develop a metric called the Vendi Score, which measures the diversity of a collection of samples, such as the outputs of a machine learning model.
The Vendi Score looks at the similarity between elements in a sample (in one example, a large number of scent molecules) and returns a score reflecting how different the elements are from one another. If all the scent molecules belonged to the category “herbal,” for example, the score would be lower than if the sample spanned many scent categories.
Unlike other estimates of diversity in machine learning, the score can be used in any problem where similarity can be defined. It is unsupervised, in that it doesn’t require a human to add labels to the dataset. “For AI to enable discovery, we ought to be able to measure and incorporate diversity into the methods we develop,” said Dieng.
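In Friedman and Dieng’s published definition, the Vendi Score is the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n×n similarity matrix with ones on its diagonal. Below is a minimal NumPy sketch of that formula, not the authors’ released package; the scent labels and the all-or-nothing similarity (1 for the same category, 0 otherwise) are made-up stand-ins for a real molecular similarity measure.

```python
import numpy as np

def vendi_score(similarity: np.ndarray) -> float:
    """Vendi Score: exp of the Shannon entropy of the eigenvalues of K / n.

    Assumes `similarity` is a symmetric, positive semidefinite n x n matrix
    with ones on the diagonal (each item is fully similar to itself).
    """
    n = similarity.shape[0]
    eigenvalues = np.linalg.eigvalsh(similarity / n)  # real, summing to 1
    eigenvalues = eigenvalues[eigenvalues > 1e-12]    # treat 0 * log(0) as 0
    entropy = -np.sum(eigenvalues * np.log(eigenvalues))
    return float(np.exp(entropy))

def category_similarity(labels) -> np.ndarray:
    """Hypothetical similarity: 1.0 if two items share a category, else 0.0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

all_herbal = ["herbal", "herbal", "herbal", "herbal"]
mixed = ["herbal", "herbal", "citrus", "woody"]
print(vendi_score(category_similarity(all_herbal)))  # about 1.0
print(vendi_score(category_similarity(mixed)))       # about 2.83
```

The score reads as an effective number of distinct items: four identical “herbal” samples count as one, while the mixed sample scores 2^1.5 ≈ 2.83, below 3 because two of its four molecules share a category. No labels are needed in general; any similarity function that produces a valid K would do.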
Interest in AI for science is growing rapidly, said Dieng. “In 10 years, the biggest impact from AI will be from the sciences.”
At the Google AI Lab, a research center near campus where Princeton and Google researchers collaborate, Elad Hazan, a professor of computer science, and his team are working on challenges such as controlling ventilators for patients, along with other situations in which machines control physical systems. To do this, they are developing new algorithms that advance machine learning methods and make them more efficient.
The fastest known methods for training neural networks stem from Hazan’s work on optimization and are widely used in academia and industry. Hazan’s current research focuses on control, the field concerned with manipulating a physical system, such as a medical ventilator, using observable signals. “The field of control goes back decades, even centuries,” said Hazan. “Our take on it is new because we’re using AI and deep learning, which are new tools and give rise to new methods.”
In his laboratory, he and his collaborators are developing methods to train neural networks to behave in desired ways. For example, Hazan’s methodology could be applied to controlling autonomous vehicles and robots. “Generally, innovations in the field of control have implications to robotics,” said Hazan. “Control theory concerns manipulating a physical system in generality. It can be a ventilator, robot, drone or autonomous vehicle.”
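To make “manipulating a physical system using observable signals” concrete, here is the kind of classical feedback loop the control field has used for decades: a proportional-integral (PI) controller that steers a toy system toward a target pressure using only the measured error. This is not Hazan’s AI-based method or a model of a real ventilator; the one-line plant model, the gains and the units are hypothetical.

```python
def simulate_pi(target=20.0, kp=0.5, ki=0.1, leak=0.2, steps=200):
    """Simulate a PI controller driving a toy 'pressure' toward `target`.

    Hypothetical plant: each step, pressure rises by the control input and
    a fraction `leak` of the current pressure escapes.
    """
    pressure = 0.0
    error_sum = 0.0
    history = []
    for _ in range(steps):
        error = target - pressure              # the observable signal
        error_sum += error                     # integral term: accumulated error
        control = kp * error + ki * error_sum  # PI control law
        pressure += control - leak * pressure  # toy plant update
        history.append(pressure)
    return history

print(round(simulate_pi()[-1], 3))        # settles at the target: 20.0
print(round(simulate_pi(ki=0.0)[-1], 3))  # proportional-only stalls: 14.286
```

The proportional-only run settles below the target because it needs a nonzero error to keep pushing against the leak; the integral term removes that steady-state gap. Learning-based approaches aim to handle systems whose dynamics are too complex or uncertain for hand-tuned loops like this one.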
Expanding the community
The rapid adoption of AI must be accompanied by efforts to address racial and gender biases in AI algorithms.
Russakovsky is one of the researchers in the field grappling with ethical questions from the engineering perspective. “We’re starting to ask — as engineers, as builders of these systems — what can we do to ensure that they are equally accurate for all people,” Russakovsky said.
Previous research has found striking biases embedded in AI-driven processes. For example, facial recognition systems performed vastly more accurately on men with lighter skin than on women with darker skin.
Russakovsky and her group are engineering solutions to these problems. She helped build a tool known as REVISE, which analyzes visual datasets for signs of biases, including racial and gender biases.
“It’s a very complicated space, and approaching it from a tech standpoint is kind of a Catch-22,” said Russakovsky. “You have to design technical solutions, but any technical solution you design is inherently a simplification of the underlying issue.”
Despite the challenges, Russakovsky is excited about the progress AI has made. Now that researchers know how well these visual learning applications can work in the real world, they want to push the limits of what the applications can achieve. “Now the question is: what are the new frontiers we can tackle?” said Russakovsky.
Attracting the next generation of AI researchers to Princeton will be important as researchers across the University continue to innovate and push machine learning forward. The collaborative opportunities are only growing. Adams, for his part, believes the University is positioned to make things possible in AI that may have seemed impossible even a year ago. “We sort of have this balance of size and quality,” said Adams.
Princeton is small enough that an AI researcher can walk across campus to collaborate with engineers and robotics researchers, while the University still delivers world-class teaching and research, Adams said. “Princeton is just absolutely uniquely positioned to take things to the moon.”
This story first appeared in Discovery: Research at Princeton Magazine from the Office of the Dean for Research. The full issue appears online at discovery.princeton.edu.