While significant advancements have been made in computer vision over the past few years, teaching a computer to identify objects as they change shape remains an Achilles' heel in the field, particularly for generative artificial intelligence (AI) systems.
Computer science researchers at the University of Maryland have come up with an innovative project to tackle this problem, using objects that we alter every day: fruits and vegetables.
Their end product is Chop & Learn, a dataset that teaches machine learning systems to recognize 20 different types of fruits and vegetables in various forms—even as they’re being peeled, sliced or chopped into pieces.
The project is being presented this week at the 2023 International Conference on Computer Vision (ICCV) in Paris.
“You and I can visualize how a sliced apple or orange would look compared to a whole fruit, but machine learning models require lots of data to learn how to interpret that,” says Nirat Saini, a fifth-year computer science doctoral student and lead author of the paper. “We needed to come up with a method to help the computer imagine unseen scenarios the same way that humans do.”
To develop the dataset, Saini and computer science doctoral students Hanyu Wang and Archana Swaminathan filmed themselves chopping 20 types of fruits and vegetables in seven different styles, using GoPro cameras set up at four different angles.
The variety of angles, people and food-prepping styles is necessary for a comprehensive dataset, says Saini.
“Someone may peel their apple or potato before chopping it, while other people don't, and the computer is going to recognize that differently,” she explains.
In addition to Saini, Wang, and Swaminathan, the Chop & Learn team includes computer science doctoral students Vinoj Jayasundara and Bo He; Kamal Gupta, who earned a Ph.D. in computer science in May and is now at Tesla Optimus; and their adviser Abhinav Shrivastava, an assistant professor of computer science.
“Being able to recognize objects as they are undergoing different transformations is crucial for building long-term video understanding systems, as well as dealing with the long-tail problem in object recognition,” says Shrivastava, who also has an appointment in the University of Maryland Institute for Advanced Computer Studies. “We believe our dataset is a good start to making real progress on the basic crux of this problem in compositional image generation and action recognition.”
In the short term, Shrivastava says that the Chop & Learn dataset will contribute to the advancement of image and video tasks such as 3D reconstruction, future frame prediction, video generation, summarization, and parsing of long-term video.
In the long term, those advancements could have an impact on applications like safety features in driverless vehicles or helping officials identify public safety threats. It's also not out of the realm of possibility that Chop & Learn could contribute to the development of a robotic chef that could prepare healthy meals in your kitchen on command.
—Story by Maria Herd, UMIACS communications group