Looking at Fish Identification through a Data Lens

Identifying fish is harder than you would think, so much so that tracking down and categorizing every species of fish will take decades. However, Dr. Dan Distel and Hannah Appiah-Madson of Northeastern University are working to lower this hurdle at the Ocean Genome Legacy Center, where they are building a database to categorize snapper and other commercially sold seafood species.

The concept originated with a 2000 editorial in “Science of the Year” that asked, “Why weren’t we saving the DNA of endangered species?” Distel said, “What we decided to do was to create a taxonomically, geographically, and ecologically diverse collection of marine organisms, tissues, and DNA samples. We opted to try to make those samples available as freely as possible to the research community.”

In 2008, they began loaning samples to museums in the local area, and they have since loaned more than 7,000 samples.

“Our main goal rather than to just squirrel these samples away forever is to try and get these samples used to promote research, because our feeling is that we need [knowledge] to protect the environment. We need to know what’s out there and what we can do to protect it.”

In 2017, two researchers from Brandeis University asked Distel’s team to provide fish samples for their new technology, FASTFISH-ID, a faster and more efficient method of fish species authentication. The two teams soon formed a collaboration, which led Dr. Distel to the Seafood Industry Research Fund (SIRF) for funding. “We reached out to SIRF and applied for a grant to do two things,” Distel said. “One was to build a reference collection of validated tissue and DNA samples that could be used not only to validate this method but any method of seafood identification. Two was to validate FASTFISH-ID’s performance.”

Findings: Snapper is easily misidentified, even by museums

With FASTFISH-ID technology, the research project was going swimmingly. “The FASTFISH-ID chemistry seemed to be pretty robust. It appeared to be reproducible, and it appeared to be capable of differentiating closely related fish species.”

With this new technology in hand, Dr. Distel focused on snapper for one key reason: it is easily misidentified. “Within the market name ‘snapper’ there are specific names like red snapper, and so, it is difficult to apply these names correctly and it’s difficult to evaluate whether these names have been identified correctly in the marketing environment. Our idea was to reach out and get validated specimens from museums, presuming that these would be the gold standard for identification.”

However, Dr. Distel ran into a hurdle. The “gold standard” museum samples he received contained a small number of errors, with no clear explanation for why they occurred. “We couldn’t tell if those errors were due to mislabeling of samples or if the [samples] were too difficult to tell apart.”

He said, “Our sleuthing into databases gave us pretty strong indications that there are certain species that are very hard to tell apart and even experts were misnaming these species. We found a lot of crossover which showed it was even hard for these experts.”

The Importance of Data Integrity

Nature doesn’t care whether species are easy to tell apart; only people do. One of the biggest challenges for this research project is confirming the data and, more specifically, making sure it holds up in the future. Taxonomy is a dynamic science, and what belongs in which taxonomic group changes over time.

Dr. Distel explained, “What we wanted to do was build a method that allows our classifications to evolve as our data evolves and also gives us quantitative measures of what is the center and the boundaries of these clusters. To test accuracy, the real thing is what does it agree with, what is accepted by both scientists and the industry as proper classification of these taxa.”

“In that way, as the database grows, our idea of what the boundaries of these natural groups are gets better and better. What we are looking for is to define those boundaries of those natural clusters that appear in the data.”
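
To make the idea of cluster centers and boundaries concrete, here is a minimal sketch in Python. It is not the team's method or FASTFISH-ID's analysis, which the article does not describe in detail. It assumes each sample has already been reduced to a hypothetical numeric signature (a short tuple of numbers), treats the mean of a group's signatures as its center, and uses the distance to the farthest member as a simple stand-in for the boundary. The species labels and values are invented for illustration.

```python
import math
from collections import defaultdict

def centroid(points):
    """Mean position of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def summarize(samples):
    """Map each label to (center, radius) computed from its reference samples.

    The radius is the distance from the center to the farthest member,
    a deliberately simple boundary chosen for this sketch.
    """
    groups = defaultdict(list)
    for label, signature in samples:
        groups[label].append(signature)
    summary = {}
    for label, signatures in groups.items():
        center = centroid(signatures)
        radius = max(math.dist(center, s) for s in signatures)
        summary[label] = (center, radius)
    return summary

def classify(signature, summary):
    """Return (nearest label, distance to its center, inside its boundary?)."""
    label, (center, radius) = min(
        summary.items(), key=lambda item: math.dist(item[1][0], signature)
    )
    distance = math.dist(center, signature)
    return label, distance, distance <= radius

# Hypothetical reference collection: (label, signature) pairs.
samples = [
    ("species_A", (0.0, 0.0)),
    ("species_A", (2.0, 0.0)),
    ("species_B", (10.0, 10.0)),
    ("species_B", (10.0, 12.0)),
]

before = summarize(samples)
result_before = classify((4.0, 0.0), before)   # nearest to species_A, outside its boundary

# A newly validated specimen is added, and the summary is recomputed.
samples.append(("species_A", (4.0, 0.0)))
after = summarize(samples)
result_after = classify((4.0, 0.0), after)     # now falls within the updated boundary
```

Because the centers and radii are recomputed from whatever validated samples are in the collection, the group boundaries shift as the reference data grows, which is the behavior Distel describes. A real analysis would likely use a more robust boundary than the farthest member, since a single mislabeled specimen would stretch it.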

Mislabeling Has a Global Impact

Dr. Dan Distel and his team want the work of Ocean Genome Legacy to help prevent mislabeling and fraud in the seafood marketplace. For them, consumer confidence in the products people buy is extremely important. “Mislabeling has a fairly large cost. It costs consumers because they potentially get a product they didn’t want or one of lower value than the product they wanted. It has health implications because it prevents consumers from knowing what they are purchasing and making good purchasing decisions.

“It’s a crime to sell something that has been mislabeled,” Dr. Dan Distel said. “There are good actors out there who want to make sure they are doing the right thing, and there are bad actors who are just trying to rip people off. We want to help out the former group and hurt the latter group.”

“There’s a real appetite in all aspects of the seafood industry to improve sustainability and consumer confidence in their products, which provides a market for these products.”

For more information on SIRF and our other funded projects, visit www.SIRFonline.org