Preprint Casts Doubt on Study Underpinning Microbiome

NEW YORK – In January, Micronoma, a San Diego, California-based startup cofounded by University of California San Diego professor Rob Knight and his former graduate student Greg Sepich-Poore, nabbed breakthrough device designation from the US Food and Drug Administration for its OncobiotaLung assay, a blood microbiome-based assay for the detection of lung cancer.

At the time, the company, which claims to be the first to use microbiome-driven liquid biopsy technology, said that "[t]he work that led to the breakthrough device designation is based on the findings of Micronoma's cofounders, published in the scientific journals Nature and Cell."

The Nature study, however, published in 2020, is now alleged to have "major data-analysis errors," as a result of which its conclusions should be considered "invalid," according to a preprint study posted on BioRxiv on Monday by researchers from the University of East Anglia in the UK and Johns Hopkins University.

Knight dismissed the criticism, asserting that his team's results are sound and reproducible.

It remains to be seen how the ongoing controversy might impact the commercial development of cancer microbiome-based diagnostic tests like Micronoma's. The company declined to answer questions from GenomeWeb regarding the concerns raised by the preprint, stating that "[the] window for being able to respond to these questions has closed."

It is also unclear whether the critique might jeopardize the regulatory approval of OncobiotaLung, which, with its breakthrough device designation, is entitled to fast-track review and assessment by the FDA. An FDA spokesperson said the agency "is not able to discuss pending applications."

The study in question

Published in 2020 by Knight, Sepich-Poore, and their collaborators, the Nature study in question presented evidence of widespread cancer-specific microbial signatures, based on DNA sequencing data from more than 18,000 samples of more than 10,000 patients from The Cancer Genome Atlas (TCGA), covering 33 cancer types.

By training machine learning algorithms, Knight's team further demonstrated that it could differentiate tumor types based on their microbial composition with high accuracy.

"I was quite elated initially when the Rob Knight paper came out," said Abraham Gihawi, a postdoctoral researcher at the University of East Anglia, who is the first author of the preprint study. "It looked like a great proof of concept."

"Then you realize, 'Oh, God, it's not necessarily exactly what it claims to be,'" he added. Gihawi's initial concerns included human sequence contamination, the handling of batch effects, false-positive classifications, and limitations in the machine learning approaches, and he published them in an earlier preprint posted on BioRxiv in January of this year.

That preprint was soon met with what Knight called a "thorough rebuttal," which his team published in a preprint posted in February.

Additionally, Knight maintained that a 2022 Cell paper he coauthored, which used updated methods, reached "the same conclusions that microbes are cancer type-specific."

Two "major errors"

After Gihawi's January preprint came out, one of the researchers who reached out to him was Steven Salzberg, a computational biologist at Johns Hopkins University, who became a collaborator and is the corresponding author of this week's preprint.

"[Steven] always suspected that something was a bit off about the data, as well, but he wasn't able to put his finger on it," Gihawi said. "So, we started working together."

The two further scrutinized the 2020 Nature study, and their new preprint, which was built upon Gihawi's previous analysis, claims there were two "major errors" in Knight's original paper. "Each of these problems invalidates the results," the researchers argued, "leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong."

For one, Gihawi and Salzberg alleged that the raw microbial read counts in the Nature study were "vastly overestimated for nearly every bacterial species" due to contamination with human sequences.

Moreover, they argued that the paper's strategy for normalizing raw data against technical batch effects, namely Voom-SNM normalization, created an artificial signature for each cancer that did not exist in reality, which was then exploited by the machine learning model to create highly accurate classifiers despite the absence of any true signal.
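
The kind of artifact alleged here can be illustrated with a toy example. The sketch below is not the preprint's analysis and does not implement the Voom-SNM pipeline; it only applies a voom-style log counts-per-million transform, log2((count + 0.5) / (library size + 1) × 10^6), to a taxon whose raw count is zero in every sample. The two sample groups, their library-size ranges, and the cutoff are hypothetical values chosen purely for illustration. Because the transformed value of a zero count depends on library size, a simple threshold can separate the two groups even though the raw data carry no signal.

```python
import math
import random


def log_cpm(count, library_size):
    """Voom-style log2 counts-per-million with 0.5 and 1 offsets."""
    return math.log2((count + 0.5) / (library_size + 1) * 1e6)


def simulate(seed=0, n_per_group=50):
    """Build samples in which one taxon has a raw count of zero everywhere.

    The groups "A" and "B" and their library-size ranges are hypothetical;
    the only difference between the groups is sequencing depth.
    """
    rng = random.Random(seed)
    ranges = {"A": (1_000, 2_000), "B": (10_000, 20_000)}
    samples = []
    for label, (low, high) in ranges.items():
        for _ in range(n_per_group):
            library_size = rng.randint(low, high)
            raw_count = 0  # the taxon is absent from every sample
            samples.append((label, raw_count, log_cpm(raw_count, library_size)))
    return samples


def threshold_accuracy(samples, threshold):
    """Fraction of samples classified correctly by 'value > threshold means A'."""
    correct = sum((value > threshold) == (label == "A") for label, _, value in samples)
    return correct / len(samples)


samples = simulate()
raw_values = {raw for _, raw, _ in samples}
# Cutoff placed between the two groups' library-size ranges (hypothetical choice).
accuracy = threshold_accuracy(samples, threshold=log_cpm(0, 5_000))

print("distinct raw counts:", raw_values)
print("accuracy on transformed zeros:", accuracy)
```

In this toy setting the classifier is reading sequencing depth, not microbial content. Whether and to what extent the same thing happened in the Nature study's data is exactly what the two groups dispute.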

"We showed that the signal that has been introduced during the normalization process has created a tissue-type distinctive signature that should not exist because this is [based on] taxa that [are] completely zero," Gihawi said. "You should not be able to distinguish anything from data that does not exist."

Knight rejected these claims, accusing the preprint authors of "re-hashing a non-controversy that has already been thoroughly addressed including in the [peer]-reviewed literature." He pointed to his team's 2022 Cell paper, which he said analyzed an independent, international cohort of tumors from the Weizmann Institute in Israel with more stringent settings, such as additional human read filtering, and still arrived at the "same conclusions" that supported the previous Nature study.

The peer reviewers of the 2020 Nature study did not respond to a request for comment.

After Gihawi and Salzberg released their preprint on Monday, Knight's team also deposited a rebuttal on GitHub, where they said they repeated their machine learning analysis without using Voom-SNM normalization and still maintained the original findings.

However, Travis Gibson, a Harvard Medical School professor, argued on Twitter that there was evidence of environmental contamination in the data used for the GitHub rebuttal. Gibson did not respond to requests for comment.

Matters Arising

One thing both sides appear to agree on is the need for peer-reviewed journals to step into the ongoing debate.

"We continue to believe that Voom-SNM normalization is a useful cross-cohort integration technique until there is peer-reviewed evidence to the contrary, at which point we will assess the results and do our own investigation based on them about the limits of utility of the technique," Knight said.

"This is why [our analysis] needs to be published in a peer-reviewed journal, really," Gihawi commented.

Before releasing his preprint in January, Gihawi said he submitted his comments to Nature for publication as a "Matters Arising" article. However, Nature's editorial team declined his request, telling him that the arguments he presented "[did] not sufficiently challenge the reported conclusions" of the original study.

"We quite clearly showed that there were zero values that had been normalized," and that the method had introduced an artificial signal, Gihawi said. "We couldn't quite wrap our heads around why Nature would not contend with that."

After the rejection, Gihawi said, Nature suggested that he post his manuscript as a preprint, which he did. At the same time, he also submitted the analysis to a variety of other journals, including Science, which turned it down because it dealt solely with a paper published in Nature.

In an email, a spokesperson for Science said the journal "supports rapid post-publication feedback of research published in our journal via eLetters," but it does not "publish such feedback if it is focused on research exclusively published in other journals."

After the initial preprint came out, Gihawi said he received public and private support from various researchers. Encouraged by that, he said he contacted Nature again, asking the journal to reconsider publishing his manuscript. But the journal editors once again declined his request, noting they "continue to feel that the piece is not well suited for the editorial criteria for 'Matters Arising,' which are more stringent with regards to format and content."

Nature did not further explain why it believed Gihawi's manuscript was not up to its publication standards.

Further implications unclear

Aside from the Nature paper, more than a dozen subsequent studies that relied on the Knight group's 2020 results to find additional cancer microbiome associations are also "likely to be invalid," Gihawi and Salzberg contended in their preprint this week.

These include, for instance, a 2022 Nature Communications study that suggested the tumor microbiome can help predict cancer prognosis and drug response, as well as a January 2023 study in NPJ Breast Cancer that observed race-specific microbial communities in breast tumors, they noted.

Corresponding authors of both papers did not respond to GenomeWeb's requests for comment.

Beyond academic research, it is still unclear how the questions raised by the latest preprint will impact the commercial development of cancer diagnostic tests, given that the 2020 Nature study, which is now in question, plays a somewhat important role for Micronoma.

"Our work is based on the findings in the 2020 Nature paper," a Micronoma spokesperson said in an email, "but methods have continued to evolve since then, and we are not using the exact same processes."

Knight asserted that the company's OncobiotaLung test "does not depend at all on any of the techniques in the 2020 paper."

In a statement, Micronoma CEO Sandrine Miller-Montgomery said that after the 2020 Nature study, the company "developed additional human filtering and quality control methods that minimized human genomic DNA contamination, finding that doing so did not hinder the ability to diagnose cancer presence or types, as later published in Cell." She added that in the case of lung cancer detection, the company has "generated an independent and proprietary microbial database based on metagenome assembly of non-human reads."

The company, however, did not answer specific questions, including what it thinks about the concerns raised by the preprint authors and whether it believes that its tests, including OncobiotaLung, are safe and effective for use in patients.

According to Micronoma's website, the company's Oncobiota platform, which incorporates proprietary machine learning algorithms, is "designed to reveal cancer-related microbiome signatures with high specificity and sensitivity."

Ivan Vujkovic-Cvijin, a microbiome researcher at Cedars-Sinai Medical Center, who was not involved with either study, said machine learning is designed to find complex patterns in data that can predict an outcome, but these patterns can arise from often unpredictable sources, including from the environment.

"If researchers could obtain independent validation samples and test the same machine learning model on new sample sets, it would head off scientific disagreements like this one," he said. "But there are no standards for the application of machine learning in microbiome science, and I think this scientific disagreement underscores the need to develop them."
