
Pre-submission Checklist for Metabolomics Data: Avoid Common Errors


Metabolomics papers often live or die on details that never make it into the main figures. Reviewers and repository curators are effectively asking the same question: Could someone else audit what you did, reproduce the processing, and reuse the dataset without guessing?

This is why "having the raw files" is not the same as having submission-ready data. A strong submission package is less about one heroic upload at the end—and more about making sure every artifact (raw data, metadata, processing records, annotations, and supplementary tables) tells the same story.

Key Takeaway: "Submission-ready metabolomics data" means your raw files, metadata, methods, annotations, and supplementary tables are complete, traceable, and internally consistent—before you hit the journal submission button.

Why Metabolomics Data Are Often Not Submission-Ready

Most teams start organizing their repository package late—often during final manuscript polishing. At that stage, the study already has multiple versions of processed matrices, statistical outputs, annotation tables, and plots floating around. If you only reconcile these materials at the end, you're more likely to discover mismatched sample IDs, unclear preprocessing steps, or overconfident metabolite claims after a reviewer points them out.

Journals, repositories, and reviewers increasingly expect raw data access plus structured metadata and transparent processing provenance. MetaboLights, for example, explicitly frames its submissions around structured ISA-Tab metadata and a validation step before accessioning and public release (see MetaboLights: open data repository for metabolomics and the official MetaboLights submission guides).

In practice, avoidable delays typically come from three root causes:

  1. Missing or unstructured metadata that can't be mapped cleanly to repository templates.
  2. Untraceable processing decisions (software versions, parameter choices, matrix versions) that make results hard to audit.
  3. Inconsistency between the repository record, supplementary tables, and what the manuscript claims.

What "Submission-Ready" Means for Metabolomics Data

A submission-ready metabolomics package is defined by traceability and consistency—not by the existence of any single file.

At a minimum:

  • Raw data are organized and accessible (including blanks and QC-related injections).
  • Metadata are complete, structured, and compatible with the target repository.
  • Sample handling, QC strategy, and analytical methods are traceable from collection to acquisition.
  • Annotation confidence is described conservatively and consistently (e.g., MSI confidence tiers).
  • Supplementary tables, repository records, and manuscript claims match—line by line where needed.

A helpful way to think about "submission-ready" is to treat it as a validation problem. If you had to hand your study to a colleague outside your lab, could they reconstruct:

  • what each sample is,
  • where it sits in the injection order,
  • how features were detected and filtered,
  • what preprocessing created the final matrix used for figures,
  • and which identifications are confirmed vs putative?

What Journals and Repositories Commonly Expect

Requirements vary by journal and repository, but expectations cluster around a few common pillars.

Repository-Accessible Data

Many journals expect raw metabolomics data to be available through a public repository (or at least a private review-access mechanism) before or during manuscript submission. Your goal is to be ready to provide an accession number, private reviewer link, or equivalent access route early.

MetaboLights' workflow includes a guided submission process and validation. In practical terms, this means a repository record that exists but isn't validation-ready can still slow editorial handling.

Complete Metadata and Study Description

Repository submission usually requires study design details, sample descriptors, analytical platform information, and processing context. Incomplete metadata is a leading cause of curator follow-up and validation delays.

A large-scale assessment of public metabolomics submissions showed substantial variability in adherence to minimum information guidance across studies and repositories, with frequent missing or implicit metadata (reported in papers but not captured per-sample in repositories) (see Compliance with minimum information guidelines in public metabolomics repositories).

Transparent Reporting

Reviewers and curators increasingly expect clear reporting of acquisition, preprocessing, filtering, normalization, annotation, and statistical analysis. "Transparent reporting" doesn't mean drowning the reader in irrelevant details—it means capturing the decisions that materially affect results.

If you want a concrete benchmark, ask: If a reviewer questions batch effects, drift correction, or metabolite identification, do I have the run-level and evidence-level artifacts ready to share without rework?

Metabolomics Data Submission Checklist: The Core Consistency Checks

If you only remember one idea, make it this: a metabolomics data submission checklist is really a set of consistency checks across artifacts.

  • Do sample IDs match between metadata sheets, raw file names, and matrices?
  • Do factor names and group labels match between supplement tables and figure legends?
  • Is the exact matrix used for statistics present in the submission package (not just "a" matrix)?

This sounds mundane, but it's exactly where reviewers and curators find avoidable errors.
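These checks are easy to automate. A minimal sketch in Python (the ID lists are illustrative; in practice you would read them from your metadata sheet, raw-file listing, and matrix header):

```python
# Cross-check sample IDs across metadata, raw file names, and the data matrix.
# The example IDs below are made up for illustration.

def check_id_consistency(metadata_ids, raw_file_ids, matrix_ids):
    """Return a dict of mismatches between the three ID sources."""
    meta, raw, matrix = set(metadata_ids), set(raw_file_ids), set(matrix_ids)
    return {
        "in_metadata_not_raw": sorted(meta - raw),
        "in_raw_not_metadata": sorted(raw - meta),
        "in_matrix_not_metadata": sorted(matrix - meta),
        "in_metadata_not_matrix": sorted(meta - matrix),
    }

issues = check_id_consistency(
    metadata_ids=["S01", "S02", "S03", "QC01"],
    raw_file_ids=["S01", "S02", "S03", "QC01"],
    matrix_ids=["S01", "S02", "S03"],  # e.g., QC01 dropped after filtering
)
# Only the non-empty mismatch lists need to go into your pre-submission notes.
problems = {k: v for k, v in issues.items() if v}
```

Running this once per artifact pair takes minutes and catches the exact class of error reviewers and curators find first.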

Check Study Design and Metadata Before Submission

The best time to fix metadata problems is before the repository templates are filled—not after the final figures are exported.

Replicates, Controls, and Study Factors

Confirm that biological replicates, technical replicates, controls, treatment groups, and study factors are clearly defined, and then ensure they are represented consistently across:

  • metadata sheets,
  • repository fields,
  • and supplementary tables.

If your metadata implies four groups but your supplementary table has five group labels, you've created instant reviewer doubt—even if the statistical conclusions are correct.

A practical pre-submission move is to build one "single source of truth" metadata table (SampleID-centric) and derive all downstream sheets from it. This helps prevent silent drift in factor naming ("treated" vs "Tx" vs "drug").
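The "single source of truth" idea can be implemented with almost no tooling: keep one master table and project downstream sheets from it, so a factor label can only be spelled one way. A sketch (column names are illustrative assumptions):

```python
# Derive downstream sheets from one SampleID-centric master table so factor
# labels ("treated" vs "Tx" vs "drug") cannot drift between artifacts.
# All column names and values here are illustrative, not a fixed schema.

MASTER = [
    {"SampleID": "S01", "Group": "treated", "Batch": "B1", "RawFile": "S01_pos.raw"},
    {"SampleID": "S02", "Group": "control", "Batch": "B1", "RawFile": "S02_pos.raw"},
]

def derive_sheet(master, columns):
    """Project the master table onto the columns a downstream sheet needs."""
    return [{c: row[c] for c in columns} for row in master]

# Repository metadata and the supplement are both views of the same table.
repository_sheet = derive_sheet(MASTER, ["SampleID", "RawFile", "Batch"])
supplement_sheet = derive_sheet(MASTER, ["SampleID", "Group"])
```

Because every sheet is regenerated from `MASTER`, renaming a group is a one-line change that propagates everywhere.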

Randomization and Blocking

Record sample randomization, injection order logic, blocking structure, and batch assignment where relevant. These details become central when reviewers question drift, batch effects, or confounding.

Even if you can't include every operational detail in the manuscript, you should be able to provide a run-order table and batch labels that explain why the sequence is credible.

Repository-Relevant Metadata Fields

Ensure key metadata can be mapped cleanly into repository-specific structures such as ISA-Tab or mwTab (different repositories and toolchains may emphasize different schemas). For MetaboLights, submissions are powered by the ISA software suite and are submitted in ISA-Tab format (see About MetaboLights).

In practice, teams get stuck not because they "don't have metadata," but because their internal sheet can't be translated into MetaboLights submission metadata without manual, error-prone cleanup.

To avoid that friction, verify that the following are complete and internally consistent:

  • study labels and sample identifiers
  • protocol names and versioning
  • factor values (exact spelling, casing, and allowed values)
  • assay file naming that maps unambiguously to raw files

| Metadata element | What to check before submission | Typical failure mode | Why reviewers/curators care |
|---|---|---|---|
| Sample identifiers | Unique, stable SampleID used everywhere | Multiple aliases across files | Prevents "which sample is which?" confusion |
| Study factors | Clear factors + levels (e.g., treatment, genotype, time) | Factors only described in prose | Enables reanalysis and repository indexing |
| Batch labels | Batch and block fields captured explicitly | Batch inferred from run order only | Supports defensible batch-effect handling |
| Protocol references | Extraction and assay protocols versioned | "Standard methods" with no parameters | Processing decisions become unverifiable |
| File mapping fields | SampleID ↔ RawFileName ↔ InjectionID | Raw names don't match metadata | Validation failure and curator follow-up |

Check Sample and Pre-analytical Traceability

Metabolomics is unusually sensitive to pre-analytical variation. If sample handling details are missing, reviewers may treat the observed differences as artifactual—even when your stats are clean.

Confirm that collection, storage, stabilization, transport, and processing details are documented. Where relevant, record:

  • aliquoting strategy,
  • storage duration,
  • freeze–thaw limits,
  • and the chain of custody or handling logs that support the reported sample history.

This is especially important when you're interpreting subtle shifts in metabolites that are known to be labile.

A simple way to pressure-test traceability is to pick one sample and trace it end-to-end: from collection conditions to extraction batch to injection file to matrix row. If you can't do this quickly, the package isn't ready.
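That end-to-end trace can be scripted as a chain of lookups that fails loudly on any gap. The four lookup tables below are illustrative stand-ins for your real collection, extraction, injection, and matrix records:

```python
# Pressure-test traceability: follow one SampleID from collection conditions
# through extraction batch and injection file to its matrix row.
# All records below are made-up examples.

collection = {"S01": {"site": "clinic A", "stored_at": "-80C"}}
extraction = {"S01": {"extraction_batch": "E2"}}
injections = {"S01": {"raw_file": "S01_pos.raw", "run_order": 14}}
matrix_rows = {"S01": {"matrix_row": 7}}

def trace_sample(sample_id):
    """Assemble the full audit trail for one sample, raising on any gap."""
    trail = {}
    for stage, table in [("collection", collection), ("extraction", extraction),
                         ("injection", injections), ("matrix", matrix_rows)]:
        if sample_id not in table:
            raise KeyError(f"{sample_id} missing from {stage} records")
        trail[stage] = table[sample_id]
    return trail

trail = trace_sample("S01")
```

If `trace_sample` raises for any real sample, you have found the gap before a curator does.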

Check QC Design and Run-Level Documentation

QC is not just something you did; it's something you need to document in a way other people can audit. When reviewers question drift or batch effects, they usually aren't asking for more words—they're asking for QC documentation for metabolomics studies that proves the run behaved the way you claim.

Pooled QCs, Blanks, and Contaminant Control

Confirm pooled QCs, blanks, and contaminant-control samples are clearly documented, including:

  • how pooled QCs were prepared,
  • how often they were injected,
  • what blanks represent (solvent blank vs process blank),
  • and how contaminant features were filtered or flagged.

Make sure the role of each QC-related sample type is described in methods and, where relevant, in repository metadata.

Calibration and System Suitability

Verify calibration routines, system-suitability checks, and instrument readiness procedures are recorded where appropriate. While not every journal expects an exhaustive log, being able to describe system suitability and instrument readiness strengthens confidence in analytical stability.

Injection Order, Drift, and Batch Structure

Make injection order, drift handling, and batch structure traceable before submission. If batch correction or drift correction was applied, ensure it is described consistently across methods, supplementary methods, and any processing README.

[Figure: timeline showing study batches, pooled QC injections, blanks, system-suitability checks, and drift-control checkpoints across a metabolomics run]

Pro Tip: If you expect reviewer questions on drift or batch effects, prepare a "QC pack" that includes run order, pooled QC RSD summaries, and drift plots. It often answers questions before they become review comments.
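The pooled-QC RSD summary in that pack can be computed with the standard library alone. A sketch (feature names, intensities, and the 20% cutoff are illustrative; labs set their own thresholds):

```python
# Percent RSD per feature across pooled QC injections, plus a flag list for
# features exceeding a chosen cutoff. All numbers here are made-up examples.
from statistics import mean, stdev

qc_intensities = {
    "feat_001": [1050, 1012, 998, 1041],
    "feat_002": [560, 430, 610, 380],
}

def rsd_percent(values):
    """Relative standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)

rsds = {feat: round(rsd_percent(vals), 1) for feat, vals in qc_intensities.items()}
flagged = [f for f, r in rsds.items() if r > 20]  # illustrative cutoff
```

Shipping `rsds` as a supplementary table (and `flagged` as your filter record) answers most drift questions before they are asked.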

Check the Raw Data Package

A raw data package that works for internal lab storage may still fail repository submission. Repositories and curators need consistent naming and clear mapping between samples and files.

Before submission:

  • Confirm that all raw analytical files are present.
  • Ensure files are named consistently and map to the correct samples.
  • Make sure QC samples, blanks, pooled references, and replicate structure are reflected in the raw-data package.
  • Separate upload-ready raw files from local clutter, intermediate exports, and obsolete versions.

If you are submitting to MetaboLights and your raw data are stored as folders, their upload guidance indicates you may need to compress each raw folder individually rather than making one giant zip containing multiple folders. That's a detail that can cause submission friction if discovered late.
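Per-folder compression is easy to script so it isn't discovered late. A sketch using only the standard library (the folder layout is an assumed example; check the current MetaboLights guidance for exact packaging rules):

```python
# Compress each raw-data folder into its own .zip, rather than one giant
# archive containing multiple folders. Folder names below are illustrative.
import os
import shutil

def zip_each_raw_folder(raw_root):
    """Create one .zip per immediate subfolder of raw_root; return archive paths."""
    archives = []
    for entry in sorted(os.listdir(raw_root)):
        folder = os.path.join(raw_root, entry)
        if os.path.isdir(folder):
            # shutil.make_archive appends ".zip" to the base name it is given
            archives.append(shutil.make_archive(folder, "zip", folder))
    return archives
```

Run it against a copy of the upload directory so the originals stay untouched.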

This is also where "metabolomics raw data package organization" becomes a concrete task: your file tree and naming scheme should make polarity, batch, QC type, and replicate structure obvious without opening the instrument software.

A reviewer-focused sanity check is to confirm that your raw-data package can answer these questions without manual detective work:

  • Which injection file corresponds to SampleID X?
  • Where are the blanks and pooled QCs?
  • Are positive and negative ion modes clearly separated and labeled?
  • If there are multiple batches, is the split obvious from file naming and/or a run-order table?
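A naming-scheme audit makes those questions answerable mechanically. The pattern below (SampleID_polarity_batch.raw) is an illustrative assumption; substitute whatever convention your lab actually uses:

```python
# Verify that raw file names expose SampleID, polarity, and batch without
# opening instrument software. The naming pattern is an assumed example.
import re

NAME_PATTERN = re.compile(
    r"^(?P<sample>[A-Za-z0-9]+)_(?P<polarity>pos|neg)_(?P<batch>B\d+)\.raw$"
)

def audit_names(filenames):
    """Split file names into parsed entries and scheme violations."""
    parsed, violations = {}, []
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match:
            parsed[name] = match.groupdict()
        else:
            violations.append(name)
    return parsed, violations
```

Anything landing in `violations` is a file a curator will have to ask you about.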

Check the Processed Data Matrix

In many submissions, the processed data matrix is where inconsistencies accumulate. Ensure the processed matrix matches the manuscript figures, tables, and statistical analysis.

A submission-ready approach is to preserve clear lineage between matrix versions:

  • raw feature table (after peak picking/alignment)
  • filtered matrix (after QC filters and blank subtraction)
  • normalized/transformed/scaled matrix
  • final reporting matrix (the one used for figures)

Confirm that sample order, feature IDs, and group labels are consistent across all downstream files. If a downstream figure uses a matrix that isn't represented in the submission package, you've created an audit gap.

A lightweight but powerful check is to compute a simple "matrix fingerprint" (row/column counts, feature ID hash, or checksum) for the version used in the final stats and ensure the same fingerprint appears in your submission package.
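One way to implement such a fingerprint with the standard library (the exact fields hashed are a design choice, not a standard):

```python
# A minimal "matrix fingerprint": shape plus a hash over feature IDs and
# values, so the matrix behind the figures can be matched to the one in the
# submission package. Formatting values with %.6g tolerates float round-trips.
import hashlib

def matrix_fingerprint(feature_ids, matrix):
    """matrix: list of rows, one per feature. Returns (n_features, n_samples, digest)."""
    h = hashlib.sha256()
    for fid, row in zip(feature_ids, matrix):
        h.update(fid.encode())
        h.update(",".join(f"{v:.6g}" for v in row).encode())
    n_samples = len(matrix[0]) if matrix else 0
    return len(matrix), n_samples, h.hexdigest()[:12]
```

Record the fingerprint in the README and in the statistics script's log; if they ever disagree, the audit gap is visible immediately.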

| Matrix artifact | Should contain | Should NOT contain | Common mismatch to catch |
|---|---|---|---|
| Feature detection output | Feature IDs, m/z, RT, intensities, QC flags | Final fold changes and pathway claims | Feature IDs change after reprocessing |
| Filtered matrix | Documented filter criteria and thresholds | Unstated blank subtraction | Filtering differs from what methods describe |
| Normalized matrix | Method + parameters, scaling choice | Hidden batch correction | Normalization differs from figure pipeline |
| Final reporting matrix | The exact input to stats/plots | Extra samples or dropped groups | Figure based on "secret" intermediate matrix |

Check Methods Transparency

Methods are submission-ready when they allow an informed reader to understand how the final matrix was produced—and what decisions materially shaped results.

Document:

  • instrument platform and acquisition settings,
  • preprocessing workflow,
  • filtering and QC handling,
  • normalization logic,
  • software tools and version numbers,
  • and key parameter choices.

Avoid vague descriptions ("data were normalized and analyzed using standard methods") when specific parameter choices affected reported results.
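A machine-readable methods record alongside the prose methods is harder to leave vague. A sketch (all field names, tool names, and values are illustrative assumptions, not a fixed schema):

```python
# Capture the decisions that materially shaped the matrix in one structured
# record that can ship in the README and the repository package verbatim.
import json

methods_record = {
    "software": {"peak_picking": "example-tool 1.2.3", "stats": "Python 3.12"},
    "preprocessing": {"ppm_tolerance": 10, "min_sample_fraction": 0.5},
    "filtering": {"blank_ratio_cutoff": 3, "qc_rsd_cutoff_percent": 30},
    "normalization": {"method": "median fold change", "transform": "log2"},
}

# Serialize once; the same bytes go everywhere, so the record cannot drift.
record_json = json.dumps(methods_record, indent=2, sort_keys=True)
```

Because the record is data rather than prose, a reviewer question about any parameter has a single authoritative answer.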

If your team is outsourcing some steps or wants an external audit of processing decisions, Metabolomics Data Analysis can serve as a reference point for what a complete analysis workflow and deliverable set typically includes (preprocessing, normalization, missing value handling, multivariate/univariate statistics, and interpretation).

Check Annotation and Identification Evidence

Metabolite identification is one of the fastest ways to lose reviewer trust—not because putative annotations are wrong, but because the confidence language doesn't match the evidence.

Use Conservative Confidence Language

Separate confirmed, probable, and putative identifications. The Metabolomics Standards Initiative (MSI) chemical analysis working group formalized identification confidence tiers commonly summarized as Levels 1–4. Level 1 requires comparison to an authentic standard with orthogonal evidence under identical conditions; lower levels reflect putative annotation or unknowns (see Proposed minimum reporting standards for chemical analysis and The role of reporting standards for metabolite annotation and identification in metabolomics).

In practice, "MSI identification confidence levels" should be visible in your supplement—not inferred from wording.

Align Identification Evidence With Reporting Standards

Use MSI levels (or an equivalent evidence tier system) consistently across supplementary identification tables, narrative text, and any "featured metabolites" callouts. Don't present tentative annotations as confirmed metabolite identities without supporting evidence.

Distinguish Statistical Significance From Identification Confidence

A statistically significant feature is not automatically a confidently identified metabolite. Keep statistical evidence (p-values, effect sizes) and identification evidence (MS/MS match, RT match, standard confirmation) separated.

A practical way to implement this is to include both "statistical rank" and "ID confidence tier" in the same results table so readers can see when a headline result rests on putative annotation.
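The combined table is a few lines of code. A sketch with made-up results (feature names, q-values, and MSI levels are illustrative):

```python
# Keep statistical evidence and identification confidence side by side so a
# headline result cannot silently rest on a putative annotation.

results = [
    {"feature": "feat_001", "name": "citrate",  "q_value": 0.001, "msi_level": 1},
    {"feature": "feat_002", "name": "unknown",  "q_value": 0.004, "msi_level": 4},
    {"feature": "feat_003", "name": "lactate?", "q_value": 0.020, "msi_level": 2},
]

# Rank by statistical strength, but carry the ID tier into the same table.
for rank, row in enumerate(sorted(results, key=lambda r: r["q_value"]), start=1):
    row["stat_rank"] = rank

# Top-ranked features without Level 1 confirmation deserve hedged language.
putative_headliners = [r["feature"] for r in results
                       if r["stat_rank"] <= 2 and r["msi_level"] > 1]
```

Anything in `putative_headliners` is a result the narrative should describe as putatively annotated, not confirmed.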

If your study requires confirmation-focused quantitation and identification for a defined set of metabolites (for example, to strengthen a biomarker narrative), Targeted Metabolomics Analysis Service is relevant as a complementary approach because it is inherently oriented around predefined analytes and evidence-backed reporting.

Check Statistics and Result Tables Before Submission

Statistics are submission-ready when every claim can be traced back to a table that exists in the repository package and the supplement.

Confirm that the statistical workflow is described transparently, including model choices, multiple-testing control, effect-size reporting, and any covariate adjustments.

Then audit your supplementary tables for internal consistency across:

  • feature names and IDs
  • fold changes
  • adjusted p-values
  • confidence tiers
  • sample-group labels

If you make pathway-level claims, ensure they are traceable to the submitted tables and the mapping strategy is stated. When pathway interpretation is central, Metabolic Pathways Analysis is a helpful reference for how metabolite changes are typically contextualized within metabolic networks.

[Figure: stepwise metabolomics pre-submission checklist covering study design, sample traceability, QC strategy, raw data, processed matrices, methods, identification evidence, and final validation]

How to Avoid Repository Validation and Curation Delays

Delays are usually caused by fixable structural issues. The most effective way to prevent them is to treat repository submission as a parallel workstream, not an end-of-project task.

Start Repository Preparation Before the Final Draft

Begin organizing metadata and upload-ready files while the manuscript is still being drafted. This reduces last-minute failures and avoids the awkward situation where you have a partial accession but missing critical artifacts.

Validate Against Repository Requirements

Use repository-specific validation rules as a planning tool rather than waiting until the final upload. If your lab uses a local metadata template, pressure-test it early against metabolomics repository submission requirements so you find mapping problems before your final draft.

Review Validation Feedback Before Submission

If the repository returns feedback, fix metadata mismatches, file-structure problems, and missing links before sharing the accession or review link with a journal. A repository entry that exists but is incomplete can still slow editorial handling.

Common Pre-submission Errors in Metabolomics

Most failures look different on the surface but share a small number of underlying causes.

The Submission Package Is Incomplete

Raw files may exist, but processed matrices, metadata sheets, methods records, or supplementary tables are missing or inconsistent. This is especially common when multiple people contribute figures and each person keeps a different "final" matrix.

Metadata Are Sparse or Internally Inconsistent

Study factors, sample labels, batch information, and protocol details do not match across metadata sheets, repository fields, and manuscript text.

The MSI compliance assessment mentioned earlier highlights how often important metadata are missing or only reported implicitly. Repository curators can't reliably extract implicit metadata from prose.

Annotation Confidence Is Overstated

Putative metabolites are described as confirmed, or feature-level results are reported as metabolite-level conclusions without enough evidence. This is one of the fastest routes to skeptical reviewer comments.

Methods Are Not Reproducible

Software versions, preprocessing settings, normalization logic, and QC handling are underreported or missing. "Reproducible enough" is usually defined by what a critical reviewer would need to understand why your matrix looks the way it does.

Repository Submission Starts Too Late

If repository submission begins only when the manuscript is nearly finished, you may discover that validation, curation, and metadata cleanup still require substantial work. This can delay submission or force rushed, error-prone changes.

Build a Submission-Ready Final Package

A strong metabolomics data submission checklist should end with a concrete package inventory. The items below are intentionally artifact-focused.

  • Raw data files
  • Processed data matrix or matrices
  • Complete metadata sheet
  • Methods and parameter record
  • Annotation evidence summary
  • Supplementary result tables
  • Repository accession or review link
  • README and file map
  • Checksums or equivalent file-integrity information where relevant
  • Final consistency check between manuscript, supplement, and repository records

[Figure: layout of a submission-ready metabolomics package including raw files, processed matrices, metadata, methods record, annotation evidence, README, file map, supplementary tables, and repository accession details]

Annotation and Reporting Errors That Reviewers Notice Quickly

Reviewers tend to focus on a handful of red flags because they correlate strongly with irreproducible conclusions:

  • Metabolite IDs presented without clear confidence tiers
  • Missing explanation of preprocessing, filtering, normalization, or QC handling
  • Pathway claims not clearly linked to submitted result tables
  • Repository records that do not match the manuscript narrative
  • Unclear distinction between detected features and confidently annotated metabolites

If you want to reduce the chance of these comments, the goal isn't to add more prose—it's to make sure the evidence is in the submission package and clearly mapped.

Frequently Asked Questions

What should be included in a metabolomics data submission?

Include raw files plus a clear sample-to-file mapping, structured metadata, the processed matrix version that matches your figures, a methods/parameter record, annotation evidence with confidence tiers, and the supplementary tables that support manuscript claims.

Do journals require repository submission before manuscript submission?

Often yes, or they require a private reviewer-access mechanism during peer review. Many journals will ask for an accession number or private review link so reviewers can assess raw data and metadata, so it's safest to prepare deposition before you submit the manuscript.

What metadata are most often missing in metabolomics submissions?

Per-sample fields that enable reuse are commonly missing or only described in prose: batch labels, run order/injection sequence, detailed protocol parameters, and complete factor values. Compliance studies show that "implicit metadata" in publications does not translate cleanly into repository-ready records.

How should metabolite annotation confidence be reported before submission?

Report identification confidence explicitly (for example, MSI Levels 1–4) and keep the confidence tier aligned across narrative text and supplementary tables. Level 1 typically requires authentic standard confirmation with orthogonal evidence under identical conditions; lower levels should be labeled as putative to avoid overstatement.

What is the difference between raw data, processed data, and submission-ready data?

Raw data are the instrument output files. Processed data are derived tables/matrices after peak picking, alignment, filtering, and normalization. Submission-ready data means raw + processed + structured metadata + methods provenance + annotation evidence are assembled into a package that is complete, auditable, and internally consistent.

When should repository submission start during manuscript preparation?

Start while figures and supplements are still being built—ideally as soon as sample IDs, groups, and acquisition runs are stable. Early deposition lets you discover schema or validation issues before the final submission week.

Do I need a README or file map for metabolomics submission packages?

Yes. A short README with a file map and a sample-to-file table reduces curator follow-up and helps reviewers understand which matrix version produced the figures.

What details about preprocessing and normalization should be reported?

Report the steps and the parameter choices that materially shape your matrix: peak picking/alignment settings, missingness thresholds, blank/QC filtering logic, imputation method (if used), normalization method, transformation/scaling, batch correction (if applied), plus software tools and versions.

What is the minimum metadata needed for MetaboLights submission?

At minimum, you need structured fields that define study design and sample characteristics, explicitly map samples to assay files, and provide protocols (extraction and acquisition). If you prepare your ISA-Tab metadata early, you'll avoid late-stage template mismatches.

Should I upload vendor RAW files, mzML, or both?

When possible, keep vendor RAW files for full fidelity and provide open formats like mzML for interoperability and reanalysis. If you convert, document the conversion tool and settings so the conversion itself is auditable.

What files do reviewers want when they question batch effects or drift?

They usually want run order/injection sequence, pooled QC performance summaries (e.g., RSD or drift plots), blank behavior, and a clear statement of whether batch correction/drift correction was applied—plus the exact matrix used for statistics.

Where does mwTab fit in a metabolomics submission?

mwTab is a structured tabular schema used in some metabolomics data-sharing pipelines and repositories; the key practical point is to keep your internal metadata table clean enough that it can be mapped into whatever schema your target repository expects.


References

  1. Sumner, Lloyd W., et al. "Proposed minimum reporting standards for chemical analysis". Metabolomics 3.3 (2007): 211-221. DOI: 10.1007/s11306-007-0082-2.
  2. Goodacre, Royston, et al. "Proposed minimum reporting standards for data analysis in metabolomics". Metabolomics 3.3 (2007): 231-241. DOI: 10.1007/s11306-007-0081-3.
  3. Haug, Kenneth, et al. "MetaboLights—an open-access general-purpose repository for metabolomics studies and associated meta-data". Nucleic Acids Research 41.D1 (2013): D781-D786. DOI: 10.1093/nar/gks1004.
  4. Sud, Manish, et al. "Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools". Nucleic Acids Research 44.D1 (2016): D463-D470. DOI: 10.1093/nar/gkv1270.
  5. Spicer, Rachel A., Reza Salek, and Christoph Steinbeck. "A decade after the metabolomics standards initiative it's time for a revision". Scientific Data 4 (2017): 170138. DOI: 10.1038/sdata.2017.138.