Data Test Set
Use the form below to request the current month’s data test set for evaluating DNA sequence‐screening systems. After you submit, we’ll email you a secure download link and simple next steps.
What this is
The test set is part of a joint effort with NIST to help synthesis providers and other stakeholders evaluate baseline sequence screening performance. Each monthly package includes:
- A FASTA file of 1,000 blinded sequences to be used to evaluate your screening system.
- A FASTA file of ~50 sample sequences so teams can verify ingest and reporting before submitting their official runs.
- A TSV file containing identifying information and expected screening results for the ~50 sample sequences.
- A template results file.
Only a hidden subset (~400 sequences) is graded for evaluation; the rest are included for statistical purposes. Providers will not know which sequences are scored.
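Before an official run, it can help to confirm that your pipeline reads every record in a FASTA file. The following is a minimal sketch of such an ingest check, not part of the official package: the function names `read_fasta` and `check_ingest` are hypothetical, and the expected count is whatever your package states (1,000 for the blinded file, roughly 50 for the sample file).

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines.

    The record ID is taken as the first whitespace-delimited token after '>'.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_ingest(lines: Iterable[str], expected_count: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    records = list(read_fasta(lines))
    ids = [record_id for record_id, _ in records]
    problems = []
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(records)}")
    if len(set(ids)) != len(ids):
        problems.append("duplicate record IDs")
    if any(not seq for _, seq in records):
        problems.append("at least one record has an empty sequence")
    return problems
```

To check a downloaded file, pass an open file handle, for example `check_ingest(open(path), 1000)`, where `path` is wherever you saved the file.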
How the evaluation works
- Screen using your standard procedure. You’ll apply your normal workflow to the sequences you download. Results should be returned in the reporting format we provide.
- Scoring & thresholds. Results are summarized as PASS/FAIL against two metrics calculated on the graded subset:
  - Accuracy = (TP + TN) / T, threshold ≥ 75%
  - Sensitivity = TP / (TP + FN), threshold ≥ 95%
- Turnaround. Please return results within one week of receiving your link and reporting template.
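To see how the two metrics behave, here is a small illustrative sketch of the calculation. It is not the official scoring code and rests on two assumptions: that T is the total number of graded sequences (TP + TN + FP + FN), and that a PASS requires meeting both thresholds. The `Counts` class and function names are hypothetical.

```python
from dataclasses import dataclass

ACCURACY_THRESHOLD = 0.75      # from the page: accuracy >= 75%
SENSITIVITY_THRESHOLD = 0.95   # from the page: sensitivity >= 95%


@dataclass
class Counts:
    tp: int  # sequences of concern that were correctly flagged
    tn: int  # benign sequences that were correctly cleared
    fp: int  # benign sequences that were incorrectly flagged
    fn: int  # sequences of concern that were missed


def accuracy(c: Counts) -> float:
    # Assumption: T = TP + TN + FP + FN (all graded sequences).
    total = c.tp + c.tn + c.fp + c.fn
    if total == 0:
        raise ValueError("no graded sequences")
    return (c.tp + c.tn) / total


def sensitivity(c: Counts) -> float:
    positives = c.tp + c.fn
    if positives == 0:
        raise ValueError("no positive sequences in the graded subset")
    return c.tp / positives


def evaluate(c: Counts) -> str:
    # Assumption: both thresholds must be met for an overall PASS.
    meets_accuracy = accuracy(c) >= ACCURACY_THRESHOLD
    meets_sensitivity = sensitivity(c) >= SENSITIVITY_THRESHOLD
    return "PASS" if meets_accuracy and meets_sensitivity else "FAIL"
```

For example, with made-up counts `Counts(tp=192, tn=150, fp=50, fn=8)`, accuracy is 342/400 = 85.5% and sensitivity is 192/200 = 96%, so `evaluate` returns `"PASS"`. With `Counts(tp=180, tn=200, fp=0, fn=20)`, accuracy is 95% but sensitivity is only 90%, so the result is `"FAIL"`. The stricter sensitivity threshold means missed sequences of concern count far more heavily than false alarms.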
What a result means (and does not mean)
- Not a certification. A PASS demonstrates baseline performance on this specific test set; it is not an accreditation or endorsement of real-world order screening.
- Validity window. During the alpha phase, PASS notices expire after 30 days; during the beta phase they are valid for 1 year.
Confidentiality & use
- Evaluation datasets are confidential. Do not share sequences or detailed result files beyond your organization, except via the explicit PASS opt-in noted above.
- One submission per dataset per organization. Each month’s set permits a single official submission.
- Aggregate feedback to NIST. Partners receive only anonymized, aggregated statistics to check item quality across participants.
Who should request access
- Commercial and academic DNA synthesis providers
- Benchtop platform vendors validating embedded screening
- Tool developers and stakeholders evaluating compliance workflows
- Anyone else with a reasonable need to evaluate a system that screens for sequences of concern before gene or nucleic acid synthesis
Before you submit the form
Access to the current month’s Evaluation Dataset requires organizational registration and acceptance of the applicable terms. You must agree to reasonable use restrictions (no automated scraping; lawful use; no redistribution of confidential materials).