Why this tool exists
Biomedical knowledge is scattered across dozens of siloed databases, each with its own query language, schema, and access patterns. Answering a single research question — "Is this gene a viable drug target?" — can require stitching together data from genomics, proteomics, clinical trials, literature, drug safety reports, and more.
AI Co-Scientist unifies 26 data sources behind a single conversational interface. Ask a question in plain language and the agent plans a multi-step investigation, queries the right databases, synthesizes the evidence, and delivers a cited research report.
Data sources
Organized from the broadest research questions to the most specialized analyses.
Literature & Clinical Evidence
Starting point for any biomedical question.
- PubMed: Over 38 million biomedical abstracts and citations from MEDLINE, life science journals, and online books. The primary literature search engine for biomedicine.
- OpenAlex: Broad scholarly knowledge graph covering 250M+ works, including preprints, conference papers, and datasets. Useful for researcher discovery and citation analysis.
- ClinicalTrials.gov: Registry of 500K+ clinical studies worldwide. Search active, recruiting, and completed trials by condition, intervention, or sponsor.
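This page does not document how the agent calls PubMed internally. As a sketch for checking literature hits directly, the code below builds an NCBI E-utilities ESearch request and extracts PMIDs from the JSON reply. The endpoint and the `db`, `term`, `retmax`, and `retmode` parameters come from the public E-utilities API. The helper names `build_esearch_url` and `parse_esearch` are hypothetical.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching PubMed IDs as JSON.

    Hypothetical helper: only the endpoint and parameter names are E-utilities'.
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch(payload: dict) -> tuple[int, list[str]]:
    """Return (total hit count, PMIDs on this page) from a decoded ESearch reply."""
    result = payload["esearchresult"]
    # ESearch reports the count as a string, so convert it explicitly.
    return int(result["count"]), list(result["idlist"])
```

To query the live service, pass the URL to `urllib.request.urlopen` and decode the body with `json.load`. NCBI asks clients without an API key to stay at or below three requests per second.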
Target-Disease Associations
Connecting genes and proteins to diseases with aggregated evidence.
- Open Targets Platform: Integrates genetics, genomics, transcriptomics, drugs, and literature into scored target-disease associations. The go-to resource for target validation.
- GWAS Catalog: Curated collection of published genome-wide association studies. Returns specific variants, p-values, odds ratios, and mapped genes for a given trait or disease.
- CIViC: Community-curated clinical interpretations of cancer variants. Expert-reviewed evidence linking specific mutations to diagnosis, prognosis, and treatment response.
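GWAS Catalog associations are usually screened against the conventional genome-wide significance threshold of p < 5 × 10⁻⁸ before a variant is treated as a robust hit. The sketch below applies that filter and ranks the surviving associations by p-value. The `Association` record and its field names are hypothetical simplifications, not the GWAS Catalog's response schema.

```python
from dataclasses import dataclass
from typing import Optional

# Conventional genome-wide significance threshold used in GWAS.
GENOME_WIDE_P = 5e-8


@dataclass(frozen=True)
class Association:
    """Hypothetical, simplified association record (not the Catalog's schema)."""
    variant: str
    mapped_gene: str
    p_value: float
    odds_ratio: Optional[float] = None  # absent when a study reports a beta instead


def genome_wide_hits(
    rows: list[Association], threshold: float = GENOME_WIDE_P
) -> list[Association]:
    """Keep associations strictly below the threshold, smallest p-value first."""
    hits = [row for row in rows if row.p_value < threshold]
    return sorted(hits, key=lambda row: row.p_value)
```

The threshold is a parameter because some analyses use stricter or study-specific cut-offs.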
Drug Discovery & Safety
The therapeutic landscape — from compounds to post-marketing surveillance.
- PubChem: Open chemistry database with 116M+ compounds. Molecular properties, SMILES structures, InChIKeys, drug-likeness descriptors (XLogP, H-bond donors/acceptors, polar surface area), synonyms, and compound descriptions.
- ChEMBL: Bioactivity database with 2M+ compounds, binding affinities, functional assays, and ADMET properties. Essential for understanding the pharmacological landscape around a target.
- DGIdb: Drug-Gene Interaction Database aggregating 40+ sources. Returns druggability categories (kinase, clinically actionable, etc.) and known drug interactions for any gene.
- FDA FAERS: Post-marketing adverse event reports from the FDA. Analyze safety signals, compare adverse event profiles across drugs, and identify drug class effects.
- RxNorm: Standardized drug nomenclature from the NLM. Maps between brand names, generics, ingredients, and clinical drug forms.
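One common way to turn FAERS report counts into a safety signal is the proportional reporting ratio (PRR). The PRR compares how often an event is reported for one drug with how often it is reported for all other drugs, using a 2×2 table of counts. This page does not say which disproportionality statistic the agent computes, so treat the code below as an illustration of the technique. The `prr` and `is_signal` helpers are hypothetical. The signal rule is a simplified version of a widely used screen (PRR ≥ 2 with at least 3 cases). That screen usually also applies a chi-squared test, which this sketch omits.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)].

    a: reports mentioning the drug AND the event
    b: reports mentioning the drug but NOT the event
    c: reports for all other drugs that mention the event
    d: reports for all other drugs that do NOT mention the event
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined when a row total or c is zero")
    return (a / (a + b)) / (c / (c + d))


def is_signal(a: int, b: int, c: int, d: int,
              min_prr: float = 2.0, min_cases: int = 3) -> bool:
    """Simplified screen: enough cases AND a PRR at or above the cut-off.

    Omits the chi-squared criterion that is often applied alongside it.
    """
    return a >= min_cases and prr(a, b, c, d) >= min_prr
```

For example, 10 of 100 reports for a drug mention an event, against 20 of 900 reports for all other drugs. The PRR is then 0.1 / 0.0222, or about 4.5.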
Gene & Protein Biology
Understanding molecular function, expression, and interactions.
- UniProt: Comprehensive protein knowledgebase with sequences, domains, post-translational modifications, subcellular localization, and functional annotations, covering both manually reviewed (Swiss-Prot) and automatically annotated (TrEMBL) entries.
- GTEx: Tissue-level gene expression from the Genotype-Tissue Expression project. Median TPM values across 54 human tissues from 948 donors, critical for target safety assessment because high expression in vital tissues can flag on-target toxicity risk.
- Reactome: Curated biological pathways and reactions. Understand the signaling cascades, metabolic pathways, and cellular processes a gene participates in.
- STRING: Protein-protein interaction networks combining experimental data, text mining, and computational predictions. Identify interaction partners and functional modules.
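To look up STRING interaction partners directly, outside the agent, the sketch below builds a request to STRING's public REST API and collects partner names from the JSON edges. In the request, `species` takes an NCBI taxonomy ID (9606 is human). `required_score` is a combined-score cutoff on STRING's 0 to 1000 scale, and 700 matches STRING's "high confidence" setting. The helper names are hypothetical, and the sample edges in any test data are made up.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def string_partners_url(identifier: str, species: int = 9606,
                        required_score: int = 700, limit: int = 10) -> str:
    """Build an interaction_partners request (hypothetical helper)."""
    if not 0 <= required_score <= 1000:
        raise ValueError("required_score must be between 0 and 1000")
    params = {
        "identifiers": identifier,
        "species": species,
        "required_score": required_score,
        "limit": limit,
    }
    return f"{STRING_API}/json/interaction_partners?{urlencode(params)}"


def partners(edges: list[dict], query: str) -> set[str]:
    """Collect names on the other end of every edge that touches `query`."""
    names: set[str] = set()
    for edge in edges:
        a, b = edge["preferredName_A"], edge["preferredName_B"]
        if a == query:
            names.add(b)
        elif b == query:
            names.add(a)
    return names
```

`partners` checks both ends of each edge, so it works whether the query protein appears as the A or the B side of the edge.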
Protein Structure
3D structural insights — predicted and experimental.
- AlphaFold: AI-predicted protein structures from Google DeepMind and EMBL-EBI covering 200M+ proteins. Returns per-residue pLDDT confidence scores and downloadable PDB/mmCIF structure files.
- RCSB PDB: Experimentally determined structures from X-ray crystallography, cryo-EM, and NMR. Search by gene or UniProt ID to find resolved structures, including any bound ligands.
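AlphaFold model files store each residue's pLDDT in the B-factor column of the coordinate records. The AlphaFold Database groups the scores into four bands: very high (> 90), confident (70 to 90), low (50 to 70), and very low (< 50). The sketch below reads per-residue pLDDT from the Cα atoms of a PDB-format model and applies those bands. It assumes a standard fixed-column PDB file with a single chain, as in AlphaFold DB monomer models. mmCIF files would need a different parser. Both function names are hypothetical.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT score to the AlphaFold DB confidence band.

    Exact boundary values (90, 70, 50) are assigned to the lower band here.
    """
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def plddt_per_residue(pdb_lines: list[str]) -> dict[int, float]:
    """Read pLDDT per residue from the CA atom's B-factor field.

    Uses PDB fixed columns: atom name 13-16, residue number 23-26,
    B-factor 61-66 (1-based). Assumes a single chain.
    """
    scores: dict[int, float] = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores
```

For a downloaded model, pass the file's lines to `plddt_per_residue` and map `plddt_band` over the values. The result shows which regions, such as disordered loops, should not be trusted for structural interpretation.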
Genomic Variation
Population genetics and variant-level interpretation.
- gnomAD: Population variant frequencies from 76K+ genomes and 125K+ exomes. Essential for separating rare, potentially pathogenic variants from common polymorphisms that are likely benign.
- 1000 Genomes: Reference catalog of human genetic variation across 26 populations. Foundation for understanding population-level diversity and ancestry-specific variants.
- ClinVar: Clinical significance classifications for human variants, such as pathogenic, benign, or uncertain significance. Links variants to conditions with submitter-level evidence.
- Ensembl VEP: The Variant Effect Predictor, which returns functional consequence types, SIFT and PolyPhen deleteriousness scores, and AlphaMissense pathogenicity predictions.
- MyVariant.info: Aggregated variant annotations from ClinVar, CADD, dbSNP, gnomAD, and COSMIC in a single lookup. A quick, comprehensive view of any variant.
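The rare-versus-common distinction that gnomAD supports comes down to allele frequency, AF = AC / AN. Here AC is the number of observed alternate alleles and AN is the number of alleles genotyped at the site. The minimal sketch below assumes you already have AC and AN for a variant. The 1% cut-off for a "common" polymorphism is a widely used convention. The 0.1% "rare" cut-off is an illustrative choice, not a gnomAD rule, so both are parameters. The helper names are hypothetical. Frequency alone does not establish pathogenicity; a rare variant still needs clinical evidence such as ClinVar classifications.

```python
def allele_frequency(ac: int, an: int) -> float:
    """Alternate allele frequency: allele count (AC) over allele number (AN)."""
    if an <= 0:
        raise ValueError("allele number (AN) must be positive")
    if not 0 <= ac <= an:
        raise ValueError("allele count (AC) must lie between 0 and AN")
    return ac / an


def frequency_class(ac: int, an: int,
                    common: float = 0.01, rare: float = 0.001) -> str:
    """Label a variant common, low-frequency, or rare.

    Thresholds are illustrative defaults, not an official classification.
    """
    af = allele_frequency(ac, an)
    if af >= common:
        return "common"
    if af >= rare:
        return "low-frequency"
    return "rare"
```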
Specialized Domains
Advanced and niche applications.
- cBioPortal: Cancer genomics from the TCGA Pan-Cancer Atlas (32 tumor types, ~10K samples). Mutation frequencies, hotspot protein changes, and mutation type breakdowns by cancer type.
- IEDB: Immune Epitope Database with experimentally characterized B-cell and T-cell epitopes, MHC binding data, and T-cell receptor sequences for immunology research.
- LINCS L1000: Library of Integrated Network-based Cellular Signatures. Gene expression profiles measured after chemical and genetic perturbations, used for drug repurposing and mechanism-of-action studies.
- SureChEMBL: Chemical structures automatically extracted from patent literature. Search the patent landscape for compounds related to your target or chemical series.
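A gene's mutation frequency in a cancer type is the fraction of profiled samples that carry at least one mutation in the gene. The sketch below computes that fraction per cancer type and counts the most recurrent protein changes, which are candidate hotspots. The input shapes are hypothetical simplifications, not cBioPortal's API schema. Mutations are `(sample_id, cancer_type, protein_change)` tuples for one gene, and a separate map gives the number of profiled samples per cancer type. Any example data are toy values.

```python
from collections import Counter, defaultdict


def mutation_frequency(
    mutations: list[tuple[str, str, str]],
    profiled: dict[str, int],
) -> dict[str, float]:
    """Fraction of profiled samples per cancer type with >= 1 mutation.

    mutations: (sample_id, cancer_type, protein_change) calls for one gene.
    profiled:  samples sequenced per cancer type (the denominator).
    """
    mutated: dict[str, set[str]] = defaultdict(set)
    for sample_id, cancer_type, _ in mutations:
        mutated[cancer_type].add(sample_id)  # a sample counts once per type
    return {ct: len(mutated[ct]) / n for ct, n in profiled.items() if n > 0}


def top_protein_changes(
    mutations: list[tuple[str, str, str]], k: int = 3
) -> list[tuple[str, int]]:
    """Most recurrent protein changes across all samples (candidate hotspots)."""
    return Counter(change for _, _, change in mutations).most_common(k)
```

The denominator is the number of profiled samples, not the number of samples with any mutation. Otherwise the frequency of a rarely mutated gene would be inflated.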