AI Co-Scientist

Why this tool exists

Biomedical knowledge is scattered across dozens of siloed databases, each with its own query language, schema, and access patterns. Answering a single research question — "Is this gene a viable drug target?" — can require stitching together data from genomics, proteomics, clinical trials, literature, drug safety reports, and more.

AI Co-Scientist unifies 26 data sources behind a single conversational interface. Ask a question in plain language and the agent plans a multi-step investigation, queries the right databases, synthesizes the evidence, and delivers a cited research report.
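
The catalog described on this page can be pictured as a simple mapping from category to source names. The sketch below is only illustrative: `CATALOG`, `sources_for`, and `total_sources` are hypothetical names, not the tool's actual internals. The category and source names are copied from the list on this page.

```python
from typing import Dict, List

# Hypothetical registry: category -> data sources, as listed on this page.
CATALOG: Dict[str, List[str]] = {
    "Literature & Clinical Evidence": ["PubMed", "OpenAlex", "ClinicalTrials.gov"],
    "Target-Disease Associations": ["Open Targets Platform", "GWAS Catalog", "CIViC"],
    "Drug Discovery & Safety": ["PubChem", "ChEMBL", "DGIdb", "FDA FAERS", "RxNorm"],
    "Gene & Protein Biology": ["UniProt", "GTEx", "Reactome", "STRING"],
    "Protein Structure": ["AlphaFold", "RCSB PDB"],
    "Genomic Variation": ["gnomAD", "1000 Genomes", "ClinVar", "Ensembl VEP", "MyVariant.info"],
    "Specialized Domains": ["cBioPortal", "IEDB", "LINCS L1000", "SureChEMBL"],
}


def sources_for(category: str) -> List[str]:
    """Return the sources in a category, or an empty list if it is unknown."""
    return list(CATALOG.get(category, []))


def total_sources() -> int:
    """Count every source across all categories."""
    return sum(len(names) for names in CATALOG.values())
```

Counting the entries this way gives the 26 sources mentioned above.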

Data sources

Organized from the broadest research questions to the most specialized analyses.

Literature & Clinical Evidence

Starting point for any biomedical question.

PubMed
Over 38 million biomedical abstracts and citations from MEDLINE, life science journals, and online books. The primary literature search engine for biomedicine.
OpenAlex
Broad scholarly knowledge graph covering 250M+ works including preprints, conference papers, and datasets. Useful for researcher discovery and citation analysis.
ClinicalTrials.gov
Registry of 500K+ clinical studies worldwide. Search active, recruiting, and completed trials by condition, intervention, or sponsor.
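
For readers curious what a literature query looks like underneath a conversational request, here is a minimal sketch that builds a PubMed search URL for NCBI's public E-utilities `esearch` endpoint. The helper name `pubmed_search_url` and the example query are illustrative assumptions. This page does not describe how AI Co-Scientist itself calls PubMed. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, max_results: int = 5) -> str:
    """Build an esearch URL that asks PubMed for matching record IDs as JSON."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # query string, PubMed syntax
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # cap the number of IDs returned
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative query only.
url = pubmed_search_url("BRCA1 AND breast cancer")
```

Fetching this URL with any HTTP client would return a list of PubMed IDs, which a second E-utilities call (`esummary` or `efetch`) could turn into citations.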

Target-Disease Associations

Connecting genes and proteins to diseases with aggregated evidence.

Open Targets Platform
Integrates genetics, genomics, transcriptomics, drugs, and literature into scored target-disease associations. The go-to resource for target validation.
GWAS Catalog
Curated collection of published genome-wide association studies. Returns specific variants, p-values, odds ratios, and mapped genes for any trait or disease.
CIViC
Community-curated clinical interpretations of cancer variants. Expert-reviewed evidence linking specific mutations to diagnosis, prognosis, and treatment response.
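
GWAS results are usually filtered by p-value before interpretation. The sketch below assumes association records arrive as dictionaries with `variant`, `p_value`, and `odds_ratio` keys (hypothetical field names) and applies the conventional genome-wide significance threshold of 5e-8. The records are made-up placeholders, not real catalog entries.

```python
# Conventional genome-wide significance threshold.
GENOME_WIDE_SIGNIFICANCE = 5e-8


def significant_hits(associations, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Keep associations below the threshold, strongest (smallest p) first."""
    hits = [a for a in associations if a["p_value"] < threshold]
    return sorted(hits, key=lambda a: a["p_value"])


# Placeholder records for illustration only.
example = [
    {"variant": "variant_A", "p_value": 3e-12, "odds_ratio": 1.31},
    {"variant": "variant_B", "p_value": 2e-6, "odds_ratio": 1.08},
    {"variant": "variant_C", "p_value": 4e-9, "odds_ratio": 0.87},
]
hits = significant_hits(example)
```

Here `variant_B` is dropped because its p-value does not reach the threshold.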

Drug Discovery & Safety

The therapeutic landscape — from compounds to post-marketing surveillance.

PubChem
Open chemistry database with 116M+ compounds. Molecular properties, SMILES structures, InChIKeys, drug-likeness descriptors (XLogP, H-bond donors/acceptors, polar surface area), synonyms, and compound descriptions.
ChEMBL
Bioactivity database with 2M+ compounds, binding affinities, functional assays, and ADMET properties. Essential for understanding the pharmacological landscape around a target.
DGIdb
Drug-Gene Interaction Database aggregating 40+ sources. Returns druggability categories (kinase, clinically actionable, etc.) and known drug interactions for any gene.
FDA FAERS
Post-marketing adverse event reports from the FDA. Analyze safety signals, compare adverse event profiles across drugs, and identify drug class effects.
RxNorm
Standardized drug nomenclature from the NLM. Maps between brand names, generics, ingredients, and clinical drug forms.
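
The drug-likeness descriptors listed for PubChem are the inputs to Lipinski's rule of five, a common first-pass filter for oral drug candidates. The sketch below assumes the descriptors have already been retrieved. The function name and the example values are illustrative, and this page does not say whether AI Co-Scientist applies this rule.

```python
def lipinski_violations(mol_weight, xlogp, h_donors, h_acceptors):
    """Return the list of rule-of-five criteria that a compound violates."""
    checks = {
        "molecular weight > 500 Da": mol_weight > 500,
        "XLogP > 5": xlogp > 5,
        "H-bond donors > 5": h_donors > 5,
        "H-bond acceptors > 10": h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Illustrative descriptor values: a small aspirin-like molecule and a large lipophilic one.
small = lipinski_violations(mol_weight=180.2, xlogp=1.2, h_donors=1, h_acceptors=4)
large = lipinski_violations(mol_weight=1202.6, xlogp=7.5, h_donors=5, h_acceptors=12)
```

A compound with no more than one violation is conventionally treated as passing; how strictly to apply that is a judgment call for the chemist.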

Gene & Protein Biology

Understanding molecular function, expression, and interactions.

UniProt
Comprehensive protein knowledgebase with sequences, domains, post-translational modifications, subcellular localization, and functional annotations, spanning manually reviewed (Swiss-Prot) and automatically annotated (TrEMBL) entries.
GTEx
Tissue-level gene expression from the Genotype-Tissue Expression project. Median TPM (transcripts per million) values across 54 human tissues from 948 donors, which are critical for target safety assessment.
Reactome
Curated biological pathways and reactions. Understand the signaling cascades, metabolic pathways, and cellular processes a gene participates in.
STRING
Protein-protein interaction networks combining experimental data, text mining, and computational predictions. Identify interaction partners and functional modules.
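
Tissue expression supports a simple safety heuristic: high expression of a target in vital tissues can flag potential on-target toxicity. The sketch below assumes expression arrives as a tissue-to-median-TPM mapping. The 10 TPM cutoff, the function name, and the numbers are illustrative assumptions, not GTEx data or a GTEx recommendation.

```python
def tissues_above(median_tpm, cutoff=10.0):
    """Return (tissue, TPM) pairs at or above the cutoff, highest first."""
    flagged = [(tissue, tpm) for tissue, tpm in median_tpm.items() if tpm >= cutoff]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up values for illustration only.
example = {
    "Liver": 85.2,
    "Heart - Left Ventricle": 3.1,
    "Whole Blood": 0.4,
    "Lung": 12.7,
}
flagged = tissues_above(example)
```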

Protein Structure

3D structural insights — predicted and experimental.

AlphaFold
AI-predicted protein structures from DeepMind covering 200M+ proteins. Returns pLDDT confidence scores and downloadable PDB/CIF structure files.
RCSB PDB
Experimentally determined structures from X-ray crystallography, cryo-EM, and NMR. Search by gene or UniProt ID to find resolved structures with bound ligands.
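
AlphaFold PDB files store the per-residue pLDDT score in the B-factor column, so confidence can be read with a few lines of fixed-column parsing. In the sketch below the helper names are illustrative and the demo records are synthetic. The confidence bands follow the AlphaFold Database convention: above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low.

```python
def atom_line(serial, name, res, chain, res_seq, x, y, z, b_factor, occupancy=1.0):
    """Build a fixed-column PDB ATOM record (used only for the synthetic demo)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res:3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
    )


def ca_plddt(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Atom name is columns 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Translate a pLDDT score into the AlphaFold Database confidence label."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Synthetic two-residue fragment; coordinates and scores are made up.
demo = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, 0.0, 0.0, 0.0, 45.30),
    atom_line(2, " CA ", "MET", "A", 1, 1.458, 0.0, 0.0, 45.30),
    atom_line(10, " CA ", "GLY", "A", 2, 3.8, 1.2, 0.5, 92.75),
])
bands = {res: confidence_band(score) for res, score in ca_plddt(demo).items()}
```

For real files a structure library is the safer choice; fixed-column slicing is shown here only because it makes the location of the score explicit.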

Genomic Variation

Population genetics and variant-level interpretation.

gnomAD
Population variant frequencies from 76K+ genomes and 125K+ exomes. Essential for distinguishing rare pathogenic variants from common benign polymorphisms.
1000 Genomes
Reference catalog of human genetic variation across 26 populations. Foundation for understanding population-level diversity and ancestry-specific variants.
ClinVar
Clinical significance classifications for human variants — pathogenic, benign, uncertain significance. Links variants to conditions with submitter-level evidence.
Ensembl VEP
Variant Effect Predictor returning functional consequence types, SIFT and PolyPhen deleteriousness scores, and AlphaMissense pathogenicity predictions.
MyVariant.info
Aggregated variant annotations pulling from ClinVar, CADD, dbSNP, gnomAD, and COSMIC in a single lookup. Quick comprehensive view of any variant.
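
The rare-versus-common distinction rests on allele frequency: the allele count divided by the allele number (the number of chromosomes genotyped at that site). In the sketch below, the 1% cutoff for "rare" is a common convention used as an assumption, and the counts are made-up illustrative numbers.

```python
def allele_frequency(allele_count, allele_number):
    """Allele frequency = observed alternate alleles / chromosomes genotyped."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def is_rare(allele_count, allele_number, cutoff=0.01):
    """True when the allele frequency falls below the chosen cutoff."""
    return allele_frequency(allele_count, allele_number) < cutoff


# Illustrative counts only: 3 alternate alleles among 150,000 chromosomes.
af = allele_frequency(3, 150_000)
rare = is_rare(3, 150_000)
common = is_rare(30_000, 150_000)
```

A frequency this low is consistent with a rare variant, but frequency alone does not establish pathogenicity; that is where ClinVar classifications and predictor scores come in.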

Specialized Domains

Advanced and niche applications.

cBioPortal
Cancer genomics from the TCGA Pan-Cancer Atlas (32 tumor types, ~10K samples). Mutation frequencies, hotspot protein changes, and mutation type breakdowns by cancer type.
IEDB
Immune Epitope Database with experimentally characterized B-cell and T-cell epitopes, MHC binding data, and T-cell receptor sequences for immunology research.
LINCS L1000
Library of Integrated Network-based Cellular Signatures. Gene expression profiles measured after chemical and genetic perturbations — for drug repurposing and mechanism of action studies.
SureChEMBL
Chemical structures automatically extracted from patent literature. Search the patent landscape for compounds related to your target or chemical series.