AI Co-Scientist

Biomedical knowledge is scattered across dozens of siloed sources: literature indexes, curated knowledge bases, ontologies, trial registries, molecular databases, and dataset repositories. Each comes with its own schema, terminology, and interface. Answering a single research question, such as “Is this gene a viable drug target?”, often requires moving across these systems by hand, normalizing identifiers, comparing conflicting evidence, and tracing claims back to the original source.

AI Co-Scientist brings 50+ biomedical information resources into one conversational workflow. Ask a question in plain language and the agent plans a multi-step investigation, queries the most relevant sources, cross-checks the results, and synthesizes the findings into a cited research report with provenance.
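
The cross-check and synthesis steps described above can be pictured with a small sketch. This is not the product's implementation: the `Finding` type, the `cross_check` and `synthesize` helpers, the source names, and the `REF-*` identifiers are all hypothetical placeholders, and the only idea illustrated is that claims reported by more than one source are grouped and every report line keeps its provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    claim: str      # normalized statement returned by a source (placeholder text)
    source: str     # hypothetical name of the resource that reported it
    reference: str  # hypothetical identifier a reader could trace back


def cross_check(findings):
    """Group findings by claim so agreement across sources is visible."""
    grouped = {}
    for finding in findings:
        grouped.setdefault(finding.claim, []).append(finding)
    return grouped


def synthesize(grouped):
    """Render one report line per claim, with a support label and citations."""
    lines = []
    for claim, support in sorted(grouped.items()):
        ordered = sorted(support, key=lambda f: (f.source, f.reference))
        distinct_sources = {f.source for f in ordered}
        label = "corroborated" if len(distinct_sources) > 1 else "single-source"
        cites = "; ".join(f"{f.source}:{f.reference}" for f in ordered)
        lines.append(f"{claim} [{label}] ({cites})")
    return lines


# Placeholder findings; none of these are real results.
findings = [
    Finding("GENE_X is associated with DISEASE_Y", "SourceA", "REF-1"),
    Finding("GENE_X is associated with DISEASE_Y", "SourceB", "REF-2"),
    Finding("GENE_X is highly expressed in liver", "SourceC", "REF-3"),
]
report = synthesize(cross_check(findings))
for line in report:
    print(line)
```

Keeping the source and reference on every finding is what lets the final report stay traceable even after claims from different resources are merged.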

Data sources

Organized from the broadest research questions to the most specialized analyses.

Literature & Clinical Evidence

Starting point for any biomedical question.

PubMed
Over 38 million biomedical abstracts and citations from MEDLINE, life science journals, and online books. The primary literature search engine for biomedicine.
OpenAlex
Broad scholarly knowledge graph covering 250M+ works including preprints, conference papers, and datasets. Useful for researcher discovery and citation analysis.
Europe PMC
European life-science literature platform with PubMed records, full-text links, preprints, citation counts, grants, and text-mined metadata. Useful when PubMed alone is too narrow.
ClinicalTrials.gov
Registry of 500K+ clinical studies worldwide. Search active, recruiting, and completed trials by condition, intervention, or sponsor.

Target-Disease Associations

Connecting genes and proteins to diseases with aggregated evidence.

Open Targets Platform
Integrates genetics, genomics, transcriptomics, drugs, and literature into scored target-disease associations. The go-to resource for target validation.
GWAS Catalog
Curated collection of published genome-wide association studies. Returns specific variants, p-values, odds ratios, and mapped genes for any trait or disease.
CIViC
Community-curated clinical interpretations of cancer variants. Expert-reviewed evidence linking specific mutations to diagnosis, prognosis, and treatment response.
ClinGen
Curated gene-disease validity and dosage sensitivity resource. Adds expert-panel classifications and dosage evidence beyond variant-level pathogenicity databases.
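
The GWAS Catalog entry above mentions variants, p-values, and mapped genes. A common first pass over such records is to keep only associations below the conventional genome-wide significance threshold of 5e-8. The sketch below assumes records have already been retrieved and flattened into dictionaries; the field names (`variant`, `p_value`, `mapped_gene`) and the example values are hypothetical, not the catalog's actual schema.

```python
# Conventional genome-wide significance threshold; treat as an adjustable default.
GENOME_WIDE_SIGNIFICANCE = 5e-8


def significant_hits(associations, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Return associations with p-value below the threshold, strongest first."""
    hits = [a for a in associations if a["p_value"] < threshold]
    return sorted(hits, key=lambda a: a["p_value"])


# Placeholder records; identifiers and values are invented for illustration.
associations = [
    {"variant": "variant_1", "p_value": 3e-12, "mapped_gene": "GENE_A"},
    {"variant": "variant_2", "p_value": 4e-6, "mapped_gene": "GENE_B"},
    {"variant": "variant_3", "p_value": 1e-9, "mapped_gene": "GENE_C"},
]

hits = significant_hits(associations)
for hit in hits:
    print(hit["variant"], hit["mapped_gene"], hit["p_value"])
```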

Drug Discovery & Safety

The therapeutic landscape — from compounds to post-marketing surveillance.

PubChem
Open chemistry database with 116M+ compounds. Molecular properties, SMILES structures, InChIKeys, drug-likeness descriptors (XLogP, H-bond donors/acceptors, polar surface area), synonyms, and compound descriptions.
ChEMBL
Bioactivity database with 2M+ compounds, binding affinities, functional assays, and ADMET properties. Essential for understanding the pharmacological landscape around a target.
DGIdb
Drug-Gene Interaction Database aggregating 40+ sources. Returns druggability categories (kinase, clinically actionable, etc.) and known drug interactions for any gene.
Guide to Pharmacology
Curated target-ligand pharmacology from IUPHAR/BPS. Useful for mechanism-of-action summaries, action types, and affinity evidence, with tighter curation than general chemistry databases provide.
GDSC / CancerRxGene
Pharmacogenomic screens from the Genomics of Drug Sensitivity in Cancer project, covering hundreds of cancer cell lines. Useful for compound sensitivity patterns, tissue-specific response, and in vitro IC50/AUC context.
PRISM Repurposing
Broad Institute pooled-cell-line repurposing screen with single-dose log2-fold-change viability readouts across large cancer cell-line panels. Useful for broad viability patterns and fast repurposing-style response scans.
PharmacoDB
Cross-dataset pharmacogenomics portal harmonizing compound-response data from GDSC, PRISM, CTRPv2, and related public screens. Useful when drug response needs to be compared across multiple public resources in one place.
FDA FAERS
Post-marketing adverse event reports from the FDA. Analyze safety signals, compare adverse event profiles across drugs, and identify drug class effects.
RxNorm
Standardized drug nomenclature from the NLM. Maps between brand names, generics, ingredients, and clinical drug forms.
DailyMed
Current FDA Structured Product Labels (SPLs) for US drugs. Useful for boxed warnings, indications, contraindications, and warnings/precautions straight from the label.
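
The PubChem entry above lists drug-likeness descriptors such as XLogP and H-bond donor/acceptor counts. One widely used heuristic built on those descriptors is Lipinski's rule of five (molecular weight at most 500, logP at most 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The sketch below applies that heuristic to descriptors that have already been fetched; it is an illustration, not something the page states the tools do, and the dictionary keys and compound values are hypothetical.

```python
def rule_of_five_violations(descriptors):
    """Return the names of the Lipinski criteria that a compound fails."""
    checks = {
        "molecular_weight": descriptors["molecular_weight"] <= 500,
        "xlogp": descriptors["xlogp"] <= 5,
        "h_bond_donors": descriptors["h_bond_donors"] <= 5,
        "h_bond_acceptors": descriptors["h_bond_acceptors"] <= 10,
    }
    return [name for name, passed in checks.items() if not passed]


# Invented descriptor values for two placeholder compounds.
compound_a = {"molecular_weight": 320.4, "xlogp": 2.1,
              "h_bond_donors": 2, "h_bond_acceptors": 5}
compound_b = {"molecular_weight": 712.9, "xlogp": 6.3,
              "h_bond_donors": 3, "h_bond_acceptors": 12}

violations_a = rule_of_five_violations(compound_a)
violations_b = rule_of_five_violations(compound_b)
print("compound_a:", violations_a)
print("compound_b:", violations_b)
```

Returning the failed criteria rather than a single pass/fail flag keeps the reasoning visible, which matters because the rule is a rough screen for oral bioavailability and not a verdict on a compound.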

Gene & Protein Biology

Understanding molecular function, expression, and interactions.

UniProt
Comprehensive protein knowledgebase with sequences, domains, post-translational modifications, subcellular localization, and functional annotations for every known protein.
GTEx
Tissue-level gene expression from the Genotype-Tissue Expression project. Median TPM values across 54 human tissues from 948 donors — critical for target safety assessment.
Human Protein Atlas
Protein-level tissue specificity, single-cell specificity, protein class, and subcellular localization summaries for human genes. Useful for target validation beyond RNA-only evidence.
Reactome
Curated biological pathways and reactions. Understand the signaling cascades, metabolic pathways, and cellular processes a gene participates in.
STRING
Protein-protein interaction networks combining experimental data, text mining, and computational predictions. Identify interaction partners and functional modules.
IntAct
Curated experimental molecular interactions from EMBL-EBI. Useful when you want publication-backed interaction records and detection methods rather than integrated network predictions.
BioGRID
Large experimental interaction archive covering both physical and genetic interactions. Useful when you want broader publication-backed interaction coverage, throughput context, and partner evidence beyond a narrower curated interaction subset.
Pathway Commons
Integrated pathway and interaction resource aggregating multiple providers into a single queryable graph. Useful for widening pathway context beyond any one source.

Protein Structure

3D structural insights — predicted and experimental.

AlphaFold
AI-predicted protein structures from DeepMind covering 200M+ proteins. Returns pLDDT confidence scores and downloadable PDB/CIF structure files.
RCSB PDB
Experimentally determined structures from X-ray crystallography, cryo-EM, and NMR. Search by gene or UniProt ID to find resolved structures with bound ligands.
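
The pLDDT scores mentioned in the AlphaFold entry run from 0 to 100 and are usually read in four bands: above 90 (very high), 70 to 90 (confident), 50 to 70 (low), and below 50 (very low). The sketch below bins a list of per-residue scores into those bands. The function names and the example scores are hypothetical, and how values exactly on a boundary are assigned is an assumption of this sketch.

```python
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")


def plddt_band(score):
    """Map one pLDDT score (0-100) to its confidence band."""
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def band_fractions(scores):
    """Return the fraction of residues that fall in each band."""
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts.get(band, 0) / len(scores) for band in BANDS}


# Invented per-residue scores for illustration.
scores = [95.2, 88.1, 72.4, 65.0, 41.3, 91.7, 30.9, 55.5]
fractions = band_fractions(scores)
print(fractions)
```

A summary like this gives a quick sense of how much of a predicted structure is reliable enough for tasks such as pocket analysis before comparing it with experimental structures.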

Genomic Variation, Phenotypes & Ontologies

Population genetics, phenotype normalization, and rare-disease grounding.

gnomAD
Population variant frequencies from 76K+ genomes and 125K+ exomes. Essential for distinguishing rare pathogenic variants from common benign polymorphisms.
1000 Genomes
Reference catalog of human genetic variation across 26 populations. Foundation for understanding population-level diversity and ancestry-specific variants.
ClinVar
Clinical significance classifications for human variants — pathogenic, benign, uncertain significance. Links variants to conditions with submitter-level evidence.
Ensembl VEP
Variant Effect Predictor returning functional consequence types, SIFT and PolyPhen deleteriousness scores, and AlphaMissense pathogenicity predictions.
MyVariant.info
Aggregated variant annotations pulling from ClinVar, CADD, dbSNP, gnomAD, and COSMIC in a single lookup. Quick comprehensive view of any variant.
MyGene.info
Fast gene identifier normalization service for symbols, aliases, Entrez IDs, Ensembl IDs, and UniProt IDs. Useful for joining evidence across heterogeneous APIs.
RefSeq
NCBI curated reference sequences for transcripts, non-coding RNAs, chromosomes, and proteins (NM/NR/NC/NG/NP accessions). Tools search nuccore and protein indices with refseq[filter] and return accession-level metadata plus links.
UCSC Genome Browser
Reference genome assemblies (hg38, hg19, mouse, and more) with a public REST API for search, interval sequences, and track rows. Tools resolve gene symbols to coordinates, fetch DNA for a locus, and query named tracks within bounded windows.
ENCODE
Encyclopedia of DNA Elements: functional genomics metadata and files (ChIP-seq, DNase-seq, RNA-seq, and more). Tools query the ENCODE REST API for experiments and related objects, then fetch accession-level JSON with portal links.
OxO
Ontology cross-reference service from EMBL-EBI. Bridges MONDO, EFO, DOID, MeSH, OMIM, UMLS, and related identifier systems for safer cross-database joins.
QuickGO
Gene Ontology term search and annotation service. Supports GO term discovery plus reviewed GO annotations for gene products with evidence codes and references.
Human Phenotype Ontology (HPO)
Standard phenotype vocabulary for rare disease and clinical genomics. Useful for normalizing phenotype terms like ataxia, microcephaly, and seizures before cross-source joins.
Orphanet / ORDO
Reference rare-disease catalog and ontology with disease definitions, xrefs, inheritance, age of onset, phenotype annotations, and curated disease-gene links.
Monarch Initiative
Phenotype-centric knowledge graph spanning genes, diseases, phenotypes, and model-organism evidence. Useful for phenotype-to-gene and disease-to-phenotype association queries.
Alliance Genome Resources
Integrated cross-species knowledge platform spanning human and model-organism genomes. Useful for orthologs, disease and phenotype summaries, and model-organism disease evidence that complements human-only sources.
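
The RefSeq entry above names five accession prefixes (NM, NR, NC, NG, NP). When joining records across sources it helps to classify an accession before deciding which index to query. The sketch below parses an accession string into its prefix, number, and optional version. The `parse_refseq` helper is hypothetical, it covers only the prefixes listed above (any other prefix returns `None`), and the example accessions are used only for their format.

```python
import re

# Molecule type for each prefix named in the RefSeq entry above.
REFSEQ_TYPES = {
    "NM": "mRNA transcript",
    "NR": "non-coding RNA",
    "NC": "chromosome or complete genomic molecule",
    "NG": "genomic region",
    "NP": "protein",
}

_ACCESSION = re.compile(r"^(NM|NR|NC|NG|NP)_(\d+)(?:\.(\d+))?$")


def parse_refseq(accession):
    """Split a RefSeq accession into prefix, number, and version, or return None."""
    match = _ACCESSION.match(accession.strip())
    if match is None:
        return None
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "molecule": REFSEQ_TYPES[prefix],
        "number": number,
        "version": int(version) if version else None,
    }


for text in ["NM_000546.6", "NP_000537", "NC_000017.11", "not-an-accession"]:
    print(text, "->", parse_refseq(text))
```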

Specialized Domains

Advanced and niche applications.

Allen Brain Atlas
Reference neuroanatomy and gene expression atlases, including region-level structure ontology and in situ hybridization expression profiles for mouse brain. Supports differential expression and structure-focused queries.
EBRAINS Knowledge Graph
Curated neuroscience knowledge graph spanning datasets, models, software, workflows, and contributors. Useful for discovering reusable brain research assets with rich metadata and provenance.
Zenodo
General-purpose open repository for datasets, software, posters, and publications with DOIs (often 10.5281/zenodo.*). Tools query the public JSON API for discovery and retrieve record metadata with file links.
CONP Datasets
Canadian Open Neuroscience Platform datasets discoverable through the CONP ecosystem and the `conpdatasets` public catalog. Useful for finding reusable neuroscience repositories and linking to dataset documentation and terms.
Neurobagel
Federated cohort discovery ecosystem with harmonized phenotype and imaging metadata. The public node API enables filtered cohort queries across indexed datasets without requiring direct data download.
OpenNeuro
Primary open platform for sharing neuroimaging data in BIDS format. Search datasets by modality (MRI, MEG, EEG, PET) and retrieve metadata, DOIs, and snapshot information.
DANDI Archive
BRAIN Initiative archive for cellular neurophysiology: electrophysiology, calcium imaging, behavioral time-series, immunostaining. NWB/BIDS format with searchable metadata by keyword.
NEMAR
NeuroElectroMagnetic data Archive for EEG, MEG, and iEEG from OpenNeuro. BIDS format, HED event descriptions, NSG compute integration. Hosted at SDSC.
Brain-CODE
Ontario Brain Institute neuroinformatics platform: clinical, MRI, EEG, genomic data for epilepsy, depression, neurodegenerative disease, cerebral palsy, concussion. Public and controlled releases via braincode.ca and CONP.
ENIGMA Consortium
Imaging genetics meta-analysis: 100+ case-control summary statistics for schizophrenia, depression, ADHD, bipolar disorder, OCD, autism, epilepsy, and Parkinson's disease. Cortical thickness, subcortical volume, and surface area via the ENIGMA Toolbox.
cBioPortal
Cancer genomics from the TCGA Pan-Cancer Atlas (32 tumor types, ~10K samples). Mutation frequencies, hotspot protein changes, and mutation type breakdowns by cancer.
DepMap
Cancer Dependency Map target-vulnerability resource. Public releases expose CRISPR/RNAi dependency fractions, pan-dependency/selectivity metrics, and predictive biomarkers for target prioritization.
BioGRID ORCS
Open Repository of CRISPR Screens from BioGRID. Useful for published screen-level hit status, phenotype labels, cell-line context, methodologies, and score summaries that complement release-level dependency resources.
CELLxGENE Discover / Census
Public single-cell dataset catalog and Census ecosystem from CZ CELLxGENE. Useful for discovering datasets by disease, tissue, assay, organism, and annotated cell types.
IEDB
Immune Epitope Database with experimentally characterized B-cell and T-cell epitopes, MHC binding data, and T-cell receptor sequences for immunology research.
LINCS L1000
Library of Integrated Network-based Cellular Signatures. Gene expression profiles measured after chemical and genetic perturbations — for drug repurposing and mechanism of action studies.
SureChEMBL
Chemical structures automatically extracted from patent literature. Search the patent landscape for compounds related to your target or chemical series.
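
The Zenodo entry above notes that its DOIs often follow the pattern 10.5281/zenodo.*. When a report cites a dataset by DOI, pulling out the numeric record identifier is a small but useful normalization step. The sketch below does that with a regular expression. The `zenodo_record_id` helper and the record number are hypothetical, and DOIs that do not match the pattern return `None` because the page says "often", not "always".

```python
import re

# Matches a bare Zenodo DOI, optionally written as a doi.org URL or with a "doi:" prefix.
_ZENODO_DOI = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?10\.5281/zenodo\.(\d+)$",
    re.IGNORECASE,
)


def zenodo_record_id(doi):
    """Return the numeric record ID from a Zenodo-style DOI, or None if it does not match."""
    match = _ZENODO_DOI.match(doi.strip())
    return int(match.group(1)) if match else None


# Placeholder DOIs; the record number is invented.
examples = [
    "10.5281/zenodo.1234567",
    "https://doi.org/10.5281/zenodo.1234567",
    "doi:10.5281/ZENODO.1234567",
    "10.1000/example.42",
]
for doi in examples:
    print(doi, "->", zenodo_record_id(doi))
```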