GnistAI

Getting Started with PubChem

Chemical compound data — structures, properties, bioactivity, and safety information.


Data source: PubChem (NCBI)

Overview

This GnistAI data source wraps PubChem (NCBI) and handles authentication, pagination, and rate limits for you. This tutorial covers all 5 tools, with code examples you can copy and run.

Prerequisites

  1. Sign up at https://context.gnist.ai/signup for a free API key (100 calls/day).
  2. Choose your integration method: MCP protocol or REST API.
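The free tier allows 100 calls per day. If you script against it, a small client-side counter can stop you from burning the quota by accident. The sketch below is a hypothetical helper, not part of any GnistAI SDK, and it assumes the quota resets at midnight UTC, which this tutorial does not state.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


class CallBudget:
    """Client-side guard for a daily call quota (hypothetical helper).

    Assumes the quota window is one UTC calendar day; adjust if the
    provider documents a different reset rule.
    """

    def __init__(self, daily_limit: int = 100,
                 clock: Optional[Callable[[], datetime]] = None) -> None:
        self.daily_limit = daily_limit
        # Injectable clock so the reset logic can be tested.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._day = self._clock().date()
        self._used = 0

    def _roll_over(self) -> None:
        # Reset the counter when the (assumed UTC) date changes.
        today = self._clock().date()
        if today != self._day:
            self._day = today
            self._used = 0

    @property
    def remaining(self) -> int:
        self._roll_over()
        return self.daily_limit - self._used

    def spend(self) -> None:
        """Record one call, or raise if today's budget is exhausted."""
        self._roll_over()
        if self._used >= self.daily_limit:
            raise RuntimeError(f"daily budget of {self.daily_limit} calls used up")
        self._used += 1
```

Call budget.spend() right before each request; a RuntimeError then tells you to stop before the server starts rejecting calls.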

Connect via MCP

Add to your MCP client config (Claude Desktop, Cursor, etc.):

MCP Config
{
  "mcpServers": {
    "gnist-pubchem": {
      "url": "https://context.gnist.ai/mcp/pubchem/",
      "headers": {
        "Gnist-API-Key": "YOUR_API_KEY"
      }
    }
  }
}
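To avoid pasting your key into a config file by hand, you can generate the snippet above from an environment variable. This is a sketch: the variable name GNIST_API_KEY is an assumption, and where your MCP client reads its config from varies by client.

```python
import json
import os


def build_mcp_config(api_key: str) -> dict:
    """Return the MCP client config for the Gnist PubChem server."""
    return {
        "mcpServers": {
            "gnist-pubchem": {
                "url": "https://context.gnist.ai/mcp/pubchem/",
                "headers": {"Gnist-API-Key": api_key},
            }
        }
    }


if __name__ == "__main__":
    # GNIST_API_KEY is a hypothetical variable name; use whatever you export.
    key = os.environ.get("GNIST_API_KEY", "YOUR_API_KEY")
    print(json.dumps(build_mcp_config(key), indent=2))
```

Redirect the output into your client's config file, or merge the "gnist-pubchem" entry into an existing mcpServers object.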

Tools (5)

get_compound

Look up a chemical compound by name, CID, InChIKey, or SMILES. PubChem covers 100M+ chemical structures. Returns the molecular formula, weight, SMILES, InChI, IUPAC name, and key physicochemical descriptors.

Args:
  identifier: The compound identifier. Examples:
    - By name: "aspirin", "caffeine", "glucose"
    - By CID: "2244" (aspirin's PubChem CID)
    - By InChIKey: "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
    - By SMILES: "CC(=O)Oc1ccccc1C(=O)O"
  namespace: How to interpret the identifier. One of "name" (default), "cid", "inchikey", "smiles".

Returns: A compound record with cid, iupac_name, molecular_formula, molecular_weight, canonical_smiles, isomeric_smiles, inchi, inchikey, xlogp, exact_mass, tpsa, hbond_donors, hbond_acceptors, rotatable_bonds, heavy_atom_count, charge, and complexity.
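Because namespace defaults to "name", a bare CID such as "2244" is interpreted as a name unless you also pass namespace "cid". A hypothetical helper like the one below can pick the namespace from the identifier's shape. The rules are rough heuristics of our own, not PubChem's: all digits means a CID, the standard 27-character InChIKey layout means an InChIKey, and characters such as "=", "#" or "@" mean SMILES. Short SMILES like "c1ccccc1" look like names, so pass namespace explicitly when in doubt.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# Characters common in SMILES but rare in compound names (heuristic only;
# parentheses are excluded because IUPAC names use them too).
SMILES_CHARS = set("=#@/\\")


def guess_namespace(identifier: str) -> str:
    """Heuristically choose a get_compound namespace for an identifier."""
    ident = identifier.strip()
    if ident.isdigit():
        return "cid"
    if INCHIKEY_RE.match(ident):
        return "inchikey"
    if any(ch in SMILES_CHARS for ch in ident):
        return "smiles"
    return "name"


def compound_arguments(identifier: str) -> dict:
    """Build the arguments object for a get_compound tools/call request."""
    return {"identifier": identifier.strip(), "namespace": guess_namespace(identifier)}
```

For example, compound_arguments("2244") yields {"identifier": "2244", "namespace": "cid"}, which avoids the name lookup entirely.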

Parameter | Type | Required | Description
identifier | string | required | The compound identifier: a name, CID, InChIKey, or SMILES string (see the examples above).
namespace | string | optional | How to interpret the identifier: "name", "cid", "inchikey", or "smiles". Default: "name".
curl -X POST "https://context.gnist.ai/mcp/pubchem/" \
  -H "Content-Type: application/json" \
  -H "Gnist-API-Key: YOUR_API_KEY" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "get_compound", "arguments": {"identifier": "2244", "namespace": "cid"}}}'
import httpx

# Call the get_compound tool over MCP (JSON-RPC 2.0), looking up aspirin by its PubChem CID.
resp = httpx.post(
    "https://context.gnist.ai/mcp/pubchem/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "get_compound",
            "arguments": {"identifier": "2244", "namespace": "cid"},
        },
    },
)
resp.raise_for_status()  # raise on HTTP 4xx/5xx instead of printing an error body
print(resp.json())
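Every example in this tutorial sends the same JSON-RPC 2.0 envelope and differs only in the tool name and arguments. The helpers below build that envelope and unpack the reply. They assume the standard MCP tools/call result shape (a result object holding a content list of {"type": "text", "text": ...} items plus an optional isError flag) and that each tool's text payload is JSON; check a real response before relying on either assumption. tool_call and tool_result are hypothetical names.

```python
import itertools
import json
from typing import Any

_ids = itertools.count(1)  # JSON-RPC request ids should be unique within a session


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call request body."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def tool_result(reply: dict) -> Any:
    """Return the parsed payload of a tools/call reply, or raise on errors."""
    if "error" in reply:  # protocol-level failure (unknown method, bad params, ...)
        err = reply["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = reply["result"]
    texts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool error: " + " ".join(texts))
    payload = "".join(texts)
    try:
        return json.loads(payload)  # assumption: tools return JSON text
    except json.JSONDecodeError:
        return payload  # fall back to the raw text
```

With these, a request becomes httpx.post(url, headers=headers, json=tool_call("get_compound", {"identifier": "2244", "namespace": "cid"})), followed by tool_result(resp.json()).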

search_compounds

Search PubChem for chemical compounds by name or keyword. Returns a compact list of matching compounds (CID, IUPAC name, formula, weight). For full details, pass a returned CID to get_compound with namespace "cid".

Args:
  query: Chemical name or keyword (e.g., "aspirin", "beta-lactam", "serotonin").
  max_results: Number of results to return (1–50, default 10).

Returns: A list of matching compounds with cid, iupac_name, molecular_formula, and molecular_weight.
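Because search_compounds returns a compact list, it is easy to print as a table before choosing a CID. A minimal sketch, assuming the tool's payload has already been parsed into a list of dicts with the four documented keys; format_hits is a hypothetical helper name.

```python
def format_hits(hits: list) -> str:
    """Render search_compounds hits as a fixed-width text table."""
    header = f"{'CID':>10}  {'Formula':<14}{'MW':>10}  IUPAC name"
    lines = [header, "-" * len(header)]
    for hit in hits:
        # float() accepts both numbers and numeric strings for the weight.
        weight = float(hit["molecular_weight"])
        name = hit.get("iupac_name") or "(none)"
        lines.append(f"{hit['cid']:>10}  {hit['molecular_formula']:<14}{weight:>10.2f}  {name}")
    return "\n".join(lines)
```

Print format_hits(hits) after a search, then pick the CID you want to drill into.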

Parameter | Type | Required | Description
query | string | required | Chemical name or keyword (e.g., "aspirin", "beta-lactam", "serotonin").
max_results | integer | optional | Number of results to return, 1–50. Default: 10.
curl -X POST "https://context.gnist.ai/mcp/pubchem/" \
  -H "Content-Type: application/json" \
  -H "Gnist-API-Key: YOUR_API_KEY" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "search_compounds", "arguments": {"query": "serotonin", "max_results": 5}}}'
import httpx

# Call the search_compounds tool over MCP (JSON-RPC 2.0) and keep the result list short.
resp = httpx.post(
    "https://context.gnist.ai/mcp/pubchem/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "search_compounds",
            "arguments": {"query": "serotonin", "max_results": 5},
        },
    },
)
resp.raise_for_status()  # raise on HTTP 4xx/5xx instead of printing an error body
print(resp.json())

get_compound_properties

Fetch physicochemical properties for a PubChem compound by CID. Returns a richer property set than get_compound, including stereochemistry counts, covalent units, and monoisotopic mass.

Args:
  cid: PubChem Compound ID (e.g., 2244 for aspirin), as found in search_compounds results.
  properties: Optional list of specific property names to fetch. If omitted, all standard properties are returned. Available properties include: MolecularFormula, MolecularWeight, CanonicalSMILES, IsomericSMILES, InChI, InChIKey, IUPACName, XLogP, ExactMass, MonoisotopicMass, TPSA, Complexity, Charge, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, HeavyAtomCount, CovalentUnitCount, AtomStereoCount, BondStereoCount.

Returns: A dict mapping each requested property name to its value.
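Misspelled property names are an easy way to waste a call. A small client-side check against the names listed above can catch typos before the request goes out. This is a sketch: the set copies only the names this tutorial lists ("Available properties include"), so the server may accept others, and property_arguments is a hypothetical helper.

```python
from typing import Optional, Sequence

DOCUMENTED_PROPERTIES = frozenset({
    "MolecularFormula", "MolecularWeight", "CanonicalSMILES", "IsomericSMILES",
    "InChI", "InChIKey", "IUPACName", "XLogP", "ExactMass", "MonoisotopicMass",
    "TPSA", "Complexity", "Charge", "HBondDonorCount", "HBondAcceptorCount",
    "RotatableBondCount", "HeavyAtomCount", "CovalentUnitCount",
    "AtomStereoCount", "BondStereoCount",
})


def property_arguments(cid: int, properties: Optional[Sequence[str]] = None) -> dict:
    """Build get_compound_properties arguments, rejecting unlisted names."""
    if not isinstance(cid, int) or cid <= 0:
        raise ValueError(f"cid must be a positive integer, got {cid!r}")
    args: dict = {"cid": cid}
    if properties:  # omit the key entirely to get all standard properties
        unknown = sorted(set(properties) - DOCUMENTED_PROPERTIES)
        if unknown:
            raise ValueError(f"unlisted property names: {', '.join(unknown)}")
        args["properties"] = list(properties)
    return args
```

Note that the names are case-sensitive here: "Xlogp" is rejected while "XLogP" passes.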

Parameter | Type | Required | Description
cid | integer | required | PubChem Compound ID (e.g., 2244 for aspirin). Found in search_compounds results.
properties | any | optional | A list of property names to fetch (see the list above). If omitted, all standard properties are returned.
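curl -X POST "https://context.gnist.ai/mcp/pubchem/" \
  -H "Content-Type: application/json" \
  -H "Gnist-API-Key: YOUR_API_KEY" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "get_compound_properties", "arguments": {"cid": 2244, "properties": ["MolecularFormula", "XLogP", "TPSA"]}}}'
import httpx

# Call get_compound_properties for aspirin (CID 2244), requesting three properties.
resp = httpx.post(
    "https://context.gnist.ai/mcp/pubchem/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "get_compound_properties",
            "arguments": {"cid": 2244, "properties": ["MolecularFormula", "XLogP", "TPSA"]},
        },
    },
)
resp.raise_for_status()  # raise on HTTP 4xx/5xx instead of printing an error body
print(resp.json())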

find_similar_compounds

Find structurally similar compounds using 2D Tanimoto similarity, via PubChem's fast 2D fingerprint similarity search. Useful for identifying drug analogs, structural scaffolds, or related molecules.

Args:
  cid: PubChem Compound ID to use as the query structure.
  threshold: Tanimoto similarity threshold (0–100, default 90). Higher values return only very close structural matches; 90 is a common threshold for drug analog searches.
  max_results: Number of similar compounds to return (1–50, default 10).

Returns: A list of similar compounds with cid, iupac_name, molecular_formula, and molecular_weight.
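The threshold is a Tanimoto coefficient expressed as a percentage. For two fingerprints A and B, each treated as the set of its "on" bits, Tanimoto = |A ∩ B| / |A ∪ B|, so 100 means identical fingerprints. The toy sketch below computes it on small bit sets to show what a threshold of 90 demands; it does not reproduce PubChem's actual fingerprints, which the server computes for you, and it assumes the threshold is inclusive.

```python
def tanimoto_percent(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprint bit sets, scaled to 0-100."""
    union = a | b
    if not union:  # two empty fingerprints: treat as identical
        return 100.0
    return 100.0 * len(a & b) / len(union)


def passes_threshold(a: set, b: set, threshold: int = 90) -> bool:
    """Keep pairs at or above the threshold (inclusivity is an assumption)."""
    return tanimoto_percent(a, b) >= threshold
```

For example, a 10-bit query and a candidate that has lost one of those bits score exactly 90, so even the default threshold tolerates very little structural change.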

Parameter | Type | Required | Description
cid | integer | required | PubChem Compound ID to use as the query structure.
threshold | integer | optional | Tanimoto similarity threshold, 0–100. Higher values return only very close structural matches; 90 is a common threshold for drug analog searches. Default: 90.
max_results | integer | optional | Number of similar compounds to return, 1–50. Default: 10.
curl -X POST "https://context.gnist.ai/mcp/pubchem/" \
  -H "Content-Type: application/json" \
  -H "Gnist-API-Key: YOUR_API_KEY" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "find_similar_compounds", "arguments": {"cid": 2244, "threshold": 95, "max_results": 5}}}'
import httpx

# Call find_similar_compounds for close analogs of aspirin (CID 2244).
resp = httpx.post(
    "https://context.gnist.ai/mcp/pubchem/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "find_similar_compounds",
            "arguments": {"cid": 2244, "threshold": 95, "max_results": 5},
        },
    },
)
resp.raise_for_status()  # raise on HTTP 4xx/5xx instead of printing an error body
print(resp.json())

report_feedback

Report a bug, feature request, or general feedback for this data source. Use it when something doesn't work as expected, when you'd like a new feature, or when you have suggestions for improvement.

Args:
  feedback: A description of the issue or suggestion.
  feedback_type: One of "bug", "feature_request", or "general" (default).
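Since feedback_type accepts only three values, a client-side check keeps malformed reports from being sent. A sketch only: feedback_arguments is a hypothetical name, and the empty-text check is our own rule, since this tutorial does not say how the server treats blank feedback.

```python
FEEDBACK_TYPES = ("bug", "feature_request", "general")


def feedback_arguments(feedback: str, feedback_type: str = "general") -> dict:
    """Build report_feedback arguments, checking the documented feedback types."""
    text = feedback.strip()
    if not text:  # our own rule: refuse to send blank reports
        raise ValueError("feedback must not be empty")
    if feedback_type not in FEEDBACK_TYPES:
        raise ValueError(f"feedback_type must be one of {FEEDBACK_TYPES}, got {feedback_type!r}")
    return {"feedback": text, "feedback_type": feedback_type}
```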

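Parameter | Type | Required | Description
feedback | string | required | Describe the issue or suggestion.
feedback_type | string | optional | One of "bug", "feature_request", or "general". Default: "general".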
curl -X POST "https://context.gnist.ai/mcp/pubchem/" \
  -H "Content-Type: application/json" \
  -H "Gnist-API-Key: YOUR_API_KEY" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "report_feedback", "arguments": {"feedback": "example", "feedback_type": "general"}}}'
import httpx

# Call the report_feedback tool over MCP (JSON-RPC 2.0).
resp = httpx.post(
    "https://context.gnist.ai/mcp/pubchem/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "report_feedback",
            "arguments": {"feedback": "example", "feedback_type": "general"},
        },
    },
)
resp.raise_for_status()  # raise on HTTP 4xx/5xx instead of printing an error body
print(resp.json())

Common Patterns

Search then retrieve
Use search_compounds to find candidate CIDs, then call get_compound (with namespace "cid") or get_compound_properties on the CID you want for full details. This two-step pattern is useful for exploring data before drilling down.
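The two steps can be wired together with any function that sends a tools/call request and returns the parsed payload. In this sketch, call_tool is a hypothetical parameter you supply (for example, httpx.post plus JSON decoding), and the code assumes the search payload is a list of dicts with a cid key, as documented for search_compounds. Each loop iteration costs one more API call, so keep top small on the free tier.

```python
from typing import Any, Callable

# call_tool(name, arguments) -> parsed payload; inject your own transport.
ToolCaller = Callable[[str, dict], Any]


def search_then_retrieve(call_tool: ToolCaller, query: str, top: int = 3) -> list:
    """Search PubChem, then fetch full records for the first `top` hits."""
    hits = call_tool("search_compounds", {"query": query, "max_results": top})
    records = []
    for hit in hits[:top]:
        # Look up by CID explicitly; the default namespace is "name".
        args = {"identifier": str(hit["cid"]), "namespace": "cid"}
        records.append(call_tool("get_compound", args))
    return records
```

Injecting call_tool also makes the pipeline easy to test with a fake that returns canned payloads.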
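Result limits
None of the tools take limit, offset, or page parameters. Instead, search_compounds and find_similar_compounds cap their output with max_results (1–50, default 10). Start with small values during development, then raise them for production queries.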

FAQ

What data does PubChem provide?

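PubChem (NCBI) holds chemical compound data: structures, properties, bioactivity, and safety information. This data source exposes 5 tools: get_compound, search_compounds, get_compound_properties, and find_similar_compounds cover compound lookup, keyword search, property retrieval, and structural similarity search, and report_feedback sends feedback to the maintainers.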

What do I need to get started?

A Gnist API key (free tier: 100 calls/day). Sign up at https://context.gnist.ai/signup.

What format does the PubChem API return?

JSON, via either MCP protocol (JSON-RPC 2.0) or REST API.
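Under the MCP Streamable HTTP transport, a server may answer a POST either with a plain application/json body or with a text/event-stream body whose data: lines carry the JSON-RPC message. If resp.json() ever fails on a reply, a helper like the hypothetical decode_mcp_body below can handle both forms. It is a simplified reader that assumes one JSON message per event and ignores event names, ids, and comment lines.

```python
import json
from typing import Any


def decode_mcp_body(content_type: str, body: str) -> list:
    """Return the JSON-RPC message(s) in an MCP HTTP response body."""
    if content_type.split(";")[0].strip().lower() != "text/event-stream":
        return [json.loads(body)]  # plain JSON reply
    messages: list[Any] = []
    data_lines: list[str] = []
    for line in body.splitlines() + [""]:  # trailing "" flushes the last event
        if line.startswith("data:"):
            value = line[5:]
            # SSE strips exactly one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:  # a blank line ends an SSE event
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    return messages
```

Call it as decode_mcp_body(resp.headers.get("content-type", ""), resp.text) and read the reply from the returned list.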
