Data source: Wayback Machine
Overview
This data source wraps the Internet Archive's Wayback Machine, handling authentication, pagination, and rate limits for you. This tutorial covers all 5 tools with working code examples you can copy and run.
Prerequisites
- Sign up at https://context.gnist.ai/signup for a free API key (100 calls/day).
- Choose your integration method: MCP protocol or REST API.
Connect via MCP
Add to your MCP client config (Claude Desktop, Cursor, etc.):
{
  "mcpServers": {
    "gnist-wayback": {
      "url": "https://context.gnist.ai/mcp/wayback/",
      "headers": {
        "Gnist-API-Key": "YOUR_API_KEY"
      }
    }
  }
}
Tools (5)
check_availability
Check if a URL has been archived in the Wayback Machine and find the closest snapshot. Returns a dict with 'url', 'available' (bool), and 'closest' snapshot if available. The 'wayback_url' field contains the direct Wayback Machine link to the archived page.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | required | The URL to look up (e.g. "https://example.com/page"). |
| timestamp | any | optional | Target date/time (YYYY-MM-DD or YYYYMMDD). When provided, the Wayback Machine returns the snapshot closest to that date. Omit to get the most recent snapshot. |
curl -X POST "https://context.gnist.ai/mcp/wayback/" \
-H "Content-Type: application/json" \
-H "Gnist-API-Key: YOUR_API_KEY" \
-d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "check_availability", "arguments": {"url": "https://example.com/page"}}}'
import httpx
resp = httpx.post(
    "https://context.gnist.ai/mcp/wayback/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "check_availability",
            "arguments": {"url": "https://example.com/page"},
        },
    },
)
print(resp.json())
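When several tools are called from the same script, the JSON-RPC envelope used in the request above can be factored into a small helper. The following is a minimal sketch of that idea: `build_tool_call` and `check_availability_payload` are hypothetical names, not part of the service, and the timestamp check only mirrors the two formats listed in the parameter table.

```python
import re

# Accepts the two documented timestamp formats: YYYY-MM-DD or YYYYMMDD.
_TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}|\d{8}")


def build_tool_call(name, arguments, request_id=1):
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def check_availability_payload(url, timestamp=None):
    """Build the request body for check_availability (hypothetical helper)."""
    arguments = {"url": url}
    if timestamp is not None:
        # Reject anything that is not in one of the documented formats.
        if not _TIMESTAMP_RE.fullmatch(timestamp):
            raise ValueError("timestamp must be YYYY-MM-DD or YYYYMMDD")
        arguments["timestamp"] = timestamp
    return build_tool_call("check_availability", arguments)


payload = check_availability_payload("https://example.com/page", "2023-06-01")
print(payload["params"])
```

The returned dict can be passed as the `json=` argument of the same `httpx.post` call used in the example above.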
search_snapshots
Search the Wayback Machine CDX index for archived snapshots of a URL. Returns a deduplicated list of snapshots sorted chronologically. Useful for tracking how a page has changed over time or verifying when content was live. Each snapshot has url, wayback_url, timestamp, date, status_code, mimetype, and length (bytes). Use 'wayback_url' with a fetch tool to retrieve the actual archived page content.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | required | The URL to search (e.g. "https://example.com/page"). Use wildcard prefix matching with "example.com/*" to capture all pages on a domain. |
| date_from | any | optional | Only include captures on or after this date (YYYY-MM-DD). |
| date_to | any | optional | Only include captures on or before this date (YYYY-MM-DD). |
| limit | integer | optional | Maximum snapshots to return (1-100, default 20). |
| status_code | any | optional | Filter by HTTP status at capture time (e.g. "200", "404"). |
| mimetype | any | optional | Filter by MIME type (e.g. "text/html", "application/pdf"). |
curl -X POST "https://context.gnist.ai/mcp/wayback/" \
-H "Content-Type: application/json" \
-H "Gnist-API-Key: YOUR_API_KEY" \
-d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "search_snapshots", "arguments": {"url": "https://example.com/page"}}}'
import httpx
resp = httpx.post(
    "https://context.gnist.ai/mcp/wayback/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "search_snapshots",
            "arguments": {"url": "https://example.com/page"},
        },
    },
)
print(resp.json())
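The basic request above passes only `url`. The sketch below shows one way the optional filters from the parameter table might be assembled. `search_snapshots_arguments` is a hypothetical helper, and sending `status_code` as a string is an assumption based on the quoted examples in the table ("200", "404").

```python
from datetime import date


def search_snapshots_arguments(url, date_from=None, date_to=None, limit=20,
                               status_code=None, mimetype=None):
    """Build the arguments dict for search_snapshots (hypothetical helper)."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if date_from and date_to and date_from > date_to:
        raise ValueError("date_from must not be after date_to")
    arguments = {"url": url, "limit": limit}
    # date.isoformat() always yields the documented YYYY-MM-DD form.
    if date_from is not None:
        arguments["date_from"] = date_from.isoformat()
    if date_to is not None:
        arguments["date_to"] = date_to.isoformat()
    # Assumption: filters are sent as strings, as in the table's examples.
    if status_code is not None:
        arguments["status_code"] = str(status_code)
    if mimetype is not None:
        arguments["mimetype"] = mimetype
    return arguments


args = search_snapshots_arguments(
    "example.com/*",
    date_from=date(2023, 1, 1),
    date_to=date(2023, 6, 30),
    limit=50,
    status_code=200,
    mimetype="text/html",
)
print(args)
```

Taking `datetime.date` objects rather than raw strings keeps date formatting in one place and makes the range check a plain comparison.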
get_capture_calendar
Get monthly capture counts for a URL in a given year from the Wayback Machine. Returns total captures for the year and a breakdown by month (YYYY-MM keys). Useful for understanding archival coverage and activity patterns for a URL.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | required | The URL to get capture calendar for (e.g. "https://example.com"). |
| year | integer | required | Year to retrieve calendar for (e.g. 2023). |
curl -X POST "https://context.gnist.ai/mcp/wayback/" \
-H "Content-Type: application/json" \
-H "Gnist-API-Key: YOUR_API_KEY" \
-d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "get_capture_calendar", "arguments": {"url": "https://example.com", "year": 2023}}}'
import httpx
resp = httpx.post(
    "https://context.gnist.ai/mcp/wayback/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "get_capture_calendar",
            "arguments": {"url": "https://example.com", "year": 2023},
        },
    },
)
print(resp.json())
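Once the monthly breakdown has been extracted from the response, finding the most active months is a short sort. The sketch below assumes the breakdown is available as a dict mapping YYYY-MM keys to integer counts, as the description suggests; the exact field name that holds it in the response is not documented here, so the example starts from the dict itself. `busiest_months` is a hypothetical helper and the sample counts are made up.

```python
def busiest_months(monthly_counts, top=3):
    """Return the `top` (month, count) pairs with the most captures.

    Ties are broken by month so the result is deterministic.
    """
    ranked = sorted(monthly_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Made-up counts for illustration only; not real Wayback Machine data.
sample = {"2023-01": 4, "2023-02": 0, "2023-03": 9, "2023-07": 9, "2023-11": 2}
print(busiest_months(sample, top=2))  # [('2023-03', 9), ('2023-07', 9)]
```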
get_capture_stats
Get yearly capture statistics for a URL from the Wayback Machine. Returns total captures across all years, the first and last capture years, and a per-year breakdown. Useful for getting an overview of how long and how frequently a URL has been archived.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | required | The URL to get capture statistics for (e.g. "https://example.com"). |
curl -X POST "https://context.gnist.ai/mcp/wayback/" \
-H "Content-Type: application/json" \
-H "Gnist-API-Key: YOUR_API_KEY" \
-d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "get_capture_stats", "arguments": {"url": "https://example.com"}}}'
import httpx
resp = httpx.post(
    "https://context.gnist.ai/mcp/wayback/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "get_capture_stats",
            "arguments": {"url": "https://example.com"},
        },
    },
)
print(resp.json())
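A per-year breakdown also makes it easy to spot gaps in archival coverage. The sketch below assumes the breakdown has already been pulled out of the response as a dict mapping year to capture count (keys may be strings or integers); the response field names are not documented here, so that shape is an assumption. `years_without_captures` is a hypothetical helper and the sample counts are made up.

```python
def years_without_captures(per_year):
    """List the years between the first and last entry that have no captures."""
    counts = {int(year): count for year, count in per_year.items()}
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # A year counts as a gap if it is missing from the breakdown or listed as 0.
    return [year for year in range(first, last + 1) if counts.get(year, 0) == 0]


# Made-up counts for illustration only; not real Wayback Machine data.
sample = {"2018": 12, "2019": 0, "2021": 3, "2022": 7}
print(years_without_captures(sample))  # [2019, 2020]
```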
report_feedback
Report a bug, feature request, or general feedback for this data source. Use this when something doesn't work as expected, when you'd like a new feature, or when you have suggestions for improvement.
| Parameter | Type | Required | Description |
|---|---|---|---|
| feedback | string | required | Describe the issue or suggestion. |
| feedback_type | string | optional | One of 'bug', 'feature_request', or 'general' (default: general). |
curl -X POST "https://context.gnist.ai/mcp/wayback/" \
-H "Content-Type: application/json" \
-H "Gnist-API-Key: YOUR_API_KEY" \
-d '{"jsonrpc": "2.0", "method": "tools/call", "id": 1, "params": {"name": "report_feedback", "arguments": {"feedback": "example"}}}'
import httpx
resp = httpx.post(
    "https://context.gnist.ai/mcp/wayback/",
    headers={"Gnist-API-Key": "YOUR_API_KEY"},
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "report_feedback",
            "arguments": {"feedback": "example"},
        },
    },
)
print(resp.json())
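Since `feedback_type` accepts only three values, it can be worth checking the value locally before spending an API call. A minimal sketch, where `report_feedback_arguments` is a hypothetical helper and rejecting empty feedback text is an added assumption rather than documented server behaviour:

```python
# The three values named in the tool description.
FEEDBACK_TYPES = ("bug", "feature_request", "general")


def report_feedback_arguments(feedback, feedback_type="general"):
    """Build the arguments dict for report_feedback (hypothetical helper)."""
    text = feedback.strip()
    if not text:
        # Assumption: empty feedback is not useful, so reject it client-side.
        raise ValueError("feedback must not be empty")
    if feedback_type not in FEEDBACK_TYPES:
        raise ValueError(f"feedback_type must be one of {FEEDBACK_TYPES}")
    return {"feedback": text, "feedback_type": feedback_type}


print(report_feedback_arguments("search_snapshots ignores date_to", "bug"))
```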
Common Patterns
- Use search_snapshots to find captures of a URL, then get_capture_calendar to see how captures are distributed across the months of a given year. This two-step pattern is common for exploring data before drilling down.
- search_snapshots accepts a limit parameter (1-100, default 20). Start with small limits during development, then increase for production queries.
- Use the date_from and date_to parameters of search_snapshots to narrow results to a specific time window. Dates are in YYYY-MM-DD format.
FAQ
What data does Wayback Machine provide?
Internet Archive lookups: snapshot search, availability checks, capture calendars, and yearly statistics. It exposes 5 tools: check_availability, search_snapshots, get_capture_calendar, get_capture_stats, and report_feedback.
What do I need to get started?
A Gnist API key (free tier: 100 calls/day). Sign up at https://context.gnist.ai/signup.
What format does the Wayback Machine API return?
JSON, via either MCP protocol (JSON-RPC 2.0) or REST API.
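Because the MCP transport is JSON-RPC 2.0, a parsed response body carries either a `result` member or an `error` member with a `code` and `message`. The sketch below shows one way to separate the two after calling `resp.json()`. `ToolCallError` and `unwrap_result` are hypothetical names, and since this tutorial does not specify what the service places inside `result`, the sample uses a placeholder.

```python
class ToolCallError(Exception):
    """Raised when a JSON-RPC response carries an error object."""


def unwrap_result(response):
    """Return the `result` member of a JSON-RPC 2.0 response, or raise."""
    if "error" in response:
        error = response["error"]
        raise ToolCallError(f"{error.get('code')}: {error.get('message')}")
    return response["result"]


# Hand-written responses for illustration; the "result" payload is a placeholder.
ok = {"jsonrpc": "2.0", "id": 1, "result": {"placeholder": True}}
failed = {"jsonrpc": "2.0", "id": 1,
          "error": {"code": -32602, "message": "Invalid params"}}

print(unwrap_result(ok))
try:
    unwrap_result(failed)
except ToolCallError as exc:
    print("tool call failed:", exc)
```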