TSN research crawler
KK7NQN RadioCrawler
RadioCrawler is a locked-domain research crawler designed to help TSN investigate transcript-derived leads against known source domains.
Overview
Focused transcript research
RadioCrawler supports the TSN transcript system. TSN can dispatch it to investigate facts, topics, events, callsigns, organizations, and other details found in transcript text.
RadioCrawler does not perform broad open-web crawling. It crawls only the domain it is explicitly assigned, making it suitable for focused research, source verification, and follow-up investigation from TSN transcript topics.
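One way to picture the locked-domain rule is as a filter applied to every URL before it is queued. The following is a minimal sketch, not RadioCrawler's actual code: the function name `in_locked_domain` is hypothetical, and treating subdomains of the assigned domain as in scope is an assumption the source does not confirm.

```python
from urllib.parse import urlsplit


def in_locked_domain(url: str, locked_domain: str) -> bool:
    """Return True only for http(s) URLs whose host is the locked domain.

    Assumption: subdomains (e.g. news.example.org) count as in scope.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    domain = locked_domain.lower().rstrip(".")
    # Match the exact host or a dot-separated suffix, so that
    # "example.org.evil.com" and "notexample.org" are rejected.
    return host == domain or host.endswith("." + domain)


candidates = [
    "https://example.org/events",
    "https://news.example.org/notice",
    "https://example.org.evil.com/",
    "mailto:info@example.org",
]
allowed = [u for u in candidates if in_locked_domain(u, "example.org")]
```

Comparing on a dot boundary rather than a plain substring keeps lookalike hosts out of the crawl queue.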
Purpose
From transcript mention to source context
RadioCrawler helps TSN move from "this was mentioned in a transcript" to "here is supporting context from a known source."
TSN can send RadioCrawler a domain, optional seed URLs, and a transcript-related topic or query. RadioCrawler then crawls that domain, gathers relevant pages, classifies findings, creates summaries, and exports the result set back to TSN for cross-reference.
Core features
Conservative crawl behavior
- Locked-domain crawling only.
- Automatic sitemap discovery from robots.txt and /sitemap.xml.
- Recursive sitemap index support.
- robots.txt compliance before page fetches.
- Per-domain rate limiting.
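The behaviors listed above can be sketched with Python's standard library. This is an illustrative sketch only: the user-agent token `RadioCrawler`, the helper names, and the sample robots.txt content are assumptions, and the clock and sleep functions are injectable purely so the limiter can be exercised without real delays.

```python
import time
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (made up for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.org/sitemap.xml
"""


def load_rules(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def sitemap_candidates(parser: RobotFileParser, base_url: str) -> list:
    """Sitemaps declared in robots.txt, plus the conventional /sitemap.xml."""
    declared = parser.site_maps() or []  # site_maps() returns None if absent
    fallback = base_url.rstrip("/") + "/sitemap.xml"
    return declared if fallback in declared else declared + [fallback]


class RateLimiter:
    """Enforce a minimum delay between requests to one domain."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        """Block until at least min_delay has passed since the last call."""
        if self._last is not None:
            remaining = self._min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


rules = load_rules(ROBOTS_TXT)
limiter = RateLimiter(min_delay=2.0)
url = "https://example.org/news"
if rules.can_fetch("RadioCrawler", url):
    limiter.wait()  # the first call returns immediately
    # ...fetch the page here...
```

Checking `can_fetch` before calling `limiter.wait()` means disallowed URLs never consume a request slot.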
TSN integration
Dispatch and return data
RadioCrawler is built to be dispatched by TSN when transcript analysis finds something worth checking.
| TSN can provide | RadioCrawler returns |
|---|---|
| A locked domain to crawl. | Crawled URLs. |
| One or more seed URLs. | Page titles and snippets. |
| A topic, phrase, callsign, event name, or transcript-derived query. | HTTP status and content metadata. |
| Optional transcript or topic metadata. | Classification results. |
| Crawl limits such as max pages, max depth, and rate delay. | Source-domain statistics. |
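The exchange in the table above could be modeled as a pair of plain data structures. The sketch below is hypothetical: the class names, field names, default limits, and JSON export shape are all assumptions chosen to mirror the table rows, not RadioCrawler's real schema.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class CrawlRequest:
    """What TSN provides (field names are illustrative)."""
    locked_domain: str
    query: str
    seed_urls: list = field(default_factory=list)
    topic_metadata: dict = field(default_factory=dict)
    max_pages: int = 50              # assumed default
    max_depth: int = 2               # assumed default
    rate_delay_seconds: float = 2.0  # assumed default


@dataclass
class PageResult:
    """One crawled page as returned to TSN (field names are illustrative)."""
    url: str
    title: str
    snippet: str
    http_status: int
    content_type: str
    classification: str


def export_results(request: CrawlRequest, pages: list) -> str:
    """Serialize a result set, including simple source-domain statistics."""
    stats = {
        "pages_crawled": len(pages),
        "by_status": dict(Counter(str(p.http_status) for p in pages)),
    }
    payload = {
        "request": asdict(request),
        "pages": [asdict(p) for p in pages],
        "domain_stats": stats,
    }
    return json.dumps(payload, indent=2)


request = CrawlRequest(
    locked_domain="example.org",
    query="Field Day 2024",
    seed_urls=["https://example.org/events"],
)
pages = [
    PageResult("https://example.org/events", "Events", "Field Day 2024 is...",
               200, "text/html", "relevant"),
]
exported = export_results(request, pages)
```

Keeping the original request inside the export lets a reviewer see exactly which limits and query produced a given result set.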
Use cases
Transcript follow-up
- Verify a mentioned event.
- Research a callsign or organization.
- Check a public notice or linked source.
- Gather background context for a transcript topic.
- Find related pages on a trusted domain.
- Produce structured research output for later review.
AI-assisted analysis
Optional model support
RadioCrawler supports two optional model-assisted analysis modes that can be toggled per crawl.
The 3B model is used for fast per-page work. It can classify whether a page is relevant to the TSN topic and produce short page-level summaries.
The 72B model is used for final crawl-level synthesis. When enabled, it reviews the collected result set and produces an overall summary intended for TSN cross-reference and human review.
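The two-stage arrangement can be sketched as a small pipeline in which each model is an optional callable. Everything here is an assumption made for illustration: the function name `analyze_crawl`, the callable signatures, and the stand-in models below are hypothetical, and no real model API is shown.

```python
from typing import Callable, Optional


def analyze_crawl(
    topic: str,
    pages: dict,  # maps URL -> extracted page text
    page_model: Optional[Callable] = None,       # stand-in for the 3B model
    synthesis_model: Optional[Callable] = None,  # stand-in for the 72B model
) -> dict:
    """Run per-page analysis, then crawl-level synthesis; both are optional."""
    per_page = {}
    for url, text in pages.items():
        if page_model is None:
            # Per-page mode disabled: no relevance verdict or summary.
            per_page[url] = {"relevant": None, "summary": None}
        else:
            relevant, summary = page_model(topic, text)
            per_page[url] = {"relevant": relevant, "summary": summary}

    overall = None
    if synthesis_model is not None:
        # Pass along page summaries where available, else the raw text,
        # skipping pages the per-page model marked as not relevant.
        kept = [
            result["summary"] or pages[url]
            for url, result in per_page.items()
            if result["relevant"] is not False
        ]
        overall = synthesis_model(topic, kept)
    return {"per_page": per_page, "overall": overall}


# Trivial stand-ins so the sketch runs without any model.
def fake_small(topic, text):
    return topic.lower() in text.lower(), text[:40]


def fake_large(topic, summaries):
    return f"{len(summaries)} relevant page(s) for {topic}"


pages = {
    "https://example.org/a": "Field Day results posted",
    "https://example.org/b": "Unrelated notice",
}
report = analyze_crawl("field day", pages, fake_small, fake_large)
```

Passing the models in as arguments mirrors the per-crawl toggle: leaving either one as `None` disables that stage without changing the rest of the flow.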
Design philosophy
Targeted, respectful, auditable
RadioCrawler is intentionally conservative. It is not a general-purpose search crawler. It is designed for targeted, respectful, auditable research against a known domain.
Its job is to help TSN investigate transcript-derived leads without wandering across unrelated websites or generating noisy results.