hnimdbbot - Pure Cinema!

Age	Commit message	Author	Files	Lines
16 hours	feat: add three-level logging with per-request debug output	dev	1	-15/+16
	- New --log-level flag with three levels (default: info):
	  - debug: every API request logged (method, URL, status, duration)
	  - info: normal events (batch progress, entry counts, summaries)
	  - silent: only warnings and fatal errors
	- Replaced all log.Printf/Fatalf calls with level-gated helpers
	- Added API request timing to queryWikiArticle, queryWikidataBatch and downloadFile
	- Retries and backoff are logged in debug mode
27 hours	feat: add -wiki-only flag to rerun only wiki data extraction	dev	1	-11/+21
	- fetchWikiArticlesData is standalone again (re-extracted from the consumer)
	- -wiki-only flag skips the SPARQL pipeline and runs only the wiki data fetch
	- Default behavior: full pipeline (SPARQL and wiki data in parallel)
27 hours	refactor: pipeline SPARQL and wiki data in parallel	dev	1	-4/+0
	- Merge fetchWikiArticles and fetchWikiArticlesData into one pipeline
	- SPARQL producer fetches batches, commits each to the DB, and forwards resolved articles
	- Wiki data consumer runs concurrently, issuing one request every 2 s
	- Each SPARQL batch commits independently (no global transaction)
	- Rate limits respected for both Wikidata SPARQL and the wiki server
	- No parallel requests to either endpoint
33 hours	feat: fetch missing wiki data from custom server and populate imdb table	dev	1	-0/+4
	- Add wiki_server and wiki_username config fields
	- Query the custom server for each wiki_article entry
	- Extract description, synopsis (Plot), year, poster_url, license, license_url and num_accolades from the structured JSON response
	- Serial processing with a 1 req/s rate limit
	- Update only entries missing at least one target column
35 hours	feat: fetch Wikipedia article titles via Wikidata SPARQL	dev	1	-0/+4
	- Query Wikidata SPARQL in batches of 30 for entries missing wiki_article
	- Store the wiki_article title in the imdb table
	- Respect rate limits with a configurable delay; retry on 5xx/429
	- Skip entries that have no Wikipedia article
	- Removed the unique constraint on wiki_article (multiple entries can share one)
3 days	feat: fetchAndUpdateImdbData — download IMDB datasets and populate imdb table	dev	1	-0/+4
	- Check for imdb entries with NULL average_rating
	- Download title.basics.tsv.gz and title.ratings.tsv.gz to imdbdata/
	- Decompress alongside the gzip originals
	- Parse only rows matching our imdb_ids (memory-efficient)
	- Update average_rating, num_votes, title_type, primary_title, original_title, start_year and runtime_minutes
	- Results: 3394 ratings and 3093 basics updated out of 3448 entries
3 days	feat: populate imdb table with unique title IDs from links	dev	1	-0/+91
	- Extract distinct IMDb title IDs from links.param (host=imdb.com)
	- Skip IDs already in the imdb table and non-title params (nm, ls, etc.)
	- Insert 3448 unique title IDs into imdb.imdb_id
3 days	feat: extract IMDB title IDs from links URLs into param field	dev	1	-15/+70
	- Query links table for IMDb title URLs (field=1, host=imdb.com)
	- Extract tt IDs via regex and batch-update links.param
	- 5662 rows updated successfully
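The commit does not show its regex. A plausible sketch, assuming title URLs contain a /title/tt... path segment so that name (nm) and list (ls) URLs are not matched; ttRe and extractTitleID are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// ttRe captures an IMDb title ID from a /title/ path segment, e.g.
// https://www.imdb.com/title/tt0111161/ yields tt0111161.
var ttRe = regexp.MustCompile(`/title/(tt\d+)`)

// extractTitleID returns the title ID in rawURL, or false if there is none.
func extractTitleID(rawURL string) (string, bool) {
	m := ttRe.FindStringSubmatch(rawURL)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, u := range []string{
		"https://www.imdb.com/title/tt0111161/",
		"https://www.imdb.com/name/nm0000151/",
	} {
		id, ok := extractTitleID(u)
		fmt.Println(u, id, ok)
	}
}
```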
3 days	feat: switch config to JSON; add go.mod and config.json.example	dev	1	-4/+5
	- Replace Viper-based config with encoding/json (config.go)
	- Add config.json with sensible defaults (gitignored)
	- Add config.json.example with empty values as reference
	- Initialize Go module (go.mod)
	- Update main.go to use LoadConfig()
3 days	Initial commit	dev	1	-0/+46