hnimdbbot - Pure Cinema!

Age	Commit message	Author	Files	Lines
17 hours	fix: add nil checks in extractPeople for missing infobox/section data	dev	1	-14/+19
	- Guard section.has_parts type assertion in extractPeople
	- Guard Cast section has_parts iteration with ok check
18 hours	feat: extract actors, directors, screenwriters from Wikipedia API	dev	1	-6/+144
	- Extract directors from infobox 'Directed by' field/list
	- Extract screenwriters from infobox 'Screenplay by' list
	- Extract actors from Cast section list (first link = person name)
	- Upsert into people table, link via who table (profession: actor=1, director=2, screenwriter=3)
	- Track processed entries with has_people flag column
	- Consumer inserts people and marks has_people=1 on success
27 hours	refactor: pipeline SPARQL and wiki data in parallel	dev	1	-122/+0
	- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
	- SPARQL producer fetches batches, commits each to DB, forwards resolved articles
	- Wiki data consumer runs concurrently, fetching at 2s/request
	- Each SPARQL batch commits independently (no global transaction)
	- Rate limits respected for both Wikidata SPARQL and wiki server
	- No parallel requests to either endpoint
28 hours	.	dev	1	-8/+0

29 hours	refactor: decode wiki_article names once in DB, encode on send	dev	1	-4/+0
	- One-shot migration decoded 187 percent-encoded rows
	- Removed decode-on-read from wikiarticle.go (no longer needed)
	- wikidata.go still decodes SPARQL URLs before storing (for future inserts)
	- wikiarticle.go encodes on send via url.PathEscape
29 hours	fix: decode wiki article names for clean storage	dev	1	-3/+6
	- wikidata.go: url.PathUnescape SPARQL titles before storing
	- wikiarticle.go: PathUnescape on read, PathEscape on send
	- DB holds decoded names; URLs are always freshly encoded
29 hours	fix: avoid double URL-encoding of wiki article names	dev	1	-4/+11
	- wiki_article values are already URL-encoded in the DB
	- Build query URL manually instead of url.Values.Encode()
	- Only escape username (not pre-encoded)
29 hours	feat: track wiki_status_code and skip 404 entries on rerun	dev	1	-23/+47
	- queryWikiArticle returns HTTP status code alongside entry data
	- Always record wiki_status_code for every request (success or failure)
	- Skip entries with wiki_status_code = 404 in future runs
	- Only update data fields on HTTP 200; non-200 only records status
	- Log line shows updated vs skipped (non-200) counts
33 hours	fix: add 429 retry with exponential backoff and increase rate limit delay	dev	1	-9/+32
	- Retry up to 5 times on HTTP 429 with 2s/4s/8s/16s backoff
	- Move inter-request delay before each request (was after)
	- Increase base delay from 1s to 2s between requests
	- Fix: only sleep after first request (skip delay on first call)
33 hours	feat: fetch missing wiki data from custom server and populate imdb table	dev	1	-0/+283
	- Add wiki_server and wiki_username config fields
	- Query custom server for each wiki_article entry
	- Extract description, synopsis (Plot), year, poster_url, license, license_url, num_accolades from structured JSON response
	- Serial processing with 1 req/s rate limit
	- Update only entries missing at least one target column