hnimdbbot, branch main

feat: add license_short from Wikipedia license identifier

2026-06-26T16:36:46+00:00

- Extract license.identifier (e.g. CC-BY-SA-4.0) into new
  license_short column
- Warn if a license array has more than 1 entry (none seen yet)
- Include license_short IS NULL in getExistingWikiArticles query
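
The extraction described above can be sketched roughly as follows. This is
an illustration only: the License struct, its field names, and the choice to
fall back to the first entry when several are present are assumptions, not
the bot's actual code.

```go
package main

import (
	"fmt"
	"log"
)

// License mirrors one entry of an article's license array.
// The struct and its field names are assumed for illustration.
type License struct {
	Identifier string // e.g. "CC-BY-SA-4.0"
	Name       string
	URL        string
}

// licenseShort returns the identifier to store in license_short.
// It returns "" when the array is empty and logs a warning when the
// array has more than one entry (using the first is an assumption).
func licenseShort(title string, licenses []License) string {
	if len(licenses) == 0 {
		return ""
	}
	if len(licenses) > 1 {
		log.Printf("warning: %s has %d license entries, using the first", title, len(licenses))
	}
	return licenses[0].Identifier
}

func main() {
	got := licenseShort("Example_Article", []License{{Identifier: "CC-BY-SA-4.0"}})
	fmt.Println(got)
}
```

An empty result would map to NULL in license_short, which is what lets the
getExistingWikiArticles query pick such rows up again.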

fix: only count awards tables in num_accolades

2026-06-26T16:15:45+00:00

extractAccolades was summing rows from all tables (including
episode lists), producing inflated counts (e.g. 708 for
Unreported_World which has 0 actual awards). Now it filters
to tables whose headers contain 'Award'.
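
A minimal sketch of the header filter, assuming a simplified Table type with
Headers and Rows fields (both hypothetical; the real extractAccolades works
on the API response directly):

```go
package main

import (
	"fmt"
	"strings"
)

// Table is a hypothetical, simplified view of one parsed wiki table.
type Table struct {
	Headers []string
	Rows    [][]string
}

// isAwardsTable reports whether any header cell contains "Award".
func isAwardsTable(t Table) bool {
	for _, h := range t.Headers {
		if strings.Contains(h, "Award") {
			return true
		}
	}
	return false
}

// countAccolades sums rows only from tables that look like awards tables,
// so episode lists and other tables no longer inflate the count.
func countAccolades(tables []Table) int {
	n := 0
	for _, t := range tables {
		if isAwardsTable(t) {
			n += len(t.Rows)
		}
	}
	return n
}

func main() {
	tables := []Table{
		{Headers: []string{"No.", "Title", "Air date"}, Rows: make([][]string, 3)},
		{Headers: []string{"Year", "Award", "Result"}, Rows: make([][]string, 2)},
	}
	fmt.Println(countAccolades(tables)) // prints 2
}
```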

feat: add three-level logging with per-request debug output

2026-06-26T12:14:52+00:00

- New --log-level flag: debug, info (default), silent
  debug:  every API request logged (method, URL, status, duration)
  info:   normal events (batch progress, entry counts, summaries)
  silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch,
  downloadFile
- Retries and backoff logged in debug mode
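
Level-gated helpers of this kind might look like the sketch below. The
helper names (debugf, infof, warnf, fatalf), the Level type, and the message
prefixes are assumptions for illustration; only the flag name and the three
levels come from this commit.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
)

// Level orders verbosity: silent < info < debug.
type Level int

const (
	LevelSilent Level = iota
	LevelInfo
	LevelDebug
)

// current is the active level; info is the default.
var current = LevelInfo

// parseLevel maps the flag value to a Level.
func parseLevel(s string) (Level, error) {
	switch s {
	case "silent":
		return LevelSilent, nil
	case "info":
		return LevelInfo, nil
	case "debug":
		return LevelDebug, nil
	}
	return LevelInfo, fmt.Errorf("unknown log level %q", s)
}

func debugf(format string, args ...any) {
	if current >= LevelDebug {
		log.Printf("DEBUG "+format, args...)
	}
}

func infof(format string, args ...any) {
	if current >= LevelInfo {
		log.Printf("INFO "+format, args...)
	}
}

// warnf and fatalf are never gated: they print even in silent mode.
func warnf(format string, args ...any) { log.Printf("WARN "+format, args...) }

func fatalf(format string, args ...any) {
	log.Printf("FATAL "+format, args...)
	os.Exit(1)
}

func main() {
	levelFlag := flag.String("log-level", "info", "debug, info, or silent")
	flag.Parse()
	lvl, err := parseLevel(*levelFlag)
	if err != nil {
		fatalf("%v", err)
	}
	current = lvl
	debugf("%s %s -> %d in %s", "GET", "https://example.org/api", 200, "12ms")
	infof("processed %d entries", 3)
	warnf("license array has %d entries", 2)
}
```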

chore: sync schema.sql with live database

2026-06-26T11:54:23+00:00

- Add missing columns to imdb: has_no_wiki_article, description,
  license, license_url, num_accolades, wiki_status_code, has_people
- Fix links table: update to match actual DDL with story_id FK
- Fix orphaned links_imdb definition (was missing CREATE TABLE header)

fix: add nil checks in extractPeople for missing infobox/section data

2026-06-26T11:19:02+00:00

- Guard section.has_parts type assertion in extractPeople
- Guard Cast section has_parts iteration with ok check
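
The guards follow Go's comma-ok form of type assertion, which returns false
instead of panicking when the value is missing or has another type. A
minimal sketch, assuming the decoded JSON is held as map[string]any and that
each part carries a "name" key (the key and helper names are hypothetical):

```go
package main

import "fmt"

// partsOf returns section["has_parts"] as a slice. ok is false when the
// key is missing, nil, or not a JSON array, so callers never panic.
func partsOf(section map[string]any) ([]any, bool) {
	parts, ok := section["has_parts"].([]any)
	return parts, ok
}

// castNames collects names from a Cast-like section and skips any part
// that does not have the expected shape.
func castNames(section map[string]any) []string {
	parts, ok := partsOf(section)
	if !ok {
		return nil
	}
	var names []string
	for _, p := range parts {
		item, ok := p.(map[string]any)
		if !ok {
			continue
		}
		if name, ok := item["name"].(string); ok && name != "" {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	malformed := map[string]any{"has_parts": "not a list"}
	wellFormed := map[string]any{
		"has_parts": []any{
			map[string]any{"name": "Jane Doe"},
			42, // wrong type, skipped
		},
	}
	fmt.Println(len(castNames(malformed)))
	fmt.Println(castNames(wellFormed))
}
```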

feat: extract actors, directors, screenwriters from Wikipedia API

2026-06-26T11:10:41+00:00

- Extract directors from infobox 'Directed by' field/list
- Extract screenwriters from infobox 'Screenplay by' list
- Extract actors from Cast section list (first link = person name)
- Upsert into people table, link via who table
  (profession: actor=1, director=2, screenwriter=3)
- Track processed entries with has_people flag column
- Consumer inserts people and marks has_people=1 on success

fix: prevent dropped wiki entries when channel fills

2026-06-26T02:19:44+00:00

- Remove non-blocking select/default that silently dropped entries
- Channel sized to hold all pending entries (existing + SPARQL)
- Blocking send backpressures SPARQL if consumer is slow
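
For illustration, the difference comes down to a plain channel send instead
of a select with a default branch. The sketch below uses a hypothetical
Entry type and deliverAll helper; it shows that a blocking send delivers
every entry even when the buffer is smaller than the number of entries.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for a pending wiki entry.
type Entry struct{ Title string }

// deliverAll sends every pending entry through a channel of the given
// capacity and returns how many the consumer received.
func deliverAll(pending []Entry, capacity int) int {
	ch := make(chan Entry, capacity)
	go func() {
		defer close(ch)
		for _, e := range pending {
			// Blocking send: when the buffer is full the producer waits
			// for the consumer. The removed pattern was
			//   select { case ch <- e: default: }
			// which discards e whenever the buffer is full.
			ch <- e
		}
	}()
	received := 0
	for range ch {
		received++
	}
	return received
}

func main() {
	pending := make([]Entry, 100)
	// Sized for all pending entries, as in this commit.
	fmt.Println(deliverAll(pending, len(pending)))
	// Even a tiny buffer loses nothing; the producer just waits.
	fmt.Println(deliverAll(pending, 1))
}
```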

feat: add -wiki-only flag to rerun only wiki data extraction

2026-06-26T01:37:51+00:00

- fetchWikiArticlesData is standalone again (re-extracted from consumer)
- -wiki-only flag skips SPARQL pipeline, runs only wiki data fetch
- Default behavior: full pipeline (SPARQL + wiki data in parallel)
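
A sketch of how such a switch can be wired with the standard flag package.
The parseArgs helper and the two stub functions are hypothetical; only the
flag name and the two modes come from this commit.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs reports whether -wiki-only was given. It uses its own FlagSet
// so it can be called with any argument slice.
func parseArgs(args []string) (bool, error) {
	fs := flag.NewFlagSet("hnimdbbot", flag.ContinueOnError)
	wikiOnly := fs.Bool("wiki-only", false, "skip the SPARQL pipeline and only fetch wiki data")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *wikiOnly, nil
}

// Stubs standing in for the real entry points.
func runWikiDataOnly() { fmt.Println("wiki data fetch only") }
func runFullPipeline() { fmt.Println("SPARQL + wiki data in parallel") }

func main() {
	wikiOnly, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if wikiOnly {
		runWikiDataOnly()
		return
	}
	runFullPipeline()
}
```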

refactor: pipeline SPARQL and wiki data in parallel

2026-06-26T01:26:07+00:00

- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
- SPARQL producer fetches batches, commits each to DB, forwards
  resolved articles
- Wiki data consumer runs concurrently, fetching at 2s/request
- Each SPARQL batch commits independently (no global transaction)
- Rate limits respected for both Wikidata SPARQL and wiki server
- No parallel requests to either endpoint

.

2026-06-26T00:48:18+00:00