| Age | Commit message | Author | Files | Lines |
|
- New --log-level flag: debug, info (default), silent
debug: every API request logged (method, URL, status, duration)
info: normal events (batch progress, entry counts, summaries)
silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch, downloadFile
- Retries and backoff logged in debug mode
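The level gating described above could look roughly like the following. This is a minimal sketch, not the code from the commit: the helper names `debugf`, `infof`, `warnf`, the `Level` type, and `parseLevel` are assumptions chosen for illustration; only the flag name and the three level names come from the commit message.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"strings"
)

// Level is a hypothetical verbosity setting; lower values are more verbose.
type Level int

const (
	LevelDebug Level = iota
	LevelInfo
	LevelSilent
)

// parseLevel maps the --log-level value to a Level.
func parseLevel(s string) (Level, error) {
	switch strings.ToLower(s) {
	case "debug":
		return LevelDebug, nil
	case "info":
		return LevelInfo, nil
	case "silent":
		return LevelSilent, nil
	}
	return LevelInfo, fmt.Errorf("unknown log level %q", s)
}

// current is the active level; info is assumed to be the default.
var current = LevelInfo

// debugf logs only in debug mode (for example per-request timing).
func debugf(format string, args ...any) {
	if current == LevelDebug {
		log.Printf("[debug] "+format, args...)
	}
}

// infof logs in debug and info mode, but not in silent mode.
func infof(format string, args ...any) {
	if current <= LevelInfo {
		log.Printf(format, args...)
	}
}

// warnf always logs, since silent mode still shows warnings.
func warnf(format string, args ...any) {
	log.Printf("[warn] "+format, args...)
}

func main() {
	name := flag.String("log-level", "info", "debug, info, or silent")
	flag.Parse()
	lvl, err := parseLevel(*name)
	if err != nil {
		log.Fatal(err)
	}
	current = lvl

	debugf("%s %s -> %d (%s)", "GET", "https://example.org/api", 200, "120ms")
	infof("batch %d of %d done", 1, 10)
	warnf("retrying after status %d", 429)
}
```

Keeping the level in a single package-level variable means call sites stay as short as the `log.Printf` calls they replace.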
|
|
- fetchWikiArticlesData is standalone again (re-extracted from consumer)
- -wiki-only flag skips SPARQL pipeline, runs only wiki data fetch
- Default behavior: full pipeline (SPARQL + wiki data in parallel)
|
|
- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
- SPARQL producer fetches batches, commits each to DB, forwards resolved articles
- Wiki data consumer runs concurrently, fetching at 2s/request
- Each SPARQL batch commits independently (no global transaction)
- Rate limits respected for both Wikidata SPARQL and wiki server
- No parallel requests to either endpoint
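One way to picture the producer/consumer arrangement described above is a channel between two stages, each of which makes its own requests strictly one at a time. The sketch below is illustrative only: `Article`, `produce`, `consume`, and `runPipeline` are hypothetical names, and the `commit` and `fetch` callbacks stand in for the real database and HTTP work.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Article is a hypothetical resolved entry forwarded by the producer.
type Article struct {
	IMDbID string
	Title  string
}

// produce walks the batches serially, waits between them, commits each
// batch on its own, and forwards its articles. It closes out when done.
func produce(batches [][]Article, delay time.Duration,
	commit func([]Article) error, out chan<- Article) error {
	defer close(out)
	for i, b := range batches {
		if i > 0 {
			time.Sleep(delay)
		}
		if err := commit(b); err != nil {
			return err
		}
		for _, a := range b {
			out <- a
		}
	}
	return nil
}

// consume handles one article at a time, pausing between requests.
func consume(in <-chan Article, interval time.Duration, fetch func(Article)) int {
	n := 0
	for a := range in {
		if n > 0 {
			time.Sleep(interval)
		}
		fetch(a)
		n++
	}
	return n
}

// runPipeline runs both stages concurrently and returns the number of
// articles the consumer processed.
func runPipeline(batches [][]Article, sparqlDelay, wikiInterval time.Duration,
	commit func([]Article) error, fetch func(Article)) (int, error) {
	ch := make(chan Article, 64)
	var wg sync.WaitGroup
	var fetched int
	wg.Add(1)
	go func() {
		defer wg.Done()
		fetched = consume(ch, wikiInterval, fetch)
	}()
	err := produce(batches, sparqlDelay, commit, ch)
	wg.Wait()
	return fetched, err
}

func main() {
	batches := [][]Article{
		{{"tt0000001", "A"}, {"tt0000002", "B"}},
		{{"tt0000003", "C"}},
	}
	commit := func(b []Article) error {
		fmt.Printf("committed batch of %d\n", len(b))
		return nil
	}
	fetch := func(a Article) { fmt.Println("fetching wiki data for", a.Title) }
	n, err := runPipeline(batches, 10*time.Millisecond, 10*time.Millisecond, commit, fetch)
	fmt.Println("fetched:", n, "error:", err)
}
```

Because each stage is a single goroutine, neither endpoint ever sees parallel requests, while the two stages still overlap in time.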
|
|
- Add wiki_server and wiki_username config fields
- Query custom server for each wiki_article entry
- Extract description, synopsis (Plot), year, poster_url, license,
license_url, num_accolades from structured JSON response
- Serial processing with 1 req/s rate limit
- Update only entries missing at least one target column
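Selecting only the entries that still lack a target column could be expressed as an OR of NULL checks. The sketch below assumes that "missing" means SQL NULL and that the columns live in the `imdb` table; the helper name `missingAnyClause` and the exact query shape are illustrative, not taken from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// missingAnyClause builds "(a IS NULL OR b IS NULL ...)" for the given columns.
// Column names are trusted constants here, never user input.
func missingAnyClause(cols []string) string {
	parts := make([]string, len(cols))
	for i, c := range cols {
		parts[i] = c + " IS NULL"
	}
	return "(" + strings.Join(parts, " OR ") + ")"
}

func main() {
	// Target columns as listed in the commit message.
	cols := []string{
		"description", "synopsis", "year", "poster_url",
		"license", "license_url", "num_accolades",
	}
	// Assumed query shape: only rows that already have a wiki_article.
	query := "SELECT imdb_id, wiki_article FROM imdb " +
		"WHERE wiki_article IS NOT NULL AND " + missingAnyClause(cols)
	fmt.Println(query)
}
```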
|
|
- Query Wikidata SPARQL in batches of 30 for entries missing wiki_article
- Store wiki_article title in imdb table
- Respect rate limits with configurable delay and retry on 5xx/429
- Skip entries that have no Wikipedia article
- Removed unique constraint on wiki_article (multiple entries can share one)
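The batching and retry behavior above can be sketched as two small helpers. The names `chunk`, `shouldRetry`, and `doWithRetry` are hypothetical, and doubling the delay after each failed attempt is an assumption; the commit only states that the delay is configurable and that 5xx and 429 responses are retried.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// chunk splits ids into consecutive slices of at most size elements.
func chunk(ids []string, size int) [][]string {
	var out [][]string
	if size <= 0 {
		return out
	}
	for len(ids) > size {
		out = append(out, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		out = append(out, ids)
	}
	return out
}

// shouldRetry reports whether a status is 429 or any 5xx.
func shouldRetry(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// doWithRetry calls do up to attempts times. After a retryable status or
// an error it sleeps for delay and doubles it (assumed backoff policy).
func doWithRetry(attempts int, delay time.Duration, do func() (int, error)) (int, error) {
	var status int
	var err error
	for i := 0; i < attempts; i++ {
		status, err = do()
		if err == nil && !shouldRetry(status) {
			return status, nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	if err == nil {
		err = fmt.Errorf("giving up after %d attempts, last status %d", attempts, status)
	}
	return status, err
}

func main() {
	ids := make([]string, 65)
	for i, b := range chunk(ids, 30) {
		fmt.Printf("batch %d: %d ids\n", i, len(b))
	}

	// Simulated responses: rate limited, server error, then success.
	responses := []int{429, 503, 200}
	call := 0
	status, err := doWithRetry(5, 10*time.Millisecond, func() (int, error) {
		s := responses[call]
		call++
		return s, nil
	})
	fmt.Println(status, err, "calls:", call)
}
```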
|
|
- Check for imdb entries with NULL average_rating
- Download title.basics.tsv.gz and title.ratings.tsv.gz to imdbdata/
- Decompress alongside gzip originals
- Parse only rows matching our imdb_ids (memory-efficient)
- Update: average_rating, num_votes, title_type, primary_title,
original_title, start_year, runtime_minutes
- Results: 3394 ratings, 3093 basics updated out of 3448 entries
|
|
- Extract distinct IMDb title IDs from links.param (host=imdb.com)
- Skip IDs already in imdb table and non-title params (nm, ls, etc.)
- Insert 3448 unique title IDs into imdb.imdb_id
|
|
- Query links table for IMDB title URLs (field=1, host=imdb.com)
- Extract ttIDs via regex and batch-update links.param
- 5662 rows updated successfully
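The ID extraction step can be illustrated with a short regular expression. The pattern and the helper name `extractTitleID` below are assumptions; the commit does not show the regex that was actually used.

```go
package main

import (
	"fmt"
	"regexp"
)

// titleRe matches the "tt" identifier in an IMDb title URL path.
// The exact pattern is an assumption made for this sketch.
var titleRe = regexp.MustCompile(`/title/(tt\d+)`)

// extractTitleID returns the ttID from rawURL, or false if there is none
// (for example for name pages, whose IDs start with "nm").
func extractTitleID(rawURL string) (string, bool) {
	m := titleRe.FindStringSubmatch(rawURL)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, u := range []string{
		"https://www.imdb.com/title/tt0111161/",
		"https://www.imdb.com/name/nm0000151/",
	} {
		id, ok := extractTitleID(u)
		fmt.Printf("%s -> %q (%v)\n", u, id, ok)
	}
}
```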
|
|
- Replace Viper-based config with encoding/json (config.go)
- Add config.json with sensible defaults (gitignored)
- Add config.json.example with empty values as reference
- Initialize go module (go.mod)
- Update main.go to use LoadConfig()
|