| Age | Commit message | Author | Files | Lines |
|
- New --log-level flag with three levels: debug, info (default), silent
  debug: every API request logged (method, URL, status, duration)
  info: normal events (batch progress, entry counts, summaries)
  silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch, downloadFile
- Retries and backoff logged in debug mode
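
The level gating described in this commit could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper names (`debugf`, `infof`, `warnf`, `fatalf`, `enabled`, `parseLevel`) and the example URL are assumptions; only the flag name and the three levels come from the message above.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"time"
)

// Level orders verbosity: lower values are more verbose.
type Level int

const (
	LevelDebug Level = iota
	LevelInfo
	LevelSilent
)

// current is the active level; info is the default.
var current = LevelInfo

// parseLevel maps a --log-level value to a Level.
func parseLevel(s string) (Level, error) {
	switch s {
	case "debug":
		return LevelDebug, nil
	case "info":
		return LevelInfo, nil
	case "silent":
		return LevelSilent, nil
	}
	return LevelInfo, fmt.Errorf("unknown log level %q", s)
}

// enabled reports whether a message of level msg passes the current gate.
func enabled(msg Level) bool { return msg >= current }

func debugf(format string, args ...interface{}) {
	if enabled(LevelDebug) {
		log.Printf("DEBUG "+format, args...)
	}
}

func infof(format string, args ...interface{}) {
	if enabled(LevelInfo) {
		log.Printf("INFO "+format, args...)
	}
}

// warnf and fatalf are never gated, so they still appear in silent mode.
func warnf(format string, args ...interface{})  { log.Printf("WARN "+format, args...) }
func fatalf(format string, args ...interface{}) { log.Fatalf("FATAL "+format, args...) }

func main() {
	levelFlag := flag.String("log-level", "info", "debug, info, or silent")
	flag.Parse()
	lvl, err := parseLevel(*levelFlag)
	if err != nil {
		fatalf("%v", err)
	}
	current = lvl

	start := time.Now()
	// A real caller would issue the HTTP request here (hypothetical URL).
	debugf("GET %s -> %d (%s)", "https://example.org/api", 200, time.Since(start))
	infof("batch %d/%d done", 1, 10)
	warnf("retrying after backoff")
}
```

Keeping the gate in a single `enabled` check means each helper stays a one-liner and the level ordering lives in one place.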
|
|
The previous run left partial data after a mid-transaction rollback.
INSERT IGNORE makes the junction table insert idempotent.
|
|
- genre table: (id, name) with unique name constraint
- imdb_genre table: (id, imdb_id, genre_id) junction table
- Upsert genres via INSERT ... ON DUPLICATE KEY UPDATE
- Link via imdb_genre using LAST_INSERT_ID
- Check missing genres via LEFT JOIN imdb_genre
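
A rough sketch of how the upsert-and-link steps might look with `database/sql`. Several details are assumptions rather than facts from the commit: that `imdb_genre.imdb_id` references `imdb.id`, that `imdb_genre` has a unique key on `(imdb_id, genre_id)`, and the helper name `linkGenre`. The `id = LAST_INSERT_ID(id)` assignment is the MySQL idiom that makes `LastInsertId` return the existing row's id when the genre name is already present. The link statement uses the idempotent `INSERT IGNORE` form.

```go
package main

import (
	"database/sql"
	"fmt"
)

const (
	// Upsert a genre by its unique name; LAST_INSERT_ID(id) exposes the
	// existing id on a duplicate so the caller can read it back.
	upsertGenreSQL = `INSERT INTO genre (name) VALUES (?)
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id)`

	// Idempotent junction insert (assumes a unique key on imdb_id, genre_id).
	linkGenreSQL = `INSERT IGNORE INTO imdb_genre (imdb_id, genre_id) VALUES (?, ?)`

	// Entries that have no genre rows yet.
	missingGenresSQL = `SELECT i.id FROM imdb i
LEFT JOIN imdb_genre ig ON ig.imdb_id = i.id
WHERE ig.id IS NULL`
)

// linkGenre upserts one genre name and links it to an imdb row inside tx.
// The function name and parameters are hypothetical.
func linkGenre(tx *sql.Tx, imdbRowID int64, name string) error {
	res, err := tx.Exec(upsertGenreSQL, name)
	if err != nil {
		return fmt.Errorf("upsert genre %q: %w", name, err)
	}
	genreID, err := res.LastInsertId()
	if err != nil {
		return fmt.Errorf("genre id for %q: %w", name, err)
	}
	if _, err := tx.Exec(linkGenreSQL, imdbRowID, genreID); err != nil {
		return fmt.Errorf("link genre %q: %w", name, err)
	}
	return nil
}

func main() {
	// No database is opened here; the statements are printed for inspection.
	fmt.Println(upsertGenreSQL)
	fmt.Println(linkGenreSQL)
	fmt.Println(missingGenresSQL)
}
```

Because both statements are safe to repeat, rerunning the import after a rollback should not fail on rows that already exist.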
|
|
- Parse genres field (rec[8]) from title.basics.tsv, split by comma
- Insert into genre table via SELECT to resolve imdb.id from imdb_id
- Update fetchAndUpdateImdbData to check for missing genres too
- Skip download if TSV already exists (supports stubbed downloadFile)
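
The skip-if-present check in the last bullet might look something like this. It is only a sketch: `ensureFile` and its injectable `download` parameter are hypothetical names, chosen to show how a stubbed downloader could be passed in place of the real `downloadFile`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// ensureFile calls download only when path does not exist yet.
// It reports whether a download was actually performed.
func ensureFile(path string, download func(dst string) error) (bool, error) {
	if _, err := os.Stat(path); err == nil {
		return false, nil // already present, skip the download
	} else if !errors.Is(err, fs.ErrNotExist) {
		return false, err // some other stat failure
	}
	if err := download(path); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "imdbdata")
	if err != nil {
		fmt.Println("tempdir:", err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "title.basics.tsv")
	// Stub downloader: writes a tiny placeholder file instead of fetching.
	stub := func(dst string) error {
		return os.WriteFile(dst, []byte("tconst\ttitleType\n"), 0o644)
	}

	for i := 0; i < 2; i++ {
		downloaded, err := ensureFile(path, stub)
		fmt.Println("downloaded:", downloaded, "err:", err)
	}
}
```

In this demo the first call runs the stub and the second one skips it, since the file is then already on disk.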
|
|
- Replace csv.Reader with bufio.Scanner to avoid quote-parsing issues
  that skipped ~355 entries (e.g. tt1853728, on line ~4.8M, was lost
  when csv.Reader hit malformed quoted fields earlier in the file)
- Fix column indices: startYear=rec[5], runtimeMinutes=rec[7]
  (previously rec[4]/rec[5], which map to isAdult/startYear)
- Update basics for ALL imdb entries, not just those missing ratings
|
|
- Check for imdb entries with NULL average_rating
- Download title.basics.tsv.gz and title.ratings.tsv.gz to imdbdata/
- Decompress alongside gzip originals
- Parse only rows matching our imdb_ids (memory-efficient)
- Update columns: average_rating, num_votes, title_type, primary_title,
  original_title, start_year, runtime_minutes
- Results: 3394 ratings, 3093 basics updated out of 3448 entries
|