| Age | Commit message | Author | Files | Lines |
|
- Parse genres field (rec[8]) from title.basics.tsv, split by comma
- Insert into genre table via SELECT to resolve imdb.id from imdb_id
- Update fetchAndUpdateImdbData to check for missing genres too
- Skip download if TSV already exists (supports stubbed downloadFile)
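
The genre parsing described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: `parseGenres` is a hypothetical helper, the `\N` check assumes the IMDb convention for missing values, and the `genre`/`imdb` column names and `?` placeholders in `insertGenreSQL` are assumptions about the schema and driver.

```go
package main

import (
	"fmt"
	"strings"
)

// insertGenreSQL is a hypothetical statement showing the "insert via SELECT"
// idea: the SELECT resolves the internal imdb.id from the public imdb_id.
// Table/column names and the ? placeholder style are assumptions.
const insertGenreSQL = `INSERT INTO genre (imdb_id, name)
SELECT id, ? FROM imdb WHERE imdb_id = ?`

// parseGenres splits one tab-separated title.basics.tsv row and returns the
// title ID (rec[0]) plus the genres field (rec[8]) split on commas.
// Rows that are too short, or whose genres field is empty or \N, yield no genres.
func parseGenres(line string) (imdbID string, genres []string) {
	rec := strings.Split(line, "\t")
	if len(rec) < 9 {
		return "", nil
	}
	if rec[8] == "" || rec[8] == `\N` {
		return rec[0], nil
	}
	return rec[0], strings.Split(rec[8], ",")
}

func main() {
	// Synthetic row, not real IMDb data.
	line := "tt9000001\tmovie\tExample\tExample\t0\t2001\t\\N\t90\tDrama,Western"
	id, genres := parseGenres(line)
	for _, g := range genres {
		// In real code each pair would be passed to db.Exec(insertGenreSQL, g, id).
		fmt.Printf("would insert genre %q for %s\n", g, id)
	}
}
```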
|
- Replace csv.Reader with bufio.Scanner to avoid quote-parsing issues that skipped ~355 entries (e.g. tt1853728, on line 4.8M, was lost after csv.Reader encountered malformed quoted fields earlier in the file)
- Fix column indices: startYear=rec[5], runtimeMinutes=rec[7] (previously rec[4]/rec[5], which map to isAdult/startYear)
- Update basics for ALL imdb entries, not just those missing ratings
|
- Check for imdb entries with NULL average_rating
- Download title.basics.tsv.gz and title.ratings.tsv.gz to imdbdata/
- Decompress alongside gzip originals
- Parse only rows matching our imdb_ids (memory-efficient)
- Update: average_rating, num_votes, title_type, primary_title, original_title, start_year, runtime_minutes
- Results: 3394 ratings, 3093 basics updated out of 3448 entries
|