<feed xmlns='http://www.w3.org/2005/Atom'>
<title>hnimdbbot/src/imdbdata.go, branch main</title>
<subtitle>Pure Cinema!</subtitle>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/'/>
<entry>
<title>feat: add three-level logging with per-request debug output</title>
<updated>2026-06-26T12:14:52+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T12:14:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=06536f57b1fdc76212da6b85fbc9287cc4f0de70'/>
<id>06536f57b1fdc76212da6b85fbc9287cc4f0de70</id>
<content type='text'>
- New --log-level flag: debug, info (default), silent
  debug: every API request logged (method, URL, status, duration)
  info:  normal events (batch progress, entry counts, summaries)
  silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch, downloadFile
- Retries and backoff logged in debug mode
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- New --log-level flag: debug, info (default), silent
  debug: every API request logged (method, URL, status, duration)
  info:  normal events (batch progress, entry counts, summaries)
  silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch, downloadFile
- Retries and backoff logged in debug mode
</pre>
</div>
</content>
</entry>
<entry>
<title>fix: use INSERT IGNORE for imdb_genre to handle re-runs</title>
<updated>2026-06-24T02:53:05+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-24T02:53:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=d41b60d08fdd5a6589cdb4e33ac1931fa16aef4c'/>
<id>d41b60d08fdd5a6589cdb4e33ac1931fa16aef4c</id>
<content type='text'>
The previous run left partial data after a mid-transaction rollback.
INSERT IGNORE makes the junction table insert idempotent.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The previous run left partial data after a mid-transaction rollback.
INSERT IGNORE makes the junction table insert idempotent.
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: adapt genre code for n:m relation via imdb_genre</title>
<updated>2026-06-24T02:40:22+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-24T02:40:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=1d20ca594c4246a3fcd63c52911b6d56c0aa503e'/>
<id>1d20ca594c4246a3fcd63c52911b6d56c0aa503e</id>
<content type='text'>
- genre table: (id, name) with unique name constraint
- imdb_genre table: (id, imdb_id, genre_id) junction table
- Upsert genres via INSERT ... ON DUPLICATE KEY UPDATE
- Link via imdb_genre using LAST_INSERT_ID
- Check missing genres via LEFT JOIN imdb_genre
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- genre table: (id, name) with unique name constraint
- imdb_genre table: (id, imdb_id, genre_id) junction table
- Upsert genres via INSERT ... ON DUPLICATE KEY UPDATE
- Link via imdb_genre using LAST_INSERT_ID
- Check missing genres via LEFT JOIN imdb_genre
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: populate genre table from title.basics.tsv</title>
<updated>2026-06-24T02:26:46+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-24T02:26:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=6d5231a204790dae325a0557d908c2c6d15bb516'/>
<id>6d5231a204790dae325a0557d908c2c6d15bb516</id>
<content type='text'>
- Parse genres field (rec[8]) from title.basics.tsv, split by comma
- Insert into genre table via SELECT to resolve imdb.id from imdb_id
- Update fetchAndUpdateImdbData to check for missing genres too
- Skip download if TSV already exists (supports stubbed downloadFile)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Parse genres field (rec[8]) from title.basics.tsv, split by comma
- Insert into genre table via SELECT to resolve imdb.id from imdb_id
- Update fetchAndUpdateImdbData to check for missing genres too
- Skip download if TSV already exists (supports stubbed downloadFile)
</pre>
</div>
</content>
</entry>
<entry>
<title>fix: correct TSV parsing — use line-by-line reader and proper column indices</title>
<updated>2026-06-24T02:21:20+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-24T02:04:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=a55f6e227ff397a5d9167cd4ee15442e9cad06ab'/>
<id>a55f6e227ff397a5d9167cd4ee15442e9cad06ab</id>
<content type='text'>
- Replace csv.Reader with bufio.Scanner to avoid quote-parsing issues
  that skipped ~355 entries (e.g. tt1853728 was on line 4.8M and got
  lost when csv.Reader encountered malformed quoted fields earlier)
- Fix column indices: startYear=rec[5], runtimeMinutes=rec[7]
  (was rec[4]/rec[5] which mapped to isAdult/startYear)
- Update basics for ALL imdb entries, not just those missing ratings
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Replace csv.Reader with bufio.Scanner to avoid quote-parsing issues
  that skipped ~355 entries (e.g. tt1853728 was on line 4.8M and got
  lost when csv.Reader encountered malformed quoted fields earlier)
- Fix column indices: startYear=rec[5], runtimeMinutes=rec[7]
  (was rec[4]/rec[5] which mapped to isAdult/startYear)
- Update basics for ALL imdb entries, not just those missing ratings
</pre>
</div>
</content>
</entry>
<entry>
<title>chore: delete .gz files after extracting in downloadImdbDatasets</title>
<updated>2026-06-24T01:52:01+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-24T01:52:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=256c372033bf0ccb6d27ae05c953fa0c18981bf3'/>
<id>256c372033bf0ccb6d27ae05c953fa0c18981bf3</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>move download path</title>
<updated>2026-06-24T01:48:42+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-24T01:48:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=527be75d74aa49cba83158a638ee24ea29a5e5d3'/>
<id>527be75d74aa49cba83158a638ee24ea29a5e5d3</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: fetchAndUpdateImdbData — download IMDB datasets and populate imdb table</title>
<updated>2026-06-24T01:46:14+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-24T01:46:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=86069f011f35e339a30ffb717308990369c5f29f'/>
<id>86069f011f35e339a30ffb717308990369c5f29f</id>
<content type='text'>
- Check for imdb entries with NULL average_rating
- Download title.basics.tsv.gz and title.ratings.tsv.gz to imdbdata/
- Decompress next to the original .gz files (originals are kept)
- Parse only rows matching our imdb_ids (memory-efficient)
- Update: average_rating, num_votes, title_type, primary_title,
  original_title, start_year, runtime_minutes
- Results: 3394 ratings, 3093 basics updated out of 3448 entries
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Check for imdb entries with NULL average_rating
- Download title.basics.tsv.gz and title.ratings.tsv.gz to imdbdata/
- Decompress next to the original .gz files (originals are kept)
- Parse only rows matching our imdb_ids (memory-efficient)
- Update: average_rating, num_votes, title_type, primary_title,
  original_title, start_year, runtime_minutes
- Results: 3394 ratings, 3093 basics updated out of 3448 entries
</pre>
</div>
</content>
</entry>
</feed>
