<feed xmlns='http://www.w3.org/2005/Atom'>
<title>hnimdbbot/src/wikidata.go, branch main</title>
<subtitle>Pure Cinema!</subtitle>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/'/>
<entry>
<title>feat: add license_short from Wikipedia license identifier</title>
<updated>2026-06-26T16:36:46+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T16:36:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=2d72767874f6972726ef09082373e6ac01da169a'/>
<id>2d72767874f6972726ef09082373e6ac01da169a</id>
<content type='text'>
- Extract license.identifier (e.g. CC-BY-SA-4.0) into new
  license_short column
- Warn if a license array has more than 1 entry (none seen yet)
- Include license_short IS NULL in getExistingWikiArticles query
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Extract license.identifier (e.g. CC-BY-SA-4.0) into new
  license_short column
- Warn if a license array has more than 1 entry (none seen yet)
- Include license_short IS NULL in getExistingWikiArticles query
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: add three-level logging with per-request debug output</title>
<updated>2026-06-26T12:14:52+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T12:14:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=06536f57b1fdc76212da6b85fbc9287cc4f0de70'/>
<id>06536f57b1fdc76212da6b85fbc9287cc4f0de70</id>
<content type='text'>
- New --log-level flag: debug, info (default), silent
  debug: every API request logged (method, URL, status, duration)
  info:  normal events (batch progress, entry counts, summaries)
  silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch, downloadFile
- Retries and backoff logged in debug mode
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- New --log-level flag: debug, info (default), silent
  debug: every API request logged (method, URL, status, duration)
  info:  normal events (batch progress, entry counts, summaries)
  silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch, downloadFile
- Retries and backoff logged in debug mode
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: extract actors, directors, screenwriters from Wikipedia API</title>
<updated>2026-06-26T11:10:41+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T11:10:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=ef75353f3710d7566aa8b41922f776ecb3968830'/>
<id>ef75353f3710d7566aa8b41922f776ecb3968830</id>
<content type='text'>
- Extract directors from infobox 'Directed by' field/list
- Extract screenwriters from infobox 'Screenplay by' list
- Extract actors from Cast section list (first link = person name)
- Upsert into people table, link via who table (profession: actor=1, director=2, screenwriter=3)
- Track processed entries with has_people flag column
- Consumer inserts people and marks has_people=1 on success
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Extract directors from infobox 'Directed by' field/list
- Extract screenwriters from infobox 'Screenplay by' list
- Extract actors from Cast section list (first link = person name)
- Upsert into people table, link via who table (profession: actor=1, director=2, screenwriter=3)
- Track processed entries with has_people flag column
- Consumer inserts people and marks has_people=1 on success
</pre>
</div>
</content>
</entry>
<entry>
<title>fix: prevent dropped wiki entries when channel fills</title>
<updated>2026-06-26T02:19:44+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T02:19:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=6abbf29de5b08448005df974e95bf773de304550'/>
<id>6abbf29de5b08448005df974e95bf773de304550</id>
<content type='text'>
- Remove non-blocking select/default that silently dropped entries
- Channel sized to hold all pending entries (existing + SPARQL)
- Blocking send backpressures SPARQL if consumer is slow
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Remove non-blocking select/default that silently dropped entries
- Channel sized to hold all pending entries (existing + SPARQL)
- Blocking send backpressures SPARQL if consumer is slow
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: add -wiki-only flag to rerun only wiki data extraction</title>
<updated>2026-06-26T01:37:51+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T01:37:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=15d06c9802d08037283aa218ccc2f92a9236fcc9'/>
<id>15d06c9802d08037283aa218ccc2f92a9236fcc9</id>
<content type='text'>
- fetchWikiArticlesData is standalone again (re-extracted from consumer)
- -wiki-only flag skips SPARQL pipeline, runs only wiki data fetch
- Default behavior: full pipeline (SPARQL + wiki data in parallel)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- fetchWikiArticlesData is standalone again (re-extracted from consumer)
- -wiki-only flag skips SPARQL pipeline, runs only wiki data fetch
- Default behavior: full pipeline (SPARQL + wiki data in parallel)
</pre>
</div>
</content>
</entry>
<entry>
<title>refactor: pipeline SPARQL and wiki data in parallel</title>
<updated>2026-06-26T01:26:07+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T01:26:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=8e2d742e59b3923852e1ef6e7a5e2ee1de14ce45'/>
<id>8e2d742e59b3923852e1ef6e7a5e2ee1de14ce45</id>
<content type='text'>
- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
- SPARQL producer fetches batches, commits each to DB, forwards resolved articles
- Wiki data consumer runs concurrently, fetching at 2s/request
- Each SPARQL batch commits independently (no global transaction)
- Rate limits respected for both Wikidata SPARQL and wiki server
- No parallel requests to either endpoint
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
- SPARQL producer fetches batches, commits each to DB, forwards resolved articles
- Wiki data consumer runs concurrently, fetching at 2s/request
- Each SPARQL batch commits independently (no global transaction)
- Rate limits respected for both Wikidata SPARQL and wiki server
- No parallel requests to either endpoint
</pre>
</div>
</content>
</entry>
<entry>
<title>fix: decode wiki article names for clean storage</title>
<updated>2026-06-26T00:01:08+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-26T00:01:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=122a4ec1037dfd027d9f3f7d5d25ce63dfe4450a'/>
<id>122a4ec1037dfd027d9f3f7d5d25ce63dfe4450a</id>
<content type='text'>
- wikidata.go: url.PathUnescape SPARQL titles before storing
- wikiarticle.go: PathUnescape on read, PathEscape on send
- DB holds decoded names; URLs are always freshly encoded
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- wikidata.go: url.PathUnescape SPARQL titles before storing
- wikiarticle.go: PathUnescape on read, PathEscape on send
- DB holds decoded names; URLs are always freshly encoded
</pre>
</div>
</content>
</entry>
<entry>
<title>fix: skip already-classified entries in wikidata query</title>
<updated>2026-06-25T18:45:58+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-25T18:45:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=a5e3f8447022a50080a62285e359d38e0875de21'/>
<id>a5e3f8447022a50080a62285e359d38e0875de21</id>
<content type='text'>
Add has_no_wiki_article = 0 filter so entries previously marked as
having no Wikipedia article are not re-queried on subsequent runs.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add has_no_wiki_article = 0 filter so entries previously marked as
having no Wikipedia article are not re-queried on subsequent runs.
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: set has_no_wiki_article flag for entries without Wikipedia article</title>
<updated>2026-06-25T18:29:42+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-25T18:29:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=1972c3f0a93c23d861f8b00e4b3570f450d4519a'/>
<id>1972c3f0a93c23d861f8b00e4b3570f450d4519a</id>
<content type='text'>
- Mark entries as has_no_wiki_article=1 when Wikidata returns no result
- Also mark entries in batches that failed with HTTP errors
- Re-run populated 2705 wiki articles, 592 marked as no wiki
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Mark entries as has_no_wiki_article=1 when Wikidata returns no result
- Also mark entries in batches that failed with HTTP errors
- Re-run populated 2705 wiki articles, 592 marked as no wiki
</pre>
</div>
</content>
</entry>
<entry>
<title>feat: fetch Wikipedia article titles via Wikidata SPARQL</title>
<updated>2026-06-25T18:07:08+00:00</updated>
<author>
<name>dev</name>
</author>
<published>2026-06-25T18:07:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.iamfabulous.de/hnimdbbot/commit/?id=fa742660190a7d3b7b6f068565ce543d413edbab'/>
<id>fa742660190a7d3b7b6f068565ce543d413edbab</id>
<content type='text'>
- Query Wikidata SPARQL in batches of 30 for entries missing wiki_article
- Store wiki_article title in imdb table
- Respect rate limits with configurable delay and retry on 5xx/429
- Skip entries that have no Wikipedia article
- Removed unique constraint on wiki_article (multiple entries can share one)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Query Wikidata SPARQL in batches of 30 for entries missing wiki_article
- Store wiki_article title in imdb table
- Respect rate limits with configurable delay and retry on 5xx/429
- Skip entries that have no Wikipedia article
- Removed unique constraint on wiki_article (multiple entries can share one)
</pre>
</div>
</content>
</entry>
</feed>
