path: root/src/wikidata.go
Age | Commit message | Author | Files | Lines
27 hours | refactor: pipeline SPARQL and wiki data in parallel | dev | 1 | -40/+169
- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
- SPARQL producer fetches batches, commits each to DB, forwards resolved articles
- Wiki data consumer runs concurrently, fetching at 2s/request
- Each SPARQL batch commits independently (no global transaction)
- Rate limits respected for both Wikidata SPARQL and wiki server
- No parallel requests to either endpoint
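The producer/consumer shape described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in wikidata.go: `Article`, `runPipeline`, and the callback parameters (`fetchBatch`, `commit`, `fetchData`) are hypothetical names standing in for the SPARQL request, the per-batch DB commit, and the wiki request. The producer-side delay between SPARQL batches is omitted for brevity.

```go
package main

import (
	"sync"
	"time"
)

// Article is a hypothetical record for one resolved wiki article.
type Article struct {
	ID    string
	Title string
}

// runPipeline connects a sequential batch producer to a single
// rate-limited consumer. All callbacks are assumptions for illustration:
// fetchBatch stands in for one SPARQL request, commit for the per-batch
// DB commit, and fetchData for one request to the wiki server.
func runPipeline(
	batches [][]string,
	fetchBatch func(ids []string) []Article,
	commit func(arts []Article),
	fetchData func(a Article),
	delay time.Duration,
) {
	resolved := make(chan Article, 64)
	var wg sync.WaitGroup

	// Consumer: a single goroutine, so the wiki server never sees
	// parallel requests; delay spaces consecutive requests.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range resolved {
			fetchData(a)
			time.Sleep(delay)
		}
	}()

	// Producer: runs in the calling goroutine, one batch at a time,
	// so the SPARQL endpoint never sees parallel requests either.
	for _, ids := range batches {
		arts := fetchBatch(ids)
		commit(arts) // each batch is committed on its own
		for _, a := range arts {
			resolved <- a
		}
	}
	close(resolved)
	wg.Wait()
}
```

The two stages overlap in time, but each endpoint is only ever called from one goroutine. Passing `2*time.Second` as `delay` would correspond to the 2s/request pacing mentioned above.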
29 hours | fix: decode wiki article names for clean storage | dev | 1 | -1/+1
- wikidata.go: url.PathUnescape SPARQL titles before storing
- wikiarticle.go: PathUnescape on read, PathEscape on send
- DB holds decoded names; URLs are always freshly encoded
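A small sketch of the decode-on-store, encode-on-send convention, using the standard library's `url.PathUnescape` and `url.PathEscape`. The wrapper names `decodeTitle` and `encodeTitle` are hypothetical and do not come from the repository.

```go
package main

import "net/url"

// decodeTitle converts a percent-encoded title (as it might arrive from
// SPARQL) into the decoded form kept in the DB. Hypothetical helper.
func decodeTitle(raw string) (string, error) {
	return url.PathUnescape(raw)
}

// encodeTitle percent-encodes a stored title for use as a URL path
// segment when building a request. Hypothetical helper.
func encodeTitle(stored string) string {
	return url.PathEscape(stored)
}
```

Storing only the decoded form avoids double encoding: a title is escaped exactly once, at the moment a URL is built.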
34 hours | fix: skip already-classified entries in wikidata query | dev | 1 | -1/+1
Add has_no_wiki_article = 0 filter so entries previously marked as having no Wikipedia article are not re-queried on subsequent runs.
34 hours | feat: set has_no_wiki_article flag for entries without Wikipedia article | dev | 1 | -13/+34
- Mark entries as has_no_wiki_article=1 when Wikidata returns no result
- Also mark entries in batches that failed with HTTP errors
- Re-run populated 2705 wiki articles, 592 marked as no wiki
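One way to decide which entries of a batch get the flag is a set difference between the requested IDs and the IDs that came back with a title. The sketch below assumes string IDs and a `resolved` map from ID to article title; `missingIDs` is a hypothetical helper, not a function from the repository.

```go
package main

// missingIDs returns the IDs in batch that have no entry in resolved,
// in batch order. A nil or empty resolved map (for example after a
// failed request) yields the whole batch.
func missingIDs(batch []string, resolved map[string]string) []string {
	var missing []string
	for _, id := range batch {
		if _, ok := resolved[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}
```

The returned IDs are the candidates for an update that sets has_no_wiki_article=1.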
35 hours | feat: fetch Wikipedia article titles via Wikidata SPARQL | dev | 1 | -0/+239
- Query Wikidata SPARQL in batches of 30 for entries missing wiki_article
- Store wiki_article title in imdb table
- Respect rate limits with configurable delay and retry on 5xx/429
- Skip entries that have no Wikipedia article
- Removed unique constraint on wiki_article (multiple entries can share one)
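The batching and retry behaviour listed above might look roughly like this. It is a sketch under assumptions: `chunk`, `retryable`, and `withRetry` are hypothetical names, and the `do` callback stands in for one SPARQL request that reports an HTTP status code. Transport errors are also retried here, which the commit message does not specify.

```go
package main

import (
	"errors"
	"net/http"
	"time"
)

// chunk splits ids into consecutive batches of at most size elements.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for size < len(ids) {
		out = append(out, ids[:size:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		out = append(out, ids)
	}
	return out
}

// retryable reports whether a status is worth retrying: 429 or any 5xx.
func retryable(status int) bool {
	return status == http.StatusTooManyRequests || (status >= 500 && status < 600)
}

// withRetry calls do up to attempts times, sleeping delay between tries
// for as long as do returns an error or a retryable status.
func withRetry(attempts int, delay time.Duration, do func() (int, error)) (int, error) {
	var status int
	var err error
	for i := 0; i < attempts; i++ {
		status, err = do()
		if err == nil && !retryable(status) {
			return status, nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	if err == nil {
		err = errors.New("retries exhausted")
	}
	return status, err
}
```

With a batch size of 30, `chunk` turns 65 IDs into batches of 30, 30, and 5, and each batch would be sent through `withRetry` with the configured delay.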