- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
- SPARQL producer fetches batches, commits each to DB, forwards resolved articles
- Wiki data consumer runs concurrently, fetching one article every 2s
- Each SPARQL batch commits independently (no global transaction)
- Rate limits respected for both Wikidata SPARQL and wiki server
- No parallel requests to either endpoint
- One-shot migration decoded 187 percent-encoded rows
- Removed decode-on-read from wikiarticle.go (no longer needed)
- wikidata.go still decodes SPARQL URLs before storing (for future inserts)
- wikiarticle.go encodes on send via url.PathEscape
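
A one-shot migration of this kind could be sketched as follows. It works on an in-memory map instead of the real table, and the `decodeRows` name and map shape are hypothetical. The sketch keeps only rows whose value changes under `url.PathUnescape`. Values that are not valid percent-encoding are left untouched rather than rewritten.

```go
package main

import (
	"net/url"
)

// decodeRows returns only the rows whose stored title changes when percent-decoded.
// Titles that fail to decode (e.g. a stray "%") are skipped, not rewritten.
func decodeRows(rows map[int]string) map[int]string {
	updates := make(map[int]string)
	for id, title := range rows {
		decoded, err := url.PathUnescape(title)
		if err != nil || decoded == title {
			continue
		}
		updates[id] = decoded
	}
	return updates
}
```

Returning only the changed rows keeps the UPDATE set small and makes the migration safe to re-run: already-decoded titles produce no updates.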
- wikidata.go: url.PathUnescape SPARQL titles before storing
- wikiarticle.go: PathUnescape on read, PathEscape on send
- DB holds decoded names; URLs are always freshly encoded
- wiki_article values are already URL-encoded in the DB
- Build query URL manually instead of url.Values.Encode()
- Escape only the username, since it is not pre-encoded
- queryWikiArticle returns HTTP status code alongside entry data
- Always record wiki_status_code for every request (success or failure)
- Skip entries with wiki_status_code = 404 in future runs
- Only update data fields on HTTP 200; non-200 only records status
- Log line shows updated vs skipped (non-200) counts
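
The status-code bookkeeping can be sketched as follows. `Record` is a hypothetical row shape with a single data field, and `query` stands in for `queryWikiArticle`, whose real signature the commit does not show. Entries with a stored 404 are filtered out up front, as a `WHERE` clause would. Every other entry gets its status recorded, but data is copied only on HTTP 200.

```go
package main

import (
	"net/http"
)

// Record is a hypothetical row; only one data field is shown.
type Record struct {
	Article        string
	Description    string
	WikiStatusCode int
}

// applyResult always records the status code but copies data only on HTTP 200.
// It reports whether the data fields were updated.
func applyResult(r *Record, status int, description string) bool {
	r.WikiStatusCode = status
	if status != http.StatusOK {
		return false
	}
	r.Description = description
	return true
}

// processAll runs one pass and returns the counts for the log line:
// updated (HTTP 200) and skipped (any other status).
func processAll(records []Record, query func(article string) (string, int)) (updated, skipped int) {
	for i := range records {
		// A 404 from an earlier run excludes the entry from this pass entirely.
		if records[i].WikiStatusCode == http.StatusNotFound {
			continue
		}
		desc, status := query(records[i].Article)
		if applyResult(&records[i], status, desc) {
			updated++
		} else {
			skipped++
		}
	}
	return updated, skipped
}
```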
- Retry up to 5 times on HTTP 429 with 2s/4s/8s/16s backoff
- Move inter-request delay before each request (was after)
- Increase base delay from 1s to 2s between requests
- Fix: skip the delay before the first request; sleep only before later ones
- Add wiki_server and wiki_username config fields
- Query custom server for each wiki_article entry
- Extract description, synopsis (Plot), year, poster_url, license,
  license_url, num_accolades from the structured JSON response
- Serial processing with 1 req/s rate limit
- Update only entries missing at least one target column
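
The commit does not document the response schema. Apart from `Plot`, which the commit mentions, every JSON key below is a guess, and the `wikiEntry` type, `parseEntry`, and `needsUpdate` are hypothetical. Pointer fields stay `nil` when a key is absent, so "missing" can be told apart from an empty value. For brevity, the same struct also stands in for a DB row in the missing-column check.

```go
package main

import (
	"encoding/json"
)

// wikiEntry mirrors the fields the commit lists. Every JSON key except "Plot"
// is a guess at the server's schema. Pointer fields stay nil when a key is absent.
type wikiEntry struct {
	Description  *string `json:"description"`
	Synopsis     *string `json:"Plot"`
	Year         *int    `json:"year"`
	PosterURL    *string `json:"poster_url"`
	License      *string `json:"license"`
	LicenseURL   *string `json:"license_url"`
	NumAccolades *int    `json:"num_accolades"`
}

// parseEntry decodes one structured JSON response.
func parseEntry(body []byte) (wikiEntry, error) {
	var e wikiEntry
	err := json.Unmarshal(body, &e)
	return e, err
}

// needsUpdate reports whether any target column is still missing (nil),
// mirroring "update only entries missing at least one target column".
func needsUpdate(row wikiEntry) bool {
	return row.Description == nil || row.Synopsis == nil || row.Year == nil ||
		row.PosterURL == nil || row.License == nil || row.LicenseURL == nil ||
		row.NumAccolades == nil
}
```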