| Age | Commit message | Author | Files | Lines |
|
- New --log-level flag: debug, info (default), silent
  debug: every API request logged (method, URL, status, duration)
  info: normal events (batch progress, entry counts, summaries)
  silent: only warnings and fatal errors
- Replaced all log.Printf/Fatalf calls with level-gated helpers
- API request timing added to queryWikiArticle, queryWikidataBatch, downloadFile
- Retries and backoff logged in debug mode
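
The level gating described above could look roughly like the following. This is a minimal sketch, not the code from this commit: the names Level, ParseLevel, Logger, Debugf, Infof and Warnf are hypothetical, and only the three level names and their meanings come from the commit message.

```go
package main

import (
	"fmt"
	"io"
	"log"
)

// Level orders verbosity from most to least chatty (hypothetical type).
type Level int

const (
	LevelDebug  Level = iota // every API request
	LevelInfo                // normal events
	LevelSilent              // only warnings and fatal errors
)

// ParseLevel maps a --log-level value to a Level.
func ParseLevel(s string) (Level, error) {
	switch s {
	case "debug":
		return LevelDebug, nil
	case "info":
		return LevelInfo, nil
	case "silent":
		return LevelSilent, nil
	}
	return LevelInfo, fmt.Errorf("unknown log level %q", s)
}

// Logger wraps the standard logger with a verbosity threshold.
type Logger struct {
	level Level
	out   *log.Logger
}

// NewLogger builds a Logger that writes to w at the given threshold.
func NewLogger(w io.Writer, level Level) *Logger {
	return &Logger{level: level, out: log.New(w, "", log.LstdFlags)}
}

// enabled reports whether messages logged at level "at" pass the threshold.
func (l *Logger) enabled(at Level) bool { return l.level <= at }

// Debugf logs only when the threshold is debug.
func (l *Logger) Debugf(format string, args ...any) {
	if l.enabled(LevelDebug) {
		l.out.Printf("DEBUG "+format, args...)
	}
}

// Infof logs when the threshold is debug or info.
func (l *Logger) Infof(format string, args ...any) {
	if l.enabled(LevelInfo) {
		l.out.Printf("INFO "+format, args...)
	}
}

// Warnf is never suppressed, so it still prints in silent mode.
func (l *Logger) Warnf(format string, args ...any) {
	l.out.Printf("WARN "+format, args...)
}
```

A caller would create one Logger from the parsed flag value (for example NewLogger(os.Stderr, level)) and use Debugf for per-request lines such as method, URL, status and duration.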
|
|
- Guard section.has_parts type assertion in extractPeople
- Guard Cast section has_parts iteration with ok check
|
|
- Extract directors from infobox 'Directed by' field/list
- Extract screenwriters from infobox 'Screenplay by' list
- Extract actors from Cast section list (first link = person name)
- Upsert into people table, link via who table (profession: actor=1, director=2, screenwriter=3)
- Track processed entries with has_people flag column
- Consumer inserts people and marks has_people=1 on success
|
|
- Merge fetchWikiArticles + fetchWikiArticlesData into one pipeline
- SPARQL producer fetches batches, commits each to DB, forwards resolved articles
- Wiki data consumer runs concurrently, fetching at 2s/request
- Each SPARQL batch commits independently (no global transaction)
- Rate limits respected for both Wikidata SPARQL and wiki server
- No parallel requests to either endpoint
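
The producer/consumer shape described above can be sketched with a single channel. This is an illustration under assumptions: Article, runPipeline, commit and fetch are hypothetical names, the batches are passed in rather than fetched from SPARQL, and the real pipeline's error handling is not shown in the commit message.

```go
package main

import (
	"sync"
	"time"
)

// Article is a hypothetical stand-in for a resolved wiki article.
type Article struct{ Title string }

// runPipeline commits each batch on its own, forwards its articles to a
// single consumer goroutine, and lets that consumer fetch them one at a
// time with a fixed delay between requests (2s in the commit message).
func runPipeline(batches [][]Article, commit func([]Article) error,
	fetch func(Article), delay time.Duration) error {

	articles := make(chan Article)
	var wg sync.WaitGroup

	// Consumer: one goroutine, so the wiki server never sees parallel requests.
	wg.Add(1)
	go func() {
		defer wg.Done()
		first := true
		for a := range articles {
			if !first {
				time.Sleep(delay) // delay before every request except the first
			}
			first = false
			fetch(a)
		}
	}()

	// Producer: batches are handled serially, each committed independently.
	var err error
	for _, batch := range batches {
		if err = commit(batch); err != nil {
			break // earlier batches stay committed; there is no global transaction
		}
		for _, a := range batch {
			articles <- a
		}
	}

	close(articles)
	wg.Wait()
	return err
}
```

Because the channel is unbuffered, the producer waits whenever the consumer is busy, which keeps the two stages loosely in step without either endpoint receiving concurrent requests.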
|
|
|
|
- One-shot migration decoded 187 percent-encoded rows
- Removed decode-on-read from wikiarticle.go (no longer needed)
- wikidata.go still decodes SPARQL URLs before storing (for future inserts)
- wikiarticle.go encodes on send via url.PathEscape
|
|
- wikidata.go: url.PathUnescape SPARQL titles before storing
- wikiarticle.go: PathUnescape on read, PathEscape on send
- DB holds decoded names; URLs are always freshly encoded
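
A small sketch of the decode-on-store, encode-on-send rule above, using url.PathUnescape and url.PathEscape from net/url. The helper names decodeTitle and articleURL, and the "/wiki/" path layout, are assumptions for illustration only.

```go
package main

import "net/url"

// decodeTitle converts a percent-encoded title (as found in a SPARQL
// result URL) into the decoded form that is stored in the DB.
func decodeTitle(encoded string) (string, error) {
	return url.PathUnescape(encoded)
}

// articleURL builds a request URL from a decoded title. Encoding happens
// here and nowhere else, so a title is never encoded twice.
// The "/wiki/" prefix is a placeholder, not the real server layout.
func articleURL(server, title string) string {
	return server + "/wiki/" + url.PathEscape(title)
}
```

Keeping only decoded names in the DB means a stored value such as "Amélie" is encoded exactly once per request, while malformed input such as "%zz" is rejected by url.PathUnescape before it is stored.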
|
|
- wiki_article values are already URL-encoded in the DB
- Build query URL manually instead of url.Values.Encode()
- Escape only the username (it is not pre-encoded)
|
|
- queryWikiArticle returns HTTP status code alongside entry data
- Always record wiki_status_code for every request (success or failure)
- Skip entries with wiki_status_code = 404 in future runs
- Only update data fields on HTTP 200; non-200 only records status
- Log line shows updated vs skipped (non-200) counts
|
|
- Retry up to 5 times on HTTP 429 with 2s/4s/8s/16s backoff
- Move inter-request delay before each request (was after)
- Increase base delay from 1s to 2s between requests
- Fix: skip the delay on the first call; sleep only before subsequent requests
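
The retry schedule above might be implemented along these lines. This sketch reads "up to 5 times" as five attempts separated by four waits (2s, 4s, 8s, 16s), which is an interpretation, not something the commit message states. The names backoff and getWithRetry and the injected sleep function are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// backoff returns the wait after the n-th rate-limited attempt (1-based):
// 2s, 4s, 8s, 16s for n = 1..4.
func backoff(n int) time.Duration {
	return time.Duration(1<<n) * time.Second
}

// getWithRetry performs a GET and retries only on HTTP 429, sleeping with
// exponential backoff between attempts. Any other status is returned to
// the caller as-is. sleep is injected so callers can pass time.Sleep in
// production and a stub in tests.
func getWithRetry(client *http.Client, target string, maxAttempts int,
	sleep func(time.Duration)) (*http.Response, error) {

	for attempt := 1; ; attempt++ {
		resp, err := client.Get(target)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil
		}
		resp.Body.Close() // discard the 429 response before retrying
		if attempt >= maxAttempts {
			return nil, fmt.Errorf("still rate limited after %d attempts", attempt)
		}
		sleep(backoff(attempt))
	}
}
```

A call such as getWithRetry(http.DefaultClient, target, 5, time.Sleep) would wait at most 2+4+8+16 = 30 seconds in total before giving up.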
|
|
- Add wiki_server and wiki_username config fields
- Query custom server for each wiki_article entry
- Extract description, synopsis (Plot), year, poster_url, license,
  license_url, num_accolades from structured JSON response
- Serial processing with 1 req/s rate limit
- Update only entries missing at least one target column
|