Pipeline:
- write_picks() no longer truncates; deduplicates by artist+album
key on append so the same pick is never added twice
- Picks accumulate indefinitely; only the user can remove them
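
The append-with-dedup behaviour described above could look roughly
like the sketch below. The JSON-list file layout, the pick_key()
helper, and the lower-casing/stripping of the key are assumptions for
illustration, not the actual write_picks() code.

```python
import json
from pathlib import Path


def pick_key(pick: dict) -> tuple:
    """Hypothetical artist+album key; the normalisation is an assumption."""
    return (
        pick.get("artist", "").strip().lower(),
        pick.get("album", "").strip().lower(),
    )


def write_picks(path: Path, new_picks: list) -> int:
    """Append picks that are not already stored; never truncate.

    Returns the number of picks actually added.
    """
    # Assumes the store is a JSON list of pick objects.
    existing = json.loads(path.read_text()) if path.exists() else []
    seen = {pick_key(p) for p in existing}
    added = 0
    for pick in new_picks:
        key = pick_key(pick)
        if key in seen:
            continue  # same artist+album already stored
        seen.add(key)
        existing.append(pick)
        added += 1
    path.write_text(json.dumps(existing, indent=2))
    return added
```

Calling it twice with the same pick leaves the file unchanged on the
second call, which is the property the bullet above describes.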
New endpoint POST /api/fgs/picks/remove:
- Accepts {artist, album}, removes matching pick from store
- Also writes the removed pick to the dedup DB so it won't resurface
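
A minimal sketch of the removal logic behind the endpoint, kept free
of the web framework so it runs on its own. The remove_pick() name,
the "seen" table, and its columns are hypothetical; the real dedup DB
schema is not shown in this commit.

```python
import sqlite3


def remove_pick(picks: list, db: sqlite3.Connection, artist: str, album: str) -> list:
    """Return the picks without the matching entry and record the
    removed artist+album in the dedup DB so it does not resurface."""
    key = (artist.strip().lower(), album.strip().lower())
    kept = [
        p for p in picks
        if (p.get("artist", "").strip().lower(),
            p.get("album", "").strip().lower()) != key
    ]
    if len(kept) != len(picks):
        # Hypothetical schema: one row per artist+album already seen.
        db.execute(
            "CREATE TABLE IF NOT EXISTS seen "
            "(artist TEXT, album TEXT, PRIMARY KEY (artist, album))"
        )
        db.execute(
            "INSERT OR IGNORE INTO seen (artist, album) VALUES (?, ?)", key
        )
        db.commit()
    return kept
```

Nothing is written to the DB when no pick matches, so a mistyped
request does not add an entry to the dedup table.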
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Parsing:
- Handle "Album | Artist — Label - Bandcamp" title format (common
Bandcamp search result pattern): the artist capture group now stops
at the em-dash so the label name doesn't bleed into the artist field
- clean_name() strips label suffixes from parsed tokens
- artist_from_url() now title-cases Bandcamp slug
- looks_like_bad_pick() now checks the album field for pipes and uses
a broader regex that matches 'records'/'bandcamp' without requiring
word boundaries
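
The title handling above can be illustrated with a small regex. This
is a sketch under assumptions: TITLE_RE and parse_title() are
hypothetical names, and the real parser may differ in detail.

```python
import re
from typing import Optional, Tuple

# "Album | Artist — Label - Bandcamp": the artist group excludes the
# em-dash, so everything from the em-dash onward (label, site suffix)
# is left out of the artist field.
TITLE_RE = re.compile(
    r"^(?P<album>[^|]+?)\s*\|\s*(?P<artist>[^—|]+?)\s*(?:—.*)?$"
)


def parse_title(title: str) -> Optional[Tuple[str, str]]:
    """Return (album, artist), or None if the title has no pipe-separated form."""
    match = TITLE_RE.match(title.strip())
    if not match:
        return None
    return match.group("album"), match.group("artist")
```

A title without an em-dash still parses; trailing suffixes in that case
would be left for a later cleanup step such as clean_name().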
Sanitise pass (post-curator):
- Normalise obscurity to high/medium/low (dashboard badge values)
- Drop picks where artist field contains 'bandcamp'/'records'/pipe
- Detect when a review blog domain name was extracted as the artist;
attempt recovery from original search result or drop the pick
- Review domain blocklist: metalinjection, cvltnation, angrymetalguy,
nocleansinging, meatmeadmetal, decibelmag, and others
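
A rough sketch of the sanitise pass, assuming picks are dicts with
"artist" and "obscurity" fields. The function names, the "medium"
fallback for unknown obscurity values, and the way a blog name is
compared to the blocklist are assumptions. The recovery step from the
original search result is omitted here, so this version only drops.

```python
import re

# Subset of the blocklist named above; the full list is longer.
REVIEW_DOMAINS = {
    "metalinjection", "cvltnation", "angrymetalguy",
    "nocleansinging", "meatmeadmetal", "decibelmag",
}
BAD_ARTIST_RE = re.compile(r"bandcamp|records|\|", re.IGNORECASE)
OBSCURITY_LEVELS = {"high", "medium", "low"}


def normalise_obscurity(value) -> str:
    """Map any value onto high/medium/low (fallback is an assumption)."""
    text = str(value or "").strip().lower()
    return text if text in OBSCURITY_LEVELS else "medium"


def artist_is_review_domain(artist: str) -> bool:
    """True if the artist, with spaces and punctuation removed, is a blog name."""
    squashed = re.sub(r"[^a-z0-9]", "", artist.lower())
    return squashed in REVIEW_DOMAINS


def sanitise(picks: list) -> list:
    """Drop obviously mis-parsed picks and normalise the rest."""
    clean = []
    for pick in picks:
        artist = pick.get("artist", "")
        if BAD_ARTIST_RE.search(artist) or artist_is_review_domain(artist):
            continue
        clean.append(
            {**pick, "obscurity": normalise_obscurity(pick.get("obscurity"))}
        )
    return clean
```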
Dashboard fix:
- esc() now escapes single quotes (') to prevent broken onclick
attributes when album/why fields contain apostrophes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dedup: mark only accepted picks as seen (not all prefiltered
candidates) — unselected items stay eligible for re-evaluation,
preventing pool exhaustion across runs
- Queries: expanded from 29 to 37+ with rotating 30-subgenre list,
25 label targets, 14 review sites; Bandcamp/MA queries skip
time_range for broader results; review sites use time_range:year
- Results per query: 15 → 25
- Prefilter: parallel batches of 35 (up to 3 concurrent), processes
all fresh candidates instead of just the top 80; prompt now tells the
model to be inclusive
- Curator: cap 20 → 30, score floor 60 → 50, URL prefix matching
in provenance check instead of exact match
Result: 405 candidates/run vs 146 before; 88 passing prefilter vs 10;
pool stays at ~400 fresh on consecutive runs.
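
The change from exact to prefix URL matching in the curator's
provenance check might be sketched as follows. normalise_url() and
has_provenance() are hypothetical names, and both the normalisation
rules and matching in either direction are assumptions, not the
service's actual logic.

```python
from urllib.parse import urlparse


def normalise_url(url: str) -> str:
    """Drop scheme, 'www.', query, fragment and trailing slash.

    Assumes a full URL with a scheme; the path keeps its case.
    """
    parts = urlparse(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def has_provenance(pick_url: str, source_urls: list) -> bool:
    """True if the pick URL and some source URL share a prefix relation."""
    target = normalise_url(pick_url)
    if not target:
        return False
    for source in source_urls:
        candidate = normalise_url(source)
        if candidate and (
            target.startswith(candidate) or candidate.startswith(target)
        ):
            return True
    return False
```

With exact matching, a pick URL carrying a tracking query string or a
trailing slash would fail the check even though it points at the same
page; the prefix form accepts it.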
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
FastAPI service replacing the 77-node n8n pipeline. Implements full
discovery pipeline: 29 rotating SearXNG queries, nomic-embed-text
scoring against Last.fm taste centroid, Mistral-nemo prefilter and
curator with provenance validation, SQLite dedup, writes to
metal-picks.json for the existing FGS dashboard.
Runs as systemd service on port 8766 (fgs-agent.home via Caddy).
n8n reduced to a 2-node schedule trigger → HTTP POST.
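
For illustration, scoring a candidate embedding against a taste
centroid could look like the sketch below. Cosine similarity and the
plain-mean centroid are assumptions; the commit does not state which
similarity measure the service uses, and the toy vectors stand in for
real nomic-embed-text embeddings.

```python
import math


def centroid(vectors: list) -> list:
    """Component-wise mean of equally sized embedding vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-d example: the centroid of two "liked" vectors, then one score.
taste = centroid([[1.0, 0.0], [0.0, 1.0]])
score = cosine([1.0, 1.0], taste)
```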
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>