Roadmap
Segmentation roadmap
SeekDeep's inpaint pipeline currently uses CLIPSeg (under 200 MB, fast startup, fits in a general VRAM budget). It works but has hard limits on resolution, semantic overlap, and small-object detection. A future upgrade path lifts segmentation quality dramatically — but adds 1.5-3 GB+ of models and complex install deps.
▸ Current behavior · CLIPSeg
Model: CIDAS/clipseg-rd64-refined. Takes a text query (e.g. "the wizard") and returns a low-resolution heatmap of pixel probabilities. The heatmap is upscaled and thresholded (> 0.3) into a binary mask. Mask bounds extend via MaxFilter(21) dilation and GaussianBlur(8) feathering.
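The post-processing chain described above (upscale, threshold, dilate, feather) can be sketched with Pillow. This is a minimal illustration, not SeekDeep's actual code: the function name `heatmap_to_mask`, the nested-list input format, and the bilinear upscale are assumptions. Only the 0.3 threshold, MaxFilter(21), and GaussianBlur(8) come from the text.

```python
from PIL import Image, ImageFilter


def heatmap_to_mask(heatmap, size, threshold=0.3, dilate=21, blur=8):
    """Turn a low-resolution probability heatmap into a feathered mask.

    heatmap: rows of floats in [0, 1] (hypothetical input format).
    size:    (width, height) of the target image.
    """
    rows, cols = len(heatmap), len(heatmap[0])

    # Pack the probabilities into an 8-bit grayscale image (0..255).
    low = Image.new("L", (cols, rows))
    low.putdata(
        [round(max(0.0, min(1.0, p)) * 255) for row in heatmap for p in row]
    )

    # Upscale to the target size (bilinear is an assumption).
    up = low.resize(size, Image.BILINEAR)

    # Threshold into a binary mask: probability > threshold becomes white.
    cutoff = threshold * 255
    binary = up.point(lambda v: 255 if v > cutoff else 0)

    # Grow the mask bounds, then soften the edge.
    grown = binary.filter(ImageFilter.MaxFilter(dilate))
    return grown.filter(ImageFilter.GaussianBlur(blur))


# Usage: an 8x8 heatmap with a single confident cell, upscaled to 256x256.
heat = [[0.0] * 8 for _ in range(8)]
heat[1][1] = 0.9
mask = heatmap_to_mask(heat, (256, 256))
```

The result is an "L"-mode image in which white marks the region to inpaint and the blurred border gives a gradual falloff.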
Limitations
Internal resolution is 64×64, which yields jagged edges and misses thin features (fingers, poles). Multiple similar objects are often segmented together. Very small objects tend to produce too weak a heatmap response to clear the threshold, so the mask comes back empty.
Future · SAM + GroundingDINO
GroundingDINO generates high-quality bounding boxes from text. Segment Anything Model (SAM) uses those boxes as prompts to produce pixel-perfect instance masks. Expected: sharp edges, distinct objects, robust small-object detection.
▸ DEFERRED RATIONALE — SAM + GroundingDINO combined add 1.5-3 GB+ of models, complex dependency chains (torchvision, supervision, custom CUDA extensions), and routinely exceed consumer GPU VRAM when running alongside SDXL + chat + vision. Mask preview (@SeekDeep mask preview <target>) is a stepping stone — debug CLIPSeg locally before spending diffusion cycles.
Persistent memory · shipped
No longer parked. Persistent memory shipped in v10.35.x — data/user-facts.json is the on-disk store, atomic-written by both the Discord bot and the /memory/* HTTP endpoints. The web playground uses web-owner as the default user-id so single-user setups work without auth.
▸ How it works now
Facts live in data/user-facts.json. Writes are atomic (temp-then-rename), so concurrent writers from the bot and the GUI degrade to last-writer-wins instead of corrupting the file. Memory injection into prompts is plain inclusion, with no semantic search yet, capped at 25 facts / 10 KB per user with oldest-first LRU pruning.
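A minimal sketch of the two mechanisms described above, temp-then-rename writes and cap-based pruning. The helper names, the per-user list layout, and the way the 10 KB cap is measured (size of the JSON-serialized list, with 1 KB taken as 1024 bytes) are all assumptions for illustration; the real store may differ.

```python
import json
import os
import tempfile

MAX_FACTS = 25          # cap from the text
MAX_BYTES = 10 * 1024   # "10 KB", assumed to mean 10 * 1024 bytes


def prune_facts(facts):
    """Drop the oldest facts until both caps hold.

    Assumes `facts` is a list of strings ordered oldest-first.
    """
    kept = list(facts)[-MAX_FACTS:]
    # Assumption: the byte cap applies to the JSON-serialized list.
    while kept and len(json.dumps(kept).encode("utf-8")) > MAX_BYTES:
        kept.pop(0)
    return kept


def atomic_write_json(path, data):
    """Write JSON to a temp file in the same directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old file or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Two writers racing through `atomic_write_json` each replace the whole file, so the later rename wins and the earlier update is lost rather than interleaved. That is the last-writer-wins behavior the text describes.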
API · live
GET /memory/users
GET /memory/user/<id>
POST /memory/user/<id>/fact
PATCH /memory/user/<id>/fact/<n>
DELETE /memory/user/<id>/fact/<n>
Plus /memory/presets/<id> for the four KNOWN_PRESETS quick-toggles. The same routes power the Discord /remember, /recall, and /forget commands.
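As a rough illustration of how a client might address these routes, the sketch below builds (but does not send) requests with Python's standard library. The base URL, the `{"fact": ...}` JSON body, and the zero-based fact index are assumptions; only the route paths, HTTP methods, and the web-owner user-id come from the text above.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: host and port are not documented here


def fact_path(user_id, index=None):
    """Build /memory/user/<id>/fact or /memory/user/<id>/fact/<n>."""
    base = f"/memory/user/{quote(str(user_id), safe='')}/fact"
    return base if index is None else f"{base}/{index}"


def memory_request(method, path, payload=None):
    """Build a urllib Request for a memory route without sending it."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


list_users = memory_request("GET", "/memory/users")
# The body shape below is hypothetical; check the server for the real schema.
add_fact = memory_request("POST", fact_path("web-owner"), {"fact": "prefers metric units"})
# Whether <n> is zero-based is also an assumption.
delete_fact = memory_request("DELETE", fact_path("web-owner", 0))
```

Passing any of these objects to `urllib.request.urlopen` would send it against a running instance.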
Still open
Semantic search / summarization on retrieval is not yet implemented; prompt injection is "include them all" up to the cap. For under ~25 short facts this is fine; for richer profiles we'd want a small local embedding model. Filed under QoL wishlist, not blocking.