A Discord bot that runs Llama-3.1, Qwen-VL, and SDXL on your own GPU. Includes archive threads, persona overrides, role-routed chat, image-edit pipelines, and live VRAM monitoring, all with no outbound API calls.
No OpenAI, no Anthropic, no telemetry. Chat history, archive threads, generated images, and search queries remain on the machine running the bot.
Routes pin specific models: quality chats run Mistral-Nemo, reasoning runs Phi-4, lightweight fallback runs Gemma-3n. A task-LRU policy bounds VRAM use.
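As a rough illustration of route pinning, the mapping can be pictured as a small lookup table. This is a sketch only: the route keys, the ROUTES name, and the pick_model helper are hypothetical, and the bot's real configuration format is not shown here.

```python
# Hypothetical route table. The model labels come from the description above;
# the route keys and this dict layout are assumptions for illustration.
ROUTES = {
    "quality": "Mistral-Nemo",
    "reasoning": "Phi-4",
    "fallback": "Gemma-3n",
}


def pick_model(route: str) -> str:
    """Return the model pinned to a route, or the lightweight fallback."""
    return ROUTES.get(route, ROUTES["fallback"])
```

Pinning by route keeps the choice of model predictable: an unknown route falls through to the lightweight model instead of raising an error.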
Live GPU monitor, thrashing warnings, 4-bit quantization, configurable system reserve, and a single-flight lock for safe model swaps under load.
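One way to picture the single-flight lock: every request that needs a model goes through one lock, so a swap runs at most once and callers that waited find the model already resident. The sketch below assumes a threaded server and a hypothetical loader callable; the class and attribute names are illustrative, not the bot's actual code.

```python
import threading


class ModelSwapper:
    """Serialize model swaps so concurrent requests trigger at most one load."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical callable: model name -> model object
        self._lock = threading.Lock()  # only one swap may run at a time
        self._name = None              # name of the currently resident model
        self._model = None
        self.loads = 0                 # counts real loads, for illustration only

    def get(self, name):
        with self._lock:
            # A caller that waited on the lock re-checks here and skips the
            # load if another request already swapped the model in.
            if self._name != name:
                self._model = self._loader(name)
                self._name = name
                self.loads += 1
            return self._model
```

With eight threads asking for the same model at once, the loader in this sketch runs a single time; the other seven callers reuse the result.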
Trigger with @SeekDeep or a slash command. web:auto defers to the router; web:always forces a SearXNG round-trip. Conversation context is held in a rolling buffer, scoped per user and per channel.
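A rolling buffer scoped per user and per channel could look roughly like the sketch below, which keys a bounded deque on the (user, channel) pair. The window size MAX_TURNS and the function names are assumptions; the real limit and storage are not stated here.

```python
from collections import defaultdict, deque

MAX_TURNS = 20  # assumed window size; the bot's real limit is not stated

# One bounded deque per (user_id, channel_id). When a deque is full,
# appending a new turn silently drops the oldest one.
_buffers = defaultdict(lambda: deque(maxlen=MAX_TURNS))


def remember(user_id, channel_id, role, text):
    """Append one conversation turn to this user's buffer for this channel."""
    _buffers[(user_id, channel_id)].append((role, text))


def context(user_id, channel_id):
    """Return the turns currently held for this user in this channel."""
    return list(_buffers[(user_id, channel_id)])
```

Because the key includes both IDs, the same user gets independent context in each channel, and two users in one channel never see each other's history.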


Dreamshaper-XL at 28 steps with the dpmsolver++ scheduler and prompt refinement via the pinned chat model. Edits chain through img2img, InstructPix2Pix, CLIPSeg-masked inpaint, and Lanczos upscale.
Every command reports GPU and VRAM use. When chat, image, and vision can't co-reside, the task-LRU evicts the coldest model and surfaces the decision in the response.
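The eviction behaviour described above can be sketched with an ordered map that tracks which task's model was used least recently. This is a minimal sketch under stated assumptions: the TaskLRU class, the megabyte budget, and the per-model sizes are hypothetical and stand in for whatever the bot actually measures.

```python
from collections import OrderedDict


class TaskLRU:
    """Track resident models per task and evict the coldest when over budget."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self._resident = OrderedDict()  # task -> size in MB, coldest first

    def use(self, task, size_mb):
        """Mark a task's model as used; return the tasks evicted to make room."""
        evicted = []
        if task in self._resident:
            # Already loaded: just mark it as the most recently used.
            self._resident.move_to_end(task)
            return evicted
        # Evict coldest-first until the new model fits. A model larger than
        # the whole budget is still admitted once nothing else is resident.
        while self._resident and sum(self._resident.values()) + size_mb > self.budget_mb:
            cold, _ = self._resident.popitem(last=False)
            evicted.append(cold)
        self._resident[task] = size_mb
        return evicted
```

Returning the evicted task names gives the caller what it needs to report the decision in the response, for example by noting which model was unloaded to make room.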
Drop the repo wherever you want it. No system installs.
git clone seekdeep && cd seekdeep
The PowerShell script creates the venv, runs npm install, and copies .env.default to .env for you.
./setup_local.ps1
Paste your bot's DISCORD_TOKEN into .env. Optional admin IDs go below it.
DISCORD_TOKEN=...
The launcher brings up SearXNG, the AI server, and the bot in the right order.
./seekdeep_launcher.bat