The tension between cloud-dependent AI tools and developer privacy has become one of the defining debates in AI-assisted software development. Services like GitHub Copilot and Cursor offer impressive code completion, but they require sending your code to external servers. For developers working on proprietary code, in regulated industries, or who simply prefer not to share their work product with cloud services, that is a non-starter. The answer is local AI, and Twinny is one of the best ways to access it.
Twinny is a free, open-source VS Code extension that brings local LLM inference directly into your editor. It connects to Ollama – the popular local model runner – and provides AI code completion and chat assistance without any data ever leaving your machine. No subscription, no rate limits, no cloud dependency. Just a local model running on your hardware, integrated into your development workflow.
The experience is remarkably similar to GitHub Copilot. As you type, Twinny suggests completions in ghost text. You can press Tab to accept, or continue typing to refine. The inline chat panel lets you ask questions about your code, request refactoring, or generate new code – all running through a local model that costs nothing to operate after the initial download. The quality depends on which model you choose, and the community has converged on several excellent options that rival cloud-based solutions for most everyday coding tasks.
## Feature Comparison
Twinny provides a comprehensive set of AI-assisted coding features through its VS Code integration:
| Feature | Twinny (Local) | GitHub Copilot | Cursor |
|---|---|---|---|
| Code Completions | Inline ghost text | Inline ghost text | Inline ghost text |
| Chat | Side panel + inline | Side panel | Built-in |
| Privacy | Fully local | Cloud-dependent | Cloud-dependent |
| Model Choice | Any Ollama model | OpenAI proprietary | GPT-4 / Claude |
| Cost | Free | $10/month | $20/month |
| Rate Limits | None | Yes (hourly) | Yes (per usage tier) |
| Offline | Yes (with downloaded models) | No | No |
| Custom Prompts | User-defined | Limited | Limited |
## Twinny Workflow
The following diagram illustrates how Twinny processes a code completion request through the local stack:
```mermaid
sequenceDiagram
    participant VS as VS Code Editor
    participant TW as Twinny Extension
    participant Ollama as Ollama Server
    participant Model as Local LLM Model<br/>(e.g., CodeGemma 7B)
    participant GPU as GPU / CPU
    VS->>TW: User types code (key event)
    TW->>TW: Extract context (current file, cursor position)
    TW->>TW: Build prompt from context and prefix
    TW->>Ollama: POST /api/generate (prompt, model, context)
    Ollama->>Model: Load/keep model in memory
    Model->>GPU: Execute inference
    GPU-->>Model: Generated completion tokens
    Model-->>Ollama: Stream completion tokens
    Ollama-->>TW: Stream response
    TW->>TW: Parse completion, filter quality
    TW->>VS: Show ghost text suggestion
    VS->>VS: User presses Tab to accept
    VS->>TW: Completion accepted
```

Each keystroke triggers this pipeline, which typically completes in 200–500 ms on a modern GPU with a 7B model. The Twinny extension handles context extraction, prompt construction, and result filtering, while Ollama manages the model lifecycle and inference.
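Under the hood, the `POST /api/generate` step is a plain HTTP call to Ollama's local API. The sketch below reproduces one by hand with `curl`, using a fill-in-the-middle prompt in CodeGemma's token style; the FIM token names and the `options` values here are illustrative assumptions, since Twinny builds the real prompt internally:

```shell
# Build a Twinny-style completion request by hand
# (FIM token names assume CodeGemma's format; other models use different tokens)
PAYLOAD='{"model":"codegemma","prompt":"<|fim_prefix|>def add(a, b):\n    return <|fim_suffix|>\n<|fim_middle|>","stream":false,"options":{"num_predict":32,"temperature":0.1}}'

# Send it to the local Ollama server (requires `ollama serve` on the default port 11434)
curl -s http://localhost:11434/api/generate -d "$PAYLOAD" \
  || echo "Ollama is not reachable; start it with: ollama serve"
```

With `"stream": false` the server returns a single JSON object whose `response` field holds the completion text; with streaming enabled, Twinny renders the tokens as ghost text while they arrive.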
## Recommended Models for Twinny
The quality of Twinny’s output depends heavily on model selection:
| Model | Parameters | Code Quality | Speed | VRAM | Best For |
|---|---|---|---|---|---|
| CodeGemma | 7B | Excellent | Fast | 6GB | General code completion |
| DeepSeek-Coder | 6.7B | Excellent | Fast | 6GB | Complex code generation |
| StarCoder2 | 7B | Very Good | Fast | 6GB | Multi-language support |
| Qwen 2.5 Coder | 7B | Very Good | Fast | 6GB | Chinese + English code |
| Llama 3.1 | 8B | Good | Moderate | 8GB | Chat + general coding |
| Qwen 2.5 Coder | 14B | Excellent | Moderate | 12GB | High-quality completions |
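Each row of the table maps to a model you can fetch directly from the Ollama library. A sketch of pulling the whole set (the exact tags are assumptions based on commonly published names; verify them on the model pages):

```shell
# Models from the table above, by their assumed Ollama library tags
MODELS="codegemma deepseek-coder:6.7b starcoder2:7b qwen2.5-coder:7b llama3.1:8b qwen2.5-coder:14b"

for m in $MODELS; do
  # Skips quietly when ollama is not installed (e.g. on a CI machine)
  command -v ollama >/dev/null 2>&1 && ollama pull "$m" || true
done
```

Once pulled, switching between models is a matter of changing the model name in Twinny's provider settings; no reinstallation is needed.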
## Getting Started
To start using Twinny, first install Ollama and download a model, then install the Twinny extension in VS Code:
```shell
# Install Ollama and download a code model
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull codegemma

# Then search for "Twinny" in the VS Code Extensions panel
```
The Twinny GitHub repository provides comprehensive setup guides, configuration tips, and community discussions about model preferences and performance tuning.
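Before enabling Twinny, it helps to confirm that the Ollama server is actually reachable on its default port. A quick sanity check against Ollama's standard tags endpoint:

```shell
# Ollama's default local endpoint; Twinny's provider settings should point here
OLLAMA_URL="http://localhost:11434"

# List installed models; an empty "models" array means nothing has been pulled yet
curl -s "$OLLAMA_URL/api/tags" \
  || echo "Ollama is not running; start it with: ollama serve"
```

If the list comes back with your code model in it, Twinny's completions should start appearing as soon as the extension is enabled.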
## FAQ
### What is Twinny?
Twinny is a free, open-source VS Code extension that provides AI code completion and chat using locally running LLMs via Ollama. It is a privacy-first alternative to GitHub Copilot with no cloud dependencies, no data leaving your machine, and no subscription fees.
### How does Twinny compare to GitHub Copilot?
Twinny is fully local while Copilot is cloud-based. Twinny completions depend on the quality of your local model (Llama 3, Qwen, CodeGemma, etc.) rather than Copilot’s proprietary OpenAI model. Twinny offers unlimited completions with no subscription, while Copilot has a $10/month fee and rate limits.
### What models work best with Twinny?
For code completion, specialized code models like CodeGemma, StarCoder2, and DeepSeek-Coder perform best. For chat-based assistance, general-purpose models like Llama 3 and Qwen 2.5 work well. Models with 7B parameters typically strike the best balance of quality and speed.
### What hardware do I need for Twinny?
For 7B parameter models: 8GB RAM (16GB+ recommended), any modern CPU, and a GPU with 6GB+ VRAM (Apple Silicon works well). For larger models, more memory and GPU power are needed. Even without a GPU, reasonable completions are possible with 4-bit quantized models on CPU.
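For CPU-only setups, quantized weights are what make this workable. Ollama's default model tags are generally already 4-bit (Q4) quantized, and many models also publish explicit quantization tags. A hedged sketch (the exact tag name below is an assumption; browse the model's tag list in the Ollama library for what it actually offers):

```shell
# Pull an explicitly 4-bit-quantized build for CPU-only inference
# (tag name is an assumption; check the model's tags on the Ollama library page)
TAG="qwen2.5-coder:7b-instruct-q4_K_M"
command -v ollama >/dev/null 2>&1 && ollama pull "$TAG" || true
```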
### Is Twinny really free?
Yes, Twinny is completely free and open source under the MIT license. There is no subscription, no usage limit, and no data collection. The only costs are the electricity to run your local model and the disk space for model weights.
## Further Reading
- Twinny GitHub Repository – Source code, releases, and community discussions
- Ollama Official Site – Local LLM runner that powers Twinny’s inference
- Ollama Model Library – Available models for local inference
- CodexBar for macOS – Another local AI coding tool for the macOS menu bar