Self-hosted code copilot: Continue.dev + Ollama vs. Cursor

May 08, 2026 · 8 min read
Valebyte Team
To create a self-hosted copilot, the most practical setup is the Continue.dev extension in VS Code paired with an Ollama server running the DeepSeek-Coder-V2-Lite model, deployed on a VPS with at least 16 GB of RAM and a modern CPU. This completely eliminates sending source code to third-party companies and saves upwards of $240 per developer per year on subscriptions.

Why a self-hosted copilot is becoming the standard for enterprise development

Intellectual property security is the main driver behind the move to local solutions. When you use GitHub Copilot or Cursor, your code, even in encrypted or anonymized form, is transmitted to third-party servers (Microsoft, OpenAI, Anthropic). For companies with strict security requirements (NDAs, fintech, the government sector), this is an unacceptable risk. Deploying a self-hosted copilot inside your own perimeter, on a dedicated server or VPS, completely removes the data-leakage problem.

Economic feasibility and independence

A subscription to Cursor Pro or GitHub Copilot costs an average of $20 per month per developer. In a team of 10 people, that's $2,400 annually. Renting a powerful VPS or dedicated server to serve the entire team will cost significantly less. Furthermore, you are not dependent on the pricing policies or sanction restrictions of Western providers.

Control over response quality

With your own GitHub Copilot alternative, you choose the model yourself. If you need to write in a rare programming language or for a specific framework, you can plug in a specialized small model or fine-tune an existing one. In cloud solutions, you are limited to what the vendor offers (usually Claude 3.5 Sonnet or GPT-4o).

Choosing a VPS for a self-hosted code LLM: processors, memory, and latency

The performance of an AI assistant depends directly on hardware. For a comfortable autocomplete experience, latency should be minimal, ideally under 100-200 ms to the first tokens. If you plan to run a self-hosted code LLM on a standard VPS without a GPU, the main focus should be CPU clock speed and RAM volume.

Minimum and recommended system requirements

To run models from the DeepSeek-Coder or Llama 3 families in quantized form (4-bit or 5-bit), the following specifications are required:
  • CPU: Minimum 4 cores with AVX2 instruction support. The higher the clock speed (from 3.0 GHz), the faster the generation.
  • RAM: 8 GB for 7B models (minimum), 16-32 GB for comfortable operation and context caching.
  • Disk: NVMe SSD is mandatory, as model weights (4-10 GB) must be loaded into memory quickly.
  • Network: 100 Mbps channel or higher if the server is remote from the developer.
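Before installing anything, you can check whether a candidate VPS meets these requirements with a few standard Linux commands (shown here for a typical Ubuntu/Debian image):
grep -m1 -o avx2 /proc/cpuinfo   # empty output means the CPU lacks AVX2
nproc                            # number of CPU cores
free -h                          # total and available RAM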
For more details on how neural networks work on standard servers, read our article Your own LLM on CPU VPS: Ollama + llama.cpp with 7B-13B models.

Comparison of models for code autocomplete

Models differ in the number of parameters and the quality of context understanding. For self-hosted solutions, the most common choices are:
  1. DeepSeek-Coder-V2-Lite (16B MoE): The leader in accuracy/speed ratio. Thanks to the Mixture of Experts (MoE) architecture, it runs fast even on mid-range CPUs.
  2. DeepSeek-Coder-6.7B: A classic for weaker servers. Occupies about 5 GB of RAM in 4-bit quantization.
  3. CodeLlama-7B/13B: Models from Meta; stable, but often inferior to DeepSeek in specific Python and JS tasks.
  4. StarCoder2: An excellent choice for multi-language support and for working with longer context windows (up to 16K tokens).


Step-by-step installation of the Continue.dev + Ollama stack on a Linux server

The deployment process is greatly simplified by the Ollama project. This tool packages the complex neural-network dependencies into a single binary and exposes an OpenAI-compatible API. The Continue.dev + Ollama combination lets you turn an ordinary server into a powerful backend for AI-assisted development in about 10 minutes.

Step 1: Installing Ollama on VPS

Connect to your server via SSH and run the command:
curl -fsSL https://ollama.com/install.sh | sh
After installation, check the service status:
systemctl status ollama
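To confirm the API itself is responding, you can also query Ollama's built-in HTTP endpoint (it listens on port 11434 by default); it should return a short JSON object with the installed version:
curl http://localhost:11434/api/version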

Step 2: Downloading models

We will need two models: one for chat (more powerful) and one for autocomplete (as fast as possible).
# Model for chat and refactoring
ollama pull deepseek-coder-v2:lite

# Model for autocomplete
ollama pull deepseek-coder:6.7b-base-q4_K_M
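Once the downloads finish, it is worth confirming that both models are registered and checking how much disk space they occupy:
ollama list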

Step 3: Configuring API access

By default, Ollama only listens on localhost:11434. To allow the Continue.dev extension to reach the server, you need to permit external connections. Edit the service config:
sudo systemctl edit ollama.service
Add the following lines to the [Service] section:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
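Listening on 0.0.0.0 means anyone who finds the port can use your server, so at minimum restrict access with a firewall and then test reachability from your workstation. A minimal sketch using ufw, assuming your developers connect from a known IP (203.0.113.10 below is a placeholder):
# On the server: allow only a trusted IP to reach Ollama
sudo ufw allow from 203.0.113.10 to any port 11434 proto tcp
# From a developer machine: this should return the list of pulled models
curl http://your-vps-ip:11434/api/tags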
If you plan to use the server for multiple tasks, such as working with documentation, check out the material Self-hosted ChatGPT alternative: OpenWebUI + Ollama + RAG in 30 minutes.

Configuring VS Code and the Continue.dev extension

Continue.dev is an open-source extension for VS Code and JetBrains IDEs and the most flexible tool for building your own AI development environment. Unlike closed plugins, it lets you fine-tune every aspect of the interaction with the model.

config.json configuration

After installing the extension in VS Code, open the config.json settings file (usually via the gear icon in the Continue panel). You need to specify your server address.
{
  "models": [
    {
      "title": "DeepSeek Coder V2 Lite",
      "provider": "ollama",
      "model": "deepseek-coder-v2:lite",
      "apiBase": "http://your-vps-ip:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek 6.7B Autocomplete",
    "provider": "ollama",
    "model": "deepseek-coder:6.7b-base-q4_K_M",
    "apiBase": "http://your-vps-ip:11434"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://your-vps-ip:11434"
  }
}
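Note that the embeddingsProvider above references a model that was not pulled in Step 2. If you want codebase indexing to work, download it on the server first:
ollama pull nomic-embed-text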

Using an SSH tunnel for security

If you don't want to open port 11434 to the entire internet, use SSH tunneling. This will ensure traffic encryption and key-based authorization. Command to forward the port from your local machine:
ssh -L 11434:localhost:11434 user@your-vps-ip
In this case, you can leave localhost:11434 in the Continue.dev config. This is especially relevant if you are migrating from cloud platforms. We wrote about the nuances of migration in the article Moving from AWS Lightsail/EC2 to dedicated: saving $500-2000/mo.
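For everyday work it is convenient to describe the tunnel once in ~/.ssh/config. A minimal sketch, reusing the same placeholder user and IP as above (the copilot-vps alias is arbitrary):
Host copilot-vps
    HostName your-vps-ip
    User user
    LocalForward 11434 localhost:11434
    ServerAliveInterval 60
After that, ssh -N copilot-vps brings the tunnel up with a single command.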

Comparing Continue.dev and Cursor: which should you choose?

Cursor is a fork of VS Code with built-in AI. It is extremely convenient out of the box, but its closed nature and price push many teams to look for a Cursor alternative. Continue.dev offers nearly the same functionality, but as a plugin that can be installed into a clean VS Code.
Feature | Cursor (Pro Plan) | Self-hosted (Continue + Ollama)
--- | --- | ---
Cost | $20/mo per user | VPS cost ($10-30/mo for the whole team)
Privacy | Data on Cursor/Anthropic servers | 100% local, on your server
Model choice | Claude 3.5, GPT-4o | Any model from the Ollama/HuggingFace library
Offline work | No | Yes (within the local network)
Code indexing | Cloud (remote indexing) | Local (LanceDB / vector DB)
Setup complexity | Zero (install and work) | Medium (server setup required)

Functional differences

Cursor wins due to its "Composer" feature, which allows generating code across multiple files simultaneously. Continue.dev is actively catching up, introducing support for "Edit Mode" (Cmd+I / Ctrl+I), where the AI suggests edits directly in the current file. However, for full codebase indexing in Continue.dev, an external vector database might be required. Read about how to deploy one here: Vector DB on VPS: pgvector vs Qdrant vs Weaviate — what to choose.

Optimizing DeepSeek-Coder and Llama 3 for fast autocomplete

To keep your self-hosted copilot from lagging, you need to optimize the inference process. The main bottleneck for CPU generation is memory bandwidth: how quickly the model weights can be read from RAM.

Using quantization

Quantization reduces the precision of model weights from 16-bit to 4 or 5 bits. This reduces RAM requirements by 3-4 times and proportionally speeds up operation.
  • Q4_K_M: The optimal balance for most tasks. Accuracy loss is practically unnoticeable when writing code.
  • Q2_K: Maximum speed, but the model may start to get confused by syntax or produce hallucinations.
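In Ollama, the quantization level is selected via the model tag rather than a separate flag. For example, the autocomplete model used earlier could be pulled in a different quantization roughly like this (exact tag names vary per model, so check the model's page in the Ollama library before pulling):
ollama pull deepseek-coder:6.7b-base-q5_K_M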

Context window parameters

In the Continue.dev config.json, you can limit the number of tokens the model sees "above" and "below" the cursor. For autocomplete on CPU, it is recommended to set:
"tabAutocompleteOptions": {
  "maxContextLength": 2048,
  "maxPromptTokens": 1024
}
This will significantly reduce the model's "thinking" time before providing a suggestion.

Economics of ownership: your own GitHub Copilot vs. subscriptions

Let's look at the real numbers. For a group of 3-5 developers, a single high-performance VPS with 8 vCPUs and 32 GB of RAM is sufficient. Such a server costs about $30-40 per month.
  1. Subscription costs: 5 people * $20 = $100 per month.
  2. Own server costs: $35 per month.
  3. Savings: $65 per month or $780 per year.
In addition, you get not just a Copilot, but a full-fledged server where you can deploy CI/CD, staging, or a corporate VPN. For those concerned about the security of access to their development tools, this guide will be useful: Your own VPN on VPS: VLESS Reality + Xray-core in 10 minutes.

Model tuning and context for improving code accuracy

To make your own GitHub Copilot alternative understand the specifics of your project, Continue.dev uses the Context Providers mechanism. It allows you to "feed" the model not just the open file, but also:
  • Documentation from external URLs.
  • Terminal command execution results.
  • Project file structure.
  • Specific code snippets from other branches.
Using System Prompts also helps improve results. You can tell the model: "You are an expert in React and TypeScript, always use functional components and strict typing." This will force DeepSeek-Coder to produce cleaner code that meets your standards.
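As a rough illustration, both mechanisms live in the same config.json as the models themselves. A minimal sketch, assuming the legacy JSON config format shown earlier (available context provider names and fields vary between Continue.dev versions, so verify against its documentation):
{
  "models": [
    {
      "title": "DeepSeek Coder V2 Lite",
      "provider": "ollama",
      "model": "deepseek-coder-v2:lite",
      "apiBase": "http://your-vps-ip:11434",
      "systemMessage": "You are an expert in React and TypeScript. Always use functional components and strict typing."
    }
  ],
  "contextProviders": [
    { "name": "terminal" },
    { "name": "tree" },
    { "name": "url" }
  ]
}
Here the terminal, tree, and url providers give the model access to command output, the project file structure, and external documentation pages respectively.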

Conclusions

For maximum privacy and savings, choose the combination of Continue.dev and Ollama on a dedicated VPS: it gives you full control over your data and lets you use strong open models like DeepSeek-Coder-V2 with no per-seat fees. If you need maximum productivity out of the box and are willing to pay $20/mo, Cursor remains the leader in UX quality, but it loses in flexibility and in control over where your code is processed.

