For effective scraping of Wildberries, Ozon, and Avito on a VPS, you need a server with at least 2 vCPUs and 4 GB of RAM, plus rotating residential proxies to bypass IP blocks and TLS fingerprinting. Such a configuration starts at $10-15 per month; when browser engines (Selenium/Playwright) are involved, memory requirements rise to 8 GB and above.
Choosing Server Resources for Wildberries VPS Scraping
Scraping performance directly depends on the chosen architecture and server resources. For Wildberries VPS scraping, the server should have a high CPU frequency (from 2.5 GHz), as the deserialization of large JSON responses and the operation of headless browsers create a significant load on the CPU. When choosing a plan, you should focus on the volume of data: a basic instance is sufficient for processing 100,000 product cards per day, but monitoring millions of positions in real-time will require a cluster of several VPS units.
Technical Server Requirements
If you plan to use libraries like requests or curl_cffi, RAM consumption will be minimal (about 100-200 MB per thread). However, modern marketplaces actively use dynamic content rendering and complex protection scripts, forcing developers to run full-fledged browsers in headless mode. In this case, each Chrome or Firefox process consumes between 150 and 400 MB of RAM.
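The RAM figures above translate directly into concurrency limits. A rough back-of-the-envelope sketch (the 400 MB per-browser figure is the worst case quoted above; the 1 GB system reserve is an assumption):

```python
# Rough capacity estimate for concurrent headless browsers on a VPS.
# 400 MB per browser is the worst case cited above; the 1 GB system
# reserve for the OS, database, and queue is an assumed value.

def max_browser_workers(total_ram_mb: int,
                        per_browser_mb: int = 400,
                        system_reserve_mb: int = 1024) -> int:
    """How many headless Chrome/Firefox processes fit in RAM."""
    usable = total_ram_mb - system_reserve_mb
    return max(usable // per_browser_mb, 0)

print(max_browser_workers(4096))   # 4 GB VPS -> 7 workers
print(max_browser_workers(8192))   # 8 GB VPS -> 17 workers
```

This is why the 8 GB tier below is the practical minimum for browser-based scraping at any real scale.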
| Scraping Scale | Recommended VPS | Tech Stack | Approximate Price |
| --- | --- | --- | --- |
| Small (up to 50k requests/day) | 2 vCPU, 4 GB RAM, 40 GB NVMe | Python, curl_cffi, SQLite | $10-$15/mo |
| Medium (500k requests/day) | 4 vCPU, 8 GB RAM, 80 GB NVMe | Playwright, Redis, Postgres | $25-$40/mo |
| Enterprise (5M+ requests/day) | 8+ vCPU, 16+ GB RAM, 160 GB NVMe | Distributed Scrapy, Kubernetes | from $70/mo |
Location and Network Latency
For scraping Russian marketplaces (Wildberries, Ozon, Avito), it is optimal to choose a VPS located close to their main data centers (Moscow, Saint Petersburg, Kazakhstan) or in European regions with good connectivity. Minimal ping speeds up TCP connection establishment, which matters when you are making thousands of short requests. If you run into geo-blocking, the fix is not moving the VPS but using high-quality proxies.
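To compare candidate VPS locations empirically, you can measure the raw TCP handshake time from the server. A minimal stdlib sketch (the example endpoint in the comment is an assumption; point it at whatever host you actually scrape):

```python
import socket
import time

def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Measure TCP handshake time to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close immediately
    return (time.perf_counter() - start) * 1000

# In practice, point it at the marketplace edge, e.g.:
#   tcp_connect_ms("www.wildberries.ru", 443)
# and run it from each candidate VPS location.
```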
Technology Stack: Selenium vs. Playwright and curl_cffi
The choice of tool determines development speed and the probability of getting banned. Ozon scraping and Wildberries scraping are practically impossible today using the standard requests library, as it does not support the simulation of TLS fingerprints of modern browsers, which is instantly detected by protection systems like Cloudflare or DataDome.
Why Playwright is Replacing Selenium
Playwright by Microsoft is considered the industry standard for browser automation. Unlike Selenium, it works via the CDP (Chrome DevTools Protocol), which provides higher speed and stability. Playwright supports automatic waiting for elements, working with multiple contexts (tabs) in a single browser instance, and has built-in tools to bypass automation detection.
- Speed: Playwright is faster due to asynchrony (the asyncio library in Python).
- Emulation: Easy configuration of User-Agent, screen resolution, and geolocation.
- Stealth Mode: There are plugins (e.g., playwright-stealth) that spoof navigator.webdriver and other parameters that give away a bot.
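The emulation options above can be set once per context. A minimal sketch, assuming Playwright is installed (`pip install playwright` plus `playwright install chromium`); the specific User-Agent, viewport, and timezone values are illustrative, not a recommendation:

```python
import asyncio

# Illustrative context options; keep the UA consistent with the platform hints.
CONTEXT_OPTS = {
    "user_agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0.0.0 Safari/537.36"),
    "viewport": {"width": 1920, "height": 1080},
    "locale": "ru-RU",
    "timezone_id": "Europe/Moscow",
}

async def fetch_html(url: str) -> str:
    """Open a hardened context, load the page, return its HTML."""
    # Lazy import so this module loads even where playwright is absent.
    from playwright.async_api import async_playwright
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(**CONTEXT_OPTS)
        page = await context.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        html = await page.content()
        await browser.close()
        return html
```

Run it with `asyncio.run(fetch_html("https://example.com"))`; multiple contexts can share one browser instance to save RAM.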
Using curl_cffi for High-Speed Requests
If the marketplace allows data retrieval via an API (even a private one), using a browser is overkill. However, standard HTTP clients reveal themselves at the TLS Handshake level. The curl_cffi library allows you to mimic the JA3 fingerprints of real browsers (Chrome, Safari, Firefox), which is critical for the Wildberries API. This allows you to perform thousands of requests per second from a single VPS, consuming 10-20 times fewer resources than Playwright.
from curl_cffi import requests

# Simulating a request from Chrome version 120
response = requests.get(
    "https://card.wb.ru/cards/v1/detail?nm=12345678",
    impersonate="chrome120"
)
print(response.json())
Anti-ban Scraper: Strategies to Bypass Marketplace Protection
A modern anti-ban scraper is a set of measures aimed at ensuring the marketplace server cannot distinguish your script from a real customer using an iPhone or MacBook. Protection systems analyze hundreds of parameters: from IP address to mouse cursor movement speed and font loading order.
Rotation of Residential and Mobile Proxies
Using datacenter IPs for scraping Wildberries or Avito is a sure path to a captcha or a permanent ban. Marketplaces see that requests are coming from hosting provider subnets. The solution is residential proxies (IPs of real home users) or mobile proxies (IPs of cellular operators). For Avito VPS scraping, mobile proxies are especially effective because thousands of real people can be on a single operator IP address simultaneously, and banning such an address would harm regular users.
To manage a proxy pool on a VPS, an intermediary service is often set up (for example, Privoxy or specialized rotators in Python), which changes the exit IP for each new request or session.
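A per-request rotator does not require an external service: in Python it can be a simple cycle over the pool. A minimal sketch (the proxy URLs are hypothetical placeholders; real ones come from your provider):

```python
import itertools

# Hypothetical proxy pool; substitute the endpoints your provider issues.
PROXIES = [
    "http://user:pass@res-proxy-1.example.com:8000",
    "http://user:pass@res-proxy-2.example.com:8000",
    "http://user:pass@res-proxy-3.example.com:8000",
]

_rotation = itertools.cycle(PROXIES)

def next_proxy_kwargs() -> dict:
    """Return per-request proxy settings, rotating the exit IP each call."""
    proxy = next(_rotation)
    return {"proxies": {"http": proxy, "https": proxy}}

# Usage with curl_cffi:
#   from curl_cffi import requests
#   requests.get(url, impersonate="chrome120", **next_proxy_kwargs())
```

For session-level rotation (one IP per login session rather than per request), pick a proxy once and reuse it for all requests in that session.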
Managing Fingerprints and Headers
In addition to the IP, it is necessary to randomize headers. It is important to maintain logical consistency: if the User-Agent specifies Windows, the sec-ch-ua-platform header must also be Windows. For storing and securely using credentials from proxy services and API keys, it is recommended to use Self-hosted Bitwarden / Vaultwarden, which prevents sensitive information from leaking from the code.
- Rotate User-Agents from an up-to-date list (no older than 2-3 months).
- Use the correct header order (H2/H3 priorities).
- Emulate behavior: pause between clicks, scroll the page, and disable image loading to save traffic where it doesn't break page rendering.
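The consistency rule from the first point can be enforced in code by keying the User-Agent pool by platform, so the client-hint headers can never contradict it. A sketch with illustrative UA strings (keep your own list fresh):

```python
import random

# Minimal UA pool keyed by platform; illustrative values only.
UA_POOL = {
    "Windows": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"),
    "macOS": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/120.0.0.0 Safari/537.36"),
}

def consistent_headers() -> dict:
    """Pick a platform, then keep User-Agent and client hints in agreement."""
    platform, ua = random.choice(list(UA_POOL.items()))
    return {
        "User-Agent": ua,
        "sec-ch-ua-platform": f'"{platform}"',  # quoted per client-hint syntax
        "Accept-Language": "ru-RU,ru;q=0.9,en;q=0.8",
    }
```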
Working with Wildberries API and Ozon Scraping via Hidden Endpoints
Many developers make the mistake of trying to scrape the frontend part of the site (HTML code), which changes constantly. It is much more stable to work with the internal APIs used by the mobile apps and web interfaces to retrieve data. Examining the Network tab in DevTools allows you to find endpoints that return clean JSON.
Reverse Engineering Wildberries Requests
The Wildberries API is characterized by the use of many subdomains (card.wb.ru, catalog.wb.ru, etc.). Data on prices, warehouse stock, and product characteristics arrive in a structured form. The main difficulty lies in forming the correct request parameters, such as appType, curr, and dest (regional stock binding).
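Assembling those parameters is straightforward once you have captured working values from DevTools. A sketch (the appType/curr/dest values here are illustrative defaults, not documented constants; capture real ones from your own session):

```python
from urllib.parse import urlencode

def wb_card_url(sku: int, dest: int = -1257786) -> str:
    """Build a card.wb.ru detail URL. Parameter values are illustrative;
    capture working ones from your own DevTools session."""
    params = {
        "appType": 1,      # web client (assumed value)
        "curr": "rub",
        "dest": dest,      # regional stock binding
        "nm": sku,
    }
    return "https://card.wb.ru/cards/v1/detail?" + urlencode(params)

print(wb_card_url(12345678))
```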
When scraping Ozon, the situation is more complicated: they actively use parameter obfuscation and dynamic tokens. Often, you have to combine approaches: use Playwright to obtain valid cookies and tokens, and then pass them to the fast curl_cffi for mass data collection.
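The handoff between the two tools is mostly a cookie-format conversion: Playwright's `context.cookies()` returns a list of dicts, while curl_cffi accepts a flat name-to-value mapping. A minimal bridge:

```python
def playwright_cookies_to_dict(cookies: list[dict]) -> dict:
    """Flatten Playwright's context.cookies() output into the
    name -> value mapping that curl_cffi's cookies= argument accepts."""
    return {c["name"]: c["value"] for c in cookies}

# Typical hybrid flow (sketch):
#   raw = await context.cookies()                  # Playwright session
#   jar = playwright_cookies_to_dict(raw)
#   requests.get(url, impersonate="chrome120", cookies=jar)   # curl_cffi
```

Refresh the cookies through Playwright whenever the fast path starts returning captchas or 403s.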
Specifics of Avito VPS Scraping
Avito is one of the most difficult platforms to scrape. They apply strict limits on viewing phone numbers and actively use behavioral analysis. For Avito VPS scraping, it is critically important to mimic real sessions: log in (if contact collection is needed), "wander" through other categories, and only then open the target listing. To automate such complex scenarios and notifications about new listings, you can deploy Self-hosted n8n, which will link your scraper to a Telegram bot or CRM.
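The "wander first, then open the target" routine can be generated programmatically. A hedged sketch (the category paths are hypothetical examples, and the pause bounds are assumptions you should tune):

```python
import random
import time

# Hypothetical side categories to visit before the target listing.
WARMUP_PATHS = ["/catalog/transport", "/catalog/nedvizhimost", "/catalog/rabota"]

def build_session_plan(target_url: str, detours: int = 2) -> list[str]:
    """Return a list of URLs to visit in order, ending with the target."""
    plan = random.sample(WARMUP_PATHS, k=min(detours, len(WARMUP_PATHS)))
    return plan + [target_url]

def human_pause(min_s: float = 1.5, max_s: float = 6.0) -> float:
    """Sleep for a randomized, human-looking interval between steps."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

The scraper iterates over the plan, calling `human_pause()` between page loads, so each session's navigation trace looks different.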
Data Processing and Storage on VPS
When your scraper collects millions of rows, writing to CSV files becomes a bottleneck. For efficient data management on a VPS, a relational database optimized for writing (Write-Intensive) is required.
PostgreSQL and Schema Optimization
PostgreSQL is the best choice for storing scraping results. To speed up searches by product characteristics (e.g., searching for similar models), you can use the pgvector extension. To learn how to choose the right solution, read the article Vector DB on VPS: pgvector vs Qdrant vs Weaviate. Proper indexing by SKU and scraping time will allow you to quickly build price change charts.
Example table structure for prices:
CREATE TABLE product_history (
id SERIAL PRIMARY KEY,
sku VARCHAR(20) NOT NULL,
marketplace VARCHAR(20),
price DECIMAL(10, 2),
stock INTEGER,
parsed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_sku_timestamp ON product_history(sku, parsed_at);
Task Queues and Load Balancing
To prevent the scraper from "crashing" when a single request fails, use task queues (Celery, RQ, or Redis Streams). A VPS allows you to run Redis as a message broker. This makes it possible to distribute tasks among different workers: one worker collects product links, ten others download the data. This architecture provides fault tolerance: if the marketplace temporarily blocks one IP, the task simply returns to the queue and will be executed by another worker via a different proxy.
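The return-to-queue semantics described above can be illustrated without a broker. A stdlib sketch of the same pattern Celery/RQ give you on a VPS (single-threaded here for clarity; real workers run in parallel behind Redis):

```python
import queue

def run_workers(tasks, fetch, max_attempts: int = 3):
    """Process tasks, re-enqueueing failures so a transient block on one
    IP does not lose the task -- it is simply retried later."""
    q = queue.Queue()
    for t in tasks:
        q.put((t, 0))  # (task, attempts so far)
    results, failed = [], []
    while not q.empty():
        task, attempts = q.get()
        try:
            results.append(fetch(task))
        except Exception:
            if attempts + 1 < max_attempts:
                q.put((task, attempts + 1))  # back to the queue for another worker/proxy
            else:
                failed.append(task)          # give up after max_attempts
    return results, failed
```

With Redis as the broker, `q.put`/`q.get` become broker operations and the retried task is naturally picked up by a different worker with a different proxy.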
Monitoring and Automation of Scraping Processes
Scraping is a process that always breaks. Marketplaces change layouts, update protection algorithms, and proxy providers go down. Without a monitoring system, you will only find out about a data collection halt several days later.
Error Tracking with Sentry
Instead of manually checking logs on the VPS, install an error tracking system. Self-hosted Sentry allows you to receive real-time notifications about blocks, changes in the Wildberries JSON structure, or headless browser crashes. This saves dozens of hours of debugging.
Containerization and CI/CD
Deploying a scraper via Docker simplifies dependency management. You don't need to manually install Chrome and drivers on the VPS — everything is packaged into an image. Using Docker Compose allows you to bring up the entire infrastructure with one command: the scraper, Postgres database, Redis, and Grafana monitoring panel.
services:
  scraper:
    build: .
    depends_on:
      - postgres
      - redis
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres/db
  postgres:
    image: postgres:15
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
For long-term projects, it is important to have documentation and a knowledge base for your scraper's architecture. Self-hosted Outline / BookStack can help with this, where the team can record changes in marketplace APIs and anti-ban system settings.
Conclusions
For stable scraping of Wildberries, Ozon, and Avito, choose a VPS with sufficient RAM (from 4-8 GB) and use modern libraries like Playwright or curl_cffi to bypass TLS fingerprinting. Be sure to implement residential proxy rotation and an error monitoring system to minimize the risk of blocks and downtime in data collection.