Introduction
Running Large Language Models directly on your primary workstation sounds convenient until your laptop fans start roaring like a jet engine and everyday tasks begin to lag. Offloading heavy inference to a dedicated remote server (such as a home lab server with a high-end GPU, an older gaming rig, or a cloud VPS) frees up your local machine while letting you keep the polished user interface of a desktop application.
Connecting LM Studio on your client laptop to a remote Ollama instance gives you the best of both worlds. Your thin client remains cool, quiet, and responsive, while the heavy lifting happens elsewhere. In this guide, we walk through configuring your server host, securing the network connection, setting up the client, and optimizing performance. Before choosing your stack, you can also review our comparison of vLLM vs llama.cpp vs Ollama benchmarks to understand the performance profiles of each inference engine.
Architectural overview
The setup is straightforward. Instead of running an inference engine locally, the client application talks to Ollama's HTTP API on the remote server. The server runs the Ollama background service and stores all the model files. The client sends prompts as small JSON requests and receives the generated text as a stream, using negligible local RAM and CPU.
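To make the request/response flow concrete, here is a minimal sketch of what a client sends over the wire. It assumes the server (directly or through the SSH tunnel described in Step 2) is reachable at OLLAMA_URL and that a model named llama3.2 has already been pulled; both are placeholders. The /api/generate endpoint and its model, prompt, and stream fields are part of Ollama's HTTP API; the helper function names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of one client -> server exchange over Ollama's HTTP API.
# OLLAMA_URL and the model name used below are placeholders (assumptions).
set -eu

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON request body for POST /api/generate.
# No JSON escaping is done: the prompt must not contain " or \ characters.
build_generate_body() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send one non-streaming request; the server replies with a single JSON
# object whose "response" field holds the generated text.
generate() {
  curl -sS "${OLLAMA_URL}/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(build_generate_body "$1" "$2")"
}
```

For example, generate llama3.2 "Why is the sky blue?" prints the reply JSON. Only this small payload crosses the network; the model weights never leave the server.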
Step 1: Configuring the Ollama host server
First, you need Ollama running on your server. If you are setting this up on Linux, run the standard installation command:
curl -fsSL https://ollama.com/install.sh | sh
Setting host environment variables
By default, Ollama binds to localhost (127.0.0.1) on port 11434. To accept connections from other computers, you must configure the service to listen on all interfaces (0.0.0.0).
On Linux (systemd)
Create a drop-in override for the Ollama systemd service (systemctl opens an editor and saves the override file for you):
sudo systemctl edit ollama.service
Add these lines to configure the environment variable:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Save the file, reload the systemd manager, and restart the service to apply changes:
sudo systemctl daemon-reload
sudo systemctl restart ollama
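If you prefer not to use the interactive editor (for example, when provisioning the server from a script), the sketch below writes the same drop-in file that systemctl edit creates, /etc/systemd/system/ollama.service.d/override.conf. It assumes the unit is named ollama, as created by the install script, and that you can run sudo. The function names are hypothetical.

```shell
#!/bin/sh
# Non-interactive equivalent of `sudo systemctl edit ollama.service`.
# Assumes a systemd unit named "ollama" and sudo rights on the server.
set -eu

# Print the drop-in contents for a given bind address.
ollama_override() {
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$1"
}

# Write the drop-in (overwriting any existing override.conf), reload
# systemd, restart Ollama, and print the environment systemd passes to it.
apply_ollama_host() {
  local dir=/etc/systemd/system/ollama.service.d
  sudo mkdir -p "$dir"
  ollama_override "$1" | sudo tee "$dir/override.conf" >/dev/null
  sudo systemctl daemon-reload
  sudo systemctl restart ollama
  systemctl show ollama --property=Environment
}
```

Running apply_ollama_host 0.0.0.0 has the same effect as the manual steps above.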
On Windows
If your server runs Windows, first quit Ollama from the system tray. Open the Start menu, search for "Environment Variables", choose "Edit the system environment variables", and click "Environment Variables". Under User variables, click "New", set the variable name to OLLAMA_HOST and the value to 0.0.0.0. Then start Ollama again from the Start menu so it picks up the new variable.
Verifying the service is accessible
Test the endpoint from another machine on your local network to verify the configuration:
curl http://<server-ip>:11434
If it returns "Ollama is running", the server is listening correctly.
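Two small checks can save guesswork here. On the server, the listening socket shows which address Ollama is bound to; from the client, a scripted version of the curl test fails fast instead of hanging. The function names and the 5-second timeout are illustrative choices, not part of Ollama.

```shell
#!/bin/sh
# Sketch of the Step 1 checks. Function names are hypothetical helpers.
set -eu

# Run on the server: 127.0.0.1:11434 means localhost only, while
# 0.0.0.0:11434 means all IPv4 interfaces.
show_ollama_listener() {
  ss -ltn | grep ':11434'
}

# Run on the client: succeed only if the server answers with the banner
# "Ollama is running" within 5 seconds.
check_ollama() {
  local body
  body=$(curl -sS --max-time 5 "http://$1:11434/") || return 1
  [ "$body" = "Ollama is running" ]
}
```

For example, check_ollama 192.168.1.50 && echo reachable, where the IP address is a placeholder for your server.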
Step 2: Securing the connection
Warning: Ollama's API has no built-in authentication. Exposing port 11434 directly to the public internet lets anyone run inference on your GPU, run up your power bill, pull models onto your disk, or delete the ones you have. You must secure this traffic.
Option A: SSH Tunneling (Recommended for Home Labs)
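An SSH tunnel creates an encrypted link between your client machine and the server. Because the traffic reaches Ollama through SSH, Ollama can stay bound to its default localhost address on the server (you can skip the OLLAMA_HOST change from Step 1), keeping port 11434 closed to the outside world.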
Run this command on your client laptop to map your local port 11434 to the server:
ssh -L 11434:localhost:11434 user@server-ip
Leave this terminal window open. Local applications can now reach the remote server by sending requests to http://localhost:11434.
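If you would rather not keep a terminal open, OpenSSH can run the same tunnel in the background. This sketch uses standard ssh options (-f, -N, ExitOnForwardFailure); user@server-ip remains a placeholder, the function names are hypothetical, and the optional second argument picks another local port in case a local Ollama instance already occupies 11434.

```shell
#!/bin/sh
# Sketch: background SSH tunnel to a remote Ollama server.
set -eu

# Build the -L forwarding spec: local port -> localhost:11434 on the server.
tunnel_spec() {
  printf '%s:localhost:11434' "${1:-11434}"
}

# -f: go to the background after authentication.
# -N: forward ports only; do not run a remote command.
# ExitOnForwardFailure=yes: exit with an error if the local port is taken.
start_tunnel() {
  ssh -f -N -o ExitOnForwardFailure=yes -L "$(tunnel_spec "${2:-11434}")" "$1"
}
```

start_tunnel user@server-ip gives you the same endpoint as before, http://localhost:11434; start_tunnel user@server-ip 11435 would expose it at http://localhost:11435 instead. The background ssh process keeps running until you end it.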
Option B: Private Overlay Networks (VPN)
If you want a connection that does not require keeping a terminal window open, use a service like Tailscale. Install Tailscale on both the server and your client laptop. Once they are on the same virtual network, you can bind Ollama to your server's Tailscale IP instead of 0.0.0.0.
For example, on Linux, set the variable to your specific Tailscale address:
Environment="OLLAMA_HOST=100.x.y.z"
This restricts access to devices on your private VPN.
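To avoid copying the address by hand, you can ask the Tailscale CLI for it (tailscale ip -4) and write the drop-in from that. This sketch assumes Tailscale is installed and logged in on the server and that Ollama runs as the systemd unit ollama; the function names are hypothetical. The range check relies on Tailscale assigning addresses from the 100.64.0.0/10 block and is a sanity check, not a security control.

```shell
#!/bin/sh
# Sketch: bind Ollama to this server's Tailscale IPv4 address.
set -eu

# Succeed if $1 looks like an address in 100.64.0.0/10 (100.64.x.x to
# 100.127.x.x), the range Tailscale assigns addresses from.
in_tailscale_range() {
  case "$1" in
    100.*.*.*) ;;
    *) return 1 ;;
  esac
  local second
  second=$(printf '%s\n' "$1" | cut -d. -f2)
  case "$second" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$second" -ge 64 ] && [ "$second" -le 127 ]
}

# Look up the Tailscale IP, write the drop-in, and restart Ollama.
bind_ollama_to_tailscale() {
  local ts_ip dir=/etc/systemd/system/ollama.service.d
  ts_ip=$(tailscale ip -4 | head -n 1)
  if ! in_tailscale_range "$ts_ip"; then
    echo "unexpected Tailscale address: $ts_ip" >&2
    return 1
  fi
  sudo mkdir -p "$dir"
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$ts_ip" |
    sudo tee "$dir/override.conf" >/dev/null
  sudo systemctl daemon-reload
  sudo systemctl restart ollama
  echo "Ollama should now listen on $ts_ip:11434"
}
```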
Step 3: Connecting LM Studio to your remote server
With the secure connection established, launch LM Studio on your client laptop.
- Configure the Endpoint:
Click on the Settings icon in LM Studio. Navigate to the API / Server section. Under custom provider settings, set the target address. If you are using the SSH Tunnel method, set the address to http://localhost:11434. If you are using a VPN like Tailscale, use the server's private network IP (e.g., http://100.x.y.z:11434).
- Verify Model List:
When you select the model dropdown in LM Studio, it should automatically query the remote server and show the list of models you have downloaded on the host. Select the model you want to run.
- Model Management:
Remember that the models must be downloaded and stored on the remote server, not your client laptop. To pull a new model, run ollama pull <model-name> in the server terminal. Once the download completes, the client will immediately recognize it.
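If the dropdown stays empty, it helps to query the same server from the client's terminal. Ollama's GET /api/tags endpoint returns the models stored on the server as JSON. The sketch below assumes OLLAMA_URL points at your endpoint (http://localhost:11434 through the SSH tunnel, or your Tailscale address). It parses names with grep and sed instead of jq, which assumes Ollama's compact JSON formatting. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: list the models stored on the remote Ollama server.
set -eu

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Extract the "name" fields from /api/tags JSON read on stdin.
# Assumes compact JSON ("name":"value", no spaces around the colon).
parse_model_names() {
  grep -o '"name":"[^"]*"' | sed 's/^"name":"//; s/"$//'
}

# Fetch the model list from the server and print one name per line.
list_remote_models() {
  curl -sS --max-time 5 "${OLLAMA_URL}/api/tags" | parse_model_names
}
```

If the ollama CLI is also installed on the client, OLLAMA_HOST=http://100.x.y.z:11434 ollama list gives the same answer, because the CLI reads OLLAMA_HOST to decide which server to talk to.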
Troubleshooting and performance
Connection refused
If your client cannot reach the server, double-check that you restarted the service after editing environment variables. You should also check your server firewall settings. On Linux, ensure that UFW permits traffic on port 11434 if you are connecting directly over a VPN:
sudo ufw allow 11434/tcp
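The rule above opens port 11434 on every interface. If you connect over Tailscale or only from your LAN, a narrower rule is safer. The sketch assumes the Tailscale interface is named tailscale0 (the usual name on Linux) and uses 192.168.1.0/24 purely as an example subnet; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: narrower UFW rules for Ollama's port. Run on the server.
set -eu

# Accept 11434/tcp only when it arrives on the Tailscale interface.
allow_ollama_on_tailscale() {
  sudo ufw allow in on tailscale0 to any port 11434 proto tcp
}

# Accept 11434/tcp only from one LAN subnet, e.g. 192.168.1.0/24.
allow_ollama_from_subnet() {
  sudo ufw allow from "$1" to any port 11434 proto tcp
}

# Show the active rules with their numbers.
show_rules() {
  sudo ufw status numbered
}
```

If you already added the broad rule, sudo ufw delete allow 11434/tcp removes it.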
VRAM allocation and GPU usage
To verify that Ollama is utilizing your remote GPU rather than falling back to the CPU, run the monitoring tool on the server during an active request:
nvidia-smi
You should see an Ollama process (named ollama_llama_server in some versions) in the GPU process table, along with its VRAM allocation. If it is missing, verify that the NVIDIA (CUDA) drivers are correctly installed on the server.
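Besides the full nvidia-smi table, two narrower checks are handy. ollama ps lists the models currently loaded and, in recent Ollama versions, a PROCESSOR column showing how each model is split between CPU and GPU (for example 100% GPU). The nvidia-smi query flags print just the memory figures. Both assume an NVIDIA GPU and are meant to run on the server while a request is in progress; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm GPU offload on the server during an active request.
set -eu

check_gpu_offload() {
  # Loaded models and their CPU/GPU split (PROCESSOR column).
  ollama ps
  # Per-GPU memory use in CSV form instead of the full table.
  nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
}

# Refresh the memory readout every second; stop with Ctrl+C.
watch_gpu_memory() {
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
}
```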
Network bottlenecks
If you are running this over a wireless connection, you may notice a delay before responses start. Time-to-first-token is sensitive to network latency, while the token stream itself needs little bandwidth. A wired Ethernet connection for the server, and keeping the client and server on the same local network where possible, helps keep latency low.
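To check whether the network is actually the bottleneck, curl's timing variables separate connection time from time to the first byte of the response. With a streaming /api/generate request (streaming is the default), the first byte arrives roughly when the first token is produced, so this approximates time-to-first-token as the client sees it. OLLAMA_URL and MODEL are placeholders and measure_latency is a hypothetical name; run it twice, since the first call may include model load time.

```shell
#!/bin/sh
# Sketch: compare connection time with time-to-first-byte for a streaming
# generation request. OLLAMA_URL and MODEL are assumptions.
set -eu

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="${MODEL:-llama3.2}"

measure_latency() {
  curl -sS -o /dev/null \
    -w 'connect: %{time_connect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n' \
    "${OLLAMA_URL}/api/generate" \
    -d "{\"model\": \"${MODEL}\", \"prompt\": \"Say hello.\"}"
}
```

If first byte is far larger than connect, the time is mostly spent on the server rather than the network. Over an SSH tunnel, connect only measures the hop to the local tunnel endpoint.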
Frequently asked questions
Is my traffic encrypted when using a remote Ollama server?
By default, Ollama's HTTP API is unencrypted. If you expose it directly to the network using 0.0.0.0, your prompts and model outputs travel in plain text. You should route your connection through an SSH Tunnel or use a VPN like Tailscale to encrypt all traffic.
Can I host Ollama on a Linux server and connect from a Windows/macOS client running LM Studio?
Yes. Operating systems do not need to match. As long as your client workstation can reach the server's IP and port 11434 is accessible through your secure tunnel or network, the setup will work correctly.
Why is LM Studio saying "Connection Refused" when I enter my server IP?
This typically indicates that the host service is still bound only to localhost (check your OLLAMA_HOST environment configuration), a firewall is blocking incoming packets on port 11434, or you did not reload and restart the daemon after changing settings.
Can multiple clients connect to the same remote Ollama server?
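Yes, the server accepts multiple connections. However, Ollama processes only a limited number of requests in parallel for each loaded model (controlled by the OLLAMA_NUM_PARALLEL environment variable) and queues the rest. When several users send queries at the same time, each of them will typically see slower generation, since they share the same GPU.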
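How many requests Ollama serves at once for each loaded model is controlled by the OLLAMA_NUM_PARALLEL environment variable; requests beyond that limit wait in a queue. Raising it increases memory use, because each parallel slot needs its own context. The sketch writes the setting to a separate drop-in file; the file name parallel.conf and the function name are arbitrary choices, and systemd merges the file with the override.conf created in Step 1.

```shell
#!/bin/sh
# Sketch: set how many requests each loaded model may serve in parallel.
# Assumes the systemd unit "ollama"; parallel.conf is an arbitrary file name.
set -eu

set_ollama_parallel() {
  local n="$1" dir=/etc/systemd/system/ollama.service.d
  # Accept only positive integers without leading zeros.
  case "$n" in
    ''|*[!0-9]*|0*) echo "expected a positive integer, got '$n'" >&2; return 1 ;;
  esac
  sudo mkdir -p "$dir"
  printf '[Service]\nEnvironment="OLLAMA_NUM_PARALLEL=%s"\n' "$n" |
    sudo tee "$dir/parallel.conf" >/dev/null
  sudo systemctl daemon-reload
  sudo systemctl restart ollama
}
```

For example, set_ollama_parallel 2 lets two users generate at the same time on each loaded model, at the cost of extra VRAM.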
Do I need to download models on both the server and client?
No. The models reside only on the remote server. Your local machine running LM Studio acts as a thin client, sending text prompts and receiving text streams, which keeps your laptop storage and RAM free.
Offloading model execution to a remote server lets you run large models without turning your desk into a furnace. Configure the host, secure the connection, and keep your workspace quiet.