Deploying Ollama on a Jetson Nano for Private LLM Inference

Running large language models locally is no longer reserved for beefy GPU rigs. With the right setup, a Jetson Nano can serve as a surprisingly capable inference endpoint for smaller models — completely air-gapped from the cloud.

Why edge inference matters

For security professionals, running models locally means:

  • No data exfiltration risk — queries never leave your network
  • Predictable latency — no API rate limits or outages
  • Full control — choose your model, quantization, and context length

Hardware requirements

Component   Specification
Board       NVIDIA Jetson Nano (4GB)
Storage     64GB+ microSD (A2 rated)
Power       5V 4A barrel jack (not USB)
Cooling     Active fan recommended

Installation

First, flash JetPack 4.6 and update the system:

sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git

Install Ollama using the official script:

curl -fsSL https://ollama.com/install.sh | sh
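On Linux the script also registers Ollama as a systemd service. Before going further, it's worth a quick sanity check that both the CLI and the background service came up; the service name here is assumed from the default install:

ollama --version
systemctl status ollama --no-pager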

Pull a quantized model small enough for the Nano's 4GB of memory (shared between the CPU and GPU, so there is no dedicated VRAM):

ollama pull phi3:mini
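Once the download finishes, the model should appear in the local model list along with its size on disk:

ollama list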

Testing the setup

Run a quick inference test:

ollama run phi3:mini "Explain buffer overflow in 3 sentences"

You should see output within a few seconds. The Phi-3 mini model runs comfortably on the Nano with 4-bit quantization.
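If you want to confirm which quantization the tag actually pulled, ollama show prints the model details; the exact fields it reports vary by Ollama version:

ollama show phi3:mini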

Exposing as a local API

Ollama exposes a REST API on port 11434. By default, though, the service only listens on localhost, so it has to be told to bind to all interfaces before other machines on your network can reach it.
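The simplest way to do that on a systemd install is a drop-in override that sets OLLAMA_HOST; this is a minimal sketch assuming the service name created by the official installer:

sudo systemctl edit ollama
# in the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama

With that in place, you can query the API from any machine on the network: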

import requests

# generation can take a while on the Nano, so allow a generous timeout
response = requests.post(
    "http://jetson-nano.local:11434/api/generate",
    json={
        "model": "phi3:mini",
        "prompt": "What is a reverse shell?",
        "stream": False
    },
    timeout=120,
)
print(response.json()["response"])

Performance notes

With the 4GB Jetson Nano, expect roughly 5-8 tokens/second on quantized 3B parameter models. Not fast enough for real-time chat, but perfectly adequate for batch analysis, automated report generation, or offline threat intelligence enrichment.
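If you'd rather measure throughput on your own unit than take those numbers on faith, the non-streaming generate response includes eval_count (tokens generated) and eval_duration (generation time in nanoseconds), which is enough for a rough estimate. A minimal sketch, reusing the hostname from the earlier example:

import requests

resp = requests.post(
    "http://jetson-nano.local:11434/api/generate",
    json={"model": "phi3:mini", "prompt": "Summarize the CIA triad in two sentences.", "stream": False},
    timeout=300,  # leave plenty of headroom for slow generation
)
resp.raise_for_status()
data = resp.json()

# eval_count: tokens generated; eval_duration: time spent generating, in nanoseconds
tokens = data["eval_count"]
seconds = data["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f} s -> {tokens / seconds:.1f} tokens/s")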

Next steps

In a follow-up post, I’ll cover integrating this with n8n for automated security workflow enrichment — having an LLM summarize alerts before they hit the SOC dashboard.