Local AI on a GTX 1060: What Could Go Wrong?

NetworkChuck made it look so easy. Just install some TUI tools, run a few commands, and boom – you're an AI wizard!

I watched his video about TUI AI tools and thought: "This is it. This is my moment to dive into AI." He wasn't just showing off Ollama running models in the terminal – he was demonstrating this sick TUI called opencode that could turn your terminal into an AI coding wizard. The interface looked incredible. Clean. Professional. Like having an AI pair programmer right there in your terminal.

But like most things in tech, what looks simple on YouTube becomes a three-week rabbit hole of frustration, learning, and eventual enlightenment.

The Privacy Awakening

Here's the thing about being a developer in 2025: AI is everywhere. ChatGPT this, Copilot that, Gemini everything else. But I had a problem. A big one.

Privacy.

Every time I wanted to ask an AI about my code or bounce ideas around, I'd hesitate. What if my proprietary work ends up in some training dataset? What if my brilliant (okay, mediocre) ideas get leaked? What if my embarrassing debugging questions become public knowledge?

The paranoia was real. So when NetworkChuck showed off those slick terminal-based AI tools running locally, that's exactly what I wanted. Local LLMs! Privacy! All under my control.

Time to get my hands dirty.

Enter Ollama: The Local LLM Promise

I'd heard whispers about Ollama in developer circles. "Run LLMs locally," they said. "It's easy," they said. "Just download and go," they said.

Spoiler alert: They lied. Well, not exactly lied, but they definitely glossed over some crucial details.

My plan was simple:

  1. Install Ollama on my homelab
  2. Download some models
  3. Set up opencode for that sweet TUI experience
  4. Become an AI wizard
  5. ???
  6. Profit

What could go wrong?

The Hardware Reality Check

My homelab setup: A trusty VM running on Proxmox with a GTX 1060 6GB. Not exactly cutting-edge, but hey, it had worked for everything else I'd thrown at it. Gaming, video encoding, crypto mining back in the day – this little GPU was a trooper.

Surely it could handle some AI models, right?

Right?

GPU Passthrough: The First Boss Battle

Getting GPU passthrough working in Proxmox is like solving a Rubik's cube blindfolded while riding a unicycle. On fire. In a thunderstorm. And honestly? I love adventures like this!

I spent days wrestling with IOMMU groups, VFIO drivers, and kernel parameters. Every time I thought I had it figured out, something would break. The VM wouldn't start. The GPU wouldn't initialize. The drivers would throw tantrums.

# This command became my best friend and worst enemy
lspci -nnv | grep -A 12 VGA
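For anyone attempting the same, the moving parts boil down to a few host-side config changes. This is a sketch, not a recipe – the PCI device IDs are examples from my GTX 1060 (found via `lspci -nn`), and your paths and IDs will differ:

```shell
# 1) /etc/default/grub on the Proxmox host – enable IOMMU on the kernel cmdline:
#    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
#    (use amd_iommu=on for AMD CPUs)

# 2) /etc/modprobe.d/vfio.conf – bind the GPU and its audio function to vfio-pci
#    so the host driver never claims them (substitute your own IDs):
#    options vfio-pci ids=10de:1c03,10de:10f1

# 3) /etc/modules – make sure the VFIO modules load at boot:
#    vfio
#    vfio_iommu_type1
#    vfio_pci

# 4) Apply and reboot:
#    update-grub && update-initramfs -u -k all
# Afterwards, `lspci -nnk` should show "Kernel driver in use: vfio-pci" for the GPU.
```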

After what felt like a lifetime of forum diving and config file tweaking, I finally got the GTX 1060 passed through to my Ubuntu VM. Victory! Time to install Ollama and start my AI journey.

Or so I thought.

Welcome to AI Terminology Hell

Installing Ollama was actually straightforward:

curl -fsSL https://ollama.ai/install.sh | sh

But then came the fun part: choosing a model. And that's when I realized I'd entered a whole new world of acronyms and jargon that made Kubernetes look simple.

GGUF? Quantization? 7B, 13B, 70B parameters? What the hell is a "parameter" anyway? Why are there so many versions of the same model? What's the difference between llama2:7b and llama2:7b-chat?

I felt like a noob again. And not the good kind of noob where you're excited to learn – the overwhelming kind where you question every life choice that led you to this moment.

Thank god for resources like this AI glossary that helped me decode the madness. Turns out:

  • Parameters: Think of them as the "brain cells" of the AI model
  • 7B/13B/70B: Billion parameters (more = smarter but hungrier)
  • Quantization: Compression to make models smaller and faster
  • GGUF: A file format for storing models efficiently
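Those definitions make the "more = hungrier" part simple arithmetic. Here's the back-of-envelope math I eventually learned – weights only, as a rough sketch; the KV cache and runtime overhead add more on top:

```shell
# Weight memory ≈ parameters × bits-per-weight / 8 bytes.
params=7000000000   # a 7B model
bits=4              # Q4 quantization, roughly 4 bits per weight
bytes=$((params * bits / 8))
echo "7B @ Q4 ≈ $((bytes / 1000000000)) GB for weights alone"

# The same model at FP16 (16 bits per weight) needs ~14 GB, which is why a
# 6GB GTX 1060 forces you into small, heavily quantized models.
```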

Armed with this half-baked knowledge, I made my first attempt:

ollama pull llama2:13b

And waited. And waited. And waited some more.

The Great Model Download of 2025

Downloading a 13B model on my internet connection was like watching paint dry in slow motion. We're talking gigabytes of data, and my "upload-optimized" fiber connection suddenly felt like dial-up.

After an hour of downloading, I got an error. Out of disk space. Of course.
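The lesson: check free space before pulling a multi-gigabyte model. A quick sanity check looks like this (the model directory mentioned below is Ollama's usual Linux default, so verify where yours actually lives):

```shell
# How much room is left on the filesystem holding Ollama's models?
# (Linux default model dir: /usr/share/ollama/.ollama/models)
df -h /

# Reclaim space by listing and removing models you no longer need:
# ollama list
# ollama rm llama2:13b
```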

Let me try something smaller:

ollama pull llama2:7b

Success! Finally, I had a working local LLM. Time to test it out:

ollama run llama2:7b

The model loaded (eventually), and I asked my first question: "How do I optimize a Python function for better performance?"

And then I waited. And waited. The GTX 1060 was working hard – I could hear the fans spinning up – but the response was... glacial.

Thirty seconds later, I got a decent answer. But thirty seconds! For a simple question! This wasn't the snappy AI experience I was used to from the cloud services.

But hey, at least Ollama was working. Time for the real prize: opencode!

opencode: The Real Goal All Along

Remember that slick TUI interface from NetworkChuck's video? That was OpenCode.ai – and this was what I'd been working toward this whole time. Not just chatting with models in a basic terminal interface like some kind of AI hermit. Nope!

opencode is like having an AI pair programmer right in your terminal. It can read your files, understand your codebase, suggest improvements, and even write code for you. And the best part? It can connect to local Ollama models for that sweet, sweet privacy I was craving.

The setup is actually brilliant:

  • Ollama serves the models (the backend muscle)
  • opencode provides the interface (the frontend magic)
  • You get privacy AND a fantastic coding experience

Perfect, right? Right!

Installing opencode

After the GPU passthrough nightmare, I was cautiously optimistic about installing opencode. Surely this couldn't be as painful as VFIO drivers and IOMMU groups, right?

curl -fsSL https://opencode.ai/install | bash

Of course it's a Node.js app. Because when isn't it?

But you know what? The installation actually went smooth as butter. No kernel panics, no driver issues, no existential crises. Just a simple install and we're good to go.

I was suspicious. This felt too easy.

Connecting to Ollama

Here's where NetworkChuck's video really shined. He made the opencode configuration crystal clear. Unlike my GPU passthrough adventures, this step actually went smoothly because he explained exactly what needed to be done.

The setup is straightforward: opencode needs a config file to talk to Ollama. NetworkChuck walked through this step perfectly, showing exactly where the config goes and what it should contain.

Following his instructions, I created the config file at ~/.config/opencode/opencode.json to tell OpenCode where to find my Ollama models.

Here's my actual config:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://<ollama-server>:11434/v1"
      },
      "models": {
        "granite4-3b": {
          "id": "granite4:3b",
          "name": "granite4:3b"
        },
        "mistral-7b": {
          "id": "mistral:7b",
          "name": "mistral-7b"
        },
        "all-minilm": {
          "id": "hf.co/leliuga/all-MiniLM-L6-v2-GGUF:Q4_K_M",
          "name": "all-minilm"
        },
        "stable-code": {
          "id": "stable-code:3b",
          "name": "stable-code-3b"
        }
      }
    }
  }
}

This config does a few crucial things:

  • Points OpenCode to my local Ollama instance running on port 11434
  • Maps the model names to their actual Ollama IDs
  • Tells OpenCode which models I have available locally
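One gotcha worth flagging: a typo in that JSON can break the provider setup in confusing ways. A quick syntax check before launching saves a debugging round – here python3 is used purely as a convenient JSON validator, and the path matches the config location above:

```shell
# Validate the opencode config's JSON syntax before starting the TUI.
CFG="$HOME/.config/opencode/opencode.json"
if python3 -m json.tool "$CFG" > /dev/null 2>&1; then
  echo "config parses"
else
  echo "config missing or invalid"
fi
```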

Once the config was in place, connecting was seamless:

# Start OpenCode
opencode

And I could use /model after launch to see my local models. Ready to serve me as their master.

And just like that, I had it! A TUI interface that could:

  • Browse my project files
  • Analyze my code
  • Suggest improvements
  • Generate new functions
  • Debug issues
  • All while using my local Ollama models

The terminal interface was clean, responsive, and actually usable. No more typing questions into a basic chat interface and waiting for responses. This felt like a real development tool.

The setup portion was complete and working flawlessly. But now came the real test: actually using it for development work.

And that's where things got... interesting.

The First Real Test

Time to put this beautiful setup to the test! I opened up one of my homelab projects and let opencode analyze it. The interface was intuitive – I could navigate files, select code blocks, and ask specific questions about my implementation.

"Analyze this server module and suggest optimizations," I typed, highlighting a particularly gnarly piece of code I'd written during a late-night coding session.

I hit enter and waited for the magic to happen.

And waited.

And waited some more.

Reality hit. Hard.

The Performance Reality

Here's what nobody tells you about running LLMs on consumer hardware: it's slow. Really slow. Especially when you're limited to 7B models because your GPU only has 6GB of VRAM.

Every question was a test of patience. Complex queries would take minutes. And forget about having a flowing conversation – by the time the AI responded, I'd forgotten what I asked.

But the real kicker? No tool support. No code execution. No web browsing. No file analysis. Just basic text generation at the speed of molasses.

I started to understand why everyone was using cloud services.

The Breaking Point

The final straw came during a debugging session. I had a gnarly Python error and wanted to bounce it off the AI. I pasted the traceback, asked for help, and went to make coffee while I waited for the response.

When I came back ten minutes later, the model was still "thinking." The fans were screaming, the GPU was at 100%, and I was questioning my life choices.

That's when it hit me: I was spending more time waiting for my "privacy-focused" local AI than I would have spent just googling the answer or asking on Stack Overflow.

The irony was thick. In my quest for privacy and control, I'd created the most inefficient development workflow imaginable.

The Claude Conversion

Frustrated and slightly defeated, I decided to try Claude Pro. Just for comparison, I told myself. Just to see what I was missing.

I signed up, opened the interface, and asked the same Python question that had broken my homelab setup.

The response came back in seconds. Not minutes. Seconds.

And it wasn't just fast – it was comprehensive, well-formatted, and included code examples. It could analyze my files, execute code snippets, and even browse the web for additional context.

Pro-Tip: Don't forget to opt out of having your chats and coding sessions used for training in Claude. We still value our privacy here!

The Honest Truth About Local vs Cloud AI

Here's what I learned from my homelab AI adventure:

Local LLMs Are Great For:

  • Privacy-sensitive work: When you absolutely can't send data to the cloud
  • Learning AI fundamentals: Understanding how models work under the hood
  • Offline scenarios: When internet isn't available
  • Experimentation: Playing with different models and parameters

Cloud AI Is Better For:

  • Daily development work: Speed and reliability matter
  • Complex tasks: Tool integration and advanced capabilities
  • Productivity: When you need answers now, not eventually
  • Cost efficiency: Your time is worth more than the subscription fee

The Hardware Reality Check:

  • Consumer GPUs struggle: Even a GTX 1060 is barely adequate for 7B models
  • Model size matters: Bigger models need serious hardware
  • Performance expectations: Local inference is slow compared to cloud services
  • Setup complexity: GPU passthrough and driver issues are real

Lessons Learned and Moving Forward

Am I abandoning local AI entirely? Not quite. I still have Ollama set up for privacy-sensitive tasks and experimentation. But for day-to-day development work, Claude Pro has become my go-to.

The key insight? Choose the right tool for the job. Sometimes that's a local model running on your homelab. Sometimes it's a cloud service optimized for speed and capability.

My Current AI Setup:

  • Claude Pro: Daily development, code review, complex problem-solving
  • Local Ollama: Privacy-sensitive work, AI learning experiments
  • ChatGPT: Quick questions and brainstorming

If You're Starting Your AI Journey:

Start with cloud services. Get familiar with AI capabilities and limitations before diving into the hardware rabbit hole. Once you understand what you need, then consider local options.

Understand your hardware. A GTX 1060 can run 7B models, but don't expect miracles. For serious local AI work, you need serious hardware.

Privacy isn't binary. You can use cloud AI for general questions and local AI for sensitive work. It's not all or nothing.

Time is money. Calculate the cost of your time waiting for local inference versus paying for cloud services. The math might surprise you.

The Continuing Journey

My AI adventure is far from over. I'm now doing a deep dive into AI technology, understanding model architectures, training processes, and the fundamental limitations of current systems.

The homelab experiment taught me more about AI than any tutorial could. Sometimes the best way to understand technology is to struggle with it firsthand.

And hey, at least I can say I've run LLMs on a GTX 1060. That's got to count for something, right?

For any questions about local AI setups or my current workflow, check out my homelab repository or ask me directly. I'm always happy to share the pain... I mean, knowledge!

Happy Coding!