Building an agentic AI in Ruby

You’ve probably had a conversation with an LLM by now and walked away thinking that was nice, but it can’t actually do anything. A chatbot answers from what it already knows. An agent is different — it decides which tool to use, calls it, reads the result, and keeps going until the task is done. Same model underneath; very different behaviour around it.

This post walks through building one in Ruby. No frameworks, no gems beyond the standard library, no hand-waving. About 200 lines all in. We’ll talk to a local Ollama instance so you can run the whole thing on a laptop with no API keys.

The finished project is on disk as rupi — Ruby agentic AI. If you want to follow along, the layout is:

rupi/
├── bin/rupi              # CLI entry point
└── lib/
    ├── rupi.rb           # top-level requires
    └── rupi/
        ├── agent.rb      # the loop
        ├── ollama_client.rb
        ├── tool_registry.rb
        ├── errors.rb
        └── tools/
            ├── file_reader.rb
            └── web_search.rb

What’s an “agentic” AI, anyway?

A regular LLM call looks like this: you send a prompt, you get text back. The model’s only job is to predict the next token. It can’t fetch a webpage, read a file, or do arithmetic it didn’t already memorise during training.

An agent wraps that LLM in a loop. Each iteration the model sees the task, what’s happened so far, and a list of tools it’s allowed to use. It picks one, the loop executes it, the result is appended to the conversation, and the model goes again. Eventually the model decides it has enough and emits a final answer.

That’s the whole trick.

The ReAct pattern

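The shape we’ll use is called ReAct, short for Reasoning and Acting: the model reasons, acts through a tool, and observes the result before reasoning again. Drawn out in full, it’s a five-step cycle: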

                 ┌──────────────────────────────┐
                 │           Task               │
                 └──────────────┬───────────────┘
                                │
                                ▼
                ┌──────────────────────────────┐
                │  1. Reason   (LLM thinks)    │◀──┐
                └──────────────┬───────────────┘   │
                               │                   │
                               ▼                   │
                ┌──────────────────────────────┐   │
                │  2. Pick a tool + input      │   │
                └──────────────┬───────────────┘   │
                               │                   │
                               ▼                   │
                ┌──────────────────────────────┐   │
                │  3. Act     (run the tool)   │   │
                └──────────────┬───────────────┘   │
                               │                   │
                               ▼                   │
                ┌──────────────────────────────┐   │
                │  4. Observe (capture result) │───┘
                └──────────────┬───────────────┘
                               │  (when satisfied)
                               ▼
                ┌──────────────────────────────┐
                │  5. Final answer             │
                └──────────────────────────────┘

We get the LLM to emit a JSON object every turn. Two shapes:

{ "thought": "I should read the Gemfile", "tool_name": "read_file", "input": "Gemfile" }

…to use a tool, or:

{ "final_answer": "The Gemfile has no runtime deps." }

…to stop. The loop dispatches on whichever key is present.
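As a rough sketch of that dispatch, the key check could look like the following. The classify helper is a made-up name for this illustration, not part of rupi, and it assumes the reply is already clean JSON (the real parser later in the post is more forgiving).

```ruby
require 'json'

# Hypothetical helper: decide what the loop should do with one model reply.
def classify(raw)
  data = JSON.parse(raw)
  if data.key?('final_answer')
    [:final, data['final_answer']]
  elsif data.key?('tool_name')
    [:tool, data['tool_name'], data['input']]
  else
    [:invalid, 'neither final_answer nor tool_name present']
  end
end

p classify('{ "thought": "I should read the Gemfile", "tool_name": "read_file", "input": "Gemfile" }')
p classify('{ "final_answer": "The Gemfile has no runtime deps." }')
```

Checking final_answer first means a reply that carries both keys is treated as a stop, which matches the order the agent uses in Step 4.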

Step 1: A tool is anything that quacks like one

In a typed language you’d start by declaring an interface. Ruby doesn’t need that: anything that responds to #name, #description, and #call(input) is a tool. A plain class works. A Struct works. Even a lambda works once you give it #name and #description, since on its own it only responds to #call.
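To make the duck typing concrete, here is a small sketch with two throwaway tools. StructTool, upcase and reverse are invented for this example and are not part of rupi. The Struct gets name and description as members; the bare lambda only has #call, so the other two methods are attached as singleton methods.

```ruby
# A Struct with a callable member: name and description come for free.
StructTool = Struct.new(:name, :description, :handler) do
  def call(input) = handler.call(input)
end

upcase = StructTool.new('upcase', 'Upcases the input string.', ->(s) { s.to_s.upcase })

# A bare lambda only responds to #call, so bolt the other two methods on.
reverse = ->(s) { s.to_s.reverse }
reverse.define_singleton_method(:name)        { 'reverse' }
reverse.define_singleton_method(:description) { 'Reverses the input string.' }

[upcase, reverse].each do |tool|
  puts "#{tool.name}: #{tool.call('ruby')}"
end
```

Either object can be handed to the agent in place of a class-based tool, because the agent only ever calls those three methods.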

Here are the two we’ll ship. FileReader is real (it reads files); WebSearch is a stub so you can see the pattern without wiring up an actual search API.

# lib/rupi/tools/file_reader.rb
module Rupi
  module Tools
    class FileReader
      def name        = 'read_file'
      def description = 'Reads the contents of a file. Input: file path.'

      def call(input)
        path = input.to_s.strip
        return 'Error: empty file path' if path.empty?

        File.read(path)
      rescue Errno::ENOENT
        "Error: file not found: #{path}"
      rescue Errno::EACCES
        "Error: permission denied: #{path}"
      rescue Errno::EISDIR
        "Error: path is a directory: #{path}"
      end
    end
  end
end

# lib/rupi/tools/web_search.rb
module Rupi
  module Tools
    class WebSearch
      def name        = 'web_search'
      def description = 'Searches the web for information. Input: search query.'

      def call(input)
        "Search results for #{input.to_s.strip.inspect}: " \
          'Found relevant information related to the query.'
      end
    end
  end
end

Two things worth pointing out:

The descriptions aren’t documentation, they’re prompts. The agent will paste these straight into the model’s context as “the menu of tools you can use.” Write them like you’d write a function signature for a colleague.
Tool failures don’t raise — they return error strings. When File.read blows up, the model needs to see “file not found” in the next observation so it can adjust (try a different path, give up, ask the user). An exception would kill the loop.
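FileReader only rescues the three errors it expects. One way to apply the same rule to any tool is a small decorator that turns every StandardError into an observation. SafeTool and Flaky below are hypothetical names for this sketch; rupi itself does not ship them.

```ruby
# Hypothetical decorator: wraps any duck-typed tool so that unexpected
# exceptions come back as observations instead of unwinding the loop.
class SafeTool
  def initialize(tool)
    @tool = tool
  end

  def name        = @tool.name
  def description = @tool.description

  def call(input)
    @tool.call(input).to_s
  rescue StandardError => e
    "Error: #{e.class}: #{e.message}"
  end
end

# Stand-in tool that always fails, for demonstration only.
Flaky = Struct.new(:name, :description) do
  def call(_input) = raise(ArgumentError, 'bad input')
end

tool = SafeTool.new(Flaky.new('flaky', 'Always fails.'))
puts tool.call('anything')
```

The trade-off is that a blanket rescue can also hide programming mistakes inside a tool, so it is worth logging the exception as well as returning it.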

Step 2: A registry to find tools by name

The agent gets a JSON tool_name back from the LLM and needs to find the matching object. That’s a Hash — but wrapping it in a small class lets us mix in Enumerable for free, which keeps the agent code clean later.

# lib/rupi/tool_registry.rb
module Rupi
  class ToolRegistry
    include Enumerable

    def initialize
      @tools = {}
    end

    def register(tool)
      @tools[tool.name] = tool
      self
    end

    def fetch(name)
      @tools.fetch(name) { raise ToolNotFound, "no tool named #{name.inspect}" }
    end

    def include?(name) = @tools.key?(name)
    def names          = @tools.keys
    def empty?         = @tools.empty?

    def each(&) = @tools.each_value(&)
  end
end

register returns self so you can chain. fetch raises a typed exception (defined below) so the agent can rescue it specifically.

While we’re here, the error classes:

# lib/rupi/errors.rb
module Rupi
  class Error < StandardError; end

  class MaxIterationsExceeded < Error; end
  class ToolNotFound          < Error; end
  class OllamaError           < Error; end
end

One base class so callers can rescue Rupi::Error. The CLI does exactly that.
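A short usage sketch of the registry and the error hierarchy together. To keep it runnable on its own it repeats a condensed copy of both classes; the Echo tool is a throwaway invented for the demo.

```ruby
module Rupi
  class Error < StandardError; end
  class ToolNotFound < Error; end

  # Condensed copy of the registry above, just enough to run standalone.
  class ToolRegistry
    include Enumerable

    def initialize
      @tools = {}
    end

    def register(tool)
      @tools[tool.name] = tool
      self
    end

    def fetch(name)
      @tools.fetch(name) { raise ToolNotFound, "no tool named #{name.inspect}" }
    end

    def each(&block) = @tools.each_value(&block)
  end
end

# Throwaway tool type for the demo; any object with these three methods works.
Echo = Struct.new(:name, :description) do
  def call(input) = input.to_s
end

registry = Rupi::ToolRegistry.new
  .register(Echo.new('echo', 'Returns its input.'))
  .register(Echo.new('echo2', 'Also returns its input.'))

# Enumerable builds map, select, count and friends on top of #each.
puts registry.map(&:name).inspect

begin
  registry.fetch('nope')
rescue Rupi::Error => e # the base class catches ToolNotFound too
  puts "#{e.class}: #{e.message}"
end
```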

Step 3: Talking to Ollama

Ollama exposes an HTTP API at http://localhost:11434. To generate text we POST to /api/generate with a JSON body containing the model name and the prompt. The response is JSON with a "response" field holding the model’s text.
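Before wiring up the HTTP call, it can help to see the two JSON documents on their own. The sketch below builds the request body and reads a canned reply without touching the network. The sample reply is hand-written and trimmed to a few fields; a real reply carries additional metadata.

```ruby
require 'json'
require 'uri'

endpoint = URI.join('http://localhost:11434', '/api/generate')
body     = JSON.generate({ model: 'gemma3', prompt: 'Say hi', stream: false })

puts "POST #{endpoint}"
puts body

# Canned reply trimmed to the fields used here; a real reply carries more.
sample_reply = '{"model":"gemma3","response":"Hi there!","done":true}'
puts JSON.parse(sample_reply).fetch('response')
```

Setting stream to false asks for one complete JSON document instead of a sequence of chunks, which is what lets the client parse the body in a single step.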

Net::HTTP from the standard library is all we need.

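# lib/rupi/ollama_client.rb
# Standard-library requires this file depends on. They are harmless to repeat
# if lib/rupi.rb already loads them.
require 'json'
require 'net/http'
require 'uri'

module Rupi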
  class OllamaClient
    DEFAULT_URL   = 'http://localhost:11434'
    DEFAULT_MODEL = 'gemma3'

    attr_reader :base_url, :model

    def initialize(base_url: DEFAULT_URL, model: DEFAULT_MODEL,
                   open_timeout: 10, read_timeout: 300)
      @base_url     = base_url
      @model        = model
      @open_timeout = open_timeout
      @read_timeout = read_timeout
    end

    def generate(prompt)
      uri = URI.join(@base_url, '/api/generate')
      req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
      req.body = JSON.generate(model: @model, prompt: prompt, stream: false)

      res = Net::HTTP.start(uri.hostname, uri.port,
                            use_ssl: uri.scheme == 'https',
                            open_timeout: @open_timeout,
                            read_timeout: @read_timeout) do |http|
        http.request(req)
      end

      unless res.is_a?(Net::HTTPSuccess)
        raise OllamaError, "ollama returned HTTP #{res.code}: #{res.body}"
      end

      JSON.parse(res.body).fetch('response') do
        raise OllamaError, "ollama response missing 'response' field"
      end
    rescue JSON::ParserError => e
      raise OllamaError, "could not parse ollama response: #{e.message}"
    rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, SocketError => e
      raise OllamaError, "could not reach ollama at #{@base_url}: #{e.message}"
    rescue Net::OpenTimeout, Net::ReadTimeout => e
      raise OllamaError, "ollama request timed out: #{e.message}"
    end
  end
end

The three rescue clauses at the bottom convert every plausible parsing or network failure into a single Rupi::OllamaError with a useful message. The agent doesn’t care why the LLM is unreachable, only that it is.

This is also the only place that knows about Ollama. If you want to swap in OpenAI or Anthropic later, write a class with the same #generate(prompt) method and inject it instead. Duck typing again.
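The same seam is handy for tests. As a sketch, a scripted stand-in with the same #generate(prompt) method can replay canned replies so the agent runs without any model at all. ScriptedClient is a hypothetical name, not something rupi ships.

```ruby
# Hypothetical test double: replays canned replies instead of calling a model.
class ScriptedClient
  attr_reader :prompts

  def initialize(*replies)
    @replies = replies
    @prompts = []
  end

  # Same duck type as OllamaClient#generate: prompt in, text out.
  def generate(prompt)
    @prompts << prompt
    @replies.shift or raise 'script exhausted'
  end
end

client = ScriptedClient.new(
  '{"tool_name": "read_file", "input": "Gemfile"}',
  '{"final_answer": "done"}'
)

puts client.generate('first prompt')
puts client.generate('second prompt')
puts client.prompts.length
```

Recording the prompts it receives also lets a test assert on what the agent actually sent.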

Step 4: The agent loop

This is the heart of the project — about 80 lines.

# lib/rupi/agent.rb
module Rupi
  class Agent
    DEFAULT_MAX_ITERATIONS = 10

    attr_reader :registry, :max_iterations

    def initialize(llm_client:, tools: [], max_iterations: DEFAULT_MAX_ITERATIONS,
                   logger: nil)
      @llm_client     = llm_client
      @max_iterations = max_iterations
      @logger         = logger
      @registry       = ToolRegistry.new
      tools.each { |tool| @registry.register(tool) }
    end

    def run(task)
      history = ["Task: #{task}"]

      @max_iterations.times do |i|
        log "iteration #{i + 1}/#{@max_iterations}"

        response = @llm_client.generate(build_prompt(history))
        log "llm response: #{response}"

        parsed = parse_response(response)
        case parsed[:type]
        when :final
          return parsed[:answer]
        when :tool
          history << "Thought: #{parsed[:thought]}" unless parsed[:thought].empty?
          history << "Action: #{parsed[:name]}"
          history << "Action Input: #{parsed[:input]}"
          history << "Observation: #{invoke_tool(parsed[:name], parsed[:input])}"
        when :invalid
          history << "Error: #{parsed[:reason]}. Respond with a single JSON object."
        end
      end

      raise MaxIterationsExceeded,
            "agent could not complete the task in #{@max_iterations} iterations"
    end

    # ... private methods below
  end
end

#run is the whole loop. We keep a history array of strings, ask the LLM for a response, parse it, and either return, run a tool, or record an error and try again. The cap on iterations is critical — without it a confused model will loop forever.
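To watch the history grow without a model in the picture, here is a stripped-down version of the same loop. Everything in it (mini_run, the scripted lambda standing in for the LLM, the shout tool) is invented for illustration, and it skips the parsing and error handling the real #run has.

```ruby
require 'json'

# Stripped-down loop: a scripted "model" (a lambda), one tool, a hard cap.
def mini_run(task, model:, tools:, max_iterations: 3)
  history = ["Task: #{task}"]

  max_iterations.times do
    data = JSON.parse(model.call(history.join("\n")))
    return [data['final_answer'], history] if data.key?('final_answer')

    tool = tools.fetch(data['tool_name'])
    history << "Action: #{data['tool_name']}"
    history << "Observation: #{tool.call(data['input'])}"
  end

  raise "gave up after #{max_iterations} iterations"
end

replies = [
  '{"tool_name": "shout", "input": "hello"}',
  '{"final_answer": "The tool said HELLO"}'
]
model = ->(_prompt) { replies.shift }
tools = { 'shout' => ->(input) { input.upcase } }

answer, history = mini_run('shout hello', model: model, tools: tools)
puts history
puts answer
```

If the scripted model never returns a final_answer, the times block simply runs out and the raise after it fires, which is the same role MaxIterationsExceeded plays in the real agent.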

The prompt template lives in build_prompt. Each turn we paste in the full history plus the tool menu:

def build_prompt(history)
  <<~PROMPT
    You are a helpful AI agent that completes tasks by reasoning and using tools.

    Available tools:
    #{tool_descriptions}

    To use a tool, respond with a single JSON object on its own line:
      {"thought": "<your reasoning>", "tool_name": "<#{tool_names.join(' | ')}>", "input": "<input string for the tool>"}

    When you have enough information to answer the task, respond with:
      {"final_answer": "<your final answer>"}

    Respond with the JSON object only — no surrounding prose.

    Conversation so far:
    #{history.join("\n")}

    Your response:
  PROMPT
end

def tool_descriptions
  @registry.map { |t| "- #{t.name}: #{t.description}" }.join("\n")
end

def tool_names = @registry.names

Now the parser. Smaller models love to wrap JSON in prose (“Sure! Here’s what I’d do:”) or markdown fences (```json ... ```). A strict JSON.parse(response) would fail on every one of those. So we scan for the first balanced {...} substring and only parse that:

def parse_response(text)
  json = extract_json(text)
  return { type: :invalid, reason: 'no JSON object found in response' } unless json

  data = JSON.parse(json)
  if data.key?('final_answer')
    { type: :final, answer: data['final_answer'].to_s }
  elsif data.key?('tool_name') && data.key?('input')
    { type: :tool,
      name: data['tool_name'].to_s,
      input: data['input'].to_s,
      thought: data['thought'].to_s }
  else
    { type: :invalid,
      reason: "JSON must contain either 'final_answer' or both 'tool_name' and 'input'" }
  end
rescue JSON::ParserError => e
  { type: :invalid, reason: "could not parse JSON: #{e.message}" }
end

def extract_json(text)
  depth     = 0
  start     = nil
  in_string = false
  escape    = false

  text.each_char.with_index do |ch, i|
    if in_string
      if    escape     then escape = false
      elsif ch == '\\' then escape = true
      elsif ch == '"'  then in_string = false
      end
      next
    end

    case ch
    when '"'
      # Only track strings inside an object, so a stray quote in the
      # surrounding prose cannot hide the opening brace.
      in_string = true if depth.positive?
    when '{'
      start = i if depth.zero?
      depth += 1
    when '}'
      next if depth.zero? # stray closing brace before any object started

      depth -= 1
      return text[start..i] if depth.zero?
    end
  end

  nil
end

The extract_json scanner tracks quoting and escapes so a { inside a string literal doesn’t confuse it. It returns the first complete top-level object, which is what a model emits in the common case; anything after that object is ignored.
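A quick way to convince yourself the scanner copes with chatty output is to feed it a couple of hand-written samples. The sketch below uses a condensed variant of the scanner under a different name (first_json_object) so it runs on its own. It tracks strings only once an object has started, which is an assumption of this sketch, not a claim about rupi's code. Both sample replies are made up.

```ruby
require 'json'

# Condensed variant of the scanner: returns the first balanced {...} or nil.
def first_json_object(text)
  depth = 0
  start = nil
  in_string = false
  escape = false

  text.each_char.with_index do |ch, i|
    if in_string
      if escape then escape = false
      elsif ch == '\\' then escape = true
      elsif ch == '"' then in_string = false
      end
      next
    end

    case ch
    when '"' then in_string = true if depth.positive?
    when '{'
      start = i if depth.zero?
      depth += 1
    when '}'
      next if depth.zero?

      depth -= 1
      return text[start..i] if depth.zero?
    end
  end
  nil
end

fence  = '`' * 3 # built at runtime to avoid a literal fence inside this example
chatty = %(Sure! Here's what I'd do:\n{"tool_name": "read_file", "input": "Gemfile"})
fenced = "#{fence}json\n{\"final_answer\": \"uses a { brace\"}\n#{fence}"

p JSON.parse(first_json_object(chatty))
p JSON.parse(first_json_object(fenced))
p first_json_object('no braces here')
```

The second sample has an unmatched { inside a string value, which is the case the in_string flag exists for.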

Finally, invoking a tool. Notice we swallow ToolNotFound and feed the error back to the model — same idea as the FileReader rescues. The model gets to see “you asked for a tool I don’t have, here’s the list” and can recover.

def invoke_tool(name, input)
  @registry.fetch(name).call(input).to_s
rescue ToolNotFound
  "Error: unknown tool #{name.inspect}. Available: #{tool_names.join(', ')}"
end

def log(message)
  @logger&.info(message)
end

Putting it together

bin/rupi is the executable. It reads the task from ARGV, picks up the Ollama URL and model from the environment, wires the components together, and prints the answer.

#!/usr/bin/env ruby
# frozen_string_literal: true

$LOAD_PATH.unshift File.expand_path('../lib', __dir__)

require 'logger'
require 'rupi'

task = ARGV.first || 'Read the Gemfile in this folder and tell me its content and meaning.'

client = Rupi::OllamaClient.new(
  base_url: ENV.fetch('OLLAMA_URL',   Rupi::OllamaClient::DEFAULT_URL),
  model:    ENV.fetch('OLLAMA_MODEL', Rupi::OllamaClient::DEFAULT_MODEL)
)

logger = Logger.new($stderr, level: ENV.fetch('RUPI_LOG', 'info'))

agent = Rupi::Agent.new(
  llm_client: client,
  tools:      [Rupi::Tools::WebSearch.new, Rupi::Tools::FileReader.new],
  logger:     logger
)

begin
  puts agent.run(task)
rescue Rupi::Error => e
  warn "rupi: #{e.message}"
  exit 1
end

Pull a small model with Ollama first:

ollama pull gemma3
ollama serve   # if it isn't already running

Then run it:

$ bin/rupi "what's in the Gemfile?"
I, iteration 1/10
I, llm response: {"thought": "...", "tool_name": "read_file", "input": "Gemfile"}
I, iteration 2/10
I, llm response: {"final_answer": "The Gemfile sources from rubygems.org..."}
The Gemfile sources from rubygems.org and declares two development
dependencies: rake and rubocop. There are no runtime dependencies.

That’s the whole thing.

Important considerations

What we’ve built is a working toy. To put one of these in front of real users, four things matter more than the code itself.

Token budgets

Every iteration appends to the history, and the entire history is sent on every LLM call. A 10-iteration loop with verbose tool outputs can hit thousands of tokens per request very quickly. For local Ollama that’s just latency; for paid APIs it’s the bill.

Cheap wins:

Trim long tool outputs (output[0, 4_000]) before stuffing them back into history.
Drop old Thought: lines once an observation supersedes them.
Summarise older history every N iterations instead of replaying it verbatim.
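The first two tips fit in a few lines. This is a minimal sketch, assuming history is the same array of prefixed strings the agent builds; clip and prune_thoughts are hypothetical helper names, and the 4,000-character cap is just the figure from the tip above.

```ruby
MAX_OBSERVATION_CHARS = 4_000 # same cut-off as the first tip above

# Cap one tool output and say so, so the model knows it only saw a prefix.
def clip(output, limit = MAX_OBSERVATION_CHARS)
  return output if output.length <= limit

  "#{output[0, limit]}\n[truncated #{output.length - limit} characters]"
end

# Drop every "Thought:" line that comes before the latest observation.
def prune_thoughts(history)
  last_obs = history.rindex { |line| line.start_with?('Observation:') }
  return history unless last_obs

  history.each_with_index.reject do |line, i|
    i < last_obs && line.start_with?('Thought:')
  end.map(&:first)
end

history = [
  'Task: summarise the log',
  'Thought: I should read it first',
  'Action: read_file',
  "Observation: #{clip('x' * 10, 4)}",
  'Thought: now I can answer'
]

puts prune_thoughts(history)
```

Note that this counts characters, not tokens; a real budget would need the model's tokenizer or a conservative characters-per-token estimate.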

Safety guardrails

The FileReader here will happily read /etc/passwd if the model asks. That’s fine in a CLI you’re running yourself; it’s a security hole the moment this is exposed to anyone else. Restrict tool inputs at the tool boundary, not in the prompt:

def call(input)
  path = File.expand_path(input.to_s.strip, '/your/sandbox')
  return 'Error: path escapes sandbox' unless path.start_with?('/your/sandbox/')
  File.read(path)
end

Same principle for shell, network, and database tools. The model is an untrusted caller. Treat it like one.
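One gap in a plain expand_path check is symlinks: a link inside the sandbox can point outside it. A slightly stricter sketch resolves the real path first. SandboxedReader is a hypothetical class for illustration, and the sandbox root here is just a temporary directory.

```ruby
require 'tmpdir'

# Hypothetical sandboxed reader; the sandbox root is supplied by the caller.
class SandboxedReader
  def initialize(root)
    @root = File.realpath(root)
  end

  def name        = 'read_file'
  def description = 'Reads a file inside the sandbox. Input: relative path.'

  def call(input)
    # realpath resolves symlinks and "..", so the prefix check sees the
    # file's true location rather than the path the model typed.
    path = File.realpath(input.to_s.strip, @root)
    return 'Error: path escapes sandbox' unless path.start_with?(@root + File::SEPARATOR)

    File.read(path)
  rescue SystemCallError => e
    "Error: #{e.message}"
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'notes.txt'), 'hello')
  reader = SandboxedReader.new(dir)
  puts reader.call('notes.txt')
  puts reader.call('/etc/passwd')
end
```

Because realpath requires the file to exist, a missing file and an escaping path both come back as error strings, which keeps the tool within the "failures are observations" rule from Step 1.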

Logging agent decisions

The logger: parameter on Agent.new is there for a reason. When an agent does something weird in production, you need to be able to replay the conversation — the task, every prompt sent, every response received, every tool input and output. Logging it as it happens is much cheaper than reconstructing it later.

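In rupi the log level comes from the RUPI_LOG environment variable, and everything goes to stderr. Note that the log helper shown earlier writes at info level and only covers the iteration counter and the raw LLM response, so RUPI_LOG=debug bin/rupi "…" adds nothing until you also log the prompts themselves at debug level.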
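One possible split, sketched with the standard Logger: short status lines at info, full prompts at debug. The log_turn helper is hypothetical and only illustrates the level split; it is not how the agent code above is written.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: status lines at info, the full prompt at debug.
def log_turn(logger, iteration, prompt, response)
  logger.info("iteration #{iteration}")
  # Block form: the string is only built when debug output is enabled.
  logger.debug { "prompt:\n#{prompt}" }
  logger.info("llm response: #{response}")
end

[Logger::INFO, Logger::DEBUG].each do |level|
  out = StringIO.new
  log_turn(Logger.new(out, level: level), 1, 'Task: read the Gemfile', '{"final_answer": "ok"}')
  puts "level #{level} wrote #{out.string.lines.count} lines"
end
```

Prompts grow with every iteration, so keeping them behind the debug level stops routine runs from flooding stderr while still allowing a full replay when something goes wrong.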

Error handling

Three categories, three different responses:

The model emits malformed JSON. Push an error line into the history and let it try again. That’s what :invalid does.
A tool fails. Return an error string from #call. The model reads it as an observation and decides what to do.
The LLM service is unreachable, or you’ve hit max_iterations. Raise a Rupi::Error. There’s nothing useful left to do; bail.

The split is between failures the model can react to (give it another turn) and failures it can’t (stop).

Wrapping up

We’ve gone from “an LLM is a text predictor” to “an LLM is the brain of a tool-using agent” in about 200 lines. The pieces are small:

A duck-typed Tool — anything that responds to name, description, call.
A ToolRegistry that’s basically a Hash with a typed lookup error.
An OllamaClient that wraps Net::HTTP.
An Agent that loops, parses, dispatches, and stops.

The interesting work isn’t any one component — it’s the protocol between the LLM and the runtime. We chose JSON-with-two-shapes; ReAct prompting; a robust JSON extractor for sloppy models; tool failures that return strings instead of raising. Every framework out there (LangChain, smolagents, Pydantic AI) is doing some variation of the same dance.

If you want to extend this:

Real tools. Calculator, HTTP fetcher, shell runner with allow-listed commands, sqlite query.
Multi-step planning. Have the model emit a plan first, then execute it step by step.
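Streaming. Set stream: true on the Ollama request; the reply then arrives as newline-delimited JSON chunks, each carrying a fragment of text in its "response" field, so you accumulate the fragments and parse the agent’s JSON once the object is complete.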
Multiple agents. One supervisor agent dispatching to specialist sub-agents, each with its own tool set.
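As a starting point for the streaming idea, here is a sketch that reassembles a streamed reply from a canned sample, with no network involved. It assumes each line of the stream is a JSON object with a "response" fragment and a "done" flag; the sample chunks are hand-written for illustration.

```ruby
require 'json'

# Hand-written sample: three chunks that together spell one agent reply.
sample_stream = <<~'NDJSON'
  {"response":"{\"final_","done":false}
  {"response":"answer\": \"ok\"}","done":false}
  {"response":"","done":true}
NDJSON

text = +''
sample_stream.each_line do |line|
  chunk = JSON.parse(line)
  text << chunk.fetch('response', '')
  break if chunk['done']
end

puts text
puts JSON.parse(text).fetch('final_answer')
```

The fragments are not valid JSON on their own, so the agent-level parse has to wait until the accumulated text contains a balanced object. The extractor from Step 4 can be re-run after each chunk to detect that point.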

The code from this post is in rupi. It’s about 200 lines, has 19 unit tests, no runtime dependencies, and runs against any Ollama-served model. Steal whatever’s useful.