Building an agentic AI in Ruby
You’ve probably had a conversation with an LLM by now and walked away thinking: that was nice, but it can’t actually do anything. A chatbot answers from what it already knows. An agent is different: it decides which tool to use, calls it, reads the result, and keeps going until the task is done. Same model underneath; very different behaviour around it.
This post walks through building one in Ruby. No frameworks, no gems beyond the standard library, no hand-waving. About 200 lines all in. We’ll talk to a local Ollama instance so you can run the whole thing on a laptop with no API keys.
The finished project is called rupi (Ruby agentic AI). If you
want to follow along, the layout is:
rupi/
├── bin/rupi              # CLI entry point
└── lib/
    ├── rupi.rb           # top-level requires
    └── rupi/
        ├── agent.rb      # the loop
        ├── ollama_client.rb
        ├── tool_registry.rb
        ├── errors.rb
        └── tools/
            ├── file_reader.rb
            └── web_search.rb
What’s an “agentic” AI, anyway?
A regular LLM call looks like this: you send a prompt, you get text back. The model’s only job is to predict the next token. It can’t fetch a webpage, read a file, or reliably do arithmetic beyond what it picked up during training.
An agent wraps that LLM in a loop. Each iteration the model sees the task, what’s happened so far, and a list of tools it’s allowed to use. It picks one, the loop executes it, the result is appended to the conversation, and the model goes again. Eventually the model decides it has enough and emits a final answer.
That’s the whole trick.
The ReAct pattern
The shape we’ll use is called ReAct (from “Reasoning and Acting”): the model reasons, acts, observes the result, and repeats. As a five-step cycle:
┌──────────────────────────────┐
│             Task             │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│ 1. Reason (LLM thinks)       │◀──┐
└──────────────┬───────────────┘   │
               │                   │
               ▼                   │
┌──────────────────────────────┐   │
│ 2. Pick a tool + input       │   │
└──────────────┬───────────────┘   │
               │                   │
               ▼                   │
┌──────────────────────────────┐   │
│ 3. Act (run the tool)        │   │
└──────────────┬───────────────┘   │
               │                   │
               ▼                   │
┌──────────────────────────────┐   │
│ 4. Observe (capture result)  │───┘
└──────────────┬───────────────┘
               │ (when satisfied)
               ▼
┌──────────────────────────────┐
│ 5. Final answer              │
└──────────────────────────────┘
We ask the LLM to emit a single JSON object every turn, in one of two shapes:
{ "thought": "I should read the Gemfile", "tool_name": "read_file", "input": "Gemfile" }
…to use a tool, or:
{ "final_answer": "The Gemfile has no runtime deps." }
…to stop. The loop dispatches on whichever key is present.
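A minimal sketch of that dispatch, assuming Ruby 3.0+ pattern matching. classify is a hypothetical helper, not part of rupi; the agent’s own parse_response, shown later, does the same job with Hash#key? and also tolerates prose around the JSON.

```ruby
require 'json'

# Hypothetical helper: turn one model reply into a tagged result
# by matching on which keys the JSON object contains.
def classify(reply)
  case JSON.parse(reply, symbolize_names: true)
  in { final_answer: String => answer }
    [:final, answer]
  in { tool_name: String => name, input: String => input }
    # Extra keys such as "thought" are allowed by hash patterns.
    [:tool, name, input]
  else
    [:invalid]
  end
rescue JSON::ParserError
  [:invalid]
end

classify('{"thought": "I should read the Gemfile", "tool_name": "read_file", "input": "Gemfile"}')
# => [:tool, "read_file", "Gemfile"]
classify('{"final_answer": "The Gemfile has no runtime deps."}')
# => [:final, "The Gemfile has no runtime deps."]
```

Anything that matches neither shape, including text that isn’t JSON at all, falls through to :invalid so the loop can ask again.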
Step 1: A tool is anything that quacks like one
In a typed language you’d start by declaring an interface. Ruby
doesn’t need that: anything that responds to #name,
#description, and #call(input) is a tool. A plain class works,
and so does a Struct. A bare lambda already responds to #call, so it
only needs something to supply #name and #description.
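To make the duck typing concrete, here’s a minimal sketch of a tool built from a Struct around a lambda. LambdaTool and the shout tool are hypothetical names for illustration; the Struct members supply #name and #description, and #call delegates to the lambda.

```ruby
# Hypothetical: a Struct whose members answer #name and #description,
# and whose #call forwards to a wrapped lambda.
LambdaTool = Struct.new(:name, :description, :fn) do
  def call(input) = fn.call(input)
end

shout = LambdaTool.new(
  'shout',
  'Upper-cases its input. Input: any text.',
  ->(input) { input.to_s.upcase }
)

%i[name description call].all? { |m| shout.respond_to?(m) } # => true
shout.call('hello') # => "HELLO"
```

Passing shout in the tools: array of Rupi::Agent.new would work the same as passing a FileReader, because the agent and registry only ever call those three methods.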
Here are the two we’ll ship. FileReader is real (it reads files);
WebSearch is a stub so you can see the pattern without wiring up an
actual search API.
# lib/rupi/tools/file_reader.rb
module Rupi
  module Tools
    class FileReader
      def name = 'read_file'
      def description = 'Reads the contents of a file. Input: file path.'
      def call(input)
        path = input.to_s.strip
        return 'Error: empty file path' if path.empty?
        File.read(path)
      rescue Errno::ENOENT
        "Error: file not found: #{path}"
      rescue Errno::EACCES
        "Error: permission denied: #{path}"
      rescue Errno::EISDIR
        "Error: path is a directory: #{path}"
      end
    end
  end
end
# lib/rupi/tools/web_search.rb
module Rupi
  module Tools
    class WebSearch
      def name = 'web_search'
      def description = 'Searches the web for information. Input: search query.'
      # Stub: returns canned text instead of calling a real search API.
      def call(input)
        "Search results for #{input.to_s.strip.inspect}: " \
          'Found relevant information related to the query.'
      end
    end
  end
end
Two things worth pointing out:
- The descriptions aren’t documentation; they’re prompts. The agent pastes them straight into the model’s context as “the menu of tools you can use.” Write them like you’d write a function signature for a colleague.
- Tool failures don’t raise; they return error strings. When File.read blows up, the model needs to see “file not found” in the next observation so it can adjust (try a different path, give up, ask the user). An exception would kill the loop.
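Here’s that contract in isolation, as a minimal sketch: a stand-in lambda with the same error-string behaviour as FileReader#call (read_file here is a hypothetical name, not the class itself), and the history line the agent would build from its result.

```ruby
require 'tmpdir'

# Stand-in for FileReader#call (hypothetical lambda, same contract):
# a missing file becomes an error string instead of an exception.
read_file = lambda do |path|
  File.read(path)
rescue Errno::ENOENT
  "Error: file not found: #{path}"
end

dir = Dir.mktmpdir # fresh, empty directory, so the file below can't exist
observation = read_file.call(File.join(dir, 'Gemfile'))
Dir.rmdir(dir)

# The agent appends this like any other observation and asks the model again.
history = ['Task: summarise the Gemfile', 'Action: read_file']
history << "Observation: #{observation}"
puts history.last
```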
Step 2: A registry to find tools by name
The agent gets a JSON tool_name back from the LLM and needs to find
the matching object. That’s a Hash — but wrapping it in a small
class lets us mix in Enumerable for free, which keeps the agent
code clean later.
# lib/rupi/tool_registry.rb
module Rupi
  class ToolRegistry
    include Enumerable
    def initialize
      @tools = {}
    end
    def register(tool)
      @tools[tool.name] = tool
      self
    end
    def fetch(name)
      @tools.fetch(name) { raise ToolNotFound, "no tool named #{name.inspect}" }
    end
    def include?(name) = @tools.key?(name)
    def names = @tools.keys
    def empty? = @tools.empty?
    def each(&) = @tools.each_value(&)
  end
end
register returns self so you can chain. fetch raises a typed
exception (defined below) so the agent can rescue it specifically.
While we’re here, the error classes:
# lib/rupi/errors.rb
module Rupi
  class Error < StandardError; end
  class MaxIterationsExceeded < Error; end
  class ToolNotFound < Error; end
  class OllamaError < Error; end
end
One base class so callers can rescue Rupi::Error. The CLI does
exactly that.
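A quick usage sketch of the registry and errors together. It repeats the relevant classes so it runs on its own; Echo is a hypothetical tool made up for the example.

```ruby
module Rupi
  class Error < StandardError; end
  class ToolNotFound < Error; end

  # Same shape as the registry above, repeated so this sketch is standalone.
  class ToolRegistry
    include Enumerable

    def initialize
      @tools = {}
    end

    def register(tool)
      @tools[tool.name] = tool
      self
    end

    def fetch(name)
      @tools.fetch(name) { raise ToolNotFound, "no tool named #{name.inspect}" }
    end

    def names = @tools.keys
    def each(&block) = @tools.each_value(&block)
  end
end

# Hypothetical tool: echoes its input back as a string.
Echo = Struct.new(:name, :description) do
  def call(input) = input.to_s
end

# register returns self, so registrations chain.
registry = Rupi::ToolRegistry.new
  .register(Echo.new('echo', 'Echoes its input.'))
  .register(Echo.new('repeat', 'Also echoes its input.'))

registry.names                  # => ["echo", "repeat"]
registry.map(&:description)     # Enumerable#map, built on #each
registry.fetch('echo').call(42) # => "42"

# The typed error can be rescued specifically, or as Rupi::Error.
error_message =
  begin
    registry.fetch('nope')
  rescue Rupi::ToolNotFound => e
    e.message
  end
```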
Step 3: Talking to Ollama
Ollama exposes an HTTP API, by default at http://localhost:11434. To generate
text we POST to /api/generate with a JSON body containing the model
name and the prompt. The response is JSON with a "response" field
holding the model’s text.
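Concretely, the exchange looks something like this sketch. The reply body is a hand-written, abbreviated example rather than captured output: a real non-streaming reply carries more fields (timings, token counts and so on), but "response" is the only one this client reads.

```ruby
require 'json'

# Request body for POST /api/generate. stream: false asks for one
# complete JSON object instead of a series of partial ones.
body = JSON.generate(model: 'gemma3', prompt: 'Say hi', stream: false)
puts body # {"model":"gemma3","prompt":"Say hi","stream":false}

# Abbreviated, hand-written example of a reply body.
reply = '{"model":"gemma3","response":"Hi there!","done":true}'
text = JSON.parse(reply).fetch('response')
puts text # Hi there!
```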
Net::HTTP from the standard library is all we need.
# lib/rupi/ollama_client.rb
require 'json'
require 'net/http'
require 'uri'
module Rupi
  class OllamaClient
    DEFAULT_URL = 'http://localhost:11434'
    DEFAULT_MODEL = 'gemma3'
    attr_reader :base_url, :model
    def initialize(base_url: DEFAULT_URL, model: DEFAULT_MODEL,
                   open_timeout: 10, read_timeout: 300)
      @base_url = base_url
      @model = model
      @open_timeout = open_timeout
      @read_timeout = read_timeout
    end
    # Sends one non-streaming generate request and returns the model's text.
    def generate(prompt)
      uri = URI.join(@base_url, '/api/generate')
      req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
      req.body = JSON.generate(model: @model, prompt: prompt, stream: false)
      res = Net::HTTP.start(uri.hostname, uri.port,
                            use_ssl: uri.scheme == 'https',
                            open_timeout: @open_timeout,
                            read_timeout: @read_timeout) do |http|
        http.request(req)
      end
      unless res.is_a?(Net::HTTPSuccess)
        raise OllamaError, "ollama returned HTTP #{res.code}: #{res.body}"
      end
      JSON.parse(res.body).fetch('response') do
        raise OllamaError, "ollama response missing 'response' field"
      end
    # Map parse and network failures onto the one error type the agent knows.
    rescue JSON::ParserError => e
      raise OllamaError, "could not parse ollama response: #{e.message}"
    rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, SocketError => e
      raise OllamaError, "could not reach ollama at #{@base_url}: #{e.message}"
    rescue Net::OpenTimeout, Net::ReadTimeout => e
      raise OllamaError, "ollama request timed out: #{e.message}"
    end
  end
end
The three rescue clauses at the bottom convert a malformed response
body and the common network failures (refused connection, unreachable
host, failed DNS lookup, timeouts) into a single Rupi::OllamaError
with a useful message. The agent doesn’t care why the LLM is
unreachable, only that it is.
This is also the only place that knows about Ollama. If you want to
swap in OpenAI or Anthropic later, write a class with the same
#generate(prompt) method and inject it instead. Duck typing again.
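The same seam makes the agent easy to test offline. Here’s a sketch of a fake client, ScriptedClient (a hypothetical name, not part of rupi as shown), that satisfies #generate(prompt) by replaying canned replies and recording every prompt it receives.

```ruby
# Hypothetical test double: same #generate(prompt) contract as OllamaClient,
# but replies come from a fixed script instead of a model.
class ScriptedClient
  attr_reader :prompts

  def initialize(*replies)
    @replies = replies
    @prompts = []
  end

  def generate(prompt)
    @prompts << prompt
    @replies.shift or raise 'ScriptedClient ran out of replies'
  end
end

client = ScriptedClient.new(
  '{"thought": "check the file", "tool_name": "read_file", "input": "Gemfile"}',
  '{"final_answer": "No runtime dependencies."}'
)
client.generate('first prompt')  # returns the read_file reply
client.generate('second prompt') # returns the final_answer reply
client.prompts                   # => ["first prompt", "second prompt"]
```

Passed as llm_client: to Rupi::Agent.new, a client like this drives #run through one tool call and a final answer without Ollama running, and prompts lets a test check what the model would have seen.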
Step 4: The agent loop
This is the heart of the project — about 80 lines.
# lib/rupi/agent.rb
require 'json'
module Rupi
  class Agent
    DEFAULT_MAX_ITERATIONS = 10
    attr_reader :registry, :max_iterations
    def initialize(llm_client:, tools: [], max_iterations: DEFAULT_MAX_ITERATIONS,
                   logger: nil)
      @llm_client = llm_client
      @max_iterations = max_iterations
      @logger = logger
      @registry = ToolRegistry.new
      tools.each { |tool| @registry.register(tool) }
    end
    def run(task)
      history = ["Task: #{task}"]
      @max_iterations.times do |i|
        log "iteration #{i + 1}/#{@max_iterations}"
        prompt = build_prompt(history)
        @logger&.debug("prompt:\n#{prompt}") # full prompt, visible at debug level
        response = @llm_client.generate(prompt)
        log "llm response: #{response}"
        parsed = parse_response(response)
        case parsed[:type]
        when :final
          return parsed[:answer]
        when :tool
          # Record the model's step and the tool's result for the next prompt.
          history << "Thought: #{parsed[:thought]}" unless parsed[:thought].empty?
          history << "Action: #{parsed[:name]}"
          history << "Action Input: #{parsed[:input]}"
          history << "Observation: #{invoke_tool(parsed[:name], parsed[:input])}"
        when :invalid
          # Tell the model what went wrong and give it another turn.
          history << "Error: #{parsed[:reason]}. Respond with a single JSON object."
        end
      end
      raise MaxIterationsExceeded,
            "agent could not complete the task in #{@max_iterations} iterations"
    end
    private
    # build_prompt, tool_descriptions, tool_names, parse_response,
    # extract_json, invoke_tool and log are shown below.
  end
end
#run is the whole loop. We keep a history array of strings, ask
the LLM for a response, parse it, and either return, run a tool, or
record an error and try again. The cap on iterations is critical —
without it a confused model will loop forever.
The prompt template lives in build_prompt. Each turn we paste in
the full history plus the tool menu:
def build_prompt(history)
  <<~PROMPT
    You are a helpful AI agent that completes tasks by reasoning and using tools.
    Available tools:
    #{tool_descriptions}
    To use a tool, respond with a single JSON object on its own line:
    {"thought": "<your reasoning>", "tool_name": "<#{tool_names.join(' | ')}>", "input": "<input string for the tool>"}
    When you have enough information to answer the task, respond with:
    {"final_answer": "<your final answer>"}
    Respond with the JSON object only — no surrounding prose.
    Conversation so far:
    #{history.join("\n")}
    Your response:
  PROMPT
end
def tool_descriptions
  @registry.map { |t| "- #{t.name}: #{t.description}" }.join("\n")
end
def tool_names = @registry.names
Now the parser. Smaller models love to wrap JSON in prose (“Sure!
Here’s what I’d do:”) or markdown fences (```json ... ```). A
strict JSON.parse(response) would fail on every one of those. So we
scan for the first balanced {...} substring and only parse that:
def parse_response(text)
  json = extract_json(text)
  return { type: :invalid, reason: 'no JSON object found in response' } unless json
  data = JSON.parse(json)
  if data.key?('final_answer')
    { type: :final, answer: data['final_answer'].to_s }
  elsif data.key?('tool_name') && data.key?('input')
    { type: :tool,
      name: data['tool_name'].to_s,
      input: data['input'].to_s,
      thought: data['thought'].to_s }
  else
    { type: :invalid,
      reason: "JSON must contain either 'final_answer' or both 'tool_name' and 'input'" }
  end
rescue JSON::ParserError => e
  { type: :invalid, reason: "could not parse JSON: #{e.message}" }
end
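def extract_json(text)
  depth = 0
  start = nil
  in_string = false
  escape = false
  text.each_char.with_index do |ch, i|
    if in_string
      # Inside a JSON string: skip escaped characters, watch for the closing quote.
      if escape then escape = false
      elsif ch == '\\' then escape = true
      elsif ch == '"' then in_string = false
      end
      next
    end
    case ch
    when '"'
      # Only track strings inside an object; quotes in surrounding prose are ignored.
      in_string = true if depth.positive?
    when '{'
      start = i if depth.zero?
      depth += 1
    when '}'
      next if depth.zero? # stray closing brace in the prose before the JSON
      depth -= 1
      return text[start..i] if depth.zero?
    end
  end
  nil # no complete top-level object found
end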
The extract_json scanner tracks quoting and escapes so a { inside
a string literal doesn’t confuse it. It returns the first complete
top-level object, which in practice is almost always the one the
model meant to send.
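To see the scanner at work, here’s a standalone sketch run against the kinds of replies small models produce. It is a slightly defensive variant of extract_json: it only treats quotes as string delimiters once it is inside an object, and it skips a stray } that appears before the first {. The sample replies are made up for illustration.

```ruby
require 'json'

# Standalone scanner: returns the first balanced {...} substring,
# ignoring braces that appear inside JSON string literals.
def extract_json(text)
  depth = 0
  start = nil
  in_string = false
  escape = false
  text.each_char.with_index do |ch, i|
    if in_string
      if escape then escape = false
      elsif ch == '\\' then escape = true
      elsif ch == '"' then in_string = false
      end
      next
    end
    case ch
    when '"' then in_string = true if depth.positive?
    when '{'
      start = i if depth.zero?
      depth += 1
    when '}'
      next if depth.zero?
      depth -= 1
      return text[start..i] if depth.zero?
    end
  end
  nil
end

fence = '`' * 3 # built at runtime to keep literal fences out of the source
replies = [
  %(Sure! Here's what I'd do:\n{"final_answer": "done"}),
  "#{fence}json\n{\"tool_name\": \"read_file\", \"input\": \"Gemfile\"}\n#{fence}",
  %({"final_answer": "use {braces} freely"} and some trailing chatter)
]
replies.each { |r| p JSON.parse(extract_json(r)) }
```

Every one of these would make a bare JSON.parse(reply) raise; after extraction all three parse cleanly.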
Finally, invoking a tool. Notice we swallow ToolNotFound and feed
the error back to the model — same idea as the FileReader rescues.
The model gets to see “you asked for a tool I don’t have, here’s the
list” and can recover.
def invoke_tool(name, input)
  @registry.fetch(name).call(input).to_s
rescue ToolNotFound
  "Error: unknown tool #{name.inspect}. Available: #{tool_names.join(', ')}"
end
def log(message)
  @logger&.info(message)
end
Putting it together
bin/rupi is the executable. It reads the task from ARGV, picks
up the Ollama URL and model from the environment, wires the
components together, and prints the answer.
#!/usr/bin/env ruby
# frozen_string_literal: true
$LOAD_PATH.unshift File.expand_path('../lib', __dir__)
require 'logger'
require 'rupi'
task = ARGV.first || 'Read the Gemfile in this folder and tell me its content and meaning.'
client = Rupi::OllamaClient.new(
  base_url: ENV.fetch('OLLAMA_URL', Rupi::OllamaClient::DEFAULT_URL),
  model: ENV.fetch('OLLAMA_MODEL', Rupi::OllamaClient::DEFAULT_MODEL)
)
logger = Logger.new($stderr, level: ENV.fetch('RUPI_LOG', 'info'))
agent = Rupi::Agent.new(
  llm_client: client,
  tools: [Rupi::Tools::WebSearch.new, Rupi::Tools::FileReader.new],
  logger: logger
)
begin
  puts agent.run(task)
rescue Rupi::Error => e
  warn "rupi: #{e.message}"
  exit 1
end
Pull a small model with Ollama first:
ollama pull gemma3
ollama serve # if it isn't already running
Then run it:
$ bin/rupi "what's in the Gemfile?"
I, iteration 1/10
I, llm response: {"thought": "...", "tool_name": "read_file", "input": "Gemfile"}
I, iteration 2/10
I, llm response: {"final_answer": "The Gemfile sources from rubygems.org..."}
The Gemfile sources from rubygems.org and declares two development
dependencies: rake and rubocop. There are no runtime dependencies.
That’s the whole thing.
Important considerations
What we’ve built is a working toy. To put one of these in front of real users, four things matter more than the code itself.
Token budgets
Every iteration appends to the history, and the entire history is sent on every LLM call. A 10-iteration loop with verbose tool outputs can hit thousands of tokens per request very quickly. For local Ollama that’s just latency; for paid APIs it’s the bill.
Cheap wins:
- Trim long tool outputs (output[0, 4_000]) before stuffing them back into history.
- Drop old Thought: lines once an observation supersedes them.
- Summarise older history every N iterations instead of replaying it verbatim.
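A sketch of the first two ideas as plain helpers. truncate_observation and drop_stale_thoughts are hypothetical names, and the 4,000-character cap is just the figure from the list, not a tuned value; characters are only a rough proxy for tokens.

```ruby
MAX_OBSERVATION_CHARS = 4_000

# Cap a tool result before it goes into history, and say how much was cut.
def truncate_observation(text, limit: MAX_OBSERVATION_CHARS)
  return text if text.length <= limit

  "#{text[0, limit]}\n[truncated #{text.length - limit} characters]"
end

# Keep only the most recent "Thought:" line; everything else stays in order.
def drop_stale_thoughts(history)
  last = history.rindex { |line| line.start_with?('Thought:') }
  history.each_with_index
         .reject { |line, i| line.start_with?('Thought:') && i != last }
         .map(&:first)
end

truncate_observation('a' * 10, limit: 4) # => "aaaa\n[truncated 6 characters]"
drop_stale_thoughts(['Task: x', 'Thought: a', 'Observation: 1', 'Thought: b'])
# => ["Task: x", "Observation: 1", "Thought: b"]
```

In Agent#run, you would wrap the invoke_tool result in truncate_observation before appending it, and pass drop_stale_thoughts(history) to build_prompt.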
Safety guardrails
The FileReader here will happily read /etc/passwd if the model
asks. That’s fine in a CLI you’re running yourself; it’s a security
hole the moment this is exposed to anyone else. Restrict tool inputs
at the tool boundary, not in the prompt:
def call(input)
  path = File.expand_path(input.to_s.strip, '/your/sandbox')
  return 'Error: path escapes sandbox' unless path.start_with?('/your/sandbox/')
  File.read(path)
end
Same principle for shell, network, and database tools. The model is an untrusted caller. Treat it like one.
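One gap in that snippet: File.expand_path doesn’t resolve symlinks, so a link inside the sandbox can still point at a file outside it. Here’s a sketch that also checks the resolved path with File.realpath. SandboxedFileReader is a hypothetical name, and the demo uses throwaway temp directories.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical sandboxed FileReader: only reads files whose resolved path
# (after following symlinks) stays inside root.
class SandboxedFileReader
  def initialize(root)
    @root = File.realpath(root)
  end

  def name = 'read_file'
  def description = "Reads a file inside #{@root}. Input: relative file path."

  def call(input)
    relative = input.to_s.strip
    return 'Error: empty file path' if relative.empty?

    # Lexical check first, so "../" tricks are rejected before touching disk.
    expanded = File.expand_path(relative, @root)
    return 'Error: path escapes sandbox' unless inside?(expanded)

    # Then follow symlinks; raises Errno::ENOENT if the target is missing.
    real = File.realpath(expanded)
    return 'Error: path escapes sandbox' unless inside?(real)

    File.read(real)
  rescue Errno::ENOENT
    "Error: file not found: #{relative}"
  rescue Errno::EACCES
    "Error: permission denied: #{relative}"
  rescue Errno::EISDIR
    "Error: path is a directory: #{relative}"
  end

  private

  def inside?(path) = path.start_with?("#{@root}/")
end

outside = Dir.mktmpdir
sandbox = Dir.mktmpdir
File.write(File.join(outside, 'secret.txt'), 'top secret')
File.write(File.join(sandbox, 'notes.txt'), 'hello')
# A symlink that lives in the sandbox but points outside it.
File.symlink(File.join(outside, 'secret.txt'), File.join(sandbox, 'sneaky'))

reader = SandboxedFileReader.new(sandbox)
results = ['notes.txt', 'sneaky', '../etc/passwd', 'missing.txt'].map { |p| reader.call(p) }
FileUtils.remove_entry(sandbox)
FileUtils.remove_entry(outside)
p results
# => ["hello", "Error: path escapes sandbox", "Error: path escapes sandbox", "Error: file not found: missing.txt"]
```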
Logging agent decisions
The logger: parameter on Agent.new is there for a reason. When an
agent does something weird in production, you need to be able to
replay the conversation — the task, every prompt sent, every
response received, every tool input and output. Logging it as it
happens is much cheaper than reconstructing it later.
In rupi you can crank up verbosity with RUPI_LOG=debug bin/rupi
"…" and see every prompt and response on stderr.
Error handling
Three categories, three different responses:
- The model emits malformed JSON. Push an error line into the history and let it try again. That’s what :invalid does.
- A tool fails. Return an error string from #call. The model reads it as an observation and decides what to do.
- The LLM service is unreachable, or you’ve hit max_iterations. Raise a Rupi::Error. There’s nothing useful left to do; bail.
The split is between failures the model can react to (give it another turn) and failures it can’t (stop).
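invoke_tool only rescues ToolNotFound, so the “tools return strings” rule depends on every tool behaving. If you register tools you didn’t write, a thin wrapper can enforce it. A sketch; SafeTool is a hypothetical name, and rescuing StandardError is a deliberately broad choice.

```ruby
# Hypothetical decorator: forwards name and description, and turns any
# StandardError raised by the wrapped tool's #call into an error observation.
class SafeTool
  def initialize(tool)
    @tool = tool
  end

  def name = @tool.name
  def description = @tool.description

  def call(input)
    @tool.call(input).to_s
  rescue StandardError => e
    "Error: #{name} failed (#{e.class}): #{e.message}"
  end
end

# A tool that always raises, to show the wrapper at work.
Flaky = Struct.new(:name, :description) do
  def call(_input) = raise(IOError, 'disk on fire')
end

safe = SafeTool.new(Flaky.new('flaky', 'Always fails.'))
safe.call('anything') # => "Error: flaky failed (IOError): disk on fire"
```

Wrapping at registration time (tools: [SafeTool.new(tool), ...]) leaves the agent loop itself unchanged.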
Wrapping up
We’ve gone from “an LLM is a text predictor” to “an LLM is the brain of a tool-using agent” in about 200 lines. The pieces are small:
- A duck-typed Tool: anything that responds to name, description, and call.
- A ToolRegistry that’s basically a Hash with a typed lookup error.
- An OllamaClient that wraps Net::HTTP.
- An Agent that loops, parses, dispatches, and stops.
The interesting work isn’t any one component; it’s the protocol between the LLM and the runtime. We chose a JSON protocol with two shapes, ReAct-style prompting, a robust JSON extractor for sloppy models, and tool failures that return strings instead of raising. Frameworks such as LangChain, smolagents, and Pydantic AI each do their own variation of the same dance.
If you want to extend this:
- Real tools. Calculator, HTTP fetcher, shell runner with allow-listed commands, sqlite query.
- Multi-step planning. Have the model emit a plan first, then execute it step by step.
- Streaming. Set stream: true on the Ollama request and parse partial JSON as it arrives.
- Multiple agents. One supervisor agent dispatching to specialist sub-agents, each with its own tool set.
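For streaming, Ollama’s /api/generate with stream: true sends newline-delimited JSON: one object per line, each carrying a text fragment in "response", with "done": true on the last. Network chunks don’t respect line boundaries, so you need a buffer. A sketch of that part; StreamAccumulator is a hypothetical name, and the chunks below are hand-written, not captured output.

```ruby
require 'json'

# Hypothetical helper: feed it raw chunks as they arrive; it splits on
# newlines, parses each complete JSON line, and accumulates the text.
class StreamAccumulator
  attr_reader :text

  def initialize
    @buffer = +''
    @text = +''
    @done = false
  end

  def done? = @done

  # Returns the fragments completed by this chunk (possibly none).
  def <<(chunk)
    @buffer << chunk
    fragments = []
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).strip
      next if line.empty?

      event = JSON.parse(line)
      fragments << event.fetch('response', '')
      @done ||= event['done'] == true
    end
    @text << fragments.join
    fragments
  end
end

acc = StreamAccumulator.new
acc << %({"response":"Hel","done":false}\n{"resp) # second line arrives split
acc << %(onse":"lo","done":false}\n)
acc << %({"response":"","done":true}\n)
acc.text  # => "Hello"
acc.done? # => true
```

With Net::HTTP you would feed it from http.request(req) { |res| res.read_body { |chunk| acc << chunk } }; pulling the agent’s JSON object out of the accumulated text still needs something like extract_json once done? is true.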
The code from this post is in rupi. It’s about 200 lines,
has 19 unit tests, no runtime dependencies, and runs against any
Ollama-served model. Steal whatever’s useful.