Over the past year, I’ve worked with teams building large language model (LLM) agents across various industries. Consistently, the most successful implementations rely on simple, composable patterns.
What’s interesting is that while most examples online use Python or TypeScript, Ruby provides equally effective patterns for building agents. In fact, Ruby’s emphasis on simplicity and readability aligns perfectly with the principle of building the simplest solution that works.
In this post, I share what I’ve learned from reimplementing Anthropic’s agent patterns in Ruby, demonstrating that you don’t need Python to build effective agents.
What are agents?
“Agent” can be defined in several ways. Some define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. Following Anthropic’s framework, I categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:
- Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
- Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
When (and when not) to use agents
Following Anthropic’s guidance: find the simplest solution possible, and only increase complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.
When more complexity is warranted, workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale. For many applications, optimizing single LLM calls with retrieval and in-context examples is usually enough.
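To make that concrete, here is a minimal sketch of a single augmented call: retrieved context plus one in-context example, with no multi-step orchestration. The `llm` and `retrieve` collaborators (and the example Q&A) are illustrative stand-ins, injected so the sketch stays self-contained:

```ruby
# A single LLM call augmented with retrieval and an in-context example.
# `llm` and `retrieve` are hypothetical collaborators passed in by the caller.
def answer(question, llm:, retrieve:)
  context = retrieve.call(question).join("\n")

  llm.call(<<~PROMPT)
    Use the context below to answer the question.

    Context:
    #{context}

    Example:
    Q: What is the refund window? A: 30 days.

    Q: #{question}
  PROMPT
end
```

Often this single call, tuned with good retrieval and examples, outperforms a hastily assembled multi-agent pipeline.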
When and how to use frameworks
There are many frameworks that make agentic systems easier to implement, including LangGraph from LangChain, Amazon Bedrock’s AI Agent framework, and others. However, these frameworks often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug.
I suggest starting by using LLM APIs directly: many patterns can be implemented in a few lines of Ruby code. Ruby’s expressiveness makes this particularly straightforward. If you do use a framework, ensure you understand the underlying code.
My Ruby LLM cookbook demonstrates these patterns implemented directly with the RubyLLM gem, showing how clean and maintainable these implementations can be in Ruby.
Building blocks, workflows, and agents
In this section, I’ll explore the common patterns for agentic systems, starting with the foundational building block—the augmented LLM—and progressively increasing complexity, from simple compositional workflows to autonomous agents.
Building block: The augmented LLM
The basic building block of agentic systems is an LLM enhanced with augmentations such as retrieval, tools, and memory. Current models can actively use these capabilities—generating their own search queries, selecting appropriate tools, and determining what information to retain.
In Ruby, this looks clean and readable:
class AugmentedLLM
  def initialize
    @client = RubyLLM.new(model: "meta-llama/llama-4-scout")
    @tools = load_available_tools
    @memory = ConversationMemory.new
  end

  def call(prompt, context: {})
    enhanced_prompt = build_prompt_with_context(prompt, context)
    response = @client.chat(enhanced_prompt, tools: @tools)
    @memory.store(prompt, response)
    response
  end
end
I recommend focusing on two key aspects: tailoring these capabilities to your specific use case and ensuring they provide an easy, well-documented interface for your LLM.
Workflow: Prompt chaining
Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks on any intermediate steps to ensure that the process is still on track.
When to use this workflow: This workflow is ideal for situations where the task can be easily and cleanly decomposed into fixed subtasks. The main goal is to trade off latency for higher accuracy, by making each LLM call an easier task.
Examples where prompt chaining is useful:
- Generating marketing copy, then translating it into a different language
- Writing an outline of a document, checking that the outline meets certain criteria, then writing the document based on the outline
In my Ruby implementation, this becomes:
def marketing_workflow(product, target_language)
  # Step 1: Generate original copy
  copy = llm.call("Generate marketing copy for #{product}")
  raise "Copy failed validation" unless valid_copy?(copy)

  # Step 2: Translate
  translated = llm.call("Translate this to #{target_language}: #{copy}")
  raise "Translation failed validation" unless valid_translation?(translated)

  translated
end
Workflow: Routing
Routing classifies an input and directs it to a specialized followup task. This workflow allows for separation of concerns, and building more specialized prompts. Without this workflow, optimizing for one kind of input can hurt performance on other inputs.
When to use this workflow: Routing works well for complex tasks where there are distinct categories that are better handled separately, and where classification can be handled accurately.
Examples where routing is useful:
- Directing different types of customer service queries (general questions, refund requests, technical support) into different downstream processes
- Routing easy/common questions to smaller models and hard/unusual questions to more capable models to optimize cost and speed
The Ruby implementation in my cookbook shows how clean this pattern can be:
def route_support_query(message)
  classification = llm.call("Classify: #{message}. Return: general|refund|technical")

  case classification.strip.downcase
  when "general"
    GeneralSupportAgent.new.handle(message)
  when "refund"
    RefundAgent.new.handle(message)
  when "technical"
    TechnicalSupportAgent.new.handle(message)
  else
    # Fall back to a safe default when the classifier returns an unexpected label
    GeneralSupportAgent.new.handle(message)
  end
end
Workflow: Parallelization
LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow manifests in two key variations:
- Sectioning: Breaking a task into independent subtasks run in parallel
- Voting: Running the same task multiple times to get diverse outputs
When to use this workflow: Parallelization is effective when the divided subtasks can be parallelized for speed, or when multiple perspectives or attempts are needed for higher confidence results.
Examples where parallelization is useful:
Sectioning:
- Implementing guardrails where one model instance processes user queries while another screens them for inappropriate content
- Automating evals where each LLM call evaluates a different aspect of model performance
Voting:
- Reviewing code for vulnerabilities, where several different prompts review and flag the code
- Evaluating whether content is inappropriate, with multiple prompts evaluating different aspects
Ruby’s threading capabilities make this straightforward:
def parallel_analysis(document)
  tasks = [
    -> { analyze_sentiment(document) },
    -> { extract_keywords(document) },
    -> { generate_summary(document) }
  ]

  results = tasks.map { |task| Thread.new { task.call } }.map(&:value)
  combine_results(results)
end
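The code above covers the sectioning variation; voting can be sketched the same way, running one check several times and taking the majority answer. `majority_vote` and the injected `ask_llm` callable are illustrative assumptions, not part of any gem:

```ruby
# Voting sketch: run the same yes/no check several times in parallel threads
# and take the majority answer. `ask_llm` is a stand-in for a real model call,
# injected so the example stays self-contained.
def majority_vote(snippet, ask_llm, votes: 3)
  answers = Array.new(votes) do
    Thread.new { ask_llm.call(snippet).strip.downcase }
  end.map(&:value)

  answers.count("yes") > votes / 2
end
```

With a deterministic stub in place of the model, such as `->(code) { code.include?("eval") ? "yes" : "no" }`, `majority_vote("eval(user_input)", stub)` returns true.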
Workflow: Orchestrator-workers
In the orchestrator-workers workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.
When to use this workflow: This workflow is well-suited for complex tasks where you can’t predict the subtasks needed. The key difference from parallelization is its flexibility—subtasks aren’t pre-defined, but determined by the orchestrator based on the specific input.
Examples where orchestrator-workers is useful:
- Coding products that make complex changes to multiple files each time
- Search tasks that involve gathering and analyzing information from multiple sources
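A minimal sketch of the pattern, with the orchestrator proposing subtasks as a JSON array and workers running in parallel. `orchestrate`, `work_on`, and `synthesize` are hypothetical stand-ins for real LLM calls, injected so the sketch stays self-contained:

```ruby
require "json"

# Orchestrator-workers sketch: the orchestrator decides the subtasks for this
# specific input, workers handle each one, and the results are synthesized.
def orchestrator_workers(task, orchestrate:, work_on:, synthesize:)
  # Subtasks aren't pre-defined; the orchestrator proposes them per input
  subtasks = JSON.parse(orchestrate.call(task))

  # Workers handle each subtask in parallel
  results = subtasks.map { |sub| Thread.new { work_on.call(sub) } }.map(&:value)

  # The orchestrator synthesizes worker outputs into a final answer
  synthesize.call(results)
end
```

The key design choice is that the subtask list is model output, not code, which is exactly what distinguishes this from plain parallelization.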
Workflow: Evaluator-optimizer
In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.
When to use this workflow: This workflow is particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value. The two signs of good fit are: first, that LLM responses can be demonstrably improved when a human articulates their feedback; and second, that the LLM can provide such feedback.
Examples where evaluator-optimizer is useful:
- Literary translation where there are nuances that the translator LLM might not capture initially
- Complex search tasks that require multiple rounds of searching and analysis
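The generate-evaluate loop can be sketched in a few lines. `generate` and `evaluate` are hypothetical stand-ins for two LLM calls, and the "PASS" convention is an assumption of this sketch:

```ruby
# Evaluator-optimizer sketch: one call drafts, another critiques, and the
# critique is fed back into the next draft until the evaluator is satisfied.
def evaluator_optimizer(prompt, generate:, evaluate:, max_rounds: 3)
  draft = generate.call(prompt)

  max_rounds.times do
    feedback = evaluate.call(draft)
    break if feedback == "PASS"

    # Feed the evaluator's critique back into the generator
    draft = generate.call("#{prompt}\n\nPrevious draft:\n#{draft}\n\nFeedback:\n#{feedback}")
  end

  draft
end
```

Capping the rounds matters: without `max_rounds`, a never-satisfied evaluator would loop (and bill) forever.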
Agents
Agents are emerging in production as LLMs mature in key capabilities—understanding complex inputs, engaging in reasoning and planning, using tools reliably, and recovering from errors. Agents begin their work with either a command from, or interactive discussion with, the human user. Once the task is clear, agents plan and operate independently, potentially returning to the human for further information or judgement.
Agents can handle sophisticated tasks, but their implementation is often straightforward. They are typically just LLMs using tools based on environmental feedback in a loop. It is therefore crucial to design toolsets and their documentation clearly and thoughtfully.
When to use agents: Agents can be used for open-ended problems where it’s difficult or impossible to predict the required number of steps, and where you can’t hardcode a fixed path. The LLM will potentially operate for many turns, and you must have some level of trust in its decision-making.
The autonomous nature of agents means higher costs, and the potential for compounding errors. I recommend extensive testing in sandboxed environments, along with the appropriate guardrails.
Ruby’s expressiveness makes agent loops particularly readable:
def autonomous_agent(task)
  max_iterations = 10
  result = {}

  max_iterations.times do
    plan = llm.call("Given the current state, what should I do next for: #{task}")
    result = execute_action(plan)
    break if result[:complete]
  end

  result
end
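A loop like this is only as good as the tools it can dispatch to, which is why toolset design and documentation matter so much. Here is a sketch of one small, well-documented tool; the class and interface are illustrative, not from any particular gem:

```ruby
# A minimal tool with a name, a description the LLM can read, and a guarded
# implementation. The interface shape is an assumption of this sketch.
class CalculatorTool
  def name
    "calculator"
  end

  def description
    "Evaluates a basic arithmetic expression, e.g. '2 + 2'."
  end

  def call(expression)
    # Whitelist digits, whitespace, parentheses, and basic operators before eval-ing
    raise ArgumentError, "unsafe expression" unless expression.match?(%r{\A[\d\s+\-*/().]+\z})

    eval(expression).to_s
  end
end
```

Note the guard: agents will eventually pass a tool something unexpected, so every tool should validate its input rather than trust the model.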
Combining and customizing these patterns
These building blocks aren’t prescriptive. They’re common patterns that you can shape and combine to fit different use cases. The key to success is measuring performance and iterating on implementations. You should consider adding complexity only when it demonstrably improves outcomes.
Summary
Success in the LLM space isn’t about building the most sophisticated system. It’s about building the right system for your needs. Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short.
When implementing agents, I try to follow three core principles:
- Maintain simplicity in your agent’s design
- Prioritize transparency by explicitly showing the agent’s planning steps
- Carefully craft your agent-computer interface through thorough tool documentation and testing
Ruby’s readability and expressiveness make it an excellent choice for implementing these patterns. The code remains clean and maintainable, making it easier to debug and iterate on your agent implementations.
Frameworks can help you get started quickly, but don’t hesitate to reduce abstraction layers and build with basic components as you move to production. By following these principles, you can create agents that are not only powerful but also reliable, maintainable, and trusted by their users.
The examples in this post are implemented in the Ruby LLM Cookbook, which demonstrates these patterns using the RubyLLM gem. The implementations show how Ruby’s expressiveness makes these agent patterns both readable and maintainable.
While Python and TypeScript have dominated the AI development landscape, they’re not the only options when it comes to building AI agents. Here’s the thing—after working with dozens of teams building LLM agents across industries, I’ve noticed something interesting: the most successful implementations weren’t necessarily using the fanciest AI frameworks. They were using solid, boring tools that just work. And you know what’s really solid and boring (in the best way)? Ruby on Rails.
Over the past year, we’ve seen teams pivot from complex AI-specific frameworks to Rails-based implementations, and the results have been pretty eye-opening. Turns out, building production-ready AI agents has more in common with building web applications than most people realize.
What are agents, anyway?
Before we dive in, let’s get on the same page about what we mean by “agents.” Some folks think of agents as these fully autonomous systems that run wild for hours, making decisions and using tools like digital interns you can’t fire. Others use “agent” to describe more structured workflows that follow predefined steps.
We like to think of it this way:
- Workflows are like well-planned dinner parties—everything’s orchestrated, the flow is predictable, and you know what’s coming next
- Agents are more like jazz improvisation—they’ve got the basic structure down, but they’re making real-time decisions about what to do next
Both are useful! And both work surprisingly well in Rails.
Why Rails? (No, seriously, hear me out)
“Convention Over Configuration” Meets AI Chaos
Here’s the dirty secret about AI development: it gets messy fast. You start with a simple script, then you need error handling, then logging, then you want to save conversation history, then you need background jobs for long-running tasks, then… well, you get the idea. Before you know it, you’re rebuilding half of what Rails gives you for free.
Rails’ “convention over configuration” philosophy is like having a really good project manager for your AI chaos:
- Your agent code lives in predictable places (app/services/agents/, app/models/conversation.rb)
- Testing patterns are established (hello, RSpec!)
- Database stuff just works (ActiveRecord FTW)
- You get proper logging, error handling, and monitoring without thinking about it
The “Boring Technology” Advantage
Dan McKinley wrote about choosing boring technology, and Rails is delightfully boring in 2024. It’s been around forever, it’s stable, and—this is key—your team probably already knows it.
Compare that to the AI framework du jour:
- Rails: “Oh, this is just another service object”
- AI Framework X: “Well, first you need to understand the abstractions, then learn the DSL, then figure out why it’s not working…”
It’s Built for The Real World
Rails was designed for applications that real humans use every day. That means:
- Proper error handling (because LLMs fail in creative ways)
- Background job processing (because agents take time)
- Database transactions (because multi-step workflows need consistency)
- Security features (because you’re probably hitting external APIs)
- Caching layers (because LLM calls are expensive)
All this stuff exists and works. No need to cobble together microservices or worry about whether your fancy AI framework handles edge cases.
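Take caching as one example. In a Rails app you would wrap the call in Rails.cache.fetch; here a plain Hash stands in for the cache store so the sketch is self-contained, and the fetcher block stands in for a real LLM call:

```ruby
require "digest"

# Cache-aside sketch for expensive LLM calls, mirroring the Rails.cache.fetch
# pattern: compute once per distinct prompt, then serve from the store.
class CompletionCache
  def initialize(&fetcher)
    @store = {}
    @fetcher = fetcher
  end

  def complete(prompt)
    key = Digest::SHA256.hexdigest(prompt)
    @store[key] ||= @fetcher.call(prompt)
  end
end
```

Swapping the Hash for Rails.cache gets you expiry, memcached/Redis backends, and multi-process sharing for free.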
Real Examples (Thanks to the Ruby Community!)
Speaking of real examples, sahmed007 has put together an awesome Ruby cookbook that shows these patterns in action. It’s basically the Ruby equivalent of Anthropic’s Python examples, and it’s pretty sweet. The repo covers all the major agent patterns we’re about to discuss, using the clean RubyLLM gem for actual LLM calls.
Building Blocks: The Rails Way
Let’s walk through the common patterns, but with a Rails twist.
The Augmented LLM (Your Basic Building Block)
Every good agent starts with an LLM that can use tools, retrieve information, and remember things. In Rails, this might look like:
# app/services/agent_llm.rb
class AgentLLM
  def initialize(conversation_id = nil)
    @conversation = find_or_create_conversation(conversation_id)
    @llm = RubyLLM::Client.new
  end

  def chat(message, context: {})
    # Add some retrieval if needed
    relevant_context = RetrievalService.new.search(message) if context[:use_retrieval]

    # Build the full prompt with memory
    full_prompt = build_prompt_with_memory(message, relevant_context)

    # Make the LLM call
    response = @llm.chat(full_prompt, tools: available_tools)

    # Save everything (because Rails makes this easy)
    @conversation.messages.create!(
      role: 'user',
      content: message,
      context: context
    )
    @conversation.messages.create!(
      role: 'assistant',
      content: response.content
    )

    response
  end

  private

  def available_tools
    [
      WebSearchTool.new,
      CalculatorTool.new,
      DatabaseQueryTool.new
    ]
  end
end
See what’s happening here? We’re using ActiveRecord for conversation persistence, service objects for clean organization, and Rails conventions to keep everything predictable.
Workflow: Prompt Chaining (Controllers Style)
Prompt chaining is just breaking down a big task into smaller steps. Rails controllers are perfect for this:
# app/controllers/marketing_workflow_controller.rb
class MarketingWorkflowController < ApplicationController
  def create_campaign
    @workflow = MarketingWorkflow.create!(params: workflow_params)

    # Step 1: Generate the base copy
    GenerateMarketingCopyJob.perform_later(@workflow.id)

    render json: { workflow_id: @workflow.id, status: 'started' }
  end

  def show
    @workflow = MarketingWorkflow.find(params[:id])
    render json: @workflow.as_json(include: :steps)
  end
end

# app/jobs/generate_marketing_copy_job.rb
class GenerateMarketingCopyJob < ApplicationJob
  def perform(workflow_id)
    workflow = MarketingWorkflow.find(workflow_id)

    # Generate original copy
    copy = AgentLLM.new.chat("Generate marketing copy for #{workflow.product}")
    workflow.steps.create!(name: 'original_copy', output: copy)

    # Queue up translation if everything looks good
    if copy_looks_good?(copy)
      TranslateCopyJob.perform_later(workflow_id)
    end
  end
end
This is just good Rails patterns! Background jobs for async work, models to track state, controllers for the API. The AI stuff is just another service.
Workflow: Routing (Literally Rails Routing)
Sometimes you need to classify a request and send it to different handlers. Rails… already does this:
# app/controllers/support_router_controller.rb
class SupportRouterController < ApplicationController
  def route_ticket
    classification = classify_ticket(params[:message])

    case classification
    when 'billing'
      BillingAgentService.new.handle(params[:message])
    when 'technical'
      TechnicalAgentService.new.handle(params[:message])
    when 'general'
      GeneralAgentService.new.handle(params[:message])
    end
  end

  private

  def classify_ticket(message)
    AgentLLM.new.chat(
      "Classify this support ticket: #{message}. " \
      "Return one word: billing, technical, or general"
    ).strip.downcase
  end
end
You could even get fancy and use Rails routing patterns to make this super clean, but you get the idea.
Workflow: Parallelization (Sidekiq FTW)
Need to run multiple LLM calls in parallel? Rails has you covered with battle-tested job queues:
# app/jobs/parallel_analysis_job.rb
class ParallelAnalysisJob < ApplicationJob
  def perform(document_id)
    # Fan out: each analysis job persists its own result on the document.
    # Note that perform_later doesn't return results synchronously.
    SentimentAnalysisJob.perform_later(document_id)
    KeywordExtractionJob.perform_later(document_id)
    SummaryGenerationJob.perform_later(document_id)

    # A follow-up job combines the pieces once all three analyses have
    # been written to the document
    CombineAnalysesJob.perform_later(document_id)
  end
end
The sahmed007 cookbook has a great parallelization example that shows how to do stakeholder impact analysis using this pattern. It’s clean, it works, and it scales.
Agents: The Full Autonomous Experience
Real agents—the kind that make decisions and iterate on their own—are where Rails really shines. You need robust error handling, state management, and the ability to pause/resume execution. Rails gives you all of this:
# app/services/coding_agent_service.rb
class CodingAgentService
  MAX_ITERATIONS = 10

  def initialize(task_description)
    @task = task_description
    @session = AgentSession.create!(
      task: task_description,
      status: 'running'
    )
    @iteration_count = 0
  end

  def execute
    while @iteration_count < MAX_ITERATIONS && !task_complete?
      @iteration_count += 1

      # Agent decides what to do next
      next_action = plan_next_action

      # Execute the action
      result = execute_action(next_action)

      # Log everything (because debugging agents is hard)
      @session.iterations.create!(
        iteration: @iteration_count,
        action: next_action,
        result: result,
        success: result[:success]
      )

      # Check if we're done
      break if result[:task_complete]

      # Give the agent a breather (and save state)
      @session.update!(
        current_iteration: @iteration_count,
        last_action: next_action
      )
    end

    finalize_session
  end

  private

  def plan_next_action
    context = build_context_from_history
    AgentLLM.new(@session.id).chat(
      "Based on the task and current progress, what should I do next?",
      context: context
    )
  end
end
When NOT to Use Rails for Agents
Look, Rails isn’t magic. There are times when you shouldn’t use it:
- Research/prototyping: If you’re just experimenting with prompts, a Jupyter notebook is fine
- Pure ML workloads: If you’re training models or doing heavy numerical computation, stick with Python
- Ultra-low latency: If you need sub-100ms response times, you might want something more bare-metal
- Team doesn’t know Rails: If your team is all Python data scientists, don’t force it
But if you’re building something that real people will use, that needs to be maintained, and that has to integrate with existing systems? Rails is pretty hard to beat.
The Real Talk: Why This Actually Matters
Here’s the thing nobody talks about in AI land: most AI projects fail because of engineering problems, not AI problems. The LLM works fine in your notebook, but then you need to deploy it, handle errors, scale it, monitor it, and maintain it. That’s where Rails shines.
Your Rails app can:
- Handle user authentication and authorization
- Manage conversation history and user data
- Integrate with your existing systems
- Provide APIs for mobile apps or other services
- Scale horizontally when you get popular
- Give your team familiar debugging and monitoring tools
Plus, when the AI hype dies down a bit (and it will), you’ll still have a solid web application that does useful things for people. That’s not nothing.
Getting Started
If you want to try this out, check out the ruby-llm-cookbook repo. It’s got working examples of all these patterns, using the RubyLLM gem for clean API calls.
The setup is pretty straightforward:
- Clone the repo
- Bundle install
- Set up your API keys (works with OpenRouter, Anthropic, OpenAI, etc.)
- Run the examples
The code is clean, well-commented, and shows how these patterns work in practice.
Wrapping Up
Building AI agents doesn’t have to mean learning a whole new stack or dealing with experimental frameworks. Sometimes the best tool for the job is the one you already know how to use well.
Rails gives you:
- Proven patterns for building complex applications
- A mature ecosystem for handling real-world problems
- Developer productivity that lets you focus on the AI logic, not the plumbing
- A clear path from prototype to production
The AI world moves fast, but the fundamentals of building good software don’t change. Rails has been helping developers build maintainable, scalable applications for 20 years. Turns out, that experience translates pretty well to AI agents too.
So next time someone asks why you’re building AI agents in Rails instead of the hot new AI framework, just smile and deploy your working, tested, maintainable code while they’re still figuring out their dependencies.
Built something cool with Rails and AI? We’d love to hear about it. The Ruby community’s always been good at taking solid tools and finding creative uses for them.
Thanks to sahmed007 for putting together the Ruby cookbook that inspired this post, and to the broader Ruby community for proving that sometimes the best innovation is applying proven tools to new problems.