# Creating an Agent Swarm
In the world of autonomous agents, collaboration can lead to powerful results. That's why we're going to build an agent swarm with Sagentic. Our goal is to create a system that can answer questions by searching Google and summarizing content from the resulting web pages.
## Architecture

This swarm will consist of two types of agents: `SearcherAgent` and `ReaderAgent`.
### The SearcherAgent

The `SearcherAgent` is responsible for initiating the search. It uses a tool powered by SerpApi to query Google and sift through the search results. This agent autonomously decides what information to look for and determines which Google results are relevant. As it processes the search results, it can invoke instances of `ReaderAgent` to delve deeper into the URLs it finds.
### The ReaderAgent

The `ReaderAgent` specializes in reading and summarizing web pages. It encapsulates the action of extracting and condensing the information from a website. For our swarm, we'll create a simple version of this agent, but it could be expanded into a comprehensive web crawler that follows links and gathers data from multiple pages.
## Agent Collaboration

By equipping `SearcherAgent` with the ability to call upon `ReaderAgent`, we create a dynamic where one agent can leverage the skills of another. `SearcherAgent` operates with a degree of autonomy, making decisions on the fly, while `ReaderAgent` provides focused expertise in content summarization.
Together, these agents form a swarm with a shared mission: to efficiently answer questions using the vast resources of the internet. This approach showcases the power of agent orchestration and the modular design philosophy of Sagentic, where each agent plays a distinct role yet contributes to a larger, cohesive operation.
## Getting Started
Let's get started by creating a new Sagentic project.
Once you've created your project, we will go through the following steps:
1. Reuse the `searchTool` we created in Your First Tool, which is needed by the `SearcherAgent` we will build later.
2. Create the `ReaderAgent` and the tool encapsulating its functionality.
3. Create the `SearcherAgent` and orchestrate the swarm.
After this, our swarm will be ready to answer questions!
## Implementing the Search Tool

The `SearcherAgent` will use the `searchTool` we created in Your First Tool to perform Google searches. We'll reuse that tool here, so if you haven't already, go through the steps in that guide to create the tool and add it to your project.

Make sure that the `searchTool` is placed in `tools/search.ts` so that it can be imported by the `SearcherAgent` later on.
## Creating the ReaderAgent

The `ReaderAgent` is a crucial component of our agent swarm, designed to read and summarize content from web pages. It's a one-shot agent, similar to the `HelloAgent` we discussed in Your First Agent, which means it performs a single LLM interaction and then stops.

By focusing on a single task (summarizing web content), it can be used by other agents, like the `SearcherAgent`, to build a more complex and capable swarm.
### Agent Responsibilities

The `ReaderAgent` takes a URL and a question as input and attempts to answer the question based on the website's content. It uses a function called `readWebsite` to fetch the content from the specified URL and then processes it with the `html-to-text` library (which you'll need to install) to extract the text content. This conversion is important because it reduces the number of tokens the LLM needs to process and helps it focus on the actual content rather than HTML markup.
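To illustrate why the conversion helps, here is a toy tag-stripper — this is only a sketch, and not what `html-to-text` actually does (the library handles entities, links, tables, preserved line breaks, and much more):

```typescript
// Toy illustration only: strip tags and collapse whitespace.
// The real html-to-text library is far more thorough.
function naiveHtmlToText(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ") // drop markup entirely
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

const html =
  '<html><body><div class="article"><h1>Title</h1>' +
  "<p>Some <b>content</b> worth reading.</p></div></body></html>";

const text = naiveHtmlToText(html);
console.log(text); // "Title Some content worth reading."

// Characters are a rough proxy for tokens: the markup is pure overhead.
console.log(text.length < html.length); // true
```

Even on this tiny snippet, the markup dominates the payload; on a real page the savings are far larger.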
### Source Code Explanation

Here's the source code for the `ReaderAgent`; place it in `agents/reader.ts`:
```typescript
import {
  AgentOptions,
  OneShotAgent,
  ModelType,
  countTokens,
  pricing,
  isTool,
} from "sagentic";
import { convert } from "html-to-text";
import { z } from "zod";

const readWebsite = async (url: string): Promise<string> => {
  const response = await fetch(url);
  const html = await response.text();
  const text = convert(html);
  return text;
};

export interface ReaderAgentOptions extends AgentOptions {
  prompt: string;
  url: string;
}

const Options = z.object({
  prompt: z.string().describe("The prompt to ask"),
  url: z.string().url().describe("The url to read"),
});

export interface ReaderAgentResult {
  answer: string;
}

const Result = z.object({
  answer: z.string().describe("The answer to the question"),
});

@isTool("read", "Reads a website and returns the summary", Options, Result)
export default class ReaderAgent extends OneShotAgent<
  ReaderAgentOptions,
  ReaderAgentResult
> {
  model: ModelType = ModelType.GPT35Turbo;
  systemPrompt = `Your task is to answer a question solely based on the supplied source text.
If the answer cannot be found in the source text you MUST state that you don't know the answer.`;

  async input(): Promise<string> {
    let text = await readWebsite(this.options.url);
    while (countTokens(text) > pricing[this.model].contextSize)
      text = text.slice(0, -(text.length / 10));
    return `Question: ${this.options.prompt}\n\nSource text:\n\n${text}`;
  }

  async output(answer: string): Promise<ReaderAgentResult> {
    return { answer };
  }
}
```
Here's a breakdown of the `ReaderAgent` source code:

- The agent inherits from `OneShotAgent`, specifying `ReaderAgentOptions` for its input and `ReaderAgentResult` for its output.
- It defines a `systemPrompt` class property that instructs the LLM to answer the question based on the provided text and to explicitly state if the answer is not found in the source.
- The `input()` method fetches the website content, converts it to text, and trims it if necessary to fit within the LLM's context window. This is done using the `countTokens` helper from Sagentic to ensure the text doesn't exceed the token limit for the GPT-3.5 Turbo model.
- The `output()` method simply packages the LLM's response into the expected result format.
### Preprocessing the Input

The `readWebsite` function is straightforward:

- It fetches the HTML content from the given URL.
- It converts the HTML to plain text using `html-to-text`.
- It returns the text for the LLM to process.
### Handling Large Contexts

Since the content of a website might exceed the context size limit of the GPT-3.5 Turbo model, the `input()` method includes a loop that progressively trims the text until it fits within the LLM's context window. This is a simple strategy to ensure the LLM can process the text.
In the future, a more sophisticated strategy could be implemented within this agent to improve the summarization results, showcasing the benefits of encapsulation.
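For reference, the trimming strategy can be factored into a standalone helper with a pluggable token estimator. This is a sketch: the real agent uses Sagentic's `countTokens` and the model's context size from `pricing`, while the estimator below is a crude character-based stand-in.

```typescript
// Sketch of the trimming loop from input(): repeatedly drop roughly the
// last tenth of the text until it fits within the token budget.
function trimToBudget(
  text: string,
  budget: number,
  estimate: (t: string) => number
): string {
  while (estimate(text) > budget) {
    text = text.slice(0, -Math.ceil(text.length / 10));
  }
  return text;
}

// Crude estimator: assume roughly 4 characters per token.
const estimate = (t: string) => Math.ceil(t.length / 4);

const page = "lorem ipsum dolor sit amet ".repeat(500); // a long page
const trimmed = trimToBudget(page, 1000, estimate);
console.log(estimate(trimmed) <= 1000); // true
```

Because each pass removes at least one character, the loop always terminates; the cost is that trailing content is discarded blindly, which is exactly the limitation a smarter strategy could address.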
### Turning the Agent into a Tool

The `ReaderAgent` is also a tool, which means it can be invoked by other agents. To make it a tool, we use the `@isTool` decorator to specify the tool's name, description, input schema, and output schema.

In addition to the agent's options and result types (`ReaderAgentOptions` and `ReaderAgentResult`), we also define the schemas for the tool's input and output (`Options` and `Result`). This is done using the `zod` library, which provides a convenient way to define schemas and infer types.

Once the schemas are defined, we pass them to the `@isTool` decorator, along with the tool's name and description. This allows the tool to be invoked by other agents, such as the `SearcherAgent` we will write next.
```typescript
@isTool("read", "Reads a website and returns the summary", Options, Result)
```
TIP

Agents in Sagentic AF can directly use the `spawnAgent` method to create instances of other agents, where the first argument is the agent's constructor and the second one is the options object to pass to the agent.

For instance, if an agent wants to spawn a `ReaderAgent` to summarize a webpage, it would call the `spawnAgent` method like this:
```typescript
const readerAgentInstance = this.spawnAgent(ReaderAgent, {
  url: "https://example.com",
  prompt: "What is the main topic of the article?",
} as ReaderAgentOptions);
```
## Creating the SearcherAgent

The `SearcherAgent` is a key player in our agent swarm, designed to answer questions by utilizing Google search results and summarizing web pages. It showcases the full life-cycle of a Sagentic agent: `initialize`, `step`, and `finalize`.
The `SearcherAgent` follows a specific algorithm, using the `search` and `read` tools to gather information and make notes, ultimately providing an answer or stating that it couldn't find one. We'll also add a `counter` to its internal state. The counter ensures that the agent doesn't loop indefinitely, prompting it to give a partial answer if it exceeds a certain number of tries.
This example uses the bare `ReactiveAgent` class to illustrate the flexibility and control developers have when defining custom agents in terms of reactions.
TIP

While Sagentic offers specialized classes for different types of agents, each agent is a state machine, so you can implement your own patterns by extending the `BaseAgent` class.

Let us know on Discord if you'd like to use any specific pattern for your agents.
### The Source Code

Create `agents/searcher.ts` and add the following code:
```typescript
import {
  AgentOptions,
  ReactiveAgent,
  ModelType,
  when,
  ToolLike,
  toTool,
} from "sagentic";
import moment from "moment";
import { z } from "zod";

import { searchTool } from "../tools/search";
import ReaderAgent from "./reader";

export interface SearcherAgentOptions extends AgentOptions {
  question: string;
}

export interface SearcherAgentState {
  counter: number;
  answer?: string;
  sources?: string[];
}

export interface SearcherAgentResult {
  answer: string;
  sources: string[];
}

const NoteSchema = z.object({
  note: z.string().describe("The note to make"),
});

type Note = z.infer<typeof NoteSchema>;

const AnswerSchema = z.object({
  answer: z.string().describe("The answer to the question"),
  sources: z
    .string()
    .array()
    .describe("The URLs of sources used to answer the question"),
});

type Answer = z.infer<typeof AnswerSchema>;

export default class SearcherAgent extends ReactiveAgent<
  SearcherAgentOptions,
  SearcherAgentState,
  SearcherAgentResult
> {
  model: ModelType = ModelType.GPT4Turbo;
  systemPrompt: string = `Your task is to answer the question posed by the user.
In order to answer the question you must use the available tools to look up relevant information.
You may search the web in any language that will help you answer the question better.
Use queries that will help you gather the most relevant information.
Give your answer in the language of the question.
When asked for final answer and you don't have enough information just answer with justification.`;

  async input(options: SearcherAgentOptions): Promise<SearcherAgentState> {
    const search = searchTool(this.session.context["serp-api-key"]);
    const read = toTool(ReaderAgent as ToolLike);
    this.tools = [search, read];
    this.respond(`Current date: ${moment().format("DD/MM/YYYY HH:mm")}
Question: ${options.question}`);
    return { counter: 0 };
  }

  @when("you want to make a note", NoteSchema)
  async note(
    { counter }: SearcherAgentState,
    _input: Note
  ): Promise<SearcherAgentState> {
    if (counter > 3) {
      this.respond("Please give your final answer.");
    } else {
      this.respond("Please continue.");
    }
    return { counter: counter + 1 };
  }

  @when("you want to give your final answer", AnswerSchema)
  async answer(
    state: SearcherAgentState,
    { answer, sources }: Answer
  ): Promise<SearcherAgentState> {
    this.stop();
    return { ...state, answer: answer, sources };
  }

  async output({
    answer,
    sources,
  }: SearcherAgentState): Promise<SearcherAgentResult> {
    return {
      answer: answer || "I don't know the answer.",
      sources: sources || [],
    };
  }
}
```
TIP

Sagentic is in its early stages, so the API is subject to change. If you have any ideas or suggestions, let's chat on Discord.
### Source Code Explanation
This is a lot of code, so let's break it down.
```typescript
import {
  AgentOptions,
  ReactiveAgent,
  ModelType,
  when,
  ToolLike,
  toTool,
} from "sagentic";
```

We start by importing the necessary modules from Sagentic, including `ReactiveAgent`, the `@when` decorator for defining reactions, and the `toTool` helper for turning agents into tools.
```typescript
import { searchTool } from "../tools/search";
import ReaderAgent from "./reader";
```

The `searchTool` is imported from the `tools` directory, equipping the agent with the ability to search the web. The `ReaderAgent` is imported from the `agents` directory, allowing the new agent to invoke instances of `ReaderAgent` to summarize web pages.
```typescript
import { z } from "zod";
import moment from "moment";
```

`zod` is used for schema validation, and `moment` helps with formatting the current date and time.
```typescript
const NoteSchema = z.object({
  note: z.string().describe("The note to make"),
});

type Note = z.infer<typeof NoteSchema>;

const AnswerSchema = z.object({
  answer: z.string().describe("The answer to the question"),
  sources: z
    .string()
    .array()
    .describe("The URLs of sources used to answer the question"),
});

type Answer = z.infer<typeof AnswerSchema>;
```

`NoteSchema` and `AnswerSchema` define the JSON structure for the agent's responses, ensuring that the LLM's output is structured and can be parsed. The types `Note` and `Answer` are inferred from the schemas, allowing the agent to use them in its methods.
```typescript
export interface SearcherAgentOptions extends AgentOptions {
  question: string;
}

export interface SearcherAgentState {
  counter: number;
  answer?: string;
  sources?: string[];
}

export interface SearcherAgentResult {
  answer: string;
  sources: string[];
}
```

Here we define the input, state, and output types for the agent.
`SearcherAgentOptions` extends the base `AgentOptions` to include a `question` field, which is the query the agent will attempt to answer.

`SearcherAgentState` defines the state of the agent, including a `counter` to track the number of attempts, an `answer` field to store the final answer, and a `sources` field to store the URLs of the sources used to answer the question.

`SearcherAgentResult` defines the result of the agent, including the `answer` and `sources` fields.
```typescript
export default class SearcherAgent extends ReactiveAgent<
  SearcherAgentOptions,
  SearcherAgentState,
  SearcherAgentResult
> {
  // ...
}
```

`SearcherAgent` is defined as a default export and extends `ReactiveAgent`, indicating it's a custom agent with a specific purpose. It uses `SearcherAgentOptions` for its input, `SearcherAgentState` for its state, and `SearcherAgentResult` for its output. By extending `ReactiveAgent`, it inherits the `input` and `output` methods, which we'll implement shortly. Using `ReactiveAgent` also allows us to define reactions using the `@when` decorator.
```typescript
model: ModelType = ModelType.GPT4Turbo;
systemPrompt: string = `Your task is to answer the question posed by the user.
In order to answer the question you must use the available tools to look up relevant information.
You may search the web in any language that will help you answer the question better.
Use queries that will help you gather the most relevant information.
Give your answer in the language of the question.
When asked for final answer and you don't have enough information just answer with justification.`;
```

The `SearcherAgent` class sets up several key properties:

- `model`: Specifies the LLM that the agent will use, which in this case is `ModelType.GPT4Turbo`. This model is chosen for its ability to process information efficiently and effectively.
- `systemPrompt`: Defines the instructions for the LLM, outlining the task of answering a user's question by using the available tools to search the web and synthesize information.
```typescript
async input(options: SearcherAgentOptions): Promise<SearcherAgentState> {
  const search = searchTool(this.session.context["serp-api-key"]);
  const read = toTool(ReaderAgent as ToolLike);
  this.tools = [search, read];
  this.respond(`Current date: ${moment().format("DD/MM/YYYY HH:mm")}
Question: ${options.question}`);
  return { counter: 0 };
}
```
The `input` method is called when the agent is spawned, and it's responsible for setting up the agent's state and tools. In this method, the agent initializes its tools: `searchTool` and `ReaderAgent`.

The method sets the initial state of the agent by returning it.
Notice that `searchTool` is instantiated with the API key taken from the session context. This allows the agent to use the API key without having to store it in the agent's options. The `Session` object is intended to store information that is shared across agents, such as API keys and other credentials. This is a more secure approach than storing sensitive information in the agent's options. The values for the session object are supplied by the user when they invoke the agent swarm.
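The pattern looks roughly like this. Note that `SessionLike` and `makeSearchTool` are hypothetical stand-ins for illustration — the real `Session` and `searchTool` carry considerably more machinery:

```typescript
// Hypothetical stand-in for the slice of the session we care about here.
interface SessionLike {
  context: Record<string, string>;
}

// A tool factory in the spirit of searchTool: the credential is read
// from the shared session context, so agent options stay credential-free.
function makeSearchTool(session: SessionLike) {
  const apiKey = session.context["serp-api-key"];
  if (!apiKey) {
    throw new Error("serp-api-key missing from session context");
  }
  return {
    name: "search",
    // The key is closed over here and never appears in agent options.
    run: (query: string) =>
      `searching "${query}" with key ${apiKey.slice(0, 6)}...`,
  };
}

const session: SessionLike = { context: { "serp-api-key": "sk-demo-12345" } };
const search = makeSearchTool(session);
console.log(search.run("Seattle news"));
```

Failing fast when the key is absent also surfaces configuration mistakes at spawn time rather than mid-conversation.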
`toTool` is used to convert the `ReaderAgent` class into a tool, allowing the agent to invoke instances of `ReaderAgent` to summarize web pages.
```typescript
@when("you want to make a note", NoteSchema)
async note(
  { counter }: SearcherAgentState,
  _input: Note
): Promise<SearcherAgentState> {
  if (counter > 3) {
    this.respond("Please give your final answer.");
  } else {
    this.respond("Please continue.");
  }
  return { counter: counter + 1 };
}

@when("you want to give your final answer", AnswerSchema)
async answer(
  state: SearcherAgentState,
  { answer, sources }: Answer
): Promise<SearcherAgentState> {
  this.stop();
  return { ...state, answer: answer, sources };
}
```
This is the core of the `SearcherAgent`, where the agent's reactions are defined. The `@when` decorator is used to define reactions, which are functions that are called when the LLM's output matches the specified schema.
The `note` reaction is called when the LLM's output matches the `NoteSchema`. It increments the `counter` and prompts the LLM to continue while the counter is at most 3, or to give its final answer once the counter exceeds 3. This ensures that the agent doesn't loop indefinitely, prompting it to give a partial answer if it exceeds a certain number of tries. This method updates the agent's state and returns it.
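Stripped of the agent machinery, the bookkeeping in the `note` reaction is a pure function of the state, which makes the cut-off easy to check in isolation (a sketch, not the Sagentic API):

```typescript
interface NoteState {
  counter: number;
}

// Mirrors the note reaction: choose the follow-up prompt based on the
// current counter, then return the incremented state.
function onNote(state: NoteState): { reply: string; next: NoteState } {
  const reply =
    state.counter > 3 ? "Please give your final answer." : "Please continue.";
  return { reply, next: { counter: state.counter + 1 } };
}

let state: NoteState = { counter: 0 };
const replies: string[] = [];
for (let i = 0; i < 6; i++) {
  const { reply, next } = onNote(state);
  replies.push(reply);
  state = next;
}

// Counters 0 through 3 ask the LLM to continue; after that it is asked
// for a final answer.
console.log(replies.filter((r) => r.includes("continue")).length); // 4
```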
The `answer` reaction is called when the LLM's output matches the `AnswerSchema`. It stops the agent and returns the final state, including the answer and sources.
Notice that the `answer` reaction uses the `stop` method to stop the agent. This is a key difference between `ReactiveAgent` and `OneShotAgent`. `ReactiveAgent` is designed to run indefinitely, reacting to the LLM's output, while `OneShotAgent` is designed to run once and then stop. In this case, we want the agent to stop only after it has answered the question.
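The contrast can be sketched as a tiny event loop — a toy model, not the Sagentic API (the real framework talks to the LLM between steps and dispatches on the schemas): a reactive agent keeps handling events until a reaction calls `stop`.

```typescript
type AgentEvent = { kind: "note" } | { kind: "answer"; answer: string };

// Toy model of the ReactiveAgent run loop.
class TinyReactiveAgent {
  private stopped = false;
  answer?: string;

  stop(): void {
    this.stopped = true;
  }

  run(events: AgentEvent[]): string | undefined {
    for (const event of events) {
      if (this.stopped) break; // a one-shot agent would stop after one step
      if (event.kind === "note") continue; // keep reacting
      this.answer = event.answer;
      this.stop(); // only the answer reaction ends the run
    }
    return this.answer;
  }
}

const agent = new TinyReactiveAgent();
const result = agent.run([
  { kind: "note" },
  { kind: "note" },
  { kind: "answer", answer: "done" },
  { kind: "note" }, // never processed: the agent has stopped
]);
console.log(result); // "done"
```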
Sagentic AF takes care of instructing the LLM on what it is allowed to send and makes sure that the LLM's output parses correctly. This allows us to focus on logic and not worry about parsing the LLM's output. If the LLM's output is not usable by the agent, the agent handles the error and instructs the LLM to try again automatically.
```typescript
async output({
  answer,
  sources,
}: SearcherAgentState): Promise<SearcherAgentResult> {
  return {
    answer: answer || "I don't know the answer.",
    sources: sources || [],
  };
}
```
The `output` method transforms the agent's final state into the result that will be returned to the user.

That's it for the `SearcherAgent`! It's a moderately complex agent, but it demonstrates the full life-cycle of a Sagentic AF agent, from initialization, through reactions, to finalization.
## Testing the Swarm

To test the agent swarm you've created with Sagentic, follow these steps:

1. Start the Dev Server: Ensure your local development server is running. If it's not, refer to the Local Development section for instructions on how to start it.

2. List Available Agents: To confirm that your agents are loaded and ready, make a GET request to `http://localhost:3000/`. You should see a JSON response listing the `SearcherAgent` and `ReaderAgent`:

   ```json
   {
     "service": "sagentic.ai server",
     "version": "0.0.1",
     "agents": ["SearcherAgent", "ReaderAgent"]
   }
   ```

3. Invoke the SearcherAgent: To start the `SearcherAgent`, send a POST request to `/spawn` with the question you want to be answered. Include your SerpApi key in the `env` object:

   ```bash
   curl -X POST http://localhost:3000/spawn \
     -H "Content-Type: application/json" \
     -d '{
       "type": "<your-project>/SearcherAgent",
       "options": { "question": "What is going on in Seattle?" },
       "env": { "serp-api-key": "your API key" }
     }'
   ```

   TIP

   Replace `<your-project>` with the name of your project. You can find the name of your project in the `package.json` file. Agents are namespaced because they are unique to each project and user. In the future this will allow you to seamlessly call other agents from different projects without any conflicts.

4. Review the Answer: After processing, you should receive a response with the answer and the sources used to compile it. The `session` field will provide details on the cost, tokens used, and time elapsed:

   ```json
   {
     "success": true,
     "result": {
       "answer": "Seattle is currently experiencing extreme cold weather, with frigid temperatures forecasted across the Puget Sound region. There are school closures and delays, warming centers have been opened, and King County has opened emergency cold weather shelters. Additionally, there are reports of a Boeing whistleblower related to a door plug incident midflight, and various local news events including legal matters, transportation disruptions, and health concerns.",
       "sources": [
         "https://komonews.com/",
         "https://www.king5.com/",
         "https://www.kiro7.com/homepage",
         "https://www.seattletimes.com/seattle-news/"
       ]
     },
     "session": {
       "cost": 0.07287500000000001,
       "tokens": {
         "gpt-4-1106-preview": 0.0405,
         "gpt-3.5-turbo-1106": 0.032375
       },
       "elapsed": 58.67
     }
   }
   ```

This test demonstrates the capabilities of your agent swarm in action, answering real-world questions by leveraging the combined efforts of the `SearcherAgent` and `ReaderAgent`.
## Next Steps

Now that you've created your first agent swarm, you can deploy it to the Sagentic platform and share it with the world. See Deploy with sagentic.ai for instructions on how to deploy your agents to the cloud.