# Creating an Agent Swarm
In the world of autonomous agents, collaboration can lead to powerful results. That's why we're going to build an agent swarm with Sagentic. Our goal is to create a system that can answer questions by searching Google and summarizing content from the resulting web pages.
## Architecture

This swarm will consist of two types of agents: `SearcherAgent` and `ReaderAgent`.
### The SearcherAgent

The `SearcherAgent` is responsible for initiating the search. It uses a tool powered by SerpApi to query Google and sift through the search results. This agent autonomously decides what information to look for and determines which Google results are relevant. As it processes the search results, it can invoke instances of `ReaderAgent` to delve deeper into the URLs it finds.
### The ReaderAgent

The `ReaderAgent` specializes in reading and summarizing web pages. It encapsulates the action of extracting and condensing the information from a website. For our swarm, we'll create a simple version of this agent, but it could be expanded into a comprehensive web crawler that follows links and gathers data from multiple pages.
## Agent Collaboration

By equipping `SearcherAgent` with the ability to call upon `ReaderAgent`, we create a dynamic where one agent can leverage the skills of another. `SearcherAgent` operates with a degree of autonomy, making decisions on the fly, while `ReaderAgent` provides focused expertise in content summarization.
Together, these agents form a swarm with a shared mission: to efficiently answer questions using the vast resources of the internet. This approach showcases the power of agent orchestration and the modular design philosophy of Sagentic, where each agent plays a distinct role yet contributes to a larger, cohesive operation.
## Getting Started
Let's get started by creating a new Sagentic project.
Once you've created your project, we will go through the following steps:
1. Reuse the `searchTool` we created in Your First Tool, which is needed by the `SearcherAgent` we will build later.
2. Create the `ReaderAgent` and the tool encapsulating its functionality.
3. Create the `SearcherAgent` and orchestrate the swarm.
After this, our swarm will be ready to answer questions!
## Implementing the Search Tool

The `SearcherAgent` will use the `searchTool` we created in Your First Tool to perform Google searches. We'll reuse that tool here, so if you haven't already, go through the steps in that guide to create the tool and add it to your project.

Make sure that the `searchTool` is placed in `tools/search.ts` so that it can be imported by the `SearcherAgent` later on.
## Creating the ReaderAgent

The `ReaderAgent` is a crucial component of our agent swarm, designed to read and summarize content from web pages. It's a one-shot agent, similar to the `HelloAgent` we discussed in Your First Agent, which means it performs a single LLM interaction and then stops.

By focusing on a single task (summarizing web content), it can be used by other agents, like the `SearcherAgent`, to build a more complex and capable swarm.
### Agent Responsibilities

The `ReaderAgent` takes a URL and a question as input and attempts to answer the question based on the website's content. It uses a function called `readWebsite` to fetch the content from the specified URL and then processes it with the `html-to-text` library (which you'll need to install) to extract the text content. This conversion is important because it reduces the number of tokens the LLM needs to process and helps it focus on the actual content rather than HTML markup.
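To illustrate why the conversion helps, here is a toy tag-stripper — this is only a sketch, and not what `html-to-text` actually does (the library handles entities, links, tables, preserved line breaks, and much more):

```typescript
// Toy illustration only: strip tags and collapse whitespace.
// The real html-to-text library is far more thorough.
function naiveHtmlToText(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ") // drop markup entirely
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

const html =
  '<html><body><div class="article"><h1>Title</h1>' +
  "<p>Some <b>content</b> worth reading.</p></div></body></html>";

const text = naiveHtmlToText(html);
console.log(text); // "Title Some content worth reading."

// Characters are a rough proxy for tokens: the markup is pure overhead.
console.log(text.length < html.length); // true
```

Even on this tiny snippet, the markup dominates the payload; on a real page the savings are far larger.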
### Source Code Explanation

Here's the source code for the `ReaderAgent`; place it in `agents/reader.ts`:
```typescript
import {
  AgentOptions,
  OneShotAgent,
  ModelType,
  countTokens,
  pricing,
  isTool,
} from "sagentic";
import { convert } from "html-to-text";
import { z } from "zod";

const readWebsite = async (url: string): Promise<string> => {
  const response = await fetch(url);
  const html = await response.text();
  const text = convert(html);
  return text;
};

export interface ReaderAgentOptions extends AgentOptions {
  prompt: string;
  url: string;
}

const Options = z.object({
  prompt: z.string().describe("The prompt to ask"),
  url: z.string().url().describe("The url to read"),
});

export interface ReaderAgentResult {
  answer: string;
}

const Result = z.object({
  answer: z.string().describe("The answer to the question"),
});

@isTool("read", "Reads a website and returns the summary", Options, Result)
export default class ReaderAgent extends OneShotAgent<
  ReaderAgentOptions,
  ReaderAgentResult
> {
  model: ModelType = ModelType.GPT35Turbo;
  systemPrompt = `Your task is to answer a question solely based on the supplied source text.
If the answer cannot be found in the source text you MUST state that you don't know the answer.`;

  async input(): Promise<string> {
    let text = await readWebsite(this.options.url);
    while (countTokens(text) > pricing[this.model].contextSize)
      text = text.slice(0, -(text.length / 10));
    return `Question: ${this.options.prompt}\n\nSource text:\n\n${text}`;
  }

  async output(answer: string): Promise<ReaderAgentResult> {
    return { answer };
  }
}
```
Here's a breakdown of the `ReaderAgent` source code:

- The agent inherits from `OneShotAgent`, specifying `ReaderAgentOptions` for its input and `ReaderAgentResult` for its output.
- It defines a `systemPrompt` class property that instructs the LLM to answer the question based on the provided text and to explicitly state if the answer is not found in the source.
- The `input()` method fetches the website content, converts it to text, and trims it if necessary to fit within the LLM's context window. This is done using the `countTokens` helper from Sagentic to ensure the text doesn't exceed the token limit for the GPT-3.5 Turbo model.
- The `output()` method simply packages the LLM's response into the expected result format.
### Preprocessing the Input

The `readWebsite` function is straightforward:

- It fetches the HTML content from the given URL.
- It converts the HTML to plain text using `html-to-text`.
- It returns the text for the LLM to process.
### Handling Large Contexts

Since the content of a website might exceed the context size limit of the GPT-3.5 Turbo model, the `input()` method includes a loop that progressively trims the text until it fits within the LLM's context window. This is a simple strategy to ensure the LLM can process the text.
In the future, a more sophisticated strategy could be implemented within this agent to improve the summarization results, showcasing the benefits of encapsulation.
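For reference, the trimming strategy can be factored into a standalone helper with a pluggable token estimator. This is a sketch: the real agent uses Sagentic's `countTokens` and the model's context size from `pricing`, while the estimator below is a crude character-based stand-in.

```typescript
// Sketch of the trimming loop from input(): repeatedly drop roughly the
// last tenth of the text until it fits within the token budget.
function trimToBudget(
  text: string,
  budget: number,
  estimate: (t: string) => number
): string {
  while (estimate(text) > budget) {
    text = text.slice(0, -Math.ceil(text.length / 10));
  }
  return text;
}

// Crude estimator: assume roughly 4 characters per token.
const estimate = (t: string) => Math.ceil(t.length / 4);

const page = "lorem ipsum dolor sit amet ".repeat(500); // a long page
const trimmed = trimToBudget(page, 1000, estimate);
console.log(estimate(trimmed) <= 1000); // true
```

Because each pass removes at least one character, the loop always terminates; the cost is that trailing content is discarded blindly, which is exactly the limitation a smarter strategy could address.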
### Turning the Agent into a Tool

The `ReaderAgent` is also a tool, which means it can be invoked by other agents. To make it a tool, we use the `@isTool` decorator to specify the tool's name, description, input schema, and output schema.

In addition to the agent's options and result types (`ReaderAgentOptions` and `ReaderAgentResult`), we also define the schemas for the tool's input and output (`Options` and `Result`). This is done using the `zod` library, which provides a convenient way to define schemas and infer types.

Once the schemas are defined, we pass them to the `@isTool` decorator, along with the tool's name and description. This allows the tool to be invoked by other agents, such as the `SearcherAgent` we will write next.
```typescript
@isTool("read", "Reads a website and returns the summary", Options, Result)
```
TIP

Agents in Sagentic AF can directly use the `spawnAgent` method to create instances of other agents, where the first argument is the agent's constructor and the second one is the options object to pass to the agent.

For instance, if an agent wants to spawn a `ReaderAgent` to summarize a webpage, it would call the `spawnAgent` method like this:
```typescript
const readerAgentInstance = this.spawnAgent(ReaderAgent, {
  url: "https://example.com",
  prompt: "What is the main topic of the article?",
} as ReaderAgentOptions);
```
## Creating the SearcherAgent

The `SearcherAgent` is a key player in our agent swarm, designed to answer questions by utilizing Google search results and summarizing web pages. It showcases the full life-cycle of a Sagentic agent: `initialize`, `step`, and `finalize`.
The `SearcherAgent` follows a specific algorithm, using the `search` and `read` tools to gather information and make notes, ultimately providing an answer or stating that it couldn't find one. We'll also add a `counter` to its internal state. The counter ensures that the agent doesn't loop indefinitely, prompting it to give a partial answer if it exceeds a certain number of tries.
This example uses the bare `ReactiveAgent` class to illustrate the flexibility and control developers have when defining custom agents in terms of reactions.
TIP

While Sagentic offers specialized classes for different types of agents, each agent is a state machine, so you can implement your own patterns by extending the `BaseAgent` class.

Let us know on Discord if you'd like to use any specific pattern for your agents.
### The Source Code

Create `agents/searcher.ts` and add the following code:
```typescript
import {
  AgentOptions,
  ReactiveAgent,
  ModelType,
  when,
  ToolLike,
  toTool,
} from "sagentic";
import moment from "moment";
import { z } from "zod";

import { searchTool } from "../tools/search";
import ReaderAgent from "./reader";

export interface SearcherAgentOptions extends AgentOptions {
  question: string;
}

export interface SearcherAgentState {
  counter: number;
  answer?: string;
  sources?: string[];
}

export interface SearcherAgentResult {
  answer: string;
  sources: string[];
}

const NoteSchema = z.object({
  note: z.string().describe("The note to make"),
});

type Note = z.infer<typeof NoteSchema>;

const AnswerSchema = z.object({
  answer: z.string().describe("The answer to the question"),
  sources: z
    .string()
    .array()
    .describe("The URLs of sources used to answer the question"),
});

type Answer = z.infer<typeof AnswerSchema>;

export default class SearcherAgent extends ReactiveAgent<
  SearcherAgentOptions,
  SearcherAgentState,
  SearcherAgentResult
> {
  model: ModelType = ModelType.GPT4Turbo;
  systemPrompt: string = `Your task is to answer the question posed by the user.
In order to answer the question you must use the available tools to look up relevant information.
You may search the web in any language that will help you answer the question better.
Use queries that will help you gather the most relevant information.
Give your answer in the language of the question.
When asked for final answer and you don't have enough information just answer with justification.`;

  async input(options: SearcherAgentOptions): Promise<SearcherAgentState> {
    const search = searchTool(this.session.context["serp-api-key"]);
    const read = toTool(ReaderAgent as ToolLike);
    this.tools = [search, read];
    this.respond(`Current date: ${moment().format("DD/MM/YYYY HH:mm")}
Question: ${options.question}`);
    return { counter: 0 };
  }

  @when("you want to make a note", NoteSchema)
  async note(
    { counter }: SearcherAgentState,
    _input: Note
  ): Promise<SearcherAgentState> {
    if (counter > 3) {
      this.respond("Please give your final answer.");
    } else {
      this.respond("Please continue.");
    }
    return { counter: counter + 1 };
  }

  @when("you want to give your final answer", AnswerSchema)
  async answer(
    state: SearcherAgentState,
    { answer, sources }: Answer
  ): Promise<SearcherAgentState> {
    this.stop();
    return { ...state, answer: answer, sources };
  }

  async output({
    answer,
    sources,
  }: SearcherAgentState): Promise<SearcherAgentResult> {
    return {
      answer: answer || "I don't know the answer.",
      sources: sources || [],
    };
  }
}
```
TIP

Sagentic is in its early stages, so the API is subject to change. If you have any ideas or suggestions, let's chat on Discord.
### Source Code Explanation
This is a lot of code, so let's break it down.
```typescript
import {
  AgentOptions,
  ReactiveAgent,
  ModelType,
  when,
  ToolLike,
  toTool,
} from "sagentic";
```

We start by importing the necessary modules from Sagentic, including `ReactiveAgent`, the `@when` decorator for defining reactions, and the `toTool` helper for turning agents into tools.
```typescript
import { searchTool } from "../tools/search";
import ReaderAgent from "./reader";
```

The `searchTool` is imported from the `tools` directory, equipping the agent with the ability to search the web. The `ReaderAgent` is imported from the `agents` directory, allowing the new agent to invoke instances of `ReaderAgent` to summarize web pages.
```typescript
import { z } from "zod";
import moment from "moment";
```

`zod` is used for schema validation, and `moment` helps with formatting the current date and time.
```typescript
const NoteSchema = z.object({
  note: z.string().describe("The note to make"),
});

type Note = z.infer<typeof NoteSchema>;

const AnswerSchema = z.object({
  answer: z.string().describe("The answer to the question"),
  sources: z
    .string()
    .array()
    .describe("The URLs of sources used to answer the question"),
});

type Answer = z.infer<typeof AnswerSchema>;
```

`NoteSchema` and `AnswerSchema` define the JSON structure for the agent's responses, ensuring that the LLM's output is structured and can be parsed. The types `Note` and `Answer` are inferred from the schemas, allowing the agent to use them in its methods.
```typescript
export interface SearcherAgentOptions extends AgentOptions {
  question: string;
}

export interface SearcherAgentState {
  counter: number;
  answer?: string;
  sources?: string[];
}

export interface SearcherAgentResult {
  answer: string;
  sources: string[];
}
```

Here we define the input, state, and output types for the agent.
`SearcherAgentOptions` extends the base `AgentOptions` to include a `question` field, which is the query the agent will attempt to answer.

`SearcherAgentState` defines the state of the agent, including a `counter` to track the number of attempts, an `answer` field to store the final answer, and a `sources` field to store the URLs of the sources used to answer the question.

`SearcherAgentResult` defines the result of the agent, including the `answer` and `sources` fields.
```typescript
export default class SearcherAgent extends ReactiveAgent<
  SearcherAgentOptions,
  SearcherAgentState,
  SearcherAgentResult
> {
  // ...
}
```

`SearcherAgent` is defined as a default export and extends `ReactiveAgent`, indicating it's a custom agent with a specific purpose. It uses `SearcherAgentOptions` for its input, `SearcherAgentState` for its state, and `SearcherAgentResult` for its output. By extending `ReactiveAgent`, it inherits the `input` and `output` methods, which we'll implement shortly. Using `ReactiveAgent` also allows us to define reactions using the `@when` decorator.
```typescript
model: ModelType = ModelType.GPT4Turbo;
systemPrompt: string = `Your task is to answer the question posed by the user.
In order to answer the question you must use the available tools to look up relevant information.
You may search the web in any language that will help you answer the question better.
Use queries that will help you gather the most relevant information.
Give your answer in the language of the question.
When asked for final answer and you don't have enough information just answer with justification.`;
```

The `SearcherAgent` class sets up several key properties:

- `model`: Specifies the LLM that the agent will use, which in this case is `ModelType.GPT4Turbo`. This model is chosen for its ability to process information efficiently and effectively.
- `systemPrompt`: Defines the instructions for the LLM, outlining the task of answering a user's question by using the available tools to search the web and synthesize information.
```typescript
async input(options: SearcherAgentOptions): Promise<SearcherAgentState> {
  const search = searchTool(this.session.context["serp-api-key"]);
  const read = toTool(ReaderAgent as ToolLike);
  this.tools = [search, read];
  this.respond(`Current date: ${moment().format("DD/MM/YYYY HH:mm")}
Question: ${options.question}`);
  return { counter: 0 };
}
```
The `input` method is called when the agent is spawned, and it's responsible for setting up the agent's state and tools. In this method, the agent initializes its tools: `searchTool` and `ReaderAgent`.

The method sets the initial state of the agent by returning it.
Notice that `searchTool` is instantiated with the API key taken from the session context. This allows the agent to use the API key without having to store it in the agent's options. The `Session` object is intended to store information that is shared across agents, such as API keys and other credentials. This is a more secure approach than storing sensitive information in the agent's options. The values for the session object are supplied by the user when they invoke the agent swarm.
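The pattern looks roughly like this. Note that `SessionLike` and `makeSearchTool` are hypothetical stand-ins for illustration — the real `Session` and `searchTool` carry considerably more machinery:

```typescript
// Hypothetical stand-in for the slice of the session we care about here.
interface SessionLike {
  context: Record<string, string>;
}

// A tool factory in the spirit of searchTool: the credential is read
// from the shared session context, so agent options stay credential-free.
function makeSearchTool(session: SessionLike) {
  const apiKey = session.context["serp-api-key"];
  if (!apiKey) {
    throw new Error("serp-api-key missing from session context");
  }
  return {
    name: "search",
    // The key is closed over here and never appears in agent options.
    run: (query: string) =>
      `searching "${query}" with key ${apiKey.slice(0, 6)}...`,
  };
}

const session: SessionLike = { context: { "serp-api-key": "sk-demo-12345" } };
const search = makeSearchTool(session);
console.log(search.run("Seattle news"));
```

Failing fast when the key is absent also surfaces configuration mistakes at spawn time rather than mid-conversation.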
`toTool` is used to convert the `ReaderAgent` class into a tool, allowing the agent to invoke instances of `ReaderAgent` to summarize web pages.
```typescript
@when("you want to make a note", NoteSchema)
async note(
  { counter }: SearcherAgentState,
  _input: Note
): Promise<SearcherAgentState> {
  if (counter > 3) {
    this.respond("Please give your final answer.");
  } else {
    this.respond("Please continue.");
  }
  return { counter: counter + 1 };
}

@when("you want to give your final answer", AnswerSchema)
async answer(
  state: SearcherAgentState,
  { answer, sources }: Answer
): Promise<SearcherAgentState> {
  this.stop();
  return { ...state, answer: answer, sources };
}
```
This is the core of the `SearcherAgent`, where the agent's reactions are defined. The `@when` decorator is used to define reactions, which are functions that are called when the LLM's output matches the specified schema.
The `note` reaction is called when the LLM's output matches the `NoteSchema`. It increments the `counter` and prompts the LLM to continue while the counter is at most 3, or to give its final answer once the counter exceeds 3. This ensures that the agent doesn't loop indefinitely, prompting it to give a partial answer if it exceeds a certain number of tries. This method updates the agent's state and returns it.
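Stripped of the agent machinery, the bookkeeping in the `note` reaction is a pure function of the state, which makes the cut-off easy to check in isolation (a sketch, not the Sagentic API):

```typescript
interface NoteState {
  counter: number;
}

// Mirrors the note reaction: choose the follow-up prompt based on the
// current counter, then return the incremented state.
function onNote(state: NoteState): { reply: string; next: NoteState } {
  const reply =
    state.counter > 3 ? "Please give your final answer." : "Please continue.";
  return { reply, next: { counter: state.counter + 1 } };
}

let state: NoteState = { counter: 0 };
const replies: string[] = [];
for (let i = 0; i < 6; i++) {
  const { reply, next } = onNote(state);
  replies.push(reply);
  state = next;
}

// Counters 0 through 3 ask the LLM to continue; after that it is asked
// for a final answer.
console.log(replies.filter((r) => r.includes("continue")).length); // 4
```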
The `answer` reaction is called when the LLM's output matches the `AnswerSchema`. It stops the agent and returns the final state, including the answer and sources.
Notice that the `answer` reaction uses the `stop` method to stop the agent. This is a key difference between `ReactiveAgent` and `OneShotAgent`. `ReactiveAgent` is designed to run indefinitely, reacting to the LLM's output, while `OneShotAgent` is designed to run once and then stop. In this case, we want the agent to stop only after it has answered the question.
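The contrast can be sketched as a tiny event loop — a toy model, not the Sagentic API (the real framework talks to the LLM between steps and dispatches on the schemas): a reactive agent keeps handling events until a reaction calls `stop`.

```typescript
type AgentEvent = { kind: "note" } | { kind: "answer"; answer: string };

// Toy model of the ReactiveAgent run loop.
class TinyReactiveAgent {
  private stopped = false;
  answer?: string;

  stop(): void {
    this.stopped = true;
  }

  run(events: AgentEvent[]): string | undefined {
    for (const event of events) {
      if (this.stopped) break; // a one-shot agent would stop after one step
      if (event.kind === "note") continue; // keep reacting
      this.answer = event.answer;
      this.stop(); // only the answer reaction ends the run
    }
    return this.answer;
  }
}

const agent = new TinyReactiveAgent();
const result = agent.run([
  { kind: "note" },
  { kind: "note" },
  { kind: "answer", answer: "done" },
  { kind: "note" }, // never processed: the agent has stopped
]);
console.log(result); // "done"
```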
Sagentic AF takes care of instructing the LLM on what it is allowed to send and makes sure that the LLM's output parses correctly. This allows us to focus on logic and not worry about parsing the LLM's output. If the LLM's output is not usable by the agent, the agent handles the error and instructs the LLM to try again automatically.
```typescript
async output({
  answer,
  sources,
}: SearcherAgentState): Promise<SearcherAgentResult> {
  return {
    answer: answer || "I don't know the answer.",
    sources: sources || [],
  };
}
```
The `output` method transforms the agent's final state into the result that will be returned to the user.

That's it for the `SearcherAgent`! It's a moderately complex agent, but it demonstrates the full life-cycle of a Sagentic AF agent, from initialization, through reactions, to finalization.
## Testing the Swarm

To test the agent swarm you've created with Sagentic, follow these steps:

1. Start the Dev Server: Ensure your local development server is running. If it's not, refer to the Local Development section for instructions on how to start it.

2. List Available Agents: To confirm that your agents are loaded and ready, make a GET request to `http://localhost:3000/`. You should see a JSON response listing the `SearcherAgent` and `ReaderAgent`:

   ```json
   {
     "service": "sagentic.ai server",
     "version": "0.0.1",
     "agents": ["SearcherAgent", "ReaderAgent"]
   }
   ```

3. Invoke the SearcherAgent: To start the `SearcherAgent`, send a POST request to `/spawn` with the question you want to be answered. Include your SerpApi key in the `env` object:

   ```bash
   curl -X POST http://localhost:3000/spawn \
     -H "Content-Type: application/json" \
     -d '{
       "type": "<your-project>/SearcherAgent",
       "options": { "question": "What is going on in Seattle?" },
       "env": { "serp-api-key": "your API key" }
     }'
   ```

   TIP

   Replace `<your-project>` with the name of your project. You can find the name of your project in the `package.json` file. Agents are namespaced because they are unique to each project and user. In the future this will allow you to seamlessly call other agents from different projects without any conflicts.

4. Review the Answer: After processing, you should receive a response with the answer and the sources used to compile it. The `session` field will provide details on the cost, tokens used, and time elapsed:

   ```json
   {
     "success": true,
     "result": {
       "answer": "Seattle is currently experiencing extreme cold weather, with frigid temperatures forecasted across the Puget Sound region. There are school closures and delays, warming centers have been opened, and King County has opened emergency cold weather shelters. Additionally, there are reports of a Boeing whistleblower related to a door plug incident midflight, and various local news events including legal matters, transportation disruptions, and health concerns.",
       "sources": [
         "https://komonews.com/",
         "https://www.king5.com/",
         "https://www.kiro7.com/homepage",
         "https://www.seattletimes.com/seattle-news/"
       ]
     },
     "session": {
       "cost": 0.07287500000000001,
       "tokens": {
         "gpt-4-1106-preview": 0.0405,
         "gpt-3.5-turbo-1106": 0.032375
       },
       "elapsed": 58.67
     }
   }
   ```

This test demonstrates the capabilities of your agent swarm in action, answering real-world questions by leveraging the combined efforts of the `SearcherAgent` and `ReaderAgent`.
## Next Steps

Now that you've created your first agent swarm, you can deploy it to the Sagentic platform and share it with the world. See Deploy with sagentic.ai for instructions on how to deploy your agents to the cloud.