Shifts in LLM Usage

Main Shift in AI in early 2024

Recently, a significant shift has occurred from using LLMs for text generation (e.g., generating subject lines or database queries) to using them to drive agentic decision-making. In this paradigm, the AI encounters a problem, is equipped with tools to address it, and independently determines a course of action to achieve an objective and resolve the problem. Such agentic endeavors are currently limited by slow processing and response times, which is why increasing tokens-per-minute (TPM) is a key objective for many companies. Even so, agent interaction has already proven successful in tests that employed GPT-4 to generate a newsletter from the crawled subreddit r/LocalLLaMA. Advances in several areas are poised to overcome these limitations in the foreseeable future: higher TPM, locally hosted LLMs, continuous model improvements (such as Gemini Ultra or the next GPT iteration), better retrieval-augmented generation, wider context windows (e.g., Gemini 1.5's 1M tokens), and fine-tuning of LLMs (e.g., with PEFT and Lightning AI).
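To make the agentic paradigm concrete, here is a minimal sketch of the loop described above: the agent receives an objective, picks a tool, observes the result, and repeats until it decides it is done. Everything in it is an assumption made for illustration: the tools `search_posts` and `summarize` are hypothetical stubs, and `decide` is a hard-coded stand-in for the LLM call that would normally choose the next action.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools; a real system would call external services here.
def search_posts(topic: str) -> str:
    return f"3 posts found about {topic}"


def summarize(text: str) -> str:
    return f"summary({text})"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_posts": search_posts,
    "summarize": summarize,
}


def decide(objective: str, history: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns (tool_name, tool_input) or ("finish", answer)."""
    if not history:
        return "search_posts", objective
    if len(history) == 1:
        return "summarize", history[-1][1]
    return "finish", history[-1][1]


def run_agent(objective: str, max_steps: int = 5) -> str:
    """Loop: decide on an action, run the tool, record the observation."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action, arg = decide(objective, history)
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)
        history.append((action, observation))
    # The step budget guards against an agent that never decides to finish.
    return "stopped: step budget exhausted"


if __name__ == "__main__":
    print(run_agent("local LLMs"))
```

Each pass through the loop is one model call, so the number of steps directly determines how slow the agent feels. That is why TPM matters so much for this paradigm.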

Technical Perspective: CompilerCrew & Memory

The following layers are considered important for a state-of-the-art agentic AI architecture:

  1. Interaction layer: Allows for interaction with the orchestrating agent.
  2. Orchestration layer: Coordinates interactions with the user and the agent crew, and accesses the memory layer.
  3. Model layer: Different models can be utilized for different types of functionality.
  4. Storage layer: Considerations about different memory types and storage of business logic.
  5. Agents layer: Agents (referred to as a crew) to

1. Interaction Layer

The interaction layer is a crucial component of the system, facilitating seamless interactions with AI and enabling integration with various platforms. It comprises three key elements: an AI chat portal, an API endpoint, and administrative tools (observability, prompt management).

AI Chat Portal

An AI chat portal serves as a dedicated platform for interactions with the agentic AI system. It provides a user-friendly interface tailored for efficient and effective communication with the AI system and implements streaming for faster responses. It is further characterized by the following properties:

This chat portal can be quickly implemented with Chainlit. Beyond that, integration with other platforms should be considered. Integrating the orchestrator with a wide range of platforms, including Slack, MS Teams, email, and Zoom, allows users to interact with the AI system directly within their preferred communication channels, enhancing productivity and streamlining workflows.
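The streaming behaviour mentioned for the chat portal can be illustrated with a small sketch. It deliberately uses only the Python standard library and does not reflect Chainlit's API. `generate_tokens` is a hypothetical stand-in for a streaming LLM call, and the simulated delay is an arbitrary assumption.

```python
import asyncio
from typing import AsyncIterator, List


async def generate_tokens(answer: str, delay: float = 0.01) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call: yields one word at a time."""
    for word in answer.split():
        await asyncio.sleep(delay)  # simulated per-token latency
        yield word + " "


async def stream_to_user(answer: str) -> List[str]:
    """Forward each chunk as soon as it arrives instead of waiting for the full reply."""
    shown: List[str] = []
    async for chunk in generate_tokens(answer):
        # A real portal would push this chunk to the chat UI here.
        shown.append(chunk)
    return shown


if __name__ == "__main__":
    chunks = asyncio.run(stream_to_user("Streaming shortens perceived latency"))
    print("".join(chunks))
```

The total generation time is unchanged. What improves is the time until the user sees the first words, which is usually what makes a chat interface feel responsive.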

Additionally, I recommend the following systems:

2. Orchestration Layer

This orchestration layer needs to be optimized for speed by supporting streaming and smart prompting strategies. Based on a review from February 2024, the most suitable agentic architecture at present is the LLMCompiler. The LLMCompiler, by Kim et al., is an agent architecture designed to speed up task execution. It has the following main components:

The key runtime-boosting ideas here are:

The orchestrator agent communicates with the user through the interaction layer and instructs, checks, manages, and steers the agent crew.
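As I understand the LLMCompiler idea, much of its speed-up comes from planning tasks as a dependency graph and running independent tasks in parallel rather than one after another. The sketch below shows only that scheduling idea with `asyncio`. It is not the authors' implementation, and the `Task` class, the three example tasks, and their delays are assumptions made up for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List


@dataclass
class Task:
    name: str
    run: Callable[[Dict[str, str]], Awaitable[str]]  # receives results so far
    deps: List[str] = field(default_factory=list)


async def execute_plan(tasks: List[Task]) -> Dict[str, str]:
    """Start every task at once; each one waits only for its own dependencies."""
    results: Dict[str, str] = {}
    done: Dict[str, asyncio.Event] = {t.name: asyncio.Event() for t in tasks}

    async def worker(task: Task) -> None:
        for dep in task.deps:
            await done[dep].wait()
        results[task.name] = await task.run(results)
        done[task.name].set()

    # Note: a cyclic plan would wait forever; a real planner must emit a DAG.
    await asyncio.gather(*(worker(t) for t in tasks))
    return results


# Hypothetical example plan: two independent lookups, then a join step.
async def fetch_a(_: Dict[str, str]) -> str:
    await asyncio.sleep(0.05)
    return "A"


async def fetch_b(_: Dict[str, str]) -> str:
    await asyncio.sleep(0.05)
    return "B"


async def join(results: Dict[str, str]) -> str:
    return results["fetch_a"] + results["fetch_b"]


PLAN = [
    Task("fetch_a", fetch_a),
    Task("fetch_b", fetch_b),
    Task("join", join, deps=["fetch_a", "fetch_b"]),
]

if __name__ == "__main__":
    print(asyncio.run(execute_plan(PLAN)))
```

Here `fetch_a` and `fetch_b` run concurrently, and `join` starts as soon as both have finished. With real tool calls that each take seconds, this overlap is where the runtime gain would come from.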

The orchestrator’s tasks involve:

3. Model Layer

Consider employing a hybrid architecture of diverse LLMs to optimize the balance between quality and speed in achieving the agent’s goals. This approach includes:

These methods can be implemented independently or in combination to significantly enhance response generation speed.
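One possible reading of such a hybrid setup is a simple router that sends cheap, well-bounded tasks to a fast model and reserves a stronger, slower model for planning and writing. The sketch below assumes exactly that. The model names, task types, and scores are hypothetical placeholders and do not refer to real models or benchmarks.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelProfile:
    name: str              # hypothetical identifier, not a real model name
    relative_speed: int    # higher is faster (illustrative scale)
    relative_quality: int  # higher is better (illustrative scale)


FAST = ModelProfile("small-fast-model", relative_speed=9, relative_quality=5)
STRONG = ModelProfile("large-strong-model", relative_speed=3, relative_quality=9)

# Assumed mapping from task type to model; tune this per use case.
ROUTES: Dict[str, ModelProfile] = {
    "classify": FAST,
    "extract": FAST,
    "plan": STRONG,
    "write_copy": STRONG,
}


def pick_model(task_type: str, default: ModelProfile = STRONG) -> ModelProfile:
    """Return the model for a task type, falling back to the stronger model."""
    return ROUTES.get(task_type, default)


if __name__ == "__main__":
    for task in ("classify", "plan", "something_new"):
        print(task, "->", pick_model(task).name)
```

Falling back to the stronger model for unknown task types trades speed for safety. Reversing the default would favor latency instead.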

4. Storage Layer

There are myriad considerations when it comes to storing all the data points involved. I therefore list different storage types and techniques, and then outline a high-level concept of the key aspects to store.

Storage Layer Types

Storage Techniques

Storage Implementation Concept

5. Agent Layer

The Data Maestro

This agent tirelessly aggregates, cleans, and enriches your data from multiple sources. It seeks out the most promising patterns and segments for maximum targeting effectiveness.

The Insights Oracle

Driven by advanced machine learning, this agent analyzes the patterns uncovered by the Data Maestro. It reveals critical insights about customer preferences, behaviors, and potential churn signals.

The Campaign Strategist

Armed with the Insights Oracle’s recommendations, this agent drafts multi-channel campaign blueprints tailored to your goals. It outlines optimal audiences, messaging ideas, and even suggests A/B testing setups for continuous improvement.

The Workflow Wizard

This agent loves turning plans into action. It automates email flows, sets up SMS triggers based on website activity, and orchestrates campaigns across platforms with precision.

The Optimization Guru

No campaign is ever ‘finished’ with this agent on board. It scrutinizes performance metrics in real-time, tweaking subject lines, adjusting send times, and suggesting content changes to enhance results.
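The hand-offs between these five agents can be sketched as a simple pipeline in which each agent reads a shared context and adds its own output to it. This is only an illustration of the data flow under simplifying assumptions: the `Agent` class, the context keys, and all stub outputs are invented, and a real crew would call LLMs and tools instead of lambdas.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]


@dataclass
class Agent:
    name: str
    produces: str                    # context key this agent writes
    act: Callable[[Context], Any]    # stub for the agent's actual work


# Stub behaviours; every value below is made up for illustration.
CREW: List[Agent] = [
    Agent("Data Maestro", "segments", lambda ctx: ["at_risk", "loyal"]),
    Agent("Insights Oracle", "insights",
          lambda ctx: {s: f"insight for {s}" for s in ctx["segments"]}),
    Agent("Campaign Strategist", "blueprint",
          lambda ctx: [f"campaign for {s}" for s in ctx["insights"]]),
    Agent("Workflow Wizard", "scheduled", lambda ctx: len(ctx["blueprint"])),
    Agent("Optimization Guru", "review",
          lambda ctx: f"monitoring {ctx['scheduled']} campaigns"),
]


def run_crew(crew: List[Agent], ctx: Optional[Context] = None) -> Context:
    """Run the agents in order; each one sees everything produced before it."""
    ctx = dict(ctx or {})
    for agent in crew:
        ctx[agent.produces] = agent.act(ctx)
    return ctx


if __name__ == "__main__":
    for key, value in run_crew(CREW).items():
        print(f"{key}: {value}")
```

A strictly sequential hand-off like this is the simplest arrangement. In practice the Optimization Guru would feed results back to the earlier agents, which turns the pipeline into a loop.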

Critical Appreciation

Given the current limitations of AI mentioned in the introductory part of this article, it is crucial to design our AI architecture in a way that minimizes these shortcomings. Aspects to pay particular attention to are: