Integrating Large Language Models into Frontends
Leveraging AI for Dynamic Content Generation in Web Applications
Diego de Miguel
Front End Developer at Deutsche Presse-Agentur
In this article, we will explore how to integrate Large Language Models (LLMs) into a front-end application by creating a Product Description Generator. The concept involves sending a prompt to an LLM through its API, streaming the generated result to the UI, and then passing it to a WYSIWYG text editor so that users can edit the content. This approach is similar to OpenAI’s Canvas.
From Idea to Proof of Concept
On Friday, September 20th, I attended an event titled Frontend in the Age of AI: Happy Hour, hosted by Vercel. During this event, Malte Ubl, CTO of Vercel, introduced two innovative products: Vercel’s AI SDK and V0, a generative AI tool tailored for web development.
Malte, who spent 11 years at Google refining its search algorithm, highlighted a significant shift in user behavior driven by AI:
Users are now willing to write lengthy prompts to query LLMs. Typically, users dislike typing, so much of my work at Google involved interpreting their intent rather than relying on what they actually typed. This shift can only mean one thing: the value-to-effort ratio must be exceptionally high.
The AI SDK is a powerful tool that simplifies the integration of LLMs with UIs, addressing many of the typical challenges developers face during this process. Dynamic content insertion in web browsers can be complex, often introducing rendering and state management issues. This SDK provides effective solutions to manage these interactions smoothly. In this article, we will explore how to leverage its capabilities. Let’s dive in!
Main Features
AI-Powered Content Generation
OpenAI Integration
OpenAI provides the language models, which we will connect to the UI through Vercel’s AI SDK (via the @ai-sdk/openai provider) to generate content in two ways:
- Single Batch Generation: Ideal for concise items such as product tags or image descriptions within this application.
- Continuous Streaming: Suitable for generating more detailed product descriptions.
This dual approach allows for flexible and efficient content generation tailored to varying application needs; a short sketch of both modes follows.
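To make the distinction concrete, here is a minimal sketch of both modes using the AI SDK. The helper names, prompts, and the tag schema are illustrative assumptions, not code from the actual project: generateObject covers the single-batch case, while streamText covers continuous streaming.

// content-modes.ts (illustrative sketch)
import { openai } from '@ai-sdk/openai'
import { generateObject, streamText } from 'ai'
import { z } from 'zod'

// Single batch generation: short, structured output such as product tags.
export async function generateTags(productName: string) {
  const { object } = await generateObject({
    model: openai('gpt-4-turbo'),
    schema: z.object({ tags: z.array(z.string()).max(5) }),
    prompt: `Suggest concise e-commerce tags for: ${productName}`,
  })
  return object.tags
}

// Continuous streaming: a longer product description delivered chunk by chunk.
export async function streamDescription(productName: string) {
  const result = await streamText({
    model: openai('gpt-4-turbo'),
    prompt: `Write a detailed product description for: ${productName}`,
  })
  return result.toTextStreamResponse()
}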
Multi-Language Support
The application automatically detects the language of the user input and generates content accordingly. This ensures that both tags and image descriptions are provided in the user’s selected language, offering an inclusive and adaptable user experience across different linguistic markets.
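The article does not show how the language is detected, but a simple approach is to ask the model itself before generating the description. The following is a minimal sketch under that assumption; the helper name and prompt are hypothetical:

// detect-language.ts (hypothetical helper)
import { openai } from '@ai-sdk/openai'
import { generateObject } from 'ai'
import { z } from 'zod'

// Ask the model for the ISO 639-1 code of the language used in the user's prompt.
export async function detectLanguage(userPrompt: string) {
  const { object } = await generateObject({
    model: openai('gpt-4-turbo'),
    schema: z.object({ language: z.string().length(2) }),
    prompt: `Return only the ISO 639-1 code of the language used in: "${userPrompt}"`,
  })
  return object.language
}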
WYSIWYG Editor Integration
We will be using EditorJS for this integration. As explained in its documentation, EditorJS handles data internally by converting HTML into clean JSON data. It achieves this by breaking down the content into structured blocks, each with specific attributes and data, without the extra HTML markup.
For example, in HTML, each <p> or <h3> tag is translated into a JSON object with a type (like “paragraph” or “header”) and relevant data (e.g., “text” content). This approach results in an easily reusable JSON format that can be rendered across various platforms, processed in the backend, or utilized in applications such as social media templates or chatbots.
The primary benefit is that it provides only the essential data without HTML, making it versatile, lightweight, and adaptable for different uses—ideal for our current project.
// output.json
{
  "time": 1550476186479,
  "blocks": [
    {
      "type": "paragraph",
      "data": {
        "text": "The example of text that was written in <b>one of popular</b> text editors."
      }
    },
    {
      "type": "header",
      "data": {
        "text": "With the header of course",
        "level": 2
      }
    },
    {
      "type": "paragraph",
      "data": {
        "text": "So what do we have?"
      }
    }
  ],
  "version": "2.8.1"
}
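For completeness, this is roughly how such a payload can be loaded into EditorJS and read back after the user edits it. The holder id and tool setup below are assumptions for illustration:

// editor.ts (illustrative sketch)
import EditorJS, { OutputData } from '@editorjs/editorjs'
import Header from '@editorjs/header'

// Create an editor pre-filled with the generated blocks.
export function createEditor(data: OutputData) {
  return new EditorJS({
    holder: 'editor',          // id of the container element in the DOM
    tools: { header: Header }, // enable the header block used by our content
    data,
  })
}

// Later, the user's edits can be read back as clean JSON:
// const savedData = await editor.save()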
streamObject: The Server-Side Missing Piece
Let’s consider the following function:
// StreamObject.tsx
import { openai as vercelAi } from '@ai-sdk/openai'
import { streamObject } from 'ai'

const result = await streamObject({
  model: vercelAi('gpt-4-turbo'),
  schema: EditorBlocksSchema,
  system: SYSTEM_CONTEXT(detectedLanguage),
  prompt: prompt,
  maxTokens: MAX_TOKENS,
})
This function handles the server-side processing by taking five attributes:
- LLM Model to Use: Specifies which language model will generate the content.
- System Reference: Provides contextual setup for the prompt.
- Prompt: The user’s input defining the product details.
- Max Tokens: Caps the maximum length of the generated response, helping to keep costs and latency under control.
- Schema: Defines the structure of the response using Zod for validation to ensure compatibility with EditorJS’s data format.
// stream-schema.tsx
import { z } from 'zod'

const EditorBlock = z
  .object({
    id: z.string(),
    type: z.enum(['paragraph', 'header']),
    data: z.object({
      text: z.string(),
      level: z.number().optional(),
    }),
  })
  .refine(
    (block) => {
      if (block.type === 'header') {
        return block.data.level === 1 || block.data.level === 2
      }
      return true
    },
    {
      message: 'Header level must be 1 or 2',
      path: ['data', 'level'],
    },
  )

// Exported so the API route and the useObject hook can share the same schema.
export const EditorBlocksSchema = z.object({
  blocks: z.array(EditorBlock),
})
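The SYSTEM_CONTEXT and MAX_TOKENS values referenced above are not shown in the article; a minimal sketch of what they might look like follows (the wording and token limit are assumptions):

// system-context.ts (hypothetical values)
export const MAX_TOKENS = 1024

// Instructs the model to write in the detected language and stick to EditorJS-style blocks.
export const SYSTEM_CONTEXT = (detectedLanguage: string) => `
You are a copywriter for an e-commerce store.
Write the product description in ${detectedLanguage}.
Respond only with paragraph and header blocks (header level 1 or 2).
`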
Most of the server-side challenges in the current implementation are solved by streamObject: structuring the API’s response and prompting the model accordingly. The front-end side is still missing: how can the user submit the prompt, stop the streaming, and get a loading state?
useObject Hook: The Answer to Front-End Control
The useObject hook consumes streamed JSON data from an API and parses it into a complete object based on a predefined schema. This enables real-time loading control by handling state as JSON data streams in chunks. The hook provides several attributes:
- isLoading: Monitors loading states.
- object: Represents the current object state.
- stop: Cancels ongoing requests mid-stream.
This offers a high degree of control over data processing and enhances the user experience. We will use the isLoading state from useObject to conditionally render a stop button, which will call the stop callback to halt the content stream before the LLM completes its task.
// use-object.ts
import { experimental_useObject as useObject } from 'ai/react'
const { object, submit, isLoading, stop } = useObject({
api: '/api/generate-content',
schema: EditorBlocksSchema,
})
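Putting the hook to work might look like the sketch below. The form markup and the example prompt are illustrative; the endpoint and schema are the ones defined earlier, and the import path for the schema is an assumption:

// generator-form.tsx (illustrative sketch)
'use client'

import { experimental_useObject as useObject } from 'ai/react'
import { EditorBlocksSchema } from './stream-schema' // assumed local path

export default function GeneratorForm() {
  const { object, submit, isLoading, stop } = useObject({
    api: '/api/generate-content',
    schema: EditorBlocksSchema,
  })

  return (
    <div>
      {/* Send the user's prompt to the streaming endpoint. */}
      <button onClick={() => submit({ prompt: 'A waterproof hiking backpack' })}>
        Generate
      </button>

      {/* While streaming, show a stop button that cancels the request mid-stream. */}
      {isLoading && (
        <button type="button" onClick={stop}>
          Stop
        </button>
      )}

      {/* object grows as JSON chunks arrive; hand it to a block renderer (next section). */}
      <pre>{JSON.stringify(object, null, 2)}</pre>
    </div>
  )
}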
MDX Markup: Streaming Content into the UI
The object variable returned by useObject is passed to another component, which renders the text stream into the UI using MDX markup on an intermediate screen between the initial form and the editor. JSON data is converted to MDX markup through a block renderer function, as shown below:
// block-renderer.tsx
import { OutputBlockData } from "@editorjs/editorjs";
import React from "react";

const BlockRenderer = ({ blocks }: { blocks: OutputBlockData[] }) => {
  return (
    <div className="prose">
      {blocks.map((block) => {
        switch (block.type) {
          case "header":
            return (
              <Header
                key={block.id}
                level={block.data.level!}
                text={block.data.text}
              />
            );
          case "paragraph":
            return <Paragraph key={block.id} text={block.data.text} />;
          default:
            return null;
        }
      })}
    </div>
  );
};

export default BlockRenderer;

const Header = ({ level, text }: { level: number; text: string }) => {
  const Tag = `h${level}` as keyof JSX.IntrinsicElements;
  return React.createElement(Tag, { className: "" }, text);
};

const Paragraph = ({ text }: { text: string }) => {
  return <p className="ce-paragraph cdx-block">{text}</p>;
};
This component takes the streamed JSON data and renders it as structured markup, enabling the effortless rendering of the generated content within the UI.
Image Caption Generation
Beyond text, the application extends its capabilities to generate captions for images using the OpenAI Vision API, providing detailed and engaging descriptions for products within the web app. The process involves two main steps:
- Image Upload: The image is uploaded to a blob storage service. In our case, we will use Vercel’s Blob.
- Caption Generation: The uploaded image is then sent to OpenAI’s Vision API, which analyzes the content and generates a caption tailored to highlight the product’s key features.
// generate-image-caption.ts
import { openai as vercelAi } from "@ai-sdk/openai";
import { streamObject } from "ai";
import { NextResponse } from "next/server";
// EditorBlocksSchema, SYSTEM_CONTEXT, and MAX_TOKENS come from the app's own modules.

export async function POST(req: Request) {
  try {
    // The prompt and detected language are read from the request body (shape assumed).
    const { prompt, detectedLanguage } = await req.json();

    const result = await streamObject({
      model: vercelAi("gpt-4-turbo"),
      schema: EditorBlocksSchema,
      system: SYSTEM_CONTEXT(detectedLanguage),
      prompt: prompt,
      maxTokens: MAX_TOKENS,
    });

    return result.toTextStreamResponse();
  } catch (error) {
    console.error("Error in POST /api/generate-content:", error);
    return NextResponse.json(
      { error: "Internal Server Error" },
      { status: 500 }
    );
  }
}
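The image-upload step mentioned above is not shown in the article. Under the assumption that Vercel Blob and a vision-capable model are used as described, a minimal sketch could look like this (the file naming, prompt, and helper name are illustrative):

// upload-and-caption.ts (illustrative sketch)
import { put } from '@vercel/blob'
import { openai as vercelAi } from '@ai-sdk/openai'
import { generateText } from 'ai'

export async function generateImageCaption(file: File) {
  // 1. Upload the image to Vercel Blob and obtain a public URL.
  const blob = await put(`products/${file.name}`, file, { access: 'public' })

  // 2. Send the image URL to a vision-capable model and ask for a product caption.
  const { text } = await generateText({
    model: vercelAi('gpt-4-turbo'),
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Write a short, engaging caption highlighting this product.' },
          { type: 'image', image: new URL(blob.url) },
        ],
      },
    ],
  })

  return text
}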
Conclusion
By harnessing powerful AI models such as OpenAI’s GPT-4, o1-mini, and o1-preview, developers can create dynamic, interactive, and highly personalized user experiences. This efficient integration not only streamlines content creation but also enhances user engagement. What’s truly novel is its ability to stream structured data in real-time, delivering organized content that’s ready for immediate integration.
However, it introduces a new challenge for front-end developers: determining what to display to users while content is being generated. As Guillermo Rauch, CEO of Vercel, explains in an interview available on Spotify:
When it comes to LLMs, content generation can take up to 20, 30, or even 60 seconds, which presents unique challenges for front-end developers. Streaming provides a valuable solution, keeping users informed and in control throughout the waiting period and enhancing the overall experience.
As AI continues to evolve, the possibilities for enhancing web applications with intelligent content generation are boundless, paving the way for more innovative and user-centric digital experiences.