---
title: "Text Classification"
description: "Use `generateText` with `Output.array()` and Zod schemas for reliable text classification. Build a tool to automatically categorize user feedback or content."
canonical_url: "https://vercel.com/academy/ai-sdk/text-classification"
md_url: "https://vercel.com/academy/ai-sdk/text-classification.md"
docset_id: "vercel-academy"
doc_version: "1.0"
last_updated: "2026-04-11T14:17:37.149Z"
content_type: "lesson"
course: "ai-sdk"
course_title: "Builders Guide to the AI SDK"
prerequisites:  []
---

<agent-instructions>
Vercel Academy — structured learning, not reference docs.
Lessons are sequenced.
Adapt commands to the human's actual environment (OS, package manager, shell, editor) — detect from project context or ask, don't assume.
The lesson shows one path; if the human's project diverges, adapt concepts to their setup.
Preserve the learning goal over literal steps.
Quizzes are pedagogical — engage, don't spoil.
Quiz answers are included for your reference.
</agent-instructions>

# Text Classification

Now that you understand why Invisible AI matters, you'll put it into practice with **Classification** - turning messy, unstructured text into clean, categorized data. You will use the AI SDK's `generateText` with `Output.array()` and Zod schemas to classify unstructured text into predefined categories.

**Note: Project Context**

This lesson continues with the same codebase from [Lesson 1.4](./ai-sdk-dev-setup). For this section, you'll find the classification example files in the `app/(2-classification)/` directory.

## The Problem: Data Chaos

Imagine getting flooded with user feedback, support tickets, or GitHub issues. It's a goldmine of information, but it's also a messy, constant firehose. Manually reading and categorizing everything is slow, tedious, and doesn't scale.

- **Support Tickets:** Is it a billing question? A bug report? A feature request?
- **User Feedback:** Positive? Negative? A specific feature suggestion?
- **GitHub Issues:** Bug? Feature? Docs issue? Needs triage?

This is where using LLMs to classify shines. We can teach an LLM our desired categories and have it automatically sort incoming text.

## generateText with Output

The AI SDK provides two approaches for working with LLM outputs:

**generateText (text only):** Text Input → generateText → Unstructured Text

**generateText + Output.object/array:** Text Input + Zod Schema → generateText → Validation → Typed JSON

Take a look at the `support_requests.json` file from our project (`app/(2-classification)/support_requests.json`). It contains typical user messages like:

```json title="app/(2-classification)/support_requests.json"
[
  {
    "id": 1,
    "text": "I'm having trouble logging into my account. Can you please assist?"
  },
  {
    "id": 2,
    "text": "The export feature isn't working correctly. Is there a known issue?"
  }
  // ... more requests
]
```

Our goal is to automatically assign a category (like *account issues* or *product issues*) to each request.

## The Solution: `generateText` with `Output.array()`

The AI SDK's `generateText` function can produce structured output when you provide an `output` specification. For classifying multiple items, we use `Output.array()` which tells the model to return an array of typed objects.

To make this work reliably, we need to tell the LLM exactly what we want to generate. That's where Zod comes in.

[Zod](https://zod.dev/) is a TypeScript schema definition and validation library that gives you a powerful, relatively simple way to describe the shape of data you expect in the LLM's output. It's like TypeScript types, but with runtime validation - perfect for ensuring AI outputs match your expectations.

Here's a short example of a Zod schema:

```typescript
import { z } from 'zod';

const exampleSchema = z.object({
  firstName: z.string(),
  lastName: z.string(),
});
```

This schema describes the shape of a name object and the type of data expected for its properties. Zod provides many different data types and structures as well as the ability to define custom types.
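Runtime validation is the key difference from plain TypeScript types. To see what Zod is doing under the hood, here's a hand-rolled sketch of the equivalent check — Zod replaces all of this boilerplate and adds rich error reporting:

```typescript
// TypeScript types disappear at runtime, so untrusted data (like LLM output)
// needs an explicit check before you can safely treat it as typed.
interface Name {
  firstName: string;
  lastName: string;
}

// Hand-rolled equivalent of exampleSchema.parse(data):
// returns a typed object or throws on invalid input.
function parseName(data: unknown): Name {
  const obj = data as Record<string, unknown>;
  if (
    typeof data === 'object' &&
    data !== null &&
    typeof obj.firstName === 'string' &&
    typeof obj.lastName === 'string'
  ) {
    return { firstName: obj.firstName, lastName: obj.lastName };
  }
  throw new Error('Invalid name object');
}
```

With Zod, this whole function collapses to `exampleSchema.parse(data)` — and you get detailed error messages for free.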

With Zod you can define a schema that includes the request text and the available categories.

## Step 1: Define the Schema with `z.enum`

Open `app/(2-classification)/classification.ts`. The file already has imports set up and TODOs to guide you.

The first step is to define our categories and the structure we want for each classified item using Zod. The `z.enum()` function is ideal for defining a fixed set of possible categories.

Replace the first TODO with this schema definition:

```typescript title="app/(2-classification)/classification.ts"
// Define the schema for a single classified request
const classificationSchema = z.object({
  request: z.string().describe('The original support request text.'),
  category: z
    .enum([
      'billing',
      'product_issues',
      'enterprise_sales',
      'account_issues',
      'product_feedback',
    ])
    .describe('The most relevant category for the support request.'),
});
```

Now replace the remaining TODOs with the generateText implementation:

```typescript
import { generateText, Output } from 'ai';

// Use generateText with Output.array() to get structured output
const { output: classifiedRequests } = await generateText({
  // Fast, low-cost model ideal for classification tasks;
  // for nuanced edge cases, consider 'openai/gpt-5' (a reasoning model)
  model: 'openai/gpt-5-mini',
  // Prompt combines instruction + stringified data
  prompt: `Classify the following support requests based on the defined categories.\n\n${JSON.stringify(supportRequests)}`,
  // Output.array() tells the SDK we expect an array of objects matching our schema
  output: Output.array({
    element: classificationSchema,
  }),
});

console.log('\n--- AI Response (Structured JSON) ---');
// Output the validated, structured array
console.log(JSON.stringify(classifiedRequests, null, 2));
console.log('-----------------------------------');
```

Code Breakdown:

- **Imports**: Import `generateText` and `Output` from `ai`, plus `Zod` and our JSON data.
- **Schema**: Defines the structure for one classified item using `z.object` and `z.enum`.
- **Prompt**: Clear instructions plus the raw data for context.
- **generateText Call**: Uses `model`, `prompt`, and `output: Output.array({ element: schema })`.
- **Output**: Destructures the validated array from the result's `output` property.

## Step 2: Run the Script

Time to see it in action! In your terminal (project root):

```bash
pnpm classification
```

You should get a clean JSON array like this:

```typescript
// Terminal Output (Example)
[
  {
    request:
      "I'm having trouble logging into my account. Can you please assist?",
    category: 'account_issues',
  },
  {
    request:
      "The export feature isn't working correctly. Is there a known issue?",
    category: 'product_issues',
  },
  {
    request: 'I need help integrating your API with our existing system.',
    category: 'product_issues', // Note: Model might choose this or another category
  },
  // ... other requests classified ...
];
```

Success! Structured, usable data instead of messy text. Now you can route billing questions to finance, bugs to engineering, and feature requests to product automatically.

That's classification power.
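With a typed array in hand, that routing is ordinary application code. Here's a minimal sketch — the team names and the `routeRequests` helper are illustrative, not part of the lesson's files:

```typescript
type Category =
  | 'billing'
  | 'product_issues'
  | 'enterprise_sales'
  | 'account_issues'
  | 'product_feedback';

interface ClassifiedRequest {
  request: string;
  category: Category;
}

// Map each category to a destination team (hypothetical names).
const ROUTES: Record<Category, string> = {
  billing: 'finance',
  product_issues: 'engineering',
  enterprise_sales: 'sales',
  account_issues: 'support',
  product_feedback: 'product',
};

// Group classified requests into per-team queues.
function routeRequests(requests: ClassifiedRequest[]): Map<string, string[]> {
  const queues = new Map<string, string[]>();
  for (const { request, category } of requests) {
    const team = ROUTES[category];
    const queue = queues.get(team) ?? [];
    queue.push(request);
    queues.set(team, queue);
  }
  return queues;
}
```

Because the schema guarantees every `category` is one of your enum values, the `ROUTES` lookup can never miss.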

## Step 3: Iteration 1 — Adding Urgency

Let's make this even more useful. What if we wanted the AI to estimate the urgency of each request? Easy! Just add it to the schema:

```typescript title="app/(2-classification)/classification.ts"
// Update the schema definition
const classificationSchema = z.object({
  request: z.string().describe('The original support request text.'),
  category: z
    .enum([
      'billing',
      'product_issues',
      'enterprise_sales',
      'account_issues',
      'product_feedback',
    ])
    .describe('The most relevant category for the support request.'),
  urgency: z
    .enum(['low', 'medium', 'high'])
    .describe('The probable urgency of the support request.'),
});

// ... rest of the main function remains the same ...
```

Run `pnpm classification` again. You'll now see the urgency field added to each object, with the AI making its best guess (e.g., "high" for API integration help, "medium" for the export feature issue).
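Once urgency is part of each object, prioritizing the queue is plain array code. A small sketch — the numeric ranks and the `sortByUrgency` helper are assumptions for illustration:

```typescript
type Urgency = 'low' | 'medium' | 'high';

interface TriagedRequest {
  request: string;
  urgency: Urgency;
}

// Higher rank = handle first.
const URGENCY_RANK: Record<Urgency, number> = { low: 0, medium: 1, high: 2 };

// Return a new array sorted from most to least urgent.
function sortByUrgency(requests: TriagedRequest[]): TriagedRequest[] {
  return [...requests].sort(
    (a, b) => URGENCY_RANK[b.urgency] - URGENCY_RANK[a.urgency],
  );
}
```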

## Step 4: Iteration 2 — Handling Multi-Language & Refining with `.describe()`

Now, let's throw a curveball: `support_requests_multilanguage.json`. This file has requests in Spanish, German, Chinese, etc.

Can your setup handle it?

### Challenge:

Modify `classification.ts`:

```typescript title="app/(2-classification)/classification.ts"
import supportRequests from './support_requests_multilanguage.json';
import { z } from 'zod';
import { generateText, Output } from 'ai';

// Define the schema for a single classified request
const classificationSchema = z.object({
  request: z.string().describe('The original support request text.'),
  category: z
    .enum([
      'billing',
      'product_issues',
      'enterprise_sales',
      'account_issues',
      'product_feedback',
    ])
    .describe('The most relevant category for the support request.'),
  urgency: z
    .enum(['low', 'medium', 'high'])
    .describe('The probable urgency of the support request.'),
  language: z.string(),
});

// ... rest of the main function remains the same ...
```

- Change the import: `import supportRequests from './support_requests_multilanguage.json';`
- Add `language: z.string()` to the classificationSchema.
- Run `pnpm classification`.

You'll see the AI detects the languages, but it may give you codes ("ES") when we want full names ("Spanish"). This needs more precise instructions. You might reach for the prompt itself, but instead we'll update the schema to better indicate what data is expected.

Solution: Use `.describe()` to prompt the model for exactly what you want for any specific key!

Update the language field in your schema to include a description of what is expected in the field:

```typescript
// Inside classificationSchema
  language: z.string().describe("The full name of the language the support request is in (e.g., English, Spanish, German)."),
```

Run the script one more time. You should now see clean, full language names, thanks to the more specific schema instructions.

**Note: What if the AI gets it wrong?**

The AI SDK uses your Zod schema to *validate* the LLM's output. If the model
returns a category not in your `z.enum` list (e.g., `"sales_inquiry"` instead
of `"enterprise_sales"`) or fails other schema rules, `generateText` with `Output`
will throw a validation error. This prevents unexpected data structures from
breaking your application. You might need to refine your prompt, schema
descriptions, or use a more capable model if validation fails often.
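One pragmatic way to cope with occasional validation failures is a small retry wrapper around the call. This sketch is generic — you could wrap your `generateText` call in it; the helper name and default attempt count are choices here, not an SDK feature:

```typescript
// Retry an async operation a few times before giving up.
// Useful around a generateText call whose schema validation
// occasionally fails: a second attempt often succeeds.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error; // e.g., a schema validation error — try again
    }
  }
  throw lastError;
}
```

Usage would look like `const { output } = await withRetries(() => generateText({ ... }))`. Keep the attempt count low — repeated failures usually mean the prompt or schema needs refining, not more retries.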

Iteration is the name of the game. Add fields to your schema incrementally. Use `.describe()` to fine-tune the output for specific fields when the default isn't perfect. This schema-driven approach keeps your AI interactions predictable and robust.

As you build more sophisticated classification systems, you'll encounter edge cases and ambiguous inputs. The next callout provides a structured approach to refining your schemas when basic descriptions aren't enough.

**Note: 💡 Refining Classification Accuracy**

Getting inconsistent or ambiguous categories? Try asking an AI assistant to help refine your approach:

```markdown title="Prompt: Improving Multi-Language Classification"
<context>
I'm building a support ticket classification system using Vercel AI SDK's generateText with Output.array() and Zod schemas.
My system classifies requests into categories: billing, product_issues, enterprise_sales, account_issues, product_feedback.
I've added multi-language support and urgency detection.
</context>

<current-implementation>
const classificationSchema = z.object({
  request: z.string().describe('The original support request text.'),
  category: z.enum(['billing', 'product_issues', 'enterprise_sales', 'account_issues', 'product_feedback'])
    .describe('The most relevant category for the support request.'),
  urgency: z.enum(['low', 'medium', 'high'])
    .describe('The probable urgency of the support request.'),
  language: z.string().describe("The full name of the language (e.g., English, Spanish, German)."),
});
</current-implementation>

<problem>
I'm seeing inconsistent results where:
1. Some ambiguous requests get categorized differently on repeated runs
2. Urgency detection seems overly cautious (marking everything as "medium")
3. Edge cases like billing issues that affect product access are unclear

How can I improve my schema and prompt to get more consistent, accurate classifications?
</problem>

<specific-questions>
1. Should I add confidence scoring to help identify ambiguous cases?
2. How can I use `.describe()` more effectively to guide urgency detection?
3. For edge cases spanning multiple categories, should I switch to multi-label classification?

Provide specific schema improvements and `.describe()` examples that address each issue.
</specific-questions>
```

This will help you understand strategies for improving classification accuracy and handling edge cases!

## Side Quests

**Side Quest: Multi-Label Classification Challenge**

**Note: 💡 Need Help with Multi-Label Schemas?**

Stuck on transforming your single-category enum to multi-label arrays? Try this:

```markdown title="Prompt: Enum to Array Transformation for Multi-Label"
<context>
I'm working on the Multi-Label Classification Challenge in the Vercel AI SDK course.
Currently, my schema uses a single category enum:

category: z.enum(['billing', 'product_issues', 'enterprise_sales', 'account_issues', 'product_feedback'])
  .describe('The most relevant category for the support request.')
</context>

<goal>
I need to transform this to allow multiple categories per request, since real support tickets often span multiple areas.
For example: "I can't access my premium features after my payment went through" is both billing AND product_issues.
</goal>

<question>
1. How do I change `z.enum()` to allow an array of multiple categories?
2. Should I use `z.array(z.enum(...))` - and if so, what does that actually mean?
3. Do I need to update the `.describe()` to guide the AI on when to assign multiple categories?
4. How can I prevent the AI from assigning too many categories (everything becomes "product_issues")?

Show me the transformed schema code and explain how to balance multiple categories without over-assignment.
</question>
```
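Whichever schema you land on, a post-processing guard can also keep multi-label output tidy. A sketch — the `normalizeLabels` helper and the cap of two labels are illustrative assumptions, not part of the challenge files:

```typescript
const CATEGORIES = [
  'billing',
  'product_issues',
  'enterprise_sales',
  'account_issues',
  'product_feedback',
] as const;
type Category = (typeof CATEGORIES)[number];

// Keep only known categories, drop duplicates, and cap the label count
// so one request can't end up tagged with everything.
function normalizeLabels(labels: string[], max = 2): Category[] {
  const known = labels.filter((l): l is Category =>
    (CATEGORIES as readonly string[]).includes(l),
  );
  return [...new Set(known)].slice(0, max);
}
```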

**Side Quest: Real-time Moderation System**

```typescript title="app/api/moderation/route.ts"
import { generateText, Output } from 'ai';
import { z } from 'zod';

export async function handleMessage(message: string) {
  // Classify, score, and route the incoming message
  const { output: classification } = await generateText({
    model: 'openai/gpt-5-mini',
    output: Output.object({
      schema: z.object({
        severity: z.enum(['safe', 'warning', 'critical']),
        categories: z.array(z.enum(['spam', 'violence', 'pii', 'other'])),
        confidence: z.number().min(0).max(1),
      }),
    }),
    prompt: `Classify this message: "${message}"`,
  });

  // Route based on severity (sendAlert is a placeholder —
  // implement alerting however fits your system)
  if (classification.severity === 'critical') {
    await sendAlert(message, classification);
  }

  return classification;
}
```

## Further Reading (Optional)

Enhance your schema-validation skills with these resources:

- [Zod Documentation](https://zod.dev/)\
  Complete reference covering parsing, transforms, custom refinements, and error handling.
- [Advanced Schema Validation Patterns](https://github.com/colinhacks/zod#schema-methods)\
  Cookbook-style examples for preprocessors, unions, discriminated unions, and more.

## Key Takeaways

- Classification uses AI to assign predefined categories to text.
- `generateText` with `Output.array()` and a Zod schema is the core AI SDK tool for this.
- Use `z.enum([...categories])` to define your classification labels.
- Use `Output.array({ element: schema })` to classify multiple items at once.
- Use `.describe()` on schema fields to guide the model's output format (like getting full language names).

This technique can be used to automate workflows like ticket routing, content moderation, and feedback analysis.

## Up Next: Summarization Essentials

You've successfully used `generateText` with structured output to classify text. Now, let's apply similar techniques to another powerful Invisible AI feature: summarization. In the next lesson you'll build a tool that creates concise summaries from longer text inputs, helping users quickly grasp key information. We'll also touch on displaying this neatly in your user interface (UI).


---

[Full course index](/academy/llms.txt) · [Sitemap](/academy/sitemap.md)
