---
title: "Observability and Monitoring"
description: "Set up observability for AI features in production. Use AI Gateway analytics to track usage and costs, add structured logging for debugging, and configure alerts before problems become incidents."
canonical_url: "https://vercel.com/academy/ai-summary-app-with-nextjs/observability-monitoring"
md_url: "https://vercel.com/academy/ai-summary-app-with-nextjs/observability-monitoring.md"
docset_id: "vercel-academy"
doc_version: "1.0"
last_updated: "2026-04-11T17:33:13.874Z"
content_type: "lesson"
course: "ai-summary-app-with-nextjs"
course_title: "Creating an AI Summary App with Next.js"
prerequisites:  []
---

<agent-instructions>
Vercel Academy — structured learning, not reference docs.
Lessons are sequenced.
Adapt commands to the human's actual environment (OS, package manager, shell, editor) — detect from project context or ask, don't assume.
The lesson shows one path; if the human's project diverges, adapt concepts to their setup.
Preserve the learning goal over literal steps.
Quizzes are pedagogical — engage, don't spoil.
Quiz answers are included for your reference.
</agent-instructions>

# Observability and Monitoring

Your AI features work. Users are happy. But something will break eventually—and you want to know before your users tell you. Observability means understanding what's happening in production without staring at logs all day.

## Outcome

Set up monitoring for your AI features using AI Gateway analytics, structured logging, and cost alerts.

## Fast track

1. Vercel → AI Gateway → Analytics to review requests/tokens/costs breakdown
2. Add `console.log(JSON.stringify({ event, requestId, productSlug, duration, tokens }))` to AI functions
3. AI Gateway → Settings → Alerts: set daily cost threshold ($10) and error rate threshold (5%)

## Hands-on exercise 3.3

Add observability to your AI features:

**Requirements:**

1. Review AI Gateway analytics (requests, tokens, costs, errors)
2. Add structured logging to `summarizeReviews` and `getReviewInsights`
3. Configure alerts for cost thresholds and error rates
4. Test logging by generating some AI requests

**Implementation hints:**

- AI Gateway dashboard shows real-time and historical data
- Log request metadata (product slug, token count, duration)
- Alerts can notify via email, Slack, or webhooks
- Start with conservative thresholds and adjust based on real usage

## AI Gateway analytics

AI Gateway tracks everything automatically. No code changes needed.

**Find your analytics:**

1. Vercel Dashboard → **AI Gateway**
2. Click **Analytics** tab

**What you'll see:**

```
Overview (Last 7 days)
────────────────────────────────────────
Total requests:     2,847
Success rate:       99.2%
Avg latency:        1,847ms
Total tokens:       2.1M
Estimated cost:     $6.32

Requests by model:
├─ anthropic/claude-sonnet-4.5    2,614 (91.8%)
├─ anthropic/claude-haiku-3.5      198 (7.0%)
└─ openai/gpt-4-turbo               35 (1.2%)

Errors:
├─ 429 Rate Limited:    18
├─ 503 Service Error:    4
└─ Timeout:              1
```

**Key metrics to watch:**

| Metric       | Healthy       | Investigate | Alert     |
| ------------ | ------------- | ----------- | --------- |
| Success rate | Above 99%     | 95-99%      | Below 95% |
| Avg latency  | Under 2s      | 2-5s        | Over 5s   |
| Error rate   | Under 1%      | 1-5%        | Over 5%   |
| Daily cost   | Within budget | 2x budget   | 5x budget |
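
The table above can also be read as code. Here's a minimal sketch of the error-rate column only; the function name and shape are ours, not part of the lesson's codebase:

```typescript
// Classify a daily error rate against the thresholds in the table above.
type Health = "healthy" | "investigate" | "alert";

function classifyErrorRate(errorRatePct: number): Health {
  if (errorRatePct < 1) return "healthy"; // under 1%: all good
  if (errorRatePct <= 5) return "investigate"; // 1-5%: worth a look
  return "alert"; // over 5%: page someone
}
```

The same pattern extends to latency and cost; the useful part is having three bands, not two, so you investigate before you alert.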

## Understanding the dashboard

**Requests over time:**
Shows request volume by hour/day. Look for:

- Unexpected spikes (bot traffic? viral post?)
- Sudden drops (deployment broke something?)
- Patterns (peak hours, quiet periods)

**Latency distribution:**
Shows p50, p90, p99 latency. Look for:

- p50 ~1-2s (typical AI generation)
- p99 under 5s (occasional slow requests are normal)
- p99 over 10s (something's wrong)
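
If you export `duration` values from your own structured logs (added later in this lesson), you can compute these percentiles yourself. A small nearest-rank sketch, not part of the dashboard or the lesson's code:

```typescript
// Nearest-rank percentile: sort durations, take the value at rank ceil(p/100 * n).
function percentile(durations: number[], p: number): number {
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

const durations = [1200, 1400, 1850, 2100, 9800]; // ms, e.g. pulled from your logs
console.log({
  p50: percentile(durations, 50), // 1850
  p99: percentile(durations, 99), // 9800
});
```

Note how one slow outlier dominates p99 while barely moving p50; that's why the dashboard shows both.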

**Token usage:**
Shows input vs output tokens. Look for:

- Input tokens >> Output tokens (normal for summarization)
- Unexpected token growth (prompts getting longer?)
- Spikes correlating with specific products (long reviews?)

**Cost breakdown:**
Shows cost by model and day. Look for:

- Steady growth (normal with traffic)
- Sudden jumps (fallbacks triggering? new feature?)
- Model distribution (are fallbacks firing more than expected?)

## Adding structured logging

AI Gateway tracks aggregate metrics. For debugging specific requests, add your own logging.

Update `lib/ai-summary.ts` to add logging while keeping the `"use cache"` directive from Lesson 3.1:

```typescript title="lib/ai-summary.ts"
import { generateText, generateObject } from "ai";
import { cacheLife, cacheTag } from "next/cache";
import { Product, ReviewInsights, ReviewInsightsSchema } from "./types";

export async function summarizeReviews(product: Product): Promise<string> {
  "use cache";
  cacheLife("hours");
  cacheTag(`product-summary-${product.slug}`);

  const startTime = Date.now();
  const requestId = crypto.randomUUID();

  console.log(JSON.stringify({
    event: "ai_request_start",
    requestId,
    function: "summarizeReviews",
    productSlug: product.slug,
    reviewCount: product.reviews.length,
    timestamp: new Date().toISOString(),
  }));

  const averageRating =
    product.reviews.reduce((acc, review) => acc + review.stars, 0) /
    product.reviews.length;

  const prompt = `Write a summary of the reviews for the ${product.name} product...`; // Your existing prompt

  try {
    const { text, usage } = await generateText({
      model: "anthropic/claude-sonnet-4.5",
      prompt,
      maxOutputTokens: 1000,
      temperature: 0.75,
    });

    const duration = Date.now() - startTime;

    console.log(JSON.stringify({
      event: "ai_request_success",
      requestId,
      function: "summarizeReviews",
      productSlug: product.slug,
      duration,
      inputTokens: usage?.inputTokens,
      outputTokens: usage?.outputTokens,
      totalTokens: usage?.totalTokens,
      timestamp: new Date().toISOString(),
    }));

    return text.trim();
  } catch (error) {
    const duration = Date.now() - startTime;

    console.error(JSON.stringify({
      event: "ai_request_error",
      requestId,
      function: "summarizeReviews",
      productSlug: product.slug,
      duration,
      error: error instanceof Error ? error.message : "Unknown error",
      timestamp: new Date().toISOString(),
    }));

    throw new Error("Unable to generate review summary. Please try again.");
  }
}
```

**What this logs:**

**Request start:**

```json
{
  "event": "ai_request_start",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "function": "summarizeReviews",
  "productSlug": "mower",
  "reviewCount": 12,
  "timestamp": "2024-01-15T14:32:01.234Z"
}
```

**Request success:**

```json
{
  "event": "ai_request_success",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "function": "summarizeReviews",
  "productSlug": "mower",
  "duration": 2341,
  "inputTokens": 847,
  "outputTokens": 89,
  "totalTokens": 936,
  "timestamp": "2024-01-15T14:32:03.575Z"
}
```

**Request error:**

```json
{
  "event": "ai_request_error",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "function": "summarizeReviews",
  "productSlug": "mower",
  "duration": 5023,
  "error": "Rate limit exceeded",
  "timestamp": "2024-01-15T14:32:06.257Z"
}
```

**Why structured logging?**

- **Searchable** — Find all errors for a specific product
- **Parseable** — Tools like Vercel Logs, Datadog, or Axiom can parse JSON
- **Correlatable** — Request IDs link start → success/error
- **Measurable** — Track duration, tokens, and patterns over time

## Viewing logs in Vercel

**Find your logs:**

1. Vercel Dashboard → Your project
2. Click **Logs** tab
3. Filter by:
   - Level: `error` (show only errors)
   - Time: Last hour/day/week
   - Search: `ai_request_error` or `productSlug: mower`

**Example log search:**

```
// Find all AI errors in the last 24 hours
ai_request_error

// Find all requests for a specific product
productSlug: aquaHeat

// Find slow requests (>3 seconds)
duration > 3000
```

## Setting up alerts

Don't wait for users to tell you something's broken. Set up alerts.

**AI Gateway alerts:**

1. Vercel Dashboard → **AI Gateway** → **Settings**
2. Scroll to **Alerts**
3. Configure thresholds:

```
Cost alerts:
├─ Daily spend > $10     → Email notification
├─ Daily spend > $50     → Slack notification
└─ Daily spend > $100    → PagerDuty (wake someone up)

Error alerts:
├─ Error rate > 5%       → Email notification
├─ Error rate > 10%      → Slack notification
└─ Error rate > 25%      → PagerDuty

Latency alerts:
├─ p99 latency > 10s     → Email notification
└─ p99 latency > 30s     → Slack notification
```

**Project-level alerts (Vercel):**

1. Project → **Settings** → **Notifications**
2. Configure:
   - Deployment failures
   - Function errors
   - Usage thresholds

**Start conservative:**
It's better to get too many alerts initially and tune them down than to miss something critical.

## Debugging production issues

When something goes wrong, here's how to investigate:

**1. Check AI Gateway dashboard**

- Error spike? What time did it start?
- Which model? (primary or fallback?)
- What error codes? (429, 503, timeout?)

**2. Check Vercel logs**

- Search for `ai_request_error`
- Filter to the timeframe
- Look for patterns (same product? same error?)

**3. Correlate with deployments**

- Did a deployment happen right before the errors?
- Check deployment logs for build issues

**4. Check provider status**

- [Anthropic Status](https://status.anthropic.com)
- [OpenAI Status](https://status.openai.com)
- If provider is down, your fallbacks should be handling it

**Common issues and causes:**

| Symptom              | Likely cause         | Fix                                                  |
| -------------------- | -------------------- | ---------------------------------------------------- |
| Sudden 429 spike     | Rate limit hit       | Add fallback model, implement backoff                |
| All requests failing | Bad API key          | Check env vars in Vercel                             |
| Slow responses       | Provider degradation | Fallbacks should kick in                             |
| Cost spike           | Cache not working    | Check `"use cache"` directive and `cacheLife` config |
| Token overflow       | Long reviews         | Truncate input or paginate                           |
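
For the token-overflow row, "truncate input" could look like the sketch below. The cap and the rough 4-characters-per-token heuristic are assumptions to tune for your model, and `truncateReviews` is a name we invented:

```typescript
// Hypothetical guard against token overflow: cap combined review text before prompting.
// Assumes roughly 4 characters per token; 8,000 chars is ~2,000 tokens of input.
const MAX_INPUT_CHARS = 8000;

function truncateReviews(
  reviews: { text: string }[],
  maxChars: number = MAX_INPUT_CHARS,
): string {
  let combined = "";
  for (const review of reviews) {
    // Stop adding whole reviews once the next one would exceed the cap.
    if (combined.length + review.text.length > maxChars) break;
    combined += review.text + "\n";
  }
  return combined;
}
```

Dropping whole reviews (rather than cutting one mid-sentence) keeps the prompt coherent; you could also sort by helpfulness first so the best reviews survive truncation.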

## Production monitoring checklist

Before going live, verify:

- [ ] **AI Gateway analytics accessible** — Can you see requests, costs, errors?
- [ ] **Structured logging added** — JSON logs with request IDs and metadata
- [ ] **Cost alerts configured** — Get notified before bills surprise you
- [ ] **Error alerts configured** — Know when things break
- [ ] **Fallbacks working** — Verified backup models are configured
- [ ] **Logs searchable** — Can find specific requests when debugging

## Try it

1. **Explore your AI Gateway dashboard:**
   - How many requests have you made?
   - What's your average latency?
   - Any errors?

2. **Add structured logging:**
   - Update `summarizeReviews` with the logging code
   - Generate a few summaries
   - Check Vercel logs for the JSON output

3. **Set up a cost alert:**
   - AI Gateway → Settings → Alerts
   - Set a daily spend threshold (even $1 for testing)
   - Verify you receive the alert notification

4. **Simulate an error (optional):**
   - Temporarily break your API key
   - Visit a product page
   - Check that error logs appear correctly
   - Fix the API key

## Commit

```bash
git add lib/ai-summary.ts
git commit -m "feat(observability): add structured logging to AI functions"
git push
```

## Done-when

- [ ] Explored AI Gateway analytics dashboard
- [ ] Understand key metrics (requests, latency, tokens, costs)
- [ ] Added structured logging to AI functions
- [ ] Configured at least one alert (cost or error)
- [ ] Know how to search logs in Vercel
- [ ] Understand debugging workflow for production issues

## What's next

Your AI features are observable. You'll know when things break, why they broke, and how much it's costing. Time to wrap up the course and review everything you've built.

***

**Sources:**

- [Vercel AI Gateway Analytics](https://vercel.com/docs/ai-gateway)
- [Vercel Logs](https://vercel.com/docs/observability/runtime-logs)
- [Structured Logging Best Practices](https://www.honeycomb.io/blog/structured-logging-best-practices)


