---
title: "Error Handling"
description: "Handle errors in workflows using FatalError for permanent failures, RetryableError with retryAfter for transient failures, and getStepMetadata for attempt-aware backoff."
canonical_url: "https://vercel.com/academy/svelte-on-vercel/workflow-error-handling"
md_url: "https://vercel.com/academy/svelte-on-vercel/workflow-error-handling.md"
docset_id: "vercel-academy"
doc_version: "1.0"
last_updated: "2026-04-11T16:52:42.652Z"
content_type: "lesson"
course: "svelte-on-vercel"
course_title: "Svelte on Vercel"
prerequisites:  []
---

# Workflow Error Handling

Right now, every failure in the workflow looks the same. The weather API times out? Step fails, retries three times, gives up. An alert references a resort called `narnia`? Step fails, retries three times with the same bad ID, gives up. Three wasted retries on something that was never going to work.

The Workflow DevKit gives you two error classes to fix this. `FatalError` says "stop trying, this is broken forever." `RetryableError` says "try again, but wait a bit first." And `getStepMetadata()` tells you which attempt you're on, so you can back off gracefully instead of hammering a struggling API.

## Outcome

Add error classification to the `evaluateResort` step with `FatalError` for permanent failures, `RetryableError` with exponential backoff for transient failures, and `getStepMetadata()` for attempt-aware logic.

## Fast Track

1. Throw `FatalError` for invalid resort IDs (no point retrying)
2. Catch weather API failures and throw `RetryableError` with a `retryAfter` duration
3. Use `getStepMetadata().attempt` to calculate exponential backoff

## Three Kinds of Failure

```
Resort not found → FatalError
  "narnia doesn't exist. Stop trying."

Weather API timeout → RetryableError
  "Open-Meteo is slow right now. Try again in 5 seconds."

Unknown error → Default retry
  "Something unexpected happened. Retry with default timing."
```

| Error Type       | Behavior                                      | When to Use                                |
| ---------------- | --------------------------------------------- | ------------------------------------------ |
| `FatalError`     | Immediately fails the step, skips all retries | Bad data, missing resources, auth failures |
| `RetryableError` | Retries after the specified delay             | API timeouts, rate limits, 503 errors      |
| Unhandled error  | Retries with default timing (up to 3 times)   | Unexpected failures                        |

Without these classes, every error gets the same default retry behavior. That means three wasted retries on a resort that doesn't exist. With `FatalError`, the first failure is the last.
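
To make the table concrete, here is a toy simulation of a step runner. It is a sketch, not the Workflow DevKit's implementation: `FatalError` and `RetryableError` are local stand-ins for the classes exported by `workflow`, and `runStep`, `RunLog`, and the 1-second default delay are hypothetical names and values chosen for illustration.

```typescript
// Local stand-ins so the sketch runs without the `workflow` package.
class FatalError extends Error {}

class RetryableError extends Error {
  retryAfter: number;
  constructor(message: string, options: { retryAfter: number }) {
    super(message);
    this.retryAfter = options.retryAfter;
  }
}

interface RunLog {
  attempts: number;
  waitsMs: number[];
  outcome: 'completed' | 'failed';
}

// Hypothetical runner: records each delay instead of actually sleeping.
function runStep(
  step: (attempt: number) => void,
  maxRetries = 3,
  defaultDelayMs = 1000 // assumed value, for illustration only
): RunLog {
  const waitsMs: number[] = [];
  const maxAttempts = maxRetries + 1;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      step(attempt);
      return { attempts: attempt, waitsMs, outcome: 'completed' };
    } catch (error) {
      // Permanent failure: stop immediately, no retries
      if (error instanceof FatalError) {
        return { attempts: attempt, waitsMs, outcome: 'failed' };
      }
      // Out of attempts: give up without scheduling another wait
      if (attempt === maxAttempts) break;
      // Transient failure: honor retryAfter, otherwise use the default delay
      waitsMs.push(
        error instanceof RetryableError ? error.retryAfter : defaultDelayMs
      );
    }
  }
  return { attempts: maxAttempts, waitsMs, outcome: 'failed' };
}

const fatal = runStep(() => {
  throw new FatalError('Resort not found: narnia');
});

const transient = runStep((attempt) => {
  if (attempt < 3) {
    throw new RetryableError('Open-Meteo timed out', { retryAfter: 5000 });
  }
});

const unknown = runStep(() => {
  throw new Error('Something unexpected happened');
});
```

In this simulation, `fatal` fails after a single attempt with no waits, `transient` completes on the third attempt after two 5-second waits, and `unknown` uses all four attempts with the default delay between them.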

## Hands-on exercise 3.3

Add error handling to the `evaluateResort` step:

**Requirements:**

1. Import `FatalError`, `RetryableError`, and `getStepMetadata` from `workflow`
2. Throw `FatalError` when `getResort(resortId)` returns nothing (permanent failure)
3. Wrap `fetchWeather()` in try/catch and throw `RetryableError` on failure
4. Use `getStepMetadata().attempt` to calculate exponential backoff for the `retryAfter` option
5. Log the attempt number so you can track retries in server logs

**Implementation hints:**

- `FatalError` and `RetryableError` are imported from `workflow`
- `RetryableError` accepts an options object as its second argument, for example `{ retryAfter: '5s' }`. The `retryAfter` value can be a duration string, a number of milliseconds, or a `Date`
- `getStepMetadata()` returns an object that includes `attempt` and `stepId`. The `attempt` count starts at 1
- Exponential backoff formula: `Math.min(1000 * 2 ** (attempt - 1), 30000)` doubles the delay each attempt and caps it at 30 seconds
- You can set a custom retry limit on a step function: `evaluateResort.maxRetries = 5` (6 total attempts)
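
The backoff formula from the hints can be checked in isolation before wiring it into the step. This is a minimal sketch; `backoffMs` and its parameter names are hypothetical helpers, not part of the `workflow` package. Note that in TypeScript `^` is bitwise XOR (`2 ^ 3` evaluates to `1`), so the exponent is written with `**` or `Math.pow`.

```typescript
// Hypothetical helper: delay in milliseconds before the next attempt.
// `attempt` is 1-based, matching getStepMetadata().attempt.
function backoffMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

// Delays for attempts 1 through 6
const delays = [1, 2, 3, 4, 5, 6].map((attempt) => backoffMs(attempt));
console.log(delays); // [1000, 2000, 4000, 8000, 16000, 30000]
```

The sixth value would be 32000 without the cap, so `Math.min` clamps it to 30000.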

## Try It

1. **Test with a valid resort (should work as before):**

   ```bash
   curl -X POST http://localhost:5173/api/workflow \
     -H "Content-Type: application/json" \
     -d '{"alerts": [{"id": "a1", "resortId": "mammoth", "condition": {"type": "conditions", "match": "powder"}, "originalQuery": "test", "createdAt": "2025-01-01", "triggered": false}]}'
   ```

   No errors in the response. Workflow completes normally.

2. **Test with an invalid resort ID:**

   ```bash
   curl -X POST http://localhost:5173/api/workflow \
     -H "Content-Type: application/json" \
     -d '{"alerts": [{"id": "a1", "resortId": "narnia", "condition": {"type": "conditions", "match": "powder"}, "originalQuery": "test", "createdAt": "2025-01-01", "triggered": false}, {"id": "a2", "resortId": "steamboat", "condition": {"type": "conditions", "match": "powder"}, "originalQuery": "test", "createdAt": "2025-01-01", "triggered": false}]}'
   ```

   Run `npx workflow web` and open the workflow run. The `evaluateResort` step for `narnia` should show as immediately failed with no retries. The `steamboat` step should succeed normally.

3. **Check server logs:**

   ```
   [Evaluate] Fatal: Resort not found: narnia
   [Workflow] Round complete { round: 1, evaluated: 1, triggered: 0 }
   ```

   The fatal error is logged once. No retry attempts.

4. **Inspect in the dashboard:**

   ```bash
   npx workflow web
   ```

   Click into the workflow run. You should see:

   - `evaluateResort (narnia)`: failed, 0 retries, `FatalError`
   - `evaluateResort (steamboat)`: completed successfully

## Commit

```bash
git add -A
git commit -m "feat(workflow): add FatalError and RetryableError handling"
git push
```

## Done-When

- [ ] Invalid resort IDs throw `FatalError` and skip all retries
- [ ] Weather API failures throw `RetryableError` with a `retryAfter` duration
- [ ] `getStepMetadata().attempt` drives exponential backoff
- [ ] `npx workflow web` shows fatal stops and retry attempts
- [ ] Valid resorts still process successfully alongside failures

## Solution

```typescript title="workflows/evaluate-alerts.ts" {1,34-52}
import { sleep, FatalError, RetryableError, getStepMetadata } from 'workflow';
import type { Alert } from '$lib/schemas/alert';

interface EvaluateInput {
  alerts: Alert[];
  recheckCount?: number;
}

interface AlertResult {
  alertId: string;
  resortId: string;
  triggered: boolean;
}

export default async function evaluateAlerts(
  { alerts, recheckCount = 0 }: EvaluateInput
) {
  "use workflow";

  const alertsByResort = Object.groupBy(alerts, (a) => a.resortId);
  const resortIds = Object.keys(alertsByResort);

  const results = await Promise.all(
    resortIds.map((resortId) =>
      evaluateResort(resortId, alertsByResort[resortId]!)
    )
  );

  const allResults = results.flat();
  const triggered = allResults.filter((r) => r.triggered);

  console.log('[Workflow] Round complete', {
    round: recheckCount + 1,
    evaluated: allResults.length,
    triggered: triggered.length
  });

  if (triggered.length === 0 && recheckCount < 3) {
    await sleep('30m');
    return evaluateAlerts({ alerts, recheckCount: recheckCount + 1 });
  }

  return {
    results: allResults,
    rounds: recheckCount + 1,
    triggered: triggered.length
  };
}

async function evaluateResort(
  resortId: string,
  alerts: Alert[]
): Promise<AlertResult[]> {
  "use step";

  const { attempt } = getStepMetadata();
  const { getResort } = await import('$lib/data/resorts');
  const { fetchWeather } = await import('$lib/services/weather');
  const { evaluateCondition } = await import('$lib/services/alerts');

  // Permanent failure: resort doesn't exist
  const resort = getResort(resortId);
  if (!resort) {
    console.error(`[Evaluate] Fatal: Resort not found: ${resortId}`);
    throw new FatalError(`Resort not found: ${resortId}`);
  }

  // Transient failure: weather API might be down
  let weather;
  try {
    weather = await fetchWeather(resort);
  } catch (error) {
    const backoff = Math.min(1000 * Math.pow(2, attempt - 1), 30000);
    console.warn(
      `[Evaluate] Weather fetch failed for ${resort.name}, attempt ${attempt}`,
      error
    );
    throw new RetryableError(
      `Weather API failed for ${resort.name}`,
      { retryAfter: backoff }
    );
  }

  return alerts.map((alert) => ({
    alertId: alert.id,
    resortId,
    triggered: evaluateCondition(alert.condition, weather)
  }));
}
```

Three changes from lesson 3.2:

**`FatalError` for bad resort IDs.** If `getResort()` returns nothing, there's no resort to evaluate. `FatalError` stops the step immediately with zero retries. In the dashboard, you'll see it marked as a permanent failure.

**`RetryableError` for weather API failures.** The `fetchWeather()` call is wrapped in try/catch. When it fails, we throw `RetryableError` with a `retryAfter` value. The Workflow DevKit waits that long before the next attempt. Since `retryAfter` accepts milliseconds, we can pass the backoff calculation directly.

**Exponential backoff with `getStepMetadata().attempt`.** The attempt number starts at 1. The formula `1000 * 2 ** (attempt - 1)` gives delays of 1s, 2s, 4s, 8s, and 16s, and the 30s cap applies from the sixth attempt on. This prevents hammering a struggling API with rapid retries.

The workflow function itself doesn't change. It still uses `Promise.all` to dispatch parallel steps. `FatalError` and `RetryableError` only affect the individual step that threw them. Other steps continue independently.
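
One caveat about `Promise.all`: it rejects as soon as any input promise rejects. If a failed step surfaces in the workflow function as a rejected promise (an assumption about how step failures propagate), the results from the successful resorts are lost to the caller. A possible alternative is `Promise.allSettled`, sketched below. `collectSettled` and the `failed` list are hypothetical names, and this is not the lesson's solution.

```typescript
interface AlertResult {
  alertId: string;
  resortId: string;
  triggered: boolean;
}

interface FailedResort {
  resortId: string;
  reason: string;
}

// Hypothetical helper: split settled step outcomes into results and failures.
// `settled[i]` is expected to correspond to `resortIds[i]`.
function collectSettled(
  resortIds: string[],
  settled: PromiseSettledResult<AlertResult[]>[]
): { results: AlertResult[]; failed: FailedResort[] } {
  const results: AlertResult[] = [];
  const failed: FailedResort[] = [];

  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results.push(...outcome.value);
    } else {
      const reason =
        outcome.reason instanceof Error
          ? outcome.reason.message
          : String(outcome.reason);
      failed.push({ resortId: resortIds[i] ?? 'unknown', reason });
    }
  });

  return { results, failed };
}

// Inside the workflow function, the call site would change roughly like this:
//   const settled = await Promise.allSettled(
//     resortIds.map((id) => evaluateResort(id, alertsByResort[id]!))
//   );
//   const { results: allResults, failed } = collectSettled(resortIds, settled);
```

With this shape, a `FatalError` for one resort ends up in `failed` while the other resorts' results are still returned.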

## Troubleshooting

**Warning: FatalError doesn't stop retries**

Make sure you're importing `FatalError` from `workflow`, not defining your own class. The Workflow DevKit checks the error prototype to determine behavior. A custom class with the same name won't work.

**Warning: RetryableError retries immediately instead of waiting**

Check the `retryAfter` value. It accepts a duration string (`'5s'`), milliseconds as a number (`5000`), or a `Date` object. If you pass a string that isn't a valid duration format, the delay may be ignored.
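
For rate-limited APIs, one way to get a valid `retryAfter` is to derive it from the HTTP `Retry-After` response header, which carries either a number of seconds or an HTTP date. The sketch below is an illustration under assumptions: `retryAfterFromHeader` is a hypothetical helper, and the lesson's `fetchWeather()` is not known to expose response headers. It returns either milliseconds or a `Date`, two of the forms listed above.

```typescript
// Hypothetical helper: convert a Retry-After header into a retryAfter value.
// Falls back to `fallbackMs` when the header is missing or unparseable.
function retryAfterFromHeader(
  header: string | null,
  fallbackMs: number
): number | Date {
  const value = header?.trim() ?? '';
  if (value === '') return fallbackMs;

  // Form 1: delay in whole seconds, e.g. "120"
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = new Date(value);
  return Number.isNaN(date.getTime()) ? fallbackMs : date;
}

const fromSeconds = retryAfterFromHeader('120', 1000);
const fromMissing = retryAfterFromHeader(null, 1000);
const fromGarbage = retryAfterFromHeader('not a date', 1000);
```

The result could then be passed as `{ retryAfter: ... }` when throwing `RetryableError`, with the exponential backoff value as the fallback.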

## Advanced: Custom Retry Limits

By default, steps retry 3 times (4 total attempts). You can customize this per step:

```typescript
async function evaluateResort(resortId: string, alerts: Alert[]) {
  "use step";
  // ...step logic
}

// Allow more retries for flaky APIs
evaluateResort.maxRetries = 5; // 6 total attempts
```

Set `maxRetries = 0` for steps that should never retry (one attempt only). Combine this with `FatalError` for steps where any failure is permanent.

## Advanced: Idempotency Keys

`getStepMetadata()` also returns a `stepId` that's stable across retries. Use it as an idempotency key for external APIs:

```typescript
async function sendNotification(userId: string, message: string) {
  "use step";

  const { stepId } = getStepMetadata();

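  const response = await fetch('https://api.notifications.example/send', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Idempotency-Key': stepId
    },
    body: JSON.stringify({ userId, message })
  });

  // fetch() only rejects on network errors, so surface HTTP failures explicitly
  if (!response.ok) {
    throw new Error(`Notification API responded with ${response.status}`);
  }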
}
```

If the step retries, the same `stepId` is sent again. An API that honors idempotency keys recognizes the duplicate and skips the second send, so retries don't produce double notifications.
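
What the receiving side does with the key can be sketched with an in-memory stand-in. `NotificationReceiver` below is hypothetical and only illustrates the deduplication idea; a real notifications API would persist keys and typically expire them.

```typescript
// Hypothetical in-memory receiver standing in for the external API.
class NotificationReceiver {
  private readonly seenKeys = new Set<string>();
  readonly sent: string[] = [];

  // Returns 'sent' the first time a key is seen, 'duplicate' afterwards.
  handle(idempotencyKey: string, message: string): 'sent' | 'duplicate' {
    if (this.seenKeys.has(idempotencyKey)) {
      return 'duplicate'; // same step retried: skip the send
    }
    this.seenKeys.add(idempotencyKey);
    this.sent.push(message);
    return 'sent';
  }
}

const receiver = new NotificationReceiver();

// First attempt and a retry of the same step share one key (made-up value)
const firstAttempt = receiver.handle('step_abc', 'Powder day at Mammoth');
const retryAttempt = receiver.handle('step_abc', 'Powder day at Mammoth');

// A different step has a different key, so its message goes out
const otherStep = receiver.handle('step_xyz', 'Powder day at Steamboat');
```

Here the retry is reported as a duplicate and `receiver.sent` holds two messages, one per distinct key.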


