Extralt

Running Extractions

A run is an extraction job. You select a robot, optionally provide start URLs and an extraction budget, and Extralt crawls the site to produce captures.

Creating a run

• Dashboard

Navigate to Extract > Runs and click New Run, or click Start Run on any robot in the robots list.

Select the robot to use, enter your start URLs (one per line), and optionally set a budget to limit how many URLs the robot will extract. Click Start to begin the run.

[Screenshot: New run form showing robot selector, URL input, and budget field]

• API

export EXTRALT_API_KEY="your-api-key"

curl -s -X POST "https://api.extralt.com/v0/extract/runs" \
  -H "Authorization: Bearer $EXTRALT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "robot_id": "your-robot-id",
    "start_urls": [
      "https://example-store.com/products/sneakers",
      "https://example-store.com/products/boots"
    ],
    "budget": 100
  }' | jq

The response is 202 Accepted with { "id": "<run_id>" }. Use that ID to poll the run, stop it, or list its captures.
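The later examples on this page read the run ID from a `$RUN_ID` variable. As a minimal sketch of one way to fill it in, the helper below pulls the `id` field out of the create response with `jq`. The function names `parse_run_id` and `start_run` are hypothetical and not part of Extralt; the endpoint, headers, and response shape are the ones shown above.

```shell
# Hypothetical helper: read a create-run response on stdin and print its "id".
# Prints nothing if the field is missing (for example, on an error response).
parse_run_id() {
  jq -r '.id // empty'
}

# Hypothetical wrapper: POST a JSON payload (first argument) to the run
# endpoint shown above and print the new run's ID.
start_run() {
  curl -s -X POST "https://api.extralt.com/v0/extract/runs" \
    -H "Authorization: Bearer $EXTRALT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$1" | parse_run_id
}

# Usage (performs a real request, so it is left commented out):
# RUN_ID="$(start_run '{ "robot_id": "your-robot-id", "budget": 100 }')"
# export RUN_ID
```

An empty `$RUN_ID` after this call suggests the request did not return the expected 202 body, so it is worth checking before polling.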

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| robot_id | Yes | The robot to use for extraction |
| start_urls | No | URLs to start crawling from. If omitted, the robot uses its default entry points. |
| budget | No | Maximum number of Captures to produce. Each successful product-page Capture costs 2 credits. |
| auto_enrich | No | If true, automatically run an enrichment job once the run completes. Defaults to false. |
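Because the budget caps the number of captures and each successful product-page capture costs 2 credits, the budget also gives a rough ceiling on what a run can cost. The sketch below works that arithmetic through; it assumes every capture counted against the budget is a billable product-page capture, and `max_run_credits` is a hypothetical name used only for illustration.

```shell
# Credits charged per successful product-page capture (from the table above).
CREDITS_PER_CAPTURE=2

# Hypothetical helper: upper bound on credits for a run with the given budget,
# assuming every capture is a successful product-page capture.
max_run_credits() {
  echo $(( $1 * $CREDITS_PER_CAPTURE ))
}

# Example: a budget of 100 captures costs at most 200 credits.
# max_run_credits 100   # prints 200
```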

Run lifecycle

| Status | Description |
| --- | --- |
| pending | Run is queued for execution |
| starting | Run is starting |
| running | Actively crawling and extracting |
| restarting | A restart was requested and is being applied |
| completed | Finished successfully |
| failed | Encountered an unrecoverable error |
| stopped | Manually stopped |

Monitoring a run

• Dashboard

Navigate to Extract > Runs to see all your runs in a sortable table.

[Screenshot: Runs list showing completed, running, and pending runs]

The table shows:

| Column | Description |
| --- | --- |
| Name | The run name |
| Robot | Which robot is executing the run |
| Status | Current status with a color-coded badge |
| Budget | Maximum URLs to extract |
| Extracted | Number of URLs extracted so far |
| Queue | URLs remaining in the crawl queue |
| Created | When the run was started |
| Duration | How long the run has been running or took to complete |

You can take actions on runs directly from the table:

  • Stop a running or pending run to halt extraction early.
  • Restart a stopped or failed run, optionally with a new budget.

• API

Poll the run endpoint until it reaches a terminal status:

curl -s "https://api.extralt.com/v0/extract/runs/$RUN_ID" \
  -H "Authorization: Bearer $EXTRALT_API_KEY" | jq

The read response is the full Convex run document with camelCase fields (_id, status, extracted, inQueue, robotId, robotName, createdAt, startedAt, duration, ...).

See Common Patterns for a full polling example.
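For quick scripts, a short polling loop along these lines may be enough. This is a sketch, not the Common Patterns example: it assumes that completed, failed, and stopped are the terminal statuses, and the function names, the default 10-second interval, and the 60-check cap are all hypothetical choices.

```shell
# Hypothetical helper: print the current status of a run, using the read
# endpoint and the "status" field described above.
fetch_run_status() {
  curl -s "https://api.extralt.com/v0/extract/runs/$1" \
    -H "Authorization: Bearer $EXTRALT_API_KEY" | jq -r '.status'
}

# Poll until the run reaches an assumed terminal status, then print it.
# Usage: wait_for_run RUN_ID [INTERVAL_SECONDS] [MAX_CHECKS]
# Returns 1 if the run is still not terminal after MAX_CHECKS checks.
wait_for_run() {
  run_id="$1"
  interval="${2:-10}"
  max_checks="${3:-60}"
  checks=0
  run_status=""
  while [ "$checks" -lt "$max_checks" ]; do
    run_status="$(fetch_run_status "$run_id")"
    case "$run_status" in
      completed|failed|stopped)
        echo "$run_status"
        return 0
        ;;
    esac
    checks=$((checks + 1))
    sleep "$interval"
  done
  echo "run $run_id still '$run_status' after $max_checks checks" >&2
  return 1
}

# Usage (performs real requests, so it is left commented out):
# final_status="$(wait_for_run "$RUN_ID" 15)" && echo "Run ended: $final_status"
```

Capping the number of checks keeps a script from hanging indefinitely if the run stays queued or the request keeps failing.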

Stopping and restarting runs

• Dashboard

Use the Stop button on a running or pending run to halt extraction early. Use Restart on a stopped, failed, or completed run to re-execute it (optionally with a new budget).

• API

Stop a run: POST /v0/extract/runs/{id}/stop. No request body. Returns 204 No Content.

curl -s -X POST "https://api.extralt.com/v0/extract/runs/$RUN_ID/stop" \
  -H "Authorization: Bearer $EXTRALT_API_KEY"

Restart a run: POST /v0/extract/runs/{id}/restart. Optional budget in the body overrides the original. Returns 202 Accepted with { "id": "<run_id>" }.

curl -s -X POST "https://api.extralt.com/v0/extract/runs/$RUN_ID/restart" \
  -H "Authorization: Bearer $EXTRALT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "budget": 200 }' | jq
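Since the budget override is optional, a script may want to send a body only when a new budget is supplied. The sketch below shows one way to do that. `restart_payload` and `restart_run` are hypothetical names, and calling the endpoint with no body at all to keep the original budget is an assumption based on the body being described as optional.

```shell
# Hypothetical helper: print a restart body for the given budget, or nothing
# when no budget is passed. Rejects values that are not whole numbers.
restart_payload() {
  budget="${1:-}"
  if [ -n "$budget" ]; then
    case "$budget" in
      *[!0-9]*)
        echo "budget must be a non-negative integer" >&2
        return 1
        ;;
    esac
    printf '{ "budget": %s }\n' "$budget"
  fi
}

# Hypothetical wrapper around the restart endpoint shown above.
# Usage: restart_run RUN_ID [NEW_BUDGET]
restart_run() {
  payload="$(restart_payload "${2:-}")" || return 1
  if [ -n "$payload" ]; then
    curl -s -X POST "https://api.extralt.com/v0/extract/runs/$1/restart" \
      -H "Authorization: Bearer $EXTRALT_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$payload"
  else
    # Assumption: omitting the body keeps the original budget.
    curl -s -X POST "https://api.extralt.com/v0/extract/runs/$1/restart" \
      -H "Authorization: Bearer $EXTRALT_API_KEY"
  fi
}
```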

Concurrent run limits

| Plan | Concurrent runs |
| --- | --- |
| Start | 1 |
| Scale | Unlimited |

If you exceed your concurrent run limit, the run will be queued until a slot opens.

Downloading data

• Dashboard

Export captures directly from Extract > Captures. You can filter by run or robot, then download as JSONL or Parquet. See Working with Captures for details.

• API

The export endpoint streams all captures from a run as a single file. The format query parameter accepts parquet (default) or jsonl. The response includes a Content-Disposition: attachment header, so save the response body directly to disk.

curl -sL "https://api.extralt.com/v0/extract/captures/export?run_id=$RUN_ID&format=parquet" \
  -H "Authorization: Bearer $EXTRALT_API_KEY" \
  -o captures.parquet
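To switch between the two formats without editing the URL by hand, the request can be wrapped in a small helper. This is a sketch under stated assumptions: `export_url` and `download_captures` are hypothetical names, and naming the output file `captures.<format>` simply mirrors the example above.

```shell
# Hypothetical helper: build the export URL for a run.
# Usage: export_url RUN_ID [parquet|jsonl]   (defaults to parquet)
export_url() {
  format="${2:-parquet}"
  case "$format" in
    parquet|jsonl) ;;
    *)
      echo "unsupported format: $format" >&2
      return 1
      ;;
  esac
  echo "https://api.extralt.com/v0/extract/captures/export?run_id=$1&format=$format"
}

# Hypothetical wrapper: download a run's captures to captures.<format>.
# -f makes curl exit non-zero on HTTP errors; the file may still be created.
download_captures() {
  url="$(export_url "$1" "${2:-parquet}")" || return 1
  curl -fsSL "$url" \
    -H "Authorization: Bearer $EXTRALT_API_KEY" \
    -o "captures.${2:-parquet}"
}

# Usage (performs a real request, so it is left commented out):
# download_captures "$RUN_ID" jsonl
```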

Recurring extractions

To run extractions on a recurring cadence without manual intervention, set up a schedule. A schedule automatically creates runs at the interval you specify.

See Schedules for the complete guide.