POST /tavily/crawl
Graph-based website traversal tool using Tavily Crawl.
Example request:

curl --request POST \
  --url https://api.aisa.one/apis/v1/tavily/crawl \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "docs.tavily.com",
  "instructions": "<string>",
  "chunks_per_source": 3,
  "max_depth": 1,
  "max_breadth": 20,
  "limit": 50,
  "select_paths": [
    "<string>"
  ],
  "select_domains": [
    "<string>"
  ],
  "exclude_paths": [
    "<string>"
  ],
  "exclude_domains": [
    "<string>"
  ],
  "allow_external": true,
  "include_images": false,
  "extract_depth": "basic",
  "format": "markdown",
  "include_favicon": false,
  "timeout": 150,
  "include_usage": false
}
'

Example response:

{
  "base_url": "<string>",
  "results": [
    {
      "url": "<string>",
      "raw_content": "<string>",
      "favicon": "<string>"
    }
  ],
  "response_time": 123,
  "usage": {
    "credits": 123
  },
  "request_id": "<string>"
}
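
For clients that prefer Python over cURL, here is a minimal sketch of the same call using the requests library. The endpoint URL and body mirror the cURL example above; AISA_API_KEY is an assumed environment variable for this example, not part of the API.

import os
import requests

# Minimal sketch of the request shown above. AISA_API_KEY is an assumed
# environment variable holding your bearer token.
API_URL = "https://api.aisa.one/apis/v1/tavily/crawl"

payload = {
    "url": "docs.tavily.com",
    "max_depth": 1,       # default: crawl only one level from the root
    "limit": 50,          # default: stop after 50 processed links
    "format": "markdown",
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['AISA_API_KEY']}"},
    json=payload,
    timeout=160,  # client-side cutoff just above the API's 150 s maximum
)
resp.raise_for_status()

data = resp.json()
for result in data["results"]:
    print(result["url"], "-", len(result["raw_content"] or ""), "chars")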

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url
string
required

The root URL to begin the crawl.

Example:

"docs.tavily.com"

instructions
string

Natural language instructions for the crawler.

chunks_per_source
integer
default:3

Maximum number of relevant chunks returned per source.

Required range: 1 <= x <= 5

max_depth
integer
default:1

Max depth of the crawl.

Required range: 1 <= x <= 5

max_breadth
integer
default:20

Max number of links to follow per level of the tree.

Required range: 1 <= x <= 500

limit
integer
default:50

Total number of links the crawler will process before stopping.

Required range: x >= 1

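Taken together, max_depth bounds how far the crawler descends from the root, max_breadth caps how many links it follows at each level, and limit is a global ceiling on processed links; whichever bound is reached first ends the crawl. An illustrative payload (the values are assumptions, not recommendations) for a wide but shallow crawl:

payload = {
    "url": "docs.tavily.com",
    "max_depth": 2,      # descend at most two levels below the root
    "max_breadth": 100,  # follow up to 100 links per level
    "limit": 100,        # but never process more than 100 links in total
}
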
select_paths
string[]

Regex patterns to select only URLs with specific path patterns.

select_domains
string[]

Regex patterns to restrict crawling to specific domains or subdomains.

exclude_paths
string[]

Regex patterns to exclude URLs with specific path patterns.

exclude_domains
string[]

Regex patterns to exclude specific domains or subdomains from crawling.
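
All four filter fields take regular expressions: select_* restricts the crawl to matching URLs, while exclude_* prunes matching ones. A sketch with hypothetical patterns (the paths and domain below are illustrative only):

payload = {
    "url": "docs.tavily.com",
    "select_paths": [r"^/api/.*"],          # crawl only API reference pages
    "exclude_paths": [r".*/changelog.*"],   # skip changelog pages
    "exclude_domains": [r"^community\.tavily\.com$"],  # hypothetical subdomain
}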

allow_external
boolean
default:true

Include external domain links in the final results list.

include_images
boolean
default:false

Include images in the crawl results.

extract_depth
enum<string>
default:basic

Depth of the extraction process. advanced retrieves more page data, such as tables and embedded content, but may increase latency.

Available options: basic, advanced

format
enum<string>
default:markdown

Format of the extracted web page content.

Available options: markdown, text

include_favicon
boolean
default:false

Include the favicon URL for each result.

timeout
number<float>
default:150

Maximum time in seconds to wait for the crawl operation.

Required range: 10 <= x <= 150

include_usage
boolean
default:false

Include credit usage information in the response.

Response

200 - application/json

Crawl results returned successfully.

base_url
string

The base URL that was crawled.

results
object[]

The list of crawled pages. Each result contains the page url, its raw_content in the requested format, and optionally a favicon URL.

response_time
number<float>

Time in seconds it took to complete the request.

usage
object

Credit usage for the request; present when include_usage is true.

request_id
string

Unique request identifier.
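
Putting the response fields together, a short sketch of consuming the payload (data is assumed to be the parsed JSON from the request example earlier; favicon and usage only appear when the corresponding include_* flags were set):

# 'data' is the parsed JSON body of a successful (200) response.
print("Crawled:", data["base_url"])
print("Completed in", data["response_time"], "s, request_id:", data["request_id"])

for result in data["results"]:
    print("-", result["url"])
    if result.get("favicon"):        # present when include_favicon was true
        print("  favicon:", result["favicon"])

usage = data.get("usage")            # present when include_usage was true
if usage:
    print("Credits used:", usage["credits"])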