Enterprise API general documentation
Consult this page if you are running, or considering running, Graphite's Enterprise Internal Links API.
Crawling and indexing
To index pages for related link selection, Graphite’s bot (user agent: GraphiteBot/1.0 (+https://www.graphitehq.com)) crawls the pages listed in the sitemap. The daily crawling run starts at 00:00 UTC and crawls 60-240 pages per minute.
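As a rough planning aid, the stated crawl rate can be turned into an estimated duration for a full crawl. The sketch below is only arithmetic on the 60-240 pages per minute range; the function name and the 12,000-page sitemap are illustrative assumptions, not part of the API.

```python
def estimate_crawl_minutes(page_count, min_rate=60, max_rate=240):
    """Return (fastest, slowest) crawl duration in minutes.

    min_rate and max_rate are pages per minute, taken from the
    60-240 pages/minute range quoted above.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    fastest = page_count / max_rate
    slowest = page_count / min_rate
    return fastest, slowest


# Hypothetical sitemap of 12,000 pages.
fastest, slowest = estimate_crawl_minutes(12_000)
print(f"Estimated crawl time: {fastest:.0f} to {slowest:.0f} minutes")
```

For this hypothetical sitemap, a crawl that starts at 00:00 UTC would finish somewhere between 50 and 200 minutes later.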
Related Links selection
Links for pages are selected to maximize relatedness subject to the constraint that every page receives at least k incoming links. The relatedness of a pair of pages is computed as the semantic similarity of the page text.
When a page does not have enough related pages, or a page has not yet been indexed, the API returns links to randomly selected pages without replacement.
Check the details of each /related-links endpoint to see how many related links are currently returned in its responses.
The number of links selected for each page is the number agreed upon and configured for the related-links endpoint. We recommend 8 or more internal links per page to receive the most SEO benefit.
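To make the selection idea concrete, here is a minimal sketch of similarity-based selection with a random fallback. It is not Graphite's implementation: it assumes pages are already represented as embedding vectors, uses cosine similarity as a stand-in for semantic similarity, introduces a hypothetical min_similarity cutoff, and does not enforce the "at least k incoming links" constraint.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_links(page, vectors, n, min_similarity=0.5, rng=random):
    """Pick up to n links for `page`: most similar first, then random fill.

    vectors maps URL -> embedding vector (assumed to be precomputed).
    min_similarity is a hypothetical cutoff for "related enough".
    """
    others = [url for url in vectors if url != page]
    page_vector = vectors.get(page)  # None when the page is not indexed yet
    related = []
    if page_vector is not None:
        scored = sorted(
            ((cosine_similarity(page_vector, vectors[url]), url) for url in others),
            reverse=True,
        )
        related = [url for score, url in scored if score >= min_similarity][:n]
    links = [{"type": "related", "url": url} for url in related]

    # Fill the remaining slots uniformly at random, without replacement.
    remaining = [url for url in others if url not in related]
    fill = rng.sample(remaining, min(n - len(links), len(remaining)))
    links += [{"type": "random", "url": url} for url in fill]
    return links


# Toy two-dimensional vectors, for illustration only.
vectors = {
    "/a": [1.0, 0.0],
    "/b": [0.9, 0.1],
    "/c": [0.0, 1.0],
    "/d": [0.1, 0.9],
}
print(select_links("/a", vectors, n=2))
```

In this toy index only /b is similar enough to /a, so the second slot is filled with a randomly chosen page and marked with type 'random', mirroring the 'related' and 'random' link types the API reports.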
API
The links for a specific page can be retrieved from the API.
Host
URL: https://api.graphitehq.com/il/{{CLIENT}}/
Endpoints
{{PAGE_TYPE}}/related-links/
- Description: Returns a list of related links for a page. If the application can’t find related links, it returns randomly selected links from the index.
- URL: https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links
- Method: GET
- Allowed Cross-Origin Resource Sharing: True
- Special Headers Required: None
- HTTP Authentication: None
- Input Parameters: Query Strings
Parameters
Query String Parameters
Links for a page can be retrieved using the page's canonical URL.
- url
- Description: Unique URL of a page
- Required: Yes
- Notes:
- The index document IDs are derived from unique URLs. Looking up a page by URL therefore results in an ID-based search on the API index, and makes it possible to immediately trigger a side crawl of a new page that has not been indexed yet.
- Example Request URL: https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links?url={{URL}}
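Because a canonical URL can itself contain characters such as `?`, `&`, and `/`, it should be percent-encoded before it is placed in the url query string parameter. A minimal sketch using Python's standard library follows; the helper name and the "acme" and "blog" values standing in for {{CLIENT}} and {{PAGE_TYPE}} are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.graphitehq.com/il"


def build_related_links_url(client, page_type, page_url):
    """Build the request URL, percent-encoding the page URL."""
    query = urlencode({"url": page_url})
    return f"{BASE_URL}/{client}/{page_type}/related-links?{query}"


# "acme" and "blog" are placeholder values for {{CLIENT}} and {{PAGE_TYPE}}.
print(build_related_links_url(
    "acme", "blog", "https://www.example.com/post?id=1&lang=en"
))
```

Without the encoding step, the `&lang=en` part of the page URL in this example would be read as a separate query string parameter of the API request.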
Schema
- Status: 200
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "description": "Successful response from the related-links/ API endpoint.",
  "required": [
    "message",
    "related_links"
  ],
  "properties": {
    "message": {
      "type": "string",
      "description": "Response results description."
    },
    "related_links": {
      "type": "array",
      "description": "Related links array containing related links to a single page.",
      "items": {
        "type": "object",
        "description": "Related link object containing data from a single related link to a page.",
        "required": [
          "type",
          "title",
          "url",
          "url_path"
        ],
        "properties": {
          "type": {
            "type": "string",
            "description": "Link type: 'related' if the link was selected using related selection logic, or 'random' if it was selected uniformly at random without replacement."
          },
          "title": {
            "type": "string",
            "description": "Page title."
          },
          "url": {
            "type": "string",
            "description": "Page URL."
          },
          "url_path": {
            "type": "string",
            "description": "Page URL path."
          }
        }
      }
    }
  }
}
Other fields available in the API index can be included as additional link properties, if desired.
- Status: 4XX, 5XX
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "description": "Error response from the related-links/ API endpoint.",
  "required": [
    "message"
  ],
  "properties": {
    "message": {
      "type": "string",
      "description": "Error message."
    }
  }
}
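A client can check a decoded response against the required fields of the two schemas before using it. The sketch below is one hand-rolled way to do that with no extra dependencies; the function name and the choice of exception types are assumptions, and it checks only the required fields rather than performing full JSON Schema validation.

```python
def parse_related_links(status, payload):
    """Return the related_links list, or raise on an error response.

    status is the HTTP status code and payload is the decoded JSON body.
    Only the required fields from the schemas above are checked.
    """
    if not isinstance(payload, dict) or not isinstance(payload.get("message"), str):
        raise ValueError("response is missing the required 'message' string")
    if status != 200:
        raise RuntimeError(f"API error {status}: {payload['message']}")

    links = payload.get("related_links")
    if not isinstance(links, list):
        raise ValueError("response is missing the required 'related_links' array")

    required = ("type", "title", "url", "url_path")
    for link in links:
        if not isinstance(link, dict):
            raise ValueError("each related link must be an object")
        missing = [key for key in required if not isinstance(link.get(key), str)]
        if missing:
            raise ValueError(f"link is missing required fields: {missing}")
    return links
```

Separating "the API reported an error" (RuntimeError here) from "the body does not match the schema" (ValueError here) lets calling code decide whether to retry or to fall back to cached links.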
Example call
Request
cURL
curl --location --request GET 'https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links?url={{URL}}'
Javascript Fetch
// {{URL}} should be percent-encoded, e.g. with encodeURIComponent().
const requestOptions = {
  method: 'GET',
  redirect: 'follow'
};
fetch("https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links?url={{URL}}", requestOptions)
  .then(response => response.text())
  .then(result => console.log(result))
  .catch(error => console.error('error', error));
Python
import requests

# Passing the page URL via params lets requests percent-encode it.
url = "https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links"
response = requests.get(url, params={"url": "{{URL}}"}, timeout=10)
print(response.text)
Response
{{EXAMPLE_RESPONSE}}
{
  "message": "...",
  "related_links": [
    {
      "type": "related",
      "title": "...",
      "url": "...",
      "url_path": "..."
    },
    ...
    {
      "type": "related",
      "title": "...",
      "url": "...",
      "url_path": "..."
    }
  ]
}
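Once the related_links array has been retrieved, a page template typically turns it into markup. The following is a minimal sketch of that step; the function name, the list markup, and the sample link are illustrative assumptions rather than a prescribed rendering.

```python
from html import escape


def render_links_html(related_links):
    """Render related links as an HTML unordered list.

    Titles and URLs are escaped so that characters such as & or "
    cannot break the surrounding markup.
    """
    items = [
        f'<li><a href="{escape(link["url"], quote=True)}">{escape(link["title"])}</a></li>'
        for link in related_links
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical link object shaped like the response above.
sample = [
    {
        "type": "related",
        "title": "Tips & tricks",
        "url": "https://www.example.com/tips",
        "url_path": "/tips",
    }
]
print(render_links_html(sample))
```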
Index
Available Link Fields
The index has several fields with page information obtained from crawling. All of these fields are available for export through the related-links/ endpoint response:
- text (text): Page plain text content.
- title (text): Page title.
- url (text): Page canonical URL.
- url_path (text): Page URL path.
- {{ADDITIONAL_FIELDS_FROM_INDEX}}
Current Response Link Fields
- title
- type (added when processing the API request)
- url
- url_path
- {{ADDITIONAL_FIELDS_FROM_INDEX}}
Uptime and latency
The API is built on standard AWS services. As of May 27, 2022, it has had no major outages and has maintained 99.9% uptime. The average response time is approximately 150 ms.
Request rate limits
The API endpoints do not enforce request rate limits; however, we encourage keeping traffic under 20 requests/second per endpoint. At that rate, updating data for a set of 10k pages takes about 500 seconds, that is, less than 10 minutes.
When retrieving data from the API endpoints in batches, plan jobs around the number of pages and the data update period, which can vary from one day to one week.
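A batch job can stay under the suggested rate by spacing its requests evenly. The sketch below shows one simple way to do this; the Throttle class, the fetch_batch helper, and the caller-supplied fetch function are all hypothetical names, and the clock and sleep functions are injectable only to make the pacing easy to test.

```python
import time


class Throttle:
    """Space calls so that at most max_per_second start each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def fetch_batch(page_urls, fetch, throttle):
    """Call fetch(url) for every page URL, pacing calls with the throttle."""
    results = {}
    for page_url in page_urls:
        throttle.wait()
        results[page_url] = fetch(page_url)
    return results
```

A job would create the throttle once, for example `Throttle(20)`, and pass its own request function as `fetch`, so every call in the batch shares the same pacing.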
Caching
Server-side rendering with caching is strongly recommended. The results can be cached using the endpoint URL with query string parameters as the key.
The links are updated at most once a day, so a one-day TTL is appropriate.
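The caching recommendation can be sketched as a small in-memory cache keyed by the full request URL, with entries that expire after one day. This is an illustration only: the TTLCache class and the caller-supplied fetch function are hypothetical, and a production setup would more likely rely on a shared cache or CDN.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # request URL -> (expiry time, cached value)

    def get_or_fetch(self, request_url, fetch):
        """Return the cached value for request_url, calling fetch on a miss."""
        entry = self._entries.get(request_url)
        now = self.clock()
        if entry is not None and now < entry[0]:
            return entry[1]
        value = fetch(request_url)
        self._entries[request_url] = (now + self.ttl_seconds, value)
        return value
```

Using the endpoint URL including its query string as the key means two pages never share a cache entry, while repeated renders of the same page within a day reuse the stored links.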