API Host

URL: https://ilapi.graphite.io

Internal Links API endpoints

Related links endpoint

This endpoint delivers a list of related links from a source webpage to other target webpages within the same website. The source and target pages can be of the same type (e.g., blog-to-blog links) or different types (e.g., blog-to-product links).

Prerequisites

Upon the client's or API user's request, Graphite creates a Related links endpoint
The client specifies the sets of source and target pages for which related links will be built
Graphite then performs an initial crawl to gather the data that will be used to calculate related links or be returned through the endpoint response
Following this, Graphite carries out the initial computation of related links
The links for pages are chosen to maximize relatedness, ensuring that each page receives 'k' incoming links on average. The semantic similarity of their text determines the relatedness between the pages.
The endpoints are fully functional once the data is collected and the links are computed

Request

Endpoint Path: /<client_id>/<page_type_relation>/related-links
HTTP Method: GET
HTTP Authentication: None
Required Headers: None
CORS: Yes, accessible from all origins without the need for authentication or headers
Input Parameters: Query Strings

Path parameters

client_id: (Required) A Graphite client (or API user) ID
Provided by Graphite
page_type_relation: (Required) A label that describes the relationship between the source and target pages, for example, “blog” for same-type pages or “blog-to-product” for different types
Provided by Graphite

📘
Informational API Endpoints
Check the Informational API Endpoints section to learn about available Internal Links API endpoints and their details for a certain client_id

Query string parameters

`url`

(Required) A canonical or unique webpage URL
The API derives an ID for each webpage from its unique URL and uses that ID to look the page up in the API index.

Valid URLs:

URLs with a scheme, e.g., https://example.com/example/path.html
URLs without a scheme, e.g., example.com/example/path.html
A URL path (technically not a URL but valid for the API request). It must start with a slash (/), which indicates the root folder, e.g., /example/path.html
Encoded URLs, e.g., https%3A%2F%2Fexample.com%2Fexample%2Fpath.html
URLs with query string parameters must be encoded, e.g., https%3A%2F%2Fexample.com%2Fexample%2Fpath.html%3Fq%3Dv

The order of parameters in a URL doesn't change the data it leads to; however, different sets of parameters pointing to the same data are treated as unique webpages. Therefore, URLs with query string parameters should be standardized.
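
One possible way to standardize URLs before sending them is to sort the query string parameters and drop those that do not affect page content. The sketch below is only an illustration: standardize_url and the IGNORED_PARAMS list are hypothetical names, dropping the fragment is an assumption, and the right rules depend on how your site uses query strings.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def standardize_url(url: str) -> str:
    """Return the URL with ignored parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    kept.sort()
    # Assumption: the fragment is dropped because it is not sent to servers.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(standardize_url("https://example.com/example/path.html?b=2&a=1&utm_source=x"))
```

The standardized URL still has to be percent-encoded before it is passed as the url query string parameter.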

Example usage:

https://ilapi.graphite.io/example-client/example-section/related-links?url=https%3A%2F%2Fexample-client.com%2Fexample-section%2Feu-wants-to-see-if-lawmakers-will-block-brexit-before-striking-new-deal-uk-s-johnson

example-client: client_id as input
example-section: page_type_relation as input
Value after url=: url as input
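
As a minimal sketch, the request URL above could be assembled as follows. The helper name related_links_url is hypothetical; only the host, the path layout, and the url parameter come from this documentation.

```python
from urllib.parse import quote

API_HOST = "https://ilapi.graphite.io"


def related_links_url(client_id: str, page_type_relation: str, page_url: str) -> str:
    """Build the Related links endpoint URL for one source page."""
    # safe="" percent-encodes every reserved character, including "/", ":",
    # "?" and "=", so a page URL with its own query string stays intact.
    encoded = quote(page_url, safe="")
    return f"{API_HOST}/{client_id}/{page_type_relation}/related-links?url={encoded}"


print(
    related_links_url(
        "example-client",
        "example-section",
        "https://example.com/example/path.html?q=v",
    )
)
```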

Response

HTTP Status: 200
Content-Type: application/json

Response Body Properties

message: (String, Non-null) Text that gives details about the response. Only for information.
related_links: (List of JSON objects, Non-null) A list of link objects. A link object is a JSON object containing link data, which supports the creation of HTML link elements from a source page to a target page the link object describes.

📘
Properties of link objects
The specific properties of the link object are outlined in Related Link Object Properties

If the random link completion feature is enabled (the default), this list can contain random links, which are link objects selected uniformly at random without replacement.

Random links are still considered related because they are of the same kind as the target pages. Their primary purpose is to fulfill the SEO constraint of the API that each page receives 'k' incoming links: they fill the gaps whenever the API cannot assign the expected number of related links.

This list could be empty under some conditions.

“Non-null” properties will always be present in the link object and never be null. Other properties may be null or absent in the link object.

curl --request GET \
  'https://ilapi.graphite.io/example-client/example-section/related-links?url=https%3A%2F%2Fexample-client.com%2Fexample-section%2Feu-wants-to-see-if-lawmakers-will-block-brexit-before-striking-new-deal-uk-s-johnson'

{
  "message": "Related links found",
  "related_links": [
    {
      "type": "related",
      "author": "Reuters Editorial",
      "image_url": "https://s3.reutersmedia.net/resources/r/?m=02&d=20190903&t=2&i=1425818863&w=1200&r=LYNXNPEF821I6",
      "published_time": "2019-09-03T16:27:26Z",
      "description": "Prime Minister Boris Johnson...",
      "title": "UK public must decide next steps if parliament votes against Johnson: PM's spokesman",
      "url": "https://example-client.com/example-section/uk-public-must-decide-next-steps-if-parliament-votes-against-johnson-pm-s-spokesman",
      "url_path": "/example-section/uk-public-must-decide-next-steps-if-parliament-votes-against-johnson-pm-s-spokesman"
    },
    ...
    {
      "type": "related",
      "author": "Reuters Editorial",
      "image_url": "https://s2.reutersmedia.net/resources/r/?m=02&d=20190903&t=2&i=1425826613&w=1200&r=LYNXNPEF821KH",
      "published_time": "2019-09-03T16:53:48Z",
      "description": "Earnings and revenue expectations for European...",
      "title": "European third quarter profit outlook improves slightly but still in recession: Refinitv",
      "url": "https://example-client.com/example-section/european-third-quarter-profit-outlook-improves-slightly-but-still-in-recession-refinitv",
      "url_path": "/example-section/european-third-quarter-profit-outlook-improves-slightly-but-still-in-recession-refinitv"
    }
  ]
}
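
A response like the one above could be turned into HTML link elements roughly as follows. This is a sketch under assumptions: render_links is a hypothetical helper, and it assumes that a link object without a url is skipped and that a missing title falls back to the URL, since only some properties are guaranteed to be present.

```python
from html import escape
from typing import Any, Dict


def render_links(body: Dict[str, Any]) -> str:
    """Render the related_links list of a parsed response as an HTML list."""
    items = []
    for link in body.get("related_links", []):
        url = link.get("url")
        if not url:
            continue  # assumption: a link without a URL cannot be rendered
        title = link.get("title") or url  # fall back when title is null or absent
        items.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
    if not items:
        return ""  # the list can be empty under some conditions
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


sample = {
    "message": "Related links found",
    "related_links": [{"url": "https://example-client.com/example-section/a", "title": "A & B"}],
}
print(render_links(sample))
```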

Other response statuses

This section provides information on the circumstances that lead to various HTTP statuses

HTTP 204

The API endpoint exists, but data still needs to be gathered. The response does not have a body.

Error responses

This section provides information on the circumstances that lead to various HTTP error statuses

Error responses are usually delivered in the “application/json” format and include details about the error.

HTTP 400

Input parameters not found
Input parameters validation error

HTTP 404

API endpoint not found

HTTP 500

Internal server error
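
The statuses above could be handled on the client side along these lines. The function name interpret_response is hypothetical, and treating HTTP 204 as an empty list of links is an assumption of this sketch, not documented API behavior.

```python
import json
from typing import Any, Dict, List


def interpret_response(status: int, body: bytes) -> List[Dict[str, Any]]:
    """Map an HTTP status and raw body to a list of link objects."""
    if status == 200:
        return json.loads(body)["related_links"]
    if status == 204:
        # The endpoint exists but data is still being gathered; no body.
        return []
    if status in (400, 404):
        # Bad input parameters or unknown endpoint: retrying will not help.
        raise ValueError(f"request rejected with HTTP {status}: {body[:200]!r}")
    # HTTP 500 and anything unexpected.
    raise RuntimeError(f"request failed with HTTP {status}")


print(interpret_response(204, b""))
```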

Informational API endpoints

Internal Links API endpoints list

This endpoint delivers a comprehensive list of endpoints that a specific Graphite client or API user can access.

Request

Endpoint Path: /<client_id>/endpoints
HTTP Method: GET
HTTP Authentication: None
Required Headers: None

Path parameters

client_id: (Required) A Graphite client (or API user) ID
Provided by Graphite

Response

HTTP Status: 200
Content-Type: application/json

Response body properties

message: (String, Non-null) Text that gives details about the response.
client_id: (String, Non-null) API user ID that matches the client_id path parameter.
endpoints: (List of JSON objects, Non-null) A list of endpoint information objects. An endpoint information object is a JSON object that holds information about an Internal Links API endpoint, including its path and basic configuration details.

📘
Endpoint information object properties
The specific properties of the endpoint object are outlined in Endpoint Information Object Properties

“Non-null” properties will always be present in the endpoint information object and never be null. Other properties may be null or absent in the endpoint information object.

curl --request GET 'https://ilapi.graphite.io/example-client/endpoints'

{
  "message": "Success",
  "client_id": "example-client",
  "endpoints": [
    {
      "endpoint_path": "/example-client/example-section-2/related-links",
      "is_active": true,
      "endpoint_type": "related_links",
      "source_set_id": "example-client-example-section-2",
      "target_set_id": "example-client-example-section-2",
      "links_count": 4,
      "random_links_completion": true
    },
    {
      "endpoint_path": "/example-client/example-section/related-links",
      "is_active": true,
      "endpoint_type": "related_links",
      "source_set_id": "example-client-example-section",
      "target_set_id": "example-client-example-section",
      "links_count": 4,
      "random_links_completion": true
    }
  ]
}

Endpoint information object properties

endpoint_path: (String, Non-null) The endpoint route
is_active: (Boolean, Non-null) true if the endpoint is active; otherwise, false
endpoint_type: (String, Non-null) Endpoint type. Possible values are: “related_links” for a Related Links Endpoint.
source_set_id: (String, Non-null) ID of the source set of pages. This ID is an internal value but can be helpful to check the type of links provided by the endpoint.
target_set_id: (String) ID of the target set of pages. This ID is an internal value but can be helpful to check the type of links provided by the endpoint.
- It will always be present in the object for the “related_links” endpoint type
links_count: (Number) Default number of links returned in the endpoint response
- It will always be present in the object for the “related_links” endpoint type
- It is a final value for the “related_links” endpoint type, as related links cannot be computed on the fly
random_links_completion: (Boolean) true if random links completion is enabled; otherwise, false
- It will always be present in the object for the “related_links” endpoint type. If set to true, the API will randomly pick links to fill up the resulting list when there are fewer related links than links_count.
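
As an illustration of how these properties might be used, the sketch below lists the paths of active Related Links endpoints from a parsed response. The helper name active_related_links_paths is hypothetical; the property names are the ones documented above.

```python
from typing import Any, Dict, List


def active_related_links_paths(body: Dict[str, Any]) -> List[str]:
    """Return endpoint_path for every active "related_links" endpoint."""
    return [
        endpoint["endpoint_path"]
        for endpoint in body["endpoints"]
        if endpoint["is_active"] and endpoint["endpoint_type"] == "related_links"
    ]


sample = {
    "message": "Success",
    "client_id": "example-client",
    "endpoints": [
        {
            "endpoint_path": "/example-client/example-section/related-links",
            "is_active": True,
            "endpoint_type": "related_links",
            "links_count": 4,
            "random_links_completion": True,
        }
    ],
}
print(active_related_links_paths(sample))
```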

Error responses

This section provides information on the circumstances that lead to various HTTP error statuses.

Error responses are usually delivered in the “application/json” format and include details about the error.

HTTP 404

Unable to locate endpoint settings as the client_id provided could not be found

HTTP 500

Internal server error

Crawling

To index pages for related links selection, Graphite’s bot crawls pages from a source of URLs.

Optimal URL sources are:

XML sitemap index (preferred)
- A sitemap URL pattern could also be specified to avoid crawling all sub-sitemaps in massive websites
XML sitemap
HTML sitemap
robots.txt file with sitemaps

Crawls run at most once a day, starting at 00:00 UTC, with a request rate of 60 to 240 pages per minute.

📘
Crawl settings
For more information about our crawler, including speed, IP, and user agent, please visit our Crawl settings documentation.

Graphite's crawling bot

These are the currently used Graphite bot user agents:

GraphiteBot
- Base bot identifier.
GraphiteBot/1.0 (+https://www.graphitehq.com)
- Identifier with version number and comment.

We recommend allowing GraphiteBot to crawl based on its user agent. If you are concerned about “User-Agent” spoofing, Graphite can provide static IP addresses for the bot's connections, which can be used to authorize access.

Indexing

This applies to pages that are part of the client's pre-determined source or target sets for available endpoints. The pre-determined source or target sets are most commonly identified by URL patterns (e.g., example.com/blog/{slug}). More complex patterns or multiple patterns are also supported.

Page Status Codes

200 HTTP status

The GraphiteBot crawler will index all pages from URL sources that return a 200 HTTP status and are part of the pre-determined source or target sets for available endpoints.

301 and 302 HTTP status

The GraphiteBot crawler will follow 301 and 302 HTTP statuses by default
- The link URL the crawler indexes will be the canonical tag URL, or the last seen URL if there are redirections
- The redirected URL must match the URL pattern identified for the source and target sets of the available endpoints, or it will be excluded
  - The URLs in between redirects can have different patterns, but the first and last URLs crawled must follow the pattern identified for the set

4xx HTTP status

By default, GraphiteBot will not include any pages that return 4xx HTTP status codes.

Preventing page indexing

API users have several options to prevent page indexing:

Provide URL sources that omit the pages the user wishes to exclude from indexing
- This can be done by providing a dedicated sitemap for the API that only includes pages that should be included in the index
Use the HTML robots meta tag to mark a page as “noindex”. This means that any pages blocked from indexing by search engines will not be indexed by GraphiteBot either. For example, use <meta name="robots" content="noindex">
- Other tags like <meta name="googlebot" content="noindex"> are also supported.
- The X-Robots-Tag HTTP header is another supported method, as detailed in https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag.
To prevent only GraphiteBot from indexing a page, use a GraphiteBot-specific robots directive. For instance, the tag <meta name="graphitebot" content="noindex"> or the header X-Robots-Tag: graphitebot: noindex can be used.
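
To check your own pages against the options above, a rough test could look like the following. It is a simplified sketch, not GraphiteBot's actual logic: is_noindex is a hypothetical helper, and the header check flags any X-Robots-Tag value containing noindex, even when it is addressed to another bot.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# Meta tag names this documentation lists as supported.
BOT_NAMES = {"robots", "googlebot", "graphitebot"}


class _RobotsMetaParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        name = attributes.get("name", "").lower()
        content = attributes.get("content", "").lower()
        if name in BOT_NAMES and "noindex" in content:
            self.noindex = True


def is_noindex(html: str, headers: Optional[Dict[str, str]] = None) -> bool:
    """Return True if the page carries a noindex meta tag or header."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    if parser.noindex:
        return True
    for name, value in (headers or {}).items():
        # Simplification: does not check which bot the directive targets.
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    return False


print(is_noindex('<html><head><meta name="graphitebot" content="noindex"></head></html>'))
```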

Notes on data extraction

GraphiteBot uses a hierarchy of data sources to extract information from HTML documents:

It looks at the page's first <h1> element for the webpage title
It then looks at schema.org's JSON structured data, particularly the CreativeWork type, to gather various details like title, description, authors, categories, images, published time, and text content
If needed, it also uses the Open Graph protocol to gather details like the title, description, authors, images, and published time
Finally, standard HTML metadata may be used to gather details from the following elements: title tag, meta author, and meta description

The order of precedence ensures that the most reliable and accurate data sources are prioritized in the extraction. If a piece of information isn't available from a high-precedence source, the system will then look to the next highest precedence source. This method helps maintain data quality and reliability.
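
The precedence described above amounts to taking the first source that yields a value. The sketch below illustrates this for the title only. The function names are hypothetical, and the specific keys (headline and name for schema.org, og:title for Open Graph) are assumptions about which fields are read.

```python
from typing import Dict, Optional


def first_available(*candidates: Optional[str]) -> Optional[str]:
    """Return the first non-empty candidate, in order of precedence."""
    for value in candidates:
        if value and value.strip():
            return value.strip()
    return None


def pick_title(
    h1: Optional[str],
    json_ld: Dict[str, str],
    open_graph: Dict[str, str],
    title_tag: Optional[str],
) -> Optional[str]:
    """Apply the documented order: <h1>, schema.org, Open Graph, HTML metadata."""
    return first_available(
        h1,
        json_ld.get("headline"),  # assumed schema.org property
        json_ld.get("name"),  # assumed schema.org property
        open_graph.get("og:title"),
        title_tag,
    )


print(pick_title(None, {"headline": "From JSON-LD"}, {"og:title": "From OG"}, "From title tag"))
```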

Text context

To provide accurate content to the API algorithms, text extraction is crucial. This can be achieved using schema.org's objects via the "text" property. If the text is not available through JSON structured data, HTML blocks within the <body> element marked with the itemprop="text" attribute can be used. If neither of these sources provides text, a heuristics-based extraction from the <body> will be performed as a last resort.
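
To verify that a page exposes its text through JSON structured data, a check along these lines could be used. This is an illustrative sketch, not the crawler's implementation: json_ld_text is a hypothetical helper, and it only handles a top-level JSON object (not arrays or @graph containers).

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


class _JsonLdCollector(HTMLParser):
    """Collect the raw contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def json_ld_text(html: str) -> Optional[str]:
    """Return the "text" property of the first JSON-LD object that has one."""
    collector = _JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            document = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        if isinstance(document, dict) and isinstance(document.get("text"), str):
            return document["text"]
    return None


page = '<html><head><script type="application/ld+json">{"@type": "Article", "text": "Hello world"}</script></head></html>'
print(json_ld_text(page))
```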

Recommendations

Generally, to build HTML link elements, an API user only needs the target page URLs and can use their own database to get the information required for building navigable links. However, when relying on the API’s extracted data, these recommendations may be helpful:

Use a Canonical Link: Always ensure that the canonical link you use is unique. This helps in avoiding duplicate content issues and improves SEO.
Use One <h1> Element: It's a good SEO practice to use one <h1> element to mark up the page's title. This helps search engines understand the content of the page better.
Use Structured Data: Make use of structured data from either schema.org or Open Graph, or even both. This helps in providing more detailed information about the page content to search engines and our API.
Add an ID to Main Content: Adding an ID to the main content of the page can help in better navigation and accessibility.
Add IDs to Relevant HTML Elements: Consider adding IDs to relevant HTML elements such as authors, breadcrumbs, categories, published time, images, etc. This can help in better organization and accessibility of the content.

Uptime and latency

Built on robust AWS services, the API has consistently achieved a 99% uptime, with no major disruptions. It also maintains an average response time of 150 milliseconds.

Request rate limit

Although API endpoints are not restricted by rate limits, we recommend maintaining a rate of under 20 requests/second for each endpoint when retrieving data by batches. With this rate, an API user can update links for 10,000 pages in less than 10 minutes.
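
The arithmetic behind that estimate, together with one simple way to pace a batch job, can be sketched as follows. The names batch_minutes and paced are hypothetical, and the pacing approach (sleeping between items) is just one option.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def batch_minutes(pages: int, requests_per_second: float = 20.0) -> float:
    """Minutes needed to request every page once at a steady rate."""
    return pages / requests_per_second / 60.0


def paced(
    items: Iterable[T],
    requests_per_second: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items no faster than the given rate."""
    interval = 1.0 / requests_per_second
    next_slot = clock()
    for item in items:
        wait = next_slot - clock()
        if wait > 0:
            sleep(wait)
        # Schedule the next item one interval after this one is released.
        next_slot = max(next_slot, clock()) + interval
        yield item


# 10,000 pages at 20 requests/second take 500 seconds, about 8.3 minutes.
print(round(batch_minutes(10_000), 1))
```

A batch job would wrap its list of page URLs in paced(...) and issue one request per yielded item.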

Graphite can provide guidance on the frequency of batch data retrieval jobs.

Caching

We highly recommend server-side rendering coupled with caching for optimal performance. You can cache the results using the endpoint URL (with query string parameters) serving as the key.

The links are refreshed no more than once a day, so a 24-hour Time To Live (TTL) is suitable.
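
A minimal in-memory version of this caching strategy is sketched below. TTLCache and its fetch callback are hypothetical; a production setup would more likely rely on an existing cache such as a CDN or a key-value store, with the same key and TTL.

```python
import time
from typing import Any, Callable, Dict, Tuple

TTL_SECONDS = 24 * 60 * 60  # links are refreshed at most once a day


class TTLCache:
    """Cache responses keyed by the full endpoint URL, query string included."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, endpoint_url: str, fetch: Callable[[str], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(endpoint_url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        value = fetch(endpoint_url)  # fetch is supplied by the caller
        self._store[endpoint_url] = (now, value)
        return value


cache = TTLCache()
key = "https://ilapi.graphite.io/example-client/example-section/related-links?url=%2Fexample-section%2Fa"
print(cache.get_or_fetch(key, lambda url: {"related_links": []}))
```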
