Status messages returned by the page API.

Status Message

Description

INITIATED

Once Verity has checked that the URL is properly formed and does not already exist in the database, the request is passed to the Verity classification systems and the status updates to INITIATED.

PROCESSING

The Verity classification system is processing the text and images on the specified page.

PROCESSED

The URL has been processed and the Verity analysis JSON is available. The analysis results have been stored.

ERROR

Page processing has been attempted and failed. The page URL is recorded in the Error Cache for 1 hour.

If another request to process the same URL is received within 1 hour, Verity will return an ERROR status (unless the ignoreCache flag is enabled).

After 1 hour, the ERROR status is cleared and Verity will process a new request for the URL. 

Several different conditions may result in an ERROR status message:

  • Unreachable page.

  • A processing module has returned a value other than a success status code.
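The one-hour Error Cache behaviour described above can be sketched as follows. This is a minimal illustration, not Verity's implementation: the `ErrorCache` class, its method names, and the in-memory dictionary are all hypothetical; only the 1-hour window and the ignoreCache flag come from the description.

```python
import time

ERROR_TTL_SECONDS = 60 * 60  # the 1-hour window described above


class ErrorCache:
    """Hypothetical in-memory stand-in for the Error Cache."""

    def __init__(self, now=time.time):
        self._now = now        # injectable clock, handy for testing
        self._failed_at = {}   # page URL -> time the failure was recorded

    def record_failure(self, url):
        """Remember that processing this URL failed just now."""
        self._failed_at[url] = self._now()

    def should_return_error(self, url, ignore_cache=False):
        """True if a cached ERROR should be returned instead of reprocessing."""
        if ignore_cache:
            return False
        failed_at = self._failed_at.get(url)
        if failed_at is None:
            return False
        if self._now() - failed_at >= ERROR_TTL_SECONDS:
            del self._failed_at[url]  # entry expired: clear it and reprocess
            return False
        return True
```

With this sketch, a second request for a failed URL inside the window returns ERROR unless `ignore_cache=True` is passed; after the window the entry is dropped and the URL is processed again.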

NOT_SUPPORTED

The language of the page is not supported (see Language Support Grid). This status message may also be returned if Verity is unable to process the requested website.

INSUFFICIENT_CONTENT

Verity’s content extraction processes cannot extract sufficient relevant content from a page to adequately perform classification tasks across text and imagery.

INVALID

The HTTP URL request may be malformed, for example:

  • Incomplete URL.

  • Missing HTTP header.

  • Invalid domain-specific information.

Example Invalid Request:

{
  "dataAvailable": false,
  "Status": "INVALID",
  "pageUrl": "://gg.invalid/1",
  "uuid": "73146868-8d77-4bf0-8e89-eeb9d8e04cb2"
}
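A client might read the status message from a response like the one above and decide whether to keep polling. The sketch below is illustrative only: the function names are hypothetical, and the split into in-progress and finished statuses is an assumption inferred from the descriptions in the table, not something the API states.

```python
import json

# Grouping is an assumption drawn from the status descriptions above.
IN_PROGRESS = {"INITIATED", "PROCESSING"}
FINISHED = {"PROCESSED", "ERROR", "NOT_SUPPORTED", "INSUFFICIENT_CONTENT", "INVALID"}


def read_status(body):
    """Return the status message from a JSON response body."""
    payload = json.loads(body)
    # The example above spells the key "Status"; accept a lowercase variant too.
    status = payload.get("Status", payload.get("status"))
    if status not in IN_PROGRESS | FINISHED:
        raise ValueError(f"unrecognised status: {status!r}")
    return status


def should_poll_again(status):
    """True while the page is still being worked on."""
    return status in IN_PROGRESS
```

Raising on an unrecognised status is a design choice for this sketch, so that a new or misspelled status is noticed rather than silently treated as finished.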


Page URL Error Codes

Error codes returned by the page API.

Error Code

Description

HTTP Status Code

Internal/Client Facing

Response Examples

INSUFFICIENT_CONTENT

  • When the text associated with the page contains fewer than 50 characters, further processing of the page stops and this status is saved in the DB.

422 - Unprocessable Entity

-> Client facing.

PAGE_CONTENT_CLASSIFICATION_FAILED

  • Denotes a failure of the classification process. This is usually the generic client-facing error code that stands in for more granular internal error codes, which are kept for debugging purposes.

500 - Internal Server Error

-> Client facing.

TEXT_EXTRACTION_REQUEST_FAILED

  • After the page has been downloaded, if the request sent by the PCE to Tapas for text extraction fails, this status is saved in the DB.

500 - Internal Server Error

-> Internal.

-> Mapped to PAGE_CONTENT_CLASSIFICATION_FAILED

IMAGE_CLASSIFIACTION_REQUEST_FAILED

  • After the page has been downloaded, if the request sent by the PCE to Prism for threat analysis of the page's OG image fails, this status is saved in the DB.

500 - Internal Server Error

-> Internal.

-> Mapped to PAGE_CONTENT_CLASSIFICATION_FAILED

TEXT_EXTRACTION_RESPONSE_NOT_SUPPORTED

  • During text extraction, if Tapas detects that the language of the text is not supported by the classification models, this status is returned to the Verity router, which in turn stores it in the DB.

200 - OK

-> Internal.

-> Mapped to status field NOT_SUPPORTED in the Verity response.

TEXT_EXTRACTION_RESPONSE_INTERNAL_ERROR

  • When Tapas encounters an internal error during text extraction, this status is returned to the Verity router and saved as this error code in the DB.

500 - Internal Server Error

-> Internal.

-> Mapped to PAGE_CONTENT_CLASSIFICATION_FAILED

TEXT_EXTRACTION_RESPONSE_INTERNAL_UNKNOWN

  • After text extraction, the Verity router recognises only three statuses from Tapas: “SUCCESS”, “NOT_SUPPORTED” and “INTERNAL_ERROR”. Any other status returned by Tapas is saved by the Verity router as this error code in the DB.

500 - Internal Server Error

-> Internal.

-> Mapped to PAGE_CONTENT_CLASSIFICATION_FAILED
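The text-extraction rows above amount to two lookups: Tapas status to internal error code, and internal error code to what the client sees. The sketch below restates them under stated assumptions; the function and constant names are hypothetical, and it only covers the case where Tapas actually returns a status value.

```python
# Internal error codes stored for the non-success statuses the router recognises.
TAPAS_STATUS_TO_ERROR_CODE = {
    "NOT_SUPPORTED": "TEXT_EXTRACTION_RESPONSE_NOT_SUPPORTED",
    "INTERNAL_ERROR": "TEXT_EXTRACTION_RESPONSE_INTERNAL_ERROR",
}

# Internal codes the table maps to the generic client-facing code.
MAPPED_TO_CLASSIFICATION_FAILED = {
    "TEXT_EXTRACTION_REQUEST_FAILED",
    "IMAGE_CLASSIFIACTION_REQUEST_FAILED",  # spelled as in the table
    "TEXT_EXTRACTION_RESPONSE_INTERNAL_ERROR",
    "TEXT_EXTRACTION_RESPONSE_INTERNAL_UNKNOWN",
}


def error_code_for_tapas_status(tapas_status):
    """Return the internal error code to store, or None on SUCCESS."""
    if tapas_status is None:
        raise ValueError("a missing status is outside the scope of this sketch")
    if tapas_status == "SUCCESS":
        return None
    # Anything the router does not recognise falls through to INTERNAL_UNKNOWN.
    return TAPAS_STATUS_TO_ERROR_CODE.get(
        tapas_status, "TEXT_EXTRACTION_RESPONSE_INTERNAL_UNKNOWN"
    )


def client_facing_value(internal_code):
    """Return the value a client would see for a stored error code."""
    if internal_code == "TEXT_EXTRACTION_RESPONSE_NOT_SUPPORTED":
        return "NOT_SUPPORTED"  # surfaced through the status field
    if internal_code in MAPPED_TO_CLASSIFICATION_FAILED:
        return "PAGE_CONTENT_CLASSIFICATION_FAILED"
    return internal_code  # codes marked client facing pass through unchanged
```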

PAGE_CONTENT_EXTRACTION_FAILED_WITH_403_FORBIDDEN

A website has blocked Verity's web crawler from downloading content.

  • This error code denotes that the host site has denied access to the PCE, which is responsible for downloading the contents of a page, mostly because authorisation credentials are missing.

  • On receiving this code, the client needs to make sure that the correct request parameters and the necessary header values are passed to Verity with the request.

422 - Unprocessable Entity

-> Client facing.

PAGE_CONTENT_EXTRACTION_FAILED_WITH_404_NOT_FOUND

Verity’s web crawler was not able to locate any content for the provided URL.

  • This error code denotes that the PCE, which is responsible for downloading the contents of a page, is unable to find the webpage for the requested URL.

  • On receiving this code, the client needs to ensure that the correct URL and the necessary authorisation header values are passed as request parameters.

422 - Unprocessable Entity

-> Client facing.

PAGE_CONTENT_EXTRACTION_FAILED_WITH_500_INTERNAL_SERVER_ERROR

An unknown error was received from the webpage during a crawl attempt. Note: The PCE attempts up to three times to extract web content.

  • This error code denotes that the PCE is experiencing errors while trying to download the contents of a page.

  • On receiving this error code, the client may retry the request after a while.

422 - Unprocessable Entity

-> Client facing.

PAGE_CONTENT_EXTRACTION_FAILED_WITH_4XX

A generic 4XX response was received during a crawl attempt by the PCE.

  • This error code denotes that the PCE, which is responsible for downloading the contents of a page, is facing issues that mostly originate on the client side.

  • On receiving this code, the client needs to ensure that the correct URL and the necessary authorisation header values are passed as request parameters.

422 - Unprocessable Entity

-> Client facing.

PAGE_CONTENT_EXTRACTION_FAILED_WITH_5XX

A generic 5XX response was received during a crawl attempt by the PCE.

  • This error code denotes that the PCE, which is responsible for downloading the contents of a page, is experiencing internal issues.

  • On receiving this error code, the client may retry the request after a while.

422 - Unprocessable Entity

-> Client facing.
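The PAGE_CONTENT_EXTRACTION_FAILED_WITH_* rows above follow a simple pattern keyed on the HTTP status the crawler receives. The sketch below shows one way that pattern could be expressed; the function names are hypothetical, the fallback to the generic code is an assumption, and the retry hint reflects only the client guidance given for the 500 and 5XX rows.

```python
PREFIX = "PAGE_CONTENT_EXTRACTION_FAILED"

# Statuses that have their own dedicated error code in the table.
SPECIFIC_CODES = {
    403: PREFIX + "_WITH_403_FORBIDDEN",
    404: PREFIX + "_WITH_404_NOT_FOUND",
    500: PREFIX + "_WITH_500_INTERNAL_SERVER_ERROR",
}


def extraction_error_code(http_status):
    """Map the HTTP status seen during a crawl to an error code from the table."""
    if http_status in SPECIFIC_CODES:
        return SPECIFIC_CODES[http_status]
    if 400 <= http_status <= 499:
        return PREFIX + "_WITH_4XX"
    if 500 <= http_status <= 599:
        return PREFIX + "_WITH_5XX"
    return PREFIX  # assumed fallback: the generic code


def is_retryable(error_code):
    """The table suggests retrying later for the 500 and 5XX codes only."""
    return error_code.endswith(("_WITH_500_INTERNAL_SERVER_ERROR", "_WITH_5XX"))
```

A client could use `is_retryable` to decide between resubmitting the request later and correcting the URL or headers first.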

PAGE_CONTENT_EXTRACTION_FAILED

  • A generic error code denoting that the PCE failed to download all the contents of a page. This code is usually saved in the DB along with a more granular error code describing the reason for the PCE failure and the status.

  • If this is the sole code present for a page in the DB, it means that Tapas failed to send any value for the expected status field.

422 - Unprocessable Entity

-> Client facing.

URL_CANNOT_EXCEED_2048_BYTES

  • This error code shows that the URL passed in the request is longer than 2048 bytes. Such a URL is deemed invalid because 2048 bytes is the size limit of a DynamoDB partition key.

  • One action the client can take on seeing this error code is to ensure that query parameters are stripped from the URL.

400 - Bad (Invalid) request.

-> Client facing.

URL_MISSING_HTTP_PROTOCOL

  • This error code shows that the requested URL does not specify the HTTP protocol.

  • The client should ensure that the URL begins with 'http://' or 'https://'.

400 - Bad (Invalid) request.

-> Client facing.

URL_MUST_NOT_BE_EMPTY

  • This error code denotes that the request is missing the URL. 

400 - Bad (Invalid) request.

-> Client facing.

URL_MALFORMED

  • This error code denotes that the requested URL does not conform to the URI syntax defined in RFC 2396.

400 - Bad (Invalid) request.

-> Client facing.
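A client can pre-check a URL against the four URL_* codes above before sending a request. The following is a sketch only: the function name and the order of the checks are assumptions, and the final check is a rough stand-in rather than a full RFC 2396 validation.

```python
from urllib.parse import urlsplit

MAX_URL_BYTES = 2048  # limit stated for URL_CANNOT_EXCEED_2048_BYTES


def url_error_code(url):
    """Return one of the URL_* codes from the table, or None if the URL passes."""
    if not url or not url.strip():
        return "URL_MUST_NOT_BE_EMPTY"
    if not url.lower().startswith(("http://", "https://")):
        return "URL_MISSING_HTTP_PROTOCOL"
    if len(url.encode("utf-8")) > MAX_URL_BYTES:
        return "URL_CANNOT_EXCEED_2048_BYTES"
    try:
        host = urlsplit(url).hostname
    except ValueError:  # urlsplit rejects e.g. an unbalanced IPv6 bracket
        return "URL_MALFORMED"
    # Rough stand-in for a real syntax check: need a host and no whitespace.
    if not host or any(ch.isspace() for ch in url):
        return "URL_MALFORMED"
    return None
```

The length is measured in UTF-8 bytes rather than characters, since the limit in the table is expressed in bytes.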
