...
Limitation | Description |
---|---|
Websites with login required | Some websites may require user login before any content is displayed. In these cases, Verity will return an error and will not attempt to classify the content. However, in most cases partners add Verity user agents to their Allow list so this issue does not arise. |
Geographic content | Content is often tailored to a specific geographic market, for example on news, sports, or streaming sites. The site may be designed to serve a local market, or to conform to region-specific regulations such as GDPR. Websites may automatically detect a user’s geographic location from their IP address and dynamically serve content targeted to their region. Because Verity user agents run in the U.S.A., these websites may serve them content targeted to that market. However, most multi-national publishers run websites with country-specific domains for each nation they serve. Verity will classify the content of the country-specific page URL requested. |
Paywall | Many Publisher websites are protected by a paywall and limit access to their content in various ways. Verity can often extract enough content from these pages to successfully perform a classification. In most cases, however, the Publisher has added Verity user agents to their Allow list, so the paywall does not impact Verity. |
Rate limits | Web properties may want to reduce their exposure to DoS (Denial of Service) or bot attacks. Multiple requests within a short time span may trigger the website to block subsequent requests from Verity. In this case, Verity is unable to extract page content until the block is lifted. |
Robots.txt | A Robots.txt file may limit access to a site or parts of a site. The site may also limit the number of pages that can be downloaded (for example, only 10 pages per month). This may limit Verity’s ability to download content from the site. |
Fake Page Content | In theory, a Publisher could set up a page to return different content for a page URL in order to manipulate Verity’s classification results. In practice, Verity has not encountered an issue of this kind. |
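To illustrate the Robots.txt limitation in the table above, the following is a minimal sketch of how a publisher might check whether their robots.txt rules would permit a given crawler to fetch a page. It uses Python's standard `urllib.robotparser` module. The user agent token `ExampleVerityBot`, the sample rules, and the `example.com` URLs are hypothetical placeholders, not Verity's actual user agent strings or any real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example needs no network access.
# "ExampleVerityBot" is a placeholder token, not a real Verity user agent.
ROBOTS_TXT = """\
User-agent: ExampleVerityBot
Disallow: /members/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleVerityBot", "https://example.com/news/story.html"),
        ("ExampleVerityBot", "https://example.com/members/profile"),
        ("OtherBot", "https://example.com/news/story.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Under these sample rules, the placeholder crawler may fetch the news page but not anything under `/members/`, and every other crawler is blocked from the whole site.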
Text Ingestion Limitations
Page content ingestion by the Verity service has the following limitations:
Processes only the first 20,000 characters on any page in any supported language.
Cannot process infinite scrolling pages.
Cannot process pages loaded by JavaScript.
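The character limit above can be pictured with a short sketch. This is only an illustration, not Verity's implementation: the function name `ingestible_text` is hypothetical, and it assumes that "characters" means Unicode code points counted from the start of the extracted page text, which the documentation does not specify.

```python
MAX_CHARS = 20_000  # the limit stated in the list above


def ingestible_text(page_text: str, limit: int = MAX_CHARS) -> str:
    """Return the portion of page_text that falls within the character limit.

    Assumption: characters are counted as Unicode code points, which is
    how Python string slicing counts them.
    """
    return page_text[:limit]


if __name__ == "__main__":
    long_page = "a" * 25_000
    kept = ingestible_text(long_page)
    print(len(kept), "characters kept,", len(long_page) - len(kept), "ignored")
```

A practical consequence is that content intended to influence classification should appear early in the page text, since anything past the limit is not considered.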
Image Support
Verity applies logic to identify the prominent image on a web page for analysis. Additional images on the page may be subject to image extraction limitations. Supported image formats are:
...