...

Limitation

Description

Websites with login required

Some websites require users to log in before any content is displayed. In these cases, Verity returns an error and does not attempt to classify the content. However, in most cases partners add Verity user agents to their Allow list, so this issue does not arise.

Geographic content

Content is often tailored to a specific geographic market, for example for News, Sports, or Streaming sites. The site may be designed to effectively serve a local market, or to conform to region-specific regulations such as GDPR.

Websites may automatically detect a user’s geographic location based on their IP address and dynamically serve content targeted to their region. Because Verity user agents run in the U.S.A., these websites may serve them content targeted to that market.

However, most multi-national publishers run websites with country-specific domains for each nation they serve. Verity will classify the content of the country-specific page URL requested.

Paywall

Many Publisher websites are protected by a paywall, and limit access to their content in various ways, such as:

  • Limiting the number of pages a user can read without logging in and subscribing.

  • Displaying the opening sentences or paragraphs of a page, but concealing the rest of the content until a reader selects a subscription option or logs in.

Verity can often extract enough content from these pages to perform a classification successfully. However, in most cases the Publisher has added Verity user agents to their Allow list, so the paywall does not affect Verity.

Rate limits

Web properties may want to reduce their exposure to DoS (Denial of Service) or bot attacks. Multiple requests within a short time span may trigger the website to block subsequent requests from Verity. In this case, Verity is unable to extract page content until the block is lifted.

Robots.txt

A robots.txt file may limit access to a site or to parts of a site. The site may also limit the number of pages that can be downloaded (for example, only 10 pages per month). Either restriction can reduce Verity’s ability to download content from the site.
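To see how robots.txt rules restrict a crawler, the following minimal sketch uses Python's standard `urllib.robotparser` module. The user-agent token `ExampleVerityBot`, the sample rules, and the URLs are hypothetical and chosen only for illustration; they are not Verity's actual user agent or any publisher's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token, used here only for illustration.
CRAWLER_AGENT = "ExampleVerityBot"

# Hypothetical robots.txt: the named crawler may fetch everything except
# /members/, while all other crawlers are blocked from the whole site.
ROBOTS_TXT = """\
User-agent: ExampleVerityBot
Disallow: /members/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent, url in [
        (CRAWLER_AGENT, "https://example.com/news/story.html"),
        (CRAWLER_AGENT, "https://example.com/members/profile"),
        ("SomeOtherBot", "https://example.com/news/story.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A publisher can run a check like this against its own robots.txt to confirm which paths a given crawler is permitted to download.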

Fake Page Content

In theory, a Publisher could set up a page URL to return different content to Verity than it serves to readers, in order to manipulate Verity’s classification results. In practice, Verity has not encountered an issue of this kind.

Image

...

Formats Analyzed

Verity applies logic to identify the prominent image on a web page for analysis. Additional images on the page may be subject to image extraction limitations. Supported image formats are:

  • BMP

  • EPS

  • ICNS

  • ICO

  • IM

  • JPEG

  • JPEG 2000

  • MSP

  • PCX

  • PNG

  • PPM

  • SGI

  • SPIDER

  • TIFF

  • WebP

  • XBM
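As a rough pre-check, a caller could compare an image URL's file extension with the formats listed above. The sketch below is an assumption-laden illustration: the format names come from the list, but the extension mapping and the function name `guess_listed_format` are hypothetical, and the mapping covers only common extensions. A result of `None` means the format could not be determined from the extension alone, not that Verity rejects the image.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Format names come from the list above. The extension mapping is an
# illustrative assumption and deliberately covers only common extensions.
COMMON_EXTENSIONS = {
    ".bmp": "BMP",
    ".eps": "EPS",
    ".icns": "ICNS",
    ".ico": "ICO",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".jp2": "JPEG 2000",
    ".pcx": "PCX",
    ".png": "PNG",
    ".ppm": "PPM",
    ".sgi": "SGI",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".webp": "WebP",
    ".xbm": "XBM",
}


def guess_listed_format(image_url: str) -> Optional[str]:
    """Map an image URL's file extension to a listed format name, if known."""
    path = urlparse(image_url).path  # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower()
    return COMMON_EXTENSIONS.get(suffix)


if __name__ == "__main__":
    print(guess_listed_format("https://example.com/a/photo.JPG?w=200"))
    print(guess_listed_format("https://example.com/banner.gif"))
```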

Video Data Analyzed

The Verity Video analysis pipeline processes and analyzes video content and metadata, specifically:

  • Audio
    Transcription of the video's audio track. The maximum transcription length supported is 14400 seconds (4 hours).

  • OCR
    Text and cursive text detected in the video frames. OCR is included in the process when the video transcription yields fewer than 50 words. 

  • Metadata and title
    Page title and metadata.

  • Video frames
    Sampling is performed at a rate of 1 frame per second.

  • Video formats
    Supported formats are MPEG-4, MOV, MP3, FLAC, and M3U8.

  • Video size
    The maximum video size is 2 GB.
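The numeric limits above can be summarized in a short sketch. The thresholds are taken from the list; everything else is an assumption made for illustration, including the function names, counting words by splitting on whitespace, rounding the frame count down, and reading "2 GB" as 2 × 10^9 bytes. This is not Verity's implementation.

```python
import math

# Limits stated in the list above.
MAX_TRANSCRIPTION_SECONDS = 14_400   # 4 hours
OCR_WORD_THRESHOLD = 50              # OCR is included below this word count
FRAMES_SAMPLED_PER_SECOND = 1
# Assumption: "2 GB" is read as decimal gigabytes (2 * 10**9 bytes).
MAX_VIDEO_BYTES = 2 * 10**9


def needs_ocr(transcript: str) -> bool:
    """OCR is included when the transcript has fewer than 50 words.

    Assumption: words are counted by splitting on whitespace.
    """
    return len(transcript.split()) < OCR_WORD_THRESHOLD


def exceeds_transcription_limit(duration_seconds: float) -> bool:
    """True if the audio track is longer than the supported maximum."""
    return duration_seconds > MAX_TRANSCRIPTION_SECONDS


def approx_sampled_frames(duration_seconds: float) -> int:
    """Approximate frame count at 1 frame per second (rounded down)."""
    return math.floor(duration_seconds * FRAMES_SAMPLED_PER_SECOND)


def within_size_limit(size_bytes: int) -> bool:
    """True if the file is no larger than the maximum video size."""
    return size_bytes <= MAX_VIDEO_BYTES


if __name__ == "__main__":
    print(needs_ocr("a short clip with almost no speech"))
    print(exceeds_transcription_limit(3 * 3600))
    print(approx_sampled_frames(90.5))
    print(within_size_limit(1_500_000_000))
```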

...

User Information is Not Analyzed

Verity does not process or store user information (such as cookies or browsing history). Verity analysis is based solely on the content of media analyzed.