...

The Verity report includes complete brand safety, keyword, and categorization analysis data for the requested content. Each report contains the following analysis results:

dataAvailable

States whether the classification request has already been processed. If processed data exists, Verity returns the results from the database. If not, Verity starts a new processing request.

status

The current processing status of the analysis request.

pageUrl

The URL of the page, video, image, or text analyzed by Verity, as applicable.

uuid

A unique identifier generated for the classification request.

languageCode

The standard ISO 639-1 code for the language of the content. Verity currently supports content in:

  • English

  • Japanese

  • Spanish

  • French

  • German

Note: If Verity detects an unsupported language, a status of NOT_SUPPORTED is returned.

iab v1

The IAB v1.0 categories identified for the page.

IAB v1.0 categories are widely adopted in programmatic and Real-Time-Bidding (RTB) ad marketplaces. IAB v1.0 categories are organized into the following tiers:

  • Tier 1 identifies broad-level categories, such as Pets, defined with the following targeting depths:

    • Category/portal

    • Site section

    • Page

  • Tier 2 and greater identify more granular categories, such as Dogs, and are nested under Tier 1 categories. 

Refer to the Verity Taxonomy document for a listing of IAB v1 categories.

Verity video analysis does not support IAB v1.0 categories.

iab v2

The IAB v2.0 categories identified for the content.

The IAB defined a more granular content taxonomy in IAB Tech Lab Content Taxonomy v2.0 (released in 2017). IAB v2.0 defines additional content classifications and restructures existing IAB v1.0 classifications. 

Each IAB v2.0 category has a unique three-digit ID, and is structured into a tiered hierarchy with up to 4 tiers of categories.

Refer to the Verity Taxonomy document for a listing of IAB v2 categories.

keywords

The top keywords identified for the content, listed in order of prominence.

safe

The final aggregated Brand Safety summary result for the content.  

If any threat classifications are identified with a risk level of VERY_HIGH, the safe value is false and the content is considered unsafe.

If no threat classifications are identified, or only low-risk ones, the safe value is true and the content is considered safe.

threats

Threat categories are part of GumGum’s brand safety taxonomy. GumGum classifies content into nine threat categories. For a complete list of Threat category IDs and Names, refer to Threat Categories in the Verity Taxonomy document.

To detect possible threats, Verity analyzes and scores all the extracted content. Verity then correlates the scores to determine a per-category threat risk-level for the content.

Possible threat category risk-levels are:

  • VERY_HIGH

  • HIGH

  • MODERATE

  • LOW

  • VERY_LOW

sentiments

Identifies and extracts opinions within digital content. 

The positive, neutral, and negative levels of sentiment expressed in the content are evaluated. For contextual targeting purposes, a sentiment level of neutral or positive is generally recommended.

processedAt

The date and time of the classification. 
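
The fields above can be read together when making a targeting decision. The following is a minimal sketch, assuming the report arrives as a JSON object whose top-level keys match the field names listed above. The nesting of the threats entries, the "category" and "risk" key names, the "PROCESSED" status value, and all sample values are assumptions made for illustration and are not the documented Verity response format.

```python
import json

# Hypothetical payload: field names come from the list above, but the
# structure, the "PROCESSED" status, and the values are assumptions.
SAMPLE_REPORT = json.loads("""
{
  "dataAvailable": true,
  "status": "PROCESSED",
  "languageCode": "en",
  "keywords": ["dog training", "puppies", "leashes", "kennels"],
  "safe": false,
  "threats": [
    {"category": "example-threat-a", "risk": "VERY_HIGH"},
    {"category": "example-threat-b", "risk": "LOW"}
  ]
}
""")

# Risk levels from the threats field, ordered from lowest to highest.
RISK_LEVELS = ["VERY_LOW", "LOW", "MODERATE", "HIGH", "VERY_HIGH"]


def summarize(report: dict) -> dict:
    """Pull out the fields a targeting decision would typically need."""
    # NOT_SUPPORTED is the status the list above gives for unsupported languages.
    if report.get("status") == "NOT_SUPPORTED":
        return {"usable": False, "reason": "unsupported language"}
    threats = report.get("threats", [])
    # Highest per-category risk level found in the report, if any.
    worst = max((t["risk"] for t in threats), key=RISK_LEVELS.index, default=None)
    return {
        "usable": True,
        "safe": report.get("safe"),
        "worst_risk": worst,
        "top_keywords": report.get("keywords", [])[:3],
    }


print(summarize(SAMPLE_REPORT))
```

The sketch reads the safe value as returned rather than recomputing it, because the exact aggregation rule for HIGH and MODERATE risk levels is not stated here.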

 

Classification Approaches

...

Verity analyzes contextual categories, keywords, and sentiment using various methods and data models, outlined in the following table:

IAB Content Categories v1 and v2

Content classifiers predict the likelihood that the given content belongs to one or more IAB categories.

Keywords

A set of rules derives, scores, and ranks the most important keywords.

Sentiments

Machine learning predicts the sentiment of each sentence within content by applying models trained on content with varying tones of voice. Verity returns an aggregated breakdown of the proportion of sentences in the content that are positive, neutral or negative (referred to as Document Level Sentiment Analysis).
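
The aggregation step described for Sentiments can be sketched as follows. This is an illustration only: it assumes each sentence has already been assigned one of three labels by some upstream model, and the function and label names are hypothetical rather than part of Verity.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def document_sentiment(sentence_labels):
    """Return the share of sentences carrying each sentiment label."""
    counts = Counter(sentence_labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No labeled sentences: report zero for every label.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


breakdown = document_sentiment(["positive", "neutral", "neutral", "negative"])
print(breakdown)  # {'positive': 0.25, 'neutral': 0.5, 'negative': 0.25}
```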

Content classification is used for targeting purposes, so Verity favors Precision over Recall. Data Scientists use Precision-Recall curves to maximize Precision with minimum loss in Recall, thereby maximizing the accuracy of the classified targets.
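
The Precision-Recall trade-off can be illustrated with a small threshold search. This is a generic sketch, not Verity's tuning procedure: the scores, labels, and the 0.9 precision target are made-up values, and the function names are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold count as positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision):
    """Threshold with the highest recall among those meeting min_precision."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best  # (threshold, precision, recall), or None if no threshold qualifies


# Made-up classifier scores and ground-truth labels for one category.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
labels = [True, True, False, True, False, False]
print(pick_threshold(scores, labels, min_precision=0.9))
```

With these sample values, lowering the threshold below 0.9 would recover the third true positive but admit a false positive, which is the kind of loss in Precision the approach above avoids.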

...

The following table summarizes some of the known issues Verity may encounter when downloading and extracting pages for analysis.

Limitation

Description

Maximum characters per page

Verity processes only the first 20,000 characters on any page in any supported language. According to the Verity team’s research, most web pages contain fewer than 7,500 characters, and few exceed the 20,000-character limit.

Infinite scrolling pages

Infinite scrolling lets users keep scrolling through information on a web page without clicking a “Load More” or “Next Page” option. Many platforms, such as espn.com, have implemented infinite scrolling because information loads quickly and user engagement is maintained. In many infinite scrolling environments, each component page has its own URL, and the URL changes as the content is loaded. Because Verity has a 20,000-character limit and processes only the page URLs that the partner specifically requests, Verity typically does not process the complete content of an infinite scrolling page.

Dynamically rendered pages

Dynamic web pages contain content that is generated automatically from a web server via JavaScript, instead of being hard-coded on the page. The content of the page may change based on multiple variables, for example, new data on the web server or user selection. The content of these pages can only be reliably discovered by rendering the page. Verity therefore does not attempt to classify dynamically rendered pages.

Home pages

Home pages for a site may have more complicated layouts than the main corpus of the site content and often contain text passages quoted from other pages on the site. Verity’s contextual categorization of home page content may therefore be less useful than the classification of other pages on the site.

Intricate page layouts

Some sites may implement complex HTML and CSS schemes that may require rendering to reveal the main body text of the pages. These design practices are not typically employed by established publishers and therefore rarely impede Verity content extraction.

User Generated Content (UGC)

Verity does not process or analyze UGC, such as comments or social media posts. UGC changes constantly, so Verity does not attempt to provide a UGC classification that could immediately become outdated.
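
The effect of the maximum characters per page limitation in the table above can be pictured with a short sketch. It assumes the extracted page text is already available as a plain string; how Verity counts characters (for example, whether whitespace or markup is included) is not stated here, and the function name is hypothetical.

```python
MAX_CHARS = 20_000  # limit stated in the table above


def text_for_analysis(page_text: str) -> tuple[str, int]:
    """Return the text within the limit and the number of characters dropped."""
    kept = page_text[:MAX_CHARS]
    return kept, len(page_text) - len(kept)


kept, dropped = text_for_analysis("word " * 5_000)  # a 25,000-character page
print(len(kept), dropped)  # 20000 5000
```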

Site Access Limitations

Partner restrictions on website access may limit Verity’s ability to download content. Typically, to bypass partner site restrictions, Verity partners configure their Allow lists to enable Verity user agents to access their content.

Limitation

Description

Websites with login required

Some websites may require user login before any content is displayed. In these cases, Verity returns an error and does not attempt to classify the content. However, in most cases, partners add Verity user agents to their Allow list, so this issue does not arise.

Geographic content

Content is often tailored to a specific geographic market, for example for News, Sports, or Streaming sites. The site may be designed to effectively serve a local market, or to conform to region-specific regulations such as GDPR.

Websites may automatically detect a user’s geographic location from their IP address and dynamically serve content targeted to that region. Because Verity user agents run in the U.S.A., these websites may serve them content targeted to the U.S. market.

However, most multi-national publishers run websites with country-specific domains for each nation they serve. Verity will classify the content of the country-specific page URL requested.

Paywall

Many Publisher websites are protected by a paywall, and limit access to their content in various ways, such as:

  • Limiting the number of pages a user can read without logging in and subscribing.

  • Displaying the opening sentences or paragraphs of a page, but concealing the rest of the content until a reader selects a subscription option or logs in.

Verity can often extract enough content from these pages to perform a classification successfully. In most cases, however, the Publisher has added Verity user agents to their Allow list, so the paywall does not impact Verity.

Rate limits

Web properties may want to reduce their exposure to DoS (Denial of Service) or bot attacks. Multiple requests within a short time span may trigger the website to block subsequent requests from Verity. In this case, Verity is unable to extract page content until the block is lifted.

Robots.txt

A Robots.txt file may limit access to a site or parts of a site. The site may also limit the number of pages that can be downloaded (for example, only 10 pages per month). This may limit Verity’s ability to download content from the site.

Fake Page Content

In theory, a Publisher could set up a page to serve Verity different content than regular visitors see for the same page URL, in order to manipulate Verity’s classification results. A Publisher that intentionally misrepresents page content to avoid or circumvent Verity’s brand safety measures would be considered nefarious. To our knowledge, Verity has not encountered an issue of this kind.
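
A site owner can check how the Robots.txt limitation in the table above would apply to a given crawler by using Python's standard urllib.robotparser module. The sketch below is illustrative: the rules and the ExampleVerityBot user-agent name are hypothetical, and the actual Verity user-agent strings are not listed in this section.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules and user-agent name, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleVerityBot
Disallow: /members/
Crawl-delay: 5

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, agent: str = "ExampleVerityBot") -> bool:
    """True if the parsed rules let the given agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(is_allowed("https://example.com/news/story"))       # allowed for the named agent
print(is_allowed("https://example.com/members/profile"))  # blocked by Disallow: /members/
print(parser.crawl_delay("ExampleVerityBot"))             # delay in seconds requested by the site
```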

Image Formats Analyzed

Verity applies logic to identify the prominent image on a web page for analysis. Additional images on the page may be subject to image extraction limitations. Supported image formats are:

  • BMP

  • EPS

  • ICNS

  • ICO

  • IM

  • JPEG

  • JPEG 2000

  • MSP

  • PCX

  • PNG

  • PPM

  • SGI

  • SPIDER

  • TIFF

  • WebP

  • XBM
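
A partner could pre-check whether an image URL points to one of the formats listed above. The sketch below is an assumption-laden illustration: it guesses the format from the file extension, the extension mapping is not part of the Verity documentation, and SPIDER is omitted because it has no single conventional extension. How Verity actually detects image formats is not described in this section.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Common file extensions for the formats listed above (mapping is an assumption).
EXTENSION_TO_FORMAT = {
    ".bmp": "BMP", ".eps": "EPS", ".icns": "ICNS", ".ico": "ICO",
    ".im": "IM", ".jpg": "JPEG", ".jpeg": "JPEG", ".jp2": "JPEG 2000",
    ".msp": "MSP", ".pcx": "PCX", ".png": "PNG", ".ppm": "PPM",
    ".sgi": "SGI", ".tif": "TIFF", ".tiff": "TIFF", ".webp": "WebP",
    ".xbm": "XBM",
}


def image_format_from_url(url: str):
    """Return the listed format name for the URL's extension, or None."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)


print(image_format_from_url("https://example.com/img/hero.JPG?w=800"))  # JPEG
print(image_format_from_url("https://example.com/img/anim.gif"))        # None
```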

Video Data Analyzed

The Verity Video analysis pipeline processes and analyzes video content and metadata, specifically:

  • Audio
    Transcription of the video's audio track. The maximum transcription length supported is 4 hours (14400 seconds).

  • OCR
    Text and cursive text detected in the video frames. OCR is included in the process when the video transcription yields fewer than 50 words. 

  • Metadata and title
    Page title and metadata.

  • Video frames
    Sampling is performed at a rate of 1 frame per second.

  • Video formats
    Supported formats are MPEG-4, MOV, MP3, FLAC, and M3U8.

  • Video size
    The maximum video size is 2 GB.
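
The limits in the list above can be collected into a simple pre-flight check. This is a sketch under stated assumptions: "2 GB" is read as 2 × 10^9 bytes, the class and function names are hypothetical, and the sketch only reports whether each limit is met because the list does not say what Verity does with videos that exceed them.

```python
from dataclasses import dataclass

MAX_TRANSCRIPT_SECONDS = 14_400    # 4 hours, from the Audio item
MAX_VIDEO_BYTES = 2 * 1000**3      # "2 GB" read as decimal gigabytes (assumption)
OCR_WORD_THRESHOLD = 50            # OCR runs when the transcript has fewer words
FRAMES_PER_SECOND = 1              # sampling rate from the Video frames item


@dataclass
class VideoPlan:
    within_size_limit: bool
    within_transcription_limit: bool
    sampled_frames: int
    run_ocr: bool


def plan_video_analysis(size_bytes: int, duration_seconds: int,
                        transcript_word_count: int) -> VideoPlan:
    """Compare a video's properties against the limits listed above."""
    return VideoPlan(
        within_size_limit=size_bytes <= MAX_VIDEO_BYTES,
        within_transcription_limit=duration_seconds <= MAX_TRANSCRIPT_SECONDS,
        sampled_frames=duration_seconds * FRAMES_PER_SECOND,
        run_ocr=transcript_word_count < OCR_WORD_THRESHOLD,
    )


# A 2-minute, 500 MB clip whose transcript contains only 30 words.
plan = plan_video_analysis(500_000_000, 120, 30)
print(plan)
```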

User Information is Not Analyzed

...