Summary

This Description of Methodology (DoM) describes the processes that deliver Verity – GumGum’s content-level contextual analysis and brand safety solution.

...

Once analysis is complete, Verity returns a detailed report featuring a brand safety score for the content, along with contextual targeting categories, prominent keywords, event and sentiment categories. Verity supports the contextual targeting categories defined in the Interactive Advertising Bureau (IAB) Content Taxonomy v1.0 and 2.0.

Primary Users and Use Cases

Verity serves publishers, DSPs, agencies, and advertisers as a third-party content-level contextual analysis and brand safety solution.

...

As of September 2020, GumGum's Verity service processes approximately 1 billion unique requests per month for content and brand safety classification (100% originating within North America).

Verity Platform Functions

Verity’s function is to provide data to clients who explicitly request and pay for analysis information about specific digital content. The clients are interested in establishing brand suitability and contextual classification for specific content, to drive their own content creation or ad serving decisioning.

Verity applies natural language processing (NLP) and computer vision (CV) based machine learning techniques to analyze digital content. Multiple kinds of content can be analyzed, such as desktop and mobile web pages, images, and Online Video (OLV) and Over-The-Top (OTT) videos (including audio).

Web Page Analysis Functions

Going beyond simple strategies like identifying keywords on the page or in the URL string or metadata, Verity works by scanning the full text and prominent imagery of a web page. Verity’s NLP processes analyze the core page content, while CV processes analyze the imagery.

...

Verity does not apply content-level analysis to code or objects (including third-party code or objects) that appear outside, adjacent to, or embedded within the core text on a page.
Verity does not download or analyze the CSS, JavaScript, navigation, footer, sidebars, and other areas extraneous to the core textual content on the page. For example, on a typical Blog page Verity extracts and analyzes the central content of the page, but not the surrounding elements such as third-party advertising or related content.
Verity also does not provide analysis of continually changing dynamically loaded user generated content within publisher pages (e.g., reviews sections, comments sections, social media plug-ins) or social media environments.
Verity applies logic to identify the prominent image on a web page for analysis. Additional images on the page may be subject to image extraction limitations.
GumGum informs clients that Verity analyzes the web page (not the surrounding material) specifying that the analysis includes the core textual content and prominent imagery but nothing else – not graphics, sidebar content, or third-party insertions such as paid advertising.
Verity acknowledges that surrounding, adjacent, or embedded content on a web page (which may be provided by JavaScript executions or non-textual content) can affect the context of a page as presented to users and may be a consideration for advertisers.
Other key platform functions such as ad serving, detection of ad fraud, identification of invalid traffic (IVT/SIVT), measurement of viewability, measurement of audiences, and other cookie implementations are not handled by Verity or its technology.

Video Analysis Functions

Verity analyzes video content by applying powerful classifiers to the video’s transcribed audio track and data from sampled video frames.

Video analysis leverages GumGum’s industry-leading NLP text analysis processes and CV image analysis, plus fast and accurate audio transcription services.

Verity Machine Learning Technology

Verity is the only solution that applies machine learning techniques to provide content-level brand safety and contextual analysis. Alternative solutions may only leverage keyword methodologies to look at text and are limited to page-level analysis, use of Allow or Block lists, or URL-level analysis. These cruder contextual approaches often eliminate safe and relevant inventory. They also miss relevant content (e.g., keywords that are spelled differently), overlook related content, and mistakenly target irrelevant content (e.g., keywords with multiple meanings).

...

The supervised learning algorithm searches for patterns in the data that correlate with the desired outputs. After training, the supervised learning algorithm can process new unseen pages and label them with a classification based on the prior training data. For example, the model could predict whether digital content references drugs or alcohol and classify it accordingly for the purposes of brand safety.

...

Architecture and Flow

Customers use Verity to analyze specific digital content and determine the eligibility of the content for ads. Verity does not crawl the internet for content; instead a client application calls Verity (via their integration with the Verity API) specifying the URLs of specific content they’d like to analyze.

GumGum's Verity service exists entirely within a secure Cloud infrastructure. Verity’s Cloud-based architecture is massively scalable and currently processes approximately 1 billion unique requests per month for content and brand safety classification.

Access for Verity User Agents

If a requested URL blocks a Verity browser, Verity cannot process the content and returns an error. Verity customers are therefore requested to configure their domain access permissions to enable Verity to access their site in order to extract and process content.

...

Page Analysis Process

The Verity page analysis process involves the following core components:

...

Verity API Gateway: The Verity API Gateway receives a page URL request, authenticates the client request and passes the URL to the Verity API.
Verity API: The Verity API initiates the request and then orchestrates the Content Extractor, Text Analysis, and Image Analysis systems to extract the page data and perform the analyses.
Content Extractor: The Content Extractor accepts page requests sent by the Verity API from a queue. The Content Extractor loads the page URL, downloads the page title, metadata, and HTML and saves it as a text string in the database. If a prominent image is identified for the page, the Content Extractor downloads and saves the image to the database with identification information for the associated page. The Content Extractor passes the Page URL and image information on for text and image analysis.
Text Analysis: The Text Analysis engine applies Natural Language Processing (NLP) for text classification (e.g. IAB and Threat categories) and information extraction (e.g. Keywords).
Image Analysis: The Image Analysis engine houses GumGum’s core Computer Vision capabilities in a modular architecture. The Image Analysis component passes images through multiple data models to determine their classification information.
Verity Report: The Verity API retrieves the text and image classification results, applies weighting and merging logic to the results, and returns the final Verity page report to the client.
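
The component flow above can be sketched roughly as follows. This is an illustrative outline only, not GumGum's implementation: every function and class name here (`extract_content`, `analyze_text`, `analyze_image`, `build_report`, `ExtractedPage`) is hypothetical, the analysis steps are stubs, and the merge step is a placeholder because the actual weighting and merging logic is not described in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedPage:
    """Hypothetical container for what the Content Extractor saves."""
    url: str
    title: str
    text: str
    prominent_image: Optional[bytes] = None


def extract_content(url: str) -> ExtractedPage:
    # Stand-in for the Content Extractor; a real one would load the page URL.
    return ExtractedPage(url=url, title="Example", text="Example body text")


def analyze_text(page: ExtractedPage) -> dict:
    # Stand-in for the Text Analysis engine (NLP classification, keywords).
    return {"keywords": page.text.lower().split()[:3]}


def analyze_image(image: bytes) -> dict:
    # Stand-in for the Image Analysis engine (CV models).
    return {"keywords": ["image-object"]}


def build_report(url: str) -> dict:
    """Orchestrate extraction and analysis, then merge into one report."""
    page = extract_content(url)
    text_result = analyze_text(page)
    if page.prominent_image is not None:
        image_result = analyze_image(page.prominent_image)
    else:
        image_result = {"keywords": []}
    # Placeholder merge: simple concatenation, NOT the real weighting logic.
    merged = text_result["keywords"] + image_result["keywords"]
    return {"pageUrl": url, "keywords": merged}
```

The point of the sketch is the ordering: extraction happens once, text and image analysis run on the extracted data, and the API layer alone is responsible for combining the results.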

Verity Video Analysis

Verity analyzes videos for the purposes of content-level contextual targeting and brand safety.

...

Transcribe Service – Applies automatic speech recognition (ASR) to convert speech to text.
OCR Service – Performs Optical Character Recognition (OCR) to detect text in video and convert the detected text into machine-readable text.
Verity Text Processing – Applies machine learning models to the video metadata, title, transcription text, and OCR text and provides a brand safety and contextual classification report.

Verity Video Analysis Process

The Verity video analysis process involves the following core components:

...

Verity API Gateway: The Verity API Gateway receives a video URL request, authenticates the client request and passes the URL to the Verity API.
Verity API: The Verity API passes the request to the Video Transcribe component to orchestrate video transcription and optical character recognition.
Video Transcribe: Video Transcribe downloads the video from the request URL and stores the video. Verity API initiates a transcription job with the transcription service. If the video is in M3U8 format, it is transcoded prior to transcription. Once the transcription service finishes a job, it sends the results back to the object storage service, triggering a notification to the Verity API.
Verity API/OCR service: The Verity API verifies whether the transcription results contain a sufficient sample of words. If not, the Verity API requests Video Transcribe to initiate an OCR job. Upon OCR job completion, the Verity API receives a notification and retrieves the OCR text results. The Verity API passes the concatenated text results (comprising transcription, OCR, client metadata title and description) to Verity Text Processing.
Verity Text Processing: The Text Processing engine processes the video transcription, OCR, client metadata title and description by applying Natural Language Processing (NLP) for text classification (e.g. IAB Content Categories v2.0 and Threat categories) and information extraction (e.g. Keywords).
Verity Report: The Verity API accepts the text analysis results, applies result weighting and merging logic, then returns the final video analysis Verity Report to the client.
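
The OCR fallback described in the steps above could look something like the sketch below. It assumes a simple word-count check; the function name `collect_video_text`, the `run_ocr` callable, and the `MIN_TRANSCRIPT_WORDS` value are all hypothetical, since this document does not state what counts as a "sufficient sample of words".

```python
from typing import Callable, Optional

# Hypothetical cut-off; the document does not give the real number.
MIN_TRANSCRIPT_WORDS = 50


def collect_video_text(
    transcript: str,
    run_ocr: Callable[[], str],
    title: Optional[str] = None,
    description: Optional[str] = None,
    min_words: int = MIN_TRANSCRIPT_WORDS,
) -> str:
    """Concatenate the text sources passed on to text processing."""
    parts = [transcript]
    # Only pay for OCR when the transcript has too few words.
    if len(transcript.split()) < min_words:
        parts.append(run_ocr())
    parts.extend([title or "", description or ""])
    # Skip empty sources so the result has no blank lines.
    return "\n".join(p for p in parts if p)
```

Passing OCR as a callable keeps the expensive job lazy: it runs only when the transcript falls short.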

Verity Brand Safety

Verity machine learning predicts threat categories by applying data models trained on collections of various kinds of threatening content. Verity’s sophisticated Computer Vision machine learning can identify threatening scenes, such as natural disasters or accidents. Object detection picks out potentially threatening objects within an image, such as weapons, exposed skin or drinks.

...

Clients can set a unique threshold or risk-tolerance level for each threat category. For example, a healthcare provider may choose to set no threshold for the “Medical” threat category, yet higher thresholds for categories that are less suitable for ad placement (e.g., “Hate”, “Violence”, or “Obscene”).
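
As a rough illustration of per-category thresholds, the sketch below treats a threshold as the lowest risk level at which a client rejects content, and `None` as "no threshold". That interpretation, the function name `exceeded_categories`, and the dictionary shapes are assumptions for the example; the risk-level names are the ones Verity reports for threat categories.

```python
from typing import Dict, List, Optional

# Risk levels ordered from least to most severe.
RISK_ORDER = ["VERY_LOW", "LOW", "MODERATE", "HIGH", "VERY_HIGH"]


def exceeded_categories(
    threats: Dict[str, str],
    thresholds: Dict[str, Optional[str]],
) -> List[str]:
    """Return the categories whose risk level meets or exceeds the threshold."""
    exceeded = []
    for category, level in threats.items():
        limit = thresholds.get(category)
        if limit is None:
            continue  # no threshold set: never reject on this category
        if RISK_ORDER.index(level) >= RISK_ORDER.index(limit):
            exceeded.append(category)
    return exceeded
```

For the healthcare example, a client might pass `{"Medical": None, "Hate": "MODERATE"}` so that medical content is never rejected while moderately risky hate content is.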

Verity Content Classification

Verity works by applying machine learning techniques to relevant content to assign contextual categories.

IAB Categories

The Interactive Advertising Bureau (IAB) defines a Content Taxonomy to provide Publishers with a consistent and easy way to organize their website content, and enable advertisers to target standard content categories. Verity returns all IAB hierarchy tiers for both versions 1.0 and 2.0 of the taxonomy:

...

For example, Verity analysis of an article on “The Rise of Alternative Venture Capital” identifies IAB v1.0 categories in 2 tiers, and IAB v2.0 categories in 4 tiers.

Event Categories

GumGum Events offer hundreds of categories that add another layer of targeting on top of the IAB standard categories and provide more granularity. For example, IAB v2 offers a single category for “National & Civic Holidays”, while GumGum covers content about specific holidays, like “Thanksgiving” and “Christmas.”

Keywords

Keywords are derived from content, metadata, and headlines. Verity ranks keywords according to frequency of use and prominence. Objects and scenes detected in an image may be included in the list of keywords.

Sentiment

Verity predicts the sentiment of each sentence within content and returns an aggregated breakdown of the proportion of sentences that are positive, neutral, or negative (referred to as Document Level Sentiment Analysis). Sentiment thresholds are entirely up to the Publisher to set. Across the web, “neutral” is the most common primary sentiment classification.
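
The aggregation step can be illustrated with a short sketch. It assumes each sentence has already been labeled by some classifier; the function name `aggregate_sentiment` and the lowercase label strings are hypothetical choices for the example.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("positive", "neutral", "negative")


def aggregate_sentiment(sentence_labels: Iterable[str]) -> Dict[str, float]:
    """Turn per-sentence labels into the proportion of each sentiment."""
    counts = Counter(sentence_labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No labeled sentences: report zero for every sentiment.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

A Publisher-side threshold would then be applied to these proportions, for example requiring the negative share to stay below some chosen value.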

Verity Classification and Brand Safety Report

The Verity report includes complete brand safety, keyword, and categorization analysis data for the requested content. Each report contains the following analysis results:

dataAvailable	States whether the classification request has already been processed. If it has, Verity returns the results from the database. If not, Verity starts a new processing request.
status	The current processing status of the analysis request.
pageUrl videoUrl	The URL of the page or video analyzed by Verity, as applicable.
languageCode	The standard ISO 639-1 code for the language of the content. Verity currently supports content in English and Japanese. Verity video analysis currently supports English only. Note: If Verity detects an unsupported language, a status of NOT_SUPPORTED is returned.
iab v1	The IAB v1.0 categories identified for the page. IAB v1.0 categories are widely adopted in programmatic and Real-Time-Bidding (RTB) ad marketplaces. IAB v1.0 categories are organized into the following tiers: Tier 1 identifies broad level categories, such as Pets, defined with the following targeting depths: category/portal, site section, and page. Tier 2 and greater identify more granular categories, such as Dogs, and are nested under Tier 1 categories. Refer to the Verity Taxonomy document for a listing of IAB v1 categories. Verity video analysis does not support IAB v1.0 categories.
iab v2	The IAB v2.0 categories identified for the content. The IAB defined a more granular content taxonomy in IAB Tech Lab Content Taxonomy v2.0 (released in 2017). IAB v2.0 defines additional content classifications and restructures existing IAB v1.0 classifications. Each IAB v2.0 category has a unique three-digit ID, and is structured into a tiered hierarchy with up to 4 tiers of categories. Refer to the Verity Taxonomy for a listing of IAB v2 categories.
keywords	The top Keywords identified for the content, listed in order of prominence.
safe	The final aggregated Brand Safety summary result for the content. If any threat classifications are identified with a high-risk level, the safe value is false and the content is considered unsafe. If no (or low-risk) threat classifications are identified, the safe value is true, and the content is considered safe.
threats	Threat categories are part of GumGum’s brand safety taxonomy. GumGum classifies content into nine threat categories. For a complete list of Threat category IDs and Names, refer to Threat Categories in the Verity Taxonomy document. To detect possible threats, Verity analyzes and scores all the extracted content. Verity then correlates the scores to determine a per-category threat risk-level for the content. Possible threat category risk-levels are: VERY_HIGH, HIGH, MODERATE, LOW, and VERY_LOW.
events	The Events classifier identifies seasonal events such as the Olympics (e.g. annual, bi-annual, 4-yearly events) for the purposes of contextual ad targeting. Verity lists up to five Event categories, in order of prominence. For a complete list of Event category IDs and Names, refer to Event Categories in the Verity Taxonomy document. Verity video analysis does not support Events.
sentiments	Identifies and extracts opinions within digital content. The positive, neutral, and negative levels of sentiment expressed in the content are evaluated. For contextual targeting purposes, a sentiment level of neutral or positive is generally recommended.
processedAt	The date and time of the classification.
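
The relationship between the `threats` and `safe` fields can be sketched as below. The sketch assumes that "high-risk" means a level of HIGH or VERY_HIGH and that threats arrive as a mapping from category ID to risk level; neither detail is confirmed by this document, and `summarize_safe` is a hypothetical name.

```python
from typing import Dict

# Assumption: these two levels are what the report treats as "high-risk".
HIGH_RISK_LEVELS = {"HIGH", "VERY_HIGH"}


def summarize_safe(threats: Dict[str, str]) -> bool:
    """Return False if any threat category carries a high-risk level."""
    return not any(level in HIGH_RISK_LEVELS for level in threats.values())
```

Under these assumptions, content with no identified threats, or only low and moderate ones, is summarized as safe.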

...

Classification and Scoring

Verity analyzes threat, contextual category, keyword, and sentiment results in different ways. The data models Verity implements vary for different purposes and are fine-tuned and optimized on an ongoing basis.

...

Content classification is used for targeting purposes. In this case, GumGum favors Precision over Recall. Data Scientists use precision-recall curves to maximize Precision with minimum loss in Recall, thereby maximizing the accuracy of the classified targets.
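
One way to read "maximize Precision with minimum loss in Recall" is to pick the score threshold with the best precision among those that still meet a recall floor. The sketch below shows that idea on plain lists; it is not GumGum's procedure, and the function names and the `min_recall` parameter are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def precision_recall_at(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(
    scores: Sequence[float], labels: Sequence[int], min_recall: float
) -> Optional[float]:
    """Highest-precision threshold whose recall is at least min_recall."""
    best: Optional[Tuple[float, float]] = None
    for t in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, t)
        # Strict '>' keeps the lowest threshold on ties, which preserves recall.
        if recall >= min_recall and (best is None or precision > best[1]):
            best = (t, precision)
    return best[0] if best else None
```

Raising `min_recall` trades precision back for coverage, which is the lever a targeting use case would tune.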

Verity and the 4A’s Brand Safety Floor

The 4A’s, the leading trade organization for marketing communications agencies, defines the Advertising Assurance Brand Safety Floor and Brand Suitability Framework (revised in May 2020). The following table details the mapping between the 4A’s Brand Safety Floor and GumGum’s threat categories.

4A’s Floor		GumGum’s Verity brand safety categories
Category	Definition	Category ID	Category name
1 Adult & Explicit Sexual Content	Illegal sale, distribution, and consumption of child pornography. Explicit or gratuitous depiction of sexual acts, and/or display of genitals, real or animated.	GGT4	Sexual; sexually charged
2 Arms & Ammunition	Promotion and advocacy of Sale of illegal arms, rifles, and handguns. Instructive content on how to obtain, make, distribute, or use illegal arms. Glamorization of illegal arms for the purpose of harm to others. Use of illegal arms in unregulated environments.	GGT1	Violence and gore
2 Arms & Ammunition		GGT2	Illegal/criminal
3 Crime & Harmful acts to individuals and Society and Human Rights Violations	Graphic promotion, advocacy, and depiction of willful harm and actual unlawful criminal activity – Explicit violations/demeaning offenses of Human Rights (e.g. human trafficking, slavery, self harm, animal cruelty etc.), Targeted harassment of individuals and groups	GGT1	Violence and gore
3 Crime & Harmful acts to individuals and Society and Human Rights Violations		GGT2	Illegal/criminal
4 Death, Injury or Military Conflict	Promotion or advocacy of Death or Injury. Murder or Willful bodily harm to others. Graphic depictions of willful harm to others. Incendiary content provoking, enticing, or evoking military aggression. Live action footage/photos of military actions & genocide or other war crimes.	GGT1	Violence and gore
4 Death, Injury or Military Conflict		GGT9	Illness/medical
5 Online piracy	Pirating, Copyright infringement, & Counterfeiting.	GGT8	Malware and phishing
5 Online piracy	Note: GumGum Verity classifies content that covers the topics of piracy, copyright infringement, or counterfeiting. Verity does not consider whether the content itself was pirated, counterfeited, or infringes on copyright.
6 Hate speech & acts of aggression	Unlawful acts of aggression based on race, nationality, ethnicity, religious affiliation, gender, or sexual image or preference. Behavior or commentary that incites such hateful acts, including bullying.	GGT6	Hate; hate speech, harassment and cyberbullying
7 Obscenity and Profanity, including language, gestures, and explicitly gory, graphic or repulsive content intended to shock and disgust	Excessive use of profane language or gestures and other repulsive actions with the intent to shock, offend, or insult.	GGT5	Obscene; profanity/vulgarity
8 Illegal Drugs/Tobacco/ eCigarettes/ Vaping/Alcohol	Promotion or sale of illegal drug use – including abuse of prescription drugs. Federal jurisdiction applies, but allowable where legal local jurisdiction can be effectively managed. Promotion and advocacy of tobacco and eCigarette (Vaping) & Alcohol use to minors.	GGT3	Drugs and alcohol
9 Spam or Harmful Content	Malware/Phishing.	GGT8	Malware and phishing
10 Terrorism	Promotion and advocacy of graphic terrorist activity involving defamation, physical and/or emotional harm of individuals, communities, and society.	GGT1	Violence and gore (both text and image)
11 Debated Sensitive Social Issue/ Violations of Human Rights	Insensitive, irresponsible and harmful treatment of debated social issues and related acts intended to demean a particular group or incite greater conflict.	GGT6	Hate; hate speech, harassment and cyberbullying
11 Debated Sensitive Social Issue/ Violations of Human Rights		GGT2	Illegal/criminal
The 4A’s floor categories do not map to this GumGum Threat category.		GGT7	Disasters
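
For clients who want to work with this mapping programmatically, the table can be transcribed into a lookup structure. The sketch below uses the 4A’s floor category numbers and GumGum threat IDs exactly as listed in the table; the names `FLOOR_TO_THREATS` and `floor_categories_for` are hypothetical and not part of any Verity API.

```python
from typing import Dict, List

# 4A's floor category number -> GumGum threat category IDs, per the table.
FLOOR_TO_THREATS: Dict[int, List[str]] = {
    1: ["GGT4"],
    2: ["GGT1", "GGT2"],
    3: ["GGT1", "GGT2"],
    4: ["GGT1", "GGT9"],
    5: ["GGT8"],
    6: ["GGT6"],
    7: ["GGT5"],
    8: ["GGT3"],
    9: ["GGT8"],
    10: ["GGT1"],
    11: ["GGT6", "GGT2"],
}


def floor_categories_for(threat_id: str) -> List[int]:
    """List the 4A's floor categories that map to a GumGum threat ID."""
    return sorted(n for n, ids in FLOOR_TO_THREATS.items() if threat_id in ids)
```

The reverse lookup returns an empty list for GGT7 (Disasters), matching the table's note that no floor category maps to it.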

...

Integration Methods

Verity integration clients include Publishers who can sell ad space directly to advertisers, using Verity data to place ads with contextually targeted content, or to avoid brand-unsafe content.

...

Verity offers separate APIs for Page and Video Analysis via server-to-server (S2S) connections. In either case a user or client application calls the Verity API, specifying the URL of content to be analyzed. Clients implement webhooks to listen for the JSON response body results on a Verity callback URL.
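
A client-side webhook listener might be sketched as follows, using only the Python standard library. The sketch assumes the callback arrives as an HTTP POST with a JSON body whose top-level keys match the report field names (`pageUrl`, `videoUrl`, `status`, `safe`, `keywords`); the delivery method, the payload layout, and all function and class names here are assumptions, so consult the Verity API documentation for the actual contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def parse_verity_callback(body: bytes) -> Dict[str, Any]:
    """Pull a few report fields out of a JSON callback body."""
    report = json.loads(body.decode("utf-8"))
    return {
        "url": report.get("pageUrl") or report.get("videoUrl"),
        "status": report.get("status"),
        "safe": report.get("safe"),
        "keywords": report.get("keywords", []),
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        summary = parse_verity_callback(self.rfile.read(length))
        print(summary)  # a real client would store or act on the result
        self.send_response(204)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server for the hypothetical callback URL."""
    return HTTPServer(("", port), CallbackHandler)
```

Calling `make_server().serve_forever()` would start listening; keeping the parsing in a separate function makes it easy to test without a network.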

Page Tags

In this case, Publishers implement a page tag that automatically calls Verity to analyze a page whenever a user visits the page.

A Publisher could set pages up to fetch new ads based on the page keywords identified by Verity. In this case, a callback could publish targeting keywords using Verity data, then fetch new ads using Google Publisher Tag refresh functionality. The page could be configured to disable the initial loading of ads until Verity returns analysis data.

...

Processing Time

Once a request is sent, Verity takes less than a second to return an initial response, indicating whether or not data is already available for the URL.

...

If the request is for new digital content, Verity initiates an asynchronous process to analyze the content and correlate the results into a Verity response. It may take a few minutes to complete processing for new media.

Machine Learning Model Development

The Verity team carefully selects and trains machine learning models for each contextual and brand-safety classification. As part of the normal Verity lifecycle, existing models are continually enhanced or seamlessly replaced with higher-performing models.

...

GumGum’s legal and business teams carefully monitor all technology partner relationships on an ongoing basis.

Classification Quality Maintenance

The Verity team constantly runs A/B testing to evaluate alternative data models and competitor results. On a quarterly basis, Verity also maintains a Rolling KPI quality check where URLs are collected randomly from Publisher domains and added to a Gold Standard Data Set.

The URLs are human-annotated for threat and contextual classifications using both individual annotators and data annotation platforms. The Verity team runs classification processes, checks the results, and determines remediation or enhancement steps.

Text Ingestion Limitations

Verity service page content ingestion has the following limitations:

Processes only the first 20,000 characters on any page in any supported language.
Cannot process infinite scrolling pages.
Cannot process pages loaded by JavaScript.
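
The character limit can be pictured as a simple truncation, as in the minimal sketch below. It assumes the limit counts characters of the extracted text from the start of the content; the function name `ingestible_text` is hypothetical.

```python
MAX_CHARS = 20_000  # limit stated above


def ingestible_text(text: str, limit: int = MAX_CHARS) -> str:
    """Keep only the leading portion of the text that would be processed."""
    return text[:limit]
```

In practice this means that on very long pages, content beyond the limit does not influence the classification.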

Image Support

Verity applies logic to identify the prominent image on a web page for analysis. Additional images on the page may be subject to image extraction limitations. Supported image formats are:

BMP
EPS
ICNS
ICO
IM
JPEG
JPEG 2000
MSP
PCX
PNG
PPM
SGI
SPIDER
TIFF
WebP
XBM

Video Data Analyzed

The Verity Video analysis pipeline processes and analyzes video content and metadata, specifically:

...

Supported formats are MPEG-4, MOV, MP3, FLAC, and M3U8. The maximum video size is 2 GB.
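
A client could pre-check videos before submitting them, along the lines of the sketch below. The file extensions chosen for each listed format, the interpretation of 2 GB as 2 × 1024³ bytes, and the function name `is_supported_video` are all assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed extensions for MPEG-4, MOV, MP3, FLAC, and M3U8.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mp3", ".flac", ".m3u8"}
# Assumption: "2 GB" is read as binary gigabytes.
MAX_BYTES = 2 * 1024**3


def is_supported_video(url: str, size_bytes: int) -> bool:
    """Check the URL's file extension and the size against the limits."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS and size_bytes <= MAX_BYTES
```

Parsing the URL first means query strings such as `?token=...` do not interfere with the extension check.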

Verity Does Not Process User Information

Verity does not process or store user information (such as cookies or browsing history). Verity analysis is based solely on the content of media analyzed.

Verity Does Not Process User Generated Content (UGC)

Verity does not process or analyze UGC, such as Comments, Social Media posts, or forum posts.

...
