This Description of Methodology (DoM) describes the processes that deliver GumGum Contextual – GumGum’s content-level contextual analysis and brand safety solution.
Powered by GumGum’s AI technology, GumGum Contextual applies sophisticated machine learning techniques to analyze digital content, including web pages, images, and videos (plus audio).
Once analysis is complete, GumGum Contextual returns a detailed report featuring brand safety scores for the content, along with contextual targeting categories, prominent keywords, and event and sentiment categories. GumGum Contextual supports the contextual targeting categories defined in the Interactive Advertising Bureau (IAB) Content Taxonomy v1.0, 2.0, and 3.0.
Media Rating Council (MRC) Content-Level Accreditation
GumGum Contextual is the first independent third-party solution to achieve MRC accreditation for content-level brand safety.
This recognition by the MRC validates that GumGum’s proprietary contextual intelligence solution is able to consider all available signals (text, image, audio, and video) needed to give a true contextual reading.
GumGum Contextual is officially accredited for content-level Contextual Analysis, Brand Safety and Brand Suitability for English-language text/image, video image, and audio classification (Desktop, Mobile Web, CTV).
Primary Users and Use Cases
GumGum Contextual serves agencies, advertisers, DSPs, and publishers as a third-party content-level contextual analysis and brand safety data solution.
GumGum Contextual operates as a fee-based third-party service in the cloud. Publishers can integrate GumGum Contextual into content management systems (CMS) or data management platforms (DMP) to analyze and optimize media content.
Supply-side and demand-side platforms (SSPs and DSPs) can implement the GumGum Contextual service on their own technology platforms, ad exchanges, and ad servers. There are two primary product use cases:
Increased Brand Safety: Advertisers can deploy GumGum Contextual to detect objectionable content and avoid serving their advertising messaging adjacent to or embedded within that content. Publishers can use GumGum Contextual to identify and assess potentially objectionable digital content prior to publication.
Optimum Contextual Targeting: Advertisers and Publishers can access the GumGum Contextual service to locate content that is highly relevant, enabling contextually aligned advertising to be served.
GumGum Contextual’s core technology remains unchanged for each implementation. Integrations are accomplished via the GumGum Contextual API.
As of September 2024, the GumGum Contextual service processes approximately 2.5 billion unique requests per month for content and brand safety classification globally.
...
GumGum Contextual Functions
GumGum Contextual’s function is to provide data to clients who explicitly request and pay for analysis information about specific digital content. The clients are interested in establishing brand suitability and contextual classification for specific content, to drive their own content creation or ad serving decisioning.
GumGum Contextual applies natural language processing (NLP) and computer vision (CV) based machine learning techniques to analyze digital content. Multiple kinds of content can be analyzed, such as desktop and mobile web pages, images, and Online Video platforms (OLV) and connected TV (CTV) videos (including audio).
Web Page Analysis Functions
Going beyond simple strategies like identifying keywords on the page or in the URL string or metadata, GumGum Contextual works by scanning the full text and prominent imagery of a web page. GumGum Contextual’s NLP processes analyze the core page content, while CV processes analyze the imagery.
GumGum Contextual provides what the Media Rating Council (MRC) refers to as content-level reporting, defined as “more granular context and brand safety measurement and reporting for video and display content within a domain, site, platform, mobile application or URL”.
Note the following details about GumGum Contextual web page content-level processing:
GumGum Contextual does not apply content-level analysis to code or objects (including third-party code or objects) that appear outside, adjacent to, or embedded within the core text on a page.
GumGum Contextual does not download or analyze the CSS, JavaScript, navigation, footer, sidebars, and other areas extraneous to the core textual content on the page. For example, on a typical blog page GumGum Contextual extracts and analyzes the central content of the page, but not the surrounding elements such as third-party advertising or related content.
GumGum Contextual also does not provide analysis of continually changing, dynamically loaded user-generated content within publisher pages (e.g., reviews sections, comments sections, social media plug-ins) or social media environments.
GumGum Contextual applies logic to identify the prominent image on a web page for analysis. Additional images on the page may be subject to image extraction limitations.
GumGum informs clients that GumGum Contextual analyzes the web page (not the surrounding material), specifying that the analysis includes the core textual content and prominent imagery but nothing else – not graphics, sidebar content, or third-party insertions such as paid advertising.
GumGum Contextual acknowledges that surrounding, adjacent, or embedded content on a web page (which may be provided by JavaScript executions or non-textual content) can affect the context of a page as presented to users and may be a consideration for advertisers.
Other key platform functions such as ad serving, detection of ad fraud, identification of invalid traffic (IVT/SIVT), measurement of viewability, measurement of audiences, and other cookie implementations are not handled by GumGum Contextual or its technology.
Video Analysis Functions
GumGum Contextual analyzes video content by applying powerful classifiers to the video’s transcribed audio track and image data from sampled video frames.
Video analysis leverages GumGum’s industry-leading NLP text analysis processes and CV image analysis processes, plus fast and accurate audio transcription services.
...
GumGum Contextual Machine Learning Technology
GumGum Contextual is the only solution that applies machine learning techniques to provide content-level brand safety and contextual analysis. Alternative solutions may only leverage keyword methodologies that consider the text and are limited to page-level analysis, use of Blocklists, or URL-level analysis. These cruder contextual approaches often eliminate safe and relevant inventory. They also miss relevant content (e.g., keywords that are spelled differently), overlook related content, and mistakenly target irrelevant content (e.g., keywords with multiple meanings).
GumGum Contextual’s supervised machine learning works by first training a machine learning model with training data that comprises thousands of pieces of example content (i.e. pages, images, and videos) for each category, paired with the correctly labeled outputs. For example, to learn how to classify a GumGum threat category on “Drugs and alcohol”, first a human has to hand-annotate thousands of pieces of content that have something to do with drugs or alcohol.
The supervised learning algorithm searches for patterns in the data that correlate with the desired outputs. After training, the supervised learning algorithm can process new unseen pages and label them with a classification based on the prior training data. For example, the model could predict whether digital content references drugs or alcohol and classify it accordingly for the purposes of brand safety.
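The train-then-predict loop described above can be illustrated with a toy text classifier. This is only a sketch under stated assumptions: it uses a small multinomial Naive Bayes model written from scratch, four invented training sentences, and hypothetical labels (`drugs_alcohol`, `other`). It does not represent the models, features, or training data that GumGum actually uses.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Fit a tiny multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # per-label token counts
    label_counts = Counter()            # number of training documents per label
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for unseen text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-annotated examples (real training sets hold thousands per category).
EXAMPLES = [
    ("craft beer and wine tasting event", "drugs_alcohol"),
    ("whiskey cocktail recipes for the weekend", "drugs_alcohol"),
    ("new laptop review and benchmark", "other"),
    ("local football team wins the match", "other"),
]
model = train(EXAMPLES)
print(predict(model, "best wine and beer bars"))
```

The point of the sketch is the workflow: labeled examples go in, the model learns which patterns correlate with each label, and new unseen content is then labeled from those patterns.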
Architecture and Flow
Customers use GumGum Contextual to analyze specific digital content and determine the eligibility of the content for ads. GumGum Contextual does not crawl the internet for content; instead, a client application calls GumGum Contextual (via its integration with the GumGum Contextual API), specifying the URLs of the specific content to be analyzed.
The GumGum Contextual service exists entirely within a secure Cloud infrastructure. GumGum Contextual’s Cloud-based architecture is massively scalable and currently processes approximately 2.5 billion unique requests per month for content and brand safety classification.
Access for GumGum Contextual User Agents
If a requested URL blocks a GumGum Contextual browser, GumGum Contextual cannot process the content and returns an error. GumGum Contextual customers are therefore requested to configure their domain access permissions to enable GumGum Contextual to access their site in order to extract and process content.
Page Analysis Process
The GumGum Contextual page analysis process involves the following core components:
GumGum Contextual API Gateway: The GumGum Contextual API Gateway receives a page URL request, authenticates the client request, and passes the URL to the GumGum Contextual API.
GumGum Contextual API: The GumGum Contextual API initiates the request and then orchestrates the Content Extractor, Text Analysis, and Image Analysis systems to extract the page data and perform the analyses.
Content Extractor: The Content Extractor accepts page requests sent by the GumGum Contextual API from a queue. The Content Extractor loads the page URL, downloads the page title, metadata, and HTML, and saves them as a text string in the database. If a prominent image is identified for the page, the Content Extractor downloads and saves the image to the database with identification information for the associated page. The Content Extractor passes the page URL and image information on for text and image analysis.
Text Analysis: The Text Analysis engine applies Natural Language Processing (NLP) for text classification (e.g. IAB and Threat categories) and information extraction (e.g. Keywords).
Image Analysis: The Image Analysis engine houses GumGum’s core Computer Vision capabilities in a modular architecture. The Image Analysis component passes images through multiple data models to determine their classification information.
GumGum Contextual Report: The GumGum Contextual API retrieves the text and image classification results, applies weighting and merging logic to the results, and returns the final GumGum Contextual page report to the client.
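The component flow above can be sketched as a small orchestration function. Everything here is illustrative: the function names, the `PageContent` type, the stub analyzers, and the numeric category scores are all hypothetical. In particular, the merge rule (take the per-category maximum of the text and image scores) is an assumption standing in for the weighting and merging logic, which this document does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Scores = Dict[str, float]

@dataclass
class PageContent:
    url: str
    text: str
    image: Optional[bytes] = None  # prominent image, if one was identified

def analyze_page(
    url: str,
    extract: Callable[[str], PageContent],
    analyze_text: Callable[[str], Scores],
    analyze_image: Callable[[bytes], Scores],
) -> dict:
    content = extract(url)                    # Content Extractor step
    text_scores = analyze_text(content.text)  # Text Analysis step
    image_scores = analyze_image(content.image) if content.image is not None else {}
    # ASSUMPTION: merge by keeping the higher score per category.
    merged = dict(text_scores)
    for category, score in image_scores.items():
        merged[category] = max(merged.get(category, 0.0), score)
    return {"pageUrl": content.url, "scores": merged}

# Stub components so the sketch runs without any network access.
def fake_extract(url: str) -> PageContent:
    return PageContent(url, "storm damages coastal town", b"\x89PNG")

def fake_text(text: str) -> Scores:
    return {"disaster": 0.7} if "storm" in text else {}

def fake_image(image: bytes) -> Scores:
    return {"disaster": 0.9, "violence": 0.1}

report = analyze_page("https://example.com/news", fake_extract, fake_text, fake_image)
print(report)
```

Passing the components in as callables keeps the orchestration separate from the analyzers, which mirrors the modular layout the list describes.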
Video Analysis
GumGum Contextual analyzes videos for the purposes of content-level contextual targeting and brand safety.
GumGum Contextual works by applying machine learning techniques to the video audio track, sampled video frames, and video metadata (where available), then assigning contextual categories, detecting keywords, and calculating a brand safety score.
GumGum Contextual Video Analysis leverages the following systems:
Transcribe Service – Applies automatic speech recognition (ASR) to convert speech to text.
OCR Service – Performs Optical Character Recognition (OCR) to detect text in video and convert the detected text into machine-readable text.
GumGum Contextual Text Processing – Applies machine learning models to the video metadata, title, transcription text, and OCR text and provides a brand safety and contextual classification report.
GumGum Contextual Image Processing – Applies machine learning models to sampled video frames and provides a brand safety report.
Video Analysis Process
The GumGum Contextual video analysis process involves the following core components:
GumGum Contextual API Gateway: The GumGum Contextual API Gateway receives a video URL request, authenticates the client request, and passes the URL to the GumGum Contextual API.
GumGum Contextual API: The GumGum Contextual API passes the request to the Video Service to orchestrate video analysis.
Video Service: The Video Service downloads video and audio into separate files.
Audio Transcribe: The audio file is sent for transcription.
Optical Character Recognition (OCR): The GumGum Contextual API verifies whether the audio transcription results contain a sufficient sample of at least 50 words. If not, the GumGum Contextual API initiates an OCR job to detect text in the video file and convert the detected text into machine-readable text.
Prism Video Frame Threat Classifier: Video is sent to the Video Threat Classifier for brand safety analysis of video frames.
GumGum Contextual Text Processing: The GumGum Contextual API passes concatenated text results (comprising transcription, OCR if available, and client metadata title and description) to GumGum Contextual Text Processing. The Text Processing engine processes the video transcription, OCR, and client metadata title and description by applying Natural Language Processing (NLP) for text classification (e.g. IAB Content Categories v2.0 and Threat categories) and information extraction (e.g. Keywords).
GumGum Contextual Report: The GumGum Contextual API accepts the text analysis results, applies result weighting and merging logic, then returns the final video analysis GumGum Contextual Report to the client.
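The OCR fallback and text concatenation steps in this process can be sketched as follows. The 50-word minimum comes from the description above; the function names, the `run_ocr` callback, and the space-separated join are assumptions made for illustration only.

```python
MIN_TRANSCRIPT_WORDS = 50  # minimum transcript sample size stated in the process description

def needs_ocr(transcript: str) -> bool:
    """True when the transcript has fewer than 50 words, so OCR should run."""
    return len(transcript.split()) < MIN_TRANSCRIPT_WORDS

def build_text_input(transcript, title="", description="", run_ocr=None):
    """Concatenate transcription, OCR text (if triggered), and client metadata."""
    parts = [transcript]
    if needs_ocr(transcript) and run_ocr is not None:
        parts.append(run_ocr())  # hypothetical callback that returns the OCR text
    parts.extend([title, description])
    # Skip empty pieces so missing metadata does not add stray spaces.
    return " ".join(part for part in parts if part)

short_transcript = "hello world"
combined = build_text_input(
    short_transcript,
    title="Summer sale",
    description="Store promo clip",
    run_ocr=lambda: "SALE 50% OFF",
)
print(combined)
```

Taking the OCR step as a callback means the more expensive job only runs when the transcript is too short to classify on its own.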
Brand Safety
GumGum Contextual machine learning predicts threat categories by applying data models trained on collections of various kinds of threatening content. GumGum Contextual’s sophisticated Computer Vision machine learning can identify threatening scenes, such as natural disasters or accidents. Object detection picks out potentially threatening objects within an image, such as weapons, exposed skin, or drinks.
GumGum Contextual detects brand safety threats for each of the following categories.
...
These categories align with GARM’s Brand Safety Floor and Brand Suitability Framework.
Clients can set a unique threshold or risk-tolerance level for each threat category. For example, a healthcare provider may choose to set no threshold for the “Medical” threat category, yet higher thresholds for categories that are less suitable for ad placement (e.g., “Hate”, “Violence”, or “Obscene”).
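A per-category risk tolerance check of this kind might look like the sketch below. It assumes an ordered risk scale of LOW < MEDIUM < HIGH and models each client setting as the highest risk level the client tolerates for a category, with `None` meaning no threshold. The function name, the data shapes, and the example settings are hypothetical.

```python
RISK_SCALE = ["LOW", "MEDIUM", "HIGH"]  # assumed ordering, lowest to highest

def blocked_categories(threats, max_allowed):
    """Return categories whose detected risk exceeds the client's tolerance.

    threats:     {category: detected risk level}
    max_allowed: {category: highest tolerated risk level, or None for no threshold}
    """
    blocked = []
    for category, risk in threats.items():
        limit = max_allowed.get(category)
        if limit is None:
            continue  # no threshold configured for this category
        if RISK_SCALE.index(risk) > RISK_SCALE.index(limit):
            blocked.append(category)
    return sorted(blocked)

# Hypothetical settings for a healthcare advertiser.
healthcare = {"Medical": None, "Hate": "LOW", "Violence": "LOW", "Obscene": "MEDIUM"}
page_threats = {"Medical": "HIGH", "Hate": "LOW", "Violence": "MEDIUM", "Obscene": "MEDIUM"}

result = blocked_categories(page_threats, healthcare)
print(result)
```

In this example the high "Medical" risk is ignored because the client set no threshold for it, while "Violence" is flagged because MEDIUM exceeds the tolerated LOW.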
Content Classification
GumGum Contextual works by applying machine learning techniques to relevant content to assign contextual categories.
IAB Categories
The Interactive Advertising Bureau (IAB) defines a Content Taxonomy to provide publishers with a consistent and easy way to organize their website content, and to enable advertisers to target standard content categories. GumGum Contextual returns all IAB hierarchy tiers for versions 1.0, 2.0, and 3.0 of the taxonomy:
IAB V1 – 2 tiers – 372 categories
IAB V2 – 4 tiers – 698 categories
IAB V3 – 4 tiers – 709 categories
For example, GumGum Contextual analysis of an article on “The Rise of Alternative Venture Capital” identifies IAB v1.0 categories in 2 tiers, and IAB v2.0 and v3.0 categories in 4 tiers.
Event Categories
GumGum Events offer hundreds of categories that add another layer of targeting on top of the IAB standard categories and provide more granularity. For example, IAB v2 offers a single category for “National & Civic Holidays”, while GumGum covers content about specific holidays, like “Thanksgiving” and “Christmas.”
Keywords
Keywords are derived from content, metadata, and headlines. GumGum Contextual ranks keywords according to frequency of use and prominence. Objects detected in an image may be included in the list of keywords.
Sentiment
GumGum Contextual predicts the sentiment of each sentence within content and returns an aggregated breakdown of the proportion of sentences that are positive, neutral, or negative (referred to as Document Level Sentiment Analysis). Sentiment thresholds are entirely up to the Publisher to set. Across the web, “neutral” is the most common primary sentiment classification.
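The aggregation step described above, turning per-sentence predictions into proportions, can be sketched in a few lines. The per-sentence labels are assumed to come from an upstream classifier that is not shown; the function name and output shape are illustrative.

```python
from collections import Counter

SENTIMENTS = ("positive", "neutral", "negative")

def sentiment_breakdown(sentence_labels):
    """Return the proportion of sentences carrying each sentiment label."""
    counts = Counter(sentence_labels)
    total = len(sentence_labels)
    return {label: (counts[label] / total if total else 0.0) for label in SENTIMENTS}

# Hypothetical per-sentence predictions for a four-sentence article.
breakdown = sentiment_breakdown(["positive", "neutral", "neutral", "negative"])
print(breakdown)
```

A Publisher-defined threshold could then be applied to these proportions, for example rejecting content whose negative share exceeds a chosen value.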
GumGum Contextual Classification and Brand Safety Report
The GumGum Contextual report includes complete brand safety, keyword, and categorization analysis data for the requested content. Each report contains the following analysis results:
Field | Description |
---|---|
dataAvailable | States whether the classification request has already been processed. If processed data exists, GumGum Contextual returns the results from the database. If not, GumGum Contextual starts a new processing request. |
status | The current processing status of the analysis request. |
pageUrl / Url | The URL of the page, video, image, or text analyzed by GumGum Contextual, as applicable. |
uuid | A unique identifier generated for the classification request. |
languageCode | The standard ISO 639-1 code for the language of the content. Supported languages include English and Japanese. Refer to the Language Support Grid for the latest supported languages. Note: If GumGum Contextual detects an unsupported language, a status of NOT_SUPPORTED is returned. |
iab | IAB contextual categories are defined in the IAB Content Taxonomy and are widely adopted in programmatic and Real-Time-Bidding (RTB) ad marketplaces. Tier 1 identifies broad-level categories, such as Pets, defined with the following targeting depths: category/portal, site section, and page. Tier 2 and greater identify more granular categories, such as Dogs, and are nested under Tier 1 categories. GumGum Contextual supports current versions of the IAB Content Taxonomy. The GumGum Contextual team keeps track of new taxonomy releases and implements updates in a timely fashion. Refer to the GumGum Contextual Taxonomy document for a listing of IAB contextual categories. GumGum Contextual video analysis does not support IAB v1.0 categories. |
iab v2 | The IAB v2.0 categories identified for the content. The IAB defined a more granular content taxonomy in IAB Tech Lab Content Taxonomy v2.0 (released in 2017). IAB v2.0 defines additional content classifications and restructures existing IAB v1.0 classifications. Each IAB v2.0 category has a unique three-digit ID and is structured into a tiered hierarchy with up to 4 tiers of categories. |
keywords | The top Keywords identified for the content, listed in order of prominence. A set of rules derives, scores, and ranks the most important keywords from content based on prominence and term frequency–inverse document frequency (TF-IDF) scores. |
safe | The final aggregated Brand Safety summary result for the content. If any threat classifications are identified with a risk level of HIGH, the safe value is false and the content is considered unsafe. If no (or low-risk) threat classifications are identified, the safe value is true, and the content is considered safe. |
threats | Threat categories are part of GumGum’s brand safety taxonomy. GumGum classifies content into nine threat categories. For a complete list of Threat category IDs and Names, refer to Threat Categories in the GumGum Contextual Taxonomy document. To detect possible threats, GumGum Contextual analyzes and scores all the extracted content. GumGum Contextual then correlates the scores to determine a per-category threat risk-level for the content. Possible threat category risk-levels are: VERY_HIGH, HIGH, MODERATE, LOW, VERY_LOW. |
events | The Events classifier identifies seasonal events such as the Olympics (e.g. annual, bi-annual, 4-yearly events) for the purposes of contextual ad targeting. Machine learning predicts event categories by applying data models trained on large-scale collections of event-related content pages. GumGum Contextual lists up to five Event categories, in order of prominence. For a complete list of Event category IDs and Names, refer to Event Categories in the GumGum Contextual Taxonomy document. GumGum Contextual video analysis does not support Events. |
sentiments | Identifies and extracts opinions within digital content. The positive, neutral, and negative levels of sentiment expressed in the content are evaluated. For contextual targeting purposes, a sentiment level of neutral or positive is generally recommended. |
processedAt | The date and time of the classification. |
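To make the report fields concrete, the sketch below parses a made-up JSON report and re-derives the `safe` flag from the `threats` entries using the rule stated for that field. The JSON layout, the `category` and `risk` key names inside `threats`, the `PROCESSED` status value, and all sample values are assumptions for illustration; only the top-level field names are taken from this report description. Treating VERY_HIGH as unsafe alongside HIGH is also an assumption.

```python
import json

# Hypothetical report payload; the structure and values are illustrative only.
SAMPLE_REPORT = """
{
  "dataAvailable": true,
  "status": "PROCESSED",
  "pageUrl": "https://example.com/article",
  "uuid": "00000000-0000-0000-0000-000000000000",
  "languageCode": "en",
  "keywords": ["venture capital", "startups"],
  "safe": false,
  "threats": [
    {"category": "Violence", "risk": "HIGH"},
    {"category": "Obscene", "risk": "LOW"}
  ]
}
"""

def derive_safe(threats, unsafe_levels=("HIGH", "VERY_HIGH")):
    """Content is safe only when no threat carries a high risk level."""
    return not any(threat["risk"] in unsafe_levels for threat in threats)

report = json.loads(SAMPLE_REPORT)
if report["dataAvailable"]:
    # The recomputed flag should agree with the "safe" value in the payload.
    print(report["pageUrl"], "safe:", derive_safe(report["threats"]))
```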
Classification Approaches
GumGum Contextual analyzes threat, contextual category, keyword, and sentiment results in different ways. The data models GumGum Contextual implements vary for different purposes and are fine-tuned and optimized on an ongoing basis.
Partners should be aware that, as with any machine learning technology, performance is highly dependent on the specific data set being analyzed. Consequently, no single error rate or range exists. GumGum Contextual handles proprietary data sets and cannot disclose proprietary partner result data.
GumGum Contextual calculates and measures error rates in the form of Precision, Recall, F1, and F2 for each machine learning model. As part of this process, GumGum:
Engages data annotation services that use human annotators to establish Ground Truth for various data sets.
Works with third-party vendors and research consultants to conduct relevancy testing.
Note: If a GumGum Contextual data set that has been delivered to a partner is deemed erroneous or incomplete, GumGum will follow the GumGum Contextual Data Reissuance Policy.
The following sections outline the data models and scoring used for Brand Safety and Contextual Classification in GumGum Contextual, and point to a relevant third-party study.
Brand Safety Classification and Scoring
GumGum Contextual’s brand safety classification relies on GumGum’s threat data model. The threat model is trained on collections of various kinds of threatening content.
As brand safety and content classification serve different purposes, GumGum Contextual considers different approaches for scoring brand safety versus content classification models. Both approaches use Recall scoring (e.g. out of all the images of weapons in a dataset, how many weapons were identified) and Precision scoring (e.g. the number of times an image identified as a weapon was actually a weapon).
Brand safety is a threat detection algorithm, so in this case GumGum Contextual favors Recall over Precision. Data Scientists use Precision-Recall curves to maximize Recall with minimum loss in Precision, thereby maximizing the number of potential threats classified.
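The Precision, Recall, F1, and F2 measures referred to in this section follow standard definitions, shown in the sketch below. The counts in the example are invented for illustration and are not GumGum results. F2 is the F-beta score with beta = 2, which weights Recall more heavily than Precision and so suits a recall-favoring threat detector.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta > 1 favors Recall, beta < 1 favors Precision."""
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denominator

# Hypothetical weapon-detection run: 9 true positives, 6 false positives, 1 miss.
p, r = precision_recall(tp=9, fp=6, fn=1)
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} F2={f2:.2f}")
```

With Recall at 0.90 and Precision at 0.60, F2 comes out higher than F1, which shows how the beta weighting rewards a model that catches most threats even at the cost of more false alarms.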
GumGum Contextual results comprise risk and confidence levels for each Threat category.
The risk level represents the risk potential of unsafe content within a page, video, image, or text string. Possible risk levels are LOW, MEDIUM and HIGH.
In traditional statistical measures, confidence in observed results may be assessed according to the number of samples involved in a test. Larger scale sampling leads to a higher confidence score. However, GumGum Contextual confidence levels are not related to the quantity of sample data. For example:
A threat category result “confidence”: “VERY_LOW” should be interpreted as GumGum Contextual identifying a very low risk for that category within the content, with a high level of confidence.
A threat category result “confidence”: “VERY_HIGH” should be interpreted as GumGum Contextual identifying a very high level risk for that category within the content, with a high level of confidence.
Contextual Classification and Scoring
GumGum Contextual analyzes contextual category, keyword, and sentiment results using various methods and data models, outlined in the following table:
Integration Methods
GumGum Contextual integration clients include Publishers who sell ad space directly to advertisers and use GumGum Contextual data to place ads with contextually targeted content or to avoid brand-unsafe content.
GumGum Contextual client integrations also include video implementations, such as a Contextual Video Marketplace where brands and advertisers can access GumGum Contextual’s contextual and brand-safety data for the marketplace Publishers’ video inventory.
Clients leverage GumGum Contextual data via RESTful API or Page Tag integration. In both cases, GumGum Contextual analysis results are returned in a JSON response body.
API Integration
GumGum Contextual offers separate APIs for Page and Video Analysis via server-to-server (S2S) connections. In either case a user or client application calls the GumGum Contextual API, specifying the URL of the content to be analyzed. Clients implement webhooks to listen for the JSON response body results on a GumGum Contextual callback URL.
Page Tags
In this case, Publishers implement a page tag that automatically calls GumGum Contextual to analyze a page whenever a user visits the page.
A Publisher could set pages up to fetch new ads based on the page keywords identified by GumGum Contextual. In this case, a callback could publish targeting keywords using GumGum Contextual data, then fetch new ads using Google Publisher Tag refresh functionality. The page could be configured to disable the initial loading of ads until GumGum Contextual returns analysis data.
Processing Time
Once a request is sent, GumGum Contextual takes less than a second to return an initial response, indicating whether or not data is already available for the URL.
If data is available (i.e. the content has been processed recently and results are in the database), the GumGum Contextual response is returned immediately.
If the request is for new digital content, GumGum Contextual initiates an asynchronous process to analyze the content and correlate the results into a GumGum Contextual response. It may take a few minutes to complete processing for new media.
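From the client side, the two cases above (results already available versus asynchronous processing) might be handled as in the following sketch. It is an assumption-heavy illustration: the `request_analysis` callable stands in for whatever API call the client integration makes, re-requesting the same URL until `dataAvailable` is true is one possible strategy rather than a documented one, and the fake service exists only so the example runs offline.

```python
import time

def fetch_report(url, request_analysis, poll_interval=0.0, max_attempts=5):
    """Request analysis and retry until results are available or attempts run out."""
    for _ in range(max_attempts):
        response = request_analysis(url)
        if response.get("dataAvailable"):
            return response        # results were already in the database
        time.sleep(poll_interval)  # new content: processing is asynchronous
    return None

class FakeService:
    """Stand-in for the remote service; becomes ready after a few calls."""
    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def __call__(self, url):
        self.calls += 1
        if self.calls > self.ready_after:
            return {"dataAvailable": True, "pageUrl": url, "safe": True}
        return {"dataAvailable": False, "pageUrl": url}

service = FakeService(ready_after=2)
report = fetch_report("https://example.com/a", service)
print(report, "after", service.calls, "calls")
```

Because new media may take a few minutes to process, a real integration would use a much longer interval than the zero-second default used here, or rely on a webhook callback instead of re-requesting.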
Machine Learning Model Development
The GumGum Contextual team carefully selects and trains machine learning models for each contextual and brand-safety classification. As part of the normal GumGum Contextual lifecycle, existing models are continually enhanced or seamlessly replaced with higher-performing models.
GumGum develops machine learning models and also works with technology partners in various ways. GumGum:
Engages data annotation companies to provide human annotators and crowdsourced data annotation platforms.
Adopts machine learning models from technology partners or open source frameworks.
When working with a technology partner, GumGum:
Verifies that the technology partner is a good fit for the GumGum Contextual service.
Vets the technology partner for quality of service (e.g. by completing a pilot implementation with GumGum as a proof of concept).
Validates that the technology partner business is legitimate and appropriately licensed and that the relationship does not pose any undue risk to GumGum.
Contractually obligates the technology partner to comply with all applicable laws and regulations (international, federal and state).
GumGum’s legal and business teams carefully monitor all technology partner relationships on an ongoing basis.
Classification Quality Maintenance
The GumGum Contextual team constantly runs A/B testing to evaluate alternative data models and competitor results. On a quarterly basis, GumGum Contextual also maintains a Rolling KPI quality check where URLs are collected randomly from Publisher domains and added to a Gold Standard Data Set.
The URLs are human-annotated for threat and contextual classifications using both individual annotators and data annotation platforms. The GumGum Contextual team runs classification processes, checks the results, and determines remediation or enhancement steps.
Text Ingestion Limitations
GumGum Contextual page content ingestion has the following limitations:
Processes only the first 20,000 characters on any page in any supported language.
Cannot process infinite scrolling pages.
Cannot process pages loaded by JavaScript.
Image Support
...
Result | Method |
---|---|
IAB Content Categories | Content classifier predicts the likelihood that the given content belongs to one or more IAB categories. |
Keywords | A set of rules derives, scores, and ranks the most important keywords. |
Sentiments | Machine learning predicts the sentiment of each sentence within content by applying models trained on content with varying tones of voice. GumGum Contextual returns an aggregated breakdown of the proportion of sentences in the content that are positive, neutral or negative (referred to as Document Level Sentiment Analysis). |
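Keyword scoring by term frequency–inverse document frequency (TF-IDF), which this document names as one input to keyword ranking, can be sketched as below. This is a generic textbook formulation with a smoothed IDF. The tokenization, the tiny corpus, and the tie-breaking rule are assumptions, and the prominence signal that is also mentioned for keyword ranking is not modeled.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_n=3):
    """Rank the terms of one document by TF-IDF against a reference corpus."""
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        # Smoothed IDF: terms found in fewer documents get a higher weight.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]

# Hypothetical tokenized corpus of three short pages.
corpus = [
    "startup funding venture capital venture".split(),
    "startup hiring news".split(),
    "football match news".split(),
]
keywords = tfidf_keywords(corpus[0], corpus, top_n=3)
print(keywords)
```

Here "venture" ranks first because it is frequent in the page and absent elsewhere, while "startup" drops out of the top three because it also appears in another document.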
Content classification is used for targeting purposes. In this case, GumGum favors Precision over Recall. Data Scientists use Precision-Recall curves to maximize Precision with minimum loss in Recall, thereby maximizing the accuracy of the classified targets.
GumGum Contextual and the 4A’s Brand Safety Floor
The 4A’s, the leading trade organization for marketing communications agencies, defines the Advertising Assurance Brand Safety Floor and Brand Suitability Framework (revised in May 2020). The following table details the mapping between the 4A’s Brand Safety Floor and GumGum’s threat categories.
4A’s Floor category | 4A’s Floor definition | GumGum Contextual brand safety category |
---|---|---|
1 Adult & Explicit Sexual Content | Illegal sale, distribution, and consumption of child pornography. Explicit or gratuitous depiction of sexual acts, and/or display of genitals, real or animated. | GGT4 Sexual; sexually charged |
2 Arms & Ammunition | Promotion and advocacy of Sale of illegal arms, rifles, and handguns. Instructive content on how to obtain, make, distribute, or use illegal arms. Glamorization of illegal arms for the purpose of harm to others. Use of illegal arms in unregulated environments. | GGT1 Violence and gore. GGT2 Illegal/criminal |
3 Crime & Harmful acts to individuals and Society and Human Rights Violations | Graphic promotion, advocacy, and depiction of willful harm and actual unlawful criminal activity – Explicit violations/demeaning offenses of Human Rights (e.g. human trafficking, slavery, self harm, animal cruelty etc.). Targeted harassment of individuals and groups. | GGT1 Violence and gore. GGT2 Illegal/criminal |
4 Death, Injury or Military Conflict | Promotion or advocacy of Death or Injury. Murder or Willful bodily harm to others. Graphic depictions of willful harm to others. Incendiary content provoking, enticing, or evoking military aggression. Live action footage/photos of military actions & genocide or other war crimes. | GGT1 Violence and gore. GGT9 Illness/medical |
5 Online piracy | Pirating, Copyright infringement, & Counterfeiting. | GGT8 Malware. Note: GumGum Contextual classifies content that covers the topics of piracy, copyright infringement, or counterfeiting. GumGum Contextual does not consider whether the content itself was pirated, counterfeited, or infringes on copyright. |
6 Hate speech & acts of aggression | Unlawful acts of aggression based on race, nationality, ethnicity, religious affiliation, gender, or sexual image or preference. Behavior or commentary that incites such hateful acts, including bullying. | GGT6 Hate; hate speech, harassment and cyberbullying |
7 Obscenity and Profanity, including language, gestures, and explicitly gory, graphic or repulsive content intended to shock and disgust | Excessive use of profane language or gestures and other repulsive actions with the intent to shock, offend, or insult. | GGT5 Obscene; profanity/vulgarity |
8 Illegal Drugs/Tobacco/
eCigarettes/
Vaping/Alcohol
...
Promotion or sale of illegal drug use – including abuse of prescription drugs.
Federal jurisdiction applies, but allowable where legal local jurisdiction can be effectively managed.
Promotion and advocacy of tobacco and eCigarette (Vaping) & Alcohol use to minors.
...
GGT3
...
Drugs and alcohol
...
9 Spam or Harmful Content
...
Malware/Phishing.
...
GGT8
...
Malware and phishing
...
10 Terrorism
...
Promotion and advocacy of graphic terrorist activity involving defamation, physical and/or emotional harm of individuals, communities, and society.
...
GGT1
...
Violence and gore (both text and image)
...
Insensitive, irresponsible and harmful treatment of debated social issues and related acts intended to demean a particular group or incite greater conflict.
...
GGT6
...
Hate; hate speech, harassment and cyberbullying.
...
GGT2
...
Illegal; criminal
...
The 4A’s floor categories do not map to this GumGum Threat category.
...
GGT7
...
Disasters
Sentiment reporting has inherent accuracy limitations that vary by data set, largely because the classification task is subjective. Our studies have shown that Neutral is typically the highest-scoring sentiment value for the documents analyzed.
Contextual Intelligence Relevancy Study
GumGum participates in publicly available third-party media studies, such as the Comparison of Contextual Intelligence Vendors and Behavioral Targeting undertaken with the Dentsu Aegis Network in 2020. The study report found that:
GumGum Contextual™ had the highest percentage of relevant pages across all four Contextual Intelligence vendors.
Partners may review the complete report, Understanding Contextual Relevance and Efficiency.
GumGum Contextual and the GARM Brand Safety Floor
Integration Methods
GumGum Contextual integration clients include publishers who can sell ad space directly to advertisers, using GumGum Contextual data to place ads with contextually targeted content, or to avoid brand-unsafe content.
GumGum Contextual client integrations also include video implementations, such as a Contextual Video Marketplace where brands and advertisers can access GumGum Contextual’s contextual and brand-safety data for the marketplace publishers’ video inventory.
Clients leverage GumGum Contextual data via RESTful API or Page Tag integration. In both cases, GumGum Contextual analysis results are returned in a JSON response body.
API Integration
GumGum Contextual offers separate APIs for Page and Video Analysis via server-to-server (S2S) connections. In either case a user or client application calls the GumGum Contextual API, specifying the URL of content to be analyzed. Clients implement webhooks to listen for the JSON response body results on a GumGum Contextual callback URL.
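A webhook listener of the kind described above could look like the following minimal sketch. It assumes that results arrive as an HTTP POST with a JSON object in the body; the handler name, the port, and the `RECEIVED` list are illustrative choices, and no GumGum Contextual field names are assumed.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # analysis results collected by the listener


def parse_callback_body(body: bytes) -> dict:
    """Decode a JSON callback body; raise ValueError if it is not a JSON object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON results on the callback URL (assumed delivery method)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            RECEIVED.append(parse_callback_body(self.rfile.read(length)))
            self.send_response(204)  # accepted, nothing to return
        except ValueError:
            self.send_response(400)  # malformed or non-object body
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the listener until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Keeping the parsing step in its own function makes it possible to validate sample payloads without starting a server.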
Page Tags
In this case, publishers implement a page tag that automatically calls GumGum Contextual to analyze a page whenever a user visits the page.
For example, a publisher could set up a page tag to fetch new ads for the page based on the keywords identified by GumGum Contextual. Initial ad loading is disabled until GumGum Contextual returns the keyword data. A callback publishes targeting keywords using the GumGum Contextual data, then fetches new ads via Google Publisher Tag refresh functionality.
Processing Time
Once a request is sent, GumGum Contextual takes less than a second to return an initial response, indicating whether or not data is already available for the URL.
If data is available (i.e. the content has been processed recently and results are in the database) the GumGum Contextual response is returned immediately.
If the request is for new digital content, GumGum Contextual initiates an asynchronous process to analyze the content and correlate the results into a GumGum Contextual response. It may take a few minutes to complete processing for new media.
Machine Learning Model Development
The GumGum Contextual team carefully selects and trains machine learning models for each contextual and brand-safety classification. As part of the normal GumGum Contextual lifecycle, existing models are continually enhanced or seamlessly replaced with higher-performing models.
GumGum develops machine learning models and also works with technology partners in various ways. GumGum:
Engages data annotation companies to provide human annotators and crowdsourced data annotation platforms.
Adopts machine learning models from technology partners or open source frameworks.
When working with a technology partner, GumGum:
Verifies that the technology partner is a good fit for the GumGum Contextual service.
Vets the technology partner for quality of service (e.g. by completing a pilot implementation with GumGum as a proof of concept).
Validates that the technology partner business is legitimate and appropriately licensed and that the relationship does not pose any undue risk to GumGum.
Contractually obligates the technology partner to comply with all applicable laws and regulations (international, federal and state).
GumGum’s legal and business teams carefully monitor all technology partner relationships on an ongoing basis.
Classification Quality Maintenance
The GumGum Contextual team constantly runs A/B testing to evaluate alternative data models and competitor results. On a quarterly basis, GumGum Contextual also maintains a Rolling KPI quality check where URLs are collected randomly from publisher domains and added to a Gold Standard Data Set.
The URLs are human-annotated for threat and contextual classifications using both individual annotators and data annotation platforms. The GumGum Contextual team runs classification processes, checks the results, and determines remediation or enhancement steps.
Page Minimum Reporting Requirements
Web pages must meet certain minimum requirements in order for GumGum Contextual to successfully process the page content:
The URL specified in the page request must be valid and meet these requirements:
Start with http:// or https://.
Have a properly URL-encoded address.
Any request parameter values must be properly URL-encoded.
GumGum Contextual must be able to download HTML from the page URL.
GumGum Contextual will attempt to extract content for analysis from pages that meet the above requirements.
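The URL requirements listed above can be checked on the client side before a request is sent. The sketch below is one possible pre-flight check, not GumGum Contextual's own validation logic; the function names `is_valid_page_url` and `encode_param` are hypothetical.

```python
import re
from urllib.parse import quote, urlsplit

# In an encoded URL, "%" must always introduce a two-digit hex escape.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def is_valid_page_url(url: str) -> bool:
    """Return True if url passes the basic checks listed above."""
    if not url.lower().startswith(("http://", "https://")):
        return False
    try:
        if not urlsplit(url).netloc:
            return False
    except ValueError:
        return False  # for example, a malformed IPv6 host
    # A properly encoded URL holds only printable ASCII without spaces.
    if any(ch.isspace() or not ch.isascii() or not ch.isprintable() for ch in url):
        return False
    return _BAD_PERCENT.search(url) is None


def encode_param(value: str) -> str:
    """Percent-encode a value so it can be passed as a request parameter."""
    return quote(value, safe="")
```

A page URL passed as a request parameter value would be encoded with `encode_param` so that its own `?`, `&`, and `=` characters are not mistaken for part of the outer request.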
GumGum Contextual’s content extraction function can successfully process a wide range of web page designs, HTML markup, and image formats. However, some known issues exist that may impede the extraction of usable web page content.
Review the limitations detailed in the following sections.
Content Extraction Limitations
The following table summarizes some of the known issues GumGum Contextual may encounter when downloading and extracting pages for analysis.
Limitation | Description |
---|---|
Maximum characters per page | GumGum Contextual processes only the first 20,000 characters on any page in any supported language. Note that, according to the GumGum Contextual team’s research, the majority of web pages are under 7,500 characters per page. Few pages exceed the 20,000 character limitation. |
Insufficient Content | Where GumGum Contextual’s content extraction processes cannot extract sufficient relevant content from a page (typically 50 text characters or less), GumGum Contextual is unable to adequately perform classification tasks across text. An error message |
Infinite scrolling pages | Infinite scrolling enables users to keep scrolling through information on a web page, without clicking a “Load More” or “Next Page” option. Many platforms, such as espn.com, have implemented Infinite Scrolling, as information loads quickly and maintains user engagement. In many Infinite Scrolling environments, each component page of the Infinite Scroll page has its own URL and the URL changes as the content is loaded. As GumGum Contextual has a 20,000 character maximum limit and only processes page URLs that are specifically requested by the partner, GumGum Contextual typically does not process the complete content of an Infinite Scrolling page. |
Dynamically rendered pages | Dynamic web pages contain content that is generated automatically from a web server via JavaScript, instead of being hard-coded on the page. The content of the page may change based on multiple variables, for example, new data on the web server or user selection. The content of these pages can only be reliably discovered by rendering the page. GumGum Contextual therefore does not attempt to classify dynamically rendered pages. |
Home pages | Home pages for a site may have more complicated layouts than the main corpus of the site content and often contain text passages quoted from other pages on the site. GumGum Contextual’s contextual categorization of home page content may therefore be less useful than the classification of other pages on the site. |
Intricate page layouts | Some sites may implement complex HTML and CSS schemes that may require rendering to reveal the main body text of the pages. These design practices are not typically employed by established publishers and therefore rarely impede GumGum Contextual content extraction. |
User Generated Content (UGC) | GumGum Contextual does not process or analyze UGC, such as Comments or Social Media posts. Because UGC is constantly changing, GumGum Contextual does not attempt to provide a UGC content classification that could immediately become outdated. |
Embedded video content | GumGum Contextual video classification requires access to the video asset directly, to perform content-level analysis. As such, video content embedded within a webpage (or hosted video player) is not considered in Page Classification reporting. There is the potential that the page classification may vary entirely, or in part, from the video classification, within which a video ad may be served. GumGum Contextual page classification reporting should be used to support page-level ad targeting or avoidance. GumGum Contextual video classification reporting should be used to support video-level ad targeting or avoidance. |
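The two text-length limits in the table above (the 20,000 character maximum and the insufficient content threshold of roughly 50 characters) can be sketched as follows. This is an illustration only: the function name `prepare_text`, the whitespace trimming, and the use of `None` to signal insufficient content are assumptions rather than documented behavior.

```python
MAX_CHARS = 20_000  # only the first 20,000 characters are processed
MIN_CHARS = 50      # 50 text characters or less is treated as insufficient


def prepare_text(extracted: str):
    """Return the text to classify, or None if there is too little content."""
    text = extracted.strip()  # assumption: surrounding whitespace is ignored
    if len(text) <= MIN_CHARS:
        return None
    return text[:MAX_CHARS]
```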
Site Access Limitations
Partner restrictions on website access may limit GumGum Contextual’s ability to download content. Typically, to bypass partner site restrictions, GumGum Contextual partners configure their Allow lists to enable GumGum Contextual user agents to access their content.
Limitation | Description |
---|---|
Websites with login required | Some websites may require user login before any content is displayed. In these cases, GumGum Contextual will return an error and will not attempt to classify the content. However, in most cases partners add GumGum Contextual user agents to their Allow list so this issue does not arise. |
Geographic content | Content is often tailored to a specific geographic market, for example for News, Sports, or Streaming sites. The site may be designed to effectively serve a local market, or to conform to region-specific regulations such as GDPR. Websites may automatically detect a user’s geographic location based on their IP address and dynamically serve the content targeted to their region. GumGum Contextual user agents run in the U.S.A. and may be served content targeted to that market from these websites. However, most multi-national publishers run websites with country-specific domains for each nation they serve. GumGum Contextual will classify the content of the country-specific page URL requested. |
Paywall | Many Publisher websites are protected by a paywall, and limit access to their content in various ways, such as:
GumGum Contextual can often extract enough content from these pages to successfully perform a classification. However, in most cases the Publisher has added GumGum Contextual user agents to their Allow list, so the paywall does not impact GumGum Contextual. |
Rate limits | Web properties may want to reduce their exposure to DoS (Denial of Service) or bot attacks. Multiple requests within a short time span may trigger the website to block subsequent requests from GumGum Contextual. In this case, GumGum Contextual is unable to extract page content until the block is lifted. |
Robots.txt | A Robots.txt file may limit access to a site or parts of a site. The site may also limit the number of pages that can be downloaded (for example, only 10 pages per month). This may limit GumGum Contextual’s ability to download content from the site. |
Fake Page Content | In theory a Publisher could set up a page to return different content for a page URL, in order to manipulate GumGum Contextual’s classification results. A publisher that intentionally misrepresents page content for the purposes of avoiding or circumventing GumGum Contextual’s brand safety measures would be considered nefarious. To our knowledge, GumGum Contextual has not encountered an issue of this kind. |
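Publishers can check how their Robots.txt rules apply to a given crawler with Python's standard `urllib.robotparser` module. The sketch below is a generic illustration of the Robots.txt limitation described above; the user agent name `ExampleContextBot` is a placeholder, since the actual GumGum Contextual user agent strings are not listed in this document.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: one crawler is blocked entirely, others only from /private/.
SAMPLE_ROBOTS = """\
User-agent: ExampleContextBot
Disallow: /

User-agent: *
Disallow: /private/
"""
```

With these sample rules, `ExampleContextBot` cannot fetch any page, so the publisher would need to relax the rule or add the agent to its Allow list before content could be downloaded.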
Image Formats Analyzed
GumGum Contextual applies logic to identify the prominent image on a web page for analysis. Additional images on the page may be subject to image extraction limitations. Supported image formats are:
Video Data Analyzed
The GumGum Contextual Video analysis pipeline processes and analyzes video content and metadata, specifically:
Audio
Transcription of the video's audio track. The maximum transcription length supported is 4 hours (14,400 seconds).
OCR
Text and cursive text detected in the video frames. OCR is included in the process when the video transcription yields fewer than 50 words.
Metadata and title
Page title and metadata.
Video frames
Sampling is performed at a rate of 1 frame per second.
Video formats
Supported formats are MPEG-4, MOV, MP3, FLAC, and M3U8.
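The numeric rules in the video pipeline description can be combined into a small planning sketch. It uses only the figures stated above (1 frame per second, the 50-word OCR threshold, and the 4-hour transcription maximum); the `VideoPlan` type and the function name are hypothetical, and because the document does not say what happens to audio beyond the 4-hour limit, the sketch only flags that case.

```python
from dataclasses import dataclass

MAX_TRANSCRIPTION_SECONDS = 4 * 60 * 60  # 4 hours = 14,400 seconds
OCR_WORD_THRESHOLD = 50                  # OCR runs below this transcript length
FRAMES_PER_SECOND = 1                    # frame sampling rate


@dataclass
class VideoPlan:
    frames_sampled: int
    include_ocr: bool
    within_transcription_limit: bool


def plan_video_analysis(duration_seconds: float, transcript: str) -> VideoPlan:
    """Derive the analysis steps implied by the documented thresholds."""
    return VideoPlan(
        frames_sampled=int(duration_seconds * FRAMES_PER_SECOND),
        include_ocr=len(transcript.split()) < OCR_WORD_THRESHOLD,
        within_transcription_limit=duration_seconds <= MAX_TRANSCRIPTION_SECONDS,
    )
```

For a 90-second clip with a two-word transcript, the plan samples 90 frames and includes OCR, because the transcript falls below the 50-word threshold.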
User Information is Not Analyzed
GumGum Contextual does not process or store user information (such as cookies or browsing history). GumGum Contextual analysis is based solely on the content of media analyzed.
Verity Does Not Process User Generated Content (UGC)
Verity does not process or analyze UGC, such as Comments, Social Media posts, or forum posts.