...

IAB Content Categories

A content classifier predicts the likelihood that the given content belongs to one or more IAB categories.

Keywords

A set of rules derives, scores, and ranks the most important keywords.

Sentiments

Machine learning models trained on content with varying tones of voice predict the sentiment of each sentence in the content. Verity returns an aggregated breakdown of the proportion of sentences that are positive, neutral, or negative (referred to as Document Level Sentiment Analysis). Sentiment reporting has inherent accuracy limitations that vary by data set, largely because the classification task is subjective. Our studies have shown that Neutral is typically the highest-scoring sentiment value for the documents analyzed.
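As an illustration of the document-level breakdown described above, the following minimal sketch aggregates per-sentence labels into proportions. The function name, the label strings, and the output shape are assumptions made for this example; they are not Verity's actual API or response format.

```python
from collections import Counter

# Hypothetical label names, chosen for this illustration only.
LABELS = ("positive", "neutral", "negative")


def document_sentiment(sentence_labels):
    """Return the proportion of sentences carrying each sentiment label."""
    counts = Counter(sentence_labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No labelled sentences: report zero for every label.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Five sentences, three of them neutral.
breakdown = document_sentiment(
    ["neutral", "positive", "neutral", "negative", "neutral"]
)
print(breakdown)
```

In this example, neutral accounts for three of the five sentences, so it is the highest-scoring value in the breakdown.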

Content classification is used for targeting purposes, so Verity favors Precision over Recall. Data Scientists use Precision-Recall curves to maximize Precision with minimum loss in Recall, thereby maximizing the accuracy of the classified targets.
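One common way to trade Recall for Precision is to sweep a score threshold and keep the most precise setting that still meets a recall floor. The sketch below shows that general technique on made-up scores and labels. It is not a description of Verity's actual tuning procedure, and the function names and the `min_recall` parameter are assumptions.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_recall):
    """Return (threshold, precision, recall) with the highest precision
    among candidate thresholds whose recall is at least min_recall."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if r < min_recall:
            continue
        # Prefer higher precision; break ties by higher recall.
        if best is None or p > best[1] or (p == best[1] and r > best[2]):
            best = (t, p, r)
    return best


# Made-up classifier scores and ground-truth labels for one category.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, True, False, True, False, False, False]

print(pick_threshold(scores, labels, min_recall=0.75))
```

With these sample values, allowing recall to fall to 0.75 raises the chosen threshold to 0.80 and removes every false positive, which is the kind of trade a targeting use case favors.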

...

Limitation

Description

Maximum characters per page

Verity processes only the first 20,000 characters on any page in any supported language. According to the Verity team’s research, the majority of web pages contain fewer than 7,500 characters, and few pages exceed the 20,000-character limit.

Insufficient Content

Where Verity’s content extraction processes cannot extract sufficient relevant content from a page (typically 50 text characters or fewer), Verity is unable to adequately perform classification tasks on the text, and the error message INSUFFICIENT_CONTENT is returned. The benefit of excluding insufficient content from Verity analysis is that classifications are made only on meaningful amounts of data, enabling increased accuracy across all classes.

Infinite scrolling pages

Infinite scrolling enables users to keep scrolling through information on a web page without clicking a “Load More” or “Next Page” option. Many platforms, such as espn.com, have implemented Infinite Scrolling, as information loads quickly and maintains user engagement. In many Infinite Scrolling environments, each component page of the Infinite Scroll page has its own URL, and the URL changes as the content is loaded. Because Verity has a 20,000-character limit and processes only the page URLs that are specifically requested by the partner, Verity typically does not process the complete content of an Infinite Scrolling page.

Dynamically rendered pages

Dynamic web pages contain content that is generated automatically from a web server via JavaScript, instead of being hard-coded on the page. The content of the page may change based on multiple variables, for example, new data on the web server or user selection. The content of these pages can only be reliably discovered by rendering the page. Verity therefore does not attempt to classify dynamically rendered pages.

Home pages

Home pages for a site may have more complicated layouts than the main corpus of the site content and often contain text passages quoted from other pages on the site. Verity’s contextual categorization of home page content may therefore be less useful than the classification of other pages on the site.

Intricate page layouts

Some sites may implement complex HTML and CSS schemes that may require rendering to reveal the main body text of the pages. These design practices are not typically employed by established publishers and therefore rarely impede Verity content extraction.

User Generated Content (UGC)

Verity does not process or analyze UGC, such as Comments or Social Media posts. Because UGC is constantly changing, Verity does not attempt to provide a UGC content classification that could immediately become outdated.

Embedded video content

Verity video classification requires direct access to the video asset to perform content-level analysis. As such, video content embedded within a webpage (or hosted video player) is not considered in Page Classification reporting. The page classification may therefore differ, entirely or in part, from the classification of the video within which a video ad may be served. Verity page classification reporting should be used to support page-level ad targeting or avoidance. Verity video classification reporting should be used to support video-level ad targeting or avoidance.

Site Access Limitations

Partner restrictions on website access may limit Verity’s ability to download content. Typically, partners work around these restrictions by configuring their allow lists so that Verity user agents can access their content.
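The two character-based limits listed above (the 20,000-character maximum and the roughly 50-character minimum) can be sketched as a simple pre-check on extracted text. This is only an illustration under stated assumptions: the function name, the constants, and the dictionary response shape are hypothetical, and the exact minimum cutoff is described in the source only as "typically" 50 characters.

```python
MAX_CHARS = 20_000  # maximum characters processed per page (from the table above)
MIN_CHARS = 50      # assumed cutoff; the source says "typically 50 text characters"


def prepare_text(extracted_text):
    """Apply the length limits to extracted page text (hypothetical helper)."""
    text = extracted_text.strip()
    if len(text) <= MIN_CHARS:
        # Too little content to classify meaningfully.
        return {"error": "INSUFFICIENT_CONTENT"}
    # Only the first MAX_CHARS characters are kept for classification.
    return {"text": text[:MAX_CHARS]}


print(prepare_text("Too short to classify."))
print(len(prepare_text("a" * 25_000)["text"]))
```

A page that exceeds the maximum is truncated rather than rejected, which matches the description that only the first 20,000 characters are processed.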

...