...

  • Transcribe Service – Applies automatic speech recognition (ASR) to convert speech to text.

  • OCR Service – Performs Optical Character Recognition (OCR) to detect text in video frames and convert it into machine-readable text.

  • Verity Text Processing – Applies machine learning models to the video metadata, title, transcription text, and OCR text and provides a brand safety and contextual classification report.

  • Verity Image Processing – Applies machine learning models to sampled video frames and provides a brand safety report.

Video Analysis Process

The Verity video analysis process involves the following core components:

...

  1. Verity API Gateway: The Verity API Gateway receives a video URL request, authenticates the client request, and passes the URL to the Verity API.

  2. Verity API: The Verity API passes the request to the Video Transcribe component to orchestrate video transcription and optical character recognition. 

  3. Video Transcribe: Video Transcribe downloads the video from the request URL and stores the video. Verity API initiates a transcription job with the transcription service. If the video is in M3U8 format, it is transcoded prior to transcription. Once the transcription service finishes a job, it sends the results back to the object storage service, triggering a notification to the Verity API.

  4. Verity API/OCR service: The Verity API checks whether the transcription results contain a sufficient sample of words. If not, the Verity API requests Video Transcribe to initiate an OCR job. Upon OCR job completion, the Verity API receives a notification and retrieves the OCR text results. The Verity API passes the concatenated text results (comprising transcription, OCR, and client metadata title and description) to Verity Text Processing.

  5. Verity Text Processing: The Text Processing engine processes the video transcription, OCR text, and client metadata title and description by applying Natural Language Processing (NLP) for text classification (e.g., IAB Content Categories v2.0 and Threat categories) and information extraction (e.g., keywords).

  6. Verity Image Processing: The Image Processing engine processes the video frame samples by applying Computer Vision (CV) for image classification (e.g. Threat categories).

  7. Verity Report: The Verity API accepts the text analysis results, applies result weighting and merging logic, then returns the final video analysis Verity Report to the client.
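
The OCR fallback in step 4 and the assembly of the text handed to Verity Text Processing can be sketched as follows. This is a minimal illustration, not Verity's implementation: the function names, the whitespace-based word count, and the newline-joined layout are all assumptions. The 50-word threshold is taken from the OCR data source description.

```python
from typing import Optional

# Threshold from the OCR data source description: OCR runs when the
# transcription yields fewer than 50 words.
MIN_TRANSCRIPT_WORDS = 50


def needs_ocr(transcript: str, min_words: int = MIN_TRANSCRIPT_WORDS) -> bool:
    """Return True when the transcript is too short and an OCR job is needed.

    Counting whitespace-separated tokens is an assumption; the source does
    not say how words are counted.
    """
    return len(transcript.split()) < min_words


def build_text_input(
    transcript: str,
    title: str,
    description: str,
    ocr_text: Optional[str] = None,
) -> str:
    """Concatenate transcription, OCR text, and client metadata.

    Hypothetical helper: empty parts are skipped and the remaining parts
    are joined with newlines.
    """
    parts = [transcript, ocr_text or "", title, description]
    return "\n".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    transcript = "welcome to the show"
    ocr = "SALE ENDS FRIDAY" if needs_ocr(transcript) else None
    print(build_text_input(transcript, "Weekly deals", "Top offers this week", ocr))
```

Keeping the threshold as a parameter makes the fallback easy to exercise in tests without generating long transcripts.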

...

  • Audio
    Transcription of the video's audio track. The maximum transcription length supported is 14,400 seconds (4 hours).

  • OCR
    Text and cursive text detected in the video frames. OCR is included in the process when the video transcription yields fewer than 50 words. 

  • Metadata and Title
    Page title and metadata.

  • Video Frame Sampling
    1 frame per second
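
To illustrate what sampling at 1 frame per second means for a given video, the sketch below lists the timestamps at which frames would be taken. It assumes sampling starts at time zero and stops before the end of the video; the function name and these boundary choices are hypothetical, as the source states only the rate.

```python
import math
from typing import List


def sample_timestamps(duration_seconds: float, frames_per_second: float = 1.0) -> List[float]:
    """Return sampling timestamps in seconds for a video of the given length.

    Assumes the first sample is at 0 seconds and that every sample falls
    strictly before the end of the video.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    count = math.ceil(duration_seconds * frames_per_second)
    return [i / frames_per_second for i in range(count)]


if __name__ == "__main__":
    print(sample_timestamps(3.5))  # [0.0, 1.0, 2.0, 3.0]
```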

Supported formats are MPEG-4, MOV, MP3, FLAC, and M3U8. The maximum video size is 2 GB.
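
A client could check these limits before submitting a URL. The sketch below is one possible pre-flight check, not part of the Verity API: the function name, the mapping from format names to file extensions, and the reading of 2 GB as 2 × 1024³ bytes are assumptions. The source also does not say whether audio longer than 14400 seconds is rejected or truncated, so the check only reports it.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

# Assumed extensions for MPEG-4, MOV, MP3, FLAC, and M3U8.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mp3", ".flac", ".m3u8"}
# Assumes binary gigabytes; the source does not specify.
MAX_SIZE_BYTES = 2 * 1024**3
MAX_TRANSCRIPTION_SECONDS = 14400


def check_media(url: str, size_bytes: int, duration_seconds: float) -> List[str]:
    """Return a list of problems found; an empty list means no limit is exceeded."""
    problems: List[str] = []
    extension = PurePosixPath(urlparse(url).path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file exceeds the 2 GB size limit")
    if duration_seconds > MAX_TRANSCRIPTION_SECONDS:
        problems.append("audio exceeds the maximum transcription length")
    return problems


if __name__ == "__main__":
    print(check_media("https://example.com/videos/clip.mp4", 150_000_000, 95.0))
```

Detecting the format from the URL path is a simplification; a URL without a file extension would need a content-type check instead.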

...