Verity processes standalone audio tracks, as well as audio tracks extracted from video, by transcribing the audio to text. The speech-to-text output is enriched with any available video metadata (such as the video title and description) and then sent to Verity’s Natural Language Processing (NLP) machine learning models for classification.
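The flow described above (transcribe, enrich with metadata, classify) can be illustrated with a minimal sketch. This is not Verity's actual API: `VideoMetadata`, `build_classifier_input`, and `process_audio` are hypothetical names, the speech-to-text and classification steps are passed in as plain callables, and the way the metadata is combined with the transcript is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VideoMetadata:
    """Hypothetical container for optional video metadata."""
    title: Optional[str] = None
    description: Optional[str] = None


def build_classifier_input(transcript: str,
                           metadata: Optional[VideoMetadata] = None) -> str:
    """Combine the transcript with whatever metadata is available.

    The labelled, newline-separated layout is an assumption; the real
    enrichment format is not documented here.
    """
    parts: List[str] = []
    if metadata is not None:
        if metadata.title:
            parts.append(f"Title: {metadata.title}")
        if metadata.description:
            parts.append(f"Description: {metadata.description}")
    parts.append(f"Transcript: {transcript}")
    return "\n".join(parts)


def process_audio(audio: bytes,
                  transcribe: Callable[[bytes], str],
                  classify: Callable[[str], List[str]],
                  metadata: Optional[VideoMetadata] = None) -> List[str]:
    """Run the three stages in order: transcribe, enrich, classify."""
    transcript = transcribe(audio)            # speech-to-text (stand-in)
    enriched = build_classifier_input(transcript, metadata)
    return classify(enriched)                 # NLP classification (stand-in)
```

For a standalone audio track, `metadata` would simply be omitted, so only the transcript reaches the classifier. Passing the transcription and classification steps as callables keeps the sketch independent of any particular speech-to-text engine or model.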