Transcribing Video To Text With Google Video Intelligence API Solutions

Case Study

The Client

The client is a next-generation global technology company that helps enterprises reimagine their businesses for the digital age. The company primarily provides software services, business process outsourcing, and infrastructure services. Niveus Solutions helped the client build a speech-to-text solution for transcribing recorded virtual sessions using Google Video Intelligence.

Project Objective

The client wanted a platform that could upload large recorded video files to the cloud with minimal effort and transcribe their audio into text using AI technologies.

They also needed screenshots taken at different points in the video to capture architecture diagrams and visual demos of applications, with the images attached to the transcribed notes. The screenshots had to appear in a logical sequence alongside the text produced by the speech-to-text solution.

The client’s highest priority was data security: all confidential information had to be kept in a secure location.

Business Solution

Niveus developed a Video Intelligence solution that transcribes videos of knowledge transfer (KT) sessions at HCL by converting speech to text and capturing screenshots of presentations, architecture diagrams, flowcharts, and similar material.
Speakers are identified in the video, and the document is sequenced so that each passage of speech is followed by its related screenshots.
The solution can also mask sensitive information in videos and omit courtesy words from the transcription.
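
The sequencing and courtesy-word steps described above can be illustrated with a minimal Python sketch. It is not the project's code: the `Speech` and `Screenshot` records, the `COURTESY_PHRASES` list, and the function names are all hypothetical, and it assumes that transcript segments and screenshots each carry a timestamp in seconds. Masking of sensitive data is left out here.

```python
import re
from dataclasses import dataclass

# Hypothetical courtesy phrases; the list used in the project is not given.
COURTESY_PHRASES = ("thank you", "thanks", "please", "good morning")


@dataclass
class Speech:
    start: float   # seconds from the start of the video
    speaker: int   # speaker tag assigned during transcription
    text: str


@dataclass
class Screenshot:
    start: float   # seconds from the start of the video
    path: str      # where the captured frame is stored


def strip_courtesy(text: str) -> str:
    """Remove courtesy phrases (whole words, case-insensitive) and tidy spacing."""
    phrases = "|".join(re.escape(p) for p in COURTESY_PHRASES)
    pattern = r"\b(?:" + phrases + r")\b[,.!]?"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", cleaned).strip()


def build_notes(speech: list[Speech], shots: list[Screenshot]) -> list[str]:
    """Merge speech and screenshots into one list ordered by timestamp."""
    # The second tuple field is a rank: speech sorts before a screenshot
    # that happens to share the same timestamp.
    items = [(s.start, 0, s) for s in speech] + [(s.start, 1, s) for s in shots]
    notes = []
    for _, _, item in sorted(items, key=lambda t: (t[0], t[1])):
        if isinstance(item, Speech):
            text = strip_courtesy(item.text)
            if text:  # skip segments that contained only courtesy words
                notes.append(f"Speaker {item.speaker}: {text}")
        else:
            notes.append(f"[Screenshot: {item.path}]")
    return notes
```

Calling `build_notes` with a greeting at 0 s, a spoken explanation at 5 s, and a screenshot at 6.5 s would drop the greeting and place the screenshot entry directly after the explanation it belongs to.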

Implementation

The solution is built on Google’s Cloud Video Intelligence API.
The VideoIntelligenceServiceClient API is used to detect labels related to shared screens and architecture diagrams.
Videos are uploaded to Google Cloud Storage through REST APIs called from the custom solution’s web UI.
CSV files containing video URIs and labels are uploaded to the same bucket as the videos for ML training.
Cloud Data Loss Prevention (DLP) is used to identify sensitive data in transcripts and screenshots.
Identity-Aware Proxy (IAP) in GCP restricts access to the application to authenticated users.
The web UI is implemented with React JS on the frontend and Java on the backend.
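
As a rough illustration of the transcription and label-detection steps, the Python sketch below builds the JSON body for the Video Intelligence REST method `videos:annotate`, requesting speech transcription with speaker diarization plus label detection for a video already stored in Cloud Storage. It is an assumption-laden sketch rather than the project's implementation: the bucket and file name, the default language, and the speaker count are hypothetical, and sending the request (which needs an OAuth access token) is not shown.

```python
import json

# REST endpoint of the Video Intelligence API (v1).
ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_annotate_request(gcs_uri: str, language_code: str = "en-US",
                           speaker_count: int = 2) -> dict:
    """Build the JSON body for a videos:annotate call on a Cloud Storage video."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI such as gs://bucket/video.mp4")
    return {
        "inputUri": gcs_uri,
        # Transcribe the audio and detect labels in the same request.
        "features": ["SPEECH_TRANSCRIPTION", "LABEL_DETECTION"],
        "videoContext": {
            "speechTranscriptionConfig": {
                "languageCode": language_code,
                "enableAutomaticPunctuation": True,
                # Diarization tags each word with a speaker number.
                "enableSpeakerDiarization": True,
                "diarizationSpeakerCount": speaker_count,  # assumed value
            }
        },
    }


# Hypothetical bucket and object name, for illustration only.
body = build_annotate_request("gs://example-bucket/kt-session.mp4")
print(json.dumps(body, indent=2))
```

The call returns a long-running operation, so a real client would poll for completion before reading the transcript and label annotations.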

The Impact

97% of the words in the sample video transcription are accurate, and all valid screenshots from the sample video are captured.

Integration with other cloud platforms such as AWS and Azure, and synchronization between cloud platforms, can be achieved.

Text-to-text translation from one language to another is supported.

The application offers multilingual support.

Technology Stack

Cloud Storage

App Engine

Cloud Speech-to-Text

Cloud Translation API

Document API

Video Intelligence

Cloud AutoML

Cloud Pub/Sub
