Case Study
The Client
The client is a next-generation global technology company that helps enterprises reimagine their businesses for the digital age. The company primarily provides software services, business process outsourcing, and infrastructure services. Niveus Solutions helped the client build a speech-to-text solution for transcribing recorded virtual sessions using Google Cloud Video Intelligence.
Project Objective
The client wanted a platform that could upload large recorded video files to the cloud with minimal effort and transcribe their audio into text using AI technologies.
They also needed screenshots taken at different points in each video to capture architecture diagrams and visual demos of applications, with those images attached to the transcribed notes. The screenshots had to appear in logical sequence alongside the text produced by the speech-to-text conversion.
Data security was the client’s highest priority: all confidential information had to be kept in a secure location.
Business Solution
- Developed a Video Intelligence solution for transcribing videos of knowledge transfer (KT) sessions at HCL, converting speech to text and capturing screenshots of presentations, architecture diagrams, flowcharts, and similar material.
- Speakers in the video are distinguished from one another, and the document is sequenced so that each passage of speech is followed by its related screenshots.
- The solution can also mask sensitive information in videos and omit courtesy words from the transcription.
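The sequencing and courtesy-word filtering described above can be illustrated with a minimal Python sketch. It is not the client's implementation: the `Segment` and `Screenshot` types, the `build_document` and `strip_courtesy` functions, and the courtesy phrase list are all hypothetical names chosen for illustration, and the sketch assumes that each transcript segment and screenshot carries a timestamp in seconds.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase list; the case study does not say which words were omitted.
COURTESY_PHRASES = ("thank you", "thanks", "please", "welcome")

_COURTESY_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in COURTESY_PHRASES) + r")\b[,.!]?\s*",
    re.IGNORECASE,
)


@dataclass
class Segment:
    start: float   # seconds from the start of the video
    speaker: str   # speaker label, e.g. "Speaker 1"
    text: str


@dataclass
class Screenshot:
    time: float    # seconds from the start of the video
    path: str


def strip_courtesy(text: str) -> str:
    """Remove courtesy phrases and collapse leftover whitespace."""
    return " ".join(_COURTESY_RE.sub("", text).split())


def build_document(segments, screenshots):
    """Interleave speech and screenshots in timestamp order."""
    # Rank 0 = speech, rank 1 = screenshot, so speech comes first on a tie.
    events = [(s.start, 0, s) for s in segments]
    events += [(p.time, 1, p) for p in screenshots]
    events.sort(key=lambda e: (e[0], e[1]))

    lines = []
    for _, kind, item in events:
        if kind == 0:
            text = strip_courtesy(item.text)
            if text:  # skip segments that held only courtesy words
                lines.append(f"{item.speaker}: {text}")
        else:
            lines.append(f"[screenshot: {item.path}]")
    return lines
```

Given a segment at 0 s ("Thank you, here is the design"), a screenshot at 5 s, and a segment at 12 s, `build_document` returns the first speaker's line without "Thank you", then the screenshot marker, then the second speaker's line.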
Implementation
- The solution is built on Google’s Cloud Video Intelligence API.
- The VideoIntelligenceServiceClient class from the API's client library is used to detect labels associated with shared screens and architecture diagrams.
- Videos are uploaded to Google Cloud Storage through REST APIs invoked from the custom solution’s web UI.
- CSV files containing video URIs and labels are uploaded to the same bucket as the videos for ML training.
- Cloud Data Loss Prevention (DLP) is used to identify sensitive data in transcripts and screenshots.
- Identity-Aware Proxy (IAP) in GCP restricts access to the application to authenticated users.
- The web UI is implemented with React JS on the frontend and Java on the backend.
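As a rough illustration of the Video Intelligence step above, the sketch below builds the JSON body for the API's REST `videos:annotate` method, requesting speech transcription with speaker diarization together with label detection. It is a sketch under stated assumptions, not the project's code: the function name `build_annotate_request`, the bucket and file names, the language code, and the speaker count are hypothetical, and the case study does not confirm which features or settings were actually enabled. Sending the request also requires an OAuth 2.0 access token, which is omitted here.

```python
import json

# REST endpoint of the Cloud Video Intelligence API (v1).
ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_annotate_request(gcs_uri, language_code="en-US", speaker_count=2):
    """Build a videos:annotate request body for a video stored in Cloud Storage.

    The defaults are illustrative assumptions, not values from the case study.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI of the form gs://bucket/object")
    return {
        "inputUri": gcs_uri,
        # Transcribe the audio and detect labels in the same request.
        "features": ["SPEECH_TRANSCRIPTION", "LABEL_DETECTION"],
        "videoContext": {
            "speechTranscriptionConfig": {
                "languageCode": language_code,
                # Diarization tags each transcribed word with a speaker number.
                "enableSpeakerDiarization": True,
                "diarizationSpeakerCount": speaker_count,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and object names.
    body = build_annotate_request("gs://example-bucket/kt-session.mp4")
    print("POST", ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```

The call returns a long-running operation, so a client would poll that operation until the annotation results are available.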