The client is one of India’s oldest music labels and a multi-language TV content producer. The company has expanded its catalog to become the largest in-perpetuity global owner of both sound recording and publishing copyrights of Indian music across 14 languages. It has also steadily diversified its portfolio to include the intellectual property rights to over 4,000 hours of TV content produced for channels in Hindi, Tamil, Telugu, Kannada, Malayalam and Bengali.
Project Objective: Data Warehouse Implementation
Business Value – The client needed a solution that could accommodate a growing volume of data. The larger the volume, the more expensive it becomes to retain data in-house for long periods. Because more data generally supports better insights and strategies, low-cost data warehousing is essential to a business’s growth. Our data warehouse modernization solution was a good fit for the client.
The client receives data from its partners as multiple .CSV data sources and wanted to bring that data into a pay-as-you-go data warehouse. The implementation simplified the operational challenges that come with traditional data warehouses. As part of this engagement, the client wanted to design and implement an ETL process so that it could visualize data and generate reports across its multiple data sources.
- A centralized, scalable, and pay-as-you-go data warehouse for analyzing their music partners’ data from OTT platforms
- Creation of an ETL pipeline to process data from the three available data sources
- Generation of reports from the data processed by the ETL pipeline to surface insights
The project was implemented in the following phases:
- Data Ingestion Layer: Data from multiple .CSV sources (OTT1, OTT2, OTT3) was ingested into the ETL layer for processing and stored in Google Cloud Storage. Orchestration was handled by Cloud Composer, a fully managed workflow orchestration service built on Apache Airflow that helps users author, schedule and monitor data pipelines spanning hybrid and multi-cloud environments
- Data Transformation/ETL Layer: The ingested data was transformed using Cloud Dataflow
- Data Warehouse Layer: The processed dataset was stored in Google BigQuery, which holds the historical transaction data at reduced storage cost while providing high availability
- Reporting Layer: The processed dataset in BigQuery was integrated into Data Studio for reporting
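
To make the transformation step more concrete, here is a minimal sketch in plain Python of how rows from three differently shaped .CSV sources might be mapped onto one shared schema before loading. It is an illustration only: the project itself used Cloud Dataflow, and the case study does not describe the partner file layouts, so every column name, the sample data, and the helpers `normalize` and `total_streams_by_track` are hypothetical.

```python
import csv
import io

# Hypothetical per-source column mappings (source column -> shared column).
# The real partner schemas are not described in the case study.
COLUMN_MAPS = {
    "OTT1": {"track": "track_title", "plays": "streams", "day": "report_date"},
    "OTT2": {"song_name": "track_title", "stream_count": "streams", "date": "report_date"},
    "OTT3": {"title": "track_title", "total_streams": "streams", "dt": "report_date"},
}

# Made-up sample files standing in for the partner .CSV exports.
SAMPLES = {
    "OTT1": "track,plays,day\nSong A,10,2021-01-01\n",
    "OTT2": "song_name,stream_count,date\nSong A,5,2021-01-01\nSong B,7,2021-01-01\n",
    "OTT3": "title,total_streams,dt\nSong B,3,2021-01-02\n",
}


def normalize(source, csv_text):
    """Map one partner's CSV rows onto the shared schema."""
    mapping = COLUMN_MAPS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Rename each source column to its shared-schema name.
        row = {target: raw[src].strip() for src, target in mapping.items()}
        row["streams"] = int(row["streams"])  # enforce a numeric type
        row["source"] = source                # keep lineage for reporting
        rows.append(row)
    return rows


def total_streams_by_track(rows):
    """Aggregate stream counts per track across all sources."""
    totals = {}
    for row in rows:
        title = row["track_title"]
        totals[title] = totals.get(title, 0) + row["streams"]
    return totals


# Combine all three sources into one unified list of rows.
unified = [row for source, text in SAMPLES.items() for row in normalize(source, text)]
```

Calling `total_streams_by_track(unified)` on the sample data gives a per-track total across all three sources, the kind of cross-source aggregate the reporting layer would chart. In a managed pipeline, the same renaming and type-casting logic would typically run as a transform step inside the processing job, with the unified rows written to the warehouse table.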