SAP Data Migration To Google Cloud | Cloud Data Fusion Framework

August 18, 2022
SAP is one of the most widely used Enterprise Resource Planning (ERP) systems in the world. It streamlines and organizes core processes for departments such as finance, manufacturing, HR, supply chain, services and procurement. It also makes it easier for these departments to communicate and share information with each other, which can improve efficiency and coordination. Moving data out of such a system is often difficult. However, Cloud Data Fusion on Google Cloud Platform (GCP) can help manage SAP data migration, making it easier to move from a legacy system to the cloud. Here we look at the two methods we at Niveus use to enable data migration via Cloud Data Fusion (CDF).

What is Cloud Data Fusion

Cloud Data Fusion (CDF) is a simple, fast and cost-effective way to load your data into Google Cloud Platform. It can be used to extract, transform and load data from all your sources, including on-premises systems. The consolidated data gives businesses the information they need to make vital decisions, for example when tracking finances, managing inventory or handling other aspects of the business.

Our framework uses two CDF plugins for data migration, the SAP ODP plugin and the SAP Table plugin, to ingest data from SAP sources into GCP, with BigQuery as the target.

The need for SAP data migration 

By migrating SAP data to BigQuery, enterprises gain the benefits that come with BigQuery, such as serverless architecture and scalability. Data insights can then be delivered through popular BI tools such as Looker, Tableau and Data Studio. Because the data is readily available in BigQuery, machine learning models can be built with standard SQL using BigQuery ML. BigQuery also provides robust security and high availability.

Prerequisites for building a CDF framework

SAP Configuration

The SAP ODP and SAP Table plugins can be used to configure and execute bulk data transfers from SAP DataSources and tables with minimal coding. The following steps are needed in order to ingest data from SAP DataSources:

  • Configure the SAP ERP system (activate DataSources in SAP) 
  • Set up an SAP Router to establish a connection from GCP to the SAP system
  • Deploy the plugin in your Cloud Data Fusion environment
  • Download the SAP transport from Cloud Data Fusion and install it in SAP
  • Use Cloud Data Fusion and SAP ODP to create data pipelines for integrating SAP data

BigQuery Components

BigQuery will have the following components:

  • Landing dataset – a staging area that holds raw, unprocessed data from S/4HANA tables
  • Replication dataset – a layer that holds data ingested from SAP DataSources and processed data from the Landing layer
  • Data warehouse (DW) dataset – includes reporting layer objects, each built from a combination of several Replication layer objects
  • JobMaster and JobDetails tables – JobMaster stores the details of each pipeline, and JobDetails captures the individual runs of the pipelines
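
To make the role of the two metadata tables more concrete, here is a minimal Python sketch of what a JobMaster row and a JobDetails row could look like. The article does not list the actual columns, so every field name below is an assumption chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


# NOTE: all field names are hypothetical; the real tables may differ.
@dataclass
class JobMaster:
    """One row per pipeline: its configuration and run parameters."""
    pipeline_name: str
    source_type: str                      # "ODP" or "TABLE"
    source_object: str                    # SAP DataSource or table name
    target_dataset: str                   # "landing" or "replication"
    target_table: str
    key_columns: Tuple[str, ...] = ()     # needed for upserts
    last_successful_run: Optional[datetime] = None


@dataclass
class JobDetails:
    """One row per individual run of a pipeline."""
    pipeline_name: str
    run_id: str
    started_at: datetime
    finished_at: Optional[datetime] = None
    status: str = "RUNNING"
    rows_loaded: int = 0


# Example: one pipeline definition and one run of it.
job = JobMaster(
    pipeline_name="sap_table_mara",
    source_type="TABLE",
    source_object="MARA",
    target_dataset="landing",
    target_table="mara",
    key_columns=("MATNR",),
)
run = JobDetails(
    pipeline_name=job.pipeline_name,
    run_id="run-001",
    started_at=datetime(2022, 8, 18, 6, 0),
)
```

Keeping configuration (JobMaster) separate from run history (JobDetails) means a pipeline can be re-run or audited without touching its definition.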

Leveraging Cloud Data Fusion for SAP data migration

Cloud Data Fusion uses the SAP Table and SAP ODP plugins to fetch data from SAP source systems into the BigQuery Landing and Replication dataset tables respectively. Depending on the use case, the Landing table data is then either inserted or upserted into the Replication layer table.

The SAP ODP connector can ingest data directly into the Replication layer. The SAP Table connector, on the other hand, first stages data in the Landing layer based on the Change Data Capture method, and then merges or inserts it into the Replication layer.
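
The insert-or-upsert step from Landing to Replication can be sketched as a small Python helper that generates the BigQuery SQL. This is an illustrative sketch, not the framework's actual code: the function name, table names and column names are assumptions, and the generated statements follow standard BigQuery INSERT and MERGE syntax.

```python
from typing import Sequence


def build_replication_sql(
    landing: str,
    replication: str,
    columns: Sequence[str],
    key_columns: Sequence[str] = (),
) -> str:
    """Return an INSERT when no keys are given, otherwise a MERGE (upsert)."""
    col_list = ", ".join(columns)
    if not key_columns:
        # Plain append of everything staged in the Landing table.
        return (
            f"INSERT INTO `{replication}` ({col_list})\n"
            f"SELECT {col_list} FROM `{landing}`"
        )

    on_clause = " AND ".join(f"T.{k} = S.{k}" for k in key_columns)
    non_keys = [c for c in columns if c not in key_columns]
    source_values = ", ".join(f"S.{c}" for c in columns)

    lines = [
        f"MERGE `{replication}` AS T",
        f"USING `{landing}` AS S",
        f"ON {on_clause}",
    ]
    if non_keys:
        # Update only non-key columns when the key already exists.
        updates = ", ".join(f"{c} = S.{c}" for c in non_keys)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    lines.append(
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({source_values})"
    )
    return "\n".join(lines)


# Hypothetical table and column names, for illustration only.
upsert_sql = build_replication_sql(
    landing="my_project.landing.mara",
    replication="my_project.replication.mara",
    columns=["MATNR", "MTART", "MATKL"],
    key_columns=["MATNR"],
)
insert_sql = build_replication_sql(
    landing="my_project.landing.mara",
    replication="my_project.replication.mara",
    columns=["MATNR", "MTART", "MATKL"],
)
print(upsert_sql)
```

Passing key columns selects the upsert path; leaving them out yields a plain insert, which mirrors the "inserted or upserted" choice described above.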

Cloud Composer is used to schedule and orchestrate CDF pipelines. It triggers pipelines to load data from source to the Landing / Replication BigQuery dataset. It also moves data from the Landing to Replication Datasets for SAP Table loads.
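
The orchestration order described above can be sketched in plain Python. This is not Cloud Composer or Airflow code; it only shows which steps a scheduler would run, and in which order, for each source type. The step names are hypothetical.

```python
from typing import List


def plan_tasks(source_type: str) -> List[str]:
    """Return the ordered steps a scheduler would run for one pipeline."""
    if source_type == "ODP":
        # ODP loads go straight into the Replication dataset.
        return ["run_cdf_pipeline_to_replication"]
    if source_type == "TABLE":
        # Table loads are staged in Landing, then moved to Replication.
        return [
            "run_cdf_pipeline_to_landing",
            "move_landing_to_replication",
        ]
    raise ValueError(f"Unknown source type: {source_type!r}")


odp_steps = plan_tasks("ODP")
table_steps = plan_tasks("TABLE")
```

In a real deployment each step name would map to a task in a Cloud Composer DAG, with the second Table step depending on the first.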

The standard pipeline comprises three main components:

  1. Fetch BigQuery Parameters – The “BigQuery Execute” plugin queries the BigQuery JobMaster table for the current pipeline and returns that pipeline's arguments as a single row of parameters.
  2. Source SAP Connector (ODP or Table Batch Source) – The SAP ODP connector handles CDC implicitly. It offers two Extract Types, “Full” and “Sync”. With the “Full” option, the pipeline always fetches all data from the DataSource that matches the defined filters. In “Sync” mode, only data changed since the previous run is fetched. The SAP Table Batch Source plugin does not have the “Extract Type” option, and therefore cannot automatically determine the incremental data. To implement incremental loads, the details of the last run should be stored in a metadata table (JobMaster) and used as a parameter in the next run.
  3. Sink BigQuery Connector – The BigQuery Sink connector is fairly straightforward; the target is defined as a table in the Landing or Replication dataset. Landing tables (SAP Table loads) are always truncated before new data is loaded. Loads into the Replication layer (SAP ODP), on the other hand, are upserts, so the key column needs to be defined.
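
For the SAP Table source, the incremental logic in step 2 amounts to keeping a watermark in JobMaster and turning it into a filter for the next run. The sketch below illustrates the idea under stated assumptions: the dictionary keys, the argument names and the `AEDAT` change-date field are hypothetical, and the filter is written as an OpenSQL-style condition.

```python
from datetime import datetime
from typing import Dict, Optional


def build_runtime_args(job_row: Dict, now: datetime) -> Dict[str, str]:
    """Build the arguments for one SAP Table pipeline run."""
    last_run: Optional[datetime] = job_row.get("last_successful_run")
    if last_run is None:
        # No watermark yet: the first run is a full load.
        filter_clause = ""
    else:
        # Only fetch rows changed on or after the last successful run date.
        filter_clause = (
            f"{job_row['change_date_field']} GE '{last_run:%Y%m%d}'"
        )
    return {
        "sap_table": job_row["source_object"],
        "filter": filter_clause,
        "run_started_at": now.isoformat(),
    }


def advance_watermark(job_row: Dict, run_started_at: datetime,
                      succeeded: bool) -> Dict:
    """Return the JobMaster row to store after a run finishes."""
    updated = dict(job_row)
    if succeeded:
        # Move the watermark only when the run succeeded, so a failed
        # run is retried over the same window.
        updated["last_successful_run"] = run_started_at
    return updated


# Hypothetical JobMaster row for a table load.
job_row = {
    "source_object": "MARA",
    "change_date_field": "AEDAT",
    "last_successful_run": datetime(2022, 8, 17, 6, 0),
}
now = datetime(2022, 8, 18, 6, 0)
args = build_runtime_args(job_row, now)
job_row_after = advance_watermark(job_row, now, succeeded=True)
```

Using the run's start time as the new watermark, rather than its end time, avoids missing rows that change while the pipeline is running, at the cost of occasionally re-reading a few rows that the upsert then absorbs.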

Cloud Data Fusion pricing 

Cloud Data Fusion usage is measured in minutes, from the time an instance is created to the time it is deleted. Pricing is stated per hour, and the charge is calculated from the minutes used.

Cloud Data Fusion is offered in three editions: Developer, Basic and Enterprise. The Developer edition comes to approximately $250 per month, the Basic edition to around $1100 per month, and the Enterprise edition to around $3000 per month.
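
The arithmetic behind per-minute metering and per-hour pricing can be sketched as follows. The monthly figures are the approximate ones quoted above; the 730-hour month and the hourly rates derived from it are assumptions for illustration, not Google's published rates.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

# Approximate monthly costs quoted in this article (USD).
APPROX_MONTHLY_COST = {"Developer": 250, "Basic": 1100, "Enterprise": 3000}


def cost_for_minutes(minutes_used: float, hourly_rate: float) -> float:
    """Convert metered minutes into a charge at a given hourly rate."""
    return round(minutes_used / 60 * hourly_rate, 2)


def implied_hourly_rate(monthly_cost: float,
                        hours: int = HOURS_PER_MONTH) -> float:
    """Back out an hourly rate from an always-on monthly cost."""
    return monthly_cost / hours


implied_rates = {
    edition: round(implied_hourly_rate(cost), 2)
    for edition, cost in APPROX_MONTHLY_COST.items()
}

# Example: an instance that lived for 90 minutes at a hypothetical $2.00/hour.
example_charge = cost_for_minutes(90, 2.00)
```

Because billing follows instance lifetime rather than pipeline runtime, deleting idle instances is the main lever for reducing cost.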

The CDF framework is a new way to leverage Cloud Data Fusion for SAP on the Google Cloud Platform. CDF is designed to provide a streamlined approach to create, configure and deploy a new application to help you migrate, transform and load your data using a single interface. With this approach you can help accelerate your journey to the cloud to make the most of the flexible, cost-effective and high-performance service that Google Cloud Platform provides.

If you are interested in SAP data migration to Google Cloud, contact us today at biz@niveussolutions.com

Author Divakar Prabhu

A Cloud Leader from Data Modernisation team at Niveus, Divakar's extensive work with Data Engineering and GCP has made him an integral part of data transformation strategies.
