What is Teradata?
Teradata is a popular commercial relational database management system (RDBMS) used by companies to build large data warehouses. It is developed by Teradata Corporation, which was founded in 1979. Teradata provides scalability, automatic data distribution, SQL support, parallel processing, and many other features. Because of these features, many companies adopted Teradata for data storage and processing, including many Fortune 100 companies. It can support 50+ petabytes of data and can be integrated with many business intelligence tools for analytics. Teradata is also considered affordable and easy to set up. It is available on-premises as well as in the cloud.
Why migrate to Google BigQuery?
Teradata has been a widely used RDBMS for managing large volumes of data. With the emergence of cloud platforms such as Google Cloud Platform, Amazon Web Services, and Microsoft Azure, many alternatives to Teradata are now available. The latter were developed more recently and have many advantages and upgrades built in. One of the best alternatives to Teradata is Google BigQuery. Teradata's pricing is typically tied to multi-year contracts, while Google charges only for the data you store and the queries you run. Many organizations run Teradata on-premises because they adopted it years ago, when there was no cloud option. On-premises Teradata capacity is static, so you need more hardware to expand storage, while Google BigQuery has a dynamic storage model with no upper limit that scales with your storage requirements.
BigQuery vs. Teradata:
|BigQuery|Teradata|
|---|---|
|Fully cloud-managed data warehouse|Hybrid warehouse (on-premises and cloud)|
|Column-oriented database (column operations are faster)|Row-oriented database (works on one row at a time)|
|Highly scalable|Scalable only in the cloud version|
|10,000 columns per table|2,048 columns per table|
|Maximum row size is 100 MB|Maximum row size is 1 MB|
|Column and table names can be up to 16382 Unicode characters long|Column and table names can be up to 128 Unicode characters long|
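To make the limits in the table concrete, here is a minimal Python sketch that checks a table definition against them. The `LIMITS` dictionary and the `check_table` helper are hypothetical names introduced for illustration, and the numbers are simply copied from the table above, so verify them against current vendor documentation before relying on them.

```python
# Limits copied from the comparison table above (not fetched from any API).
LIMITS = {
    "bigquery": {"max_columns": 10_000, "max_name_length": 16_382},
    "teradata": {"max_columns": 2_048, "max_name_length": 128},
}


def check_table(system: str, table_name: str, column_names: list[str]) -> list[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    limits = LIMITS[system]
    problems = []
    if len(column_names) > limits["max_columns"]:
        problems.append(
            f"{len(column_names)} columns exceeds {limits['max_columns']}"
        )
    # The table name and every column name share the same length limit here.
    for name in [table_name, *column_names]:
        if len(name) > limits["max_name_length"]:
            problems.append(f"name too long ({len(name)} chars): {name[:20]}...")
    return problems


# A 3,000-column table fits the BigQuery limit but not the Teradata one.
wide = [f"c{i}" for i in range(3_000)]
print(check_table("teradata", "events", wide))  # one violation: too many columns
print(check_table("bigquery", "events", wide))  # no violations
```

Since the BigQuery limits in the table are all larger than the Teradata ones, a table that exists in Teradata should pass these particular checks on the BigQuery side.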
Migrating on-premises Teradata storage to Google BigQuery
To migrate data from an on-premises Teradata warehouse to BigQuery, you use the BigQuery Data Transfer Service together with a migration agent. You download the on-premises migration agent, which connects to your local data warehouse and coordinates with the BigQuery Data Transfer Service to copy all the tables from the local warehouse to BigQuery.
The steps for transferring data to BigQuery are:
- Extract the tables from the on-premises data warehouse using the agent, which connects over JDBC.
- Upload the extracted data to a Cloud Storage bucket.
- Select either automatic schema conversion or a custom schema, which will be used to create the data warehouse in BigQuery.
- The BigQuery Data Transfer Service then loads all the data stored in the Cloud Storage bucket into BigQuery according to the selected schema, which completes the migration.
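The order of the steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the migration agent's actual code: `migrate_table` and all of the callables passed to it are hypothetical stand-ins, and the bucket name `example-bucket` is made up. The sketch uses in-memory fakes so it runs without any external service.

```python
def migrate_table(table, extract, upload, delete_local, load, schema):
    """Run the stages for one table; every callable is a hypothetical stand-in."""
    uris = []
    for path in extract(table):    # step 1: extract the table to local files
        uris.append(upload(path))  # step 2: upload each file to the bucket
        delete_local(path)         # free local disk space once uploaded
    load(uris, schema)             # steps 3-4: load using the chosen schema
    return uris


# In-memory stand-ins for the local file system and the load job.
local_files = {"/tmp/orders-0.avro", "/tmp/orders-1.avro"}
loaded = []

uris = migrate_table(
    "orders",
    extract=lambda table: sorted(local_files),
    upload=lambda path: "gs://example-bucket/" + path.rsplit("/", 1)[-1],
    delete_local=local_files.discard,
    load=lambda uris, schema: loaded.append((tuple(uris), schema)),
    schema="auto",
)
print(uris)
print(local_files)  # empty: local copies were removed after upload
```

The point of the design is that local files are deleted as soon as they are safely in the bucket, and the load only starts once the schema choice is known.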
The extraction methods supported by the BigQuery migration agent are:
- Extraction with a JDBC driver: This uses a FastExport connection. Each table is extracted into a collection of Avro files in a specified location on the local file system. The files are uploaded to a predetermined Cloud Storage bucket and then deleted from the local system.
- Extraction with Teradata Parallel Transporter (TPT): This uses a batch extraction method. A TPT extraction script is created for each batch, and after the batch's files are uploaded to the Cloud Storage bucket, they are deleted from local storage. Because files are removed after each batch, the agent does not need local space for the whole warehouse at once.
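To illustrate the idea behind batch extraction, the following Python sketch groups tables into batches that each fit within the available local disk space. It is a simplified greedy approach written for this article, not the algorithm the migration agent uses. `plan_batches`, the table names, and the sizes are all hypothetical.

```python
def plan_batches(table_sizes: dict[str, int], free_space: int) -> list[list[str]]:
    """Greedily group tables so each batch fits in free_space (same unit as sizes)."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    # Place the largest tables first so they are not squeezed out later.
    for table, size in sorted(table_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > free_space:
            raise ValueError(f"{table} ({size}) does not fit in {free_space}")
        if used + size > free_space:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(table)
        used += size
    if current:
        batches.append(current)
    return batches


sizes_gb = {"orders": 40, "customers": 10, "events": 70, "products": 5}
print(plan_batches(sizes_gb, free_space=80))
# [['events'], ['orders', 'customers', 'products']]
```

Each batch would be extracted, uploaded, and deleted before the next one starts, so local disk usage never exceeds `free_space`.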
Automatic schema conversion and custom schema file:
- Automatic schema conversion: BigQuery creates a schema file, which is a JSON file that describes the database objects. It contains a set of databases, the tables in each database, and the column names in each table. The BigQuery Data Transfer Service can extract and create this schema automatically.
- Custom schema creation: You can also supply a custom schema and specify your own names for databases, tables, and columns.
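As a rough picture of what such a schema file contains, the Python sketch below builds a nested structure of databases, tables, and columns and prints it as JSON. The field names (`name`, `originalName`, `type`) and all database, table, and column names are assumptions chosen for illustration. Consult the BigQuery Data Transfer Service documentation for the exact file format before writing a real schema file.

```python
import json

# Illustrative structure only: databases -> tables -> columns.
# Field names are assumptions for this sketch, not a verified file format.
custom_schema = {
    "databases": [
        {
            "name": "sales_dw",       # name to use in BigQuery
            "originalName": "SALES",  # name in Teradata
            "tables": [
                {
                    "name": "orders",
                    "originalName": "ORDERS",
                    "columns": [
                        {"name": "order_id", "originalName": "ORDER_ID", "type": "INTEGER"},
                        {"name": "placed_at", "originalName": "ORDER_TS", "type": "TIMESTAMP"},
                    ],
                }
            ],
        }
    ]
}


def renamed_columns(schema: dict) -> dict[str, str]:
    """Map 'DB.TABLE.COLUMN' in the source to 'db.table.column' in the target."""
    mapping = {}
    for db in schema["databases"]:
        for table in db["tables"]:
            for col in table["columns"]:
                src = f'{db["originalName"]}.{table["originalName"]}.{col["originalName"]}'
                dst = f'{db["name"]}.{table["name"]}.{col["name"]}'
                mapping[src] = dst
    return mapping


print(json.dumps(custom_schema, indent=2))
print(renamed_columns(custom_schema))
```

Keeping both the original and the new name for every object makes it easy to review the renames before the transfer runs.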
In my opinion, Google Cloud Platform has many advantages over Teradata, because GCP is a newer service built to compete with the established giants in big data technology. The additional data analytics and machine learning services provided by GCP are very useful when developing and deploying an advanced AI product. GCP's pricing is also affordable and scales with usage, which gives startups and Fortune 500 giants alike the opportunity to use data warehousing in the cloud without worrying about server configuration.