Tag Archives: Azure Databricks

Incrementally Process Data Lake Files Using Azure Databricks Autoloader and Spark Structured Streaming API.

Use Case. In this post, I will share my experience evaluating an Azure Databricks feature that hugely simplified a batch-based Data ingestion and processing ETL pipeline. Implementing an ETL pipeline to incrementally process only new files as they land in … Continue reading

Posted in Azure Databricks | Tagged , , , , , , , , , , , , , , , , , , , , , , , , , | Leave a comment

Build a Jar file for the Apache Spark SQL and Azure SQL Server Connector Using SBT.

The Apache Spark Azure SQL Connector is a huge upgrade to the built-in JDBC Spark connector. It is more than 15x faster than generic JDBC connector for writing to SQL Server. In this short post, I articulate the steps required … Continue reading

Posted in Unified Analytics | Tagged , , , , , , , | 4 Comments

Configure a Databricks Cluster-scoped Init Script in Visual Studio Code.

Databricks is a distributed data analytics and processing platform designed to run in the Cloud. This platform is built on Apache Spark which is currently at version 2.4.4. In this post, I will demonstrate the deployment and installation of custom … Continue reading

Posted in Apache Spark, Bash, Cluster Init Scripts, Databricks Notebooks, Install.packages(), Logs, R, Shell | Tagged , , , , , , , , | Leave a comment