Category Archives: Azure Databricks

Data Preparation of PySpark Dataframes in Azure Databricks Cluster using Databricks Connect.

In my limited experience with processing big data workloads on the Azure Databricks platform powered by Apache Spark, it has become obvious that a significant part of the tasks is targeted towards data quality. Data quality in this context mostly …

Posted in Azure Databricks

Automate Azure Databricks Job Execution using Custom Python Functions.

Thanks to a recent Azure Databricks project, I’ve gained insight into some of the configuration components, issues and key elements of the platform. Let’s take a look at this project to give you some insight into successfully developing, testing, …

Posted in Apache Spark, Azure Databricks, Cluster Init Scripts, Databricks Notebooks, Python | 2 Comments