Author Archives: jbernec

Publish PySpark Streaming Query Metrics to Azure Log Analytics using the Data Collector REST API.

Overview. At the time of this writing, there doesn’t seem to be built-in support for writing PySpark Structured Streaming query metrics from Azure Databricks to Azure Log Analytics. After some research, I found a work around that enables capturing the … Continue reading

Posted in PySpark Streaming Logs | Tagged , , , , , , , | Leave a comment

Write Data from Azure Databricks to Azure Dedicated SQL Pool(formerly SQL DW) using ADLS Gen 2.

In this post, I will attempt to capture the steps taken to load data from Azure Databricks deployed with VNET Injection (Network Isolation) into an instance of Azure Synapse DataWarehouse deployed within a custom VNET and configured with a private … Continue reading

Posted in Apache Spark, Azure Synapse DW | Tagged , , , , , , , , , , | Leave a comment

Incrementally Process Data Lake Files Using Azure Databricks Autoloader and Spark Structured Streaming API.

Use Case. In this post, I will share my experience evaluating an Azure Databricks feature that hugely simplified a batch-based Data ingestion and processing ETL pipeline. Implementing an ETL pipeline to incrementally process only new files as they land in … Continue reading

Posted in Azure Databricks | Tagged , , , , , , , , , , , , , , , , , , , , , , , , , | Leave a comment

Build a Jar file for the Apache Spark SQL and Azure SQL Server Connector Using SBT.

The Apache Spark Azure SQL Connector is a huge upgrade to the built-in JDBC Spark connector. It is more than 15x faster than generic JDBC connector for writing to SQL Server. In this short post, I articulate the steps required … Continue reading

Posted in Unified Analytics | Tagged , , , , , , , | 5 Comments

Configure a Databricks Cluster-scoped Init Script in Visual Studio Code.

Databricks is a distributed data analytics and processing platform designed to run in the Cloud. This platform is built on Apache Spark which is currently at version 2.4.4. In this post, I will demonstrate the deployment and installation of custom … Continue reading

Posted in Apache Spark, Bash, Cluster Init Scripts, Databricks Notebooks, Install.packages(), Logs, R, Shell | Tagged , , , , , , , , | Leave a comment

Data Preparation of PySpark Dataframes in Azure Databricks Cluster using Databricks Connect.

In my limited experience with processing big data workloads on the Azure Databricks platform powered by Apache Spark, it has become obvious that a significant part of the tasks are targeted towards Data Quality. Data quality in this context mostly … Continue reading

Posted in Azure Databricks | Tagged , , , , , , , , , , | Leave a comment

Setting Up Jupyter Notebook to Run in a Python Virtual Environment.

1) Install Jupyter on the local machine outside of any existing Python Virtual environment: pip install jupyter –no-cach-dir 2) Create a Python Virtual environment. mkdir virtualenv cd virtualenv python.exe -m venv dbconnect 3) Change directory into the virtual environment and … Continue reading

Posted in Python | Tagged , , , | Leave a comment

Automate Azure Databricks Job Execution using Custom Python Functions.

Introduction Thanks to a recent Azure Databricks project, I’ve gained insight into some of the configuration components, issues and key elements of the platform. Let’s take a look at this project to give you some insight into successfully developing, testing, … Continue reading

Posted in Apache Spark, Azure Databricks, Cluster Init Scripts, Databricks Notebooks, Python | Tagged , , , , , , , , , , | 2 Comments

Provisioning a Jenkins Instance Container with Persistent Volume in Azure Kubernetes Service.

In this post, I want to write about my experience testing and using Azure Kubernetes service to deploy a Jenkins Instance solution that is highly available and resilient. With the Kubernetes persistent volume feature, an Azure disk can be dynamically … Continue reading

Posted in Azure Kubernetes, Kubernetes | Tagged , , , , , , , , , , , , , , , | Leave a comment