Author Archives: jbernec

Build a Jar file for the Apache Spark SQL Server and Azure SQL Connector Using SBT.

The Apache Spark Azure SQL Connector is a huge upgrade to the built-in JDBC Spark connector. It is more than 15x faster than the generic JDBC connector for writing to SQL Server. In this short post, I outline the steps required … Continue reading

Posted in Unified Analytics

Configure a Databricks Cluster-scoped Init Script in Visual Studio Code.

Databricks is a distributed data analytics and processing platform designed to run in the cloud. The platform is built on Apache Spark, which is currently at version 2.4.4. In this post, I will demonstrate the deployment and installation of custom … Continue reading

Posted in Apache Spark, Bash, Cluster Init Scripts, Databricks Notebooks, Install.packages(), Logs, R, Shell

Data Preparation of PySpark Dataframes in Azure Databricks Cluster using Databricks Connect.

In my limited experience processing big data workloads on the Azure Databricks platform powered by Apache Spark, it has become obvious that a significant share of the tasks is aimed at data quality. Data quality in this context mostly … Continue reading

Posted in Azure Databricks

Setting Up Jupyter Notebook to Run in a Python Virtual Environment.

1) Install Jupyter on the local machine, outside of any existing Python virtual environment: pip install jupyter --no-cache-dir
2) Create a Python virtual environment: mkdir virtualenv, then cd virtualenv, then python.exe -m venv dbconnect
3) Change directory into the virtual environment and … Continue reading

Posted in Python

Automate Azure Databricks Job Execution using Custom Python Functions.

Introduction. Thanks to a recent Azure Databricks project, I’ve gained insight into some of the configuration components, issues and key elements of the platform. Let’s take a look at this project to give you some insight into successfully developing, testing, … Continue reading

Posted in Apache Spark, Azure Databricks, Cluster Init Scripts, Databricks Notebooks, Python

Provisioning a Jenkins Instance Container with Persistent Volume in Azure Kubernetes Service.

In this post, I describe my experience testing and using Azure Kubernetes Service to deploy a Jenkins instance that is highly available and resilient. With the Kubernetes persistent volume feature, an Azure disk can be dynamically … Continue reading

Posted in Azure Kubernetes, Kubernetes

PowerShell function to Provision a Windows Server EC2 Instance in AWS Cloud.

Introduction. Microsoft just updated the AWSPowerShell module to better enable cloud administrators to manage and provision resources in the AWS cloud while using the same familiar PowerShell tool. As of the last count today, the AWSPowerShell module contains almost four … Continue reading

Posted in AWS

Thoughts on the Meltdown and Spectre Processor Vulnerabilities.

Summary: A new class of security vulnerabilities referred to as “speculative execution side-channel attacks”, also known as “Meltdown and Spectre”, was publicly disclosed by cybersecurity researchers this week. Given the gravity of these flaws, many concerns have been rightly … Continue reading

Posted in Uncategorized