Automate Azure Databricks Job Execution using Custom Python Functions.


Thanks to a recent Azure Databricks project, I’ve gained insight into some of the configuration components, issues and key elements of the platform. Let’s take a look at this project to give you some insight into successfully developing, testing, and deploying artifacts and executing models.

One note: This post is not meant to be an exhaustive look into all the issues and required Databricks elements. Instead, let’s focus on a custom Python script I developed to automate model/Job execution using the Databricks Jobs REST APIs. The script will be deployed to extend the functionality of the current CICD pipeline.

Background of the Databricks Project.

I’ve been involved in an Azure Databricks project for a few months now. The objective was to investigate the Databricks platform and determine best practice configuration and recommendations that will enable the client to successfully migrate and execute their machine learning models.

We started with a proof of concept phase to demonstrate some of the features and capabilities of the Databricks platform, then moved on to a Pilot phase.

The objective of the Pilot phase was to migrate the execution of machine learning models and their datasets from the Azure HDInsight and Apache Hive platform to Azure Databricks and Spark SQL. A few cons with the HDInsight platform had become obvious, some of which included:

a) Huge administrative overhead and time to configure and provision HDInsight clusters which can run into hours.
b) Limitations in managing the state of HDInsight Clusters which can’t really be shut down without deleting the cluster. The autoscaling and auto-termination features in Azure Databricks play a big role in cost control and overall Cluster management.
c) Model execution times are considerably longer on Azure HDInsight than on Azure Databricks.

Key Databricks Elements, Tools and Issues.

Some of the issues faced are related to:

Data Quality: We had to develop Spark code (using the PySpark API) to implement Data clean up, handle trailing and leading spaces and replace null string values. These were mostly due to the Data type differences between HDInsight Hive and Spark on Databricks, e.g char types on Hive and String types on Spark.

End-to-end CICD Tooling: We configured Azure Pipeline and Repos to cover specific areas of the CICD pipeline to the Databricks platform, but this was not sufficient for automating Model execution on Databricks without writing custom python code to implement this functionality.

Development: Implementing RStudio Server deployment on a Databricks Cluster to help with the development and debugging of models.

Azure Data Factory: We explored version 2, but at the time of initial testing, version control integration was not supported. At the time of this writing though, it is supported. While the configuration works and I confirmed connectivity to my Github repository, it appears the current integration allows for an ADF Pipeline template to be pushed to the defined repository root folder.

I have not been able to import existing notebook code from my repo, to be used as a Notebook activity. Maybe this use case is currently not supported/required or I’m yet to figure it out. Will continue to review.

Azure Databricks VNET Peering: Connecting the Azure Databricks workspace with existing production VNET was not possible at the start of this engagement. VNET peering is now currently supported.

Other Databricks elements and tools include:

Databricks Hive Metastore: Databricks’ central hive metastore that allows for the persistence of table data and metadata.

Databricks File System (DBFS): The DBFS is a distributed file system that is a layer over Azure Blob Storage. Files in DBFS persist to Azure Storage Account or AWS S3 bucket, so there’s no data loss even after a Cluster termination.

Cluster Init Scripts: These are critical for the successful installation of the custom R package dependencies on Databricks Clusters, keeping in mind that R does not appear to have as much feature and functionality support on Databricks as Python and Scala. At least at this time.

Databricks CLI: This is a python-based command-line, tool built on top of the Databricks REST API.

Databricks Notebooks: These enable collaboration, In-line multi-language support via magic commands, Data exploration during testing which in turn reduces code rewrites. I’m able to write PySpark and Spark SQL code and test them out before formally integrating them in Spark jobs.

Databricks-Connect: This is a python-based Spark client library that let us connect our IDE (Visual Studio Code, IntelliJ, Eclipse, PyCharm, e.t.c), to Databricks clusters and run Spark code. With this tool, I can write jobs using Spark native APIs like dbutils and have them execute remotely on a Databricks cluster instead of in the local Spark session.

DevOps: The steps involved in the integration of Databricks Notebooks and RStudio on Databricks with version control are pretty much straightforward. A number of version control solutions are currently supported: Bitbucket, Github and Azure DevOps. This brings me to the focus of this post. At the time of this writing, there is currently no CICD solution (that we know of), capable of implementing an end-to-end CICD pipeline for Databricks. We used the Azure DevOps Pipeline and Repos services to cover specific phases of the CICD pipeline, but I had to develop a custom Python script to deploy existing artifacts to the Databricks File System (DBFS) and automatically execute a job on a Databricks jobs cluster on a predefined schedule or run on submit.

MLFlow on Databricks: This new tool is described as an open source platform for managing the end-to-end machine learning lifecycle. It has three primary components: Tracking, Models, and Projects. It’s still in beta and I haven’t reviewed it in detail. I plan to do so in the coming weeks.


Databricks Jobs are the mechanism to submit Spark application code for execution on the Databricks Cluster.In this Custom script, I use standard and third-party python libraries to create https request headers and message data, configure the Databricks token on the build server, check for the existence of specific DBFS-based folders/files and Databricks workspace directories and notebooks, delete them if necessary while creating required folders, copy existing artifacts and cluster init scripts for custom package installation on the Databricks Cluster, create and submit a Databricks Job using the REST APIs and execute/run the job. In the next section, I look at specific parts of the Python script.

Configure header and message data:

import sys
import base64
import subprocess
import json
import requests
import argparse
import logging
import yaml
from pprint import pprint

JOB_SRC_PATH = "c:/bitbucket/databricks_repo/Jobs/job_demo.ipynb"
JOB_DEST_PATH = "/notebooks/jobs_demo"
INIT_SCRIPT_SRC_PATH = "c:/bitbucket/databricks_repo/Workspace-DB50/"
INIT_SCRIPT_DEST_PATH = "dbfs:/databricks/rstudio/"
RSTUDIO_FS_PATH = "dbfs:/databricks/rstudio"
WORKSPACE_PATH = "/notebooks"
JOB_JSON_PATH = "c:/BitBucket/databricks_repo/Jobs/databricks_new_cluster_job.json"

with open("c:\\bitbucket\\databricks_repo\\jobs\\databricks_vars_file.yaml") as config_yml_file:
    databricks_vars = yaml.safe_load(config_yml_file)

JOBSENDPOINT = databricks_vars["databricks-config"]["host"]
DATABRICKSTOKEN = databricks_vars["databricks-config"]["token"]

def create_api_post_args(token, job_jsonpath):
    WORKSPACETOKEN = token.encode("ASCII")
    headers = {'Authorization': b"Basic " +
           base64.standard_b64encode(b"token:" + WORKSPACETOKEN)}
    with open(job_jsonpath) as json_read:
        json_data = json.load(json_read)
    data = json_data
    return headers, data

Configure Databricks Token on Build server:

#Configure token function
def config_databricks_token(token):
        databricks token"""
        databricks_config_path = "c:/Users/juser/.databrickscfg"
        with open(databricks_config_path, "w") as f:
            clear_file = f.truncate()
            databricks_config = ["[DEFAULT]\n","host =\n","token = {}\n".format(token),"\n"]
    except Exception as err:
        logging.debug("Exception occured:", exc_info = True)

Create and run the job using the Python subprocess module that calls the databricks-cli external tool:

def create_job(job_endpoint, header_config, data):
        Azure Databricks Spark Notebook Task Job"""
        response =
        return response
    except Exception as err:
        logging.debug("Exception occured with create_job:", exc_info = True)

def run_job(job_id):
    """Use the passed job id to run a job.
      To be used with the create jobs api only, not the run-submit jobs api"""
    try:"databricks jobs run-now --job-id {}".format(job_id))
    except subprocess.CalledProcessError as err:
        logging.debug("Exception occured with create_job:", exc_info = True)

The actual Job task is executed as a notebook task as defined in the Job json file. The notebook task which contains sample PySpark ETL code, was used in order to demonstrate the preferred method for running an R based model at this time. The following Job tasks are currently supported in Databricks: notebook_task, spark_jar_task, spark_python_task, spark_submit_task. None of the defined tasks can be used to run R based code/model except the notebook_task. The spark_submit_task accepts --jars and --py-files to add Java and Python libraries, but not R packages which are packaged as “tar.gzip” files.

In the custom functions, I used the subprocess python module in combination with the databricks-cli tool to copy the artifacts to the remote Databricks workspace. The Create Jobs API was used instead of the Runs-Submit API because the former makes the Spark UI available after job completion, to view and investigate the job stages in the event of a failure. In addition, email notification is enabled using the Create Jobs REST API, which is not available with the Runs-Submit Jobs API. A screen shot of the Spark UI is attached:

Job Status:

Spark Job View:

Spark Job Stages:

A snippet of the JSON request code for the job showing the notebook_task section is as follows:

"timeout_seconds": 3600,
"max_retries": 1,
"notebook_task": {
"_comment": "path param||notebook_path||notebooks/python_notebook",
"notebook_path": "/notebooks/jobs_demo",
"base_parameters": {
"argument1": "value 1",
"argument2": "value 2"

As stated at the start of this post, I developed this Python script to extend the existing features of a CICD pipeline for Azure Databricks. I’m continuosly updating it to take advantage of new features being introduced into the Azure Databricks ecosystem and possibly integrate it with tools still in development, but that could potentially improve the end-to-end Azure Databricks/Machine learning workflow.
The complete code can be found here.

Posted in Apache Spark, Azure Databricks, Cluster Init Scripts, Databricks Notebooks, Python | Tagged , , , , , , , , , , | 2 Comments

Provisioning a Jenkins Instance Container with Persistent Volume in Azure Kubernetes Service.

In this post, I want to write about my experience testing and using Azure Kubernetes service to deploy a Jenkins Instance solution that is highly available and resilient. With the Kubernetes persistent volume feature, an Azure disk can be dynamically provisioned and attached to a Jenkins Instance container deployment. In another scenario, an existing Azure Disk containing data related to a software team’s Jenkins projects could be attached to a new Jenkins deployment in a kubernetes cluster, to maintain continuity. Also, data loss in the event of a Pod failure can be averted since the persistent volume storage has a lifecycle independent of any individual pod that uses the persistent volume and can be managed as a separate cluster resource.

Jenkins automation server is one of the most in demand DevOps tools today. Therefore, deploying a Jenkins instance for software development and engineering teams in a fast, flexible and highly available way is key to maintaining an efficient and smooth running CI/CD and testing process.With Azure Kubernetes, we are able to deploy multiple Jenkins instances customized for each team on a centrally controlled cluster.


All the Kubernetes related work in this post was done on my Windows 10 machine. The tools are available for both Windows and Linux.

The following tools need to be installed and configured on the working machine before creating and configuring an Azure Kubernetes cluster.
1) Azure CLI : Azure command line tool for creating Kubernetes Clusters.
2) Kubectl : Follow the link to download the Kubectl executable and put kubectl.exe somewhere in your system PATH.
3) Azure Subscription.

I’ll also mention that I used the awesome Kompose.exe tool to convert my existing Docker-Compose files to Kubernetes compatible yaml files. Obviously, the yaml files needed to be edited after the convert process.

Login to Azure and create a resource group for the Kubernetes services.

In this and following sections, I will detail the steps I followed to setup and deploy a highly available Jenkins deployment in Azure Kubernetes using first a) A dynamic Azure Disk and b) A static existing Azure Disk with data.

az login --username --password passw04rd

Set a subscription to be the current active subscription:

az account set --subscription 0fr96513-fea6-4bca-ad18-311920der789

Create a resource group:
az group create --resource-group rgaks --location centralus

Create an Azure Kubernetes Cluster in the resource group above.

At this time, using PowerShell to create an Azure kubernetes cluster is not supported. I suspect this will change in the future. For now, I will use Azure CLI az aks command to create a new cluster in the new resource group:

az aks create --resource-group rgaks --name akscluster0 --node-count 2 --node-vm-size Standard_D2s_v3 --ssh-key-value C:\SSH_KubeCluster\

The one line of az aks create code provisions a Kubernetes service object in the rgaks resource group. In addition, it also automatically creates another resource group containing a two node cluster with all corresponding resources as shown in the following screen shot :

After creating the Kubernetes cluster, I’ll connect to the cluster from my powershell console by running the az aks get-credentials command to set the current security context on the /.kube/config file and use the kubectl command to verify the state of the kubernetes cluster nodes:

az aks get-credentials --name akscluster0 --resource-group rgaks

kubectl.exe get nodes

Setup Persistent Storage.

At this point, the cluster nodes are ready to host a Jenkins Container Instance in a kubernetes Pod.Azure Kubernetes clusters are created with two default storage classes as displayed in the screen shot below. Both of these classes are configured to work with Azure disks.A storage class is used to define how a unit of storage is dynamically created. This saves me the step of having to create a storage class manifest file to be used by my persistent volume claim yaml file. The default storage class provisions a standard Azure disk. The managed-premium storage class provisions a premium Azure disk. I verified this using the kubectl get storageclasses command:

I selected the managed-premium class for my persistent volume claim configuration. To provision persistent storage to be used for this deployment, I created a yaml file of kind: PersistentVolumeClaim to request a storage unit of capacity 5Gi based on the managed-premium storage class.The pvc will be referenced in my Jenkins deployment yaml file to attach/map the automatically created Azure Disk volume to the Jenkins data home folder path. The following yaml file defines the dynamic creation of a 5Gi unit of storage:

apiVersion: v1
kind: PersistentVolumeClaim
name: azure-managed-disk
annotations: managed-premium
- ReadWriteOnce
storage: 5Gi

Use the Persistent volume to deploy a Jenkins Instance.

I merged the persistent volume claim yaml file with the Jenkins application deployment and service yaml files to automatically provision the storage, provision the Jenkins deployment, map the persistent storage as a volume to the Jenkins instance and create a service type LoadBalancer to enable me access the Jenkins instance from outside the kubernetes cluster virtual network. The full and correctly indented yaml file can be found here on my GitHub page. Using the kubectl.exe tool, I can run a one line script that provisions the deployment, services and persistent volumes from a single yaml file:

apiVersion: extensions/v1beta1
kind: Deployment
kompose.cmd: C:\Kompose\kompose.exe convert -f .\docker-compose.yml
kompose.service.type: LoadBalancer
kompose.version: 1.13.0 (84fa826)
creationTimestamp: null
io.kompose.service: jenkinsbox
name: jenkinsbox
replicas: 1
strategy: {}
creationTimestamp: null
io.kompose.service: jenkinsbox
- image: jenkinsci/blueocean
name: jenkins-container
- mountPath: "/var/jenkins_home"
name: volume
- containerPort: 8080
- containerPort: 50000
resources: {}
- name: volume
claimName: azure-managed-disk
- name: permissionsfix
image: alpine:latest
command: ["/bin/sh", "-c"]
- chown 1000:1000 /var/jenkins_home;
- name: volume # Or you can replace with any name
mountPath: /var/jenkins_home # Must match the mount path in the args line
restartPolicy: Always
status: {}
apiVersion: v1
kind: Service
kompose.cmd: C:\Kompose\kompose.exe convert -f .\docker-compose.yml
kompose.service.type: LoadBalancer
kompose.version: 1.13.0 (84fa826)
creationTimestamp: null
io.kompose.service: jenkinsbox
name: jenkinsbox
type: LoadBalancer
- name: "jenkinsport"
port: 8080
targetPort: 8080
io.kompose.service: jenkinsbox
loadBalancer: {}
apiVersion: v1
kind: PersistentVolumeClaim
name: azure-managed-disk
annotations: managed-premium
- ReadWriteOnce
storage: 5Gi

The screen shot displays the one line kubectl script and the deployment, services and persistent volumes created:

kubectl.exe apply -f .\jenkinsbox-k8s-all-in-one.yaml

Check the status to the deployments,services and pods.

Start initial configuration of the Jenkins Instance in kubernetes container.

Use the following command to retrieve the initialAdmin default password for Jenkins:

kubectl.exe logs jenkinsbox-55f58fcbcb-2ltqx

Use the public ip address from the kubernetes service in the above screen shot to access the Jenkins initial setup page:

After initial configuration of Jenkins, I created a sample pipeline job:

Simulate failure and recovery.

To verify failure and recovery, I’ll delete the pod using the following command in the screen shot:

kubectl.exe delete pods --all

The screen shot indicates the pod deletion and immediate automatic creation of a new pod with same storage volume to match the number of replicas defined the deployment yaml file. In the next screen shot, I simply login to the Jenkins instance without going through the setup wizard of a new instance. I can also confirm that the pipeline job created in the preceding steps is still available:

Verify recovery by deleting the current Jenkins deployment and using existing Azure Disk for static disk persistent volume on new deployment.

In this example, I delete the deployment and services created in th epreceding steps and develop a yaml file config to deploy a new Jenkins instance using the existing disk provisioned above for an Azure static disk persistent storage volume mapped to a new Jenkins deployment.

Create a new deployment using the new yaml file:

kubectl.exe apply -f .\jenkinsbox-deployment-static-disk.yaml

The following screen shot confirms that all the deployment components and pods have been successfully provisioned with the existing persistent storage volume mapped to the new pod.

As soon as the new deployment and pod are running. I can login to the existing jenkins instance without a setup prompt (since the static disk contains data from the initial deployments in the preceding steps) and confirm the existing pipeline Job which is available since the existing disk is mapped to the new kubernetes pod.

My new yaml file does not have a persistent volume claim section. I simply reference the diskUri of the existing managed disk in the deployment section.I used my jenkinsbox-deployment-static-disk.yaml file on Github for the new deployment to map an existing azure disk to the new deployment. The following snippet displays the section of the yaml file that maps the existing Azure disk:

apiVersion: extensions/v1beta1
kind: Deployment
- name: azure
kind: Managed
diskName: kubernetes-dynamic-pvc-2d4066e6-7f34-11e8-a8dc-0a58ac1f067c
diskURI: /subscriptions/0c696513-fea6-4bca-ad18-3119de65acef/resourceGroups/MC_rgaks_akscluster0_centralus/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-2d4066e6-7f34-11e8-a8dc-0a58ac1f067c
restartPolicy: Always

Blockers/Issues encountered.

A problem I experienced during initial deployment was a failed container. After digging into the logs for the container by using the kubectl.exe get pods, kubectl.exe logs pods and kubectl.exe describe pods commands, I noticed the pod was stuck in a “ContainerCreating” loop. I also observed the following kubernetes event log messages: “MountVolume.SetUp failed for volume” and “do not have required permission”.

The pod could not attach the disk volume. This is similar to an issue that occurs using bind mounts in Docker. Resolving the issue in Docker is easy. But I couldn’t immediately determine how to resolve it within a kubernetes deployment environmet. This issue occurs because by default, non-root users do not have write permission on the volume mount path for NFS-backed storage. Some common app images, such as Jenkins and Nexus3, specify a non-root user that owns the mount path in the Dockerfile. When a container is created from this Dockerfile, the creation of the container fails due to insufficient permissions of the non-root user on the mount path.

After some research, I found the following solution that uses an InitContainer in my deployment, to give a non-root user that is specified in my Dockerfile write permissions for the volume mount path inside the container.

The init container creates the volume mount path inside the container, changes the mount path to be owned by the correct (non-root) user, and closes. Then, my Jenkins container starts with the non-root user that must write to the mount path. Because the path is already owned by the non-root user, writing to the mount path is successful. The full example on using the InitContainer can also be found here. The following is a section of my deployment yaml file that defines the InitContainer:

- name: permissionsfix
image: alpine:latest
command: ["/bin/sh", "-c"]
- chown 1000:1000 /var/jenkins_home;
- name: azure # Or you can replace with any name
mountPath: /var/jenkins_home # Must match the mount path in the args line


Having tested both Azure Disks and Azure File Share as Persistent Volume options for deploying Jenkins container in AKS and I have observed that without a doubt the Azure Disk option provides much faster performance and response times for the application.

The drawback for me though lies in the fact that the access mode for Azure Disk persistent volumes is ReadWriteOnce. This means that an Azure disk can be attached to only one cluster node at a time. In the event of a node failure or update, it could take anywhere between 1-5 minutes for the Azure disk to get detached and attached to the next available node. This means that a Pod/Container/Application may incur a short outage while Kubernetes schedules it for another node. This scenario could slightly impact the high availability offering of AKS. Hopefully, Azure will improve on this behavior soon.

Posted in Azure Kubernetes, Kubernetes | Tagged , , , , , , , , , , , , , , , | Leave a comment

PowerShell function to Provision a Windows Server EC2 Instance in AWS Cloud.

Microsoft just updated the ASWPowerShell module to better enable Cloud administrators manage and provision cloud resources in the AWS cloud space while using the same familiar PowerShell tool. As at last count today, the AWSPowerShell module contains almost four thousand cmdlets:

This means Microsoft is committed to expanding on PowerShell functionality as a robust tool for managing both Azure and Amazon cloud platforms.
In this post I want to quickly demonstrate how to provision an AWS EC2 instance using PowerShell. The following steps help accomplish this objective.

Install the ASWPowerShell Module.
For this post, I’ll be using the version 5.1.16299.98 of Windows PowerShell as indicated in the following screen shot:

I’ll install the AWSPowerShell module using the Find-Module cmdlet.

PS C:\Scripts> Find-Module -Name AWSPowerShell | Install-Module -Force

Configure AWS Credential Profile.
During initial signup for an AWS account, a root account is created with full administrative access. According to AWS best practices, while making API calls and using PowerShell to programmatically access and manage resources, a sub user account should be created with corresponding access key ID and secret key credentials. This way, if the keys are compromised, the associated user can be disabled instead of risking the compromise of the root account and all the resources associated with it.

Use the Users tab of the IAM (Identity and Access Management) console in the AWS portal to create a subuser and generate the corresponding access key ID and secret key.

After generating the keys, I’ll use the Set-AWSCredential cmdlet to save and persist the the credential keys to my local AWS SDK store for use across multiple PowerShell sessions. The Initialize-AWSDefaultConfiguration cmdlet sets the new profile and region as active within the PowerShell session. The following script accomplishes this task. Please note that the AccessKey and SecretKey parameter values are represented by variables:

#Set and Initialize AWS Credential and Profile for login and authentication
Set-AWSCredential -AccessKey $AccessKey -SecretKey $SecretKey -StoreAs AWSDemoProfile
Initialize-AWSDefaultConfiguration -ProfileName AWSDemoProfile -Region us-west-1

Create an EC2 Key Pair.
Use the New-EC2KeyPair cmdlet to create an EC2 key pair. This cmdlet calls the Amazon Elastic Compute Cloud CreateKeyPair API. It creates a 2048-bit RSA key pair with the specified name. Amazon EC2 stores the public key and displays the private key to be saved to a file. The private key is returned as an unencrypted PEM encoded PKCS#1 private key. The private key is used during the logon operation to a virtual machine to create a password for login. If a key with the specified name already exists, Amazon EC2 returns an error.Up to five thousand key pairs can be created per region. The key pair is available only in the region in which it is created. In the following script, I create the key, assign the key pair object to a variable and save the key material property of the key pair object locally to a file:

#Create Keypair for decrypting login creds
$awskey = New-EC2KeyPair -KeyName demo1key
$awskey.KeyMaterial | Out-File -FilePath C:\AWSCred\mykeypair.pem

Provision a Non-Default Virtual Private Cloud (VPC).
The first time I created my AWS account, a default VPC provisioned with a private ip address scheme.For the purpose of this post, I would prefer to create a custom non-default vpc with an address range of my choice. Unlike the default vpc, the non-default vpc does not have internet connectivity. Some extra configuration is needed to enable internet connectivity to the non-default vpc.
The following tasks are accomplished by the PowerShell script to enable internet connectivity for the custom non-default vpc:
Create the non-default vpc and enable dns hostnames
Tag the vpc with a friendly name
Create a custom subnet for the vpc and tag it
Create an internet gateway and attach it to the custom vpc
Create a custom route table for internet access and associate it with the custom subnet

#Create non default virtual private cloud/virtual network, enable dns hostnames and tag the vpc resource
$Ec2Vpc = New-EC2Vpc -CidrBlock "" -InstanceTenancy default
Edit-EC2VpcAttribute -VpcId $Ec2Vpc.VpcId -EnableDnsHostnames $true
$Tag = New-Object Amazon.EC2.Model.Tag
$Tag.Key = "Name"
$Tag.Value = "MyVPC"
New-EC2Tag -Resource $Ec2Vpc.VpcId -Tag $Tag

#Create non default subnet and tag the subnet resource
$Ec2subnet = New-EC2Subnet -VpcId $Ec2Vpc.VpcId -CidrBlock ""
$Tag = New-Object Amazon.EC2.Model.Tag
$Tag.Key = "Name"
$Tag.Value = "MySubnet"
New-EC2Tag -Resource $Ec2subnet.SubnetId -Tag $Tag
#Edit-EC2SubnetAttribute -SubnetId $ec2subnet.SubnetId -MapPublicIpOnLaunch $true

#Create Internet Gateway and attach it to the VPC
$Ec2InternetGateway = New-EC2InternetGateway
Add-EC2InternetGateway -InternetGatewayId $Ec2InternetGateway.InternetGatewayId -VpcId $ec2Vpc.VpcId
$Tag = New-Object Amazon.EC2.Model.Tag
$Tag.Key = "Name"
$Tag.Value = "MyInternetGateway"
New-EC2Tag -Resource $Ec2InternetGateway.InternetGatewayId -Tag $Tag

#Create custom route table with route to the internet and associate it with the subnet
$Ec2RouteTable = New-EC2RouteTable -VpcId $ec2Vpc.VpcId
New-EC2Route -RouteTableId $Ec2RouteTable.RouteTableId -DestinationCidrBlock "" -GatewayId $Ec2InternetGateway.InternetGatewayId
Register-EC2RouteTable -RouteTableId $Ec2RouteTable.RouteTableId -SubnetId $ec2subnet.SubnetId

Create Security Group.
In this section, I’ll create a security group with a rule to enable remote desktop access to the EC2Instance VM.
#Create Security group and firewall rule for RDP
$SecurityGroup = New-EC2SecurityGroup -Description "Non Default RDP Security group for AWS VM" -GroupName "RDPSecurityGroup" -VpcId $ec2Vpc.VpcId
$Tag = New-Object Amazon.EC2.Model.Tag
$Tag.Key = "Name"
$Tag.Value = "RDPSecurityGroup"
New-EC2Tag -Resource $securityGroup -Tag $Tag
$iprule = New-Object Amazon.EC2.Model.IpPermission
$iprule.ToPort = 3389
$iprule.FromPort = 3389
$iprule.IpProtocol = "tcp"
Grant-EC2SecurityGroupIngress -GroupId $securityGroup -IpPermission $iprule -Force

Get the AMI (Amazon Machine Image) and create an Elastic Public IP Address to be attached to the EC2Instance after initialization..
#Retrieve Amazon Machine Image Id property for Windows Server 2016
$imageid = (Get-EC2ImageByName -Name WINDOWS_2016_BASE).ImageId

#Allocate an Elastic IP Address for use with an instance VM
$Ec2Address = New-EC2Address -Domain vpc
$Tag = New-Object Amazon.EC2.Model.Tag
$Tag.Key = "Name"
$Tag.Value = "MyElasticIP"
New-EC2Tag -Resource $Ec2Address.AllocationId -Tag $Tag

Launch or Provision the EC2 Instance Virtual machine.
#Launch EC2Instance Virtual Machine
$ec2instance = New-EC2Instance -ImageId $imageid -MinCount 1 -MaxCount 1 -InstanceType t2.micro -KeyName mykeypair -SecurityGroupId $securityGroup -Monitoring_Enabled $true -SubnetId $ec2subnet.SubnetId
$Tag = New-Object Amazon.EC2.Model.Tag
$Tag.Key = "Name"
$Tag.Value = "MyVM"
$InstanceId = $ec2instance.Instances | Select-Object -ExpandProperty InstanceId
New-EC2Tag -Resource $InstanceId -Tag $Tag

Associate the Elastic Public IP Address to the EC2 Instance.
#Assign Elastic IP Address to the EC2 Instance VM
$DesiredState = “Running”
while ($true) {
$State = (Get-EC2Instance -InstanceId $InstanceId).Instances.State.Name.Value
if ($State -eq $DesiredState) {
“$(Get-Date) Current State = $State, Waiting for Desired State=$DesiredState”
Start-Sleep -Seconds 5
Register-EC2Address -AllocationId $Ec2Address.AllocationId -InstanceId $InstanceId

Display EC2 Instance Properties.
#Display VM instance properties
(Get-EC2Instance -InstanceId $InstanceId).Instances | Format-List

Remove or Terminate EC2 Instances.
#Clean up and Terminate the EC2 Instance
Get-EC2Instance | Remove-EC2Instance -Force

Logon to the EC2Instance using Remote Desktop protocol.
Login to the EC2 Instance Virtual machine can be initiated using the AWS EC2 Dashboard.The private key portion of the keypair will be used to create a password to login to the Virtual Machine as indicated in the following screen shots:

Select the EC2 Instance and click on the Connect button. On the Connect to your Instance page, click on the Get Password button.

On the Get Password page, copy and paste the private key from the keypair file into the content field and click to decrypt the key.

Copy the displayed password, download the RDP file and login to the EC2Instance. It is recommended to change the password and create a new local user after logon.

The full PowerShell Script can be found at my Github Repository

Posted in AWS | Tagged , , , , , , | Leave a comment

Thoughts on the Meltdown and Spectre Processor Vulnerabilities.

A new class of security vulnerabilities referred to as “Speculative execution side-channel attacks” also known as “Meltdown and Spectre” were publicly disclosed by Cyber security researchers this week. Given the gravity of these flaws, many concerns have been rightly raised. In this article I will cover their impact as well as the Microsoft Cloud’s response and how they are dealing with it. I will also cover how you can mitigate and prevent the issues for on-premises environments.

General Overview:
These security flaws exploit critical vulnerabilities in modern processors whether Intel, Apple, AMD or ARM chips. They allow programs to steal data while it’s being processed on the computer. While programs are typically not permitted to read data from other programs, a malicious program can exploit Meltdown and Spectre to get hold of secrets stored in the CPU’s memory of other running programs. This might include your passwords stored in a password manager or browser, your personal photos, emails, instant messages and even business-critical documents.

Overview of Spectre Vulnerability:
Spectre breaks the most fundamental isolation between user processes and the operating system. This attack allows a program to access the CPU’s memory, and thus also the secrets, of other programs and the operating system.If your computer has a vulnerable processor’s (Intel, Apple, ARM, AMD) and runs an unpatched operating system / firmware, it is not safe to work with sensitive information without the chance of leaking the information. This applies both to personal computers as well as cloud infrastructure.

Overview of the Meltdown Vulnerability:
Meltdown breaks the isolation between different applications using a flaw in the Kernel on machines with vulnerable processor’s (Intel, Apple). It allows an attacker to trick error-free programs, which follow best practices, into leaking their secrets. In fact, the safety checks of said best practices actually increase the attack surface and may make applications more susceptible.

How to Enable Protection for On-premise and Cloud Environments:
Microsoft has provided instructions for recommended actions that enable protection of servers against these vulnerabilities. Some of these are detailed in the following link: Microsoft Protections Against Meltdown and Spectre

Impact to Enterprise Cloud Services:
Microsoft has provided updates of the impact of these flaws on enterprise cloud services in the following link: Impact to Enterprise Cloud Services

Amazon Web Services Linux Protection Recommendations:
All instances across the Amazon EC2 fleet are protected from all known threat vectors from the CVEs previously listed. Customers’ instances are protected against these threats from other instances. We have not observed meaningful performance impact for the overwhelming majority of EC2 workloads. The following link provides further recommendations and updates.
Recommended Customer Actions for AWS

It must be added that Public Cloud providers like Microsoft Azure and Amazon AWS are well ahead of the curve when it comes to security. Cloud providers got on top of this way before most of the industry.Public Cloud providers like Amazon AWS and Microsoft Azure are also much safer than on-premise sites considering that they employ a large and formidable army of security professionals and resources.

Personal computer users are also strongly advised to immediately download and install patches and updates available for their operating systems.

Posted in Uncategorized | Tagged , , , , , , , , , , | Leave a comment

Azure PowerShell Function App automates Azure Storage file sync to secondary Storage Account.

Azure Storage Files is a sub service of Azure Storage Accounts. It offers fully managed file shares in the cloud that are accessible via the industry standard Server Message Block (SMB) protocol (also known as Common Internet File System or CIFS). The cloud File shares can be mounted concurrently by cloud or on-premises deployments of Windows, Linux, and macOS machines.

At a recent cloud migration engagement, a key requirement was the automation of Azure storage file sync/copy to a secondary storage account in the same Azure subscription to help mitigate against accidental file deletion.

As of this writing, Azure Backup and snapshots are not supported for Azure File Storage. Microsoft recommends using Robocopy, the AzCopy CLI tool or a 3rd party backup tool to copy or backup Azure File storage data. Robocopy needs a UNC path,which will require the Azure Storage File Share be mapped as a local network drive for local access to the files. This option did not meet the requirement for automation. The AzCopy tool, does not require local access to a mapped drive, but needs to be installed on the Azure VM or on-premise server.

The ideal automated solution should be deployed in Azure and in addition, trigger email notifications with details of the copied file storage items after processing the files. I developed the following PowerShell function to meet these requirements. Hopefully, Azure Backup and Snapshot support for Azure File Storage will be rolled out soon. A feature request for File Storage Account backups or snapshots currently exists at this link.

This function (which can also be downloaded at my GitHub Repository) loops through the existing file shares and directories, then copies the existing files based on the last modified time stamp property being greater than one hour. I deployed this solution as a Time triggered PowerShell Azure Function App executed once at the top of every hour.

At this time, I don’t think Azure Function Apps support Azure Key Vault for secure authentication to Azure( I could be wrong though and stand corrected). For this Function App to authenticate with Azure, I generated and uploaded an AES256 Encryption key file used to encrypt/decrypt the password in order to generate an encrypted standard password file to be referenced by the function App. The following screen shot displays the Function App settings for the encrypted Azure logon password, Azure credential username, encrypted sendgridpassword and the local time zone.

   The Copy-AzureStorageFilesFunctionApp Function copies files from one storage account to another in the same subscription.
   The Copy-AzureStorageFilesFunctionApp Function copies files from one storage account to another in the same subscription. At this time while developing this function 9/13/2017,
   Azure Backup and snapshots are not supported for Azure File Storage.
   This function is written to be deployed as an Azure Function App and executed based on an hourly time trigger.
   It take a Resource Group name and subscription name as parameters.
   An email notification is sent to the Subscription Admin with a list of the items copied.
.PARAMETER SubscriptionName
        Subscription to be processed.
.PARAMETER ResourceGroupName
        ResourceGroupName of the source and target storage accounts.
        PowerShell Language

    [parameter(Mandatory = $false)]
    [string]$SubscriptionName = "Trial",
    [parameter(Mandatory = $false)]
    [string]$ResourceGroupName = "RGVPN"

#region Azure Logon
$Username = $env:AzureUsernameCredential
$EncryptedPassword = $env:AzurePasswordCredential
$AESKeyPath = "D:\home\site\wwwroot\TimerTriggerPowerShell1\bin\AESKey.key"
$SecurePassword = ConvertTo-SecureString -String $EncryptedPassword -Key (Get-Content -Path $AESKeyPath)
$Credential = New-Object -TypeName System.Management.Automation.PSCredential($Username, $SecurePassword)
Login-AzureRmAccount -Credential $Credential
Select-AzureRmSubscription -SubscriptionName $SubscriptionName


#region create email creds
$Smtpserver = ""
$From = ""
$To = ""
$Port = "587"
$sendgridusername = ""
$sendgridPassword = $env:SendGridPasswordCredential
$emailpassword = ConvertTo-SecureString -String $sendgridPassword -Key (Get-Content -Path $AESKeyPath)
$emailcred = New-Object System.Management.Automation.PSCredential($sendgridusername, $emailpassword)

#Initialize storage variables
$StorageAccountName = "storevpn0518"
$DestStorageAccountName = "store0518"
$DestResourceGroupName = "RGXavier"
$Store = Get-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName
$Shares = Get-AzureStorageShare -Context $Store.context
$Deststore = Get-AzureRmStorageAccount -ResourceGroupName $DestResourceGroupName -Name $DestStorageAccountName
$Items = @()

foreach ($Share in $Shares) {
    $Directories = Get-AzureStorageFile -Share $Share

    foreach ($Directory in $Directories) {
        if ((Get-AzureStorageShare -Name $Share.Name -Context $Deststore.Context -ErrorAction SilentlyContinue) -eq $null) { 
            New-AzureStorageShare -Name $Share.Name -Context $Deststore.Context    
        if ((Get-AzureStorageFile -ShareName $Share.Name -Context $Deststore.Context -Path $Directory.Name -ErrorAction SilentlyContinue) -eq $null) {
            New-AzureStorageDirectory -ShareName $Share.Name -Context $Deststore.Context -Path $Directory.Name               
        $sourcefiles = Get-AzureStorageFile -ShareName $Share.Name -Context $Store.context -Path $Directory.Name | Get-AzureStorageFile        
        $destfiles = Get-AzureStorageFile -ShareName $Share.Name -Context $Deststore.Context -Path $Directory.Name -ErrorAction SilentlyContinue `
            | Get-AzureStorageFile
        if ($destfiles.count -eq 0) {
            foreach ($sourcefile in $sourcefiles) {
                Start-AzureStorageFileCopy -SrcShareName $Share.Name -SrcFilePath ($Directory.Name + "/" + $sourcefile.Name)  -Context $Store.Context `
                    -DestShareName $Share.Name -DestFilePath ($Directory.Name + "/" + $sourcefile.Name) -DestContext $Deststore.Context -ErrorAction SilentlyContinue
        else {
            foreach ($sourcefile in $sourcefiles) {
                if ($sourcefile.Properties.LastModified.LocalDateTime -gt ((Get-Date).AddMinutes(-60)) ) {
                    Start-AzureStorageFileCopy -SrcShareName $Share.Name -SrcFilePath ($Directory.Name + "/" + $sourcefile.Name)  -Context $Store.Context `
                        -DestShareName $Share.Name -DestFilePath ($Directory.Name + "/" + $sourcefile.Name) -DestContext $Deststore.Context -Force -ErrorAction SilentlyContinue

                    $copyState = Get-AzureStorageFileCopyState -ShareName $Share.Name -FilePath ($Directory.Name + "/" + $sourcefile.Name) -Context $Deststore.Context `
                        -WaitForComplete -ErrorAction SilentlyContinue
                    $Items += $copyState

#region process email

$a = "<style>"
$a = $a + "BODY{background-color:white;}"
$a = $a + "TABLE{border-width: 1px;border-style: solid;border-color: black;border-collapse: collapse;}"
$a = $a + "TH{border-width: 1px;padding: 10px;border-style: solid;border-color: black;}"
$a = $a + "TD{border-width: 1px;padding: 10px;border-style: solid;border-color: black;}"
$a = $a + "</style>"
$body = ""
$body += "<BR>"
$body += "These" + " " + ($Items.count) + " " + "file storage items were processed for copy. Thank you."
$body += "<BR>"
$body += "<BR>"
$body += $Items | Select-Object -Property Status, @{"Label" = "Completion Time"; e = {$_.CompletionTime.LocalDateTime}}, @{"Label" = "Item"; e = {$_.Source.AbsolutePath}} | ConvertTo-Html -Head $a
$body = $body |Out-String
$subject = "These items were processed for copy."
if ($Items -ne $null) {
    Send-MailMessage -Body $body `
        -BodyAsHtml `
        -SmtpServer $smtpserver `
        -From $from `
        -Subject $subject `
        -To $to `
        -Port $port `
        -Credential $emailcred `

The following screen shot displays a list of copied files embedded in an email notification after code execution.

Posted in Azure File Storage Copy, Azure Function App | Tagged , , , , , , , , | Leave a comment

Remove Active Directory Domain Services (ADDS) from a WS2012 R2 with PowerShell.

I decided to tear down my Azure Lab IaaS and ASR infrastructure and rebuild it. The process involves removing the ASR configurations, Recovery Vaults, S2S VPNs, VNets and Azure VM running as a Azure based Domain controller for resilience with on-premise infrastructure and Azure deployed Apps. The next steps will include uninstalling the Domain Controller, delete and disable the VMs and Hyper-V Hosts configured for Azure site Recovery. Following these tasks, I’ll delete or remove the resource group. All resources were configured and created within one ARM Resource Group to make it easier to tear down. The following are steps to uninstall ADDS from the Azure VM using PowerShell:

1) Login to the WS2012 R2 Domain Controller.

2) Open a PowerShell console as Admin.

3) Use the Get-Command -Module ADDSDeployment cmdlet to review the necessary ADDSDeployment module commands.

4) psdemote

5) Create a Credential variable:


6) Create a Local Admin Password secure string: $adminPassword = ConvertTo-SecureString -String "pa55w04d123A" -AsPlainText -Force

7) Run the ADDS uninstallation cmdlet with the WhatIf parameter to confirm that the script will run successfully:

Uninstall-ADDSDomainController -LocalAdministratorPassword $adminPassword -Credential $cred -DnsDelegationRemovalCredential $cred -RemoveDnsDelegation -WhatIf


8) Run the actual script without the WhatIf parameter:


9) Remove the ADDS role : Remove-WindowsFeature -Name AD-Domain-Services -IncludeManagementTools -Remove -WhatIf

It might still be necessary to remove refereces to the old DC in DNS and AD Domain Sites.

After disabling protection on the VMs, I deleted the Hyper-V Hosts in the Recovery vault site and then used the following cmdlet to remove the Resource Group:

PS C:\Users\Chinny> Get-AzureRmResourceGroup -Name RGXavier | Remove-AzureRmResourceGroup -Force

Posted in Active Directory, Active Directory Domain Services, Azure, Azure Site Recovery, Azure VPN, DCPromo, Domain Controller, FSMO, Microsoft Hyper-v, PowerShell, Powershell 4.0, Windows Server 2012 R2 | Tagged , , , , , , , , | 1 Comment

Configuring Cisco Virtual Switch System (VSS) on Cisco Catalyst 4500X Switches.

As part of a network infrastructure upgrade at a client site, I’ll be implementing Cisco VSS (Virtual Switching System). This technology will go a long way to meet some of the stated objectives of this infrastructure upgrade which include : Physical Hardware Redundancy, High Availability achieved by Switch Clustering, Self-healing, Increased bandwidth (10 GB trunk), to mention a few.

Quick Background on VSS:
A Virtual switching system (VSS) combines a pair of Catalyst 4500X series switches into a single network component, enabling them to function as one logical switch.Cisco Virtual Switching System is a clustering technology that pools two Cisco Catalyst 4500X Series Switches into a single virtual switch.

In a VSS, the data plane of both clustered switches is active at the same time in both chassis. In my VSS implementation, both VSS Switch members are connected by 2 virtual switch links (VSLs) using 10 Gigabit Ethernet connections between the VSS members. Virtual Switch Links carry regular user traffic in addition to control data between the VSS members.


I have outlined my VSS configuration steps for a pair of Cisco Catalyst 4500X switches below starting with Switch-1. The following diagram illustrates the physical topology:


Switch 1 Virtual Domain and Port Channel Configuration:

Switch-1(config)#switch virtual domain 100

Switch-1(config-vs-domain)#switch 1


Switch-1(config)#interface port-channel 10


Switch-1(config-if)#switch virtual link 1

Switch-1(config-if)#no shutdown


Configure Virtual Switch Link:

Switch-1(config)#interface range tenGigabitEthernet 1/1-2

Switch-1(config-if)#channel-group 10 mode on

Switch-1(config-if)#no shutdown

Switch-1(config-if)#channel-group 10 mode on
WARNING: Interface TenGigabitEthernet1/1,2 placed in restricted config mode. All extraneous configs removed!

Switch-1(config)#do wr mem (Save the current configuration)


Switch-1#switch convert mode virtual (Execute the command, but do not reload until VSS configuration is completed on Switch 2)

Switch 2 Virtual Domain and Port Channel Configuration:

Switch-2(config)#switch virtual domain 100

Switch-2(config-vs-domain)#switch 2


Switch-2(config)#interface port-channel 20


Switch-2(config-if)#switch virtual link 2

Switch-2(config-if)#no shutdown


Configure Virtual Switch Link:

Switch-2(config)#interface range tenGigabitEthernet 1/1-2

Switch-2(config-if)#channel-group 20 mode on

Switch-2(config-if)#no shutdown

Switch-2(config-if)#channel-group 20 mode on
WARNING: Interface TenGigabitEthernet1/1,2 placed in restricted config mode. All extraneous configs removed!

Switch-2(config)#do wr mem (Save the current configuration)


Switch-2#switch convert mode virtual

At this point, console into Switch-1 . You will be prompted to save the work and confirm the switch reboot. Do the same for Switch-2.

After the reboot, verify the VSS configuration:

Switch-1#sh switch virtual

Executing the command on VSS member switch role = VSS Active, id = 1

Switch mode : Virtual Switch
Virtual switch domain number : 100
Local switch number : 1
Local switch operational role: Virtual Switch Active
Peer switch number : 2
Peer switch operational role : Virtual Switch Standby

Executing the command on VSS member switch role = VSS Standby, id = 2

Switch mode : Virtual Switch
Virtual switch domain number : 100
Local switch number : 2
Local switch operational role: Virtual Switch Standby
Peer switch number : 1
Peer switch operational role : Virtual Switch Active

Verify Dual Active Detection:

Switch-1#sh switch virtual dual-active summary

Executing the command on VSS member switch role = VSS Active, id = 1

Pagp dual-active detection enabled: Yes
FastHello dual-active detection enabled: Yes
In dual-active recovery mode: No

Executing the command on VSS member switch role = VSS Standby, id = 2

Pagp dual-active detection enabled: Yes
FastHello dual-active detection enabled: Yes
In dual-active recovery mode: No

After the VSS configuration and restart, both switches start to function as one. One switch is designated as the Active and the other as the Standby switch. If I attempt to console into the Standby switch and run commands, I get the following prompt:

Switch-1-Standy#sh run
Standby console disabled.
Valid commands are: exit, logout

Configure VSS Cluster Uplink to Distribution Stack Switches as Multichassis EtherChannel (MEC):

VSS enables the creation of Multi-Chassis EtherChannel (MEC), which is an Etherchannel whose member ports are distributed across the member switches in a VSS. The fact that ports from both chassis of the Virtual Switching System are included in this etherchannel makes it a Multichassis EtherChannel (MEC). My MEC configuration steps are below.

Console into the VSS switch cluster. Doesn’t matter if it’s the Switch master or slave, and configure the etherchannel switchport as it would on any other switch(In this scenario, I’m consoled into the standby/slave switch):

Switch-1(config)#interface port-channel 30
Switch-1(config)# no shut

Add ports from both chassis of the VSS Cluster to the Port Channel:

Switch-1(config)#interface range tenGigabitEthernet 1/1/3, tenGigabitEthernet 2/1/3
Switch-1(config-if)#switchport channel-group 30 mode on
Switch-1(config-if)#no shut

Configure the port channel and the physical ports in the Upstream Distribution Stack Switch:

DataCenterStack(config)#interface port-channel 30
DataCenterStack(config-if)#switchport trunk encapsulation dot1q
DataCenterStack(config-if)#switchport mode trunk
DataCenterStack(config-if)#no shut

Add the Stack switches physical ports to the Channel group. This would be the ports fitted with the X2-10GB-SR fiber converters on the Distribution Layer 3750E-48PD switches as indicated in the diagram above:

DataCenterStack(config)#interface range TenGigabitEthernet1/0/1-2
DataCenterStack(config-if)#switchport trunk encapsulation dot1q
DataCenterStack(config-if)#switchport mode trunk
DataCenterStack(config-if)#channel-group 30 mode on

Verify Ether Channel configuration :

DataCenterStack#sh etherchannel summary
Flags: D - down P - bundled in port-channel
I - stand-alone s - suspended
H - Hot-standby (LACP only)
R - Layer3 S - Layer2
U - in use f - failed to allocate aggregator

M - not in use, minimum links not met
u - unsuitable for bundling
w - waiting to be aggregated
d - default port

Number of channel-groups in use: 1
Number of aggregators: 1

Group Port-channel Protocol Ports
30 Po30(SU) - Te1/0/1(P) Te1/0/2(P)

I configured an SVI (Switch Virtual Interface) for telnet or ssh(preferably) management. I would add that it’s important to verify that the version of Cisco IOS-XE software on both VSS switches is the same.

Posted in Cisco, Virtual Switch System, VSS | Tagged , , , , , , , , , , , , , | Leave a comment