Installing and Configuring the Databricks CLI and Generating an Azure Databricks Token with a PowerShell Script: Running the Scripts Through a YAML Pipeline

Databricks comes with a CLI tool for working with resources in Azure Databricks. It is built on top of the Databricks REST API and can be used with the Workspace, DBFS, Jobs, Clusters, Libraries and Secrets APIs.

In this technical article I show how to install the Databricks CLI and authenticate to Databricks with the workspace URL and a Databricks token from a YAML pipeline.

The basic steps to configure Azure Databricks for your application are:

  1. Install the Databricks CLI: To install the CLI, you need Python 2.7.9 or later if you are using Python 2, or Python 3.6 or later if you are using Python 3. In a YAML pipeline, use the UsePythonVersion@0 task to select a Python version, then run pip to install the databricks-cli package.

    - task: UsePythonVersion@0
      displayName: 'Use Python 3.x'
      inputs:
        versionSpec: '3.x'
    - script: |
        pip install databricks-cli
      displayName: 'Install Databricks CLI'
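
The version requirement in step 1 can be checked before the install runs. The sketch below shows one possible way to do that in a shell step. `version_at_least` is a hypothetical helper name, not part of the Databricks CLI or Azure Pipelines, and it assumes plain dot-separated numeric versions.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Fields are compared numerically from left to right; a missing field counts as 0.
# Written for POSIX sh, so the working variables are global.
version_at_least() {
  have=$1
  want=$2
  while [ -n "$have" ] || [ -n "$want" ]; do
    h=${have%%.*}
    w=${want%%.*}
    h=${h:-0}
    w=${w:-0}
    if [ "$h" -gt "$w" ]; then return 0; fi
    if [ "$h" -lt "$w" ]; then return 1; fi
    # Drop the field just compared and continue with the rest.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $want in *.*) want=${want#*.} ;; *) want= ;; esac
  done
  return 0
}

# Example use in a pipeline script step (assumes python3 is on PATH):
#   detected=$(python3 -c 'import platform; print(platform.python_version())')
#   version_at_least "$detected" "3.6" && pip install databricks-cli
```

Failing early with a clear message is usually easier to debug than a pip error further down the log.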

     

  2. Authenticate and connect to Databricks: To authenticate, we use a Databricks personal access token and the Databricks workspace URL. To generate a token from the portal, open User Settings in your Azure Databricks profile and go to Access Tokens; alternatively, the token can be generated dynamically with a PowerShell script. You can find the workspace URL on the launch page of Azure Databricks; it looks something like https://adb-xxxxxxxxxxxxxxxx.x.azuredatabricks.net. You can also retrieve it with the PowerShell script below.

    # Look up the workspace resource and read its URL from the expanded properties
    $databricks = "dabrxxxx" + $environment
    $workspaceUrl = (Get-AzResource -Name $databricks -ResourceGroupName $resourceGroupName -ExpandProperties).Properties.workspaceUrl
    $workspaceUrl = "https://" + $workspaceUrl

    You can also generate the Databricks token with PowerShell; the sample snippet below shows one way to do it.

    Note: the token lifetime in this snippet is 7,776,000 seconds (90 days, roughly 3 months). After that, the token must be generated again, which can also be automated in PowerShell.

    # Request body: 7776000 seconds = 90 days
    $jsonBody = '{"lifetime_seconds": 7776000, "comment": "ADF Databricks Token"}'
    $requestParams = @{
        'Uri'         = "$($workspaceUrl)/api/2.0/token/create"
        'Method'      = 'Post'
        'Body'        = $jsonBody
        'ContentType' = 'application/json'
    }

    # $Headers is not defined in this snippet; it must already hold the Authorization header for the workspace
    $tokenResult = Invoke-RestMethod @requestParams -Headers $Headers

    # Convert the token value and token ID to secure strings so they can be stored in Key Vault
    $secureTokenValue = ConvertTo-SecureString $tokenResult.token_value -AsPlainText -Force
    $secureTokenId = ConvertTo-SecureString $tokenResult.token_info.token_id -AsPlainText -Force

    # Store both secrets in Key Vault; they can be retrieved later and passed to the pipeline
    $secretToken = Set-AzKeyVaultSecret -VaultName $keyvaultName -Name 'dabrToken' -SecretValue $secureTokenValue
    $secretTokenId = Set-AzKeyVaultSecret -VaultName $keyvaultName -Name 'dabrTokenId' -SecretValue $secureTokenId

    $dabrToken = Get-AzKeyVaultSecret -VaultName $keyvaultName -Name 'dabrToken' -AsPlainText

    # Expose the token and workspace URL as the pipeline variables databricksPATToken and databricksUrl,
    # which the CI/CD YAML pipeline uses in later steps
    echo "##vso[task.setvariable variable=databricksPATToken]$dabrToken"
    echo "##vso[task.setvariable variable=databricksUrl]$workspaceUrl"
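
The `lifetime_seconds` value in the request body is simply a number of days converted to seconds. As a minimal sketch of that arithmetic, the shell function below builds the same JSON body from a lifetime in days. `token_request_body` is a hypothetical name used for illustration only, and the comment text is assumed to contain no quotes or backslashes.

```shell
# Hypothetical helper: prints the JSON body for the token create request.
# $1 = token lifetime in days, $2 = free-text comment (no quotes or backslashes).
token_request_body() {
  lifetime_seconds=$(( $1 * 24 * 60 * 60 ))
  printf '{"lifetime_seconds": %d, "comment": "%s"}\n' "$lifetime_seconds" "$2"
}

# Example: token_request_body 90 "ADF Databricks Token"
```

Deriving the value from days keeps the intended lifetime readable if the rotation period changes later.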

    Then you can connect to the Azure Databricks workspace with the CLI script below in the YAML pipeline:

    - bash: |
        echo $HOME
        # Write the CLI profile using the pipeline variables set earlier
        echo -e "[DEFAULT]\nhost: $(databricksUrl)\ntoken: $(databricksPATToken)" > $HOME/.databrickscfg
        echo -e "Testing the connection - listing dbfs:/"
        dbfs ls
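
The profile file written in step 2 holds the token in plain text, so it may be worth creating it with restrictive permissions. The sketch below is one way to do that. `write_databricks_cfg` is a hypothetical helper, the target path is passed in explicitly, and the `key = value` layout is assumed to be read by the CLI the same way as the `key: value` layout used above.

```shell
# Hypothetical helper: writes a [DEFAULT] profile for the Databricks CLI.
# $1 = workspace URL, $2 = personal access token, $3 = path of the config file
# (the pipeline step above writes to $HOME/.databrickscfg).
write_databricks_cfg() {
  # The subshell keeps the umask change local; new files become owner-only.
  ( umask 077 && printf '[DEFAULT]\nhost = %s\ntoken = %s\n' "$1" "$2" > "$3" )
  # Tighten permissions as well in case the file already existed.
  chmod 600 "$3"
}

# Example: write_databricks_cfg "$DATABRICKS_URL" "$DATABRICKS_TOKEN" "$HOME/.databrickscfg"
```

`printf` is used instead of `echo -e` because the handling of escape sequences by `echo` differs between shells.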

     

  3. Create a folder in the Databricks workspace, import a config file into it, and execute it. You can put the script below in a PowerShell task, or run it as a .ps1 file from the YAML pipeline. The following snippets show both parts.

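    # Create the target folder in the workspace
    databricks workspace mkdirs /ABC/XYZ
    # Import the Scala config file, overwriting any existing copy
    databricks workspace import -l SCALA -o "$pipelineWorkspace/$artifactName/(folder name)/databricks_config.scala" /dbws/dbConfig/databricks_config.scala
    # Submit a one-time run described by the job JSON file and capture the CLI output
    $Run_Output = databricks runs submit --json-file "$pipelineWorkspace/$artifactName/(folder name)/config_job.json"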

    - task: AzurePowerShell@5
      displayName: 'Run Databricks script'
      inputs:
        azureSubscription: ${{ parameters.subscriptionName }}
        ScriptType: 'FilePath'
        ScriptPath: '${{ parameters.workspace }}/${{ parameters.artifactName }}/(folder name)/Configure-Databricks-Cluster.ps1'
        azurePowerShellVersion: 'LatestVersion'
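
The `databricks runs submit` call in step 3 stores the CLI output in `$Run_Output`. Assuming that output is a JSON object containing a numeric `run_id` field, the sketch below shows one way to pull the ID out in a shell step so that later steps can refer to the run. `extract_run_id` is a hypothetical helper. A real JSON parser such as jq would be more robust than this pattern match.

```shell
# Hypothetical helper: prints the numeric run_id found in the JSON text passed as $1.
# Prints nothing when no run_id field is present.
extract_run_id() {
  printf '%s\n' "$1" | sed -n 's/.*"run_id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example (the file name is illustrative):
#   run_id=$(extract_run_id "$(databricks runs submit --json-file config_job.json)")
```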

I hope this article has been useful and that the scripts help you. Do not hesitate to contact me if you have any further questions!

Sanjeev Nayak
sanjeev.nayak@capgemini.com
My name is Sanjeev Nayak and I work as a BI & Azure consultant at Capgemini in Stockholm. I have more than 15 years of experience in Microsoft BI and Azure. Throughout my career I have worked with a range of technologies, including the MSBI stack (SSIS, SSAS, SQL Server etc.), Azure, Azure SQL Data Warehouse, Azure Data Factory (ADF), Databricks, Scala, Python, CI/CD and YAML scripting.
