Connect and Query Live Presto Data in Databricks with CData Connect Cloud

Use CData Connect Cloud to integrate live Presto data into Databricks and enable direct, live querying and analysis without replication.

Databricks is a leading cloud-native AI platform that unifies data engineering, machine learning, and analytics at scale. Its data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes. Integrating Databricks with CData Connect Cloud gives organizations live access to Presto data without complex ETL pipelines or data duplication, which streamlines operations and shortens time to insight.

In this article, we'll walk through how to configure a secure, live connection from Databricks to Presto using CData Connect Cloud. Once configured, you'll be able to access Presto data directly from Databricks notebooks using standard SQL—enabling unified, real-time analytics across your data ecosystem.

About Presto Data Integration

Accessing and integrating live data from Trino and Presto SQL engines has never been easier with CData. Customers rely on CData connectivity to:

Access data from Trino v345 and above (formerly PrestoSQL) and Presto v0.242 and above (formerly PrestoDB)
Read and write all of the data underlying your Trino or Presto instances
Optimized query generation for maximum throughput.
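If you are unsure whether your server meets these minimum versions, the sketch below compares a version string against them. It assumes you can read the coordinator's version, for example from the JSON returned by the coordinator's `/v1/info` endpoint (its `nodeVersion.version` field). That endpoint may require credentials on secured clusters, and the helper names are hypothetical, not part of any CData product.

```python
import json
import re
import urllib.request

# Minimum versions stated above: Trino 345+ (formerly PrestoSQL), Presto 0.242+ (formerly PrestoDB)
MINIMUM_VERSIONS = {"trino": (345,), "presto": (0, 242)}


def parse_version(version: str) -> tuple:
    """Extract the leading numeric part, e.g. '0.282-SNAPSHOT' -> (0, 282), '435' -> (435,)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(engine: str, version: str) -> bool:
    """Return True if the version is at or above the documented minimum for the engine."""
    return parse_version(version) >= MINIMUM_VERSIONS[engine.lower()]


def fetch_coordinator_version(base_url: str) -> str:
    """Assumption: the coordinator answers GET /v1/info with a nodeVersion.version field
    and accepts this unauthenticated request; secured clusters may reject it."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/v1/info", timeout=10) as resp:
        return json.load(resp)["nodeVersion"]["version"]
```

For example, `meets_minimum("presto", fetch_coordinator_version("http://presto.example.com:8080"))` checks a (hypothetical) Presto coordinator. Tuples are compared element by element, so `0.242.1` counts as newer than `0.242`.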

Presto and Trino allow users to access a variety of underlying data sources through a single endpoint. When paired with CData connectivity, users get pure SQL-92 access to their instances, allowing them to integrate business data with a data warehouse or easily access live data directly from their preferred tools, like Power BI and Tableau.

In many cases, CData's live connectivity surpasses the native import functionality available in tools. One customer was unable to use Power BI effectively because of the size of the datasets needed for reporting. After implementing the CData Power BI Connector for Presto, the company was able to generate reports in real time using the DirectQuery connection mode.

Getting Started

Overview

Here is an overview of the simple steps:

Step 1 — Connect and Configure: In CData Connect Cloud, create a connection to your Presto source, configure user permissions, and generate a Personal Access Token (PAT).
Step 2 — Query from Databricks: Install the CData JDBC driver in Databricks, configure your notebook with the connection details, and run SQL queries to access live Presto data.

Prerequisites

Before you begin, make sure you have the following:

An active Presto account.
A CData Connect Cloud account. Log in, or sign up for a free trial.
A Databricks account. Sign up or log in.

Step 1: Connect and Configure a Presto Connection in CData Connect Cloud

1.1 Add a Connection to Presto

CData Connect Cloud uses a straightforward, point-and-click interface to connect to available data sources.

Log in to Connect Cloud, click Sources on the left, and then click Add Connection in the top-right.

Adding a Connection in CData Connect Cloud

Select "Presto" from the Add Connection panel.

Enter the connection properties for Presto: set Server and Port, plus any authentication properties your server requires.

To enable TLS/SSL, set UseSSL to true.

Authenticating with LDAP

To authenticate with LDAP, set the following connection properties:
- AuthScheme: Set this to LDAP.
- User: The LDAP username to authenticate as.
- Password: The password for that LDAP user.

Authenticating with Kerberos

To authenticate with Kerberos, set the following connection properties:
- AuthScheme: Set this to KERBEROS.
- KerberosKDC: The Kerberos Key Distribution Center (KDC) service used to authenticate the user.
- KerberosRealm: The Kerberos realm used to authenticate the user.
- KerberosSPN: The Service Principal Name for the Kerberos Domain Controller.
- KerberosKeytabFile: The Keytab file containing your pairs of Kerberos principals and encrypted keys.
- User: The user who is authenticating to Kerberos.
- Password: The password used to authenticate to Kerberos.
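The property sets above can be summarized as a small checklist to review before clicking Save & Test. The sketch below is a hypothetical helper, not a Connect Cloud API. It treats Server and Port as required for every scheme and every property listed for a scheme as required, which may be stricter than the connector itself.

```python
# Server and Port are needed regardless of AuthScheme (UseSSL is optional).
BASE_REQUIRED = ("Server", "Port")

# Properties listed in this article for each AuthScheme, all treated as required here.
AUTH_REQUIRED = {
    "LDAP": ("User", "Password"),
    "KERBEROS": (
        "KerberosKDC",
        "KerberosRealm",
        "KerberosSPN",
        "KerberosKeytabFile",
        "User",
        "Password",
    ),
}


def missing_properties(props: dict) -> list:
    """Return the listed properties that are absent or empty for props["AuthScheme"]."""
    scheme = str(props.get("AuthScheme", "")).upper()
    if scheme not in AUTH_REQUIRED:
        raise ValueError(f"AuthScheme not covered by this sketch: {scheme!r}")
    required = BASE_REQUIRED + AUTH_REQUIRED[scheme]
    return [name for name in required if not props.get(name)]
```

For example, `missing_properties({"AuthScheme": "LDAP", "Server": "presto.example.com", "Port": 8080, "User": "alice"})` returns `["Password"]` (the host name and user are placeholders).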

Click Save & Test in the top-right.
Navigate to the Permissions tab on the Presto Connection page and update the user-based permissions as needed.

1.2 Generate a Personal Access Token (PAT)

When connecting to Connect Cloud through the REST API, the OData API, the Virtual SQL Server, or the CData JDBC driver used in this article, a Personal Access Token (PAT) authenticates the connection. A PAT serves as an alternative to your login credentials for secure, token-based authentication. As a best practice, create a separate PAT for each service to keep access granular.

Click the Gear icon at the top right of the Connect Cloud app to open the Settings page.
On the Settings page, go to the Access Tokens section and click Create PAT.
Give the PAT a name and click Create.
Note: The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.
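Because the PAT works like a password, you may prefer not to paste it into notebook cells. The following minimal sketch assumes you stored the token either in an environment variable or in a Databricks secret scope (read with `dbutils.secrets.get`, which exists only inside Databricks notebooks). The names `CONNECT_CLOUD_PAT`, `connect-cloud`, and `pat` are hypothetical placeholders.

```python
import os


def resolve_pat(env_var: str = "CONNECT_CLOUD_PAT",
                scope: str = "connect-cloud",
                key: str = "pat") -> str:
    """Return the Connect Cloud PAT without hard-coding it in the notebook.

    Checks the environment variable first, then a Databricks secret scope.
    The default names are hypothetical; use whatever you configured.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    try:
        # dbutils is injected into Databricks notebooks; it is undefined in plain Python.
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821
    except NameError as exc:
        raise RuntimeError(
            f"No PAT found: set {env_var} or run inside Databricks with a secret scope"
        ) from exc
```

Secret values fetched this way are redacted in Databricks notebook output, which is one reason to prefer a secret scope over a literal string.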

Step 2: Connect and Query Presto Data in Databricks

Follow these steps to establish a connection from Databricks to Presto. You'll install the CData JDBC Driver for Connect Cloud, add the JAR file to your cluster, configure a notebook, and run SQL queries to access live Presto data.

2.1 Install the CData JDBC Driver for Connect Cloud

In CData Connect Cloud, click Integrations on the left. Search for JDBC or Databricks, click Download, and select the installer for your operating system.
Once downloaded, run the installer and follow the instructions:
- For Windows: Run the setup file and follow the installation wizard.
- For Mac/Linux: Unpack the archive and move the folder to /opt or /Applications. Make sure you have execute permissions.

After installation, locate the JAR file in the installation directory:

Windows:

C:\Program Files\CData\CData JDBC Driver for Connect Cloud\lib\cdata.jdbc.connect.jar

Mac/Linux:

/Applications/CData/CData JDBC Driver for Connect Cloud/lib/cdata.jdbc.connect.jar
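If you script the upload, a small helper can pick the default JAR location for the current operating system and confirm the file exists. The following is a sketch: the Windows and /Applications paths are the ones listed above, while the /opt path is an assumption based on the option of unpacking the archive under /opt. Adjust it to wherever you placed the folder.

```python
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath
from typing import Optional

JAR_NAME = "cdata.jdbc.connect.jar"
DRIVER_DIR = "CData JDBC Driver for Connect Cloud"


def candidate_jar_paths(system: Optional[str] = None) -> list:
    """Return likely JAR locations for an OS name (defaults to the current OS)."""
    system = system or platform.system()
    if system == "Windows":
        return [PureWindowsPath("C:/Program Files/CData", DRIVER_DIR, "lib", JAR_NAME)]
    # The /Applications path comes from the article; the /opt variant is an assumption.
    return [
        PurePosixPath("/Applications/CData", DRIVER_DIR, "lib", JAR_NAME),
        PurePosixPath("/opt/CData", DRIVER_DIR, "lib", JAR_NAME),
    ]


def find_driver_jar(system: Optional[str] = None) -> Optional[Path]:
    """Return the first candidate that exists on this machine, or None."""
    for candidate in candidate_jar_paths(system):
        path = Path(str(candidate))
        if path.is_file():
            return path
    return None
```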

2.2 Install the JAR File on Databricks

Log in to Databricks. In the navigation pane, click Compute on the left. Start or create a compute cluster.
Click on the running cluster, go to the Libraries tab, and click Install New at the top right.
In the Install Library dialog, select DBFS and drag and drop the cdata.jdbc.connect.jar file, then click Install.

2.3 Query Presto Data in a Databricks Notebook

Notebook Script 1 — Define JDBC Connection:

Create a notebook attached to the cluster where you installed the JAR, then paste the following script into its first cell:

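# JDBC driver class provided by cdata.jdbc.connect.jar
driver = "cdata.jdbc.connect.ConnectDriver"
# Connect Cloud JDBC URL: authenticates with your username and PAT and selects your Presto connection
url = "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;"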

Replace:
- your_username - With your CData Connect Cloud username
- your_pat - With your CData Connect Cloud Personal Access Token (PAT)
- Your_Connection_Name - With the name of your Connect Cloud data source, from the Sources page
Run the script.
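Instead of editing the placeholder string by hand, you can assemble the same URL from its parts. The following sketch reproduces the property order used in Script 1. The function name is hypothetical, and rejecting values that contain `;` is a conservative assumption, since `;` separates properties in the string.

```python
CONNECT_API_URL = "https://cloud.cdata.com/api/"


def build_connect_url(user: str, pat: str, catalog: str,
                      api_url: str = CONNECT_API_URL) -> str:
    """Assemble the jdbc:connect: URL used in Script 1 from its components."""
    props = {
        "AuthScheme": "Basic",
        "User": user,
        "Password": pat,            # the Personal Access Token, not your login password
        "URL": api_url,
        "DefaultCatalog": catalog,  # connection name from the Sources page
    }
    for name, value in props.items():
        if not value or ";" in value:
            raise ValueError(f"{name} must be non-empty and must not contain ';'")
    return "jdbc:connect:" + "".join(f"{name}={value};" for name, value in props.items())
```

Calling `build_connect_url("your_username", "your_pat", "Your_Connection_Name")` yields the same string as the `url` in Script 1.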

Notebook Script 2 — Load DataFrame from Presto data:

Add a new cell for this second script. From the menu on the right side of your notebook, click Add cell below.
Paste the following script into the new cell:

# Read a Presto table through Connect Cloud into a Spark DataFrame,
# reusing the driver and url variables defined in Script 1
remote_table = spark.read.format("jdbc") \
  .option("driver", driver) \
  .option("url", url) \
  .option("dbtable", "YOUR_SCHEMA.YOUR_TABLE") \
  .load()

Replace:
- YOUR_SCHEMA.YOUR_TABLE - With your schema and table, for example, Presto.Customer
Run the script.
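Spark's JDBC source also accepts a `query` option in place of `dbtable`. The option sends a SQL statement through the driver (Spark wraps it in a subquery), so only matching rows are transferred. A `fetchsize` option controls how many rows are fetched per round trip. Spark rejects `dbtable` and `query` together, so the hypothetical helper below enforces exactly one. How much of a filter Presto itself evaluates depends on the driver, so treat this as a sketch.

```python
from typing import Optional


def jdbc_read_options(url: str, driver: str, *, dbtable: Optional[str] = None,
                      query: Optional[str] = None,
                      fetchsize: Optional[int] = None) -> dict:
    """Build options for spark.read.format("jdbc"); exactly one of dbtable/query is allowed."""
    if (dbtable is None) == (query is None):
        raise ValueError("specify exactly one of dbtable or query")
    options = {"url": url, "driver": driver}
    if dbtable is not None:
        options["dbtable"] = dbtable
    else:
        options["query"] = query
    if fetchsize is not None:
        options["fetchsize"] = str(fetchsize)  # Spark option values are strings
    return options


def read_presto(spark, options: dict):
    """Load a DataFrame through the CData JDBC driver with the given options."""
    return spark.read.format("jdbc").options(**options).load()
```

In a notebook, with `driver` and `url` from Script 1: `df = read_presto(spark, jdbc_read_options(url, driver, query="SELECT FirstName, LastName FROM Presto.Customer"))`.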

Notebook Script 3 — Preview Columns:

Similarly, add a new cell for this third script.
Paste the following script into the new cell:

display(remote_table.select("ColumnName1", "ColumnName2"))

Replace ColumnName1 and ColumnName2 with actual column names from your Presto table (for example, FirstName and LastName).
Run the script.

Previewing Presto data in a Databricks notebook

You can now explore, join, and analyze live Presto data directly within Databricks notebooks—without needing to know the complexities of the back-end API and without replicating Presto data.
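To query the loaded data with SQL instead of the DataFrame API, you can register it as a temporary view and pass a statement to `spark.sql`. The following sketch assumes the `remote_table` DataFrame from Script 2. The view name `presto_customers` and both helper names are hypothetical. Each action on a JDBC-backed DataFrame reads from the source again unless you cache it.

```python
from typing import Optional, Sequence


def build_select(view: str, columns: Sequence[str],
                 where: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Compose a Spark SQL SELECT, quoting column names with backticks."""
    cols = ", ".join(f"`{name}`" for name in columns)
    sql = f"SELECT {cols} FROM {view}"
    if where:
        sql += f" WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def query_view(spark, df, view: str, sql: str):
    """Register df as a temporary view, then run a SQL statement against it."""
    df.createOrReplaceTempView(view)
    return spark.sql(sql)
```

For example: `display(query_view(spark, remote_table, "presto_customers", build_select("presto_customers", ["FirstName", "LastName"], limit=10)))`.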

Try CData Connect Cloud Free for 14 Days

Ready to simplify real-time access to Presto data? Start your free 14-day trial of CData Connect Cloud today and experience seamless, live connectivity from Databricks to Presto.

Low code, zero infrastructure, zero replication — just seamless, secure access to your most critical data and insights.
