CData Developer Center

Simplified Integration for Data Developers

Explore Developer Resources

.NET Build C# & ASP.NET apps to work with live data
Python Build Python apps & scripts to work with live data
Java Build Java apps to work with live data
SQL Server Use SSIS, Linked Server, and pure SQL to work with live data
Model Context Protocol Ask your AI for answers from any data source
Join the Community Join the CData Community to ask questions, get answers, and engage with your peers


Using drivers and connectors for data development

Overview

Standards-based drivers offer a straightforward way to work with data from Software-as-a-Service (SaaS) platforms and other non-database sources using SQL. By leveraging SQL-based connectors, you can query disparate data sources with the same SQL syntax you already use for traditional databases. Continue reading to discover the key advantages of using standards-based drivers for data integration tasks.

Benefits of Drivers for data developers

Unified SQL interface

With standards-based drivers, there is no need to learn or maintain multiple language-specific SDKs. Instead, you can treat each data source as if it were a relational database, sending and receiving data through familiar SQL statements. Common operations such as SELECT, JOIN, GROUP BY, SUM, ORDER BY, and WHERE let you quickly aggregate, filter, and transform data without writing custom logic for each platform.

Building relational models of APIs

  • Tables and views correspond roughly to resource collections (Orders, Accounts, Users, etc.)
  • Individual elements from a resource generally correspond to a row, with fields mapped to columns.
  • Sub-collections can have a foreign-key relationship with parent collections (Orders and Order Line Items)
  • CRUD operations are roughly translated to SQL statements:
    HTTP Request Equivalent SQL
    GET SELECT
    POST INSERT
    PUT, PATCH UPDATE
    DELETE DELETE
  • Operations and entities that are not easily represented are implemented through stored procedures

Discoverable metadata

Most standards-based drivers support discovering endpoints and resources through queries against metadata tables. This feature reduces the time you spend searching through API documentation by making data structures and their relationships more readily accessible. Simply query standard system tables to discover the metadata for your data source.

Discovering the entities, objects, or endpoints

Query the "sys_tables" column just like you would a database to learn which "tables" and "views" (representing the endpoints, objects, or entities) are available for your data source.

SELECT * FROM sys_tables;
CatalogName SchemaName TableName TableType Description IsUpdateable
CData Salesforce Account TABLE (null) true
CData Salesforce Opportunity TABLE (null) true
Discovering the fields, dimensions, or measures

Query the "sys_tablecolumns" system table to get all the information you need to start working with the fields, dimensions, or measures for a given object, including SQL-compliant data types, field size, and more.

SELECT * FROM sys_tablecolumns WHERE TableName = 'Account';
CatalogName SchemaName TableName ColumnName DataType DataTypeName Length
CData Salesforce Account Id 12 VARCHAR 2000
CData Salesforce Account IsDeleted 16 BIT 1
CData Salesforce Account Name 12 VARCHAR 2000
CData Salesforce Account BillingStreet 12 VARCHAR 2000
CData Salesforce Account BillingCity 12 VARCHAR 2000
CData Salesforce Account BillingState 12 VARCHAR 2000
CData Salesforce Account BillingPostalCode 4 INT 4
CData Salesforce Account BillingCountry 12 VARCHAR 2000
CData Salesforce Account BillingLatitude 12 VARCHAR 2000
CData Salesforce Account BillingLongitude 12 VARCHAR 2000
CData Salesforce Account BillingGeocodeAccuracy 12 VARCHAR 2000
CData Salesforce Account Phone 12 VARCHAR 2000
CData Salesforce Account Fax 12 VARCHAR 2000
CData Salesforce Account CreatedDate 93 TIMESTAMP 8
CData Salesforce Account CreatedById 12 VARCHAR 2000
CData Salesforce Account LastModifiedDate 93 TIMESTAMP 8
CData Salesforce Account LastModifiedById 12 VARCHAR 2000

Consistent connectivity across environments

Once the drivers are installed, you can work with SaaS data just as you would any relational database. By reusing established database skills and workflows, you can expedite integration projects and simplify collaboration within data engineering teams.

Write standard SQL queries with support for joins, updates, aggregation, and more. Our drivers compensate for features lacking in the source.

SELECT WITH INNER JOIN
SELECT Account.Id, Account.Name, Account.Fax, Opportunity.AccountId, Opportunity.CloseDate FROM Account INNER JOIN Opportunity ON Account.Id = Opportunity.AccountId
AGGREGATE FUNCTIONS
SELECT Name, AVG(AnnualRevenue) FROM Account GROUP BY Name
CRUD OPERATIONS
UPDATE Account SET Name='John' WHERE Id = @myId

Reduced complexity

Because CData develops and maintains the drivers & connectors, we insulate your application from underlying API changes. This reduces the need to refactor code in response to each new API version. Additionally, by using a SQL interface, you minimize the learning curve and code duplication that often arise when working with numerous APIs.

Simplified authentication

CData's drivers and connectors manage complex authentication schemes. Check out our OAuth primer to learn how we simplify authentication.

Modeling NoSQL data

SQL access to NoSQL sources like JSON documents requires intelligent modeling to support reading and writing data. Read our Knowledge Base articles to see how we build relational models for NoSQL data like JSON or Elasticsearch and multidimensional data like that found in Google Analtyics.

JSON

CData uses different flattening techniques to render hierarchical semi-structured data in formats like JSON and XML into a tabular model. Nested fields maintain their structured identity through dot "." or underscore "_" notation while array objects are identified by their indices. Below you'll find a sample JSON document along with the tabular model created by CData drivers and connectors. Row scanning is used to determine data types where possible.

{ "_id" : ObjectId("5780046cd5a397806c3dab38"), "address" : { "building" : "1007", "coord" : [-73.856077, 40.848447], "street" : "Morris Park Ave", "zipcode" : "10462" }, "borough" : "Bronx", "cuisine" : "Bakery }

Tabular schema of JSON data

Column Name Data Type Sample Value
_id VARCHAR 5780046cd5a397806c3dab38
address_building INT 1007
address_coord_0 FLOAT -73.856077
address_coord_1 FLOAT 40.848447
address_street VARCHAR Morris Park Ave
address_zipcode INT 10462
borough VARCHAR Bronx
cuisine VARCHAR Bakery

NOTE: address_building and address_zipcode are scanned and determined to be type INT based on the sample data. If more values are present in the collection the data type would likely be different as, realistically, these items could be non-numerical.

Check out our Knowledge Base article (Leading-Edge Drivers for NoSQL Integration) for a thorough explanation of our JSON modeling techniques and the options available for customization.

We have other articles that discuss our modeling techniques for other NoSQL sources like Elasticsearch and multidimensional data like that found in Google Analtyics.

Caching and performance

Many drivers offer in-memory or persistent caching. This feature boosts performance by storing results from frequently repeated queries, eliminating redundant API calls. The caching layer can be a significant advantage for large-scale analytics and reporting scenarios where data needs to be accessed repeatedly.

There are two major features of our drivers that improve the performance when it comes to reading and writing data, query push-down and paging/batch operation support.

Performance: query push-down

Many services, SaaS applications, and data stores support complex querying, whether through a SQL-like interface or deep support for filters, aggregations and grouping, joins, and more within their API or protocol. Wherever possible, CData drivers push complex SQL functionality down to the data source. For example:

In CData, you could use a query like

SELECT * FROM Employees WHERE ID = 'A1245'

This query would get translated into a GET request to the following URL for an OData service:

http://host/service/Employees(ID='A1245')

Each driver is specifically engineered to its data source, so this functionality is implemented as deeply possible for each API, protocol, or service.

Performance: paging & batch operations

Developers have been writing API calls for as long as APIs have existed. But retrieving large data sets often means multiple "pages" of data that must be collected, parsed, and recombined to create the complete data set. CData drivers automatically handle the pagination of data and combine it in memory before surfacing it within the application or framework from which it was requested. This provides a seamless experience and often faster retrieval of data.

Batch or bulk operations are another mechanism services and applications use to deliver (or receive) large data sets. Typically, this flow requires multiple calls, starting with a request to start building the batch, then multiple subsequent requests to determine the status of the job until it is complete, then a final request to retrieve the compiled data set. For most endpoints, CData drivers handle this flow automatically with a single SELECT request in SQL instead of requiring multiple requests and handling multiple responses.

Example use case

Consider an e-commerce SaaS platform such as Magento. If you want to generate a list of top-spending customers by region, you can simply issue a SQL query that uses JOIN, GROUP BY, and SUM to aggregate the data in one step. Without a standards-based driver, you would likely have to write separate logic for each data endpoint, increasing both development time and maintenance overhead.

Next Steps

By using standards-based drivers, data developers can get to eliminate the complexity of connecting to a wide range of data sources, making applications faster and easier to build and maintain.