Boost Your Amazon Athena Performance by 4x with CData Streaming API Support
With the recent support for Amazon’s Streaming API, the CData Amazon Athena JDBC driver unlocks massive performance gains, especially for queries that return large result sets or require high concurrency. The Streaming API brings lower memory overhead, reduced latency, and a significant improvement in end-to-end execution time.
This article benchmarks the performance improvements achieved with the new Streaming API mode, comparing the CData driver against the native Athena JDBC driver across a range of row counts. Test results show that the Streaming API implementation in the CData JDBC driver can be nearly 4x faster (about 3.5x on average at 10K rows), especially for small to medium-sized data volumes.
Test configuration
Key testing parameters included:
- Source: All queries executed on Amazon S3 data
- Drivers tested: Native Athena JDBC vs CData Amazon Athena JDBC Driver
- APIs used: S3 Fetcher (CSV), Streaming API (JSON)
- PageSize: Set to 1,000,000 for Streaming API
- Exclusions: JSON API was excluded due to excessive slowness (>1400s for 1M rows)
- Implementation detail: The CData driver tests are based on the new implementation of the GetQueryResultsStream API, which extends the GetQueryResults API and uses the JsonStreamRows method to parse the results
- Test runs: Each test was repeated four times; average, minimum, and maximum values were recorded
- Modes compared:
  - Native JDBC API
  - CData JDBC driver with Streaming API
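The repeated-run methodology above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the harness used to produce the published numbers: the names `QueryBenchmark`, `measure`, `summarize`, and `drainQuery` are hypothetical, and the JDBC URL and SQL text are passed in by the caller rather than assumed here. Only standard `java.sql` calls are used, so the same code would work with either driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.Callable;

final class QueryBenchmark {

    /** Average, minimum, and maximum elapsed time in seconds. */
    static final class Stats {
        final double avg;
        final double min;
        final double max;

        Stats(double avg, double min, double max) {
            this.avg = avg;
            this.min = min;
            this.max = max;
        }
    }

    /** Runs the workload the given number of times and summarizes the timings. */
    static Stats measure(Callable<?> workload, int runs) throws Exception {
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.call();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return summarize(seconds);
    }

    /** Computes avg/min/max over a non-empty array of timings. */
    static Stats summarize(double[] seconds) {
        if (seconds.length == 0) {
            throw new IllegalArgumentException("at least one timing is required");
        }
        double sum = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : seconds) {
            sum += s;
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        return new Stats(sum / seconds.length, min, max);
    }

    /** Executes the query and reads every row so fetch time is included. */
    static long drainQuery(String jdbcUrl, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            long rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: QueryBenchmark <jdbc-url> <sql>");
            return;
        }
        // Four runs per test, matching the configuration described above.
        Stats s = measure(() -> drainQuery(args[0], args[1]), 4);
        System.out.printf("avg=%.2fs min=%.2fs max=%.2fs%n", s.avg, s.min, s.max);
    }
}
```

Draining the full `ResultSet` matters here: timing only `executeQuery` would miss the result-fetch phase, which is where the S3 and Streaming modes differ.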
Native JDBC driver: S3
This baseline test evaluates the native driver fetching results as CSV files from S3.
| Rows | Avg (s) | Min (s) | Max (s) |
|---|---|---|---|
| 1K | 4.17 | 3.13 | 6.04 |
| 10K | 3.67 | 3.38 | 3.94 |
| 100K | 6.70 | 6.74 | 7.11 |
| 500K | 21.22 | 16.43 | 33.26 |
| 1M | 35.22 | 33.39 | 38.02 |
| 2M | 66.56 | 63.04 | 74.02 |
Observation: In the native JDBC driver, the S3 Fetcher is consistently faster than the Streaming API because it streams the CSV results directly.
Native JDBC driver: Streaming API
Here, the native driver uses the Streaming API, which returns JSON rows.
| Rows | Avg (s) | Min (s) | Max (s) |
|---|---|---|---|
| 1K | 4.17 | 3.13 | 6.04 |
| 10K | 3.67 | 3.38 | 3.94 |
| 100K | 6.70 | 6.74 | 7.11 |
| 500K | 21.22 | 16.43 | 33.26 |
| 1M | 35.22 | 33.39 | 38.02 |
| 2M | 66.56 | 63.04 | 74.02 |
Observation: The Streaming API in the native driver performs worse than the S3 Fetcher and exhibits higher latency, particularly for large datasets.
CData Amazon Athena JDBC driver: S3
The CData JDBC driver’s S3 fetcher uses optimized CSV reads for fast access.
| Rows | Avg (s) | Min (s) | Max (s) |
|---|---|---|---|
| 1K | 1.90 | 0.29 | 6.46 |
| 10K | 2.20 | 0.56 | 6.00 |
| 100K | 3.30 | 1.05 | 8.67 |
| 500K | 7.45 | 4.71 | 14.95 |
| 1M | 14.61 | 10.53 | 22.60 |
| 2M | 28.59 | 21.39 | 44.49 |
Observation: The CData JDBC driver’s S3 mode consistently outperforms the native JDBC driver, including the native driver’s Streaming API mode.
CData Amazon Athena JDBC driver: Streaming API
This is the driver’s newest enhancement, which streams JSON rows while reducing memory overhead and improving concurrency.
| Rows | Avg (s) | Min (s) | Max (s) |
|---|---|---|---|
| 1K | 1.97 | 0.41 | 4.75 |
| 10K | 1.06 | 0.57 | 2.49 |
| 100K | 3.17 | 2.46 | 4.30 |
| 500K | 13.17 | 11.00 | 17.40 |
| 1M | 30.40 | 25.50 | 33.30 |
| 2M | 55.75 | 48.72 | 64.39 |
Observation: At 10K rows, the Streaming API completes in just 1.06s on average, compared to 3.67s in the native driver, which is roughly 3.5x faster.
Comparative insight: Why is streaming sometimes slower?
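Despite the speed benefits for small result sets, the S3 fetcher still outperforms streaming for larger datasets. In the CData driver, for example, the S3 fetcher averages 7.45s at 500K rows versus 13.17s for streaming. Here are some of the contributing factors: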
- Data format: Streaming returns JSON rows; S3 returns CSV, which is faster to parse.
- Request handling: S3 fetcher streams a single CSV file, while Streaming API paginates results, even with large PageSizes.
- Serialization overhead: Athena must read from S3 and serialize rows as JSON, increasing latency on the API.
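The crossover between the two modes can be read directly off the CData driver's averages. The sketch below copies the average times from the two CData tables in this article and reports the faster mode at each row count. The class and method names (`ModeComparison`, `fasterMode`, `fasterModeByRowCount`) are hypothetical, and the comparison reflects only these measured averages, not a general rule for other datasets or environments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ModeComparison {

    // Row counts and average seconds copied from the CData driver tables above.
    static final String[] ROWS = {"1K", "10K", "100K", "500K", "1M", "2M"};
    static final double[] S3_AVG = {1.90, 2.20, 3.30, 7.45, 14.61, 28.59};
    static final double[] STREAM_AVG = {1.97, 1.06, 3.17, 13.17, 30.40, 55.75};

    /** Returns the name of the mode with the lower average time. */
    static String fasterMode(double s3Seconds, double streamSeconds) {
        return streamSeconds < s3Seconds ? "Streaming" : "S3";
    }

    /** Maps each tested row count to the faster mode, in table order. */
    static Map<String, String> fasterModeByRowCount() {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < ROWS.length; i++) {
            result.put(ROWS[i], fasterMode(S3_AVG[i], STREAM_AVG[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        fasterModeByRowCount()
            .forEach((rows, mode) -> System.out.println(rows + " -> " + mode));
    }
}
```

On these averages, streaming is ahead only at 10K and 100K rows; at 1K rows the two modes are within a tenth of a second of each other, and from 500K rows upward the S3 fetcher is faster.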
Overall driver efficiency
The CData Amazon Athena JDBC driver outperforms the native JDBC driver at every row count tested. The table below compares the fastest mode of each driver:
| Rows | Native JDBC (s) | CData JDBC (s) | Gain |
|---|---|---|---|
| 1K | 4.17 (Stream) | 1.90 (S3) | ~2.2x |
| 10K | 3.67 (Stream) | 1.06 (Stream) | ~3.5x |
| 100K | 6.54 (S3) | 3.17 (Stream) | ~2.1x |
| 500K | 15.16 (S3) | 7.45 (S3) | ~2.0x |
| 1M | 28.91 (S3) | 14.61 (S3) | ~2.0x |
| 2M | 49.34 (S3) | 28.59 (S3) | ~1.7x |
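The Gain column is the native time divided by the CData time, rounded to one decimal place. As a quick check of that arithmetic, the sketch below recomputes each ratio from the table values; the class and method names (`SpeedupTable`, `speedup`, `roundToTenth`) are hypothetical and exist only for this illustration.

```java
final class SpeedupTable {

    // Values copied from the comparison table above (seconds).
    static final String[] ROWS = {"1K", "10K", "100K", "500K", "1M", "2M"};
    static final double[] NATIVE = {4.17, 3.67, 6.54, 15.16, 28.91, 49.34};
    static final double[] CDATA = {1.90, 1.06, 3.17, 7.45, 14.61, 28.59};

    /** Speedup factor: how many times faster the CData run is. */
    static double speedup(double nativeSeconds, double cdataSeconds) {
        return nativeSeconds / cdataSeconds;
    }

    /** Rounds to one decimal place, as in the Gain column. */
    static double roundToTenth(double value) {
        return Math.round(value * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < ROWS.length; i++) {
            double gain = roundToTenth(speedup(NATIVE[i], CDATA[i]));
            System.out.printf("%s: ~%.1fx%n", ROWS[i], gain);
        }
    }
}
```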
With the introduction of Streaming API support, the CData Amazon Athena JDBC driver pushes the boundaries of what’s possible for real-time analytics, ingestion pipelines, and fast data previews.
Key benefits:
- Low latency for small & mid-range queries
- Reduced memory footprint through row-level streaming
- Roughly 1.7x to 3.5x speedup over the native driver in these tests
- Simple deployment with built-in API mode selection
Power Amazon Athena with faster and smarter querying using CData
For organizations that rely on Athena for real-time analytics and ad-hoc reporting, adopting the Streaming API-enabled CData Amazon Athena JDBC driver offers a clear performance advantage. Whether you're building dashboards, ingesting data into processing pipelines, or optimizing user-facing queries, the Streaming API delivers unmatched speed and efficiency.
Ready to get started? Download a free 30-day trial today and bring streaming performance to your cloud analytics stack! As always, our world-class Support Team is available to assist you with any questions you may have.