Reading data from Athena via the command line - Comparing the options (AWS CLI, Python Boto, Java, AWS SDK)
Reading data via the command line has the obvious advantage of automation.
Once you have command-line data retrieval in place, you can script it and schedule it to run automatically at intervals. You can also automate the whole process of retrieving data from Athena, using ANSI SQL to do the heavy lifting of aggregation. Once the aggregated data has been retrieved and stored locally, further automation is possible: for example, loading the aggregate data into a local database, or processing it with Python pandas into charts to gain insights.
Options for reading data from Athena:
1. AWS command line tool.
The most powerful way to execute SQL queries against Athena is to use the AWS command line tool.
Unlike the various language-specific client libraries available, this tool provides full control over how you access the Athena service. In particular, you can specify which of the various REST API calls to use to retrieve data.
If the Athena database is provided by a third party or another department in the company, then this freedom to specify the exact REST call to use can save a lot of time and avoid re-configuration.
A disadvantage of this approach is that you will need to write more code than when using an Athena client library:
- send the request to Athena
- poll until a result is available
- retrieve the data to a local file
- process the data into a more compact format. The JSON retrieved in this manner tends to be verbose and awkward to work with.
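The first three steps above can be sketched in Python by shelling out to the AWS command line tool. This is only a minimal sketch, assuming the aws tool is installed and has credentials configured. The names run_aws and query_athena, the two-second polling interval, and the database and S3 output location you pass in are my own placeholders, not part of any AWS tooling.

```python
import json
import subprocess
import time


def run_aws(args):
    """Run an `aws athena` subcommand and return its parsed JSON output."""
    completed = subprocess.run(
        ["aws", "athena", *args, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def query_athena(sql, database, output_location, run=run_aws, poll_seconds=2.0):
    """Send a query, poll until it finishes, and return the raw results JSON."""
    # 1. Send the request to Athena.
    started = run([
        "start-query-execution",
        "--query-string", sql,
        "--query-execution-context", f"Database={database}",
        "--result-configuration", f"OutputLocation={output_location}",
    ])
    query_id = started["QueryExecutionId"]

    # 2. Poll until a result is available (or the query fails).
    while True:
        status = run(["get-query-execution", "--query-execution-id", query_id])
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    # 3. Retrieve the data; the caller can json.dump() it to a local file.
    return run(["get-query-results", "--query-execution-id", query_id])
```

The run parameter is there so that the function can be exercised with a stand-in instead of the real command line tool. A real script would probably also want a timeout on the polling loop.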
For example bash scripts that run SQL queries against Athena and use jq to process the resulting JSON into a compact form, see my GitHub project.
Also see this walk-through guide.
See the available commands to execute against the Athena API.
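For data retrieval, the relevant commands include start-query-execution (submits the query), get-query-execution (reports whether the query has finished) and get-query-results (fetches the rows).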
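To give an idea of how verbose the get-query-results JSON is, here is a rough Python sketch that flattens it into one dictionary per row. It assumes the first row carries the column names, which is what I would expect for a SELECT query. The function name flatten_result_set and the sample data are made up for illustration.

```python
def flatten_result_set(response):
    """Turn a get-query-results response into a list of {column: value} dicts."""
    rows = response["ResultSet"]["Rows"]
    if not rows:
        return []
    # Assumption: the first row holds the column names (typical for SELECT).
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    records = []
    for row in rows[1:]:
        # A NULL cell arrives as an empty object, so .get() yields None.
        values = [cell.get("VarCharValue") for cell in row["Data"]]
        records.append(dict(zip(header, values)))
    return records


# Made-up sample in the shape that get-query-results returns.
sample_response = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "region"}, {"VarCharValue": "total"}]},
            {"Data": [{"VarCharValue": "eu-west-1"}, {"VarCharValue": "42"}]},
            {"Data": [{"VarCharValue": "us-east-1"}, {}]},
        ]
    }
}

records = flatten_result_set(sample_response)
print(records)
```

Note that every value comes back as a string, so numeric columns still need converting before you aggregate or chart them.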
2. Python boto library
This is an obvious choice if you are scripting in Python, with the one drawback that you lose low-level control over which Athena REST call is being used. See the boto documentation.
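As a rough illustration, here is a minimal sketch of the same send, poll and retrieve cycle using boto3, the current version of the boto library. It assumes you create the client yourself with boto3.client("athena") and that your credentials are already configured. The function name fetch_rows, the polling interval, and the database and S3 output location are my own placeholders.

```python
import time


def fetch_rows(client, sql, database, output_location, poll_seconds=2.0):
    """Run a query through a boto3 Athena client and return all result rows."""
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a final state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    # Results are paged; follow NextToken until there are no more pages.
    rows = []
    kwargs = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**kwargs)
        rows.extend(page["ResultSet"]["Rows"])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token


# Usage (needs boto3 installed and AWS credentials; names are placeholders):
#   import boto3
#   rows = fetch_rows(boto3.client("athena"), "SELECT 1",
#                     "my_database", "s3://my-bucket/athena-results/")
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before pointing it at a real Athena database.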
3. Java access
For Java developers who wish to use minimal third-party code, using JDBC makes sense.
You can also create your own Java client to access Athena.
4. Java SDK provided by Amazon
Amazon provide a full Java SDK to access their various services.
This includes the Athena service. There are examples for accessing Athena, as well as more general examples.
Tip: access denied or insufficient permissions to access Athena:
If you receive an UnrecognizedClientException when querying Athena, or hit some other access issue, then please see my blog post for tips on resolving it.