Well, this one is super broken, which one finds out after shaving a number of yaks.
We want to query Parquet files that sit in Azure Data Lake Storage with Athena. AWS has what seems to be nice documentation on how to do it… Except:
- Searching for it in the Serverless Application Repository with “azure” or “adls” terms yields nothing.
- Additionally there seems to be a bug there, per AWS support:
Issue:
– The search functionality appears to be unresponsive when using the traditional “Enter” key method
– This seems to be a technical bug in the console
Workaround:
– Enter your search term in the search bar
– Instead of pressing Enter, click anywhere on the screen
– This should trigger the search functionality and display the results
- Searching for something like “gen2” actually yields something… It’s an AthenaDataLakeGen2Connector, which is the same thing as below, so read on.
- Trying to add the data source from Athena by selecting the “Microsoft Azure Data Lake Storage (ADLS) Gen2” connector… It is based on the athena-datalakegen2 code, which is borken because the underlying mssql JDBC driver is borken.
- After patching the mssql driver and the connector, we realize that it is trying to connect via JDBC to ADLS, but that is not supported. And yet AWS claims “the documentation is correct”.
Srsly now, AWS and Microsoft, you even tested anything?
It’s already 2025, and still…
[…] Microsoft has a fix for an issue quite quickly (mentioned in a previous post). […]
[…] our previous installment, we learned that Athena does not support ADLS directly (without Synapse). I decided to try to […]