This extension allows you to connect DuckDB to Microsoft Fabric OneLake workspaces and lakehouses, enabling you to query data stored in OneLake directly from DuckDB.
Maintainer(s):
achrafcei
Installing and Loading
INSTALL onelake FROM community;
LOAD onelake;
Example
-- First, create a secret to authenticate with OneLake.
-- You can use either a service principal or the credential chain (e.g., Azure CLI).
-- Uncomment the desired method and provide the necessary credentials.
set azure_transport_option_type = 'curl';
CREATE SECRET (
TYPE azure,
PROVIDER service_principal,
TENANT_ID '<your_tenant_id>',
CLIENT_ID '<your_client_id>',
CLIENT_SECRET '<your_client_secret>'
);
-- CREATE SECRET (
-- TYPE azure,
-- PROVIDER credential_chain,
-- CHAIN 'cli'
-- );
CREATE SECRET onelake (
TYPE ONELAKE,
TENANT_ID '<your_tenant_id>',
CLIENT_ID '<your_client_id>',
CLIENT_SECRET '<your_client_secret>'
);
-- CREATE SECRET onelake(
-- TYPE ONELAKE,
-- PROVIDER credential_chain,
-- CHAIN 'cli'
-- );
-- Optional: use preissued tokens stored in env variables (defaults shown), if they are already available with the same name, no need to set them here (only update the SET commands if the names are different
--SET onelake_env_fabric_token_variable = 'FABRIC_API_TOKEN';
--SET onelake_env_storage_token_variable = 'AZURE_STORAGE_TOKEN';
--CREATE SECRET onelake_env (
-- TYPE ONELAKE,
-- PROVIDER credential_chain,
-- CHAIN 'env'
--);
-- Combine chain steps if you want CLI fallback
--CREATE SECRET onelake_env_chain (
-- TYPE ONELAKE,
-- PROVIDER credential_chain,
-- CHAIN 'cli, env'
--);
-- Optionally keep the token in-session instead of touching the shell
--SET VARIABLE AZURE_STORAGE_TOKEN = '<preissued_onelake_access_token>';
-- Attach to your OneLake workspace and lakehouse
ATTACH '<your_workspace_name>/<your_lakehouse_name>.Lakehouse'
AS <your_connection_name>
(TYPE ONELAKE);
USE <your_connection_name>.<your_schema_name>; -- e.g., dbo
SHOW TABLES;
SELECT * FROM <your_table_name> LIMIT 10 ; -- USING ICEBERG;
About onelake
This extension enables DuckDB to connect to Microsoft Fabric OneLake workspaces and lakehouses, allowing users to query data stored in OneLake directly from DuckDB. It supports authentication via service principals or credential chains (e.g., Azure CLI) and provides seamless integration with OneLake's data storage capabilities. For detailed setup and usage instructions, visit the extension repository.
Current limitations:
- Only read access is supported; write operations are not implemented.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| copy_dir | table_macro | NULL | NULL | |
| delta_filter_pushdown_log | table_macro | NULL | NULL | |
| delta_filter_pushdown_log_tpcds | table_macro | NULL | NULL | |
| delta_list_files | table | NULL | NULL | |
| delta_scan | table | NULL | NULL | |
| get_delta_test_expression | scalar | NULL | NULL | |
| parse_delta_filter_logline | macro | NULL | NULL | |
| write_blob | scalar | NULL | NULL |
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| auto_fallback_to_full_download | Allows automatically falling back to full file downloads when possible. | BOOLEAN | GLOBAL | [] |
| ca_cert_file | Path to a custom certificate file for self-signed certificates. | VARCHAR | GLOBAL | [] |
| delta_kernel_logging | Forwards the internal logging of the Delta Kernel to the duckdb logger. Warning: this may impact performance even with DuckDB logging disabled. | BOOLEAN | GLOBAL | [] |
| delta_scan_explain_files_filtered | Adds the filtered files to the explain output. Warning: this may impact performance of delta scan during explain analyze queries. | BOOLEAN | GLOBAL | [] |
| enable_curl_server_cert_verification | Enable server side certificate verification for CURL backend. | BOOLEAN | GLOBAL | [] |
| enable_server_cert_verification | Enable server side certificate verification. | BOOLEAN | GLOBAL | [] |
| force_download | Forces upfront download of file | BOOLEAN | GLOBAL | [] |
| hf_max_per_page | Debug option to limit number of items returned in list requests | UBIGINT | GLOBAL | [] |
| http_keep_alive | Keep alive connections. Setting this to false can help when running into connection failures | BOOLEAN | GLOBAL | [] |
| http_retries | HTTP retries on I/O error | UBIGINT | GLOBAL | [] |
| http_retry_backoff | Backoff factor for exponentially increasing retry wait time | FLOAT | GLOBAL | [] |
| http_retry_wait_ms | Time between retries | UBIGINT | GLOBAL | [] |
| http_timeout | HTTP timeout read/write/connection/retry (in seconds) | UBIGINT | GLOBAL | [] |
| httpfs_client_implementation | Select which is the HTTPUtil implementation to be used | VARCHAR | GLOBAL | [] |
| onelake_env_fabric_token_variable | Environment variable name that stores the Fabric API access token | VARCHAR | GLOBAL | [] |
| onelake_env_storage_token_variable | Environment variable name that stores the OneLake storage access token | VARCHAR | GLOBAL | [] |
| s3_access_key_id | S3 Access Key ID | VARCHAR | GLOBAL | [] |
| s3_endpoint | S3 Endpoint | VARCHAR | GLOBAL | [] |
| s3_kms_key_id | S3 KMS Key ID | VARCHAR | GLOBAL | [] |
| s3_region | S3 Region | VARCHAR | GLOBAL | [] |
| s3_requester_pays | S3 use requester pays mode | BOOLEAN | GLOBAL | [] |
| s3_secret_access_key | S3 Access Key | VARCHAR | GLOBAL | [] |
| s3_session_token | S3 Session Token | VARCHAR | GLOBAL | [] |
| s3_uploader_max_filesize | S3 Uploader max filesize (between 50GB and 5TB) | VARCHAR | GLOBAL | [] |
| s3_uploader_max_parts_per_file | S3 Uploader max parts per file (between 1 and 10000) | UBIGINT | GLOBAL | [] |
| s3_uploader_thread_limit | S3 Uploader global thread limit | UBIGINT | GLOBAL | [] |
| s3_url_compatibility_mode | Disable Globs and Query Parameters on S3 URLs | BOOLEAN | GLOBAL | [] |
| s3_url_style | S3 URL style | VARCHAR | GLOBAL | [] |
| s3_use_ssl | S3 use SSL | BOOLEAN | GLOBAL | [] |
| unsafe_disable_etag_checks | Disable checks on ETag consistency | BOOLEAN | GLOBAL | [] |