S3 file system support
Trino includes a native implementation to access Amazon S3 and compatible storage systems with a catalog using the Delta Lake, Hive, Hudi, or Iceberg connectors. While Trino is designed to support S3-compatible storage systems, only AWS S3 and MinIO are tested for compatibility. For other storage systems, perform your own testing and consult your vendor for more information.
Enable the native implementation with fs.native-s3.enabled=true in your catalog properties file.
General configuration
Use the following properties to configure general aspects of S3 file system support:
| Property | Description |
|---|---|
| | Activate the native implementation for S3 storage support. Defaults to |
| | Required endpoint URL for S3. |
| | Required region name for S3. |
| | Use path-style access for all requests to S3 |
| | S3 storage class to use while writing data. Defaults to STANDARD. |
| | Whether conditional write is supported by the S3-compatible storage. Defaults to |
| | Canned ACL to use when uploading files to S3. Defaults to |
| | Set the type of S3 server-side encryption (SSE) to use. Defaults to |
| | The identifier of a key in KMS to use for SSE. |
| | The 256-bit, base64-encoded AES-256 encryption key to encrypt or decrypt data from S3 when using the SSE-C mode for SSE with |
| | Part size for S3 streaming upload. Values between |
| | Switch to activate billing transfer cost to the requester. Defaults to |
| | Maximum number of connections to S3. Defaults to |
| | Maximum time duration allowed to reuse connections in the connection pool before being replaced. |
| | Maximum time duration allowed for connections to remain idle in the connection pool before being closed. |
| | Maximum time duration allowed for socket connection requests to complete before timing out. |
| | Maximum time duration for socket read operations before timing out. |
| | Enable TCP keep alive on created connections. Defaults to |
| | URL of a HTTP proxy server to use for connecting to S3. |
| | Set to |
| | Proxy username to use if connecting through a proxy server. |
| | Proxy password to use if connecting through a proxy server. |
| | Hosts list to access without going through the proxy server. |
| | Whether to attempt to authenticate preemptively against proxy server when using base authorization, defaults to |
| | Specifies how the AWS SDK attempts retries. Default value is |
| | Specifies maximum number of retries the client will make on errors. Defaults to |
| | Set to |
| | Specify the application identifier appended to the |
Authentication
Use the following properties to configure the authentication to S3 with access and secret keys, STS, or an IAM role:
| Property | Description |
|---|---|
| | AWS access key to use for authentication. |
| | AWS secret key to use for authentication. |
| | The endpoint URL of the AWS Security Token Service to use for authenticating to S3. |
| | AWS region of the STS service. |
| | ARN of an IAM role to assume when connecting to S3. |
| | Role session name to use when connecting to S3. Defaults to |
| | External ID for the IAM role trust policy when connecting to S3. |
Security mapping
Trino supports flexible security mapping for S3, allowing for separate credentials or IAM roles for specific users or S3 locations. The IAM role for a specific query can be selected from a list of allowed roles by providing it as an extra credential.
Each security mapping entry may specify one or more match criteria. If multiple criteria are specified, all criteria must match. The following match criteria are available:
user
: Regular expression to match against username. Example: alice|bob
group
: Regular expression to match against any of the groups that the user belongs to. Example: finance|sales
prefix
: S3 URL prefix. You can specify an entire bucket or a path within a bucket. The URL must start with s3://, but the prefix also matches s3a and s3n URLs. Example: s3://bucket-name/abc/xyz/
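The match criteria can be illustrated with a minimal Python sketch. It is not Trino's implementation: the function names are hypothetical, regular expressions are assumed to match the whole username or group name, and the prefix check is simplified to a plain string comparison after rewriting s3a:// and s3n:// to s3://.

```python
import re


def normalize_scheme(url: str) -> str:
    """Rewrite s3a:// and s3n:// locations to s3:// for prefix comparison."""
    for scheme in ("s3a://", "s3n://"):
        if url.startswith(scheme):
            return "s3://" + url[len(scheme):]
    return url


def entry_matches(entry: dict, user: str, groups: set, location: str) -> bool:
    """Return True only if every criterion present in the entry matches."""
    # Assumption: the regular expression must match the complete username.
    if "user" in entry and not re.fullmatch(entry["user"], user):
        return False
    # The group criterion matches if any one of the user's groups matches.
    if "group" in entry and not any(
        re.fullmatch(entry["group"], group) for group in groups
    ):
        return False
    # Simplification: plain string prefix test on the normalized location.
    if "prefix" in entry and not normalize_scheme(location).startswith(
        entry["prefix"]
    ):
        return False
    # An entry without criteria matches everything.
    return True
```

An entry with both a group and a prefix matches only when both hold, which mirrors the rule that all specified criteria must match.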
The security mapping must provide one or more configuration settings:
accessKey and secretKey
: AWS access key and secret key. This overrides any globally configured credentials, such as access key or instance credentials.
iamRole
: IAM role to use if no user-provided role is specified as an extra credential. This overrides any globally configured IAM role. This role is allowed to be specified as an extra credential, although specifying it explicitly has no effect.
roleSessionName
: Optional role session name to use with iamRole. This can only be used when iamRole is specified. If roleSessionName includes the string ${USER}, then the ${USER} portion of the string is replaced with the current session’s username. If roleSessionName is not specified, it defaults to trino-session.
allowedIamRoles
: IAM roles that are allowed to be specified as an extra credential. This is useful because a particular AWS account may have permissions to use many roles, but a specific user should only be allowed to use a subset of those roles.
kmsKeyId
: ID of KMS-managed key to be used for client-side encryption.
allowedKmsKeyIds
: KMS-managed key IDs that are allowed to be specified as an extra credential. If the list contains *, then any key can be specified via extra credential.
sseCustomerKey
: The customer-provided key (SSE-C) for server-side encryption.
allowedSseCustomerKey
: The SSE-C keys that are allowed to be specified as an extra credential. If the list contains *, then any key can be specified via extra credential.
endpoint
: The S3 storage endpoint server. This optional property can be used to override S3 endpoints on a per-bucket basis.
region
: The S3 region to connect to. This optional property can be used to override S3 regions on a per-bucket basis.
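How iamRole, allowedIamRoles, and roleSessionName interact can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Trino's code: the function names and the choice of PermissionError are hypothetical, and the requested role stands for the value supplied as an extra credential.

```python
DEFAULT_SESSION_NAME = "trino-session"


def resolve_session_name(entry: dict, user: str) -> str:
    """Substitute ${USER} in roleSessionName, or fall back to the default."""
    name = entry.get("roleSessionName", DEFAULT_SESSION_NAME)
    return name.replace("${USER}", user)


def select_iam_role(entry: dict, requested_role=None):
    """Pick the role for a query from a matched mapping entry."""
    default_role = entry.get("iamRole")
    if requested_role is None:
        # No extra credential supplied: use the entry's own role.
        return default_role
    allowed = set(entry.get("allowedIamRoles", []))
    if default_role is not None:
        # The entry's own role may also be requested explicitly.
        allowed.add(default_role)
    if requested_role not in allowed:
        raise PermissionError(f"Role not allowed: {requested_role}")
    return requested_role
```

With an entry whose roleSessionName is trino-${USER}, a session for user bob would resolve to trino-bob under this sketch.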
The security mapping entries are processed in the order listed in the JSON configuration. Therefore, specific mappings must be specified before less specific mappings. For example, the mapping list might have URL prefix s3://abc/xyz/ followed by s3://abc/ to allow different configuration for a specific path within a bucket than for other paths within the bucket. You can specify the default configuration by not including any match criteria for the last entry in the list.
In addition to the preceding rules, the default mapping can contain the optional useClusterDefault boolean property set to true to use the default S3 configuration. It cannot be used with any other configuration settings.
If no mapping entry matches and no default is configured, access is denied.
The configuration JSON is read from a file via s3.security-mapping.config-file or from an HTTP endpoint via s3.security-mapping.config-uri.
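The effect of entry order can be shown with a small Python sketch. It only considers the prefix criterion and uses plain string comparison, both simplifications; the function name and the role names are placeholders for illustration.

```python
def first_matching_entry(mappings: list, location: str) -> dict:
    """Return the first entry whose prefix matches, in list order."""
    for entry in mappings:
        prefix = entry.get("prefix")
        # An entry without a prefix acts as the default and always matches.
        if prefix is None or location.startswith(prefix):
            return entry
    # No entry matched and no default was configured.
    raise PermissionError(f"No security mapping for {location}: access denied")


# Specific prefix first, broader prefix second, as recommended above.
ordered = [
    {"prefix": "s3://abc/xyz/", "iamRole": "role-xyz"},
    {"prefix": "s3://abc/", "iamRole": "role-abc"},
]

# Reversing the list makes the broad prefix shadow the specific one.
shadowed = list(reversed(ordered))
```

In this sketch a lookup for s3://abc/xyz/file against the ordered list selects role-xyz, while the same lookup against the reversed list selects role-abc, which is why specific mappings belong first.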
Example JSON configuration:
{
  "mappings": [
    {
      "prefix": "s3://bucket-name/abc/",
      "iamRole": "arn:aws:iam::123456789101:role/test_path"
    },
    {
      "user": "bob|charlie",
      "iamRole": "arn:aws:iam::123456789101:role/test_default",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/test1",
        "arn:aws:iam::123456789101:role/test2",
        "arn:aws:iam::123456789101:role/test3"
      ]
    },
    {
      "prefix": "s3://special-bucket/",
      "accessKey": "AKIAxxxaccess",
      "secretKey": "iXbXxxxsecret"
    },
    {
      "prefix": "s3://regional-bucket/",
      "iamRole": "arn:aws:iam::123456789101:role/regional-user",
      "endpoint": "https://bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com",
      "region": "us-east-1"
    },
    {
      "prefix": "s3://encrypted-bucket/",
      "kmsKeyId": "kmsKey_10"
    },
    {
      "user": "test.*",
      "iamRole": "arn:aws:iam::123456789101:role/test_users"
    },
    {
      "group": "finance",
      "iamRole": "arn:aws:iam::123456789101:role/finance_users"
    },
    {
      "iamRole": "arn:aws:iam::123456789101:role/default"
    }
  ]
}
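Before deploying a mapping file like the preceding example, it can help to check it against the rules described in this section. The following Python sketch is a hypothetical helper, not a Trino tool; it only checks the rules stated here (at least one configuration setting per entry, useClusterDefault not combined with other settings, roleSessionName only together with iamRole, and a default entry only in last position).

```python
import json

MATCH_KEYS = {"user", "group", "prefix"}
SETTING_KEYS = {
    "accessKey", "secretKey", "iamRole", "roleSessionName", "allowedIamRoles",
    "kmsKeyId", "allowedKmsKeyIds", "sseCustomerKey", "allowedSseCustomerKey",
    "endpoint", "region",
}


def check_mappings(document: str) -> list:
    """Return a list of human-readable problems found in a mapping document."""
    mappings = json.loads(document)["mappings"]
    problems = []
    for index, entry in enumerate(mappings):
        settings = SETTING_KEYS.intersection(entry)
        # An entry without match criteria is the default mapping.
        is_default = not MATCH_KEYS.intersection(entry)
        if entry.get("useClusterDefault"):
            if settings:
                problems.append(
                    f"entry {index}: useClusterDefault used with other settings")
        elif not settings:
            problems.append(f"entry {index}: no configuration setting")
        if "roleSessionName" in entry and "iamRole" not in entry:
            problems.append(f"entry {index}: roleSessionName requires iamRole")
        if is_default and index != len(mappings) - 1:
            problems.append(f"entry {index}: default entry hides later entries")
    return problems
```

An empty result means none of these particular rules is violated; it does not guarantee that Trino accepts the file.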
| Property name | Description |
|---|---|
| | Activate the security mapping feature. Defaults to |
| | Path to the JSON configuration file containing security mappings. |
| | HTTP endpoint URI containing security mappings. |
| | A JSON pointer (RFC 6901) to mappings inside the JSON retrieved from the configuration file or HTTP endpoint. The default is the root of the document. |
| | The name of the extra credential used to provide the IAM role. |
| | The name of the extra credential used to provide the KMS-managed key ID. |
| | The name of the extra credential used to provide the server-side encryption with customer-provided keys (SSE-C). |
| | How often to refresh the security mapping configuration, specified as a duration. By default, the configuration is not refreshed. |
| | The character or characters to be used instead of a colon character when specifying an IAM role name as an extra credential. Any instances of this replacement value in the extra credential value are converted to a colon. Choose a value not used in any of your IAM ARNs. |
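Two of these settings describe small transformations that are easy to misread: the JSON pointer lookup and the colon replacement. The following Python sketch illustrates both. The pointer resolution follows the token and escape rules of RFC 6901; the function names, the sample document, and the `#` replacement character are assumptions chosen for illustration.

```python
def resolve_json_pointer(document, pointer: str):
    """Resolve an RFC 6901 JSON pointer against parsed JSON data."""
    if pointer == "":
        # The empty pointer refers to the root of the document.
        return document
    if not pointer.startswith("/"):
        raise ValueError("A non-empty JSON pointer must start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape in the order required by RFC 6901: ~1 first, then ~0.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


def restore_colons(credential_value: str, replacement: str) -> str:
    """Convert every occurrence of the replacement value back to a colon."""
    return credential_value.replace(replacement, ":")
```

With `#` as the replacement, an extra credential value of `arn#aws#iam##123456789101#role/test1` is converted back to `arn:aws:iam::123456789101:role/test1`, which shows why the replacement must not occur anywhere in a real ARN.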
Migration from legacy S3 file system
Trino includes legacy Amazon S3 support to use with a catalog using the Delta Lake, Hive, Hudi, or Iceberg connectors. Upgrading existing deployments to the current native implementation is recommended. Legacy support is deprecated and will be removed.
To migrate a catalog to use the native file system implementation for S3, make the following edits to your catalog configuration:
Add the fs.native-s3.enabled=true catalog configuration property.
Refer to the following table to rename your existing legacy catalog configuration properties to the corresponding native configuration properties. Supported configuration values are identical unless otherwise noted.
Legacy property |
Native property |
Notes |
---|---|---|
|
|
|
|
|
|
|
|
Also see |
|
|
|
|
|
Add the |
|
|
|
|
None |
|
|
|
|
|
|
|
|
|
See preceding sections for supported values. |
|
|
|
|
|
Specify the host and port in one URL, for example |
|
|
Set to |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Also see |
|
|
|
|
|
Also see |
|
|
Also see |
|
|
|
|
|
Remove the following legacy configuration properties if they exist in your catalog configuration:
hive.s3.storage-class
hive.s3.signer-type
hive.s3.signer-class
hive.s3.staging-directory
hive.s3.pin-client-to-current-region
hive.s3.ssl.enabled
hive.s3.sse.enabled
hive.s3.kms-key-id
hive.s3.encryption-materials-provider
hive.s3.streaming.enabled
hive.s3.max-client-retries
hive.s3.max-backoff-time
hive.s3.max-retry-time
hive.s3.multipart.min-file-size
hive.s3.multipart.min-part-size
hive.s3-file-system-type
hive.s3.user-agent-prefix
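A catalog file can be checked against these migration steps with a short script. The Python sketch below is a hypothetical helper, not part of Trino. It assumes simple key=value lines, which is a simplification of the Java properties format, and it reports which properties to remove, which remaining legacy properties still need renaming according to the table, and whether the native implementation is enabled.

```python
REMOVED_LEGACY_PROPERTIES = {
    "hive.s3.storage-class", "hive.s3.signer-type", "hive.s3.signer-class",
    "hive.s3.staging-directory", "hive.s3.pin-client-to-current-region",
    "hive.s3.ssl.enabled", "hive.s3.sse.enabled", "hive.s3.kms-key-id",
    "hive.s3.encryption-materials-provider", "hive.s3.streaming.enabled",
    "hive.s3.max-client-retries", "hive.s3.max-backoff-time",
    "hive.s3.max-retry-time", "hive.s3.multipart.min-file-size",
    "hive.s3.multipart.min-part-size", "hive.s3-file-system-type",
    "hive.s3.user-agent-prefix",
}


def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def migration_report(text: str) -> dict:
    """Summarize what is left to do for one catalog properties file."""
    properties = parse_properties(text)
    return {
        "native_enabled": properties.get("fs.native-s3.enabled") == "true",
        "remove": sorted(k for k in properties if k in REMOVED_LEGACY_PROPERTIES),
        # Any other legacy S3 property must be renamed per the table above.
        "rename": sorted(
            k for k in properties
            if k.startswith("hive.s3") and k not in REMOVED_LEGACY_PROPERTIES
        ),
    }
```

A fully migrated catalog would report native_enabled as true with empty remove and rename lists.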