Presto Credential Pass-through
The goal is to build a generic interface that distributes connector credentials to all connectors, so that each connector can use them to authenticate with its data source. This is useful when Presto runs as a federation layer without superuser permissions.
The end-to-end passing of connector credentials can be divided into three parts:
client -> coordinator
Options:
Under the hood, these two options work in almost exactly the same way: both embed the credentials in HTTP headers. The password authenticator leverages the AUTHORIZATION header from com.google.common.net.HttpHeaders, while the second option requires a custom X-Presto-Connector-Credential header and a custom HttpRequestSessionContext that parses the credentials from the header and puts them into the identity field.
We can put the connector credentials into a JSON document that stores a set of key-value pairs, each mapping a connector credential type to the credential string (e.g., {"hive.gs.key": "DEADBEEF"}).
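As a rough illustration of the second option, the sketch below pulls the JSON document out of the custom header and turns it into a map. It is not Presto code: the class name ConnectorCredentialHeader and the plain Map standing in for the HTTP request are assumptions, and the regex only handles a flat object of string pairs without escape sequences. A real HttpRequestSessionContext would presumably use a proper JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only the header name comes from the text above.
final class ConnectorCredentialHeader
{
    static final String HEADER = "X-Presto-Connector-Credential";

    // Matches one "key": "value" pair of a flat JSON object (escape sequences are not handled).
    private static final Pattern PAIR = Pattern.compile("\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"");

    private ConnectorCredentialHeader() {}

    // Returns the credential map carried by the header, or an empty map when the header is absent.
    // The lookup is case-sensitive here; a real servlet request resolves header names case-insensitively.
    static Map<String, String> parse(Map<String, String> requestHeaders)
    {
        Map<String, String> credentials = new LinkedHashMap<>();
        String value = requestHeaders.get(HEADER);
        if (value == null) {
            return credentials;
        }
        Matcher matcher = PAIR.matcher(value);
        while (matcher.find()) {
            credentials.put(matcher.group(1), matcher.group(2));
        }
        return credentials;
    }
}
```

The resulting map is what would be placed into the identity field on the coordinator.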
coordinator -> worker
Options:
1. Create a new type of Principal and store the credentials in the custom principal object.
2. Extend the Identity object to have an optional field for connector credentials.
If we use the password authenticator to pass credentials, we have to create a new type of Principal, because the only way for an authenticator to output the credentials (without major changes to the function signature) is through the Principal object.
As far as I know, Identity is the best place to store connector credentials. Connectors have access to this object, and it is passed around in the task update request. Both options store the credentials in Identity: one through the Principal object, the other directly in Identity.
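To make the "store directly in Identity" variant concrete, here is a minimal sketch of an identity object that carries the credentials as an extra field. IdentitySketch is a simplified stand-in, not Presto's actual Identity class, and the field name connectorCredentials is an assumption. The toString method prints only the credential keys, since the values should probably never end up in logs.

```java
import java.security.Principal;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Simplified stand-in for Identity; names and layout are assumptions for illustration.
final class IdentitySketch
{
    private final String user;
    private final Optional<Principal> principal;
    // Hypothetical extra field: credential type -> credential string.
    private final Map<String, String> connectorCredentials;

    IdentitySketch(String user, Optional<Principal> principal, Map<String, String> connectorCredentials)
    {
        this.user = user;
        this.principal = principal;
        // Defensive copy so later changes by the caller do not leak into the identity.
        this.connectorCredentials = Collections.unmodifiableMap(new LinkedHashMap<>(connectorCredentials));
    }

    String getUser()
    {
        return user;
    }

    Optional<Principal> getPrincipal()
    {
        return principal;
    }

    Map<String, String> getConnectorCredentials()
    {
        return connectorCredentials;
    }

    @Override
    public String toString()
    {
        // Print only the keys so that credential values are not written to logs.
        return "IdentitySketch{user=" + user + ", connectorCredentialKeys=" + connectorCredentials.keySet() + "}";
    }
}
```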
Some changes are required in Session and SessionRepresentation. I am not 100% sure whether it is safe to pass credentials in SessionRepresentation, but this seems to be the only viable way.
worker -> data source
The connector should be able to pick up the credentials it needs from the Identity object and use them to authenticate/authorize with the data source.
HiveConnector
The existing HdfsConfigurationUpdater is more of a static updater: it does not update the configuration on a per-session basis. We need to add a ConfigurationUpdater to the HiveHdfsConfiguration::getConfiguration method so that the configuration is updated based on the HdfsContext.
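The per-session update could look roughly like the sketch below. It does not use the real Hive connector or Hadoop classes: a plain Map stands in for the Hadoop configuration, the credentials map stands in for what would be read from the HdfsContext, and the ConfigurationUpdater interface and copyKeysWithPrefix helper are hypothetical. The point is only that getConfiguration starts from a copy of the static settings and applies the session's credentials on every call.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: Map<String, String> stands in for the Hadoop configuration object.
final class PerSessionConfigurationSketch
{
    // Hypothetical per-session hook.
    interface ConfigurationUpdater
    {
        void update(Map<String, String> configuration, Map<String, String> connectorCredentials);
    }

    private final Map<String, String> staticConfiguration;
    private final ConfigurationUpdater updater;

    PerSessionConfigurationSketch(Map<String, String> staticConfiguration, ConfigurationUpdater updater)
    {
        this.staticConfiguration = new HashMap<>(staticConfiguration);
        this.updater = updater;
    }

    // Builds a fresh configuration per call so one session's credentials never leak into another's.
    Map<String, String> getConfiguration(Map<String, String> connectorCredentials)
    {
        Map<String, String> configuration = new HashMap<>(staticConfiguration);
        updater.update(configuration, connectorCredentials);
        return configuration;
    }

    // Example updater: copies every credential whose key starts with the given prefix.
    static ConfigurationUpdater copyKeysWithPrefix(String prefix)
    {
        return (configuration, credentials) -> credentials.forEach((key, value) -> {
            if (key.startsWith(prefix)) {
                configuration.put(key, value);
            }
        });
    }
}
```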
A few SQL features will fail (e.g., VIEW). On the other hand, when Presto itself cannot get superuser privileges, no queries will run in Presto at all. Therefore, it may not be a bad idea to give users the option to pass their own credentials to Presto and execute their queries with them. I consider this yet another way of implementing delegation.
When we use the password authenticator to parse connector credentials, it somewhat messes up the authentication module. Connector credentials do not replace authentication: a user still needs to authenticate with the Presto coordinator even if the user provides correct connector credentials. AuthenticationFilter::doFilter exits as soon as one of the authenticators successfully authenticates the user (by creating the Principal object without throwing AuthenticationException). Therefore we won't be able to use krb5 together with connector credentials.
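The short-circuit behaviour described above can be modelled in a few lines. This is a simplified model of the described behaviour, not the actual AuthenticationFilter: the Authenticator interface, the AuthenticationFailure exception, and the firstPrincipal method are all hypothetical names. It shows why a later authenticator (e.g., krb5) never runs once an earlier one has returned a principal.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified model of a first-success authenticator chain; all names are assumptions.
final class FirstSuccessAuthenticationSketch
{
    static final class AuthenticationFailure extends Exception
    {
        AuthenticationFailure(String message)
        {
            super(message);
        }
    }

    interface Authenticator
    {
        // Returns the principal name on success, throws on failure.
        String authenticate(Map<String, String> headers) throws AuthenticationFailure;
    }

    private FirstSuccessAuthenticationSketch() {}

    // Tries the authenticators in order and stops at the first one that succeeds.
    static Optional<String> firstPrincipal(List<Authenticator> authenticators, Map<String, String> headers)
    {
        for (Authenticator authenticator : authenticators) {
            try {
                return Optional.of(authenticator.authenticate(headers));
            }
            catch (AuthenticationFailure e) {
                // This authenticator rejected the request; fall through to the next one.
            }
        }
        return Optional.empty();
    }
}
```

If an authenticator that merely carries connector credentials is listed first and always succeeds, any authenticator after it is skipped, which is the conflict noted above.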
Ideally, the credentials would be categorized by connector id (like connector session properties), because each data source may require a different access token even when the connectors are of the same type (e.g., two different HDFS clusters may require two sets of delegation tokens).
Ideally, we would like to have a more dynamic, per-session HdfsConfigurationUpdater that updates the configuration object based on HdfsContext. This can be achieved by adding a new implementation of HdfsConfiguration or by adding an updater to the getConfiguration method.