luohao
1/3/2019 - 2:07 AM

Presto Credential Pass-through


Connector Credentials Implementation Notes

The goal is to build a generic interface that distributes connector credentials to all connectors, so that each connector can use them to authenticate with its data source. This is useful when Presto runs as a federation layer without superuser permission.

End-to-end solution

The end-to-end passing of connector credentials can be divided into three parts:

a. client -> coordinator

Options:

  1. Use the password authenticator.
  2. Add new headers to the HTTP request.

Under the hood, these two options work in almost exactly the same way: both embed the credentials in HTTP headers. The password authenticator uses the AUTHORIZATION header from com.google.common.net.HttpHeaders, while the second option requires a custom X-Presto-Connector-Credential header and a custom HttpRequestSessionContext that parses the credentials from the header and puts them into the identity field.
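The two extraction paths can be sketched roughly as follows. This is a hypothetical helper (CredentialHeaderExtractor is not an existing Presto class), and a plain Map stands in for the servlet request headers that HttpRequestSessionContext would normally read. For option 1 it assumes the credential document travels in the password slot of a standard "Authorization: Basic" header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper. HTTP header names are case-insensitive; callers can pass a
// TreeMap built with String.CASE_INSENSITIVE_ORDER to mimic that with a plain Map.
final class CredentialHeaderExtractor
{
    static final String AUTHORIZATION = "Authorization"; // same value as HttpHeaders.AUTHORIZATION
    static final String CONNECTOR_CREDENTIAL = "X-Presto-Connector-Credential";

    private CredentialHeaderExtractor() {}

    // Option 2: the raw credential document is the value of the custom header.
    static Optional<String> fromCustomHeader(Map<String, String> headers)
    {
        return Optional.ofNullable(headers.get(CONNECTOR_CREDENTIAL));
    }

    // Option 1: "Basic base64(user:password)", assuming the password carries the document.
    // Simplification: the scheme name is matched case-sensitively here.
    static Optional<String> fromBasicAuthorization(Map<String, String> headers)
    {
        String value = headers.get(AUTHORIZATION);
        if (value == null || !value.startsWith("Basic ")) {
            return Optional.empty();
        }
        byte[] raw = Base64.getDecoder().decode(value.substring("Basic ".length()).trim());
        String decoded = new String(raw, StandardCharsets.UTF_8);
        // A Basic user name cannot contain ':', so everything after the first ':' is the password,
        // even if the password itself (e.g. a JSON document) contains more colons.
        int colon = decoded.indexOf(':');
        return colon < 0 ? Optional.empty() : Optional.of(decoded.substring(colon + 1));
    }
}
```

Either way the coordinator ends up with the same raw credential string; the options differ only in which header carries it and which component parses it.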

We can put connector credentials into a JSON document that stores a set of key-value pairs, each mapping a connector credential type to the credential string (e.g., {"hive.gs.key" : "DEADBEEF"}).
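A minimal codec for this flat document might look like the sketch below. CredentialDocument is a hypothetical name, and a real implementation would more likely use a proper JSON library; this version only supports string keys and values with the \" and \\ escapes, and assumes no control characters appear in the strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec for a flat JSON object of string keys to string values.
final class CredentialDocument
{
    private final String text;
    private int pos;

    private CredentialDocument(String text)
    {
        this.text = text;
    }

    static String encode(Map<String, String> credentials)
    {
        StringBuilder out = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : credentials.entrySet()) {
            if (!first) {
                out.append(',');
            }
            first = false;
            quote(out, entry.getKey());
            out.append(':');
            quote(out, entry.getValue());
        }
        return out.append('}').toString();
    }

    static Map<String, String> decode(String json)
    {
        return new CredentialDocument(json).parseObject();
    }

    private Map<String, String> parseObject()
    {
        Map<String, String> result = new LinkedHashMap<>();
        expect('{');
        if (!consumeIf('}')) {
            do {
                String key = parseString();
                expect(':');
                result.put(key, parseString());
            }
            while (consumeIf(','));
            expect('}');
        }
        skipWhitespace();
        if (pos != text.length()) {
            throw new IllegalArgumentException("Trailing characters at offset " + pos);
        }
        return result;
    }

    private String parseString()
    {
        expect('"');
        StringBuilder value = new StringBuilder();
        while (pos < text.length()) {
            char c = text.charAt(pos++);
            if (c == '"') {
                return value.toString();
            }
            if (c == '\\') {
                if (pos >= text.length() || (text.charAt(pos) != '"' && text.charAt(pos) != '\\')) {
                    throw new IllegalArgumentException("Unsupported escape at offset " + pos);
                }
                c = text.charAt(pos++);
            }
            value.append(c);
        }
        throw new IllegalArgumentException("Unterminated string");
    }

    // Skips whitespace and returns the next character without consuming it.
    private char peek()
    {
        skipWhitespace();
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return text.charAt(pos);
    }

    private void expect(char c)
    {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    private boolean consumeIf(char c)
    {
        if (peek() != c) {
            return false;
        }
        pos++;
        return true;
    }

    private void skipWhitespace()
    {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    private static void quote(StringBuilder out, String s)
    {
        out.append('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\');
            }
            out.append(c);
        }
        out.append('"');
    }
}
```

Keeping the document flat (no nesting) keeps the parsing surface small, which matters for a value that arrives in an HTTP header.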

b. coordinator -> worker

Options:

  1. Create a new type of Principal and store the credentials in the custom principal object.
  2. Extend the Identity object with an optional field for connector credentials.

If we use the password authenticator to pass credentials, we have to create a new type of Principal, because the only way for the authenticator to output the credentials (without major changes to its function signature) is through the Principal object.
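A custom principal for option 1 could be as small as the following sketch. ConnectorCredentialPrincipal is a hypothetical name; the only real API it relies on is java.security.Principal, which requires nothing beyond getName().

```java
import java.security.Principal;
import java.util.Map;
import java.util.Objects;

// Hypothetical principal that an authenticator could return: it keeps the authenticated
// user name and carries the parsed connector credentials alongside it.
final class ConnectorCredentialPrincipal
        implements Principal
{
    private final String name;
    private final Map<String, String> connectorCredentials;

    ConnectorCredentialPrincipal(String name, Map<String, String> connectorCredentials)
    {
        this.name = Objects.requireNonNull(name, "name is null");
        this.connectorCredentials = Map.copyOf(connectorCredentials);
    }

    @Override
    public String getName()
    {
        return name;
    }

    Map<String, String> getConnectorCredentials()
    {
        return connectorCredentials;
    }

    // Keep credential values out of logs: only the user name is printed.
    @Override
    public String toString()
    {
        return "ConnectorCredentialPrincipal{name=" + name + "}";
    }
}
```

Downstream code only sees a Principal, so reading the credentials requires an instanceof check against this concrete type, which is one of the costs of this option.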

As far as I know, Identity is the best place to store connector credentials: connectors have access to this object, and it is passed around in task update requests. Both options store the credentials in Identity: one through the Principal object, the other directly in Identity.
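Option 2 could look roughly like the sketch below. CredentialedIdentity is a hypothetical, trimmed-down stand-in for Presto's Identity (the real class has other fields); the point is that the credential field defaults to empty so existing call sites keep working.

```java
import java.security.Principal;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for Identity with a dedicated, optional credentials field.
final class CredentialedIdentity
{
    private final String user;
    private final Optional<Principal> principal;
    private final Map<String, String> connectorCredentials;

    // Existing callers that never pass credentials get an empty map.
    CredentialedIdentity(String user, Optional<Principal> principal)
    {
        this(user, principal, Map.of());
    }

    CredentialedIdentity(String user, Optional<Principal> principal, Map<String, String> connectorCredentials)
    {
        this.user = Objects.requireNonNull(user, "user is null");
        this.principal = Objects.requireNonNull(principal, "principal is null");
        this.connectorCredentials = Map.copyOf(connectorCredentials);
    }

    String getUser()
    {
        return user;
    }

    Optional<Principal> getPrincipal()
    {
        return principal;
    }

    Map<String, String> getConnectorCredentials()
    {
        return connectorCredentials;
    }

    Optional<String> getConnectorCredential(String key)
    {
        return Optional.ofNullable(connectorCredentials.get(key));
    }

    // Only credential names are printed, never their values.
    @Override
    public String toString()
    {
        return "CredentialedIdentity{user=" + user + ", credentialKeys=" + connectorCredentials.keySet() + "}";
    }
}
```

Because this object is serialized and shipped to workers, keeping values out of toString() avoids leaking them through logging along the way.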

Some changes are required in Session and SessionRepresentation. I am not 100% sure it's safe to pass credentials in SessionRepresentation, but this seems to be the only viable way.

c. worker -> data source

Each connector should be able to pick up the credentials it needs from the Identity object and use them to authenticate/authorize with its data source.
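On the connector side, picking up "the credentials it needs" could be as simple as the sketch below. It assumes credential keys are namespaced by connector type (as in "hive.gs.key"); ConnectorCredentials, forPrefix, and require are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical connector-side helpers over the identity's credential map.
final class ConnectorCredentials
{
    private ConnectorCredentials() {}

    // Select only the entries whose key starts with this connector's prefix.
    static Map<String, String> forPrefix(Map<String, String> all, String prefix)
    {
        Map<String, String> selected = new TreeMap<>();
        all.forEach((key, value) -> {
            if (key.startsWith(prefix)) {
                selected.put(key, value);
            }
        });
        return selected;
    }

    // Fail fast with an error that names the missing key but never prints a credential value.
    static String require(Map<String, String> credentials, String key)
    {
        String value = credentials.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing connector credential: " + key);
        }
        return value;
    }
}
```

A Hive connector would call forPrefix(credentials, "hive.") and ignore entries meant for other connectors.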

HiveConnector

The existing HdfsConfigurationUpdater is more of a static updater that doesn't update the configuration on a per-session basis. We need to add a ConfigurationUpdater to the HiveHdfsConfiguration::getConfiguration method so that it updates the configuration based on the HdfsContext.
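The per-session update can be sketched as follows, under heavy simplification: a Map stands in for Hadoop's Configuration, SessionContext stands in for HdfsContext, and every class and property name here (including "example.gs.key") is hypothetical. The key idea is that getConfiguration copies the static base configuration before applying the per-session updater.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-session configuration: static base settings plus a session-aware updater.
final class PerSessionHdfsConfiguration
{
    interface SessionConfigurationUpdater
    {
        void update(Map<String, String> configuration, SessionContext context);
    }

    static final class SessionContext
    {
        final String user;
        final Map<String, String> connectorCredentials;

        SessionContext(String user, Map<String, String> connectorCredentials)
        {
            this.user = user;
            this.connectorCredentials = Map.copyOf(connectorCredentials);
        }
    }

    private final Map<String, String> baseConfiguration;
    private final SessionConfigurationUpdater updater;

    PerSessionHdfsConfiguration(Map<String, String> baseConfiguration, SessionConfigurationUpdater updater)
    {
        this.baseConfiguration = Map.copyOf(baseConfiguration);
        this.updater = updater;
    }

    Map<String, String> getConfiguration(SessionContext context)
    {
        // Copy first so one session's credentials never leak into another session's configuration.
        Map<String, String> configuration = new HashMap<>(baseConfiguration);
        updater.update(configuration, context);
        return configuration;
    }
}
```

The copy-per-call is the important design choice: mutating a shared configuration object would let credentials from one query bleed into concurrent queries.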

Known Issues

  1. A few SQL features will fail (e.g., VIEW). On the other hand, when Presto itself can't get superuser privileges, no queries will run in Presto at all. Therefore, it may not be such a bad idea to give users an option to pass their own credentials to Presto and execute their queries with those credentials. I consider this yet another way of implementing delegation.

  2. Using the password authenticator to parse connector credentials muddles the authentication module. Connector credentials don't cover the authentication functionality: a user still needs to authenticate with the Presto coordinator even if the user provides correct connector credentials. AuthenticationFilter::doFilter exits as soon as one of the authenticators successfully authenticates the user (by creating the Principal object without throwing AuthenticationException). Therefore we won't be able to use krb5 together with connector credentials.

  3. Ideally we would like to have the credentials categorized by connector ID (like connector session properties), because each data source may require different access tokens even if they use the same type of connector (e.g., two different HDFS clusters may require two sets of delegation tokens).

  4. Ideally we would like to have a more dynamic, per-session HdfsConfigurationUpdater that updates the configuration object based on HdfsContext. This can be achieved by a new implementation of HdfsConfiguration or by adding an updater to the getConfiguration method.
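The first-success behavior behind issue 2 can be modeled with the simplified sketch below. It is not Presto's AuthenticationFilter; the nested Authenticator and AuthenticationException types are hypothetical stand-ins, and a String stands in for the HTTP request. It shows why a Kerberos authenticator placed first prevents a credential-parsing authenticator from ever running.

```java
import java.security.Principal;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a filter that tries authenticators in order and stops at the first success.
final class FirstSuccessAuthentication
{
    static final class AuthenticationException
            extends Exception
    {
        AuthenticationException(String message)
        {
            super(message);
        }
    }

    interface Authenticator
    {
        Principal authenticate(String request)
                throws AuthenticationException;
    }

    private FirstSuccessAuthentication() {}

    static Optional<Principal> authenticate(List<Authenticator> authenticators, String request, List<String> failures)
    {
        for (Authenticator authenticator : authenticators) {
            try {
                // The first authenticator that returns a principal wins; later ones never run.
                return Optional.of(authenticator.authenticate(request));
            }
            catch (AuthenticationException e) {
                failures.add(e.getMessage());
            }
        }
        return Optional.empty();
    }
}
```

Since only one principal comes out of the loop, a user authenticated by krb5 never reaches the authenticator that would have attached connector credentials.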
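For issue 3, one possible shape is to key credentials by catalog, mirroring how catalog session properties are named "<catalog>.<property>". The sketch below groups flat keys of that form into one map per catalog; CatalogCredentials and the key format are assumptions, not an existing Presto convention for credentials.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical grouping of "<catalog>.<name>" credential keys into per-catalog maps,
// so two catalogs backed by the same connector type can carry different tokens.
final class CatalogCredentials
{
    private CatalogCredentials() {}

    static Map<String, Map<String, String>> byCatalog(Map<String, String> flat)
    {
        Map<String, Map<String, String>> result = new TreeMap<>();
        flat.forEach((key, value) -> {
            int dot = key.indexOf('.');
            if (dot <= 0 || dot == key.length() - 1) {
                throw new IllegalArgumentException("Credential key must be <catalog>.<name>: " + key);
            }
            // Split at the first dot only, so credential names may themselves contain dots.
            result.computeIfAbsent(key.substring(0, dot), catalog -> new TreeMap<>())
                    .put(key.substring(dot + 1), value);
        });
        return result;
    }
}
```

With this layout, catalogs "hdfs_a" and "hdfs_b" can both use the Hive connector while each receives only its own delegation token.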