MicroStrategy ONE

Starting in MicroStrategy 2021 Update 4, Hadoop Gateway is no longer supported.

Enable Hadoop Gateway to Support Namenode High Availability

Starting in MicroStrategy 10.10, Hadoop Gateway supports Hadoop Namenode High Availability. Browsing files no longer requires Hadoop WebHDFS: HDFS catalog information is retrieved through Hadoop Gateway instead.

Browse HDFS via Hadoop Gateway

Contact your Hadoop administrator to obtain the nameservice for Hadoop Namenode High Availability, the Hadoop Namenode IP address, and HDFS port number before starting the steps below.

  1. From the Connect to Hadoop dialog, click Change Connection.

  2. In the Data Source dialog, select the Edit Connection String checkbox.

  3. For a single Namenode:

    Ensure that the Hadoop Namenode IP/Host and HDFS port are set correctly in the connection string through the hadoopName and hdfsPort attributes.

    For Hadoop Namenode High Availability:

    Append the hadoopNameService attribute, set to the nameservice tag, to the end of the connection string.

    For example, if the High Availability tag is nameservice1, the connection string should appear as follows:

    hadoopName=10.242.109.2;hdfsPort=8020;BDEIP=10.242.109.10;BDEPORT=10109;hadoopNameService=nameservice1;

  4. Click OK.
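The connection string edit in step 3 can also be sketched in code. The following is a minimal Python illustration, assuming the connection string is a single semicolon-delimited list of key=value attributes; the helper names parse_connection_string and with_name_service are hypothetical and are not part of any MicroStrategy tool.

```python
def parse_connection_string(conn: str) -> dict:
    """Split 'key=value;key=value;' into a dict, preserving order and case."""
    pairs = {}
    for part in conn.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed attribute: {part!r}")
        pairs[key] = value
    return pairs


def with_name_service(conn: str, name_service: str) -> str:
    """Return conn with hadoopNameService set as the last attribute."""
    pairs = parse_connection_string(conn)
    pairs.pop("hadoopNameService", None)  # avoid a duplicate; re-add at the end
    pairs["hadoopNameService"] = name_service
    return "".join(f"{key}={value};" for key, value in pairs.items())


# Single-Namenode string built from the values in the example above.
single = "hadoopName=10.242.109.2;hdfsPort=8020;BDEIP=10.242.109.10;BDEPORT=10109;"
ha = with_name_service(single, "nameservice1")
print(ha)
```

Attribute names are compared exactly as written, so hadoopNameService and hadoopnameservice would be treated as two different attributes.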

Security

As of MicroStrategy ONE, Hadoop Gateway exposes an HTTP interface for HDFS browsing. HTTPS is not supported.

Browsing HDFS via Hadoop Gateway supports Kerberos-authenticated clusters. A separate Kerberos principal name for Intelligence Server is no longer required. Once Hadoop Gateway is configured in a Kerberos-authenticated cluster and launched successfully, Intelligence Server can browse HDFS via Hadoop Gateway automatically. All access control is handled by Hadoop Gateway.

Hadoop Gateway must be deployed on a proxy node in a cluster that uses Kerberos authentication. A secured cluster cannot be browsed if Hadoop Gateway is deployed outside of the cluster.

Troubleshooting

  • HDFS browsing is performed by Hadoop Gateway instead of Intelligence Server. Ensure that Hadoop Gateway is launched before browsing.
  • Hadoop Gateway exposes an HTTP RESTful API on port 4020 for HDFS browsing, so this port must be open on the machine where Hadoop Gateway is deployed.
  • If both hadoopName and hadoopNameService are provided in the connection string, Hadoop Gateway uses the hadoopNameService value by default to access the cluster.
  • All attribute names and values in the connection string are case sensitive.
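
Some of the checks above can be scripted before browsing. The following is a rough Python sketch, not a MicroStrategy utility: gateway_port_open and effective_target are hypothetical helper names, the port check only confirms that a TCP connection can be opened (it does not call the Gateway's REST API), and effective_target simply mirrors the precedence rule stated above.

```python
import socket

GATEWAY_PORT = 4020  # HTTP REST port named in the troubleshooting notes


def gateway_port_open(host: str, port: int = GATEWAY_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def effective_target(attrs: dict) -> str:
    """Report which attribute identifies the cluster, per the precedence note."""
    # Lookups are exact, so a miscased name is treated as absent.
    if "hadoopNameService" in attrs:
        return "nameservice " + attrs["hadoopNameService"]
    if "hadoopName" in attrs:
        return "namenode " + attrs["hadoopName"]
    raise KeyError("neither hadoopName nor hadoopNameService is set")


attrs = {"hadoopName": "10.242.109.2", "hadoopNameService": "nameservice1"}
print(effective_target(attrs))  # prints "nameservice nameservice1"
```

To test reachability, call gateway_port_open with the address of the machine running Hadoop Gateway, for example gateway_port_open("10.242.109.10"), assuming that address is where the Gateway is deployed. A False result suggests that the Gateway is not running or that a firewall is blocking the port.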