MicroStrategy ONE
Starting in MicroStrategy 2021 Update 4, Hadoop Gateway is no longer supported.
Frequently Asked Questions
The table below helps you calculate the recommended settings from the number of worker nodes, the RAM and virtual cores in each node, and the number of executors to be allocated on each worker node.
Recommended Performance Parameters for YARN Client Mode

ID | Item | Parameter | Formula | Value | Description
---|---|---|---|---|---
C1 | Number of nodes | | | 2 | Available in your hardware
C2 | RAM per node (GB) | | | 380 | Available in your hardware
C3 | VCores per node | | | 40 | Available in your hardware
C4 | Total number of VCores | | C4 = C1 × C3 | 80 |
S1 | Allocated executors | | S1 = S2 × C1 | 48 |
S2 | Executors per node | | | 6 | Number of executors to be allocated on each worker node
S3 | Max memory per executor (GB) | | S3 = C2 / S2 | 63 |
H1 | Overhead (GB) | | H1 = S3 × 0.07 | 4 | Overhead memory used by the OS. It defaults to 0.07 × spark.executor.memory
H2 | Number of executors | spark.executor.instances | H2 = S1 - 1 | 47 | Total number of executors created in the cluster. One executor slot is reserved for the YARN ApplicationMaster (AM).
H3 | Memory per executor (GB) | spark.executor.memory | H3 = S3 - H1 | 59 | Running executors with too much memory often results in excessive garbage collection delays. 64 GB is a rough guess at a good upper limit for a single executor.
H4 | Cores per executor | spark.executor.cores | H4 = (C3 / S2) - 1 | 6 | Leave 1 core for system processes
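The formulas in the table can be scripted. The following is a minimal sketch, not part of the product: the function name recommend_spark_settings is hypothetical, and it uses integer arithmetic that rounds every division down. It maps spark.executor.cores to H4 because Spark defines that property as the number of cores per executor. With the listed inputs (2 nodes, 6 executors per node) the formulas give S1 = 12 and H2 = 11; the 48 and 47 in the Value column correspond to 8 nodes, and the H4 value of 6 reflects rounding 5.67 up rather than down.

```shell
# Compute the Spark settings from the table's formulas.
# Arguments: C1 (nodes), C2 (RAM per node, GB), C3 (vcores per node),
#            S2 (executors per node). All divisions round down.
recommend_spark_settings() {
  c1=$1; c2=$2; c3=$3; s2=$4
  s1=$((s2 * c1))        # S1: allocated executors
  s3=$((c2 / s2))        # S3: max memory per executor (GB)
  h1=$((s3 * 7 / 100))   # H1: overhead, 0.07 x S3 (GB)
  h2=$((s1 - 1))         # H2: one executor slot is left for the AM
  h3=$((s3 - h1))        # H3: memory per executor (GB)
  h4=$((c3 / s2 - 1))    # H4: leave one core for system processes
  echo "spark.executor.instances=$h2"
  echo "spark.executor.memory=${h3}g"
  echo "spark.executor.cores=$h4"
}
```

For example, `recommend_spark_settings 2 380 40 6` prints the three settings for the hardware listed in the table. Treat the output as a starting point and check it against what YARN actually allows per container.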
Yes. MicroStrategy Hadoop Gateway can release cluster resources while the service is idle. To enable this behavior, configure the gateway's executors and cores to be dynamically allocated: open the configuration file <MicroStrategy Hadoop Gateway installation path>/conf/hgos-spark.properties and uncomment the dynamic allocation section.
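After editing the file, it can help to confirm that the relevant lines are really uncommented. The sketch below assumes the dynamic allocation section uses the standard Spark keys spark.dynamicAllocation.enabled and spark.shuffle.service.enabled (Spark on YARN needs the external shuffle service for dynamic allocation). The exact contents of hgos-spark.properties are not shown here, so treat the key list and the function name check_dynamic_allocation as assumptions.

```shell
# Report whether each key is set on an uncommented line of a properties file.
# The key names are standard Spark settings; their presence in
# hgos-spark.properties is an assumption, so adjust the list to match your file.
check_dynamic_allocation() {
  props_file="$1"
  for key in spark.dynamicAllocation.enabled spark.shuffle.service.enabled; do
    # Escape dots so they match literally in the regular expression.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${key_re}([[:space:]]|=|:)" "$props_file"; then
      echo "active: $key"
    else
      echo "commented out or missing: $key"
    fi
  done
}
```

Call it with the path to your properties file, for example `check_dynamic_allocation "$HGOS_HOME/conf/hgos-spark.properties"`.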
Yes. MicroStrategy Hadoop Gateway has supported Live Connect Cube since MicroStrategy 10.9. No extra configuration is required to enable it.
The minimum requirement for the MicroStrategy Hadoop Gateway is 256 MB of disk space and 2 GB of memory.
MicroStrategy Hadoop Gateway does not start any extra processes on a NameNode or a DataNode. It only submits jobs to Spark.
DataNode memory usage depends on the value you set for spark.executor.memory in the configuration file (1 GB by default). NameNode memory usage is not significantly affected.
When MicroStrategy Hadoop Gateway starts, it uploads some JAR files to HDFS at hdfs://HDFSNameNode:8020/user/${user_name_start_hgos}/.sparkStaging. By default, the files in .sparkStaging are deleted automatically once the MicroStrategy Hadoop Gateway service is shut down.
The total size of the JAR files is no larger than 256 MB.
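To inspect the staging directory yourself, you can build its URI and pass it to the standard hdfs dfs commands. This is a sketch: the function name staging_uri is hypothetical, and the host name and port 8020 are the placeholders used in the path above, so substitute your own NameNode address.

```shell
# Build the .sparkStaging URI for a given NameNode host and user.
# $1: NameNode host, $2: user that starts the gateway,
# $3: NameNode port (optional, defaults to 8020 as in the path above)
staging_uri() {
  printf 'hdfs://%s:%s/user/%s/.sparkStaging\n' "$1" "${3:-8020}" "$2"
}

# Example usage on a host with the HDFS client installed (left commented out):
# uri=$(staging_uri HDFSNameNode "$(id -un)")
# hdfs dfs -ls "$uri"         # list the uploaded JAR files
# hdfs dfs -du -s -h "$uri"   # total size of the directory
```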
No. MicroStrategy Hadoop Gateway does not refresh the Kerberos ticket automatically. Refresh it by running the kinit command manually or by creating a cron job that refreshes the ticket on a schedule.
A template for kinit_cron.sh:
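    #!/bin/bash
    # HGOS_HOME, keytab_path, and principal_name must be set before this
    # script runs. cron does not load your login environment.
    function setup_kerberos() {
        echo "klist:"
        klist
        echo "KRB5CCNAME env:"
        # Store the ticket in a dedicated credential cache.
        export KRB5CCNAME="$HGOS_HOME/conf/krb5cc_hgos"
        echo "$KRB5CCNAME"
        echo "kinit"
        # -l ticket lifetime, -r renewable lifetime, -f forwardable ticket
        kinit -kt "$keytab_path" -l 1d5h -r 2d -f "$principal_name"
        echo "klist"
        klist
    }
    setup_kerberos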
To schedule a cron job, run crontab -e and add the following entry, which runs the script every two hours:
    root@HOST # crontab -e
    0 */2 * * * <path to file>/kinit_cron.sh
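If you prefer to install the schedule without opening an editor, a common idiom is to pipe the existing crontab plus the new line back into crontab. The sketch below only builds the entry. The function name kinit_cron_line is hypothetical, and the install command is left commented out so that pasting the block changes nothing by accident.

```shell
# Print a crontab entry that runs the given script every two hours, on the hour.
# $1: absolute path to kinit_cron.sh
kinit_cron_line() {
  printf '0 */2 * * * %s\n' "$1"
}

# Append the entry to the current user's crontab (review before uncommenting):
# ( crontab -l 2>/dev/null; kinit_cron_line "/path/to/kinit_cron.sh" ) | crontab -
```

Run `crontab -l` afterwards to confirm that the entry is present and appears only once.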
Yes. MicroStrategy Hadoop Gateway supports HDFS ACLs managed by Apache Sentry, and no extra configuration is required. See the video below for how MicroStrategy Hadoop Gateway works with Apache Sentry.
No. MicroStrategy Hadoop Gateway is built on Spark 1.6 and can be deployed only in a Spark 1.6 environment. We are working to release MicroStrategy Hadoop Gateway on Spark 2.
Related Topics
Introduction to the MicroStrategy Hadoop Gateway
How to Deploy the MicroStrategy Hadoop Gateway
How to Start the MicroStrategy Hadoop Gateway