MicroStrategy ONE

Apply Aggregation and Filtering to Hadoop Data Imports

When importing data from a Hadoop Distributed File System (HDFS) you can apply different aggregation functions, as well as filters, to the data during import. This allows you to customize the amount of data being brought into memory.

Aggregation

The Aggregation option is available in the Data Preview dialog.

To apply an aggregation function to your data:

  1. Click Aggregation to open the Aggregation Dialog Box.
  2. Right-click on a field and select the desired function from the menu.
  3. Click Execute SQL to preview your data with the aggregation applied.

    The aggregation/function executes against top 100,000 rows of entire dataset, by default. Therefore, the precise of Sum, Max, Min, Average, and Count functions are impacted during preview (other functions are not affected). Aggregation results are re-calculated against the entire dataset while publishing the cube. The hgos.aggregation.preview.rows property value in /conf/hgos-spark.properties can be adjusted to specific number of rows, or set to a value of -1 for an unlimited number of rows.

  4. Click OK to save the new schema definitions with the aggregation/function applied.

    If you attempt to wrangle data after aggregation or functions have been applied, the system discards your changes.

Filtering

Hadoop Gateway also supports filtering of data imports. Filtering allows you to import limited data which matches a certain condition instead of entire dataset.

The Filter option is available in the context menu and at the top of the Aggregation Dialog box.

Supported Functions

Category Function Notes
Basic   The result may incorrect if value exceeds numeric type's minimum or maximum. for example, integer value rang is from -2147483647 to 2147483647, the result will be unexpected if the result exceeds the limitation.
Sum  
Average  
Minimum  
Maximum  
Count  
Greatest  
Least  
Datetime   If function is applied on datetime value, the time part will be truncated. for example, apply AddDays on 2017/5/12 18:23:12, the value will be truncated to 2017/5/12 00:00:00
AddDays  
CurrentDate  
CurrentDateTime  
CurrentTime  
Date  
DayOfMonth  
DayOfWeek  
DayOfYear  
DaysBetween  
Hour  
MilliSecond  
Minute  
Month  
MonthEndDate  
MonthBetween  
MonthStartDate  
Quarter  
Second  
Week  
Year  
YearEndDate  
YearStartDate  
Math    
Abs  
Acos When the input value is infinity, the return value will be "-1.#IND" to indicate invalid value, because acos is not convergent.
Asin When the input value is infinity, the return value will be "-1.#IND" to indicate invalid value, because asin is not convergent.
Asinh  
Atan2  
Atanh  
Ceiling  
Combine  
Cos When the input value is infinity, the return value will be "-1.#IND" to indicate invalid value, because cos is not convergent.
Cosh  
Degrees  
Exp  
Factorial If #0 > 20, then null will be returned. If #0 <= 20 corresponding integer value will be returned.
Floor  
Int  
Ln  
Log  
Log10  
Mod  
Multiply  
Power  
Quotient  
Radians  
Randbetween  
Round  
Round2  
Sin When the input value is infinity, the return value will be "-1.#IND" to indicate invalid value, because sin is not convergent.
Sinh  
Tan When the input value is infinity, the return value will be "-1.#IND" to indicate invalid value, because tan is not convergent.
Tanh  
Trunc  
Null/Zero    
NullTo/Zero  
ZeroToNull  
String    
Concat  
ConcatBank  
InitCap  
LeftStr  
Length  
Lower  
LTrim  
Position  
RightStr  
RTrim  
SubStr  
Trim  
Upper  

Supported Filters

Category Filter
Basic In List
Not In List
Exactly
Not Exactly
Comparison Greater than
Less Than
Greater than or equal to
Less than or equal to
Between
Not between
Pattern Like
Not like
Contains
Does not contain
Begin with
Does not begin with
End with
Does not end with