Strategy ONE
Data Preparation
Use data wrangling to manage and manipulate your data efficiently. Use sample data (that is, a subset of your actual data) to build a script of functions that clean, transform, and prepare your data.
Access Data Wrangling
-
On the Model Data page, click Data Prep in the toolbar.
Data Wrangle Interface
Use the following options to manage and manipulate your data.
Data Preview Pane
Use the columns in the preview pane for the following:
-
To Rename, Delete, Duplicate, or Prepare Data, click Options
in a column header.
The preview data represents the result of the wrangle functions on the sample data, not the entire dataset. The result on the full data could be different.
-
To change the data type, click the type in the column and choose an option from the data type list.
The Data Preview pane displays a preview of the data for the selected table. As you wrangle your data, the data updates.
Function Preview
See the preview data for the selected table with the applied data operations. When you preview a Data Prep operation, you can see upcoming changes on the grid.
-
Select a target column and function.
-
Click Preview.
-
To view the original data, click Exit preview.
-
To apply your changes, click Apply.
Prep History Panel
The Prep History panel records all the wrangling functions you applied to the target table. This creates the data wrangling script.
-
To export the script, click More Options
and Export to JSON. The applied functions are saved in order as a json script. You can import and the export file to apply it to another dataset.
-
To import the script, click More Options
and Import to JSON
Data Prep Panel
-
Target Columns: Select the columns to target for the wrangle function. You can select multiple columns or a single column.
-
Functions: Click the function to apply to the target columns. The available functions are:
-
Pivot: When pivoting columns to rows, Mosaic Studio will consolidate selected column names to values in a new column.
When pivoting rows to columns, choose a target column, in which all unique values will be new columns. You can also specify the values and a consolidate function for them.
-
Split: Split a column to multiple columns or a cell to multiple rows.
-
Merge: Merge multiple columns of data into a single column. Use the wrangle panel to specify which columns to merge and the order of columns merged.
-
Extract: Move specific parts of data from the selected column and display it in a new column.
-
Remove: Remove rows, cells, or specific characters in cells based on the specified criteria.
-
Transform: Edit text and clean data using one of the following options:
-
Normalize capitalization - To Title Case: Change the text to title case, where each word begins with an uppercase letter.
-
Normalize capitalization - To Uppercase: Change the text to uppercase.
-
Normalize capitalization - To Lowercase: Change the text to lowercase.
-
Convert data type to number: Change the column data type to a number format. Choose a number format from the data type drop-down list.
-
Convert data type to string: Change the column data type to a string format.
-
Convert data type to date and time: Change the column data type to a date and time format. Choose a format from the data type drop-down list.
For more information on Custom format values, see Values and Examples.
-
Find and Replace where Cell: Search for a content of a cell and replace it with the text or value you enter.
-
Find and Replace Characters in a Cell: Search for a character in a cell and replace it with the character you enter.
-
Trim Leading and Trailing Whitespaces: Trim leading and trailing whitespaces from a cell.
-
Collapse Consecutive Whitespace: Remove blocks of two or more consecutive whitespaces from a cell.
-
Fill Down: Replace blank cells with the value in the preceding cell. If a cell contains spaces, you must remove the spaces to make it blank.
-
Blank Down: Blank out cells that contain repeated data.
-
Pad Before: Add the characters you specify to the beginning of a cell.
-
Pad After: Add the characters you specify to the end of a cell.
-
Unescape HTML: Replace the codes in HTML-encoded data with the correct characters, so the text is more readable.
-
-
Cluster: Uses algorithms to identify and merge similar values. This helps standardize similar values and leads to more accurate data:
-
In the Algorithm drop-down list, choose the algorithm used to group data:
-
Fingerprint: This method is used to identify unique values for data to determine if two items are identical.
-
n-Gram fingerprint: This method is used for text processing. It breaks down text into sequence of items such as characters, words, and more.
-
Phonetic: This method is used to match words that sound similar but may be spelled differently.
-
-
To replace all cluster values with a new value, select the checkbox of the corresponding cluster and type a replacement value in New cell Value.
-
To replace the values of the selected clusters, click Merge.
-
To regenerate recluster options after Merge, click Recluster.
-
-
-
Suggestions: If a target column is not selected, suggestions display here. The suggestions are based on the preview data.