How do you handle missing values in a dataset in tableau?

Marleen MeierFollowSep 1, 2021·8 min readMastering Tableau 2021: How to handle Null ValuesThis article is a modified excerpt of Mastering Tableau 2021, Chapter Two¹ and requires ba

How do you handle missing values in a dataset in tableau?
Marleen Meier

Marleen MeierFollow

Sep 1, 2021·8 min read

Mastering Tableau 2021: How to handle Null Values

This article is a modified excerpt of Mastering Tableau 2021, Chapter Two¹ and requires basic Tableau knowledge in order to follow along the exercises.

Exactly three months ago Packt published my second book, the third edition of the Mastering Tableau series (Kudos to David Baldwin for writing the first edition in 2016). In the book many different Tableau topics are being discussed like Quick Level of Detail expressions, the newest Tableau Server features, Einstein Discovery, and more. But one of earlier focus points is how to treat null values. Hence todays topic, lets dive right in!

Photo by Lounidesign

Content:

  1. Intro
  2. Finding null values
  3. Visualizing null values for multiple fields
  4. Extrapolating
  5. Final thoughts

Intro

Why did I choose, for my first Medium article ever, to cover null values? Because it is very hard to handle them! Every data scientist/data analyst recognizes these daunting thoughts: remove  replace  interpolate or replace  no extrapolate  no I will remove them or should I replace .

And there is no one right answer nor generic advise. YOU need to know how your decision will affect the dataset and if the purpose of your analysis will still be served after you made up your mind. But the least I can do is, show you one way of handling null values in Tableau.

The following section is meant to be a hands-on exercise. If you want to follow along, you can do so in Tableau Public, simply search for the chapter 2 starter workbook on my profile. Furthermore, will we use a dataset called Happiness report, it contains various measures for various countries around the world. The important fact however is, that some fields are not filled, a perfect dataset for a null value exercise.

Follow the steps to find the null values:

  • Navigate to https://public.tableau.com/profile/marleen.meier to locate and download the starter workbook associated with chapter 2 of Mastering Tableau 2021.
  • Navigate to the worksheet entitled Surveying & Exploring Data.
  • Drag Region and Country to the Rows shelf. Observe that in some cases the Region field has Null values for some countries:

Figure 1: Null regions

How many values are null values? Follow the steps to learn more:

  • Right-click and Edit the parameter entitled Select Field. Note that the Data Type is set to Integer and observe a list that contains an entry for each field name in the dataset:

Figure 2: Editing a parameter

  • Close the parameter window. In the Data pane, right-click on the Select Field parameter and select Show Parameter Control.
  • Create a calculated field entitled % Populated and write the following calculation:SUM([Number of Records]) / TOTAL(SUM([Number of Records]))
  • In the Data pane, right-click on % Populated and select Default Properties | Number Format:

Figure 3: Adjusting default properties

  • In the resulting dialog box, choose Percentage.
  • Create a calculated field entitled Null & Populated and write the following code that will help us distinguish the amount of null values per field, I will explain the code later on:CASE [Select Field]WHEN 1 THEN IF ISNULL ([Country]) THEN Null Values ELSE
    Populated Values ENDWHEN 2 THEN IF ISNULL ([Region]) THEN Null Values ELSE
    Populated Values ENDWHEN 3 THEN IF ISNULL ([Economy (GDP per Capita)]) THEN Null Values ELSE Populated Values ENDWHEN 4 THEN IF ISNULL ([Family]) THEN Null Values ELSE
    Populated Values ENDWHEN 5 THEN IF ISNULL ([Freedom]) THEN Null Values ELSE Populated Values ENDWHEN 6 THEN IF ISNULL ([Happiness Rank]) THEN Null Values ELSE Populated Values ENDWHEN 7 THEN IF ISNULL ([Happiness Score]) THEN Null Values ELSE Populated Values ENDWHEN 8 THEN IF ISNULL ([Health (Life Expectancy)]) THEN Null Values ELSE Populated Values ENDWHEN 9 THEN IF ISNULL ([Standard Error]) THEN Null Values ELSE Populated Values ENDEND

The calculation part is done, lets visualize the output:

  • Remove Region and Country from the Rows shelf.
  • Place Null & Populated on the Rows and Color shelves and % Populate on the Columns and Label shelves:

Figure 4: Populated values

  • Change the colors to red for Null Values and green for Populated Values if desired. You can do so by clicking on Color in the Marks card and Edit Colors.
  • Click on the arrow in the upper right corner of the Select Field parameter on your sheet and select Single Value List.
  • Change the field selection in the Select Field parameter and note that some fields have a high percentage of null values. For example, in the following diagram, 32.98% of records do not have a value for Region:

Figure 5: Comparing null and populated values

Depending on the amount of missing values, a different approach per field might be needed. We will zoom in on one of them: Extrapolation.

Extrapolating data

The next steps will expand on the previous exercise by cleaning existing data and populating some of the missing data from known information. We will assume that we know which country belongs to which region. Well use that knowledge to fix errors in the Region field and also to fill in the gaps using Tableau:

  • Starting from where the previous exercise ended, create a calculated field entitled Region Extrapolated with the following code block:CASE [Country]WHEN Afghanistan THEN Southern Asia
    WHEN Albania THEN Central and Eastern Europe
    WHEN Algeria THEN Middle East and Northern Africa
    WHEN Angola THEN Sub-Saharan Africa
    WHEN Argentina THEN Latin America and Caribbean
    WHEN Armenia THEN Central and Eastern Europe
    WHEN Australia THEN Australia and New Zealand
    WHEN Austria THEN Western Europe//complete the case statement with the remaining fields in the data setEND

To speed up the tedious creation of a long calculated field, you could download the data to an Excel file and create the calculated field by concatenating the separate parts, as shown here:

Figure 6: Compiling a calculation in Excel

You can then copy them from Excel into Tableau. However, for this exercise, I have created a backup field called Backup, which can be found in the Tableau Starter Workbook associated with this chapter, which contains the full calculation needed for the Region Extrapolated field. Use this at your convenience. The Solutions dashboard also contains all of the countries. You can therefore copy the Region Extrapolated field from that file too.

  • Add a Region Extrapolated option (Value = 10) to the Select Field parameter:

Figure 7: Adding Region Extrapolated to parameter

  • Add the following code to the bottom of the Null & Populated calculated field://beginning of calculated field WHEN 10 THEN IF ISNULL ([Region Extrapolated]) THEN Null Values
    ELSE Populated Values
    END
  • Note that the Region Extrapolated field is now fully populated:

Figure 8: Fully populated Region Extrapolated field

Now lets consider some of the specifics from the previous exercises. Lets look at the following code block:

Note that the complete case statement is several lines long. The following is a representative portion.CASE [% Populated]WHEN 1 THEN IF ISNULL ([Country]) THEN Null Values
ELSE Populated Values //...END

This case statement is a row-level calculation that considers each field in the dataset and determines which rows are populated and which are not. For example, in the representative line of the preceding code, every row of the Country field is evaluated for nulls. The reason for this is that a calculated field will add a new column to the existing data  only in Tableau, not in the data source itself  and every row will get a value. These values can be N/A or null values. The following code is the equivalent of the quick table calculation Percent of Total:SUM([Number of Records]) / TOTAL(SUM([Number of Records]))

In conjunction with the Null & Populated calculated field, it allows us to see what percentage of our fields are actually populated with values.

Its a good idea to get into the habit of writing table calculations from scratch, even if an equivalent quick table calculation is available. This will help you more clearly understand the table calculations.

The following case statement is an example of how you might use one or more fields to extrapolate what another field should be:CASE [Country]WHEN Afghanistan THEN Southern Asia//...END

For example, the Region field in the dataset had a large percentage of null values, and even the existing data had errors. Based on our knowledge of the business (that is, which country belongs to which region) we were able to use the Country field to achieve 100% population of the dataset with accurate information.

Final thoughts

Nulls are a part of almost every extensive real dataset. Understanding how many nulls are present in each field can be vital to ensuring that you provide accurate business intelligence. It may be acceptable to tolerate some null values when the final results will not be substantially impacted, but too many nulls may invalidate results. However, as demonstrated here, in some cases one or more fields can be used to extrapolate the values that should be entered into an underpopulated or erroneously populated field.

As demonstrated in this section, Tableau gives you the ability to effectively communicate to your data team which values are missing, which are erroneous, and how possible workarounds can be invaluable to the overall data mining effort.

I hope you enjoyed this article on null values. I am planning to publish more hands-on exercises in the future. But please let me know in the comment section if any particular topics are especially of interest for you! Have a great rest of your day!

Sources

[1] M. Meier and D. Baldwin: Mastering Tableau 2021 (May 2021), Packt Publishing, ISBN: 9781800561649

Video liên quan