How to create a summary sheet in Excel

Nishan PradhanFollowMar 26, 2021·10 min readSavePython + ExcelAutomate Your Excel Report By Creating A Summary Sheet Using PythonHow to harmonize Python and Excel to summarize your

How to create a summary sheet in Excel
Nishan Pradhan

Nishan PradhanFollow

Mar 26, 2021·10 min read

Save

Python + Excel

Automate Your Excel Report By Creating A Summary Sheet Using Python

How to harmonize Python and Excel to summarize your data with a presentable and professional formatting. A way to improve your Excel reporting.

Photo by Luca Bravo on Unsplash

Introduction

Creating a unique and presentable summary page is the final step to any Excel based reporting work. Without this, you are left with sheets of numbers and formulas with no easy to draw conclusions from your underlying data. By creating a summary sheet, viewers of your workbook will instantly be able to understand the data in an aesthetically pleasing format.

Previously in this Python + Excel series, you learnt how to format an Excel spreadsheet using Python. This time we will look at how to use the formatting code to make a summary sheet/dashboard within Excel.

In my last article I wrote some reasoning into why you would want to use Excel with python, how python with pandas and xlwings could greatly improve your Excel heavy workflow, and included a short tutorial with examples to get you started on stylizing your Excel spreadsheet.

You can find all this information in the link below; it might be a good starting point as the same topics will not be covered in as much detail in this article.

Stylize and Automate Your Excel Files with Python

How to utilize Python to create a stylized Excel report using xlwings, a way for beginners to get started with Python.

towardsdatascience.com

My goal for this article is to help you summarise your data while using the Excel/python formatting skills learnt in my last article. We will be pivoting, grouping and sorting data. Followed by charting, stylizing and adding a logo to a summary page. These skills combined will enable you to summarize and format your data in one simple python script. Hopefully this article is as helpful as the last!

Summarizing Data

If your workflow is Excel heavy, I am sure you have workbooks stretching over multiple tabs, many sheets of PivotTables and even more formulas cross referencing every sheet. This is great  if your goal is to confuse everyone who tries to understand your spreadsheets.

A benefit of using Python to summarize your data is the ability to store your PivotTables and other data, as a DataFrame within a single variable. Referencing in this way is easier to debug rather than attempting to trace numbers through multiple tabs in a spreadsheet.

pandas this module is essential to successfully summarising your data.

According to the pandas site:

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool

In short, pandas contains functions which will do all the data analysis you usually do in Excel. Here are a few of the functions you will find interesting coming from an Excel-based background and documentation to each are hyperlinked if you would like more information:

  • value_counts()
  • drop_duplicates()
  • groupby()
  • describe()
  • pivot_table() (every Excel users favourite)
  • plot()
  • dtypes
  • loc
  • iloc

These are a few examples from the multitude of pandas functions available to manipulate or quickly summarise your data.

Steps to creating your summary sheet

As previously mentioned, the code to format your spreadsheet in this tutorial has been explained in detail in my previous post Stylize and Automate your Excel Files with Python. This tutorial will build upon the previous with a few new functions to create a quick, presentable and comprehensive summary sheet.

I have created a test dataset of fruit and veg sales in csv format to use for both this and the previous tutorial.

Data can be downloaded from Github: link

  1. The first step is to import the modules we will be using. Almost all of the tutorial will be done using pandas and xlwings.import pandas as pd
    import numpy as np
    import xlwings as xw
    import matplotlib.pyplot as plt

2. Import the csv data into a DataFrame using Pandas. There are two ways of doing this, directly reading from Github or downloading to your local drive and referencing. Both are mentioned in the full script. The snippet below will read directly from Github.df = pd.read_csv(rhttps://raw.githubusercontent.com/Nishan-Pradhan/xlwings_dashboard/master/fruit_and_veg_sales.csv")

3. The following steps initialize an Excel Workbook, rename Sheet1 and copy our DataFrame to Excel.wb = xw.Book()
sht = wb.sheets["Sheet1"]
sht.name = "fruit_and_veg_sales"
sht.range("A1").options(index=False).value = df

4. Following on from this, we want to create a new sheet named Dashboard and reference the sheet.wb.sheets.add('Dashboard')
sht_dashboard = wb.sheets('Dashboard')

We now have an Excel Workbook which contains two sheets. fruit_and_veg_sales has our data and Dashboard is blank.

5. We will now start using pandas to manipulate our data and produce summaries of our data to include on our blank sheet. The first step to take is to check the names of the columns we have in our dataset. A quick shortcut to see all of our column names in Python is to run the following line: print(df.columns)

df.columns | Image by Author

From here we can see that many of the column names are awkward containing spaces, brackets, US Dollar symbols, capital letters and brackets.

To prevent us from making trivial errors and to save a great deal of time debugging, it would be wise to copy and paste these column names when referring to them with pandas. (renaming the columns might be preferable in a larger scale project)

6. The first summary we will create for our Dashboard sheet will be a pivot of our data showing the Total Profit per Item sold. To do this we will take advantage of pandas pd.pivot_table() function.pv_total_profit = pd.pivot_table(df, index='Item', values='Total Profit ($)', aggfunc='sum')

Here we create a new DataFrame called pv_total_profit. This DataFrame has an index containing one of each value in our Item column. The values shown are from the Total Profit ($) column in our data and the final input into our function we specified was aggfunc='sum', this tells Pandas that we would like to aggregate our data by summing our valuescolumn, Total Profit ($) in this case.

pv_total_profit DataFrame Pivot | Image by Author

7. We now do the same step again, only this time we want to pivot our data to show us Quantity of each Item sold.pv_quantity_sold = pd.pivot_table(df,index='Item',values='Quantity Sold',aggfunc='sum')

8. You could make an entire dashboard or report of simple pivots and that will look great, however, to make this tutorial more interesting we will use a grouping function: df.groupby() will group your data however you specify.

Since our data pertains to sales figures over the past year, it might be useful to see our data grouped by months rather than days. df.groupby()will achieve this, but first we need to make sure our Date Sold column is actually being read by pandas as a date.

To check this run print(df.dtypes)

You should be able to see from this, our Date Sold column is being read as an object and not in datetime format. This means we will not be able to group our DataFrame easily by month yet.

We can change the data type of this column to datetime like so:df[Date Sold] = pd.to_datetime(df[Date Sold], format=%d/%m/%Y)

Here, pd.to_datetime() formats our column and we specify the format our raw data is in to ensure it is correctly converted from an object to datetime. If you run df.dtypes once again, you will now see that Date Sold is in datetime64[ns] format, this is what we need for our grouping to work.

Data Types: Before (left), After Conversion (Right) | Image by Author

9. With our data now in the correct format, we can use the following line of code to sum and aggregate our data, as well as display the relevant columns.gb_date_sold = df.groupby(df["Date Sold"].dt.to_period('m')).sum()[["Quantity Sold",'Total Revenue ($)',  'Total Cost ($)',"Total Profit ($)"]]

Our data grouped by month now looks like this:

gb_date_sold | Image by Author

10. One final groupby will give us a fourth dataset to use for our dashboard.gb_top_revenue = (df.groupby(df["Date Sold"]).sum().sort_values('Total Revenue ($)',ascending=False).head(8))[["Quantity Sold",'Total Revenue ($)','Total Cost ($)',"Total Profit ($)"]]

This groupby is showing us our Top 8 days by Total Revenue. We get this by sorting our DataFrame by Total Revenue in descending order (Highest revenue at the top), then we use head(8) to give us the top 8 lines of our sorted data.

11. Now that we have our 4 summaries of data, we can move on to the creation of our report. First we start off with some static formatting.# Background
sht_dashboard.range('A1:Z1000').color = (198,224,180)# A:B column width
sht_dashboard.range('A:B').column_width = 2.22# Title
sht_dashboard.range('B2').value = 'Sales Dashboard'
sht_dashboard.range('B2').api.Font.Name = 'Arial'
sht_dashboard.range('B2').api.Font.Size = 48
sht_dashboard.range('B2').api.Font.Bold = True
sht_dashboard.range('B2').api.Font.Color = 0x000000
sht_dashboard.range('B2').row_height = 61.2# Underline Title
sht_dashboard.range('B2:W2').api.Borders(9).Weight = 4
sht_dashboard.range('B2:W2').api.Borders(9).Color = 0x00B050# Subtitle
sht_dashboard.range('M2').value = 'Total Profit Per Item Chart'
sht_dashboard.range('M2').api.Font.Name = 'Arial'
sht_dashboard.range('M2').api.Font.Size = 20
sht_dashboard.range('M2').api.Font.Bold = True
sht_dashboard.range('M2').api.Font.Color = 0x000000# Line dividing Title and Subtitle
sht_dashboard.range('L2').api.Borders(7).Weight = 3
sht_dashboard.range('L2').api.Borders(7).Color = 0x00B050
sht_dashboard.range('L2').api.Borders(7).LineStyle = -4115

The subheadings in the code snippet above should explain what each part of the code is doing.

12. I have created the following function to manually create a specific table format. It is quite long so I will break down what it is doing in this step.

The function above takes 4 inputs, header_cell, title, df_summary, color.

  • header_cell is a String referring to the top right cell where you would like to place your DataFrame summary (e.g. B5).
  • title is a String of what you would like to title your summary (e.g. Top 8 Days of Revenue).
  • df_summary is the Pandas DataFrame you wish to format and place on your Excel Dashboard page.
  • color is a String referring to the predefined color in the function (e.g. blue).

In the function, we first define a dictionary of colors.colors = {purple:[(112,48,160),(161,98,208)],
blue:[(0,112,192),(155,194,230)],
green:[(0,176,80),(169,208,142)],
yellow:[(255,192,0),(255,217,102)]}

Here we have named 4 colors, purple, blue, green and yellow. For each color, there are two shades, a darker shade and a lighter shade both in RGB format within tuples. Any colors can be added here if you would like to expand on this range!

The rest of the formatting is dynamically referencing the header_cell we have specified in the input of the function and automatically formatting the remainder of the table for you.

13. Next we call the function 4 times, once for each of our DataFrame summaries we created.create_formatted_summary('B5','Total Profit per Item', pv_total_profit, 'green')create_formatted_summary('B17','Total Iteams Sold', pv_quantity_sold, 'purple')create_formatted_summary('F17','Sales by Month', gb_date_sold, 'blue')create_formatted_summary('F5','Top 5 Days by Revenue ', gb_top_revenue, 'yellow')

14. Finally, we make a chart using Matplotlib and pandas .plot() function (which calls on Matplotlib)# Makes a chart using Matplotlib
fig, ax = plt.subplots(figsize=(6,3))
pv_total_profit.plot(color='g',kind='bar',ax=ax)# Add Chart to Dashboard Sheet
sht_dashboard.pictures.add(fig,name='ItemsChart',
   left=sht_dashboard.range("M5").left,
   top=sht_dashboard.range("M5").top,
   update = True)

The first section above creates a bar chart, with green bar colors.

Then sht_dashboard.pictures.add() xlwings function, allows us to place an image of this chart on our Excel Dashboard. We specify where we want to place it with the left and top arguments. name will also give our Image a name in Excel.

Bonus

As an extra to make your Excel Dashboard look more professional, we can even add a logo. The code below will download a small example logo in png format from the Github repository made for this tutorial, save it, add it to our Excel Dashboard and then resize it. Make sure you have specified a desired location to save the png image in the FOLDER_PATH variable. (At the top of the script on Github)import requestsFOLDER_PATH = r"path_to_save_folder" # r"C:\Users\Name\Downloads"image_url = rhttps://github.com/Nishan-Pradhan/xlwings_dashboard/blob/master/pie_logo.png?raw=truer = requests.get(image_url, stream = True)
image_path = rf"{FOLDER_PATH}\logo.png"# Saves image to image_path above
file = open(image_path, "wb")
file.write(r.content)
file.close()# Adds image to Excel Dashboardlogo = sht_dashboard.pictures.add(image=image_path,
   name='PC_3',
   left=sht_dashboard.range("J2").left,
   top=sht_dashboard.range("J2").top+5,
   update=True)# Resizes image
logo.width = 54
logo.height = 54

We have now created our dashboard summary for the dataset! Dont forget to save it by running:wb.save(rf{FOLDER_PATH}\fruit_and_veg_dashboard.xlsx)

GIF of Report Creation using Python and Excel

Formatted Summary Sheet using xlwings | GIF by Author

Conclusion

With not much code, we have created an aesthetically pleasing dashboard which quickly summarizes our data. The advantage of doing this in Python is the easy-to-read syntax, the relatively few lines of code and the use of a combination of third-party modules which can save us vast amounts of time in summarising our data.

Using Python to format your Excel reports can be an intermediary step between manually creating an Excel report (or using VBA) and completely replacing Excel with an alternative reporting software. Being able to interact with Excel at this level of granularity allows you to speed up your workflow, automate your reports and improve your Python coding skills, all while keeping a familiar output for end users.

This code should be enough to get you started; you can customize it in any way you want. Try changing the colors, borders, titles, background color, logo, almost anything mentioned here can be altered to match your desired style.

Bonus points to anyone who creates a function to arrange these formatted summaries programmatically!

Code used in this tutorial is available on Github here

If you are stuck, check out the xlwings docs here

If you have any further questions, please leave them in the comments section below.

Note: This article represents my personal views and experiences only.

One Useful Way to Track the End of Lockdown Using Python

Using Python and OpenTables State of the Industry publicly available data to track global emergence from lockdown and

towardsdatascience.com

Video liên quan