Plotting a pie chart with 'other' slice using matplotlib and data prepared via Pandas

In this post we will look at summarizing data with a pie chart that has an 'other' slice to capture the less frequent values.

We will first use the Python Pandas library to load data from a JSON file and prepare it for plotting.

Then we will use matplotlib to render the pie chart.

The data is in JSON format and looks like this:

{
  "data": [
    { "ram_size_gb": 4, "processor_count": 8 },
    { "ram_size_gb": 8, "processor_count": 6 },
    { "ram_size_gb": 12, "processor_count": 16 }
  ]
}

Preparing the data

First, we install the dependencies from a terminal:

python3 -m pip install matplotlib pandas

Next, inside a new Python file, we import the libraries:

import matplotlib.pyplot as plt
import pandas as pd

Next, we use Pandas to read in the JSON file and normalize the data for use:

data1 = pd.read_json("../data/file1.json")
normalized_data = pd.json_normalize(data1['data'])
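To see what the normalization step produces, here is a minimal sketch that runs on the sample JSON from above. It assumes the JSON is inlined as a string rather than read from ../data/file1.json, and it parses with json.loads instead of pd.read_json purely to stay self-contained. The helper name load_records is hypothetical.

```python
import json

import pandas as pd

# Sample data in the same shape as the JSON shown earlier. It is inlined here
# (an assumption for this sketch) instead of being read from a file.
SAMPLE_JSON = """
{ "data": [
    { "ram_size_gb": 4, "processor_count": 8 },
    { "ram_size_gb": 8, "processor_count": 6 },
    { "ram_size_gb": 12, "processor_count": 16 }
] }
"""


def load_records(json_text):
    """Parse the JSON text and flatten the records under 'data' into columns."""
    raw = json.loads(json_text)
    return pd.json_normalize(raw["data"])


normalized_data = load_records(SAMPLE_JSON)
print(normalized_data)
```

Each object under "data" becomes one row, and each key becomes a column.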

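We use Pandas to aggregate the data: group by the column of interest, count the rows in each group, and sort by 'count' in descending order. Here df is assumed to be the normalized dataframe from the previous step, and column holds the name of the column to summarize (for example 'ram_size_gb'):

df_grouped = df.groupby([column])[column].count().reset_index(name='count').sort_values('count', ascending=False)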
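The aggregation can be tried in isolation on a small dataframe. This is a sketch under assumed inputs: the sample values are made up, and 'ram_size_gb' simply stands in for whichever column is being summarized.

```python
import pandas as pd

# Hypothetical sample data; 'ram_size_gb' stands in for the column to summarize.
df = pd.DataFrame({"ram_size_gb": [4, 8, 8, 16, 8, 4]})
column = "ram_size_gb"

# Group by the column, count the rows per group, then sort so the most
# frequent values come first.
df_grouped = (
    df.groupby([column])[column]
    .count()
    .reset_index(name="count")
    .sort_values("count", ascending=False)
)
print(df_grouped)
```

The result has one row per distinct value, with the column itself and a 'count' column. df[column].value_counts() is a shorter way to get the same counts, already sorted in descending order.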

We then take the top 5 categories and sum all remaining categories into a single 'Other' row. (df_grouped_filtered is not defined in this excerpt; it is presumably df_grouped after any filtering, indexed by column so that the index provides the slice labels.)

# Take the top 5 categories. All other categories are aggregated together into one 'Other' group.
TOP_N = 5
df_top_n = df_grouped_filtered[:TOP_N].copy()
df_others = pd.DataFrame(data={
    column: ['Other'],
    'count': [df_grouped_filtered['count'][TOP_N:].sum()]
})
df_others = df_others.set_index(column)

We create a new dataframe that concatenates the top 5 with the 'Other' row:

df_top_n_and_others = pd.concat([df_top_n, df_others])
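The top-N-plus-'Other' step can also be wrapped in a small helper. The following is a sketch rather than the code from the original script: the function name top_n_with_other, its parameters, and the sample counts are all hypothetical, and it assumes the input is indexed by category with a 'count' column.

```python
import pandas as pd


def top_n_with_other(counts, top_n, other_label="Other"):
    """Keep the top_n largest rows by 'count' and sum the rest into one row.

    `counts` is expected to be indexed by category and to have a 'count' column.
    """
    ordered = counts.sort_values("count", ascending=False)
    top = ordered.iloc[:top_n]
    rest_total = ordered["count"].iloc[top_n:].sum()
    if rest_total == 0:
        # Nothing left over, so no 'Other' row is needed.
        return top.copy()
    other = pd.DataFrame(
        {"count": [rest_total]},
        index=pd.Index([other_label], name=counts.index.name),
    )
    return pd.concat([top, other])


# Hypothetical counts, indexed by category.
counts = pd.DataFrame(
    {"count": [50, 30, 10, 5, 3, 2]},
    index=pd.Index(["a", "b", "c", "d", "e", "f"], name="category"),
)
result = top_n_with_other(counts, top_n=3)
print(result)
```

Skipping the 'Other' row when nothing is left over avoids drawing an empty slice when there are no more than top_n categories.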

The data is now ready to plot.

Rendering the pie chart

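We use matplotlib to render the pie chart. The snippet assumes that fig and ax were created beforehand (for example with fig, ax = plt.subplots()) and that short_title holds the chart title: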

df_top_n_and_others.plot(kind="pie", colormap='tab20b', legend=True, title=short_title, y='count', ax=ax, ylabel='')

Finally, we save the chart to disk:

fig.savefig('../charts/my-pie-1.png', bbox_inches="tight")
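Putting the rendering steps together, a self-contained sketch might look like the following. The data values, the title, the figure size, and the output path (a file in the system temp directory) are illustrative assumptions. The Agg backend is selected only so that the script also runs on a machine without a display.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical prepared data: top categories plus an 'Other' row.
# The index supplies the slice labels.
df_top_n_and_others = pd.DataFrame(
    {"count": [50, 30, 10, 10]},
    index=["4", "8", "16", "Other"],
)
short_title = "RAM size (GB)"  # illustrative title

# Create the figure and axes that the plot call draws onto.
fig, ax = plt.subplots(figsize=(6, 6))
df_top_n_and_others.plot(
    kind="pie",
    y="count",
    colormap="tab20b",
    legend=True,
    title=short_title,
    ax=ax,
    ylabel="",
)

# Save to the temp directory (an assumption; the post saves under ../charts/).
output_path = Path(tempfile.gettempdir()) / "my-pie-sketch.png"
fig.savefig(output_path, bbox_inches="tight")
plt.close(fig)
```

Closing the figure after saving releases its memory, which matters when a script renders many charts in a loop.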

The end result looks like this:



Complete Example

For a complete working example, see this reusable Python script, which is part of my Amazon Athena via CLI GitHub project (Athena-CLI).

Further Reading

Pandas is a powerful data manipulation and analysis library for Python.
matplotlib is a plotting library for creating static, animated, and interactive visualizations in Python.

