Plotting a stacked bar chart via Python Pandas and matplotlib

Pandas is a powerful data manipulation library for Python. Combined with a plotting library such as matplotlib, it lets us produce complex charts that clearly summarize data.

In this post, we will look at processing some time-based data of user clicks for particular versions of a software product.

The JSON data looks like this:

{
  "data": [
    {
      "year_and_month": "2022-01",
      "Clicks": 5,
      "version": "1.07"
    },
    {
      "year_and_month": "2022-01",
      "Clicks": 7,
      "version": "1.08"
    },
    {
      "year_and_month": "2022-02",
      "Clicks": 4,
      "version": "1.07"
    }
  ]
}


First, we install the dependencies from a terminal:

python3 -m pip install matplotlib pandas

Next, inside a new Python file, we import the necessary libraries:


import matplotlib.pyplot as plt
import pandas as pd

Next, we use Pandas to read in the JSON data:

jsonFs1 = "../../data/clicks-count-per-month.json"
data1 = pd.read_json(jsonFs1)

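At this point each record is still a dictionary held in the 'data' column, so we normalize the structure into one row per record, with one column per field: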

normalized_data = pd.json_normalize(data1['data'])
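
To see what the normalization step produces without needing the JSON file, here is a minimal sketch that uses an inline copy of the sample records shown earlier. The names payload and normalized_example are only for illustration and are not part of the original script.

```python
import pandas as pd

# Inline copy of the sample records shown earlier, so no file is needed.
payload = {
    "data": [
        {"year_and_month": "2022-01", "Clicks": 5, "version": "1.07"},
        {"year_and_month": "2022-01", "Clicks": 7, "version": "1.08"},
        {"year_and_month": "2022-02", "Clicks": 4, "version": "1.07"},
    ]
}

# json_normalize turns the list of records into one row per record,
# with one column per key (year_and_month, Clicks, version).
normalized_example = pd.json_normalize(payload["data"])
print(normalized_example)
```

Each of the three records becomes a row, which is the flat shape that groupby expects in the next step.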

Aggregation


Now we group by time (year_and_month column) and then version.
We aggregate clicks using the sum function.
Finally, we call unstack() so that we get a separate series of total clicks per month, for each version:

# === Clicks per month ===
# Aggregate, then unstack, so there is one series for each version
df_unstacked = normalized_data.groupby(['year_and_month', 'version'])['Clicks'].sum().unstack()
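
The effect of the group, sum, and unstack steps is easier to see on a tiny data set. The following is a sketch using the three sample records; the variable names are illustrative, and the fill_value=0 variant is an optional extra rather than something the original script is known to use.

```python
import pandas as pd

# The three sample records, already in flat (normalized) form.
records = pd.DataFrame(
    {
        "year_and_month": ["2022-01", "2022-01", "2022-02"],
        "Clicks": [5, 7, 4],
        "version": ["1.07", "1.08", "1.07"],
    }
)

# Sum clicks per (month, version) pair, then pivot the versions into
# columns: months become the index, one column per version.
unstacked_example = (
    records.groupby(["year_and_month", "version"])["Clicks"].sum().unstack()
)

# A month/version pair with no records shows up as NaN after unstack().
# Passing fill_value=0 is one way to get zeros instead.
filled_example = (
    records.groupby(["year_and_month", "version"])["Clicks"]
    .sum()
    .unstack(fill_value=0)
)
print(filled_example)
```

Here version 1.08 has no clicks in 2022-02, so that cell is NaN in unstacked_example and 0 in filled_example.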

The data is now ready to plot.

Plotting the data


Using matplotlib, we arrange a single subplot with figure 'fig' and axis 'ax':

fig, ax = plt.subplots(figsize=(15,7))
ax.set_title("Clicks by Version")
ax.set_xlabel('Year and Month')
ax.set_ylabel('Clicks by version')

Now, we call the plot() method that Pandas provides on the dataframe, passing in our axis, to draw the stacked bar chart:

df_unstacked.plot(ax=ax, colormap='tab20b', kind='bar', stacked=True)

Finally, we save the chart to a file (pngFs1 is a variable holding the output path of the PNG file):

fig.savefig(pngFs1,bbox_inches="tight")
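
Putting the plotting steps together, here is a self-contained sketch that draws a stacked bar chart from a small stand-in dataframe and saves it. The dataframe contents, the "Agg" backend choice, and the temporary output path are assumptions made for illustration; the original script's output path is not shown in this post.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for df_unstacked: months as the index, versions as columns.
example_df = pd.DataFrame(
    {"1.07": [5, 4], "1.08": [7, 0]},
    index=["2022-01", "2022-02"],
)

fig, ax = plt.subplots(figsize=(15, 7))
ax.set_title("Clicks by Version")
ax.set_xlabel("Year and Month")
ax.set_ylabel("Clicks by version")

# One bar per month, with each version's clicks stacked on top of the others.
example_df.plot(ax=ax, colormap="tab20b", kind="bar", stacked=True)

# Hypothetical output location, used here only so the sketch can run anywhere.
example_png = os.path.join(tempfile.mkdtemp(), "clicks-by-version.png")
fig.savefig(example_png, bbox_inches="tight")
plt.close(fig)
```

Selecting the "Agg" backend is only needed when running without a display, for example on a server or in a scheduled job.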

The resulting chart is like this:



Complete Example

For a full code example, see this Python script in my Athena-cli GitHub project.

Further Reading

Pandas: a powerful data manipulation and analysis library for Python.
matplotlib: a plotting library for creating static, animated, and interactive visualizations in Python.
