Outputting a data table in Markdown format using Python Pandas

 

Pandas is a powerful data manipulation library for Python. One of its many features is exporting a DataFrame as a Markdown table, which it does with the help of the tabulate library.

In this post, we will look at processing some hardware stats data.

The JSON data looks like this:

{
  "data": [
    { "ram_size_gb": 4, "processor_count": 8 },
    { "ram_size_gb": 8, "processor_count": 6 },
    { "ram_size_gb": 12, "processor_count": 16 }
  ]
}


First, we install the dependencies from a terminal:

python3 -m pip install pandas tabulate

Next, inside a new Python file, we import Pandas (tabulate is called by Pandas behind the scenes, so it needs no import of its own):


import pandas as pd

Next, we use Pandas to read in the JSON data:

jsonFs1 = "../../data/hardware-stats.json"
data1 = pd.read_json(jsonFs1)

Then we normalize the data structure:

df = pd.json_normalize(data1['data'])
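
To see what normalization produces, here is a minimal sketch that inlines the sample JSON instead of reading it from disk. The `raw` and `records` names are illustrative only, and parsing with the standard `json` module is an assumption made so the snippet runs on its own.

```python
import json

import pandas as pd

# Sample data from this post, inlined so the snippet is self-contained.
raw = """
{ "data": [
    { "ram_size_gb": 4, "processor_count": 8 },
    { "ram_size_gb": 8, "processor_count": 6 },
    { "ram_size_gb": 12, "processor_count": 16 }
] }
"""

# The records live under the top-level "data" key.
records = json.loads(raw)["data"]

# json_normalize turns the list of dicts into one row per record,
# with one column per key.
df = pd.json_normalize(records)

print(df.columns.tolist())
print(len(df))
```

Each hardware record becomes one row, and each JSON key becomes a column that can be grouped on.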

Counting and aggregating

Now we count how many records fall into each category, then sort the categories by that count:

column = 'ram_size_gb'

df_grouped = df.groupby([column])[column].count().reset_index(name='count').sort_values('count', ascending = False)
df_grouped = df_grouped.set_index(column)
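
As a rough illustration of what the grouping yields, the sketch below runs the same chain on a small made-up DataFrame. The repeated RAM sizes are hypothetical, chosen only so that the counts differ.

```python
import pandas as pd

# Hypothetical sample with repeated RAM sizes so the counts differ.
df = pd.DataFrame({"ram_size_gb": [4, 8, 4, 12, 4, 8]})
column = "ram_size_gb"

# Count rows per RAM size, sort so the most common size comes first,
# then use the RAM size as the index of the result.
df_grouped = (
    df.groupby(column)[column]
    .count()
    .reset_index(name="count")
    .sort_values("count", ascending=False)
    .set_index(column)
)

print(df_grouped)
```

For a single column, `df[column].value_counts()` should give the same counts in one call, although it returns a Series rather than a DataFrame.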

Next we calculate a percentage column from the counts:

df_grouped['percent'] = (df_grouped['count'] / df_grouped['count'].sum()) * 100
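
Because `ram_size_gb` is the index at this point, the percentage has to be computed from the `count` column. A small sketch with made-up counts (the numbers are hypothetical) shows the arithmetic:

```python
import pandas as pd

# Hypothetical counts per RAM size, indexed the same way as df_grouped.
df_grouped = pd.DataFrame(
    {"count": [3, 2, 1]},
    index=pd.Index([4, 8, 12], name="ram_size_gb"),
)

# Each count as a share of the total, expressed as a percentage.
df_grouped["percent"] = df_grouped["count"] / df_grouped["count"].sum() * 100

print(df_grouped)
```

The percentages always add up to 100, which makes for a quick sanity check before rendering.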

The data is now ready to render.

Rendering the markdown

We use the DataFrame's to_markdown() method, which relies on the tabulate library. The category column is already the index, so it does not need to be set again:

markdown_text = df_grouped.to_markdown()

Finally, we save the markdown text to a new file:

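# md_filepath and title are assumed to be defined earlier in the script
with open(md_filepath, 'w') as f:
    f.write('# ' + title)
    # in text mode, '\n' is translated to the platform's line ending
    f.write('\n\n')
    f.write(markdown_text)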
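
The same save step can be sketched with `pathlib`, writing to a temporary directory so that no real files are touched. The file name, title, and table text below are hypothetical stand-ins for values produced earlier in the script.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-ins for values produced earlier in the script.
title = "RAM (Gb)"
markdown_text = "| ram_size_gb | percent |\n|---:|---:|\n| 4 | 33.33 |"

# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    md_filepath = Path(tmp_dir) / "hardware-stats.md"
    # Heading, blank line, then the table, ending with a newline.
    md_filepath.write_text(f"# {title}\n\n{markdown_text}\n", encoding="utf-8")
    written = md_filepath.read_text(encoding="utf-8")

print(written)
```

Reading the file back is a cheap way to confirm the heading and the table are separated by a blank line, which Markdown renderers generally need in order to recognise the table.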


The resulting markdown is something like this:

# RAM (Gb)

|   ram_size_gb |   percent |
|--------------:|----------:|
|             4 |   23.1707 |
|             8 |   20.7317 |
|            12 |   19.5122 |

Complete Example

For a full code example, see this Python script in my Athena-cli GitHub project.

Further Reading

Pandas is a powerful data manipulation and analysis library for Python.
tabulate is a pretty-printing library for Python, to print out, well, tabular tables!

