Converting a Python array to a pandas dataframe [multiplying data]

Sometimes a plain Python list of rows (an "array") is simpler to generate or manipulate, but then we need to convert it to a pandas DataFrame for further processing:


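import numpy as np
import pandas as pd

def convert_array_of_rows_to_dataframe(array_of_rows, columns):
    # Each inner list is one row; NumPy turns the list of rows into a 2-D array
    np_array = np.array(array_of_rows)
    dfOut = pd.DataFrame.from_records(np_array)
    # Name the columns in the same order as the values in each row
    dfOut.columns = columns
    return dfOut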

Example call:
my_array = [
    ['This is great', 'positive'],
    ['This is terrible', 'negative'],
]
df = convert_array_of_rows_to_dataframe(my_array, ['text', 'label'])
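
If the rows mix text and numbers, note that a NumPy array holds a single dtype, so the round trip through `np.array` can change the value types. A minimal sketch of an alternative, assuming every row has the same length: pass the list straight to the `pd.DataFrame` constructor. The helper name `rows_to_dataframe` and the `score` column are hypothetical, chosen only for illustration.

```python
import pandas as pd

def rows_to_dataframe(rows, columns):
    # Hypothetical helper: build the DataFrame directly from a list of rows,
    # letting pandas infer a dtype per column.
    return pd.DataFrame(rows, columns=columns)

# Illustrative data: one text column and one integer column
rows = [["This is great", 1], ["This is terrible", 0]]
df = rows_to_dataframe(rows, ["text", "score"])
print(df.dtypes)
```

Here the `score` column stays numeric, which matters if later steps do arithmetic or filtering on it.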

A more complicated example uses Python code to 'multiply' data from a CSV file, filling each noun into several phrase templates, and then converts the result to a DataFrame:


dfIn = pd.read_csv("../data/nouns.csv")
multiplied_rows = []
# reset_index returns a new DataFrame, so the result has to be assigned
dfIn = dfIn.reset_index(drop=True)

phrases = [
    "You are a NOUN", "What a NOUN", "Don't be such a NOUN",
    "This waiter is a NOUN.",
    "She was a NOUN today.",
    "The company here are NOUNs.",
    "That director was a NOUN."
]

# Build one output row per (noun, template) pair
for _, row in dfIn.iterrows():
    for template in phrases:
        multiplied_rows.append([template.replace("NOUN", row['text'].lower())])

dfMultiplied = convert_array_of_rows_to_dataframe(multiplied_rows, ['text'])
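
Because `../data/nouns.csv` is not shown here, the multiplication step can be tried out with a small in-memory stand-in. This is only a sketch: the two nouns, the shortened template list, and the names `df_nouns`, `templates`, and `df_multiplied` are assumptions for illustration, and the one thing it takes from the code above is that the input has a `text` column.

```python
import pandas as pd

# Stand-in for the CSV (assumption: one noun per row in a 'text' column)
df_nouns = pd.DataFrame({"text": ["Hero", "Genius"]})
templates = ["You are a NOUN", "What a NOUN"]

# One single-column row per (noun, template) pair, nouns in the outer loop
rows = [
    [template.replace("NOUN", noun.lower())]
    for noun in df_nouns["text"]
    for template in templates
]

df_multiplied = pd.DataFrame(rows, columns=["text"])
print(df_multiplied)
```

The output has one row for every combination, so its length is the number of nouns times the number of templates.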
