Sometimes a Python array (a list of rows) is simpler to generate or manipulate, but we then need to convert it to a Pandas dataframe for further processing:
import numpy as np
import pandas as pd
def convert_array_of_rows_to_dataframe(array_of_rows, columns):
    np_array = np.array(array_of_rows)
    dfOut = pd.DataFrame.from_records(np_array)
    dfOut.columns = columns
    return dfOut
Example call:
my_array = [
    ['This is great', 'positive'],
    ['This is terrible', 'negative']
]
df = convert_array_of_rows_to_dataframe(my_array, ['text', 'label'])
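As a possible shortcut, the pandas DataFrame constructor also accepts a list of rows directly, so the intermediate NumPy array can be skipped. A minimal sketch, where rows_to_dataframe is a hypothetical name that does not appear in the code above:

```python
import pandas as pd

def rows_to_dataframe(rows, columns):
    # Hypothetical helper: pd.DataFrame accepts a list of row lists
    # directly, with the column names passed via `columns`.
    return pd.DataFrame(rows, columns=columns)

my_array = [
    ['This is great', 'positive'],
    ['This is terrible', 'negative'],
]
df = rows_to_dataframe(my_array, ['text', 'label'])
print(df.shape)  # (2, 2)
```

One practical difference: np.array() coerces mixed values (for example strings and integers) to a single type, whereas building the dataframe directly keeps each column's own type.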
A more complicated example uses Python code to 'multiply' data from a CSV file, and then converts the result to a dataframe:
dfIn = pd.read_csv("../data/nouns.csv")
multiplied_rows = []
dfIn = dfIn.reset_index(drop=True)  # reset_index() returns a new dataframe, so assign the result
phrases = [
    "You are a NOUN", "What a NOUN", "Don't be such a NOUN",
    "This waiter is a NOUN.",
    "She was a NOUN today.",
    "The company here are NOUNs.",
    "That director was a NOUN."
]
for _, row in dfIn.iterrows():
    for template in phrases:
        multiplied_rows.append([template.replace("NOUN", row['text'].lower())])
dfMultiplied = convert_array_of_rows_to_dataframe(multiplied_rows, ['text'])
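To try the 'multiply' step without the CSV file, here is a self-contained sketch. It assumes only that the input has a 'text' column, as the loop above does; the two nouns and the shortened phrase list are made up for illustration.

```python
import pandas as pd

# Stand-in for pd.read_csv("../data/nouns.csv"); these nouns are invented.
dfIn = pd.DataFrame({'text': ['Genius', 'Hero']})

phrases = ["You are a NOUN", "What a NOUN"]

# Build one output row per (noun, template) pair.
multiplied_rows = [
    [template.replace("NOUN", noun.lower())]
    for noun in dfIn['text']
    for template in phrases
]

dfMultiplied = pd.DataFrame(multiplied_rows, columns=['text'])
print(len(dfMultiplied))  # 4 rows: 2 nouns x 2 templates
```

The output size is the number of nouns times the number of templates, so a small CSV can grow into a much larger dataframe.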