Hello! I'm working with a dataset that has a rather inconvenient format where one of the columns is basically a list of name-value pair dictionaries. I would like to expand that column such that each of the names is it's own column. So far, I've found a way to do it by manually extracting each of the values, but ideally, I would prefer a more general solution that is also efficient. Here's an example:
Maybe the solution is to use pd.json_normalize on the preferences column, pivot that, then append the various dataframes?
import pandas as pd
data = {'name': ['Alice', 'Bob', 'Clark'],
'preferences': [[{'name': 'fruit', 'value': 'apple'},
{'name': 'drink', 'value': 'lemonade'},
{'name': 'food', 'value': 'pizza'}],
[{'name': 'fruit', 'value': 'orange'},
{'name': 'drink', 'value': 'soda'},
{'name': 'food', 'value': 'soup'}],
[{'name': 'fruit', 'value': 'pear'},
{'name': 'drink', 'value': 'water'},
{'name': 'food', 'value': 'chicken'}]]}
df = pd.DataFrame(data)
# Extract values from 'preferences' column
df['fruit'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'fruit'][0])
df['drink'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'drink'][0])
df['food'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'food'][0])
# Drop the 'preferences' column
df = df.drop(columns=['preferences'])An additional complication is that not every column has the same name-value pairs. In that case, the method above fails (IndexError) without doing an additional check, which is even more inefficient.Maybe the solution is to use pd.json_normalize on the preferences column, pivot that, then append the various dataframes?
