How to most effectively unpack list of name-value pair dictionaries in a dataframe?

zlim · (This post was last modified: Nov-07-2023, 10:45 PM by zlim.)

Hello! I'm working with a dataset that has a rather inconvenient format: one of its columns holds, in each row, a list of name-value pair dictionaries. I would like to expand that column so that each name becomes its own column. So far, I've found a way to do it by extracting each value manually, but I would prefer a more general solution that is also efficient. Here's an example:

import pandas as pd

data = {'name': ['Alice', 'Bob', 'Clark'],
        'preferences': [[{'name': 'fruit', 'value': 'apple'}, 
                         {'name': 'drink', 'value': 'lemonade'},
                         {'name': 'food', 'value': 'pizza'}],
                        [{'name': 'fruit', 'value': 'orange'}, 
                         {'name': 'drink', 'value': 'soda'},
                         {'name': 'food', 'value': 'soup'}],
                        [{'name': 'fruit', 'value': 'pear'}, 
                         {'name': 'drink', 'value': 'water'},
                         {'name': 'food', 'value': 'chicken'}]]}

df = pd.DataFrame(data)

# Extract values from 'preferences' column
df['fruit'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'fruit'][0])
df['drink'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'drink'][0])
df['food'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'food'][0])

# Drop the 'preferences' column
df = df.drop(columns=['preferences'])

An additional complication is that not every row has the same name-value pairs. When a name is missing from a row, the method above raises an IndexError (the list comprehension comes back empty, so [0] fails) unless I add an extra check, which makes it even less efficient.

Maybe the solution is to use pd.json_normalize on the preferences column, pivot the result, and then join it back onto the original dataframe?
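A minimal sketch of that json_normalize/pivot idea, run on data where one pair is missing (Clark has no drink). It assumes every row holds a non-empty list of dicts that each have 'name' and 'value' keys (an empty list explodes to NaN, which json_normalize cannot flatten), and that a name appears at most once per row (pivot raises on duplicate row/name pairs). The helper column 'row' is a name I made up for illustration.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Clark'],
        'preferences': [[{'name': 'fruit', 'value': 'apple'},
                         {'name': 'drink', 'value': 'lemonade'},
                         {'name': 'food', 'value': 'pizza'}],
                        [{'name': 'fruit', 'value': 'orange'},
                         {'name': 'drink', 'value': 'soda'},
                         {'name': 'food', 'value': 'soup'}],
                        [{'name': 'fruit', 'value': 'pear'},
                         {'name': 'food', 'value': 'chicken'}]]}
df = pd.DataFrame(data)

# One entry per name/value dict; the original row label is repeated
exploded = df['preferences'].explode()

# Flatten the dicts into 'name' and 'value' columns and remember which
# original row each pair came from ('row' is a hypothetical helper column)
pairs = pd.json_normalize(exploded.tolist())
pairs['row'] = exploded.index.to_numpy()

# Long -> wide: one column per name; pairs missing from a row become NaN
wide = pairs.pivot(index='row', columns='name', values='value')
wide.index.name = None
wide.columns.name = None

# Attach the new columns back onto the original rows by index
result = df.drop(columns=['preferences']).join(wide)
print(result)
```

Note that pivot sorts the new columns by name, so they will not necessarily come out in the order they appear in the data.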

zlim · Nov-07-2023, 10:56 PM

Updated code to handle rows where some name-value pairs are missing:

import pandas as pd
import numpy as np

def extract_value(pairs, name):
    # Return the value of the first dict whose 'name' matches, or NaN if none does
    return next((item['value'] for item in pairs if item['name'] == name), np.nan)

data = {'name': ['Alice', 'Bob', 'Clark'],
        'preferences': [[{'name': 'fruit', 'value': 'apple'}, 
                         {'name': 'drink', 'value': 'lemonade'},
                         {'name': 'food', 'value': 'pizza'}],
                        [{'name': 'fruit', 'value': 'orange'}, 
                         {'name': 'drink', 'value': 'soda'},
                         {'name': 'food', 'value': 'soup'}],
                        [{'name': 'fruit', 'value': 'pear'}, 
                         {'name': 'food', 'value': 'chicken'}]]}
 
df = pd.DataFrame(data)
 
# Extract values from 'preferences' column
df['fruit'] = df['preferences'].apply(lambda x: extract_value(x, 'fruit'))
df['drink'] = df['preferences'].apply(lambda x: extract_value(x, 'drink'))
df['food'] = df['preferences'].apply(lambda x: extract_value(x, 'food'))
 
# Drop the 'preferences' column
df = df.drop(columns=['preferences'])
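The updated code still hard-codes the names and scans each list once per name. A sketch of a more general variant: collapse each row's list into a single dict once, then let pd.DataFrame create one column per name it finds. pairs_to_dict is a hypothetical helper name. Names missing from a row come out as NaN, and an empty list simply gives a row of NaN; if a name appears twice in the same row, the last value wins, whereas extract_value above keeps the first.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Clark'],
        'preferences': [[{'name': 'fruit', 'value': 'apple'},
                         {'name': 'drink', 'value': 'lemonade'},
                         {'name': 'food', 'value': 'pizza'}],
                        [{'name': 'fruit', 'value': 'orange'},
                         {'name': 'drink', 'value': 'soda'},
                         {'name': 'food', 'value': 'soup'}],
                        [{'name': 'fruit', 'value': 'pear'},
                         {'name': 'food', 'value': 'chicken'}]]}
df = pd.DataFrame(data)

def pairs_to_dict(pairs):
    """Collapse [{'name': k, 'value': v}, ...] into {k: v, ...} (hypothetical helper)."""
    return {item['name']: item['value'] for item in pairs}

# Build one dict per row; a DataFrame built from a list of dicts gets a column
# for every key seen in any row and fills missing keys with NaN
expanded = pd.DataFrame(df['preferences'].map(pairs_to_dict).tolist(),
                        index=df.index)

# Replace the 'preferences' column with the expanded columns
result = df.drop(columns=['preferences']).join(expanded)
print(result)
```

Columns appear in the order the names are first seen (fruit, drink, food here), and no list of names needs to be maintained by hand.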
