Optimal way to search partial correspondence in a large dict

genny92c · Apr-22-2022, 10:20 AM

I have a set of songs, and each of them has a list of tags associated with it. I am working with senticnet 1.6, which is basically a large dict, to get some values associated with these tags (I need these numbers to train a regressor). This is the function I wrote to find a tag. It simply looks the tag up as written, but since my tags are fairly "recent", most of them are not found. So I thought I could match them by word prefix, by stemming them (sorry for the bad explanation, my English is a bit rusty).

My implementation is very slow, because I stem the tag, stem every key of the dict, and compare the results. That is O(n) per tag, and I have about 20 tags per song and 10k songs, so roughly 200k full scans of the dict. Is there a way to limit the search to only the keys that start with the stemmed version of the tag?

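from nltk.stem import PorterStemmer  # assuming NLTK's PorterStemmer

def get_concept(string, sn):
    # First try the tag exactly as written.
    try:
        return sn.concept(string)
    except Exception:
        pass  # the tag is not a key in senticnet
    # Fallback: stem the tag and compare it with every stemmed key.
    # This scans the whole dict, so it is O(n) per tag.
    ps = PorterStemmer()
    string_stemmed = ps.stem(string)
    for key in sn.data:
        if ps.stem(key) == string_stemmed:
            return sn.concept(key)  # stop at the first match
    return {}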

I was thinking: if I get the list of keys, stem them once, and save the stemmed versions, I would definitely save a lot of time. But most of the time is spent finding the matching key, so my problem would remain. If I stored the stemmed keys in a data structure better suited to searching, though, I could get much better results. Is this a worthwhile idea? Can you suggest something?
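
To make the idea concrete, here is a minimal sketch of two such structures, not a tested solution for senticnet itself. It assumes the keys are plain strings (as in `sn.data`). The names `build_stem_index`, `keys_with_prefix`, and `toy_stem` are made up for illustration, and `toy_stem` is only a stand-in so the example runs without extra packages; `PorterStemmer().stem` could be passed instead. The first structure is a dict from stem to original keys, built once, which makes an exact stem match an average O(1) lookup. The second is a sorted key list searched with `bisect`, which finds every key starting with a given prefix in O(log n) plus the number of matches.

```python
from bisect import bisect_left
from collections import defaultdict


def build_stem_index(keys, stem):
    """Map each stem to the original keys that share it (build this once)."""
    index = defaultdict(list)
    for key in keys:
        index[stem(key)].append(key)
    return dict(index)


def keys_with_prefix(sorted_keys, prefix):
    """Return every key in an already-sorted list that starts with `prefix`."""
    matches = []
    i = bisect_left(sorted_keys, prefix)  # first position where prefix could appear
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer such as PorterStemmer().stem.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Stand-in for sn.data; the values here are placeholders, not real senticnet data.
data = {"dance": 1, "dancing": 2, "dances": 3, "rock": 4, "rocks": 5}

# One-time preprocessing.
stem_index = build_stem_index(data, toy_stem)
sorted_keys = sorted(data)

# Exact stem match: one dict lookup instead of a scan over all keys.
print(stem_index.get(toy_stem("rocks"), []))   # ['rock', 'rocks']

# Prefix match: only the keys that start with the stemmed tag are visited.
print(keys_with_prefix(sorted_keys, "danc"))   # ['dance', 'dances', 'dancing']
```

With either structure the dict keys are stemmed or sorted only once, so the per-tag cost no longer depends on scanning all of senticnet. When a stem maps to several keys, some rule for choosing among them is still needed.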
