Apr-22-2022, 10:20 AM
I have got a set of songs, and everyone of these has got a list of tags associated. I am working with senticnet 1.6, which is basically a large dict, to get some values associated with these tags (i need these numbers to train a regressor). This is the function i wrote to find the tag. it simply searches the tag as it's written, but since the tags i got are pretty "recent", it happens that i cant find most of them, so i thought i could find correspondences with words prefixes by stemming them(sorry for the bad explanation, my english is a bit rusty). The way i implemented it is very slow since i stem the tag, stem every key of the dict and compare the results. I was wondering, since the executon built this way is very slow (obviously, i think it's O(n) for every tag, i got 20 tag for song and 10k songs, so...), is there a way to limit the search just for keys that start with the stemmed version of the tag?
def get_concept(string, sn):
concept_infos = {}
try:
concept_infos = sn.concept(string)
except:
ps = PorterStemmer()
string_stemmed = ps.stem(string)
found = False
while not found:
for key in sn.data:
key_stemmed = ps.stem(key)
if string_stemmed == key_stemmed:
concept_infos = sn.concept(key)
found = True
found = True
return concept_infos i was thinking, if i get the list of keys, stem them and save them stemmed i definitely save a lot of time, but the most is spento to find the corresponding one, so my problem still remains. But if i save the stemmed keys in a data structure that is more useful in terms of searching, i could have lot better results. Is it a valuable idea? Can you suggest me something?
