Strange error ValueError: dimension mismatch

Anldra12 · Aug-17-2021, 07:54 AM

I need to test a word2vec model for text-data similarity, but the code below raises this error: ValueError: dimension mismatch

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
 
tokenizer=Tokenizer()
tokenizer.fit_on_texts(documents_df.documents_cleaned)
tokenized_documents=tokenizer.texts_to_sequences(documents_df.documents_cleaned)
tokenized_paded_documents=pad_sequences(tokenized_documents,maxlen=64,padding='post')
vocab_size=len(tokenizer.word_index)+1
 
# read the GloVe word embeddings into a dictionary mapping each word to its vector
embeddings_index = dict()
 
with open(r"D:\Clustering\glove.6B.100d.txt", 'r', encoding="utf8") as file:
    for line in file:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
     
# creating embedding matrix, every row is a vector representation from the vocabulary indexed by the tokenizer index. 
embedding_matrix=np.zeros((vocab_size,100))
 
for word,i in tokenizer.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
         
# calculating average of word vectors of a document weighted by tf-idf
document_embeddings=np.zeros((len(tokenized_paded_documents),100))
words=tfidfvectoriser.get_feature_names()
 
# instead of creating document-word embeddings, directly creating document embeddings
for i in range(documents_df.shape[0]):
    for j in range(len(words)):
        document_embeddings[i]+=embedding_matrix[tokenizer.word_index[words[j]]]*tfidf_vectors[i][j]
         
 
pairwise_similarities=cosine_similarity(document_embeddings)
pairwise_differences=euclidean_distances(document_embeddings)

Error:

Traceback (most recent call last):
  File "D:/Clustering/text-cluster-master/Cos_Sim_Eucliden_distance.py", line 156, in <module>
    document_embeddings[i] += embedding_matrix[tokenizer.word_index[words[j]]] * tfidf_vectors[i][j]
  File "D:\Python3.8.0\Python\lib\site-packages\scipy\sparse\base.py", line 550, in __rmul__
    return (self.transpose() * tr).transpose()
  File "D:\Python3.8.0\Python\lib\site-packages\scipy\sparse\base.py", line 498, in __mul__
    raise ValueError('dimension mismatch')
ValueError: dimension mismatch
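
A likely cause, judging from the scipy\sparse frames in the traceback: tfidf_vectors is a SciPy sparse matrix, and tfidf_vectors[i][j] does not return a single number. The first index slices out a 1 x vocabulary sparse row, and the second index is applied to that row again, so the NumPy vector ends up being multiplied by a sparse matrix instead of a scalar. Indexing both axes at once (tfidf_vectors[i, j]) gives the scalar weight. Below is a minimal sketch with toy data, not the original dataset; the names words, word_index, embedding_matrix and tfidf_vectors mirror the post, while the values, the 4-dimensional vectors and the fallback to row 0 for words missing from the tokenizer are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy stand-ins (hypothetical values) for the objects used in the post.
words = ["apple", "banana", "cherry"]        # TF-IDF vocabulary, in column order
word_index = {"apple": 1, "cherry": 2}       # tokenizer.word_index; "banana" is missing
embedding_matrix = np.array([
    [0.0, 0.0, 0.0, 0.0],                    # row 0: padding, all zeros
    [1.0, 0.0, 0.0, 0.0],                    # apple
    [0.0, 0.0, 4.0, 0.0],                    # cherry
])
tfidf_vectors = csr_matrix(np.array([
    [0.5, 0.0, 0.5],                         # document 0
    [0.0, 1.0, 0.0],                         # document 1
]))

# One embedding row per TF-IDF column. Words the tokenizer does not know
# fall back to row 0 (zeros) instead of raising KeyError (an assumption).
rows = [word_index.get(w, 0) for w in words]
word_vectors = embedding_matrix[rows]

# Index both axes at once to get a scalar weight from the sparse matrix.
weight = tfidf_vectors[0, 2]

# Loop version of the TF-IDF weighted sum, using scalar weights.
loop_embeddings = np.zeros((tfidf_vectors.shape[0], word_vectors.shape[1]))
for i in range(tfidf_vectors.shape[0]):
    for j in range(len(words)):
        loop_embeddings[i] += word_vectors[j] * tfidf_vectors[i, j]

# The same result as a single sparse-by-dense matrix product.
document_embeddings = np.asarray(tfidf_vectors @ word_vectors)
```

The matrix product avoids the Python double loop entirely, which matters once the vocabulary has thousands of columns. Converting with tfidf_vectors.toarray() and keeping the original [i][j] indexing should also work, at the cost of holding the dense matrix in memory.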
