Posts: 327
Threads: 82
Joined: Apr 2019
Aug-24-2025, 01:27 PM
(This post was last modified: Aug-26-2025, 05:42 AM by paul18fr.)
Hi,
I'm looking for a fast way to associate a value with each entry of the Mat array, in order to build the associatedValue array.
In practice I'm dealing with an array of more than a million values.
Thanks
Paul
import numpy as np
# initial value -> associated ones
# 12 -> 8
# 3 -> 2
# 14 -> 5
# 13 -> 6
# mat values are not unique
Mat = np.array([12, 3, 14, 12, 14, 13])
associatedValue = np.array([ 8, 2, 5, 8, 5, 6])
# dict = {'12': 8, '3': 2, '14': 5, '13': 6}
#ref_mat = np.array([[12, 8],
# [ 3, 2],
# [14, 5],
# [13, 6]])
#associatedTupe = [tuple(row) for row in ref_mat]
Posts: 4,904
Threads: 79
Joined: Jan 2018
Can you post an example input together with the expected output?
« We can solve any problem by introducing an extra level of indirection »
Posts: 327
Threads: 82
Joined: Apr 2019
Sorry if it's a bit messy, so:
- input = Mat
- output = associatedValue
Posts: 327
Threads: 82
Joined: Apr 2019
OK, it seems to work. Note that the idea of using a dictionary was suggested to me some years ago on this forum (I do not remember when or where), and after some trials it seems to work.
Any faster solution is welcome.
import numpy as np
import time
# initial value -> associated ones
# 12 -> 8
# 3 -> 2
# 14 -> 5
# 13 -> 6
# mat values are not unique
Mat = np.array([12, 3, 14, 12, 14, 13])
associatedValue = np.array([ 8, 2, 5, 8, 5, 6]).tolist()
k = (3, 12, 13, 14)
asso = (2, 8, 6, 5)
t0 = time.time()
d = {key: i for i, key in enumerate(k)}  # key -> position in k (and in asso)
output = [asso[d[x]] for x in Mat]  # raises KeyError if a value has no mapping
t1 = time.time()
print(f"duration = {t1 - t0}")
check= (associatedValue == output)
print(f'Check = {check}')
Posts: 4,904
Threads: 79
Joined: Jan 2018
Aug-24-2025, 05:02 PM
(This post was last modified: Aug-24-2025, 05:02 PM by Gribouillis.)
There is a C++ extension library in Pypi: fastremap. It seems that it was written specifically to tackle this problem on large arrays. It is probably worth considering if you have millions of values.
Your pure Python loop would be very slow.
« We can solve any problem by introducing an extra level of indirection »
Posts: 7,431
Threads: 125
Joined: Sep 2016
You import NumPy but then don't use it: the work is done in a pure Python loop, with one dict lookup and one list append per element of Mat.
NumPy is fast when you keep the work in vectorized C loops; your code steps out of NumPy and back into Python for every element.
import numpy as np
import time
t0 = time.time()
Mat = np.array([12, 3, 14, 12, 14, 13])
mapping = {12: 8, 3: 2, 14: 5, 13: 6}
u, inv = np.unique(Mat, return_inverse=True)
# Cast to plain int for robust dict lookups; choose dtype to match your values
mapped_u = np.fromiter((mapping[int(x)] for x in u), dtype=np.int64, count=u.size)
associated_value = mapped_u[inv] # shape matches Mat (1D here)
print(associated_value) # [8 2 5 8 5 6]
t1 = time.time()
print(f"duration = {t1 - t0}")
Output:
[8 2 5 8 5 6]
duration = 0.00018525123596191406
Your output:
duration = 1.0251998901367188e-05
[8, 2, 5, 8, 5, 6]
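When the keys and their associated values are already known as two small arrays, the dictionary can be skipped altogether. The following is only a sketch under that assumption, using np.searchsorted on the sorted keys; the function name remap_sorted and its parameters are hypothetical, not something from the posts above.

```python
import numpy as np

def remap_sorted(values, keys, mapped):
    """Map each entry of `values` through keys[i] -> mapped[i] without a Python loop."""
    values = np.asarray(values)
    keys = np.asarray(keys)
    mapped = np.asarray(mapped)
    order = np.argsort(keys)              # searchsorted requires sorted keys
    sorted_keys = keys[order]
    sorted_mapped = mapped[order]
    pos = np.searchsorted(sorted_keys, values)
    # Clip so that values larger than every key do not index out of bounds
    pos = np.clip(pos, 0, sorted_keys.size - 1)
    # Any value that is not an exact key match has no associated entry
    if not np.array_equal(sorted_keys[pos], values):
        raise KeyError("some values have no associated entry")
    return sorted_mapped[pos]

Mat = np.array([12, 3, 14, 12, 14, 13])
result = remap_sorted(Mat, keys=[12, 3, 14, 13], mapped=[8, 2, 5, 6])
print(result)
```

The cost is one binary search per element over the (small) key array, all inside NumPy, so it should scale to millions of values; actual timings will depend on the machine and the data.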
Posts: 1,301
Threads: 151
Joined: Jul 2017
I don't know what your data represents. You have numbers associated with other numbers.
Maybe some of your associated numbers are associated with more than 1 value number. That is my assumption.
Maybe the associated numbers represent hours of the day and the values represent windspeed. Just a guess.
I only used numpy to create arrays to work on. I think, in practice, you already have your data all stored in some data container. Millions of it!
So, on this very rainy day, I came up with this:
import numpy as np
# I only use numpy to generate random data sets
# in practice, you will already have your data stored somewhere
# in a databank
rng = np.random.default_rng()
# low, the lowest value, high the highest value, endpoint=True includes high, here 30
# fake data
data = rng.integers(low=5, high=30, size=(30, 4), dtype=np.uint8, endpoint=True)
# put your associated numbers here
ids = rng.integers(low=1, high=4, size=(30, 4), dtype=np.uint8, endpoint=True)
# a dictionary of all the id numbers associated with the data value numbers
# I assume 1 id number may be associated with more than 1 value number
# maybe the ids are the hours or periods of the day
# maybe the value is the windspeed
data_dict = {i: [] for i in range(1,5)}
def get_data():
    for d in data:
        for e in d:
            yield e
def get_ids():
    for i in ids:
        for f in i:
            yield f
# store the values in a list under the key of the associated number
# only 4 keys here
# use generators if you have millions of values, don't load everything at once
# recreate the generators to iterate again
ids_gen = get_ids()
value_gen = get_data()
# if you don't use data_dict[int(idd)].append(int(value)) you get e.g. np.uint8(7) as value
# although print(idd, value) shows normal numbers
# not sure what the problem is there
for idd in ids_gen:
    value = next(value_gen)
    #print(idd, value)
    data_dict[int(idd)].append(int(value))
This gives the following for data_dict:
Output: {1: [28, 20, 20, 14, 15, 22, 26, 10, 17, 16, 10, 29, 17, 22, 15, 30, 14, 26, 25, 12, 25, 23, 6, 30, 9, 11, 16, 26, 21], 2: [21, 27, 12, 10, 13, 21, 6, 10, 30, 5, 25, 12, 30, 17, 30, 17, 7, 8, 9, 16, 8, 13, 21, 18, 20, 28, 18, 11, 29, 5], 3: [22, 6, 14, 13, 29, 23, 9, 6, 6, 20, 16, 6, 24, 28, 15, 27, 20, 13, 11, 11, 12, 12, 19, 30, 28, 20, 14, 14, 25, 9], 4: [24, 20, 20, 9, 19, 18, 27, 23, 7, 15, 22, 23, 20, 21, 14, 28, 23, 27, 11, 30, 21, 11, 5, 22, 24, 30, 10, 8, 7, 17, 24]}
Of course, if we knew what the data are, and how the numbers relate to each other, it would be easier to know how to proceed!
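If the ids and values are already in NumPy arrays, the same grouping can probably be done without generators. This is a minimal sketch that assumes the same shapes and ranges as the fake data above; the fixed seed 0 is an assumption added here only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability only
data = rng.integers(low=5, high=30, size=(30, 4), endpoint=True)
ids = rng.integers(low=1, high=4, size=(30, 4), endpoint=True)

# Flatten both arrays in the same (row-major) order so entries stay paired
flat_data = data.ravel()
flat_ids = ids.ravel()

# One boolean mask per distinct id; tolist() converts to plain Python ints
data_dict = {int(i): flat_data[flat_ids == i].tolist() for i in np.unique(flat_ids)}

for key, values in data_dict.items():
    print(key, len(values))
```

Each mask is a full pass over the data, so this suits a small number of distinct ids, as in the example with four keys.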
Posts: 327
Threads: 82
Joined: Apr 2019
Aug-25-2025, 12:05 PM
Quote:Of course, if we knew what the data are, and how the numbers relate to each other, it would be easier to know how to proceed!
Why not:
Mat = array of element types
associated value (array) = number of nodes per element
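Since element types are small non-negative integers, a direct lookup table may be the simplest vectorized option. The sketch below assumes that, and reuses the four pairs from the first post; the names lut, keys and nodes are hypothetical, and -1 is an arbitrary marker chosen here for unmapped types.

```python
import numpy as np

# element type -> number of nodes, taken from the example in the first post
keys = np.array([12, 3, 14, 13])
nodes = np.array([8, 2, 5, 6])

# Lookup table indexed directly by element type; -1 marks unmapped types
lut = np.full(keys.max() + 1, -1, dtype=np.int64)
lut[keys] = nodes

Mat = np.array([12, 3, 14, 12, 14, 13])
associated = lut[Mat]  # one fancy-indexing pass, no Python loop
print(associated)

# Same idea on a large array of a million element types
rng = np.random.default_rng(0)
big = rng.choice(keys, size=1_000_000)
big_associated = lut[big]
print(bool((big_associated == -1).any()))  # True would mean an unmapped type
```

The table costs max(keys) + 1 entries of memory, so this only makes sense when the largest key is small; any value in Mat above keys.max() would raise an IndexError.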
Posts: 1,301
Threads: 151
Joined: Jul 2017
Aha, progress!
Please look at post #2 above.
Post a little of the data you have. How is this data stored? json? csv? databank?
Merci!
Posts: 327
Threads: 82
Joined: Apr 2019
The data comes from a .vtu file read with the vtk library (however, that's outside the scope of this post).
Many thanks for your advice.