Python Forum
[solved] how to associate value in an array?
#1
Hi,

I'm looking for a fast way to associate a value with each entry of the Mat array, in order to build the associatedValue array.

In practice I'm dealing with an array of more than a million values.

Thanks

Paul


import numpy as np
# initial value -> associated ones
#       12      ->        8
#        3      ->        2
#       14      ->        5
#       13      ->        6
# mat values are not unique

Mat             = np.array([12, 3, 14, 12, 14, 13])   
associatedValue = np.array([ 8, 2,  5,  8,  5,  6]) 

# dict = {'12': 8, '3': 2, '14': 5, '13': 6}

#ref_mat = np.array([[12, 8],
#                    [ 3, 2],
#                    [14, 5], 
#                    [13, 6]])

#associatedTuple = [tuple(row) for row in ref_mat]
Reply
#2
Can you post an example input together with the expected output?
« We can solve any problem by introducing an extra level of indirection »
Reply
#3
Sorry if it's a bit messy, so
  • input = Mat
  • output = associatedValue
Reply
#4
OK, it seems to work. Note that the idea of using a dictionary was suggested to me some years ago on this forum (I do not remember when or where), and after some trials it seems to work.

Any faster solution is still welcome.

import numpy as np
import time
# initial value -> associated ones
#       12      ->        8
#        3      ->        2
#       14      ->        5
#       13      ->        6
# mat values are not unique

Mat             = np.array([12, 3, 14, 12, 14, 13])   
associatedValue = np.array([ 8, 2,  5,  8,  5,  6]).tolist()

k    = (3, 12, 13, 14)
asso = (2,  8,  6, 5)

t0 = time.time()
d = {key: i for i, key in enumerate(k)}  # key -> position of that key in k
output = [asso[d[x]] for x in Mat]       # a missing key raises KeyError
t1 = time.time()
print(f"duration = {t1 - t0}")

check= (associatedValue == output)
print(f'Check = {check}')
Reply
#5
There is a C++ extension library on PyPI: fastremap. It seems to have been written specifically to tackle this problem on large arrays. It is probably worth considering if you have millions of values.

Your pure Python loop would be very slow.
Reply
#6
You import NumPy but then do not use it: the work is done in a pure Python loop, with one dict lookup and one list append per element of Mat.
NumPy is fast when the work stays in vectorized C loops; your code steps out of NumPy and back into Python for every element.
import numpy as np
import time

t0 = time.time()
Mat = np.array([12, 3, 14, 12, 14, 13])
mapping = {12: 8, 3: 2, 14: 5, 13: 6}

u, inv = np.unique(Mat, return_inverse=True)
# Cast to plain int for robust dict lookups; choose dtype to match your values
mapped_u = np.fromiter((mapping[int(x)] for x in u), dtype=np.int64, count=u.size)
associated_value = mapped_u[inv] # shape matches Mat (1D here)
print(associated_value)  # [8 2 5 8 5 6]
t1 = time.time()
print(f"duration = {t1 - t0}")
Output:
[8 2 5 8 5 6]
duration = 0.00018525123596191406
Your output:
Output:
duration = 1.0251998901367188e-05
[8, 2, 5, 8, 5, 6]
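For what it's worth, here is a sketch of another way to keep the whole lookup inside NumPy, with no dict at all. It assumes the keys can be stored in a sorted array (the `keys` and `values` names are mine, taken from the `k` and `asso` tuples of post #4) and uses `np.searchsorted` to find the position of each element of the input. I have not timed it on a million values, so treat it as something to try rather than a guaranteed win.

```python
import numpy as np

# Assumption: keys are sorted in ascending order, as np.searchsorted requires
keys = np.array([3, 12, 13, 14])
values = np.array([2, 8, 6, 5])    # values[i] is associated with keys[i]

mat = np.array([12, 3, 14, 12, 14, 13])

# Position of each element of mat inside the sorted keys
idx = np.searchsorted(keys, mat)

# Guard: an element larger than every key gives idx == keys.size, so clip first,
# then check that every element of mat really is one of the keys
idx = np.clip(idx, 0, keys.size - 1)
if not np.array_equal(keys[idx], mat):
    raise KeyError("mat contains values that are not in keys")

associated_value = values[idx]
print(associated_value)  # [8 2 5 8 5 6]
```

The check against `keys[idx]` is there because `np.searchsorted` returns an insertion position even for values that are not present.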
Reply
#7
I don't know what your data represents. You have numbers associated with other numbers.

Maybe some of your associated numbers are associated with more than 1 value number. That is my assumption.

Maybe the associated numbers represent hours of the day and the values represent windspeed. Just a guess.

I only used numpy to create arrays to work on. I think, in practice, you already have your data all stored in some data container. Millions of it!

So, on this very rainy day, I came up with this:

import numpy as np

# I only use numpy to generate random data sets
# in practice, you will already have your data stored somewhere
# in a databank
rng = np.random.default_rng()
# low, the lowest value, high the highest value, endpoint=True includes high, here 30
# fake data
data = rng.integers(low=5, high=30, size=(30, 4), dtype=np.uint8, endpoint=True)
# put your associated numbers here
ids = rng.integers(low=1, high=4, size=(30, 4), dtype=np.uint8, endpoint=True)

# a dictionary of all the id numbers associated with the data value numbers
# I assume 1 id number may be associated with more than 1 value number
# maybe the ids are the hours or periods of the day
# maybe the value is the windspeed
data_dict = {i: [] for i in range(1,5)}

def get_data():
    for d in data:
        for e in d:
            yield e
            
def get_ids():
    for i in ids:
        for f in i:
            yield f    

# a dictionary of all the id numbers associated with the data value numbers
# store the values in a list under the key of the associated number
# only 4 keys here
data_dict = {i: [] for i in range(1,5)}

# use generators if you have millions of values, so that not everything is handled at once
# create a new generator to iterate again
ids_gen = get_ids()
value_gen = get_data()

# if you don't use data_dict[int(idd)].append(int(value)) you get e.g. np.uint8(7) as value
# although print(idd, value) shows normal numbers:
# the list stores NumPy scalars, whose repr (NumPy 2) is np.uint8(7), while print() uses str()
for idd in ids_gen:
    value = next(value_gen)
    #print(idd, value)
    data_dict[int(idd)].append(int(value))
This gives the following for data_dict:

Output:
{1: [28, 20, 20, 14, 15, 22, 26, 10, 17, 16, 10, 29, 17, 22, 15, 30, 14, 26, 25, 12, 25, 23, 6, 30, 9, 11, 16, 26, 21], 2: [21, 27, 12, 10, 13, 21, 6, 10, 30, 5, 25, 12, 30, 17, 30, 17, 7, 8, 9, 16, 8, 13, 21, 18, 20, 28, 18, 11, 29, 5], 3: [22, 6, 14, 13, 29, 23, 9, 6, 6, 20, 16, 6, 24, 28, 15, 27, 20, 13, 11, 11, 12, 12, 19, 30, 28, 20, 14, 14, 25, 9], 4: [24, 20, 20, 9, 19, 18, 27, 23, 7, 15, 22, 23, 20, 21, 14, 28, 23, 27, 11, 30, 21, 11, 5, 22, 24, 30, 10, 8, 7, 17, 24]}
Of course, if we knew what the data are, and how the numbers relate to each other, it would be easier to know how to proceed!
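If the grouping interpretation above is the right one, the same dictionary can probably be built without a Python loop over every element. The sketch below is only an illustration under that assumption: it uses two small fixed arrays (made-up values, standing in for the random `ids` and `data`), sorts the values by id with a stable sort, and splits them at the positions where the id changes.

```python
import numpy as np

# Small fixed arrays standing in for the random ids/data (hypothetical values)
ids = np.array([[1, 2, 1], [3, 2, 1]], dtype=np.uint8)
data = np.array([[10, 20, 11], [30, 21, 12]], dtype=np.uint8)

flat_ids = ids.ravel()
flat_data = data.ravel()

# A stable sort keeps the original order of the values inside each group
order = np.argsort(flat_ids, kind="stable")
sorted_ids = flat_ids[order]
sorted_data = flat_data[order]

# First position of each distinct id in the sorted array = group boundaries
unique_ids, starts = np.unique(sorted_ids, return_index=True)
groups = np.split(sorted_data, starts[1:])

# tolist() converts NumPy scalars to plain Python ints
data_dict = {int(k): g.tolist() for k, g in zip(unique_ids, groups)}
print(data_dict)  # {1: [10, 11, 12], 2: [20, 21], 3: [30]}
```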
Reply
#8
Quote:Of course, if we knew what the data are, and how the numbers relate to each other, it would be easier to know how to proceed!

Why not:

Mat = element type array
associated value (array) = number of nodes per element

[Image: 1756124293-types.png]
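Since the keys here are small non-negative integer codes, a plain lookup array indexed by the code itself may be the simplest fast option. This is a sketch under that assumption, using only the four pairs from post #1; the `lut` name and the `-1` marker for codes without an entry are my own choices.

```python
import numpy as np

mat = np.array([12, 3, 14, 12, 14, 13])   # element type codes

# Lookup table indexed directly by the type code; -1 marks codes with no entry
lut = np.full(mat.max() + 1, -1, dtype=np.int64)
lut[[3, 12, 13, 14]] = [2, 8, 6, 5]        # code -> number of nodes per element

# One fancy-indexing operation maps the whole array
nodes_per_element = lut[mat]
print(nodes_per_element)  # [8 2 5 8 5 6]
```

A quick `(nodes_per_element == -1).any()` afterwards would reveal any code that has no entry in the table.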
Reply
#9
Aha, progress!

Please look at post #2 above.

Post a small sample of the data you have. How is this data stored? JSON? CSV? A database?

Merci!
Reply
#10
The data comes from a .vtu file read with the vtk library (but that is outside the scope of this post).

Many thanks for your advice.
Reply

