Code that generates MD5 hashes from IPv6 addresses giving different answers?

PyMD5 · (This post was last modified: Oct-16-2016, 06:19 PM by Yoriz.)

I have some code that generates MD5 hashes from IPv6 addresses and then checks them against a list of known MD5 hashes. While trying to speed it up, I profiled it and found that the string conversion was using a lot of CPU time: each IPv6 address has to be converted to a string, then to bytes, before it can be fed to _hashlib.
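For readers following along, the workflow described above (hash each address, then look the digest up among known hashes) might look roughly like this minimal sketch. The function name `find_matches`, the public `hashlib.md5` call, and storing the known hashes in a `set` for fast membership tests are illustrative assumptions, not the poster's actual code.

```python
import hashlib
from ipaddress import IPv6Address


def find_matches(start: str, count: int, known_hashes: set) -> list:
    """Hypothetical helper: hash `count` consecutive addresses from `start`
    and return the ones whose MD5 hex digest appears in `known_hashes`."""
    base = IPv6Address(start)
    matches = []
    for offset in range(count):
        addr = base + offset
        # Hash the compressed colon-hex text form, e.g. '2001:4958::2a'.
        digest = hashlib.md5(str(addr).encode('ascii')).hexdigest()
        if digest in known_hashes:  # set lookup is O(1) on average
            matches.append(addr)
    return matches


# Usage: plant one known hash and search the first 100 addresses for it.
target = hashlib.md5(b'2001:4958::2a').hexdigest()
print(find_matches('2001:4958::', 100, {target}))  # [IPv6Address('2001:4958::2a')]
```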

So, I attempted to speed it up. Here's some code documenting my attempt:

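```python
# _hashlib is CPython's private OpenSSL wrapper; hashlib.md5 is the public API.
from _hashlib import openssl_md5 as hashMD5
from ipaddress import IPv6Address as IPv6

starting_ip = '2001:4958::'
ip = IPv6(starting_ip)
aa = 208000000000

# Variant 1: bytes %-formatting with %u
hashgen = hashMD5(b'%u' % (ip + aa)).hexdigest()
# Variant 2: str %-formatting with %s, then encode the text to bytes
hashgen2 = hashMD5(('%s' % (ip + aa)).encode('utf-8')).hexdigest()

print(hashgen)
print(hashgen2)
```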

Output:
```
d6f76fb9ca27fdae847af8ea2f3797e2
6e217802558e0534bfb91f694e045f5e
```

I know the second one (hashgen2) is correct, so why doesn't the first one (hashgen) return the correct MD5 hash? If Python 3.5.2 uses Unicode by default, then putting the b prefix on the string literal should implicitly encode it as Unicode, right?

What am I doing wrong?

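Ah, figured it out. '%u' is not a Unicode specifier; it is an integer conversion. So hashgen hashes the address's decimal integer value rather than its colon-hex text form, which is why the two hashes differ.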
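A short sketch of what each variant actually feeds to MD5, using the same starting address and offset as the question. It calls `int(addr)` explicitly instead of relying on `%u` to convert the address object implicitly.

```python
from ipaddress import IPv6Address

addr = IPv6Address('2001:4958::') + 208000000000

# %u (like %d) renders the address's 128-bit integer value in decimal digits.
as_int = b'%u' % int(addr)
# str() renders the compressed colon-hex text that hashgen2 ends up hashing.
as_text = str(addr).encode('ascii')

print(as_int)   # a long run of decimal digits
print(as_text)  # b'2001:4958::30:6dc4:2000'
```

Because the two byte strings differ, their MD5 digests differ too.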

**Larz60+** · (This post was last modified: Oct-16-2016, 03:03 AM by Larz60+.)

Hello,

You are correct, the u is for unsigned character. It originated in C as a data type, originally for an 8 bit byte, where you wanted to use the full 255 possible values.
Without it, the range would be -128 to 127.

PyMD5 · Oct-16-2016, 03:29 AM

Yeah. It's too bad, too... hashgen was about 4 times faster than hashgen2.
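hashgen was faster because it skipped the colon-hex formatting, but since hashgen2 is the one that matches, that text is what has to be hashed. One possible direction, not benchmarked here, is to keep the address as a plain `int` and build the compressed text directly, collapsing the longest run of two or more zero hextets into `::` the way `ipaddress` does for plain addresses. The names `ipv6_text` and `md5_of_ipv6` are hypothetical, IPv4-mapped and scoped forms are not handled, and whether this actually beats `str(IPv6Address(...))` should be measured with `timeit`.

```python
import hashlib
from ipaddress import IPv6Address


def ipv6_text(n: int) -> str:
    """Compressed IPv6 text for a 128-bit int (plain addresses only)."""
    hextets = [(n >> shift) & 0xFFFF for shift in range(112, -1, -16)]
    # Locate the longest run of zero hextets; the first run wins on ties.
    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for i, h in enumerate(hextets):
        if h == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    parts = ['%x' % h for h in hextets]
    if best_len > 1:  # a single zero hextet is never compressed
        head = ':'.join(parts[:best_start])
        tail = ':'.join(parts[best_start + best_len:])
        return head + '::' + tail
    return ':'.join(parts)


def md5_of_ipv6(n: int) -> str:
    return hashlib.md5(ipv6_text(n).encode('ascii')).hexdigest()


n = int(IPv6Address('2001:4958::')) + 208000000000
print(ipv6_text(n))  # 2001:4958::30:6dc4:2000
print(md5_of_ipv6(n) == hashlib.md5(str(IPv6Address(n)).encode('utf-8')).hexdigest())  # True
```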

**Larz60+** · Oct-16-2016, 06:33 AM

If you know C (I expect the hash algorithm is written in C), you might want to take a look at the two algorithms.
Most that I have written or borrowed (I used a modified version of Aho's from the dragon book) were actually quite simple:
they were usually fed a seed that was the size of the hash table and manipulated the key through an iterative process of masks and
bit shifts, in only a few lines of code.
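As an illustration of how short such a function can be, here is a Python sketch of the classic `hashpjw` string hash from Aho, Sethi and Ullman's dragon book: shift in four bits per character, fold the top nibble back in, then reduce modulo the table size. This is the textbook version, not Larz60+'s modified one, and the 32-bit mask is there only to emulate a C `unsigned` register.

```python
def hashpjw(key: str, table_size: int) -> int:
    """Classic PJW string hash; the table size (ideally prime) is the final modulus."""
    h = 0
    for ch in key:
        h = ((h << 4) + ord(ch)) & 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around
        g = h & 0xF0000000                     # top nibble that is about to overflow
        if g:
            h ^= g >> 24  # fold it back into the low bits
            h ^= g        # and clear it from the top
    return h % table_size


print(hashpjw('ab', 10**6))  # 1650, i.e. (97 << 4) + 98
print(hashpjw('2001:4958::2a', 211))
```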

What you did with it afterwards is where it can get more complicated (although, with care, this can be simple as well). The one that I used for processing
a day's worth of phone calls (~80 million calls) used a lateral extension, which was actually a linked list, whenever a collision was encountered. Because the
size of the table was part of the hash, the distribution was very even. Handling collisions with linked lists had the (very good) side effect of never running
out of space.
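The collision scheme described above is usually called separate chaining. A minimal sketch follows, assuming a Python list as each slot's chain (standing in for the linked list) and Python's built-in `hash()` reduced modulo the table size as the hash function; the class and method names are illustrative only.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a chain of (key, value) pairs."""

    def __init__(self, size: int = 211):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def _chain(self, key):
        # The table size is part of the hash: reduce modulo the slot count.
        return self.slots[hash(key) % self.size]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # existing key: overwrite in place
                return
        chain.append((key, value))  # new key or collision: the chain just grows

    def get(self, key, default=None):
        for k, v in self._chain(key):
            if k == key:
                return v
        return default

    def longest_chain(self) -> int:
        return max(len(chain) for chain in self.slots)


table = ChainedHashTable(size=7)
for i in range(100):  # far more keys than slots, yet the table never fills up
    table.put('call%d' % i, i)
print(table.get('call42'))  # 42
print(table.longest_chain())
```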

This algorithm could process a day's calls (identifying the customer, distance between points, number of points, segment rating, etc.) in twenty minutes.
The lateral lists never got too long, so they caused little delay.

I got into hashing in a big way; saving a few computer cycles on a single call really added up when you were processing so many.

Should you get interested and investigate the python hashes, I'd be interested in what you find.

Larz60+

Skaperen · Oct-17-2016, 02:39 AM

> (Oct-16-2016, 02:59 AM) Larz60+ wrote:
> Hello,
>
> You are correct, the u is for unsigned character. It originated in C as a data type, originally for an 8 bit byte, where you wanted to use the full 255 possible values.
> Without it, the range would be -128 to 127.

It's for unsigned (int is implied, which is 32 bits on common platforms) in C. Python accepts it too, where it behaves the same as %d:

Output:
```
lt1/forums /home/forums 10> py2
Python 2.7.12 (default, Jul  1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '%u' % (2**30,)
'1073741824'
>>>
lt1/forums /home/forums 11> py3
Python 3.5.2 (default, Sep 10 2016, 08:21:44)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%u' % (2**30,)
'1073741824'
>>>
```
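One difference from C worth knowing: Python's documentation lists `%u` as an obsolete type identical to `%d`, so it does no unsigned wrap-around. A quick check, including the bytes formatting (Python 3.5+) used in the original question:

```python
print('%u' % (2**30,))   # 1073741824
print('%u' % (-1,))      # -1 (C's printf("%u", -1) would give 4294967295 with a 32-bit unsigned int)
print(b'%u' % (2**30,))  # b'1073741824'
```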
