I have some code that generates MD5 hashes from IPv6 addresses, then checks them against a list of known MD5 hashes. While trying to speed it up, I profiled it and found that the string conversion was chewing up a lot of CPU time: each IPv6 address has to be converted to a string, then to bytes, before it can be fed to _hashlib.
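For reference, the pipeline described above (address → text → bytes → MD5 → lookup) can be sketched with the public `hashlib` module instead of the private `_hashlib`. This is a minimal sketch under stated assumptions: `hash_ip` and `find_matches` are hypothetical helper names, and the known hashes are assumed to be lowercase hex strings held in a `set`.

```python
import hashlib
from ipaddress import IPv6Address


def hash_ip(ip: IPv6Address) -> str:
    # str() gives the compressed textual form of the address;
    # that text is pure ASCII, so encoding it as ASCII is safe
    return hashlib.md5(str(ip).encode('ascii')).hexdigest()


def find_matches(start: IPv6Address, count: int, known: set) -> list:
    # hypothetical helper: walk `count` consecutive addresses from `start`
    # and collect those whose MD5 hex digest appears in `known`
    matches = []
    for offset in range(count):
        ip = start + offset  # IPv6Address + int yields a new IPv6Address
        if hash_ip(ip) in known:
            matches.append(ip)
    return matches
```

Keeping the known hashes in a `set` rather than a list avoids a linear scan per address. The per-address `str()` and `encode()` calls remain, because (per the question) the correct hash is computed over the address text.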
So, I attempted to speed it up. Here's some code documenting my attempt:
```python
from _hashlib import openssl_md5 as hashMD5
from ipaddress import IPv6Address as IPv6

starting_ip = '2001:4958::'
ip = IPv6(starting_ip)
aa = 208000000000

hashgen = hashMD5((b'%u' % (ip+aa))).hexdigest()
hashgen2 = hashMD5(('%s' % (ip+aa)).encode('utf-8')).hexdigest()
print(hashgen)
print(hashgen2)
```

Output:

```
d6f76fb9ca27fdae847af8ea2f3797e2
6e217802558e0534bfb91f694e045f5e
```

I know the second one (hashgen2) is correct, but why is the first one (hashgen) not returning the correct MD5 hash? If Python 3.5.2 uses Unicode by default, then specifying the 'b' string literal should implicitly encode it as Unicode, right?

What am I doing wrong?
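Ah, figured it out. '%u' is not a Unicode string specifier; it's a printf-style integer conversion (an obsolete alias for '%d'). So b'%u' % (ip+aa) formats the address as its decimal integer value, and that string of digits is what got hashed, not the textual address.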
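A small sketch of the difference, assuming Python 3.5+ (where bytes support %-formatting, per PEP 461). It passes `int(ip)` explicitly, which is what the output in the question suggests happened implicitly, and compares the two byte strings that actually end up being hashed.

```python
import hashlib
from ipaddress import IPv6Address

ip = IPv6Address('2001:4958::') + 208000000000

# '%u' behaves like '%d': the address is rendered as a decimal integer
as_int_text = b'%u' % int(ip)
# the intended input: the textual IPv6 address, encoded to bytes
as_addr_text = str(ip).encode('ascii')

print(as_int_text)   # a long run of decimal digits
print(as_addr_text)  # b'2001:4958::30:6dc4:2000'

# the two inputs differ, so their MD5 digests differ too;
# only the second corresponds to what hashgen2 computes
print(hashlib.md5(as_int_text).hexdigest())
print(hashlib.md5(as_addr_text).hexdigest())
```

The `b` prefix only makes the literal a `bytes` object; it does not change what a format code like `%u` does with its argument.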
