async for

wavic · May-21-2017, 08:22 AM

Well, I have a list of URLs and I want to iterate over it to get the pages as fast as possible.
I know there is an async for loop, but I can't figure out how it works.

Basically, this is what I want:

import aiohttp
from bs4 import BeautifulSoup

# urls

async for link in urls:
    print('{},{}'.format(*await get_email(link)))  # this is simplified. I am doing something else

# get_email
async def get_email(link):
    page = await fetch(link)
    soup = BeautifulSoup(page, 'lxml')
    name = soup.find('div', class_='MProwD').text.strip().lower().title()
    try:
        email = soup.find('div', class_='MPinfo').find_all('a')[-1]['href'].split(':')[1].strip()
    except (AttributeError, IndexError, KeyError):
        # missing div, no links, no href, or an href without ':'
        email = 'Unknown'

    return name, email

# fetch
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.read()  # read() is a coroutine, so it must be awaited

Before it started throwing an error, I hadn't noticed any performance difference from the regular (synchronous) program.
I have changed the code so many times that now it gives me an error and I don't even know what caused it.
I can't get this async stuff very well yet.

I've tried subclassing list, as I saw on some web pages, to get an object with __aiter__.
That didn't work.
I've also tried yielding each list element:

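def list_gen(l):
    # still a regular (synchronous) generator, so async for rejects it too
    i = 0
    while True:
        try:
            item = l[i]
        except IndexError:  # indexing past the end raises IndexError, not StopIteration
            return
        yield item
        i += 1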
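The missing piece in the generator attempt is that `async for` needs an *asynchronous* iterable; a plain generator has no `__aiter__`. A minimal sketch, assuming Python 3.7+ (async generators arrived in 3.6, `asyncio.run` in 3.7), wraps the list in an `async def` generator. `alist_gen` and `collect` are hypothetical names for illustration. This only makes the syntax legal; it does not fetch anything concurrently.

```python
import asyncio


async def alist_gen(items):
    # An async generator: "yield" inside "async def" gives the object
    # __aiter__/__anext__, so "async for" accepts it.
    for item in items:
        yield item


async def collect(items):
    found = []
    # Each pass awaits __anext__ once; items arrive one at a time, in order.
    async for item in alist_gen(items):
        found.append(item)
    return found


links = ["http://example.com/a", "http://example.com/b"]  # placeholder URLs
print(asyncio.run(collect(links)))  # ['http://example.com/a', 'http://example.com/b']
```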

**Larz60+** · May-21-2017, 05:05 PM

nilamo had a good post on async: https://python-forum.io/Thread-Exploring...ight=async

wavic · (This post was last modified: May-21-2017, 07:09 PM by wavic.)

I've done it, but not with asyncio. I got rid of all of this and used concurrent.futures instead:

from concurrent import futures

with futures.ThreadPoolExecutor(200) as executor:
    results = executor.map(get_email, links)  # the result is a generator, so you have to list() it

The execution time dropped by more than a factor of two.
Three lines of code, including the import statement, and such a difference!

***snippsat*** · May-21-2017, 09:03 PM

Quote:The execution time dropped more than two times

It can/should drop a lot more;
of course it depends on the task, whether you are downloading larger pieces or just getting text.
Try using ProcessPoolExecutor,
e.g.:

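from concurrent import futures

with futures.ProcessPoolExecutor(max_workers=20) as executor:
    results = executor.map(get_email, links)  # map() calls get_email once per link; submit() would pass the whole list in one call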

Let's say downloading 100 images from a site takes about 2 minutes;
in my tests that dropped to 15-20 seconds with ProcessPoolExecutor.
Of course it's heavy to launch ProcessPoolExecutor (multiprocessing) for this,
but the speed is really great.

wavic · (This post was last modified: May-23-2017, 08:03 AM by wavic.)

So, I tried ProcessPoolExecutor and managed to reduce the running time to 5.661 sec, the best result from a few trials.
Since this is networking the numbers are relative, but it is still faster than ThreadPoolExecutor. I have to try different worker counts for ThreadPoolExecutor too.
I tried a few values for max_workers, and 32 looks like the optimum.

A small change to collect the results:

from concurrent import futures

results = []
with futures.ProcessPoolExecutor(max_workers=32) as executor:
    for result in executor.map(get_email, links):
        results.append(result)

**nilamo** · Jun-06-2017, 05:59 PM

(May-21-2017, 05:05 PM)Larz60+ Wrote: nilamo had a good post on async: https://python-forum.io/Thread-Exploring...ight=async

That post only covers async functions. I'm not sure how async for loops or async with blocks are supposed to work. Does each element of the iterable run asynchronously? Or is it just syntactic sugar that lets you use await in the body of the block? I... have no idea.
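On whether the elements run asynchronously: as far as I understand PEP 492, they don't. `async for` awaits the iterator's `__anext__` once per pass and runs the body in order, so the loop itself is sequential; what it buys you is that the event loop can run *other* tasks while `__anext__` (or an `await` in the body) is suspended. A small timing sketch, assuming Python 3.7+; `slow_numbers` is a hypothetical async generator that sleeps to simulate I/O.

```python
import asyncio
import time


async def slow_numbers(n, delay):
    # Hypothetical async generator standing in for I/O-bound work.
    for i in range(n):
        await asyncio.sleep(delay)  # suspension point: other tasks could run here
        yield i


async def consume(n, delay):
    start = time.perf_counter()
    seen = []
    async for i in slow_numbers(n, delay):
        seen.append(i)  # the body runs once per item, strictly in order
    return seen, time.perf_counter() - start


seen, elapsed = asyncio.run(consume(5, 0.1))
print(seen)               # [0, 1, 2, 3, 4]
print(f"{elapsed:.2f} s")  # roughly 5 * 0.1 s: the sleeps add up, they do not overlap
```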

I'm also a little surprised a bare async for loop would work. I thought that was a syntax error, and that they had to be contained within an async callable.

Quick test:

>>> async for _ in range(50):
  File "<stdin>", line 1
    async for _ in range(50):
            ^
SyntaxError: invalid syntax
>>> async def spam():
...   async for _ in range(50):
...     pass
...
>>>

OK, so a bare for loop can't be async; it must be inside an async callable. I can't actually offer help, though, since I don't know what it's supposed to do :/
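One more detail on the quick test: `spam()` only *defines* cleanly. `range` has no `__aiter__`, so the loop fails at run time with a `TypeError` once the coroutine is actually awaited. A sketch, assuming Python 3.7+ for `asyncio.run`:

```python
import asyncio


async def spam():
    async for _ in range(50):
        pass


error = None
try:
    asyncio.run(spam())
except TypeError as exc:
    # range is a plain (synchronous) iterable, so "async for" rejects it
    error = exc

print(error)
```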

**nilamo** · Jun-06-2017, 06:04 PM

(May-23-2017, 08:03 AM)wavic Wrote: So, I have tried ProcessPoolExecutor and I managed to reduce the running time to 5.661 sec as the best results from few trials.

It's also worth noting that starting a new thread/process is not "free". They take time to spin up. That's one of the main benefits of async... there is no downtime with setup, you just get to do things while waiting for something else to finish (like network traffic).
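Applied to the original problem, the usual asyncio pattern is to start all the requests and await them together with `asyncio.gather`, rather than awaiting each one inside a loop. In the sketch below `fake_fetch` is a hypothetical stand-in that sleeps instead of hitting the network, so the timing difference is visible without aiohttp; it assumes Python 3.7+. With real aiohttp code you would also share one `ClientSession` across requests instead of opening one per URL.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    # Hypothetical stand-in for an HTTP request: just waits, then returns.
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def fetch_sequential(urls):
    # Awaiting inside the loop: each request waits for the previous one.
    return [await fake_fetch(u) for u in urls]


async def fetch_concurrent(urls):
    # gather() schedules every coroutine at once and keeps the input order.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


urls = [f"http://example.com/{i}" for i in range(10)]  # placeholder URLs
seq, seq_time = timed(fetch_sequential(urls))
con, con_time = timed(fetch_concurrent(urls))
print(f"sequential: {seq_time:.2f} s, concurrent: {con_time:.2f} s")
```

The sequential version costs roughly the sum of the delays, the gathered one roughly the longest single delay.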

Quote:

results = []
with futures.ProcessPoolExecutor(max_workers=32) as executor:
    for result in executor.map(get_email, links):
        results.append(result)

I'm not sure what that does, but... are you having all 32 workers process the same list of links? Wouldn't you want to break those apart so each worker processes a different list?
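On that question: `executor.map(fn, iterable)` does not give the whole list to every worker. Like the built-in `map`, it calls `fn` once per element, and the pool hands each call to whichever worker is free, so the list is split up automatically and the results come back in input order. A small sketch with a thread pool (the same `map` semantics apply to `ProcessPoolExecutor`); `get_email_stub` is a hypothetical stand-in for the real `get_email`.

```python
import threading
from concurrent import futures


def get_email_stub(link):
    # Hypothetical stand-in: report which link this call got and which thread ran it.
    return link, threading.get_ident()


links = [f"http://example.com/{i}" for i in range(20)]  # placeholder URLs

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(get_email_stub, links))

handled = [link for link, _ in results]
workers = {ident for _, ident in results}
print(handled == links)   # True: each link processed exactly once, in order
print(len(workers) <= 4)  # True: the calls were spread over at most 4 threads
```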

wavic · (This post was last modified: Jun-07-2017, 06:08 AM by wavic.)

As far as I know, the iterable in an async for statement has to be an asynchronous iterable (something like an async generator), not a plain list. I had trouble achieving this; maybe it's a lack of experience, I don't know. This special kind of iterable has to be constructed before the loop and then used in it:

async for number in AsyncIterClass(iterable):
    pass  # action

I read on SO about async list comprehensions and I will try that:

list_ = [await function(element) async for element in AsyncIterClass(iterable)]

Perhaps this asynchronous iterable class should be something like this:

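class Aiter:
    def __init__(self, iterable):
        self.iter_ = iter(iterable)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            element = next(self.iter_)  # next() returns a plain value, so there is nothing to await
        except StopIteration:
            raise StopAsyncIteration  # the async protocol signals exhaustion this way

        return element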

After that, both async for and the async comprehension should work.
I haven't tried it yet.
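A runnable version of the `Aiter` idea, exercised with both `async for` and an async comprehension, might look like this (a sketch assuming Python 3.7+ for `asyncio.run`; `double` and `main` are hypothetical). It may also explain the PEP's remark: nothing inside `__anext__` ever suspends, so wrapping an ordinary list this way makes `async for` legal but adds no concurrency.

```python
import asyncio


class Aiter:
    """Wrap an ordinary iterable so that it supports "async for"."""

    def __init__(self, iterable):
        self.iter_ = iter(iterable)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            # next() returns a plain value; there is nothing to await here
            return next(self.iter_)
        except StopIteration:
            # the async iteration protocol ends with StopAsyncIteration
            raise StopAsyncIteration


async def double(x):
    # Hypothetical coroutine standing in for "function" in the post.
    await asyncio.sleep(0)
    return 2 * x


async def main():
    looped = []
    async for number in Aiter([1, 2, 3]):
        looped.append(number)
    # Async comprehension: "async for" inside the brackets (Python 3.6+)
    doubled = [await double(n) async for n in Aiter([1, 2, 3])]
    return looped, doubled


print(asyncio.run(main()))  # ([1, 2, 3], [2, 4, 6])
```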

Ref: https://www.python.org/dev/peps/pep-0492...-async-for

But near the bottom they don't recommend using it this way, and I don't know why:

Quote:While this is not a very useful thing to do, .....
