Fix pool drain #453

Closed

geraldog wants to merge 5 commits into sonic182:master from geraldog:fix_pool_drain

Conversation

@geraldog
Contributor

As I stated previously, I have serious problems with the pool draining very quickly. I have only tested with CyclicQueuePool.

I erroneously and naively tried to fix this with #452.

It turns out the problem wasn't just a one-liner. The problem starts with create_task in __del__ inside the HttpResponse class. That task has no guarantee of executing, and as soon as __del__ exits the task is most likely garbage collected almost instantly.
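For background, this is the documented asyncio pitfall: the event loop keeps only weak references to tasks, so a task created without saving a reference anywhere can be collected before it runs. A sketch of the problematic pattern (not aiosonic's exact code):

```python
import asyncio

class HttpResponse:
    def __del__(self):
        # The Task returned here is never stored anywhere. asyncio keeps
        # only a weak reference to it, so once __del__ returns there is
        # nothing keeping the task alive and it can be garbage collected
        # before it ever gets a chance to run.
        asyncio.get_event_loop().create_task(self.connection.release())
```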

So I discovered I could wait inside the Connection class's __aexit__ method. This alleviated the problem, but there was still fast pool drain.

I tried a little more and it became obvious that the wait_for call around the _do_request helper function was cancelling __aexit__ of the Connection class. But I was able to catch CancelledError there, so we are guaranteed the chance to close the socket with close() on the StreamWriter and finally put the Connection back into the CyclicQueuePool queue.
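For the record, the shape of the fix is roughly this (a sketch with assumed attribute names like writer and pool, not the literal diff):

```python
import asyncio

class Connection:
    async def __aexit__(self, exc_type, exc, tb):
        try:
            # wait until the response has been fully read and we unblock
            while self.blocked:
                await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            # wait_for around _do_request cancelled us: close the socket
            # instead of leaking a half-dead connection
            self.writer.close()
        finally:
            # either way, hand the Connection back to the pool's queue
            self.pool.release(self)
```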

There was still pool drain, so I fixed two more situations inside the Connector class where exceptions happen and the Connection was not put back into the queue.

Now my crawler is running stably without pool drain! Thank you for your work!

@geraldog
Contributor Author

After fixing the pool drain I noticed that a particularly fast crawler of mine (courtesy of aiosonic :) was raising OSError for too many open files. Strange, I thought, since I run ulimit -n 16000 on that terminal and have a custom security.conf.

It turns out some bad SSL servers leave the connection half-open, and the only way to close the zombie socket is to abort the transport. I tested it and it solves the problem, so I'll push the relevant commit with this fix in an instant.
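Concretely, aborting looks like this (kill_zombie is just an illustrative name; writer stands for the connection's asyncio.StreamWriter):

```python
import asyncio

def kill_zombie(writer: asyncio.StreamWriter) -> None:
    # abort() drops the transport immediately: no flushing, no TLS
    # close_notify handshake, so the file descriptor is released even
    # when the peer keeps the connection half-open forever
    writer.transport.abort()
```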

Note that on Python 3.11 this requires python/cpython#104485.

Also, there's a memory leak in Python 3.11+ regarding, again, SSL; see python/cpython#109534.

The memory leak itself is not relevant to the commit, I just thought I'd share since it particularly limits long-winded crawls.

Owner

@sonic182 left a comment


Awesome work! Your crawler must be awesome!

I have commented on some possible edge cases, mostly the while True that may block execution.

Comment thread: aiosonic/__init__.py

```python
if self.connection:
    loop = get_loop()
    loop.create_task(self.connection.release())
pass
```
Owner


I think you can just remove the whole method

Contributor Author


Will do so. Thank you for your kindness.

Comment thread: aiosonic/connection.py

```python
self.blocked = False
self.temp_key: Optional[str] = None
self.requests_count = 0
self.cycled = False
```
Owner


Can you explain this variable a bit?

I think I need to better document this class, which is a bit complex given the nature of the problem 😄

Contributor Author


It's a safeguard against putting the same Connection object into the pool twice. I can test without it and see if it makes a difference.
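Roughly, the intent is this (a sketch with a hypothetical release method and assumed queue-style pool internals; the real code paths are __aexit__ and the timeout handling):

```python
def release(self):
    # `cycled` guards the release transition: without it, two code paths
    # (e.g. __aexit__ and a CancelledError handler) could both put this
    # Connection back, letting two tasks later share the same socket
    if self.cycled:
        return
    self.cycled = True
    self.pool.queue.put_nowait(self)  # assumed pool internals
```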

Comment thread: aiosonic/connection.py

```python
    self.h2handler.cleanup()

else:
    while self.blocked:
```
Owner


Is this __aexit__ step executed in a separate asyncio task? Doesn't this block?

The CI (GitHub Actions) on Linux is blocked; I guess it's because of this 😮

Maybe it's worth creating a separate task with asyncio.create_task that does this active wait (or having a "cleaner" task for it).

Contributor Author


It does not block because of the nature of asyncio.sleep, which yields to other coroutines.

Sorry to hear about the CI issues.

We really can't have a task easily there, I think, because we need to catch the CancelledError that is raised by wait_for in _do_request. I'll see if I can do it in a more elegant manner. Thanks.
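To make the cancellation path concrete, here's a standalone demo of the mechanism: wait_for cancels the awaited coroutine on timeout, and the CancelledError can be caught inside it to run cleanup:

```python
import asyncio

async def request():
    try:
        await asyncio.sleep(10)  # stands in for the real request work
    except asyncio.CancelledError:
        print("cancelled: cleanup runs here")
        raise  # let the cancellation propagate after cleaning up

async def main():
    try:
        await asyncio.wait_for(request(), timeout=0.1)
    except asyncio.TimeoutError:
        print("wait_for timed out")

asyncio.run(main())
```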

@geraldog
Contributor Author

@sonic182 I think we can create a Future object that can be awaited in __aexit__ if self.blocked, without the ugly while True:

Let's call this Future object the Connection waiter. A Future is just a promise of a result that can be awaited, and it will yield to other tasks until it is done, without the need for the equally ugly asyncio.sleep(hardcoded_timeout).

The beauty of this Future object is that its set_result() method can be called from synchronous code paths.

So we await this waiter if the Connection is self.blocked, and we still get the chance to catch CancelledError when RequestTimeout() comes.
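Something like this, as a sketch (the attribute names are mine, not the final code):

```python
import asyncio

class Connection:
    def __init__(self):
        self.blocked = False
        self._waiter = None  # the Connection waiter Future

    def block(self):
        self.blocked = True
        self._waiter = asyncio.get_running_loop().create_future()

    def release(self):
        # callable from synchronous code paths: set_result simply
        # schedules whoever is awaiting the Future to be woken up
        self.blocked = False
        if self._waiter is not None and not self._waiter.done():
            self._waiter.set_result(None)

    async def __aexit__(self, exc_type, exc, tb):
        if self.blocked and self._waiter is not None:
            # yields to other tasks until release() resolves the Future;
            # CancelledError from wait_for can still be caught around this
            await self._waiter
```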

I have a test crawl running right now that I'd rather not interrupt, but I think I can hack up some code by tomorrow. What do you think?

@sonic182
Owner

I'm working on this small PR #454 to include socket read/write in the same connection class (this will be needed later when the Proxy is fully implemented).

Maybe the try/finally inside the HTTPResponse class fixes your crawler, by ensuring the connection gets released after reading response data or hitting an error with it?
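i.e. roughly this shape (illustrative method names, not the literal #454 diff):

```python
class HTTPResponse:
    async def text(self):
        try:
            # reading may raise (timeout, reset, bad chunk...) mid-stream
            body = await self._read_body()
            return body.decode()
        finally:
            # runs on success and on error alike, so the connection is
            # always handed back to the pool
            await self.connection.release()
```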

@geraldog
Contributor Author

geraldog commented Jan 22, 2024

I'll definitely give it a try, thanks!

EDIT: I see it's merged already, so I'll definitely be giving it a try today.

@geraldog
Contributor Author

Unfortunately I see ConnectionPoolAcquireTimeout very quickly, even with the latest master that includes #454 :(

@sonic182
Owner

sonic182 commented Jan 22, 2024

ConnectionPoolAcquireTimeout

You may need to use semaphores to limit the concurrency in your crawler. I think you're starting a lot of calls very quickly and the pool size is not big enough for it; you may need to extend timeouts too.
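For example, a generic pattern like this (the bound of 100 is arbitrary; pick something below your pool size):

```python
import asyncio

sem = asyncio.Semaphore(100)  # cap in-flight requests below the pool size

async def fetch(client, url):
    async with sem:  # queue here instead of timing out on pool acquire
        response = await client.get(url)
        return await response.text()
```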

@geraldog
Contributor Author

geraldog commented Jan 22, 2024

Please re-read the first message in this pull request and you'll realize the problem is not as you mentioned.

EDIT: By the way, my pool size is 16000 clients, which I allow via a custom limits.conf and ulimit -n 16000.
The crawler does not create more than 1000 tasks, limited by checking len(asyncio.all_tasks()).

How could my pool be exhausting? And why does it not exhaust with the changes introduced in this PR?

@geraldog
Contributor Author

@sonic182 taking into account that #454 is now in effect, and my promise to rework the logic around a Future to wait on instead of using asyncio.sleep, I'm closing this PR and opening another one. Hope you don't mind.

@geraldog closed this Jan 31, 2024