
Author vstinner
Recipients David.Edelsohn, cstratak, pablogsal, vstinner, xtreak
Date 2020-09-02.13:08:12
Message-id <1599052093.15.0.283909386297.issue41642@roundup.psfhosted.org>
Content
Charris, Pablo and I identified that TCP connections are closed by the load balancer on some buildbot workers.

When the "buildbot.python.org" host name is used, TCP connections (TCP port 9020) go through a load balancer.

Ernest exposed the TCP port 9020 directly to the Internet (without the load balancer) using a new host name: "buildbot-api.python.org".

Buildbot workers should be updated to use "buildbot-api.python.org". I also suggest using a keepalive of 60 seconds rather than 600 seconds.

If your worker was impacted by this issue, I strongly advise you to manually clean up the temporary directory (/tmp). When a worker was disconnected, the build was interrupted without removing its temporary files. On some workers, we found around 20 GB of temporary files in /tmp: "ccXXXX" files and "tmpXXXX" files. I guess that some of these files come from the compiler and others from the Python test suite.
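
Before deleting anything, it can help to see which entries match those two name patterns. The following is only a minimal sketch: list_build_leftovers is a hypothetical helper name (it is not part of buildbot), and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do). It only lists entries and removes nothing.

```shell
# Hypothetical helper (not part of buildbot): list the top-level "cc*" and
# "tmp*" entries of a directory so they can be reviewed before removal.
# Nothing is deleted here.
list_build_leftovers() {
    dir="${1:-/tmp}"
    # -mindepth 1 keeps the directory itself out of the results; without it,
    # "/tmp" would match the 'tmp*' pattern.
    find "$dir" -mindepth 1 -maxdepth 1 \( -name 'cc*' -o -name 'tmp*' \) -print | LC_ALL=C sort
}

# Usage (assumption: the leftovers live directly under /tmp):
# list_build_leftovers /tmp
```

The output also shows directories that match the patterns, which a plain "rm -f" would not remove.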

I updated the buildbot client configuration of the 9 workers operated by Red Hat:

Fedora Rawhide x86-64
Fedora Stable x86-64
RHEL8 x86-64
RHEL7 x86-64
RHEL8 FIPS x86-64
Fedora Rawhide AArch64
Fedora Stable AArch64
RHEL 8 ppc64le
RHEL 7 ppc64le

On our workers, I used the following commands:

systemctl stop buildbot-worker.service
du -sh /tmp; rm -f /tmp/{cc,tmp}*; du -sh /tmp
sed -i -e "s/buildmaster_host = 'buildbot.python.org'/buildmaster_host = 'buildbot-api.python.org'/;s/keepalive = .*/keepalive = 60/" /home/buildbot/buildarea/buildbot.tac; grep -E '(host|keepalive) =' /home/buildbot/buildarea/buildbot.tac
systemctl start buildbot-worker.service
systemctl status buildbot-worker.service
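
For workers with a different layout, the substitution can be previewed before editing the file in place. This is a sketch under assumptions: preview_tac_update is a hypothetical helper name, and it reuses the sed expression above without "-i", so the file is left untouched and the result goes to standard output.

```shell
# Hypothetical helper: print what buildbot.tac would look like after the
# substitution, without modifying the file (sed is run without "-i").
preview_tac_update() {
    tac_file="$1"
    sed -e "s/buildmaster_host = 'buildbot.python.org'/buildmaster_host = 'buildbot-api.python.org'/;s/keepalive = .*/keepalive = 60/" "$tac_file"
}

# Usage: compare the current file with the preview (the path is the one used
# on the Red Hat workers; adjust it for your own worker).
# preview_tac_update /home/buildbot/buildarea/buildbot.tac | diff /home/buildbot/buildarea/buildbot.tac -
```

If the diff shows only the buildmaster_host and keepalive lines changing, the in-place sed command should be safe to run on that file.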