bpo-40049: Check if symlink exists when extracting from tarfile #19187

Closed
jonnyhsu wants to merge 4 commits into python:master from jonnyhsu:fix-issue-40049

Conversation

@jonnyhsu jonnyhsu commented Mar 27, 2020

When extracting a tarfile, os.symlink() raises an exception if the symlink already exists. tarfile misreads that exception as a sign that the platform does not support symlinks, and scans the entire archive for the link's destination file. On a normal file this goes unnoticed, but when processing stream data it raises a StreamError, because it needs to seek backwards to resume extraction where it left off.
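
The underlying os.symlink() behavior can be seen in isolation. The following is a minimal sketch, not code from this PR: the helper name `symlink_twice` is hypothetical, and it assumes a platform where the process is allowed to create symlinks (on Windows that may require extra privileges).

```python
import os
import tempfile


def symlink_twice(directory):
    """Create the same symlink twice and return the error from the second call.

    Hypothetical helper for illustration only. Returns None if the second
    call unexpectedly succeeds.
    """
    link_path = os.path.join(directory, "link")
    # The first call succeeds even though "target.txt" does not exist:
    # a symlink may dangle.
    os.symlink("target.txt", link_path)
    try:
        # The second call fails because link_path already exists.
        os.symlink("target.txt", link_path)
    except OSError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    error = symlink_twice(tmp)
    print(type(error).__name__)
```

The second call fails only because the path is already taken, which says nothing about whether the platform supports symlinks.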

https://bugs.python.org/issue40049

@jonnyhsu jonnyhsu requested a review from ethanfurman as a code owner March 27, 2020 01:35
@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@jonnyhsu

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

jonnyhsu commented Jun 4, 2020

Author

Is there anyone available to review this?

@ZackerySpytz ZackerySpytz left a comment

Contributor

I'm pretty sure this should have a unit test.

Comment thread Misc/NEWS.d/next/Library/2020-03-27-20-49-32.bpo-40049.8079ca.rst Outdated

jonnyhsu commented Jun 6, 2020

Author

@ZackerySpytz thanks for taking a look. I've added a unit test and confirmed that it fails on master and passes on this branch with both Windows 10 and Ubuntu.

@taleinat

Contributor

Closing and re-opening to restart the Travis-CI check.

@taleinat taleinat closed this Sep 18, 2020
@taleinat taleinat reopened this Sep 18, 2020

taleinat commented Sep 18, 2020

Contributor

GNU tar (v1.30, Ubuntu 20.04) does indeed overwrite files with symlinks upon extracting, while both ln -s and os.symlink do not. Assuming this is common behavior for tar, I agree that the appropriate behavior would be to overwrite as suggested here.

@taleinat taleinat left a comment

Contributor

This looks good, but it seems to me that the root issue here is not the backwards seek, but the simply incorrect behavior of not overwriting existing files with symlinks.

If you agree, @jonnyhsu, please change the reasoning in the test comment and the NEWS entry accordingly, and I'd be happy to merge this.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Comment thread Lib/tarfile.py
    try:
        # For systems that support symbolic and hard links.
        if tarinfo.issym():
            if os.path.lexists(targetpath):

Contributor

Perhaps include a short comment here explaining why this is needed? Something along the lines of "tar should overwrite existing files with symlinks, but os.symlink raises an exception rather than overwriting."
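
A rough sketch of the overwrite behavior under discussion follows. It is not the code from this PR: the excerpt above stops at the `os.path.lexists()` check, so the `os.unlink()` step is an assumption, and `replace_with_symlink` is a hypothetical helper name. It also assumes a platform where the process may create symlinks.

```python
import os
import tempfile


def replace_with_symlink(link_target, link_path):
    """Hypothetical helper: create a symlink at link_path, replacing any
    existing file or symlink there.

    Assumption: the existing entry is not a directory, since os.unlink()
    does not remove directories.
    """
    # lexists() is True for a dangling symlink too, where os.path.exists()
    # would report False because it follows the link.
    if os.path.lexists(link_path):
        os.unlink(link_path)
    os.symlink(link_target, link_path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "entry")
    with open(path, "w") as f:
        f.write("regular file")

    # Replace a regular file with a (dangling) symlink.
    replace_with_symlink("elsewhere.txt", path)
    became_link = os.path.islink(path)
    link_points_to = os.readlink(path)

    # Replace the dangling symlink with another one.
    replace_with_symlink("other.txt", path)
    relinked_to = os.readlink(path)
```

Using `os.path.lexists()` rather than `os.path.exists()` matters in the second call: the existing entry is a dangling symlink, which `exists()` would not report.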

@taleinat

Contributor

Thanks for the PR, @jonnyhsu, but I'm closing this in favor of PR GH-21409, which restores the original fix for this issue which was lost in a bad merge.

@taleinat taleinat closed this Sep 19, 2020
@jonnyhsu

Author

> Thanks for the PR, @jonnyhsu, but I'm closing this in favor of PR GH-21409, which restores the original fix for this issue which was lost in a bad merge.

Got it, thanks for sorting it out!
