Downloading txt files

tjnichols · Aug-27-2018, 04:16 PM

I am trying to learn how to download txt files from the web. I am familiar with downloading PDFs, but I haven't had much luck with text files.

I'm not in a class, but I am trying to learn this, which is why I posted my question here.

This is the code I'm trying to run.

from __future__ import print_function

import requests
from bs4 import BeautifulSoup


def file_links_filter(tag):
    """
    Tags filter. Return True for links that end with 'pdf', 'htm' or 'txt'
    """
    if isinstance(tag, str):
        return tag.endswith('pdf') or tag.endswith('htm') or tag.endswith('txt')


def get_links(tags_list):
    return [WEB_ROOT + tag.attrs['href'] for tag in tags_list]


def download_file(file_link, folder):
    file = requests.get(file_link).content
    name = file_link.split('/')[-1]
    save_path = folder + name

    print("Saving file:", save_path)
    with open(save_path, 'wb') as fp:
        fp.write(file)


WEB_ROOT = 'https://www.sec.gov'
SAVE_FOLDER = '~/download_files/'  # directory in which files will be downloaded

r = requests.get("https://www.sec.gov/litigation/suspensions.shtml")

soup = BeautifulSoup(r.content, 'html.parser')

years = soup.select("p#archive-links > a")  # css selector for all <a> directly inside <p id='archive-links'>
years_links = get_links(years)

links_to_download = []
for year_link in years_links:
    page = requests.get(year_link)
    beautiful_page = BeautifulSoup(page.content, 'html.parser')

    links = beautiful_page.find_all("a", href=file_links_filter)
    links = get_links(links)

    links_to_download.extend(links)

# make set to exclude duplicate links
links_to_download = set(links_to_download)

print("Got links:", links_to_download)

for link in links_to_download:  # already a set, no need to convert again
    download_file(link, SAVE_FOLDER)

This is the error I receive.

Error:
===================== RESTART: C:/Python365/SEC Test.py =====================
Traceback (most recent call last):
  File "C:/Python365/SEC Test.py", line 3, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
>>>

I installed requests using pip install. I've tried uninstalling it and then reinstalling it. No luck. Can you point me in another direction?

Any help you can provide will be most appreciated!

DeaD_EyE · Aug-27-2018, 04:38 PM

You can install requests with:

py -m pip install requests

But you can also use urllib.request.urlopen from the standard library:

from urllib.request import urlopen
from bs4 import BeautifulSoup


req = urlopen('http://google.de')
bs = BeautifulSoup(req.read(), 'html.parser')
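The same standard-library call can also save a file to disk, which is what the original question asks for. This is only a minimal sketch: the helper name download_text_file and the folder argument are made up for illustration, and the file name is simply taken from the last part of the URL, as in the original download_file.

```python
import os
from urllib.request import urlopen


def download_text_file(url, folder):
    """Fetch url and save it in folder under the last segment of the URL.

    Hypothetical helper, not part of the original script.
    """
    os.makedirs(folder, exist_ok=True)        # create the target folder if needed
    name = url.rstrip('/').split('/')[-1]     # last path segment becomes the file name
    save_path = os.path.join(folder, name)
    with urlopen(url) as response, open(save_path, 'wb') as fp:
        fp.write(response.read())             # write the raw bytes, no decoding
    return save_path
```

It would be called as download_text_file(link, 'download_files') for each link collected by the script. Writing in binary mode works the same for txt, htm and pdf files.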

buran · Aug-27-2018, 04:51 PM

Do you have more than one python installation?

Gribouillis · Aug-27-2018, 05:01 PM

Note that one can write

tag.endswith(('pdf', 'htm', 'txt'))
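The outer parentheses belong to the method call and the inner ones build a tuple: str.endswith accepts a tuple of suffixes and returns True if the string ends with any of them. A possible rewrite of the filter from the first post could look like the sketch below. One difference to be aware of: it returns False for a missing href, where the original returned None.

```python
def file_links_filter(href):
    """Return True for href values that end with 'pdf', 'htm' or 'txt'."""
    # isinstance guards against tags without an href (the value is None then);
    # the tuple replaces the chain of three "or" tests.
    return isinstance(href, str) and href.endswith(('pdf', 'htm', 'txt'))
```

It can be passed to find_all exactly like the original, for example beautiful_page.find_all("a", href=file_links_filter).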

tjnichols · Aug-27-2018, 05:36 PM (last modified: Aug-27-2018, 05:37 PM)

Gribouillis - Thank you for your response. Does this mean I can lose the 'or' statements? Also, why do you have the double parentheses?

I appreciate the insight!

Thanks!

(Aug-27-2018, 04:51 PM)buran Wrote: Do you have more than one python installation?

Yes, I do. Should I uninstall all but one?

Thanks!

buran · Aug-27-2018, 06:03 PM

(Aug-27-2018, 05:36 PM)tjnichols Wrote: Yes, I do. Should I uninstall all but one?

You can have more than one installation; that's OK. But, as in this case, when you install third-party packages you need to make sure you install them for the correct Python installation. You installed the requests package for a different Python installation, not the one used to run your script.
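One way to see which installation runs the script is to print the interpreter path from inside it. The snippet below is just a sketch (the helper name describe_module is invented here). It shows the running interpreter and where, if anywhere, that interpreter would import requests from.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file `name` would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Path of the Python executable that is running this script
print("Interpreter:", sys.executable)
# Location of requests for this interpreter, if it is installed there
print("requests:", describe_module("requests") or "not installed for this interpreter")
```

If requests is reported as missing, running pip through the interpreter path that was printed (that path followed by -m pip install requests) installs the package for that specific installation.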

tjnichols · Aug-27-2018, 10:01 PM

An update - it works! Thanks for your help!
