Mar-22-2017, 05:12 PM
Hello everyone,
I am trying to learn a bit about Python since I researched and saw it's a really powerful language.
So for my first "hello world" project (well, maybe this one is a bit more advanced) I decided to make myself a nice script that will download all the images from a certain Facebook page (group).
First, I started with this nice example of getting a .csv file from Facebook Graph data with this awesome Python code: Facebook Page Post Scraper from minimaxir (can't link since it's my 1st post, sorry author!)
That was my starting point. From this I wanted to extract all the photo links and download the images, simple :)
This is the code I wrote and it works! It works, but for sure it has many holes and things that could be improved.
import csv
import urllib.request
from collections import defaultdict

columns = defaultdict(list)  # each value in each column is appended to a list

with open('disu.txt') as f:
    reader = csv.DictReader(f)  # read rows into a dictionary format
    for row in reader:  # read a row as {column1: value1, column2: value2,...}
        for (k, v) in row.items():  # go over each column name and value
            columns[k].append(v)  # append the value into the appropriate list based on column name k

strings = columns['status_link']  # take only the status_link column

# loop and remove any '' empty values from the list, if any exist.
# This is to avoid errors in the later loop if there is no url to download.
while True:
    try:
        strings.remove("")
    except ValueError:
        break

# split each list value (it's always the same URL shape) and take only the img ID from the url
strings = [i.rsplit('/', 2)[-2] for i in strings]

# find the length of the list, so we can loop through all values
sumtotal = len(strings)
count = 0

# while (count < sumtotal-1): this is the "while" for an automatic loop based on list length, for now we just take a few elements to test
while (count < 20):
    count = count + 1
    one = strings[count]
    one = one.rsplit('/', 1)[-1]
    newurl = ('https://graph.facebook.com/' + one + '/picture')
    urllib.request.urlretrieve(newurl, 'slike/test' + str(count) + '.jpg')
    print(newurl, "Downloaded")

I am aware this is a bad way of downloading images, since:
- I don't use the API that Facebook provides. I used the csv file from the other script mentioned above.
- I found that using ('https://graph.facebook.com/'+ one +'/picture') redirects to the real image, and it works! But it is probably a bad way to do it.
- I don't have any checks for 404 or 500 errors, so if one happens, my script stops.
- Also mind that I just started to learn programming and Python, so the loops above may be soooo wrong, but that's why I am here.
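
For the last two points (no error checks, clumsy loops), something along these lines might be a starting point. It is only a sketch under assumptions: `photo_ids` and `download_all` are made-up helper names, the `fetch` parameter exists only so the download call can be swapped out when testing, and the ID is still taken with the same `rsplit('/', 2)[-2]` as in the script, which only works if every status_link has the ID as its second-to-last path segment. Catching `urllib.error.URLError` also covers `HTTPError` (404, 500), since that is a subclass.

```python
import urllib.error
import urllib.request


def photo_ids(status_links):
    """Return the image ID of every non-empty link (same rsplit as in the script)."""
    # "if link" skips the '' values, so no while/try/remove loop is needed
    return [link.rsplit('/', 2)[-2] for link in status_links if link]


def download_all(ids, out_dir='slike', fetch=urllib.request.urlretrieve):
    """Try every ID and return (saved, failed) instead of stopping on an error."""
    saved, failed = [], []
    # enumerate counts from 1 and visits every element, including the first one
    for count, photo_id in enumerate(ids, start=1):
        url = 'https://graph.facebook.com/' + photo_id + '/picture'
        target = out_dir + '/test' + str(count) + '.jpg'
        try:
            fetch(url, target)
        except urllib.error.URLError as err:  # HTTPError (404, 500) is a subclass
            failed.append((url, str(err)))
            continue
        saved.append(target)
    return saved, failed
```

With the list from the csv step this could be called as `saved, failed = download_all(photo_ids(columns['status_link']))`. Note that the `while` loop in the script increments `count` before indexing, so it never downloads `strings[0]`; the `for` loop here does not skip it.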
