Posts: 12
Threads: 3
Joined: May 2025
Hello to this great forum:
I am trying to learn how to scrape data from websites with Python.
From this website:
Deutsche Börse
I tried to scrape the data from this table:
https://ibb.co/GYTFJdf
I am experimenting with AI-generated code to get the table. The script runs fine, even though I am on an old machine (Win7, Python 3.8.12 and Selenium 3.141), up to the point where it should actually get the data from the table. I tried things like:
for _ in range(10):
    try:
        # Look the table up with a CSS selector, by class:
        table = driver.find_element(By.CSS_SELECTOR, "app-price-history _nghost-boerse-frankfurt-c352035757")
    except:
        time.sleep(2)
else:
    print("table not found!")
    driver.quit()
    exit()
rows = table.find_elements(By.TAG_NAME, "tr")[1:]
csv_data = []
..... But honestly I have no clue how to identify the table correctly; the script always gives the "table not found" answer.
Since I am not a web developer either, I am a bit in over my head here. I thought there would be a table ID to identify it easily, but it seems the page is built with classes or something similar.
The next thing I would need is the ID of the "next page" button.
The data are prices of the German stock market index DAX. For every date there should be four prices, like this:
25.12.2022, 25000, 28000, 24000, 28000
Any help appreciated! Have a good one!
Posts: 12,137
Threads: 496
Joined: Sep 2016
The final page you show is an image, not text.
Is there somewhere else on the main site where you can find this data?
Posts: 12
Threads: 3
Joined: May 2025
Jun-14-2025, 12:05 PM
(This post was last modified: Jun-14-2025, 12:05 PM by MarkMan.)
(Jun-14-2025, 09:21 AM)Larz60+ Wrote: The final page you show is an image, not text.
Is there somewhere else on the main site where you can find this data?
If you click on "Deutsche Börse", the first link, it should bring you directly to the page; just scroll down a bit.
Is the link not working? I can click it...
PS: The picture is simply there to show you that this is the table I am interested in on the site.
Posts: 12,137
Threads: 496
Joined: Sep 2016
Jun-14-2025, 10:29 PM
(This post was last modified: Jun-14-2025, 10:29 PM by Larz60+.)
Let me try that again ... back tomorrow, hopefully with a solution.
This site uses heavy JavaScript, so Selenium will be needed to parse it.
Posts: 7,431
Threads: 125
Joined: Sep 2016
Like this, then into Pandas to make the table easier to use.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
from io import StringIO
import time
# Driver
#https://storage.googleapis.com/chrome-for-testing-public/137.0.7146.0/win64/chromedriver-win64.zip
options = Options()
#options.add_argument("--headless=new") # Uncomment to run without opening a browser window
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
# Parse
url = "https://www.boerse-frankfurt.de/index/dax/kurshistorie/historische-kurse-und-umsaetze"
browser.get(url)
table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')
#print(table_parse)
# Into Pandas
html = browser.page_source
table_pan = pd.read_html(StringIO(html))[0]
print(table_pan)
Output:
Datum Eröffnung Schluss Tageshoch Tagestief Umsatz
0 13.06.25 23.448,19 23.516,23 23.557,71 23.360,16 NaN
1 12.06.25 23.768,05 23.771,45 23.885,06 23.618,85 NaN
2 11.06.25 23.996,66 23.948,90 24.151,39 23.948,57 NaN
3 10.06.25 24.156,53 23.987,56 24.168,59 23.964,77 NaN
4 09.06.25 24.252,26 24.174,32 24.289,51 24.097,09 NaN
.. ... ... ... ... ... ...
95 28.01.25 21.374,29 21.430,58 21.475,90 21.296,33 NaN
96 27.01.25 21.201,99 21.282,18 21.344,98 21.081,61 NaN
97 24.01.25 21.463,15 21.394,93 21.520,50 21.353,01 NaN
98 23.01.25 21.277,58 21.411,53 21.423,02 21.254,08 NaN
99 22.01.25 21.169,61 21.254,27 21.330,87 21.162,31 NaN
[100 rows x 6 columns]
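The prices in this output use German notation, with "." as the thousands separator and "," as the decimal mark, and the dates are day-first with a two-digit year. If real numbers and ISO dates are wanted later, a small conversion step along these lines might help. This is only a sketch using the standard library; the helper names parse_german_number and parse_german_date are made up for illustration and are not part of the script above. (pandas.read_html also accepts thousands and decimal arguments that may achieve the same at parse time.)

```python
from datetime import datetime


def parse_german_number(text):
    """Turn a string like '23.448,19' into the float 23448.19."""
    # Drop the '.' thousands separators, then swap the ',' decimal mark for '.'
    return float(text.replace(".", "").replace(",", "."))


def parse_german_date(text):
    """Turn a string like '13.06.25' (day.month.two-digit year) into a date."""
    return datetime.strptime(text, "%d.%m.%y").date()


# One row taken from the output above, without the empty Umsatz column
row = ["13.06.25", "23.448,19", "23.516,23", "23.557,71", "23.360,16"]
converted = [parse_german_date(row[0]).isoformat()]
converted += [parse_german_number(value) for value in row[1:]]
print(converted)
```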
Posts: 12
Threads: 3
Joined: May 2025
Thanks for this script, snippsat.
Unfortunately I will be on an older machine for a while, and I need something that runs on Python 3.8.12 + Selenium 3.141.0 (or bs4 4.7.1).
I tried to get this to work with an AI, but then I ran into other trouble. The one thing the AI did get right about the code was:
"The main change is in the way the Chrome driver is initialized. Instead of using the Service class, we're directly passing the executable_path argument to the webdriver.Chrome() constructor.
The Service class was introduced in Selenium 4.0, which is not compatible with Selenium 3.141.0 that you're using.
The executable_path argument specifies the path to the Chrome driver executable, which is set to r"C:\cmder\bin\chromedriver.exe" in this example."
What I don't understand about your code is this: I copied your CSS selector, 'div.table-responsive > table', into my script, but it didn't work there, although it works in yours?
I am having trouble with everything, but mainly the CSS selector is not working, and I need the selector for the "next" button too.
Thanks so far.
If somebody else has an idea, like you, Larz60+, I'd like to hear it...
Posts: 12
Threads: 3
Joined: May 2025
OK, I found out how to get your code running on my machine, up to the CSS selector; I just needed to switch 2 lines. The code is below.
It then runs into the same problem: the CSS selector is not working:
Quote:
Traceback (most recent call last):
File "......\webscraping_python-forum.io.py", line 18, in <module>
table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')
File "d:\User\xxx\AppData\Local\python\mu\mu_venv-38-20250502-125844\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "d:\user\xxx\AppData\Local\python\mu\mu_venv-38-20250502-125844\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "d:\user\xxx\AppData\Local\python\mu\mu_venv-38-20250502-125844\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.table-responsive > table"}
(Session info: chrome=109.0.5414.120)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
from io import StringIO
import time
# Driver
#https://storage.googleapis.com/chrome-for-testing-public/137.0.7146.0/win64/chromedriver-win64.zip
options = Options()
#options.add_argument("--headless=new") # Uncomment to run without opening a browser window
browser = webdriver.Chrome(executable_path=r"C:\WebDriver_Python\ChromeDriver\chromedriver.exe", options=options)
#browser = webdriver.Chrome(service=ser, options=options)
# Parse
url = "https://www.boerse-frankfurt.de/index/dax/kurshistorie/historische-kurse-und-umsaetze"
browser.get(url)
table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')
#print(table_parse)
# Into Pandas
html = browser.page_source
table_pan = pd.read_html(StringIO(html))[0]
print(table_pan)
Posts: 7,431
Threads: 125
Joined: Sep 2016
(Jun-18-2025, 02:28 PM)MarkMan Wrote: OK, I found out how to get your code running on my machine, up to the CSS selector; I just needed to switch 2 lines. The code is below.
It then runs into the same problem: the CSS selector is not working:
The CSS selector still works for me.
You can copy the selector in the browser's developer tools: right-click --> inspect, select the element that contains the whole table --> right-click --> Copy --> Copy selector.
If I do this, I get:
Output: body > app-root > app-wrapper > div > div > div.content-wrapper > app-index > div.row.ng-star-inserted > div > app-price-history > div.widget.app-loading-spinner-parent.d-block.ar-mt.ar-p > div > div > div.table-responsive > table
So I just shortened it to the last part, div.table-responsive > table; just table also works.
You can let the page load for a while before looking up the CSS selector. Sometimes the script runs too fast, and then it cannot find the element because the table is not there yet.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
from io import StringIO
import time
# Driver
#https://storage.googleapis.com/chrome-for-testing-public/137.0.7146.0/win64/chromedriver-win64.zip
options = Options()
#options.add_argument("--headless=new") # Uncomment to run without opening a browser window
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
# Parse
url = "https://www.boerse-frankfurt.de/index/dax/kurshistorie/historische-kurse-und-umsaetze"
browser.get(url)
time.sleep(3)
table_parse = browser.find_element(By.CSS_SELECTOR, 'table')
#print(table_parse)
# Into Pandas
html = browser.page_source
table_pan = pd.read_html(StringIO(html))[0]
print(table_pan)
Output:
Datum Eröffnung Schluss Tageshoch Tagestief Umsatz
0 17.06.25 23.495,06 23.434,65 23.550,99 23.315,07 NaN
1 16.06.25 23.589,47 23.699,12 23.711,73 23.505,54 NaN
2 13.06.25 23.448,19 23.516,23 23.557,71 23.360,16 NaN
3 12.06.25 23.768,05 23.771,45 23.885,06 23.618,85 NaN
4 11.06.25 23.996,66 23.948,90 24.151,39 23.948,57 NaN
.. ... ... ... ... ... ...
95 30.01.25 21.676,22 21.727,20 21.732,05 21.650,78 NaN
96 29.01.25 21.511,47 21.637,53 21.671,59 21.475,64 NaN
97 28.01.25 21.374,29 21.430,58 21.475,90 21.296,33 NaN
98 27.01.25 21.201,99 21.282,18 21.344,98 21.081,61 NaN
99 24.01.25 21.463,15 21.394,93 21.520,50 21.353,01 NaN
[100 rows x 6 columns]
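The fixed time.sleep(3) works, but it waits too long on a fast connection and possibly not long enough on a slow one. A common alternative is to retry the lookup until it succeeds or a deadline passes. The sketch below shows that idea with the standard library only; wait_for is a hypothetical helper name, not a Selenium function. Selenium itself ships WebDriverWait (in selenium.webdriver.support.ui) together with expected_conditions for the same purpose, which is probably the better choice in a real script.

```python
import time


def wait_for(find, timeout=20.0, poll=0.5, ignored=(Exception,)):
    """Call find() repeatedly until it returns without raising, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()  # success: hand the result back and stop retrying
        except ignored:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last lookup error
            time.sleep(poll)


# Hypothetical usage with the browser object from the script above:
# table_parse = wait_for(lambda: browser.find_element(By.CSS_SELECTOR, "table"))
```

Returning from inside the loop is what a hand-written retry loop usually needs as well: without a return or break after a successful lookup, the loop keeps running and any "not found" branch after it is reached even when the element was found.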
Posts: 12
Threads: 3
Joined: May 2025
Jun-20-2025, 06:25 AM
(This post was last modified: Jun-20-2025, 06:25 AM by MarkMan.)
@snippsat: Thank you for this great answer! You built in so many smart learning aids, like the line to uncomment to start headless and the print to see the Selenium object. You saved me with this. Yesterday, when nothing worked, I wanted to quit learning Python; then I saw your answer, everything worked, and I could get past the impasse.
Of course every newbie would have a bazillion questions on this; the more important ones going forward, I guess, would be:
1. We use pandas here to parse the source code of the site. What I want to do in the end is scrape the entire dataset, which spans 97 pages (if you max out the preset dates), and save it to a CSV file.
What would be a good way to store the data that has been read? I don't want to work with it in pandas; I just want to "glue together" the data from the 97 pages and write it to CSV (maybe deleting the last column). It should be safe, so that no data is lost. And would it be a good idea to do some housekeeping, like deleting the source code of the pages after the data is scraped?
I am a bit confused about how to go about this: do I need to get the data out of pandas into a list, or would it be better to do this in pandas? I don't know what the right way is. Any input appreciated.
2. This is more a question of understanding, about this line:
table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')
I originally thought this would be the most important line of code, but now it does not seem very useful; it just returns some kind of object:
Output: selenium.webdriver.remote.webelement.WebElement
and the code continues without even using it. As I understand it, in the end you use the HTML parser of pandas to get the first table of the site. So, long story short:
Is Selenium not really a web-scraping tool but more a web-automation tool, and not so useful for actually getting the data out of the site? Or is there anything especially useful one could do with the object it returns?
Have a good one!
Posts: 7,431
Threads: 125
Joined: Sep 2016
(Jun-20-2025, 06:25 AM)MarkMan Wrote: What would be a good way to store the read data i don't want to play with it in pandas just want to "glue together" the data form 97 pages and write to csv (maybe delete last column).
When collecting, you can e.g. just save everything to a list and then save it as .csv, or bring it into Pandas, which makes it easier if you need to do something with the data.
(Jun-20-2025, 06:25 AM)MarkMan Wrote: I originally thought this would be the most important line of code, but now it does not seem very useful it just returns some kind of object:
The object has attributes; you can use dir() to see them.
So if you use table_parse.text, you get the table out.
>>> table_parse
<selenium.webdriver.remote.webelement.WebElement (session="d155102e8068b963d680570cac5677d6", element="f.D7E492F404C9B2A1DA7D9BD7576BD63A.d.934C06167C380C3270BF4181EB4E50B7.e.47")>
>>> dir(table_parse)
['__abstractmethods__',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__firstlineno__',
'__format__',
'__ge__',
'__getattribute__',
'__getstate__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__static_attributes__',
'__str__',
'__subclasshook__',
'__weakref__',
'_abc_impl',
'_execute',
'_id',
'_parent',
'_upload',
'accessible_name',
'aria_role',
'clear',
'click',
'find_element',
'find_elements',
'get_attribute',
'get_dom_attribute',
'get_property',
'id',
'is_displayed',
'is_enabled',
'is_selected',
'location',
'location_once_scrolled_into_view',
'parent',
'rect',
'screenshot',
'screenshot_as_base64',
'screenshot_as_png',
'send_keys',
'session_id',
'shadow_root',
'size',
'submit',
'tag_name',
'text',
'value_of_css_property']
>>> table_parse.text
('Datum Eröffnung Schluss Tageshoch Tagestief Umsatz\n'
'19.06.25 23.178,60 23.057,38 23.254,86 23.051,55\n'
'18.06.25 23.426,97 23.317,81 23.502,95 23.262,64\n'
'17.06.25 23.495,06 23.434,65 23.550,99 23.315,07\n'
.....
(Jun-20-2025, 06:25 AM)MarkMan Wrote: Selenium is not really a webscraping tool it is more a web automation tool and to actually get the data out of the site it is not so useful?
Or is there anything especially useful one could do with this object it returns?
It works fine for both web scraping and web automation; as you can see above, there are many attributes that can be used in web scraping.
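To make the "collect, then save as .csv" idea concrete, here is a minimal sketch that glues the text of several table pages into one CSV with the standard library only, writing the header once and dropping the empty Umsatz column. The names text_to_rows and write_pages are hypothetical, and the two sample pages are just rows from the output above split up for illustration. How to click through the 97 pages is not covered here; in a real run each page text would presumably come from table_parse.text after moving to the next page. A ";" delimiter is used because the prices themselves contain commas.

```python
import csv
import io


def text_to_rows(table_text):
    """Split the text of one table page into (header, data_rows)."""
    lines = [line.split() for line in table_text.splitlines() if line.strip()]
    return lines[0], lines[1:]


def write_pages(page_texts, out, n_cols=5):
    """Write the rows of every page to one CSV, with the header only once."""
    writer = csv.writer(out, delimiter=";")
    header_written = False
    total = 0
    for text in page_texts:
        header, rows = text_to_rows(text)
        if not header_written:
            writer.writerow(header[:n_cols])  # keep the first n_cols column names
            header_written = True
        for row in rows:
            writer.writerow(row[:n_cols])
            total += 1
    return total


# Sample data: rows from the output above, split into two "pages"
header_line = "Datum Eröffnung Schluss Tageshoch Tagestief Umsatz\n"
page1 = header_line + (
    "19.06.25 23.178,60 23.057,38 23.254,86 23.051,55\n"
    "18.06.25 23.426,97 23.317,81 23.502,95 23.262,64\n"
)
page2 = header_line + "17.06.25 23.495,06 23.434,65 23.550,99 23.315,07\n"

buffer = io.StringIO()  # stands in for a real file here
count = write_pages([page1, page2], buffer)
print(count, "rows written")
print(buffer.getvalue())
```

For a real file, the buffer could be replaced by something like open("dax.csv", "w", newline="", encoding="utf-8"), where the file name is only an example. Writing each page as soon as it is read means an error on a later page does not lose the rows already saved.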