How do I select this table for web scraping?

MarkMan · Jun-13-2025, 02:51 PM

Hello to this great forum:
I am trying to learn how to scrape data from websites with Python.
From this website:
Deutsche Börse

I tried to scrape the data from this table:

https://ibb.co/GYTFJdf

I have been fiddling with AI-generated code to get the table. The script runs fine, even though I am on an old machine (Windows 7, Python 3.8.12, Selenium 3.141), up to the point where it should actually read the data from the table. I tried things like:

for _ in range(10):
    try:
        # Mit CSS-Selector nach Klasse:
        table = driver.find_element(By.CSS_SELECTOR, "app-price-history _nghost-boerse-frankfurt-c352035757")
    except:
        time.sleep(2)
else:print("table not found!")
driver.quit()
exit()
rows = table.find_elements(By.TAG_NAME, "tr")[1:] 

csv_data = []
.....

But honestly, I have no clue how to identify the table correctly; the script always gives the "table not found" answer.
Since I am not a web developer either, I am a bit over my head here. I thought there would be a table ID to identify it easily, but it seems the page is built around classes or something similar.

The next thing I would need is the ID of the "next page" button.

The data are prices of the German stock market index DAX. For every date there should be four prices, like this:
25.12.2022, 25000, 28000, 24000, 28000

Any help appreciated! have a good one!
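
A note on the retry loop above: in Python, the `else` branch of a `for` loop runs whenever the loop finishes without a `break`. The loop never breaks, so "table not found!" is printed every time, and `driver.quit()` and `exit()` then run unconditionally. Separately, the space in the selector string most likely makes `_nghost-boerse-frankfurt-c352035757` a descendant tag name rather than an attribute, so that selector probably never matches anyway. A minimal sketch of the retry pattern, where `find` is a hypothetical callable standing in for the `driver.find_element(...)` call:

```python
import time


def find_with_retry(find, attempts=10, delay=2.0):
    """Call find() until it succeeds or the attempts run out.

    `find` is a hypothetical zero-argument callable, e.g.
    lambda: driver.find_element(By.CSS_SELECTOR, "table").
    Returns the result of find(), or None if every attempt failed.
    """
    for _ in range(attempts):
        try:
            result = find()
            break              # success: leave the loop, skipping the else branch
        except Exception:      # Selenium raises NoSuchElementException here
            time.sleep(delay)  # give the page time to render, then retry
    else:
        # Reached only when the loop ended without break, i.e. all attempts failed.
        result = None
    return result
```

With Selenium this might be called as `table = find_with_retry(lambda: driver.find_element(By.CSS_SELECTOR, "table"))`, quitting the driver only when the result is `None`.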

**Larz60+** · Jun-14-2025, 09:21 AM

The final page you show is an image, not text.
Is there somewhere else on the main site where you can find this data?

MarkMan · (This post was last modified: Jun-14-2025, 12:05 PM by MarkMan.)

(Jun-14-2025, 09:21 AM)Larz60+ Wrote: The final page you show is an image, not text.
Is there somewhere else on the main site where you can find this data?

If you click on "Deutsche Börse", the first link, it should take you directly to the page; just scroll down a bit.

Is the link not working? I can click it...

PS: the picture is simply there to show you which table on the site I am interested in.

**Larz60+** · (This post was last modified: Jun-14-2025, 10:29 PM by Larz60+.)

Let me try that again ... back tomorrow, hopefully with a solution.
This site relies heavily on JavaScript, so you will have to use Selenium to render it before parsing.

***snippsat*** · Jun-15-2025, 01:52 PM

Like this, and then load it into pandas to make the table easier to work with.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
from io import StringIO
import time

# Driver
#https://storage.googleapis.com/chrome-for-testing-public/137.0.7146.0/win64/chromedriver-win64.zip
options = Options()
#options.add_argument("--headless=new") # Uncomment to run without a visible browser window
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
# Parse
url = "https://www.boerse-frankfurt.de/index/dax/kurshistorie/historische-kurse-und-umsaetze"
browser.get(url)
table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')
#print(table_parse)

# Into Pandas
html = browser.page_source
table_pan = pd.read_html(StringIO(html))[0]
print(table_pan)

Output:
       Datum  Eröffnung    Schluss  Tageshoch  Tagestief  Umsatz
0   13.06.25  23.448,19  23.516,23  23.557,71  23.360,16     NaN
1   12.06.25  23.768,05  23.771,45  23.885,06  23.618,85     NaN
2   11.06.25  23.996,66  23.948,90  24.151,39  23.948,57     NaN
3   10.06.25  24.156,53  23.987,56  24.168,59  23.964,77     NaN
4   09.06.25  24.252,26  24.174,32  24.289,51  24.097,09     NaN
..       ...        ...        ...        ...        ...     ...
95  28.01.25  21.374,29  21.430,58  21.475,90  21.296,33     NaN
96  27.01.25  21.201,99  21.282,18  21.344,98  21.081,61     NaN
97  24.01.25  21.463,15  21.394,93  21.520,50  21.353,01     NaN
98  23.01.25  21.277,58  21.411,53  21.423,02  21.254,08     NaN
99  22.01.25  21.169,61  21.254,27  21.330,87  21.162,31     NaN

[100 rows x 6 columns]

MarkMan · Jun-18-2025, 02:14 PM

Thanks for this script, snippsat.

Unfortunately, I am stuck on an older machine for a while, so I need something that runs on Python 3.8.12 + Selenium 3.141.0 (or bs4 4.7.1).

I tried to get this to work with an AI, but then I ran into other trouble. The one thing the AI did get right about the code was:
"The main change is in the way the Chrome driver is initialized. Instead of using the Service class, we're directly passing the executable_path argument to the webdriver.Chrome() constructor.
The Service class was introduced in Selenium 4.0, which is not compatible with Selenium 3.141.0 that you're using.
The executable_path argument specifies the path to the Chrome driver executable, which is set to r"C:\cmder\bin\chromedriver.exe" in this example."

What I don't understand about your code is this: I copied your CSS selector, 'div.table-responsive > table', into my script, but it didn't work there, although it works in yours?

I can cope with the other problems, but not with the CSS selector failing; I need a selector for the "next" button too.

Thanks so far.

If somebody else has an idea, like you, Larz60+, I'd like to hear it.

MarkMan · Jun-18-2025, 02:28 PM

OK, I found out how to get your code running on my machine, up to the CSS selector; I just needed to switch two lines. The code is below.

It then runs into the same problem, that the CSS selector is not working:

Quote:
Traceback (most recent call last):
  File "......\webscraping_python-forum.io.py", line 18, in <module>
    table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')
  File "d:\User\xxx\AppData\Local\python\mu\mu_venv-38-20250502-125844\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "d:\user\xxx\AppData\Local\python\mu\mu_venv-38-20250502-125844\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "d:\user\xxx\AppData\Local\python\mu\mu_venv-38-20250502-125844\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.table-responsive > table"}
  (Session info: chrome=109.0.5414.120)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
from io import StringIO
import time
 
# Driver
#https://storage.googleapis.com/chrome-for-testing-public/137.0.7146.0/win64/chromedriver-win64.zip
options = Options()
#options.add_argument("--headless=new") # Uncomment to run without a visible browser window
browser = webdriver.Chrome(executable_path=r"C:\WebDriver_Python\ChromeDriver\chromedriver.exe", options=options)
#browser = webdriver.Chrome(service=ser, options=options)
# Parse
url = "https://www.boerse-frankfurt.de/index/dax/kurshistorie/historische-kurse-und-umsaetze"
browser.get(url)
table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')
#print(table_parse)
 
# Into Pandas
html = browser.page_source
table_pan = pd.read_html(StringIO(html))[0]
print(table_pan)

***snippsat*** · Jun-18-2025, 03:52 PM

(Jun-18-2025, 02:28 PM)MarkMan Wrote: OK, I found out how to get your code running on my machine, up to the CSS selector; I just needed to switch two lines. The code is below.

It then runs into the same problem, that the CSS selector is not working:

The CSS selector still works for me.
You can copy the selector from the browser's page inspector: right-click --> select the element that contains the whole table --> right-click --> Copy --> Copy selector.
If I do this, I get:

Output:
body > app-root > app-wrapper > div > div > div.content-wrapper > app-index > div.row.ng-star-inserted > div > app-price-history > div.widget.app-loading-spinner-parent.d-block.ar-mt.ar-p > div > div > div.table-responsive > table

So I have just shortened it to the last part, div.table-responsive > table; just table also works.
You can let the page load for a while before looking up the CSS selector. Sometimes the lookup runs too fast, before the table has been rendered, and then the selector cannot be found.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
from io import StringIO
import time

# Driver
#https://storage.googleapis.com/chrome-for-testing-public/137.0.7146.0/win64/chromedriver-win64.zip
options = Options()
#options.add_argument("--headless=new") # Uncomment to run without a visible browser window
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
# Parse
url = "https://www.boerse-frankfurt.de/index/dax/kurshistorie/historische-kurse-und-umsaetze"
browser.get(url)
time.sleep(3)
table_parse = browser.find_element(By.CSS_SELECTOR, 'table')
#print(table_parse)

# Into Pandas
html = browser.page_source
table_pan = pd.read_html(StringIO(html))[0]
print(table_pan)

Output:
       Datum  Eröffnung    Schluss  Tageshoch  Tagestief  Umsatz
0   17.06.25  23.495,06  23.434,65  23.550,99  23.315,07     NaN
1   16.06.25  23.589,47  23.699,12  23.711,73  23.505,54     NaN
2   13.06.25  23.448,19  23.516,23  23.557,71  23.360,16     NaN
3   12.06.25  23.768,05  23.771,45  23.885,06  23.618,85     NaN
4   11.06.25  23.996,66  23.948,90  24.151,39  23.948,57     NaN
..       ...        ...        ...        ...        ...     ...
95  30.01.25  21.676,22  21.727,20  21.732,05  21.650,78     NaN
96  29.01.25  21.511,47  21.637,53  21.671,59  21.475,64     NaN
97  28.01.25  21.374,29  21.430,58  21.475,90  21.296,33     NaN
98  27.01.25  21.201,99  21.282,18  21.344,98  21.081,61     NaN
99  24.01.25  21.463,15  21.394,93  21.520,50  21.353,01     NaN

[100 rows x 6 columns]
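
The fixed `time.sleep(3)` above either waits too long or not long enough, depending on the machine. Polling until the table actually exists is usually more robust. Selenium ships `WebDriverWait` for this purpose; the hand-rolled helper below is only a sketch of the same idea, and the name `wait_for_first` and its defaults are assumptions, not part of Selenium. It relies only on `find_elements`, which returns an empty list when nothing matches.

```python
import time


def wait_for_first(browser, css_selector, timeout=15.0, poll=0.5):
    """Poll until at least one element matches css_selector, then return it.

    `browser` is expected to be a Selenium WebDriver; only find_elements()
    is used, which returns an empty list instead of raising when nothing matches.
    Raises TimeoutError if nothing shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of By.CSS_SELECTOR
        matches = browser.find_elements("css selector", css_selector)
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {css_selector!r} within {timeout}s"
            )
        time.sleep(poll)
```

In the script above, `time.sleep(3)` and the `find_element` line could then be replaced by `table_parse = wait_for_first(browser, "table")`.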

MarkMan · (This post was last modified: Jun-20-2025, 06:25 AM by MarkMan.)

@snippsat: thank you for this great answer! You built in so many smart things to learn from, like the line to uncomment for headless mode and the print of the Selenium object. You saved me with this. Yesterday, when nothing worked, I wanted to quit learning Python; then I saw your answer, everything worked, and I got past the impasse.

Of course every newbie would have a bazillion questions on this. The more important ones going forward, I guess, would be:

1. We use pandas here to parse the source code of the site. What I want to do in the end is scrape the entire dataset, which spans 97 pages (if you max out the preset dates), and save it to a CSV file.

What would be a good way to store the data once it is read? I don't want to work with it in pandas; I just want to "glue together" the data from the 97 pages and write it to CSV (maybe deleting the last column). It should be safe, so that no data is lost. And it would probably be a good idea to do some housekeeping, like discarding the page source after the data has been scraped?
I am a bit confused about how to go about this: do I need to get the data out of pandas into a list, or would it be better to do this in pandas? I don't know what the right approach is. Any input appreciated.

2. This is more a question of understanding, about this line:

table_parse = browser.find_element(By.CSS_SELECTOR, 'div.table-responsive > table')

I originally thought this would be the most important line of code, but now it does not seem very useful; it just returns some kind of object:

Output:
selenium.webdriver.remote.webelement.WebElement

and the code continues without even using it. As I understand it, in the end you use the HTML parser of pandas to get the first table on the page. So, long story short:
Selenium is not really a web scraping tool but more a web automation tool, and for actually getting the data out of the site it is not so useful? Or is there anything especially useful one could do with the object it returns?

have a good one!

***snippsat*** · Jun-20-2025, 03:05 PM

(Jun-20-2025, 06:25 AM)MarkMan Wrote: What would be a good way to store the data once it is read? I don't want to work with it in pandas; I just want to "glue together" the data from the 97 pages and write it to CSV (maybe deleting the last column).

While collecting, you can e.g. just append everything to a list and then save it as .csv. Bringing it into pandas makes things easier if you need to do something with the data.
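
The list approach could be sketched as follows, using only the standard `csv` module. The helper names `merge_pages` and `write_csv` are made up for illustration, and each page is assumed to be a list of rows whose first field is the date. Rows with an already-seen date are skipped, so scraping a page twice does not duplicate data.

```python
import csv


def merge_pages(pages):
    """Concatenate per-page row lists, skipping rows whose date was already seen."""
    seen = set()
    merged = []
    for page in pages:
        for row in page:
            date = row[0]          # first column is the date (Datum)
            if date in seen:       # guards against a page being scraped twice
                continue
            seen.add(date)
            merged.append(row)
    return merged


def write_csv(rows, header, fileobj, drop_last=True):
    """Write header and rows to an open text file, optionally without the last header column."""
    keep = len(header) - 1 if drop_last else len(header)
    writer = csv.writer(fileobj)
    writer.writerow(header[:keep])
    for row in rows:
        writer.writerow(row[:keep])  # truncate each row to the kept columns
```

A file for this would typically be opened with `open("dax.csv", "w", newline="", encoding="utf-8")`; the file name is just an example. Because the prices contain commas, the `csv` writer puts them in quotes automatically.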

(Jun-20-2025, 06:25 AM)MarkMan Wrote: I originally thought this would be the most important line of code, but now it does not seem very useful; it just returns some kind of object:

The object has attributes; you can use dir() to see them.
So if you use table_parse.text, you get the table out.

>>> table_parse
<selenium.webdriver.remote.webelement.WebElement (session="d155102e8068b963d680570cac5677d6", element="f.D7E492F404C9B2A1DA7D9BD7576BD63A.d.934C06167C380C3270BF4181EB4E50B7.e.47")>
>>> dir(table_parse)
['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__firstlineno__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__static_attributes__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_execute',
 '_id',
 '_parent',
 '_upload',
 'accessible_name',
 'aria_role',
 'clear',
 'click',
 'find_element',
 'find_elements',
 'get_attribute',
 'get_dom_attribute',
 'get_property',
 'id',
 'is_displayed',
 'is_enabled',
 'is_selected',
 'location',
 'location_once_scrolled_into_view',
 'parent',
 'rect',
 'screenshot',
 'screenshot_as_base64',
 'screenshot_as_png',
 'send_keys',
 'session_id',
 'shadow_root',
 'size',
 'submit',
 'tag_name',
 'text',
 'value_of_css_property']

>>> table_parse.text
('Datum Eröffnung Schluss Tageshoch Tagestief Umsatz\n'
 '19.06.25 23.178,60 23.057,38 23.254,86 23.051,55\n'
 '18.06.25 23.426,97 23.317,81 23.502,95 23.262,64\n'
 '17.06.25 23.495,06 23.434,65 23.550,99 23.315,07\n'
.....
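
The text returned by `table_parse.text` can be turned into typed rows with plain Python. This is a rough sketch that assumes the layout shown above (one header line, then a `dd.mm.yy` date followed by four prices in German number format); the function names are made up for illustration.

```python
from datetime import datetime


def parse_german_number(text):
    """Convert '23.178,60' (dot = thousands, comma = decimal) to a float."""
    return float(text.replace(".", "").replace(",", "."))


def parse_table_text(text):
    """Turn the table's .text into (date, open, close, high, low) tuples.

    Assumes the first line is the header and each data line starts with a
    dd.mm.yy date followed by four prices; anything after that is ignored.
    """
    rows = []
    for line in text.splitlines()[1:]:   # skip the header line
        parts = line.split()
        if len(parts) < 5:
            continue                     # blank or incomplete line
        date = datetime.strptime(parts[0], "%d.%m.%y").date()
        prices = tuple(parse_german_number(p) for p in parts[1:5])
        rows.append((date,) + prices)
    return rows
```

Calling `parse_table_text(table_parse.text)` would then give one tuple per trading day, in the column order Datum, Eröffnung, Schluss, Tageshoch, Tagestief.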

(Jun-20-2025, 06:25 AM)MarkMan Wrote: Selenium is not really a web scraping tool but more a web automation tool, and for actually getting the data out of the site it is not so useful?
Or is there anything especially useful one could do with the object it returns?

It works fine for both web scraping and web automation; as you can see above, there are many attributes that can be used in web scraping.
