Python Forum
Web Scraper to find provider information
#1
I'm new to Python and my goal is to write a web scraping program that captures provider information, such as name, address, and phone number, from the website below:
https://www.cascadehealthalliance.com/pr...directory/

I inspected the page in Edge's developer tools and found that the facility name is located in the following tag:
<div class="sa-provider-item__address-facility" bis_skin_checked="1">Timber Kids Dentistry</div>

I wrote the code below. When I ran it without the hidden_element and hidden_text variables, it printed the page's HTML, but only part of it: the DIV tag with class name "sa-provider-item__address-facility" did not show up. I then tried to locate the hidden tag, but once I added the hidden_element and hidden_text variables I got an error saying the HTML element could not be located. Why doesn't that part of the HTML show up?

Does anyone have ideas on how I can retrieve text from the HTML, such as the address-facility name?


from selenium import webdriver
from selenium.webdriver.edge.service import Service as EdgeService
from selenium.webdriver.edge.options import Options as EdgeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import csv
import os
from datetime import datetime
import logging
import subprocess

options = EdgeOptions()
service = EdgeService("C:\\My Programs\\MSEdgeDriver\\msedgedriver.exe", log_output=subprocess.DEVNULL)
driver = webdriver.Edge(service=service, options=options)
driver.get("https://www.cascadehealthalliance.com/for-members/find-a-provider/")
cookies = [
    {'name': 'ga', 'value': 'GA1.1.683160523.1754418491', 'domain': '.cascadehealthalliance.com'},
    {'name': '_ga_DWMGTNPWG3', 'value': 'GS2.1.s1755720214$o4$g0$t1755720214$j60$l0$h0', 'domain': '.cascadehealthalliance.com'}
]
for cookie in cookies:
    driver.add_cookie(cookie)

WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "a[href*='/provider-directory/']"))
).click()
driver.refresh()
#print(driver.page_source)

hidden_element = driver.find_element(By.CLASS_NAME, "sa-provider-item__address-facility")
hidden_text = driver.execute_script("return arguments[0].textContent;", hidden_element)

print(hidden_text)

driver.quit()
Error:
DevTools listening on ws://127.0.0.1:57220/devtools/browser/349feaa2-99ca-4250-b2e9-fba136849913
[14156:25952:0821/173217.990:ERROR:components\edge_auth\edge_auth_errors.cc:535] EDGE_IDENTITY: Get Default OS Account failed: Error: Primary Error: kImplicitSignInFailure, Secondary Error: kAccountProviderFetchError, Platform error: 0, Error string: [empty]
[14156:25952:0821/173222.073:ERROR:components\device_event_log\device_event_log_impl.cc:242] [17:32:22.073] USB: usb_service_win.cc:105 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: Element not found. (0x490)
[14156:25952:0821/173222.074:ERROR:services\device\usb\usb_descriptors.cc:119] Failed to parse configuration descriptor.
Thank you
#2
The script needs to wait for the elements to load (or become visible) before trying to access them. Using WebDriverWait together with expected_conditions, for example a visibility condition, helps.
Gribouillis wrote Sep-12-2025, 03:29 PM:
Spam link removed. Please read What to NOT include in a post