Writing Java code that can interact with a webpage

 
Grant Mitchell
Greenhorn
Posts: 5
This may not be a beginner level question, but I have a beginner's level of Java, so I'm putting it here, haha

I recently saw a Java program being used that logged onto an internet game and retrieved some information from the page. This intrigued me because in my Java classes I've never heard of code that can interact with web pages. So I decided I'd try and figure out how it works, or if I could write something that behaved similarly (log onto the page, navigate links, write text to a form).

However, when I tried to find tutorials or resources to help me with this task, I realized I didn't have the vocabulary or the understanding of the issue required to find the information I needed. I didn't know enough to find what I needed to learn more! And I've spent enough time reading Java resources without even knowing this kind of thing was possible to know that reading more general Java guides won't help me get closer to my answer.

So my question is: where should I begin searching to find out how to write code that interacts in this way with JavaScript or PHP forms, or navigates through links? What do I call these sorts of tasks so that I can search for guides more effectively? I have 3 semesters of Java experience, so I know basic syntax, data types, algorithms, etc., but nothing about how Java interacts with the internet. If you could give me an idea of what I need to learn and where I can find this info, I'd really appreciate it!

If this question belongs somewhere else, feel free to move it.
 
Greg Charles
Sheriff
Posts: 3065
12
Mac IntelliJ IDE Python VI Editor Java
You want to write a program that acts as a client to a web application. Usually, you write the application itself in Java and then the client is a person using a browser. However, there is a free package called Selenium, which is geared towards automatically testing web applications. It might work for you.
 
David Newton
Author
Posts: 12617
IntelliJ IDE Ruby
How to go about doing this depends entirely on the application you're interacting with.

Selenium *might* be appropriate for something that needs to screen-scrape. If there's an API that returns XML or JSON (or something else) then something like HttpClient or the standard Java APIs would be a better choice. (I'm not convinced that Selenium would be the best choice for scraping, either, but that's a different issue. Nor would I look to Java first.)
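
For the case where the site offers an API that returns XML or JSON, a minimal sketch of a plain GET request might look like the following. It uses the JDK's built-in java.net.http.HttpClient (Java 11 and later), which is a different library from Apache's HttpClient. The URL is a placeholder and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ApiGetSketch {

    // Builds a GET request that asks the server for a JSON response.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder URL (an assumption): substitute the real API endpoint.
        String url = args.length > 0 ? args[0] : "https://example.com/api/status";

        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());

        // The body is the raw text the server returned; parsing it is a separate step.
        System.out.println("Status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

The request is built in its own method so it can be inspected without touching the network.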
 
Grant Mitchell
Greenhorn
Posts: 5
@GregCharles So the website I'm using is the web application and the program I'd hope to write is a client, thanks!

@DavidNewton I chose to use Java because it's the language I know a little about, and it's the language the program I saw being run used. If you think this task would be easier to complete with a different language (easier to a degree that makes up for learning a new language XD), which language would you choose?
 
David Newton
Author
Posts: 12617
IntelliJ IDE Ruby
I'd go for any lower-ceremony language. There are too many options to bother listing, both on the JVM and off it.

Java's fine, just not the first thing in my toolbox, particularly if I don't know where the application is heading.
 
Grant Mitchell
Greenhorn
Posts: 5
To be honest, I'm not sure what a low-ceremony language means.

Ok, so to be more specific for my first goal I'd like to create a program that takes my username and my password as arguments, navigates to www.kingdomofloathing.com, inputs the username and password into the correct slots, submits, and then opens my browser to that page.

When searching for tutorials to help me, how could I reword those tasks to really home in on the information I need? And is this a task that is appropriate for Java? I've found another individual who created a similar "bot" in Perl. Would learning a new language (Perl) be easier than forcing Java to do this task?
 
Greenhorn
Posts: 28
Java
It's simple to read from and write to URLs using Java. Take a look here:

http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html

 
Grant Mitchell
Greenhorn
Posts: 5
That's a really interesting link. However, I have a few questions about how it works.

First, that example doesn't read the URL in the sense of printing what would be displayed in the browser; it prints the actual HTML.

Second, I take it this is the section of code responsible for creating the connection to the website and writing to the form:



How could this code identify which form is to be written to? What does the line connection.setDoOutput(true); do?

Normally I'd test and try to find these answers myself, but since I can't see what is happening in my browser, and my only output is the HTML, I can't tell what is happening as a result of my code.

In fact, I'm starting to think this type of example only works when you pass it the online location of a servlet (not that I know what that is), not an HTML file.
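
On the two questions above: for an HTTP URL, setDoOutput(true) marks the connection as one that will send a request body, and once the program writes to the output stream the request goes out as a POST. Nothing in the code picks out a form on the page. The request goes to the URL named in the form's action attribute and carries name=value pairs matching the form's input names; whatever runs at that URL (a servlet, a PHP script) reads them. A minimal sketch, assuming the form's action URL is passed as the first argument and using a hypothetical field named "string":

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class UrlPostSketch {

    // Encodes fields the way a browser encodes a submitted form: name=value&name=value
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: UrlPostSketch <form-action-url>");
            return;
        }

        // Hypothetical field name and value; real names come from the form's HTML.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("string", "hello world");

        URLConnection connection = URI.create(args[0]).toURL().openConnection();
        connection.setDoOutput(true); // we intend to send a request body
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        // Writing the encoded fields is the equivalent of pressing "submit".
        try (OutputStream out = connection.getOutputStream()) {
            out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
        }

        // What comes back is the raw response (usually HTML), not a rendered page.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

To find the right URL and field names for a real page, view the page source and look at the form element's action attribute and the name attributes of its inputs.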
 
David Newton
Author
Posts: 12617
IntelliJ IDE Ruby
You want to post to a web application; Apache's HttpClient (or similar) is one place to start if you want to do this in Java.
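
As a rough illustration of posting a login form, here is a minimal sketch. It uses the JDK's own java.net.http.HttpClient (Java 11 and later) rather than Apache's HttpClient, so the API differs from the Apache library. The field names "loginname" and "password" and the example URL are assumptions; the real ones have to be read from the login form's HTML.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class LoginPostSketch {

    // Builds the form body. The field names are hypothetical placeholders.
    static String loginBody(String user, String password) {
        return "loginname=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // Builds a POST request carrying the form body to the login URL.
    static HttpRequest buildLoginRequest(String loginUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(loginBody(user, password)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: LoginPostSketch <login-url> <user> <password>");
            return;
        }

        // The cookie manager keeps the session cookie the server sets after login,
        // so later requests made with this same client stay logged in.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        HttpResponse<String> response = client.send(
                buildLoginRequest(args[0], args[1], args[2]),
                HttpResponse.BodyHandlers.ofString());

        System.out.println("Status: " + response.statusCode());
        System.out.println("Landed on: " + response.uri());
        System.out.println("Received " + response.body().length() + " characters of HTML");
    }
}
```

One caveat on design: the session lives in this program's cookie store. A browser opened afterwards keeps its own cookies, so it would not automatically be logged in.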

"Low-ceremony" means that the code you write looks more like the problem you're trying to solve, something Java isn't particularly good at. There's a lot of hoops to jump through, and if you're only marginally familiar with Java anyway, it's going to be a lot harder. If you're *also* new to interacting directly with web applications and don't know how to go about what you're trying to do, it's even harder. That's why I suggested a language that makes it easier to prototype, experiment, etc.
 
Grant Mitchell
Greenhorn
Posts: 5
I read about HttpClient and its uses, and it looks like something that could be very useful for me.

A few questions about beginning to use it; I hope they're not too dense:

The way I see it being used in the tutorial, it seems as if it's actually a Java class, in that you instantiate a client instance in Eclipse (that's what I use; what's the more general term for this kind of program?). Is this the correct way to view HttpClient?

Second, I'm having a few troubles right off the bat with the download, embarrassing I know. I save the files to my computer and unzip them, but I don't know where to place them or how to link to them from my current project to make Eclipse recognize them. Can you help with this?


***EDIT***

Yikes... to reword what I need to know, when I go to http://hc.apache.org/downloads.cgi

- which file do I download?
- where in my project should the .jar files go?
- what does it mean to "verify the integrity of the downloaded files using signatures downloaded from our main distribution directories"?
- what does this business mean: "The KEYS link links to the code signing keys used to sign the product. The PGP link downloads the OpenPGP compatible signature from our main site. The MD5 link downloads the checksum from the main site."?
- how do I set up dependencies... what are dependencies?

Yikes again, sorry to question every little step, and thank you for all the help you've given so far.
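
On the integrity question: a checksum is a short fingerprint computed from a file's bytes. The download page publishes the expected value, and you compute the same fingerprint for the file you actually downloaded and compare the two. A match suggests the download was not corrupted along the way. The PGP signature is a stronger check done with a separate tool and is not covered here. A minimal sketch using the JDK's MessageDigest class, assuming the file path and the published MD5 value are passed as arguments (the class and method names are made up for illustration):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChecksumSketch {

    // Converts raw digest bytes to the lowercase hex form checksums are published in.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // MD5 of an in-memory byte array.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        return toHex(MessageDigest.getInstance("MD5").digest(data));
    }

    // MD5 of a file, read in chunks so large downloads need not fit in memory.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return toHex(digest.digest());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ChecksumSketch <downloaded-file> <published-md5>");
            return;
        }
        String actual = md5Hex(Path.of(args[0]));
        boolean matches = actual.equalsIgnoreCase(args[1].trim());
        System.out.println(matches ? "OK: checksums match" : "MISMATCH: computed " + actual);
    }
}
```

MD5 catches accidental corruption but is considered weak against deliberate tampering, which is why the page also offers signatures.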
 
Campbell Ritchie
Marshal
Posts: 82459
594

Grant Mitchell wrote:This may not be a beginner level question . . .
If this question belongs somewhere else feel free to move it.

You're right. Now I just have to work out where to move it.

And welcome to JavaRanch
 
Campbell Ritchie
Marshal
Posts: 82459
594
Not sure I have moved it to the best place, however . . .
 
Ulf Dittmer
Rancher
Posts: 43081
77
Instead of coding all the HTTP and HTML handling yourself, use the jWebUnit library; it does all that already and provides a fairly high-level API of a web page.
 
Sheriff
Posts: 22907
132
Eclipse IDE Spring TypeScript Quarkus Java Windows
You may also check out this thread as it is about roughly the same subject.
 
Greg Charles
Sheriff
Posts: 3065
12
Mac IntelliJ IDE Python VI Editor Java

David Newton wrote:
Selenium *might* be appropriate for something that needs to screen-scrape.



Well, I've read a white paper and seen a demo on Selenium, but haven't deployed it myself yet. Still, I think you may be thinking of something else. Selenium isn't likely to help you with screen scrapes. It can, however, record your interactions with a web application, then render the result as a script in the language of your choice (as long as your choice is Java, Python, Ruby, C#, or some others). You can then edit the programs as needed, so in theory you could build an automatic client. Unless it has changed a lot since the last time I used it, it would be very tedious to write the same kind of client in HttpUnit.
 
Ulf Dittmer
Rancher
Posts: 43081
77

Greg Charles wrote:Unless it has changed a lot since the last time I used it, it would be very tedious to write the same kind of client in HttpUnit.


No, it hasn't changed much, but it's also pretty much a dead project now. HtmlUnit operates on a higher level than HttpUnit, though, and jWebUnit still more so. I'd say they're viable options for screen scraping.
 
David Newton
Author
Posts: 12617
IntelliJ IDE Ruby

Greg Charles wrote:

David Newton wrote:
Selenium *might* be appropriate for something that needs to screen-scrape.



Well, I've read a white paper and seen a demo on Selenium, but haven't deployed it myself yet. Still, I think you may be thinking of something else. Selenium isn't likely to help you with screen scrapes. It can, however, record your interactions with a web application, then render the result as a script in the language of your choice (as long as your choice is Java, Python, Ruby, C#, or some others). You can then edit the programs as needed, so in theory you could build an automatic client. Unless it has changed a lot since the last time I used it, it would be very tedious to write the same kind of client in HttpUnit.


The Selenium *IDE* records interactions. Selenium itself drives a browser and allows granular access to the DOM.

No, I'm not thinking of something else--it *might* be appropriate for something that needs to screen-scrape--the implication being that it's probably not.
 