Method of extracting data from a website? - java

I want to grab data from a website (for example, the names, identification number, and list of resources someone is using) and post it to another website.
What I was thinking of doing was using cURL to grab the information from an existing REST api on one website. Then, what I wanted to do is write a program or an api to post that information onto another website.
After fetching the data with cURL, how and where can I store that information so that another program can use it? Would it be easier to write a single program that extracts the information from the first website and posts it to the other? If so, would it be possible to do this in Java, and how would I go about it? I'm not asking for code, just a method to do this. I'm using the Eclipse IDE for Java EE Developers.

I'd write it as 2-3 programs. One that extracts the data, one that formats the data (if necessary), one that posts the data.
My gut tells me the easiest way to do this is a pure bash script. But if you want to use Java for this you can.
I would save the output in a file for the post-er to read from. This has the benefit of letting you write/test the poster without the 2 other programs working. That said, I recommend you write the get-er program first. That way you know what data you're really dealing with.
Now, if you happen to write both the formatter and the post-er in java, I would write this as one program instead of "piping" files between them. The formatter will read in the file, turn it into a data structure/class, and the post-er will read this data structure/class.
This is only superficially different from my previous paragraph. The point is each "part" is independent from each other. This allows you to test a part without running the whole thing. That's the important thing.
As for how/where to store the information from the get-er, just redirect it to a file. Here's a tutorial on how.
Truth be told, I can't tell if you're using the linux cURL program or a java implementation like this one. My answer would be very different depending on this.
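A rough Java sketch of the split described above, in case it helps. The class and method names (Pipeline, fetchToFile, formatAndPost) are made up for illustration, and the get-er and post-er are passed in as plain functions so each part can be tested without the others. In real use the get-er would call the REST API and the post-er would send the data to the second site; here both are stand-ins.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical names throughout; this only illustrates keeping the parts independent.
class Pipeline {
    // Step 1: run the get-er and save its raw output to a file.
    static void fetchToFile(Supplier<String> getter, Path rawFile) throws IOException {
        Files.write(rawFile, getter.get().getBytes(StandardCharsets.UTF_8));
    }

    // Steps 2 and 3: read the saved file, format it, and hand the result to the post-er.
    static void formatAndPost(Path rawFile,
                              Function<String, String> formatter,
                              Consumer<String> poster) throws IOException {
        String raw = new String(Files.readAllBytes(rawFile), StandardCharsets.UTF_8);
        poster.accept(formatter.apply(raw));
    }

    public static void main(String[] args) throws IOException {
        Path rawFile = Files.createTempFile("raw", ".json");
        // Stand-in get-er: a real one would fetch from the first site's REST API.
        fetchToFile(() -> "{\"name\":\"alice\"}", rawFile);
        // Stand-in post-er: a real one would POST to the second site.
        formatAndPost(rawFile, String::trim, System.out::println);
    }
}
```

Because the file sits between the get-er and the rest, you can hand-write a sample file and test the formatter and post-er before the get-er exists.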

Related

Connecting to API

I plan to build a simple website that takes data from a website's API and puts it into charts that are listed on my website. Here is what I am trying to do:
Take data from the API listed on localbitcoins.com
https://localbitcoins.com/api-docs/
Code a program that parses this data as I see fit.
Create a graphical layout that displays the data.
Post the graphs on a website that I have created, ideally to update in real-time.
I don't know where to begin.
I am not asking for someone to hold my hand through it all, but rather to give me some pointers on where to start, what resources I can look at, and so on.
My instinct tells me that I need to tackle the API and coding part first. Can someone point me to a resource that could take me through this? Should I stick with Java or should I use another language for this?

How can I use Java to call an existing RPG screen program?

I have existing RPG4 programs with green screens. I would like to be able to call the RPG programs from Java and bypass the green screens.
I have done some research on this and IBM OAR (Open Access: RPG) keeps coming up, but I have not found a working example yet.
My goal is to create a web app to collect the same information and feed it to the back-end RPG.
Any help would be greatly appreciated.
EDIT
Delete: You can't.
Insert: A beginner will need to master several complex new concepts before tackling this.
END-EDIT
At least, not without changing the RPG program. Web requests are processed by server jobs, which run in batch - they are not connected to a 5250 terminal. Because they aren't connected to a terminal, when the RPG program tries to open the display file, it will fall over because there's no terminal to attach to.
In order for this to work you'd have to alter the RPG program to not try display file I/O if called by a batch process like a Java app (although Java isn't necessary in this web scenario).
One way to change the RPG program is to use input parameters; if you have them, then don't try to open the display file, but stuff the input parameters into the fields where the display file would have done. Since a display file also outputs from the program you'd need to reserve some parameters for the output information as well. This could get very ugly if a subfile is involved, as there would be potentially thousands of parameters.
OAR comes into the picture because one can write an OAR handler that continues to use the same display file I/O operations, but to direct the actual I/O elsewhere, like STDIN and STDOUT for an HTTP type application. Jon Paris and Susan Gantner have written an article called Getting a Handle on RPG's Open Access which you might find helpful. It's in the July 2010 e-edition of IBM Systems Magazine.
Better perhaps is to extract the business logic in the RPG program, implement it as stored procedures which can be called by the web application via traditional ODBC / JDBC. One can write stored procedures in RPG, so that's not as hard as it might seem.
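For what it's worth, calling such a stored procedure from Java is plain JDBC. A minimal sketch, assuming a hypothetical procedure MYLIB.GET_ORDER_STATUS with one integer input and one character output (the library, procedure name, and parameters are invented for illustration; yours will differ):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

class OrderStatusClient {
    // Calls the hypothetical procedure MYLIB.GET_ORDER_STATUS(IN orderId, OUT status).
    static String lookupStatus(Connection conn, int orderId) throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{call MYLIB.GET_ORDER_STATUS(?, ?)}")) {
            cs.setInt(1, orderId);                     // input parameter
            cs.registerOutParameter(2, Types.VARCHAR); // output parameter
            cs.execute();
            return cs.getString(2);
        }
    }
}
```

The Connection would come from whatever JDBC driver and connection pool your web application already uses. The web tier never touches a display file this way; it only sees parameters in and out.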
OAR is probably going to be your best bet....
However, every example I can think of that I've seen has revolved around building a handler to replace a printer file (PRTF) or physical file (PF).
Replacing a display file (DSPF) is a whole other ball game. Primarily because the 5250 protocol is an "intelligent" protocol; unlike dumb character type protocols such as used by ANSI/VT100.
It certainly can be done and has been done. If you have a single basic screen, you might be able to do it. But for a complex application with multiple screens and subfiles you'll probably have a tough time, especially if you don't have an in-depth understanding of the 5250 protocol.
I'd recommend you take a look at one of the vendor toolsets designed to use OAR to replace a 5250 screen with a web page. Those vendors have put years of time and effort into developing the handler needed.
http://www.profoundlogic.com/solutions/rpg-application-modernization.html
https://asna.com/us/products/wings
You might find the following publication useful:
Modernizing IBM i Applications
Lastly, note that OAR isn't the only option. There's an older technique, "screen scraping", in which your application basically emulates a 5250 terminal. It's simpler than a full OAR handler, but the end result is simpler also. IBM has its own tool, HATS, and Profound Logic also has a tool, GENIE. You could conceivably build your own screen scraper; the open-source TN5250J would probably be a place to start. But even this would be non-trivial.
You should use a mix of parsing JSON on the iSeries (this eliminates the subfile problems), one good JavaScript framework (I've used ExtJS), and the Apache HTTP Server for i.
I've developed an HTTP services framework based on JSON parameters sent directly from the browser using Ajax, processing each request with any ILE language program (mostly RPGLE) and returning the result in pure JSON created inside the program. With this approach, you just send and receive business data, leaving the front end to the JavaScript framework.
Hope this helps. Contact me if you need more help.

Printing multiple file types using java

Hey, I was wondering if there is a specific API for printing a bunch of common file types (pdf, doc, docx, txt, etc.). I am trying to develop a program similar to HP's ePrint. ePrint annoys me because it prints the email as well as the file attached to it. I just want something I can send files directly to. I have found that Java has a printing API, but that seems to focus on printing something from a GUI window. Any ideas much appreciated!
Thanks
Morpheous
It would be difficult for Java alone to be able to print a variety of different file types, and most of the time I've seen Java programs use other programs to do the printing by using Runtime.exec. If you're going to be trying to use this though, please be sure to read this extremely important article: When Runtime.exec() won't
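A minimal sketch of handing a file to an external program, using ProcessBuilder (the newer alternative to calling Runtime.exec directly). It assumes a CUPS-style `lp` command is on the PATH, which will not be true on every system; substitute whatever print command your platform has. The class and method names are made up. Draining the process output before waiting is the main pitfall to avoid: a child process can block forever if nobody reads what it writes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class ExternalPrinter {
    // Assumption: `lp <file>` submits a print job (CUPS on Linux/macOS).
    static List<String> printCommand(String filePath) {
        return Arrays.asList("lp", filePath);
    }

    // Runs the command, drains its output so it cannot block, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = run(printCommand(args[0]));
        System.out.println("print command exited with " + exitCode);
    }
}
```

Passing the command as a list of arguments rather than one string also avoids quoting problems with file names that contain spaces.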

What is the easiest way (framework/library/call) to prevent Cross Site Scripting using Google App Engine (GAE)?

I'd like to store then later display user-entered content securely with minimal effort (my goal is a web app not writing a bunch of security-related code).
EDIT: Google App Engine for Java
I'm working with the same issue myself; but I haven't had the chance to get it out into the real world yet; so please just keep in mind that MY ANSWER IS NOT BATTLE TESTED. USE AT YOUR OWN RISK.
First, you need to ask yourself if you're going to be allowing the user to use ANY html markup. So, for example, can the user enter a link? What about make bold text?
If the answer is NO, then it is fairly simple. Here is the idea of how to set the filter up:
http://greatwebguy.com/programming/java/simple-cross-site-scripting-xss-servlet-filter/
But personally, I don't like the filter being used in the first example; I just put it there to show you how to set the filter up.
I would recommend using this filter:
http://xss-html-filter.sourceforge.net/
So basically:
Setup the example from first link, get it working
Download the example from the second link, put it in your project in such a way you can access it from your code.
Rewrite the cleanXSS method to use what you downloaded from the second link. So probably something like:
private String cleanXSS(String value) {
    // Delegate to the HTMLInputFilter class downloaded from the second link
    return new HTMLInputFilter().filter(value);
}
If you do want to allow HTML (such as an anchor tag/etc) then it looks like the HTMLInputFilter has mechanisms to allow this; but it isn't documented so you'll have to figure it out by looking at the code yourself or provide your own way of filtering.
user-entered content securely with minimal effort (my goal is a web app not writing a bunch of security-related code).
How much security-related code you need to write depends on how much you are at risk (how likely it is that someone would want to attack your site, which itself is related to how popular your site is).
For example, if you're writing a public notepad that will have a total of 3 users, you can get away with the bare minimum. If, however, you're writing a "we hate China, Iran and all hackers/crackers" app dealing with $1,000,000 worth of transactions an hour and 3 billion users, you may be a bit more of a target.
Simply put you shouldn't trust any data that comes from outside your app including from the datastore. All this data should be checked that it's what you expect.
I've not validated incoming Java Strings against XSS; however, removing HTML is normally good enough, and Jsoup looks interesting for this (see Remove HTML tags from a String).
Also, to be sure, you should check that you're outputting what you expect to be outputting and not some JavaScript.
Most templating engines, including django's (which is bundled with App Engine), provide facilities to escape output to make it safe to print in HTML. In newer versions of Django, this is done automatically unless you tell it not to; in 0.9.6 (still the default in webapp), you pass your output values to |escape in the template.
Escaping on output is universally the best way to do this, because it means you have the original unmodified text; if you modify your escaping or output formatting later, you can still format text entered before that.
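To make the escape-on-output idea concrete in Java, here is a minimal hand-rolled sketch. The class and method names are made up, and it only covers text placed in HTML element content or quoted attribute values; it is not a substitute for a templating engine or library that escapes for you.

```java
class HtmlEscaper {
    // Replaces the five characters that are significant in HTML with entities.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The stored value stays unmodified; escaping happens only when rendering.
        String stored = "<script>alert('hi')</script>";
        System.out.println("<p>" + escapeHtml(stored) + "</p>");
        // prints: <p>&lt;script&gt;alert(&#39;hi&#39;)&lt;/script&gt;</p>
    }
}
```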
You can also use a service that will proxy all connections and block any XSS attempts. I know of only one service like that, CloudFlare (but that doesn't mean there aren't others). Unfortunately, the security features come with the Pro plan, which is paid :(

Sending HTML Form Data to Java

I have a Java program that I'm trying to interact with over the web.
I need to gather form data from the user on a Drupal site that I don't have any control over, send it to the Java program, and send the output back to the user. The Java program needs to load a lot of libraries every time it's run, so it needs to be up waiting for data from the user.
It'd be best for me to just have an HTML form for the input. What's the simplest way to deal with HTML form data using Java?
Also, I'm trying to call the Java program from a shell script. I want the program running in the background though so the libraries are loaded in advance. So ideally, I could use the server I set up for both applications.
Thanks for any help.
It sounds like you really just want to write a servlet (or use a higher level web framework, but a servlet would work fine). That makes it very easy to get web form data - you just ask for values by name, basically.
You could then "script" the application using curl, wget or something similar to make requests to the servlet.
Apologies if this doesn't answer your question - I'm finding it slightly tricky to understand exactly what you're trying to do, particularly as there are multiple layers of web UI involved, as far as I can see.
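To illustrate the "stay running and read form values by name" idea without a servlet container, here is a sketch that swaps in the JDK's built-in com.sun.net.httpserver.HttpServer instead of a servlet. It assumes Java 11 or later; the port, the /submit path, the "name" field, and the class name are all made up. A real servlet would give you request.getParameter("name") instead of the hand-written parser.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormServer {
    // Parses an application/x-www-form-urlencoded body into name/value pairs.
    static Map<String, String> parseForm(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Load the expensive libraries once here, before accepting any requests.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/submit", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(),
                                     StandardCharsets.UTF_8);
            Map<String, String> form = parseForm(body);
            byte[] reply = ("Hello, " + form.getOrDefault("name", "stranger"))
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start(); // keeps running in the background, waiting for form posts
    }
}
```

With this running, something like `curl -d "name=Al" http://localhost:8080/submit` from a shell script should exercise it.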
The easiest way to make POST requests with Java is to use the Apache HttpClient or the more recent HttpComponents libraries.
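If you would rather avoid a third-party dependency, the JDK's own java.net.http.HttpClient (Java 11 and later) can also send a form POST. This sketch uses that instead of the Apache libraries named above; the URL, field names, and class name are placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormPoster {
    // Encodes fields as application/x-www-form-urlencoded, e.g. "name=Al+Smith&id=42".
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds the POST request without sending it, so it can be inspected or tested.
    static HttpRequest buildFormPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Al Smith");
        fields.put("id", "42");
        // Placeholder URL: point this at the form handler you actually want to reach.
        HttpRequest request = buildFormPost("http://localhost:8080/submit", fields);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```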
