Load Array from the Internet - java

I want to load data from a file located on a website at http://www.example.com/file.extension. The file will most likely be .txt, but if I could save the data as an array, maybe as .csv, I would load it as an array, since that is how the application will use it. JSON also came to mind, but I wouldn't know how to produce it on the website side. How would you load this file from the internet?

The simplest way is probably URLConnection. There's a nice Oracle example of how you can load the response from a remote URL into a string. Then you can parse the string in whatever way seems easiest.
URLConnection belongs to the java.net package, which appears to be the same in Android as in the standard Java API, so it's pretty safe to use the Oracle documentation. However, to guarantee consistency with Android, you might also want to look at the Android documentation, which also provides a nice example.
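As a rough illustration of the URLConnection approach, here is a minimal sketch that reads a comma-separated text file into a list of string arrays. The class name `RemoteArrayLoader`, its method names, the timeouts, and the naive comma splitting are all assumptions made for this example; quoted CSV fields would need a real CSV parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class RemoteArrayLoader {

    // Reads the stream line by line and splits each non-blank line on commas.
    // Assumes UTF-8 text and fields that contain no quoted commas.
    static List<String[]> readRows(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                rows.add(trimmed.split("\\s*,\\s*"));
            }
        }
        return rows;
    }

    // Opens the remote file and parses it; the timeouts are arbitrary choices.
    static List<String[]> load(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(10_000);
        return readRows(connection.getInputStream());
    }
}
```

A call such as `RemoteArrayLoader.load("http://www.example.com/file.extension")` would then return one `String[]` per line. On Android the call has to run off the main thread, because network access on the main thread is rejected.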

Related

Is there any way to read files with suffixes like '.sas7bdat', '.dta', '.sav' and '.xpt' using Java?

I am working on a module that imports different kinds of files (currently these file types are SAS output formats). The module is written in Java with Spring Boot. The main logic is that when a user uploads a file, the controller passes the file's input stream to this module, which calls the corresponding method based on the file's suffix.
I have done some investigation and found that R and Python can do this. Can anybody tell me whether there is another way, that is, a way to do this job using only Java? Thanks.
The sas7bdat format is a binary, proprietary format. Its layout has never been made public, although some people have figured it out (like the parso library mentioned in the comment). If you really want to go down this path, use the Python library as a guide. It will help with sas7bdat and sas7bcat. For the old SAS transport format, see my library on GitHub, which is in C#. The description of the new transport format on the SAS website appears to contain some incorrect information that I couldn't work around. I have put my code on GitHub in case you want to try.
All that said, let me offer a few words of wisdom: don't do it. Use the Python library rather than doing it in Java. Also, these libraries only read the format; they do not write it. Keep that in mind (it sounds like it doesn't matter here, but FYI).

Make a downloadable file available in an API that returns JSON

I am developing an API in Java. It is basically a Java servlet that returns content as JSON (application/json), running on a Tomcat server. One of the fields in the response is supposed to be a link to a downloadable .txt file.
I wonder what the best way to deliver this file is:
Generating this file on every request seems like a killer to me, even with a cron job to clean up the directories holding the files.
Is there any way to provide a temporary link that is valid only for a limited period after the request, without saving anything to the file system?
Thank you.
If you say writing to the file system would kill your application, I deduce that your I/O performance is too weak for it, right? If you did not even have the storage capacity for it, your infrastructure would not be suitable for your application at all. I can see only four other ways to solve the problem (there may be more; my list is not exhaustive):
Store the text file in a database. The database should also store timeout information. This works well if there is more than one application server behind a load balancer (as long as all application servers share the same database).
Store the text file in RAM, maybe using a cache library that does the cleanup tasks automatically for you. Be aware that a cache library will usually not guarantee a minimum storage time for each file.
Do not store the text file at all, but create it only when it is requested (no idea if that is possible in your application).
Do not provide a link to the text file, but include its content directly in the JSON answer (escaped as a JSON string, of course). Your server can then forget about the file as soon as the answer has been sent, but the client _must_ download it whether it needs the file or not.
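The in-RAM option from the list above could look roughly like the following sketch, which keeps each text under a random token for a fixed time to live. The class `TemporaryTextStore` and all of its method names are hypothetical, and the current time is passed in explicitly only to keep the example easy to test; a cache library would normally take care of expiry for you.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store for short-lived downloadable texts.
class TemporaryTextStore {

    private static final class Entry {
        final String content;
        final long expiresAtMillis;

        Entry(String content, long expiresAtMillis) {
            this.content = content;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long timeToLiveMillis;

    TemporaryTextStore(long timeToLiveMillis) {
        this.timeToLiveMillis = timeToLiveMillis;
    }

    // Stores the text and returns a random token to embed in the download link.
    String put(String content, long nowMillis) {
        String token = UUID.randomUUID().toString();
        entries.put(token, new Entry(content, nowMillis + timeToLiveMillis));
        return token;
    }

    // Returns the text if the token exists and has not expired yet.
    Optional<String> get(String token, long nowMillis) {
        Entry entry = entries.get(token);
        if (entry == null) {
            return Optional.empty();
        }
        if (nowMillis >= entry.expiresAtMillis) {
            entries.remove(token, entry); // drop the stale entry
            return Optional.empty();
        }
        return Optional.of(entry.content);
    }

    // Removes every expired entry; intended to be called periodically.
    void purgeExpired(long nowMillis) {
        entries.values().removeIf(e -> nowMillis >= e.expiresAtMillis);
    }

    int size() {
        return entries.size();
    }
}
```

The servlet would call `put(text, System.currentTimeMillis())` while building the JSON response and put the returned token into the link; a second servlet would serve `get(token, System.currentTimeMillis())` as the .txt download. This only works as-is with a single application server, which is the limitation the database option addresses.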

GTFS realtime feed example script

I was able to successfully parse through BART's GTFS realtime Service Alerts and TripsUpdate feed. I also looked at the official protocol buffer for Java tutorial page and was able to compile and run the tutorial.
https://developers.google.com/protocol-buffers/docs/javatutorial
The next part for me is figuring out how to create a realtime feed, preferably Service Alerts first, for my GTFS static data. From what I understand, a GTFS realtime feed amounts to publishing serialized protocol buffer data at a web address, and then writing a script that takes the link to that address, fetches the data using HTTP GET, and deserializes it. I was thinking of using Visual Studio and ASP.NET Core to do this. Is there an example project I can refer to, and am I even on the right track in the first place?
Take a look at the awesome-transit list of gtfs-realtime stuff. A lot of libraries within the OneBusAway project are probably your best bet for seeing code in action that deals with gtfs-realtime. For example, maybe you can look at onebusaway-gtfs-realtime-exporter.

How to get data from XML feed URL in Java

There is a betting exchange website which offers its data as XML at the following link:
http://odds.smarkets.com/oddsfeed.xml
I would like to access this link to retrieve the latest data (in Java). Previously I had to download the (very large) file, add it to my project, and read the data from there. What is the best way to achieve this without having to download the file manually every time I want to access the data?
I plan on storing the returned data into a database.
Thanks
Well, this seems to be a very tricky question. I would suggest creating a simple web service application (client/server architecture) to get the contents from this URL. You can use REST to call the URL. Which contents you need to read depends on the functionality you want to achieve, and you will need to write custom logic to read the data. Here you act as the client and the URL is your service.
You can refer to the following link:
https://community.atlassian.com/t5/Confluence-questions/Access-page-content-via-URL/qaq-p/163060
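One way to read such a feed directly from the URL, without saving it to disk first, is the JDK's streaming XML API (StAX). The sketch below is not taken from the answer above; the class `XmlFeedReader` and its methods are hypothetical, and because the feed's element names are not given here, it only counts elements by name. Real code would replace the counting with the custom logic that writes each record to the database.

```java
import java.io.InputStream;
import java.net.URL;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical streaming reader for a large XML feed.
class XmlFeedReader {

    // Walks the document once and counts start tags by element name.
    // Only the current parse event is held in memory, not the whole file.
    static Map<String, Integer> countElements(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities, since the feed is untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        Map<String, Integer> counts = new TreeMap<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    // Opens the feed URL and streams it straight into the parser.
    static Map<String, Integer> countElementsAt(String address) throws Exception {
        try (InputStream in = new URL(address).openStream()) {
            return countElements(in);
        }
    }
}
```

Calling `XmlFeedReader.countElementsAt("http://odds.smarkets.com/oddsfeed.xml")` still transfers the whole document over the network on every call, but nothing has to be stored in the project or on disk.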

How to store a copy of complete web page at server side as soon as it is rendered on client browser?

The requirement is to keep a server-side copy of the complete web page, exactly as it was rendered in the client's browser, as a historical record. These records are revisited later.
We are trying to store the HTML of the rendered web page. That HTML is then rendered using resources such as JavaScript, CSS and images that live on the server. These resources keep changing, so old records no longer render perfectly.
Is there any other way to solve this? We are also thinking of converting the page to PDF using the iText or Apache FOP APIs, but they do not take the effect of JavaScript on the page into account during conversion. Are there any APIs available in Java to achieve this?
So far, no approach has worked perfectly. Please suggest.
Edit:
In summary, the requirement is to create an exact copy of the rendered web page on the server side, to record user activities on that page.
wkhtmltopdf should do this quite nicely for you. It will take a URL, and return a pdf.
code.google.com/p/wkhtmltopdf
Example:
wkhtmltopdf http://www.google.com google.pdf
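Since the question asks about Java, the same command can be started from Java with ProcessBuilder. This is only a sketch: it assumes the wkhtmltopdf binary is installed and on the PATH, and the class `PdfSnapshot` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the wkhtmltopdf command line shown above.
class PdfSnapshot {

    // Builds the argument list: wkhtmltopdf <url> <output file>.
    static List<String> command(String url, String outputFile) {
        return Arrays.asList("wkhtmltopdf", url, outputFile);
    }

    // Runs the tool and returns its exit code (0 conventionally means success).
    static int render(String url, String outputFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(url, outputFile))
                .inheritIO() // forward the tool's output to this process
                .start();
        return process.waitFor();
    }
}
```

For instance, `PdfSnapshot.render("http://www.google.com", "google.pdf")` mirrors the command-line example.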
Depending on just how sophisticated your javascript is, and depending on how faithfully you want to capture what the client saw, you may be undertaking an impossible task.
At a high level, you have the following options:
1. Keep a copy of everything you send to the client.
2. Get the client to return exactly what it has rendered.
3. Build your system in such a way that you can actually fetch all historical versions of the constituent resources if/when you need to reproduce a browser's view.
You can do #1 using JSP filters etc, but it doesn't address issues like the javascript fetching dynamic html content during rendering on the client.
Getting the client to return what they are seeing (#2) is tricky, and bandwidth intensive.
So I would opt for #3. To make a website that renders dynamic content versioned, you have to do several things. First, all data sources need to be versioned too, so any query would need to specify the version. "Version" can be a timestamp or some generation counter that you maintain. If you take this approach, you also need to ensure that any JavaScript you feed to the client does not fetch external resources directly. Rather, it should request all resources from your system, which in turn fetches the external content (or reuses it from a cache).
The answer would depend on the server technology being used to write the HTML. Are you using Java/JSPs or Servlets or some sort of an HTTPResponse object to push the HTML/data to the browser?
If only the CSS/JS/HTML are changing, why don't you just take snapshots of your client-side codebase and store them as website versions?
If other data is involved (like XML/JSON) take a snapshot of those and version that as well. Then the snapshot of the client codebase as mentioned above with the contemporary snapshot of the data should together give you the exact rendering of your website as at that point of time.
A very resource-consuming requirement but...
You haven't said which application server and framework you are using. If you're generating responses in your own code, you can just store them while generating.
Another possibility is to write a filter that wraps the servlet's OutputStream and logs everything that is written to it; just make sure your filter is at the top of the filter hierarchy.
Another very powerful and generic solution, the easiest to manage but possibly the most resource-consuming: write a transparent proxy server that sits between the user and the application server, forwards each call to the app server, returns the exact response, and additionally saves each request and response.
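The core of such a filter is an output stream that forwards every byte to the client while keeping a copy. The sketch below shows only that part in plain Java; `CapturingOutputStream` is a hypothetical name, and hooking it into a servlet filter (for example through an `HttpServletResponseWrapper`) is left out because it depends on the container and framework in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical "tee" stream: writes go to the real target and to a copy.
class CapturingOutputStream extends OutputStream {

    private final OutputStream target;
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    CapturingOutputStream(OutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        target.write(b);
        copy.write(b);
    }

    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        target.write(buffer, offset, length);
        copy.write(buffer, offset, length);
    }

    @Override
    public void flush() throws IOException {
        target.flush();
    }

    @Override
    public void close() throws IOException {
        target.close();
    }

    // Returns everything written so far, e.g. to archive it after the response.
    byte[] captured() {
        return copy.toByteArray();
    }
}
```

Buffering whole responses in memory is only reasonable for moderately sized pages; larger responses would be better copied to a file or a database stream.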
If you're storing the html page, why not the references to the js, css, and images too?
I don't know what your implementation is now, but you should create a filesystem with all of the html pages and resources, and create references to the locations in a db. You should be backing up the resources in the filesystem every time you change them!
I use this implementation for an image archive. When a client passes us the url of an image we want to be able to go back and check out exactly what the image was at that time they sent it (since it's a url it can change at any time). I have a script that will download the image as soon as we receive the url, store it in the filesystem, and then store the path to the file in the db along with other various details. This is similar to what you need, just a couple more rows in your table for the js, css, images paths.
