An optimal way to send huge text data across the network - Java

I have built a GUI which works through the browser. It has drag-and-drop facilities, and it prepares an XML document once all the attributes have been initialized. The system is similar to this online drawing tool: https://www.draw.io
It's working perfectly as of now, but there is a problem while saving the XML. Currently, I dynamically create a form through JavaScript and send the information as an HTTP request. It works well when the XML is small (around 5-6 MB in size).
Of late, the XML size has increased several-fold (to around 80-100 MB), and the GUI hangs frequently because of this. It takes 8-10 minutes to send the data across the network, and sometimes the browser shuts down automatically after being unresponsive for a long duration, thereby thwarting the effort the user has put into creating the XML in the GUI.
I am using Java on the controller end. When the user saves the XML, it is passed through the dynamically created HTML form to the backend, where it is received by a Java controller. I need a better mechanism, since sending the entire 80 MB of XML data in a single HTTP request is becoming extremely cumbersome. Any suggestions for improving this are welcome.
Thanks a lot..
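One common way to shrink such a payload before it crosses the network is to gzip it - XML this repetitive usually compresses by an order of magnitude. Below is a minimal sketch of the compress/decompress pair, assuming the Java controller can read the raw request body; the class and method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class XmlGzip {

    // Compress the XML string before sending it in the request body.
    static byte[] gzip(String xml) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Controller side: decompress the request body back to the original XML.
    static String gunzip(byte[] body) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(body));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

On the browser side you would compress before the POST (e.g. with a JS gzip library) and set a `Content-Encoding: gzip` header so the controller knows to decompress; whether that header is honored automatically depends on your server setup, so treat this as a sketch rather than a drop-in fix.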

Related

Executing long-running task from Tapestry using already existing EJB service

I have the following situation.
I have my web site written in Tapestry. On one of the pages, I'm required to generate a fairly large Excel or PDF document (around 20 MB). Now, since the entire process takes time, I'm asking my user to wait a bit.
However, when I tried to test the upper limits, I noticed that my entire application (not just the web part) freezes, because the generation eats up all the resources and the rest of the website and application becomes unresponsive.
Here is the flow I have been using so far:
User clicks a button to request the file on the page
Data is being fetched from database (this part is quite fast)
Data is passed to a Tapestry service which prepares it (nothing fancy, also fast)
Prepared data is sent to an EJB service, which creates and deploys a visitor that generates the Excel/PDF file
The InputStream of the created file is passed all the way up to Tapestry, which wraps it in a StreamResponse and offers the download
What would be an appropriate way to handle this problem?
Could I perhaps use Tapestry's ParallelExecutor from one of my Tapestry services, in a manner like this?
Future<InputStream> future = executor.invoke(new Invokable<InputStream>() { ... });
My main objective is that the application and website keep running, that they do not freeze.
Thank you in advance.
Take a look at the progresslink demo from Tapestry Stitch. It might give you some inspiration for polling a long-running / asynchronous task in Tapestry.
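Outside of the Tapestry-specific API, the same idea can be sketched with plain java.util.concurrent: hand the generation off to a bounded pool and poll the Future, so request threads stay free and a runaway report can't eat every core. The class name is made up, and the ByteArrayInputStream stands in for the real EJB call:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReportRunner {

    // Cap concurrent generations so one big report can't starve the rest of the app.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Submit the generation and return immediately; the caller polls the Future.
    Future<InputStream> generate(byte[] preparedData) {
        return pool.submit(() -> {
            // Stand-in for the EJB call that renders the Excel/PDF file.
            return new ByteArrayInputStream(preparedData);
        });
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The page can then check `future.isDone()` on each poll request and only wrap the stream in a StreamResponse once generation has finished.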

Java Temporary file for Memory management

I have a huge amount of data, retrieved from a database table, that has to be displayed on a page. The table is regularly updated with new data, and the page has to display the latest data whenever it is refreshed. Since the page is loaded with heavy loads of data, it takes time to load (2-3 minutes) and sometimes displays only a white page.
So, what I need to do is create a temporary file on the server which stores this data from the database, and have the page extract its info from this file. The file should be reloaded with new data every 10-15 minutes, so that for up to 15 minutes the page displays the old data, but after that a refresh shows the new data.
I hope my problem can be solved with the above idea, but I am limited to the theory. Please suggest how it can be done, or any better idea than the above; a Java code sample would be great.
Thanks
You are probably using a lightweight web server (guessing from the fact that you are talking about pages loading white).
But if you were using a Java EE application server like GlassFish, you could have a scheduled task run every 15 minutes that loads the data and stores it in a file.
http://docs.oracle.com/javaee/6/tutorial/doc/bnboy.html
Maybe Quartz is the answer for you:
Scheduled task in a web application?
Or maybe you are not using a web server/app server at all, and my answer is completely invalid.
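Even without a full application server, a plain ScheduledExecutorService can drive the 15-minute refresh. A minimal sketch - the class name is made up, `fetchFromDatabase()` is a stand-in for the real query, and the write-then-rename keeps readers from ever seeing a half-written file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SnapshotRefresher {

    private final Path snapshot;

    SnapshotRefresher(Path snapshot) {
        this.snapshot = snapshot;
    }

    // Stand-in for the real database query.
    String fetchFromDatabase() {
        return "<rows>...</rows>";
    }

    // Write to a temp file, then move it into place in one step,
    // so a page reading the snapshot never sees a partial write.
    void refresh() throws IOException {
        Path tmp = snapshot.resolveSibling(snapshot.getFileName() + ".tmp");
        Files.writeString(tmp, fetchFromDatabase());
        Files.move(tmp, snapshot, StandardCopyOption.REPLACE_EXISTING);
    }

    void start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(task -> {
            Thread t = new Thread(task, "snapshot-refresher");
            t.setDaemon(true); // don't keep the JVM alive just for the timer
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, 0, 15, TimeUnit.MINUTES);
    }
}
```

The page then reads the snapshot file instead of hitting the database on every request.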

Rendering HTML : Serverside Vs Clientside

I have a complex invoice page with order items and other details, and I will be working with these kinds of forms all the time. Currently, when the page is loaded, I use Java code (for loops, etc.) to write the HTML.
Another thing I could do is return a JSON object and use JavaScript to build the form. Which is the better practice?
FYI, I am only using Java, JSPs, jQuery, and JSON.
These are two fundamentally different approaches.
Building your page with Java/JSP is a server-side technique: you control on the server exactly which data is returned to clients. How that data is represented is a separate question.
JavaScript is a client-side technology: more flexible, less boilerplate, and faster, but page generation depends on the client's JavaScript engine - if it is disabled, you can run into problems.
If you are not tied to any technical requirements, I recommend JavaScript, because your case is mainly about getting data from the server.
When you are mainly sending data to the server, a server-side technology is preferable.
There are many "options" for doing this. For web forms I prefer the route of JavaScript with a JSON object: JS is very flexible and very easy to implement.
I would suggest JSON and jQuery plugins - more manageable and cleaner code.
Client-side rendering is generally considered slower (in terms of page load times), as Twitter found recently when it switched back from client-side to server-side rendering and saw its pages load in one fifth of the time.
It's not going to make much difference when you're browsing on a Core i7 computer with 16 GB of RAM; where it does matter is when you're running a heavily client-side site on a mobile device or an older computer. More JavaScript = higher RAM usage and more work for the browser to do (downloading, parsing, running).
Don't get me wrong, I'm a big fan of client-side rendering, as it lets me use the awesome AngularJS. For me it's about finding a nice balance between the two: I want a fast-loading page with a rich UI experience.
So, for example, if I were creating a news listings page, I'd have the first 10 or so results loaded with the page, from the server. Then I'd have something like a "Load more news articles" button at the bottom of the page, which loads the next 10 news items asynchronously via a JavaScript call (e.g. jQuery $.ajax(), AngularJS $http, etc.) to a JSON service.
As opposed to:
Client-side only: load the initial 10 news items via JavaScript and JSON as your page is loading. Clicking the "Load more news articles" button uses the same service to load additional news items on request.
Server-side only: load the initial 10 news items with the page, from the server. Clicking the "Load more news articles" button triggers a page refresh, and the next 10 news articles are returned from the server.
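For the hybrid approach, the server side of the "Load more" button is just an endpoint that pages the items and returns JSON. A sketch of the paging-and-serialization step - the class and method names are made up, and a real app would use a JSON library (Jackson, Gson) rather than hand-rolled strings:

```java
import java.util.List;
import java.util.stream.Collectors;

public class NewsPager {

    // Return one page of items as a JSON array string.
    // offset/pageSize would come from the request parameters of the AJAX call.
    static String pageAsJson(List<String> titles, int offset, int pageSize) {
        return titles.stream()
                .skip(offset)
                .limit(pageSize)
                .map(t -> "{\"title\":\"" + t.replace("\"", "\\\"") + "\"}")
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

The first page is rendered into the initial HTML on the server; each button click then calls this endpoint with a larger offset and appends the returned items on the client.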

Can you programmatically connect to a sequence of web-pages and parse the source HTML without imposing stress on the system or raising red flags?

I am working on a project in NLP requiring me to download quite a few video game reviews --- about 10,000 per website. So, I am going to write a program that goes to each URL and pulls out the review part of each page as well as some additional metadata.
I'm using Java and was planning on just opening an HttpURLConnection and reading the text through an input stream. Then, closing the connection and opening the next one.
My questions are these:
1) Let's assume this is a site with medium-to-small amounts of traffic: normally, they receive about 1000 requests per second from normal users. Is it possible that my program would cause undue stress to their system, impacting the user experience for others?
2) Could these connections made one right after another appear as some kind of malicious attack?
Am I being paranoid, or is this an issue? Is there a better way to get this data? I am targeting several websites, so working individually with site administrators is inconvenient and probably impossible.
If you mimic a web browser, and extract text at human speeds (that is, it normally takes a human several seconds to "click thru" to the next page even if they aren't reading the text), then the server can't really tell what the client is.
In other words, just throttle your slurping to 1 page per few seconds, and no problems.
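That throttling can be sketched as a small helper that enforces a minimum interval between requests; the class name is made up, and you would call `await()` before each HttpURLConnection fetch:

```java
public class Throttle {

    private final long minIntervalMs;
    private long lastRequest = 0;

    Throttle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Block until at least minIntervalMs has passed since the previous call,
    // so page fetches arrive at roughly human speed.
    synchronized void await() throws InterruptedException {
        long wait = lastRequest + minIntervalMs - System.currentTimeMillis();
        if (wait > 0) {
            Thread.sleep(wait);
        }
        lastRequest = System.currentTimeMillis();
    }
}
```

In the scraping loop: `throttle.await();` then open the connection, read the page, close it, and move to the next URL - a few seconds per page keeps you well under normal user traffic.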
The other concern you ought to have is legality. I assume these reviews are material that you didn't write and have no permission to create derivative works from. If you are just slurping them for personal use, then it's OK. If you are slurping them to create something (a derivative work), then you are infringing copyright.
I believe you are misunderstanding how HTTP requests work. You ask for a page and you get it; the fact that you're reading the stream one line at a time has no bearing on the HTTP request, and the site is perfectly happy to give you one page at a time. It won't look malicious (because it's just one user reading pages - totally normal behavior). You're 100% OK to proceed with your plan (if it is as you described it).

How to log impressions (and data) in Java for a javascript widget?

I have a JavaScript widget that loads JSON data from a Java webapp. I want to record impressions, and the IDs of the data I return (5 or 10 longs).
I want to be able to process these and give a summary of how many impressions a person got on their widget, and record how many times each piece of data was seen.
I'm guessing it's not a great idea to store it all in my postgres database since it could probably be a lot of data (perhaps tens of impressions per second).
Right now I'm considering just writing it all to a log file and processing it every hour.
Is there a better way? Maybe with some kind of message system, or event system?
Writing to a log and processing it offline should be OK.
You can program your logging system to create hourly log files, then process the files that are no longer being written to. You can do the background processing in a separate thread/process at low priority.
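The hourly-file scheme can be sketched in a few lines: name each file after its hour, so any file from a past hour is closed and safe to process. The class name and line format below are made up for illustration:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ImpressionLog {

    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

    // One file per hour; the hourly processor only touches files
    // whose hour is strictly in the past.
    static String fileFor(LocalDateTime when) {
        return "impressions-" + HOUR.format(when) + ".log";
    }

    // One tab-separated line per impression: timestamp, widget id,
    // then the ids of the data items returned with it.
    static String line(long widgetId, long[] dataIds) {
        StringBuilder sb = new StringBuilder();
        sb.append(System.currentTimeMillis()).append('\t').append(widgetId);
        for (long id : dataIds) {
            sb.append('\t').append(id);
        }
        return sb.toString();
    }
}
```

The hourly job then tallies impressions per widget and views per data ID from the closed files and writes only the aggregates to Postgres, which keeps the database load tiny.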
I'm not sure what you are after, but maybe a good solution is to make a call to a web-analytics tool (e.g. Google Analytics) from the JavaScript part of your widget?
You won't have access to your raw data, but it will scale wonderfully and you'll have nice graphs.
