Massive Multiple File Upload with Spring [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I am developing a web application based on Spring 4 which aims to act as some kind of gallery for hosting images.
Because a huge number of images (up to 100,000) has to be uploaded by the client, I have to implement a Java client application that takes care of uploading them to the web application.
I have currently implemented a REST endpoint in the web application which is able to receive one image after another and save it on the server side. For each image to upload, the client application makes a POST request to the REST endpoint containing the image. Considering that these images should be available in the server web application as soon as possible, I suspect this is not the optimal solution for the job; it doesn't even use the full bandwidth that would be available.
How can I implement this feature in a reasonable, efficient way that makes use of the full available bandwidth (possibly even without REST)?

There are various ways this can be done, but sticking with the RESTful server approach you have already started, my suggestion is to upload the files in parallel.
Could your client use multiple worker threads? The worker threads read from a shared blocking queue; you decide how many of them to create. The main thread figures out which files need to be uploaded and puts each file location (a relative or full path, or a URL) into the queue. Each worker thread then grabs a file-upload request from the queue and makes the POST request to upload that file.
This would let you make better use of the bandwidth. You can then add smarts to decide how many worker threads to use, for example by polling the REST server to ask how many it should use: the server could keep an AtomicInteger counter of current uploads and return MAX_UPLOADS - currentUploads.get().
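A minimal sketch of that worker-pool idea (the endpoint URL and the actual POST call are placeholders; a real client would POST each file with something like java.net.http.HttpClient):

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the parallel uploader: main thread fills a shared queue,
// N worker threads drain it and POST one file each per iteration.
class ParallelUploader {
    private final BlockingQueue<Path> queue = new LinkedBlockingQueue<>();
    private final int workers;

    ParallelUploader(int workers) {
        this.workers = workers;
    }

    // Placeholder for the real POST request; override or replace with an
    // HTTP call to your image endpoint (URL is an assumption, not given above).
    protected void upload(Path image) {
        // e.g. POST the bytes of `image` to http://example.com/api/images
    }

    void uploadAll(List<Path> images) throws InterruptedException {
        queue.addAll(images);                       // main thread enqueues everything
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                Path p;
                while ((p = queue.poll()) != null) { // each worker drains the queue
                    upload(p);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Tuning the worker count (including the MAX_UPLOADS idea above) then only touches the constructor argument.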

Related

How to build blog application using microservice architecture? [closed]

Closed 3 years ago.
I already have a blog application which is built on Spring-Boot as a monolith.
There are 2 entities.
user
post
And the mapping is one to many.
A single user can create multiple blog posts.
How can I recreate the same functionality as separate microservice applications?
So far, from researching on the internet, what I see people saying is to create a database per service, etc.
Suppose I create two services, say:
UserService (which has a separate DB and the CRUD operations associated with it)
PostService (which has a separate DB and the CRUD operations associated with it)
How can I make them communicate with each other?
In the monolith app the Post entity has createdBy mapped to User.
But how does this work in a microservices architecture?
Can anyone please help me design such an architecture?
First list the reasons why you want to break it into microservices, e.g. to make it more scalable in scenarios such as the following:
Posting comments becomes slow, and during this period registration of new users should remain unaffected.
Very few users upload/download files, and you want general users who simply view and post comments to be unaffected, even while upload/download remains slow.
Answering the above questions and analyzing/prioritizing other NFRs will help determine how and what to break apart.
Additionally,
The Post service only needs to validate whether the user is a valid, logged-in user (correct?).
The User service does not really need to communicate with the Post service at all.
Further, you might want to decouple other minor features as well, which in turn talk to each other and can be authenticated via other means (certificates, etc.) since they are internal, e.g. updating some stats (user ranking) or aggregating data.
The system might also have a lot of smaller hidden features which may or may not have anything to do with the Post service at all, and which can be separated into different microservices (like video/file/picture/binary content upload and download), prioritized based on the computational power needed, hit frequency and business priority.
After breaking it into microservices, you need to run some stress tests (based on the current load) to learn which services need replication and automatic load balancing and which do not. Writing the stress tests first, before breaking things apart, can also help you understand which features need to be moved out of the monolith first.
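To make the communication concrete, here is a hedged sketch (all class and method names are hypothetical) of the usual answer to the createdBy question: the Post service stores only the user's id and validates it by calling the User service, instead of having a JPA mapping to a shared User table:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In a real deployment this would be an HTTP call such as
// GET http://user-service/users/{id}, e.g. via RestTemplate or WebClient.
interface UserClient {
    boolean isValidUser(long userId);
}

// The Post entity references the user by id only -- no cross-service JPA join.
class Post {
    final long id;
    final long createdByUserId;
    final String body;

    Post(long id, long createdByUserId, String body) {
        this.id = id;
        this.createdByUserId = createdByUserId;
        this.body = body;
    }
}

class PostService {
    private final UserClient users;
    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, Post> store = new ConcurrentHashMap<>(); // stand-in for the Post DB

    PostService(UserClient users) {
        this.users = users;
    }

    Post create(long userId, String body) {
        // Cross-service validation replaces the monolith's foreign key.
        if (!users.isValidUser(userId)) {
            throw new IllegalArgumentException("unknown user " + userId);
        }
        Post p = new Post(ids.incrementAndGet(), userId, body);
        store.put(p.id, p);
        return p;
    }
}
```

The design choice here is that each service owns its own data and exposes validity checks over its API; the trade-off is an extra network call (often mitigated by caching or by trusting a signed auth token).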

Best approach to create a webapp which updates its UI dynamically on its own REST service invocation by an external client [closed]

Closed 2 years ago.
I have a webapp which exposes a rest endpoint. Clients post data to this endpoint. I need to display the content they post in my webapp, without refreshing the UI.
My first approach was to save the posted data in a database and have an AJAX call continuously polling for new data to display. This adds overhead, because I don't actually need to persist what I receive.
Secondly, I came across web sockets where a client and server can have full duplex communication.
Is there any better way of doing this?
P.S: My rest endpoints are developed using spring boot.
Generally there are three ways to get the client updated on a server event: polling, long-polling, and server-side push. They all have their pros and cons:
Polling
Polling is the easiest way of implementing this. You just need a timer on the client side and that's it. Repeatedly query the server for updates and reuse the code you already have.
Especially if you have many clients, the server may get flooded by a large number of GET requests. This may impose computational as well as network overhead. You can mitigate this by using a caching proxy on the server side (either as part of your application or as a separate artifact/service/node/cluster). Caching GET requests is normally quite easy.
Polling may not seem to be the most elegant solution, but in many cases it is good enough. In fact, polling can be considered the "normal" RESTful way to do this. HTTP specifies mechanisms like the If-Modified-Since header which are used to make polling cheaper.
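On the client side, polling is little more than a timer. A minimal Java sketch (the Runnable stands in for the actual GET request, which should ideally send If-Modified-Since):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A single scheduled task repeatedly asks the server for updates.
class Poller {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    // checkForUpdates is a placeholder for the real HTTP GET + UI update.
    void start(Runnable checkForUpdates, long periodMillis) {
        timer.scheduleAtFixedRate(checkForUpdates, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

In a browser-based UI the equivalent is a setInterval around the AJAX call; the structure is the same.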
Long-Polling
Long-polling works by making the GET request a blocking operation. The client makes a request and the server blocks until there is new data. Then the request (which might have been made long ago) is answered.
This dramatically reduces the network load. But there are several drawbacks: first of all, you can easily get this wrong. For example, when you combine this approach with server-side pooling of session beans, your bean pool can get used up quite fast and you have a wonderful denial of service.
Furthermore long-polling does not work well with certain firewall configurations. Some firewalls may decide that this TCP connection has been quiet for too long and regards it as aborted. It then may silently discard any data belonging to the connection.
Caching-proxies and other intermediaries may also not like long-polling -- although I have no concrete experience I can share here.
Although I spent quite some time writing about the drawbacks, there are cases when long-polling is the best solution. You just need to know what you are doing.
Server-Side Push
The server can also directly inform the clients about a change. Websockets are a standard which details this approach. You can use any other API for establishing TCP connections but in many cases websockets are the way to go. Essentially a TCP connection is left open (just like in long-polling) and the server uses it to push changes to the client.
This approach is on a network-level similar to the long-polling approach. Because of that it also shares some of the drawbacks. For example you can get the same firewall issues. This is one of the reasons why websocket endpoints should send heartbeats.
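Stripped of the actual WebSocket transport, server-side push boils down to the server keeping a registry of open connections and writing to each of them when something changes. A transport-agnostic sketch (in a real app each Consumer would wrap something like Spring's WebSocketSession):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Each connected client is represented by a Consumer<String>; the REST
// endpoint that receives the external POST calls publish(...) and every
// open connection is notified immediately -- no database round trip.
class PushBroker {
    private final List<Consumer<String>> clients = new CopyOnWriteArrayList<>();

    void register(Consumer<String> client) {
        clients.add(client);
    }

    void publish(String event) {
        for (Consumer<String> c : clients) {
            c.accept(event);
        }
    }
}
```

CopyOnWriteArrayList is used so that clients can connect and disconnect while a publish is in flight; a production version would also remove clients whose send fails.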
In the end it depends on your concrete requirements which solution is best. I'd recommend using a simple polling mechanism if you are fine with polling every ten seconds or less frequently and if this doesn't get you into trouble with battery usage or data transmission volume on the client (e.g. you are building a smartphone app). If polling is not sufficient, consider the other two options.

Delegating workload from Java Servlet [closed]

Closed 9 years ago.
I'm attempting to build a Java servlet task with a runtime of about 15-20 minutes that takes arguments from an HTML form. I have a couple of questions regarding this:
Will the task continue to run even after the user closes the browser? I Googled this and it seems the process will continue to execute even after the browser closes; I just want to confirm this.
While searching for the answer to the above question, I came across a post (or several) stating that for such 'intensive' tasks (I would consider mine intensive, as it takes around 15-20 minutes to complete), it's better to have a separate program run the task than to contain it in the servlet. So, do I just execute another Java program from the servlet class?
And now for my final question: will multiple user requests be processed independently of each other? As in, will the servlet have a separate thread or instance for each request? If so, will my executing another Java program from the servlet class lead to any problems?
There are a few items to discuss, each with their own (part of a) solution:
Do you really want the task to continue if the browser closes? Then spawn a new thread for the task (trying to write to the browser's output stream when the browser is already closed will make the thread die with an exception). See Executor.
Do you want concurrent requests to be handled in parallel? How many in parallel? See ThreadPoolExecutor.
Do you want feedback to the browser (user) during the long-running task? See async servlets.
The servlet container will make sure that parallel requests are handled concurrently, each in their own thread. But they will share the instance of the Servlet class. Therefore, you have to make your code thread safe.
About running a 'separate java program' or keeping the task in the servlet: it is best practice to separate different tasks in a program in different sections. Creating a new class for your long running task is better than keeping it in the servlet class. See Separation of concerns.
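A hedged sketch of that separation (class and method names are made up): the long-running work lives in its own class and is handed to a shared ExecutorService, so the servlet's doPost can submit it and return immediately:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// The long-running task, separated out of the servlet class.
class ReportTask implements Callable<String> {
    private final String formArgument;

    ReportTask(String formArgument) {
        this.formArgument = formArgument;
    }

    @Override
    public String call() {
        // ~15-20 minutes of real work would happen here.
        return "done:" + formArgument;
    }
}

// Shared by all requests; the servlet holds one instance of this,
// which also bounds how many tasks run in parallel.
class TaskRunner {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    Future<String> submit(String formArgument) {
        return pool.submit(new ReportTask(formArgument));
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Because the pool, not the request thread, runs the task, it survives the browser closing; and since the TaskRunner is shared state inside a servlet, it must be thread safe (here the ExecutorService already is).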

Restful services and messaging [closed]

Closed 10 years ago.
We are planning to design a system where data comes in through web services and is processed asynchronously. I have been assigned to pick Java open-source technologies for this. For the web services we have decided to go with RESTful services. I have never worked with messaging technologies; can anyone please suggest the best open-source technology available for processing data asynchronously?
Try Apache CXF - see the docs.
It has everything you want, I guess.
Your use case is processing data asynchronously. This typically happens in the following steps:
Receive the data and store it somewhere (in memory or in a persistent location).
Return an acknowledgement response immediately.
Either immediately start a thread to process the data, or let a scheduled thread scan the received data and process it.
Optionally send an acknowledgement to the sending application if such an interface is available.
There is no single standard library or framework in Java for this. There are individual pieces known to solve the standard sub-problems, and combining them is one option.
The producer-consumer pattern is a typical pattern that satisfies this need.
You can build a producer-consumer pattern using Java's concurrent APIs (Here is an example)
This producer-consumer piece can be wrapped behind a servlet (or some other server-side class which handles requests).
The producer puts each incoming request on the shared queue and returns.
The consumer picks requests off the queue and processes them asynchronously.
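A minimal, self-contained sketch of that hand-off (the String payload and the handler are placeholders for your real data and processing logic):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// The request handler (producer) enqueues the payload and returns
// immediately; a background consumer thread drains the queue.
class AsyncProcessor {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService consumer = Executors.newSingleThreadExecutor();

    AsyncProcessor(Consumer<String> handler) {
        consumer.execute(() -> {
            try {
                while (true) {
                    handler.accept(queue.take()); // blocks until data arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        });
    }

    // Called from the servlet: acknowledge immediately, process later.
    void accept(String payload) {
        queue.add(payload);
    }

    void shutdown() {
        consumer.shutdownNow();
    }
}
```

A message broker (JMS, ActiveMQ, etc.) plays the same role across process boundaries; this in-process version is the simplest starting point.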
Another option would be to use asynchronous processing in Servlet 3.0.

What are performant, scalable ways to resolve links from http://bit.ly [closed]

Closed 6 years ago.
Given a series of URLS from a stream where millions could be bit.ly, google or tinyurl shortened links, what is the most scalable way to resolve those to get their final url?
A Multi-threaded crawler doing HEAD requests on each short link while caching ones you've already resolved? Are there services that already provide this?
Also factor in not getting blocked by the URL-shortening services.
Assume the scale is 20 million shortened urls per day.
Google provides an API. So does bit.ly (and bit.ly asks to be notified of heavy use, and specify what they mean by light usage). I am not aware of an appropriate API for tinyurl (for decoding), but there may be one.
Then you have to fetch on the order of 230 URLs per second to keep up with your desired rates. I would measure typical latencies for each service and create one master actor and as many worker actors as you needed so the actors could block on lookup. (I'd use Akka for this, not default Scala actors, and make sure each worker actor gets its own thread!)
You also should cache the answers locally; it's much faster to look up a known answer than it is to ask these services for one. (The master actor should take care of that.)
After that, if you still can't keep up because of, for example, throttling by the sites, you had better either talk to the sites or you'll have to do things that are rather questionable (rent a bunch of inexpensive servers at different sites and farm out the requests to them).
Using the HEAD method is an interesting idea, but I am afraid it can fail because I am not sure the services you mentioned support HEAD at all. If, for example, a service is implemented as a Java servlet, it may implement doGet() only; in that case doHead() is unsupported.
I'd suggest you try to use GET but not read the whole response: read only the HTTP status line and headers.
As you have very serious performance requirements, you cannot make these requests synchronously, i.e. you cannot use HttpUrlConnection. You should use the NIO package directly. Then you will be able to send requests to all those millions of destinations using only one thread and get responses very quickly.
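For illustration, a hedged sketch of a single resolution step, using blocking HttpURLConnection for clarity rather than NIO: the shortener's final target is in the Location header of the 3xx response, so the body is never read. A resolved-URL cache (e.g. a ConcurrentHashMap) would sit on top of this.

```java
import java.net.HttpURLConnection;
import java.net.URL;

class Resolver {

    // Pure decision logic, separated out so it can be tested without a network:
    // a 3xx status with a Location header means we found the redirect target.
    static String finalTarget(int status, String locationHeader, String original) {
        return (status >= 300 && status < 400 && locationHeader != null)
                ? locationHeader
                : original;
    }

    // One network hop; some shorteners chain redirects, so callers may loop.
    static String resolveOnce(String shortUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(shortUrl).openConnection();
        conn.setInstanceFollowRedirects(false);       // keep the 301/302, don't follow it
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();          // status line + headers only
        String location = conn.getHeaderField("Location");
        conn.disconnect();                            // body is never read
        return finalTarget(status, location, shortUrl);
    }
}
```

The NIO version mentioned above keeps the same finalTarget logic but multiplexes many such exchanges over one thread.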
