Server Storage - Java

I have the following problem: I have a Java application (Spring Boot) that uses Angular on the frontend. This application needs to store some data on the client side; however, this data is lost when the client changes browsers or opens a private (incognito) browser tab.
I need an alternative to linking the data to the user in the database: something that is implemented in Java itself.
Is there any way I can store this data in Java on the server, even though I know it will be volatile? That is, we can assume that my application server will be up 100% of the time.
Edit:
My server runs on an OpenShift platform with multiple pods, and the load balancer is configured for non-sticky sessions. That's why we can assume that my server will be 100% available.

This really depends on the design of your server. For example, why is it guaranteed to be up 100% of the time? Do you have multiple redundant instances? In that case you need to coordinate that "storage" between all instances; you may even want to deal with a quorum of instances keeping the state, etc. That doesn't seem trivial. Or do you have just one single instance? But then how do you guarantee 100% uptime?
I strongly recommend using some kind of data store or at least distributed cache.
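A minimal sketch of the trade-off, assuming plain JDK collections only: the hypothetical `ClientDataStore` interface hides where the data lives, and `InMemoryClientDataStore` keeps it in a `ConcurrentHashMap` inside one JVM. Two store objects stand in for two pods behind the non-sticky load balancer, which shows why purely in-memory storage is not enough there.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class PodLocalStorageDemo {

    // Hypothetical abstraction: application code depends on this, not on where data lives.
    interface ClientDataStore {
        void put(String userKey, String value);
        Optional<String> get(String userKey);
    }

    // Per-JVM implementation: every pod gets its own, unshared map.
    // Data is lost when the pod restarts and is invisible to the other pods.
    static final class InMemoryClientDataStore implements ClientDataStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();

        @Override
        public void put(String userKey, String value) {
            data.put(userKey, value);
        }

        @Override
        public Optional<String> get(String userKey) {
            return Optional.ofNullable(data.get(userKey));
        }
    }

    public static void main(String[] args) {
        // Two instances simulate two pods behind a non-sticky load balancer.
        ClientDataStore podA = new InMemoryClientDataStore();
        ClientDataStore podB = new InMemoryClientDataStore();

        podA.put("user-42", "preferences");      // request 1 lands on pod A
        System.out.println(podA.get("user-42")); // Optional[preferences]
        System.out.println(podB.get("user-42")); // Optional.empty: request 2 lands on pod B
    }
}
```

Replacing `InMemoryClientDataStore` with an implementation backed by a distributed cache or a database is what would make the data visible to every pod; keeping the interface means that change stays out of the calling code.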

Related

Where to synchronize inside a Java WebApp

My employer has currently given me a project that has me scratching my head about synchronization.
I'm going to first talk about the situation I'm in:
I've been asked to create a PDF report/quotation tool that takes its data from CSV files. The database that actually holds the data is used by old IBM software, and for unknown reasons they don't want any direct access to it; instead of copying the data to other databases, they apparently found it perfectly fine to just create a folder on the server with loads and loads of CSV files. The tool has to load that data into the application, query it, transform it where needed, do calculations, and then return a PDF file to the end user.
The problem is that getting, querying, and calculating things takes a fair amount of time. The other problem is that they want it to be a WebApp, because the business team does not want to install any new software; since the start of the pandemic they have mostly been moving towards doing everything online. Being a WebApp means that every computation, and fetching the data, has to be done by the WebApp itself.
My question: Is each call to a servlet by a separate user treated as a separate servlet, and should I only synchronize the methods in the business logic (getting and using the data)? Or should I write some code that sits in the middle of the servlet, receives a user ID (as a reference), runs the business logic in a synchronized fashion, and then returns the PDF file?
(I hope you get the gist of it...)
Everything will run on Apache Tomcat 8, if that helps. The build is Java 11 LTS.
Sorry, no code yet. But I've made some drawings.
With Java web applications, the usual pattern is for components not to have conversational state (information specific to a particular user's requests). If you need to keep state for a user on the server, you can use the HTTP session. With a SPA or Ajax application it's often easier to keep much of that kind of state in the browser. The less state you keep on the server, the easier things are as your application scales: you don't have to pin sessions to servers (which messes up load balancing) or copy lots of session state across a cluster.
For simple (non-reactive) web apps that do blocking I/O, each request-response cycle gets its own dedicated thread from Tomcat's pool. That thread delivers the HTTP request to the servlet, handles the business logic, blocks while talking to the database, and then carries the HTTP response.
(Reactive webapps are going to be more complex to build, you will need a non-blocking database driver and you will have less choices for databases, so I would steer clear of those, at least for your first web application.)
The thread pool used by Tomcat has to protect itself from concurrent access, but that doesn't impact your code. Likewise, there are third-party middle-tier caching libraries that have to deal with concurrency, but you can avoid dealing with it directly. All of your logic is confined to one thread, so it doesn't interfere with processing done by other threads unless there are shared mutable data structures. Those data structures would be the part of the application where synchronization might be one of several possible solutions.
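A minimal sketch of that idea, assuming the CSV files are read into memory and shared by all requests: each simulated request runs on a pool thread using only local variables, and the one shared structure is a `ConcurrentHashMap` whose `computeIfAbsent` loads each file at most once, instead of hand-written `synchronized` blocks. `CsvCache`, `parseCsv`, and the file name are hypothetical, and the fixed thread pool only stands in for Tomcat's request threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class CsvCache {
    // The only shared mutable state: parsed file contents keyed by file name.
    private final Map<String, List<String[]>> parsed = new ConcurrentHashMap<>();
    // Counts real loads so the "load once" behaviour can be observed.
    final AtomicInteger loads = new AtomicInteger();

    List<String[]> rows(String fileName) {
        // computeIfAbsent applies the loader at most once per key, even under concurrent calls.
        return parsed.computeIfAbsent(fileName, this::parseCsv);
    }

    // Hypothetical loader; a real one would read and parse the file from disk.
    private List<String[]> parseCsv(String fileName) {
        loads.incrementAndGet();
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"item", "price"});
        rows.add(new String[] {"widget", "9.99"});
        return rows;
    }

    // One simulated request: everything except the cache is a thread-confined local variable.
    String buildReport(String user) {
        StringBuilder report = new StringBuilder("Quote for " + user + ":");
        for (String[] row : rows("prices.csv")) {
            report.append(' ').append(String.join("=", row));
        }
        return report.toString();
    }

    public static void main(String[] args) throws Exception {
        CsvCache cache = new CsvCache();
        ExecutorService pool = Executors.newFixedThreadPool(4); // stand-in for Tomcat's pool
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            String user = "user" + i;
            results.add(pool.submit(() -> cache.buildReport(user)));
        }
        for (Future<String> result : results) {
            result.get(); // wait for every simulated request to finish
        }
        pool.shutdown();
        System.out.println("CSV loads: " + cache.loads.get()); // 1
    }
}
```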
Synchronization or other locking schemes are local to one instance of the application. If you want to stand up multiple instances of this application then you need to be aware each one would be locking separately from the others. So for some things it's better to do locking in the database, since that is shared across webapp instances.
If you can use a database to store your data, so that you can rely on the database for caching and indexing, then it seems likely your application can avoid doing a lot of locking.
If you want examples, there are many small examples of building web apps with Spring at https://spring.io/guides. These are Spring Boot applications that are self-hosted, so you can put them together quickly and run them right away.
Going rogue with a database may not be the best course, since databases need looking after by DBAs. My advice is to put together two project plans, one for using a database and one for using the flat files. The flat-file plan will have to allow for issues like caching, indexing data, replicating data from the legacy database, and not having standard tools that generate PDFs from SQL queries. The alternative plan using a database should involve much less sorting out of infrastructure and a shorter time until you can get down to cranking out reports.

In GAE, is there a way to force an instance to serve only 1 session?

I am building a rich app on GAE using Canoo's RIA Suite. This package splits Java Swing components into server-side and client-side parts. On the server, it looks like a 'desktop' Java application. The client keeps its own map between these halves. When GAE starts a new instance, the client-side parts don't know about it -- if the next request they send is routed to the wrong instance bad things happen.
I figure I could get around this problem if I did one of two things:
Forced a GAE instance to serve exactly one HTTP session.
Directed each HTTP request to a specific GAE instance.
My question is, in the GAE environment, can either of these be done?
Neither of these two options will solve your problem, because an App Engine instance can die and be replaced at any moment.
If you can save a state of your server-side "half" in a datastore, you can load it when a request hits the "wrong" instance, but it's probably not a very efficient solution.
You may be better off using a Compute Engine instance.
I agree that neither of those two options will work for you. The implication of your current design is that you are storing state in memory on an instance, which will not work with GAE (or any autoscaling distributed system). You should put any state into some distributed data store, whether that is memcache (which is volatile), the datastore, or Cloud SQL.
GAE/J has built-in support for Java sessions: the session state is persisted in the datastore across requests, so it is valid on any instance. For this to work, everything stored in your session needs to be serializable.
You can enable this by following these instructions.
Otherwise you can manage persisting server state yourself into the datastore accelerated by memcache, and linking it to a 'session' with a cookie. If you go down this road make sure you understand the implications of eventual consistency in the GAE datastore.
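A minimal sketch of that do-it-yourself approach, using only the JDK: two shared `ConcurrentHashMap`s stand in for the datastore and memcache, and a random token plays the role of the cookie value. `SessionStateRepository` and its methods are hypothetical names, not part of the App Engine API, and the maps are immediately consistent, so this does not model the datastore's eventual consistency.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class SessionStateRepository {
    private static final SecureRandom RANDOM = new SecureRandom();

    private final Map<String, String> datastore; // stand-in for the durable, shared datastore
    private final Map<String, String> memcache;  // stand-in for the shared but volatile cache

    SessionStateRepository(Map<String, String> datastore, Map<String, String> memcache) {
        this.datastore = datastore;
        this.memcache = memcache;
    }

    // Persists the state and returns an unguessable token to send to the client as a cookie value.
    String save(String state) {
        byte[] bytes = new byte[24];
        RANDOM.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        datastore.put(token, state); // durable copy first
        memcache.put(token, state);  // then the fast copy
        return token;
    }

    // Resolves the cookie token on any instance: cache first, datastore on a miss.
    Optional<String> load(String token) {
        String state = memcache.get(token);
        if (state == null) {
            state = datastore.get(token);
            if (state != null) {
                memcache.put(token, state); // repopulate the cache for later requests
            }
        }
        return Optional.ofNullable(state);
    }

    public static void main(String[] args) {
        Map<String, String> datastore = new ConcurrentHashMap<>();
        Map<String, String> memcache = new ConcurrentHashMap<>();
        SessionStateRepository instanceA = new SessionStateRepository(datastore, memcache);
        SessionStateRepository instanceB = new SessionStateRepository(datastore, memcache);

        String token = instanceA.save("cart=3 items"); // request 1 served by instance A
        memcache.clear();                              // simulate memcache losing its entries
        System.out.println(instanceB.load(token));     // Optional[cart=3 items] via the datastore
    }
}
```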

Java Swing - Single user application to a Multi user application

I am a newbie. I have a Java Swing application and it runs great on my machine. I want to access this application from a server via Citrix, so that when I click on the published icon, it runs the main method in the jar file and lets me access the application. The application will access a database on the DB server.
But I want multiple users to access the application at the same time, and that is where my questions are:
I thought of creating n threads for n users (i.e. I can set a limit on concurrent access). But what will the entry point be? I mean, each time I click on the published icon, the main method will be invoked.
I can think of separating my user interface from the logic layer, but I have no idea how to do it. Of course, I am following the MVC model. My question is about creating multiple instances of the GUI each time the application is accessed.
And finally, I want to use DB connection pooling. Would this mean that I have to create a separate Java program that creates this pool, and my application will use its datasource?
Can anyone please 'point' me in the right direction? I am not looking for specifics, just an idea of how to create this multi-user application.
Typically, for a multi-user Swing application you would want to separate the "client" part of the application from the "server" application.
This works as follows:
Each user would get their own running instance of the client application. This can be on their own machine.
The server application is a single instance (or maybe a cluster) that accepts connections from multiple clients and talks to the database.
The client applications talk to the server application when they need to access or change data. There is a wide variety of communication methods you can use.
Optionally, the server application can send notifications to the clients (e.g. when data is updated by another client).
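A minimal in-process sketch of that split, assuming the transport (RMI, sockets, HTTP, ...) is left out: each `Client` stands for one user's Swing client, and the single `Server` owns the data and pushes change notifications to the other clients. `DataService`, `Server`, and `Client` are hypothetical names, and the map only stands in for the database.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

public class ClientServerSketch {

    // Hypothetical contract between the Swing clients and the single server.
    interface DataService {
        String read(String key);
        void update(String clientId, String key, String value);
        void subscribe(String clientId, BiConsumer<String, String> onChange);
    }

    // The single server instance: the only component that would talk to the database.
    static final class Server implements DataService {
        private final Map<String, String> table = new ConcurrentHashMap<>(); // stand-in for the DB
        private final Map<String, BiConsumer<String, String>> subscribers = new ConcurrentHashMap<>();

        public String read(String key) { return table.get(key); }

        public void update(String clientId, String key, String value) {
            table.put(key, value);
            // Notify every other client that the data changed.
            subscribers.forEach((id, listener) -> {
                if (!id.equals(clientId)) {
                    listener.accept(key, value);
                }
            });
        }

        public void subscribe(String clientId, BiConsumer<String, String> onChange) {
            subscribers.put(clientId, onChange);
        }
    }

    // One client per user; it holds no shared data, only a reference to the service.
    static final class Client {
        final String id;
        final List<String> notifications = new CopyOnWriteArrayList<>();
        private final DataService service;

        Client(String id, DataService service) {
            this.id = id;
            this.service = service;
            service.subscribe(id, (key, value) -> notifications.add(key + "=" + value));
        }

        void edit(String key, String value) { service.update(id, key, value); }

        String view(String key) { return service.read(key); }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Client alice = new Client("alice", server);
        Client bob = new Client("bob", server);

        alice.edit("invoice-7", "paid");
        System.out.println(bob.view("invoice-7")); // paid
        System.out.println(bob.notifications);     // [invoice-7=paid]
        System.out.println(alice.notifications);   // [] (no echo to the client that made the change)
    }
}
```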
You can do without the server application if you are happy to let the clients connect to the same database. This is simpler to set up, but has some downsides:
You need to be much more careful about concurrent access to the database / potential corruption from different clients attempting to alter the same data at the same time.
You need to allow connections to your database, from clients that are potentially untrusted. This is a security risk.
Given that you already have a working application, the second option is probably easier for you to move to. Just be aware of the downsides: the first option is a much better architecture in general.

How do you store and replay JDBC statements?

Suppose we have a JDBC-based application that was not designed for real-time propagation of changes from one instance running on computer A to another instance running on computer B in a two-way synchronization scheme. How can you add this elegantly, without using SymmetricDS?
We thought of using XMPP and XStream: transforming POJOs to XML or JSON and sending them over XMPP, using the Smack API, to a pre-configured "chat room" where other listening bots would replay the data they receive. That way, even offline client apps would receive the "DiscussionHistory" by sending their last "since" timestamp.
I looked everywhere for "near real-time database change propagation" in Java, or even in H2, where changes are propagated between all registered nodes, but the only solution I could come up with is to use the XMPP protocol and build a "bot" chat room around it, where nodes send their data while the others listen for changes.
The so-called "bots" are application instances on different computers, of an accounting application that should allow for real-time collaboration on the same database, but allow for offline modifications (so no centralized server to store changes).
One common approach is to build your caching so that the application always queries the database if a particular entry is not found. Then you would only have to synchronize cache-evictions to force all nodes in a group to re-load a certain entry. This is fairly easily achieved using, for instance, spring method caching and ehcache.
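A minimal sketch of read-through caching with synchronized evictions, assuming every node can reach the same database: plain JDK maps replace Spring method caching and Ehcache here, and the hypothetical `EvictionBus` stands in for whatever channel (XMPP room, message topic, ...) carries eviction messages between nodes. Only keys travel over the bus; each node re-reads the value from the database on its next cache miss.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class EvictionSyncSketch {

    // Stand-in for the channel that delivers eviction messages to every node.
    static final class EvictionBus {
        private final List<CachingNode> nodes = new CopyOnWriteArrayList<>();

        void join(CachingNode node) { nodes.add(node); }

        void broadcast(String key) { nodes.forEach(node -> node.evictLocal(key)); }
    }

    // One application instance with a local cache in front of the shared database.
    static final class CachingNode {
        private final Map<String, String> database;
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final EvictionBus bus;
        final AtomicInteger dbReads = new AtomicInteger(); // counts cache misses

        CachingNode(Map<String, String> database, EvictionBus bus) {
            this.database = database;
            this.bus = bus;
        }

        // Read-through: on a miss, query the database and cache the result.
        String get(String key) {
            return cache.computeIfAbsent(key, k -> {
                dbReads.incrementAndGet();
                return database.get(k);
            });
        }

        // Write to the database, then ask every node (this one included) to drop the entry.
        void put(String key, String value) {
            database.put(key, value);
            bus.broadcast(key);
        }

        void evictLocal(String key) { cache.remove(key); }
    }

    public static void main(String[] args) {
        Map<String, String> db = new ConcurrentHashMap<>();
        db.put("account-1", "100.00");
        EvictionBus bus = new EvictionBus();
        CachingNode a = new CachingNode(db, bus);
        CachingNode b = new CachingNode(db, bus);
        bus.join(a);
        bus.join(b);

        System.out.println(b.get("account-1")); // 100.00, now cached on node b
        a.put("account-1", "250.00");           // the eviction also reaches node b
        System.out.println(b.get("account-1")); // 250.00, re-read after the eviction
    }
}
```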

What are the requirements for a web application to work in a cluster environment?

I need to check whether an existing web application is ready to be deployed in a clustered environment.
Cluster:
Several Linux boxes. The flow is controlled by a load balancer using a simple round-robin algorithm with sticky sessions.
Application
A stateless (hopefully) Java web application that retrieves content from the back office and formats it appropriately.
I have access to the source code. What should I check in the code to be sure that it will run in the cluster?
Check that nothing holding the application's state is cached in memory or on the file system.
...Something else?
If you're using EJBs (which is recommended if you access a DB), then here is a list of restrictions:
http://java.sun.com/blueprints/qanda/ejb_tier/restrictions.html
I guess similar restrictions apply to the web application.
The easiest way to check the application is to run it on 2 servers with the same data, so that at startup both are in the same state. Assume that to complete an operation, the browser makes 2 consecutive HTTP requests to your web app. Send the first call to web server 1 and the second call to web server 2; then try it the other way around; then send both requests to the same web server. If you get the same result each time, it's very likely you have a ready-to-cluster application. (It doesn't mean the app IS ready to cluster, as there might be object state etc. stored in memory that is not easy to spot from the front end, but it gives you a higher probability that it MIGHT BE ok to run in a cluster.)
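The routing check above can be scripted. A minimal sketch, assuming a hypothetical two-step operation (`step1` stores a name, `step2` reads it back) and two in-process server objects in place of real web servers: the harness tries all four routings and reports whether they agree, so an instance-local implementation fails and a shared-store one passes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClusterReadinessCheck {

    // Hypothetical two-request operation exposed by the web app.
    interface Server {
        void step1(String token, String name);
        String step2(String token);
    }

    // Keeps conversational state in instance memory: breaks when requests change servers.
    static final class InMemoryServer implements Server {
        private final Map<String, String> names = new ConcurrentHashMap<>();
        public void step1(String token, String name) { names.put(token, name); }
        public String step2(String token) { return "Hello, " + names.getOrDefault(token, "?"); }
    }

    // Same logic, but state lives in a store shared by all servers (e.g. a database).
    static final class SharedStoreServer implements Server {
        private final Map<String, String> store;
        SharedStoreServer(Map<String, String> store) { this.store = store; }
        public void step1(String token, String name) { store.put(token, name); }
        public String step2(String token) { return "Hello, " + store.getOrDefault(token, "?"); }
    }

    // Tries 1->1, 1->2, 2->1 and 2->2 and reports whether every routing gives the same answer.
    static boolean sameResultForEveryRouting(Server s1, Server s2) {
        Server[] servers = {s1, s2};
        String expected = null;
        int run = 0;
        for (Server first : servers) {
            for (Server second : servers) {
                String token = "user-" + run++; // fresh token per run
                first.step1(token, "Ada");
                String result = second.step2(token);
                if (expected == null) {
                    expected = result;
                } else if (!expected.equals(result)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameResultForEveryRouting(new InMemoryServer(), new InMemoryServer())); // false
        Map<String, String> db = new ConcurrentHashMap<>();
        System.out.println(sameResultForEveryRouting(new SharedStoreServer(db), new SharedStoreServer(db))); // true
    }
}
```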
If it's truly "stateless", there would be no problem: you could make any request of any server at any time and everything would just work. Most things aren't quite that easy, so any state would either have to be streamed to and from the page as it moves between client and server, or be stored on the back end, with some sort of token passed back and forth to retrieve it from whatever shared data store you're using. If they are using the HttpSession, then anything that is retrieved from the session and modified needs to be set back into the session with session.setAttribute(key, value). Setting the attribute acts as a signal that whatever is stored in the session needs to be replicated to the redundant servers. Make sure anything stored in the session implements, and actually is, Serializable. Some servers will let you store non-serializable objects (I'm looking at you, WebLogic), but will then throw an exception when they try to replicate them. I've had many a coworker complain that having to set stuff back into the session should be redundant, and perhaps it should, but this is just the way things work.
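A minimal simulation of that behaviour, assuming a container that replicates a session attribute only when `setAttribute` is called: `ReplicatingSession` is a hypothetical stand-in, not the servlet `HttpSession` API, and it makes its replica copy by serializing the value, which is why non-serializable values fail. It shows that mutating an object obtained from the session leaves the replica stale until the value is set again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicatingSession {
    private final Map<String, Object> local = new ConcurrentHashMap<>();
    // Stand-in for the copy held by a redundant server.
    final Map<String, Object> replica = new ConcurrentHashMap<>();

    Object getAttribute(String key) { return local.get(key); }

    // Only setAttribute triggers replication in this simulation.
    void setAttribute(String key, Object value) {
        local.put(key, value);
        replica.put(key, copyViaSerialization(value));
    }

    // Replication ships bytes, so the value (and everything it references) must be Serializable.
    private static Object copyViaSerialization(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) { // includes NotSerializableException
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static final class Cart implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> items = new ArrayList<>();
    }

    public static void main(String[] args) {
        ReplicatingSession session = new ReplicatingSession();
        session.setAttribute("cart", new Cart());

        Cart cart = (Cart) session.getAttribute("cart");
        cart.items.add("book"); // mutated in place only
        System.out.println(((Cart) session.replica.get("cart")).items); // []

        session.setAttribute("cart", cart); // signal that the value changed
        System.out.println(((Cart) session.replica.get("cart")).items); // [book]
    }
}
```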
Having state is not a big problem if done properly. Anyway, all applications have state. Even when serving a somewhat static file, the file content associated with a URL is part of the state.
The problem is how this state is propagated and shared.
State inside the user session is a no-brainer. Use a session replication mechanism (slower, but no session loss on a node crash) or a sticky-session load balancer, and your problem is solved.
All other shared state is indeed a problem. In particular, even cache state must be shared and perfectly coherent; otherwise, refreshing the same page could randomly produce different results depending on which web server, and thus which cache, you hit.
You can still cache data using a shared cache (like Ehcache), or fall back to sticky sessions.
I guess it is pretty difficult to be sure that the application will indeed work in a clustered environment, because a singleton in some obscure service, a static member somewhere, or anything else can potentially produce strange results. You can certainly validate the general architecture, but you'll need to actually deploy it and run some validation tests before going into production.
