In my web application I have a part that needs to continuously crawl the Web, process the data, and present it to a user. So I was wondering whether it is a good approach to split it into two separate applications: one would do the crawling and data processing and store the data in the database, and the other would be a web application (deployed on a web server) that presents the data from the database to the user and lets them interact with it.
The reason I think I need this split is that if I make changes to my web app (adding new functionality, changing the interface, etc.), I don't want the crawling to be interrupted.
My application stack is Tapestry (web layer), Spring, and Hibernate (over MySQL), plus my own implementation of the crawler, independent of the others.
Is it a good idea to do the integration just by using the same database? This might cause issues with accessing the database from both applications at the same time. Or can the integration be done at the Hibernate level, so that both applications use the same Hibernate session? But can an app in one JVM instance access objects from another JVM instance?
I would be grateful for any suggestions regarding this matter.
UPDATE
The user (through the web app's interface) would enter the URLs for the crawler to parse. The crawler app would just read the tables of URLs that the web app populates. And vice versa, the data processed by the crawler would just be presented in the user interface. So I don't think I need to worry about any kind of locking, right?
Thanks,
Nikola
I would definitely keep them separated like you are planning. Web crawling is more of a "batch" process than a request-driven web application. The web crawling app will run in its own JVM, and your web app will run in a servlet/Java EE container.
How often will the crawler run or is it a continuously running process? You may want to consider the frequency based on your requirements.
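As a rough illustration of the crawler running as its own long-lived process, here is a minimal sketch using the JDK's `ScheduledExecutorService`. `CrawlerApp` and its methods are hypothetical names, the crawl body is only a placeholder, and the fixed-delay schedule is an assumption; a truly continuous crawler could simply use a very short delay.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical crawler that runs in its own JVM, separate from the web app.
class CrawlerApp {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger completedCycles = new AtomicInteger();

    // One crawl cycle. Placeholder: the real app would read pending URLs
    // from the shared database, fetch and process them, and store results.
    void crawlOnce() {
        try {
            // ... crawl, process, store ...
            completedCycles.incrementAndGet();
        } catch (RuntimeException e) {
            // Log and keep going: an exception escaping a scheduled task
            // would silently cancel all of its future runs.
            System.err.println("Crawl cycle failed: " + e);
        }
    }

    // Fixed delay: the next cycle starts `delay` after the previous one ends.
    void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::crawlOnce, 0, delay, unit);
    }

    // Stops scheduling new cycles and waits briefly for a running one.
    void stop() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int completedCycles() {
        return completedCycles.get();
    }
}
```

The crawler's own `main` method would create a `CrawlerApp`, call `start(...)` with whatever frequency your requirements call for, and register a shutdown hook that calls `stop()`. Fixed delay rather than fixed rate is a deliberate choice here: a cycle that runs long pushes the next one back instead of being followed immediately by a catch-up run.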
Will users of the web app be updating the same tables that the crawler posts data to? In that case you will need to take precautions, otherwise a deadlock may arise. If you want your web app to refresh data automatically when new rows are inserted, you can create a message-driven bean (using JMS) so that the crawler app can notify the web app asynchronously. When a new-data message arrives, you can either resubmit the form on your page or use Ajax to update the data on the page itself.
The web app should use connection pooling and the batch app could use DBCP or C3P0. I am not sure you gain much benefit by trying to share the database sessions in this scenario.
This way the two apps are integrated without either one slowing the other down by waiting for it to finish processing.
HTH!
You are right, splitting the application into two could be reasonable in your case.
Disadvantages of separating into two applications -
You cannot cache mutable objects that both applications can modify (whether in Hibernate's cache or any other cache) in either application. Optimistic locking should work fine with two Hibernate applications. I don't see any other problems.
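To make the optimistic-locking point concrete: with Hibernate you would normally add a `@Version` field to the entity, and Hibernate then includes the version in the `WHERE` clause of each update and reports a conflict when no row matches. The sketch below only simulates that check in memory with plain Java so it can run on its own; `VersionedTable` and `VersionedRow` are hypothetical stand-ins for a real table, not Hibernate classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row: immutable data plus the version it was read at.
class VersionedRow {
    final String data;
    final long version;

    VersionedRow(String data, long version) {
        this.data = data;
        this.version = version;
    }
}

// In-memory stand-in for a table with a version column. update() mimics
//   UPDATE item SET data = ?, version = version + 1 WHERE id = ? AND version = ?
// where "0 rows updated" signals that someone else changed the row first.
class VersionedTable {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    synchronized void insert(long id, String data) {
        rows.put(id, new VersionedRow(data, 0));
    }

    synchronized VersionedRow read(long id) {
        return rows.get(id);
    }

    // Returns the number of rows updated (0 or 1), like JDBC executeUpdate().
    synchronized int update(long id, String newData, long expectedVersion) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return 0; // stale: the caller read an older version
        }
        rows.put(id, new VersionedRow(newData, expectedVersion + 1));
        return 1;
    }
}
```

If the crawler and the web app both read version 0 and the crawler writes first, the web app's update matches no row and fails instead of silently overwriting the crawler's data; the caller would then reload and retry, or report the conflict.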
You have already described the advantages in your question.
Related
My employer has currently given me a project that has me scratching my head about synchronization.
First, here is the situation I'm in:
I've been asked to create a PDF report/quotation tool that takes its data from CSV files. The actual database holding the data is used by old IBM software, and for unknown reasons they don't want any direct access to it. Instead of copying the data to other databases, they apparently found it perfectly fine to create a folder on the server with loads and loads of CSV files. The tool has to load the data into the application, query it, transform it where needed, do calculations, and then return a PDF file to the end user.
The problem is that getting, querying, and calculating things takes a fair amount of time. The other problem is that they want it to be a web app, because the business team does not want to install any new software; since the start of the pandemic they have mostly been moving towards doing everything online. Being a web app means that every computation, and likewise all data retrieval, has to be done by the web app.
My question: is each call to a servlet by a different user treated as a separate servlet, so that I should only synchronize the methods in the business logic (getting and using the data)? Or should I write some code that sits in the middle of the servlet, receives a user ID (as a reference), runs the business logic in a synchronized fashion, and then receives the data and returns the PDF file?
(I hope you get the gist of it...)
Everything will run on Apache Tomcat 8, if that helps. The build uses Java 11 LTS.
Sorry, no code yet. But I've made some drawings.
With Java web applications, the usual pattern is for the components not to have conversational state (meaning information specific to a particular user's request). If you need to keep state for a user on the server, you can use the HTTP session. With a SPA or Ajax application it's often easier to keep a lot of that kind of state in the browser. The less state you keep on the server, the easier things are as your application scales: you don't have to pin sessions to servers (which messes up load balancing) or copy lots of session state across a cluster.
For simple (non-reactive) web apps that do blocking I/O, each request-response cycle gets its own dedicated thread from Tomcat's pool. That thread delivers the HTTP request to the servlet, handles the business logic, blocks while talking to the database, and then carries the HTTP response.
(Reactive web apps are more complex to build: you need a non-blocking database driver and have fewer choices of database, so I would steer clear of them, at least for your first web application.)
The thread pool used by Tomcat has to protect itself from concurrent access, but that doesn't affect your code. Likewise, there are third-party middle-tier caching libraries that have to deal with concurrency, but you can avoid dealing with it directly. All of your logic is confined to one thread, so it doesn't interfere with processing done by other threads unless there are shared mutable data structures. Those data structures are the part of the application where synchronization might be one of several possible solutions.
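A minimal sketch of that pattern, assuming the CSV-backed reports from the question: one shared service instance (like a servlet), per-request work kept in local variables, and a single shared cache made thread-safe with `ConcurrentHashMap` instead of hand-written `synchronized` blocks. `ReportService`, `totalOfColumn`, and the injected CSV loader are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical report service with one shared instance, like a servlet.
class ReportService {
    // The only shared mutable structure: parsed CSV files keyed by file name.
    private final Map<String, List<String[]>> csvCache = new ConcurrentHashMap<>();
    // Hypothetical loader that reads and parses one CSV file.
    private final Function<String, List<String[]>> csvLoader;

    ReportService(Function<String, List<String[]>> csvLoader) {
        this.csvLoader = csvLoader;
    }

    // computeIfAbsent applies the loader at most once per key, even when
    // several request threads ask for the same file at the same time.
    private List<String[]> rows(String file) {
        return csvCache.computeIfAbsent(file, csvLoader);
    }

    // Called from a request thread. No synchronization is needed here:
    // `total` and `row` are locals, confined to the calling thread.
    double totalOfColumn(String file, int column) {
        double total = 0;
        for (String[] row : rows(file)) {
            total += Double.parseDouble(row[column]);
        }
        return total;
    }
}
```

`computeIfAbsent` performs the load atomically per key, so other threads asking for the same file wait for the first load instead of parsing it again. Its Javadoc advises keeping such computations short, so for large files, loading everything once at startup may be the better choice.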
Synchronization and other locking schemes are local to one instance of the application. If you want to run multiple instances of this application, be aware that each one locks separately from the others. So for some things it's better to do the locking in the database, since that is shared across web app instances.
If you can use a database to store your data, so that you can rely on the database for caching and indexing, then your application should likely be able to avoid doing much locking at all.
If you want examples, there are a lot of small examples for building web apps using Spring at https://spring.io/guides. These are Spring Boot applications that are self-hosted, so you can put them together quickly and run them right away.
Going rogue with a database may not be the best course, since databases need looking after by DBAs. My advice is to put together two project plans: one for using a database, and one for using the flat files. The flat-file plan will have to address issues like caching, indexing data, replicating data from the legacy database, and not having standard tools that generate PDFs from SQL queries. The alternative plan using a database should involve much less sorting out of infrastructure and a shorter time until you can get down to cranking out reports.
We have an Oracle database that holds data about some cities, places, etc.
We have a web system with which we can manipulate this data.
We also have a desktop client application that works with this data.
To increase our desktop application's performance and reduce unnecessary requests to our DAO layer, we have implemented some Singleton classes in the desktop application that fetch the cities, places, etc. data only once, right after the user opens the desktop application.
Recently our clients asked why they don't see the changes they make through the web application while the desktop application is up and running. They're complaining that they have to close the desktop app and open it again to see the changes.
We know the problem is those Singleton classes, but we don't want to remove them, because without them there would be huge overhead in our system. To solve the problem we have considered several solutions:
Create a table in the database with integer columns named after our data sets (cities, places, etc.) and increment the value whenever there's an update, so we can track changes with it (a lightweight solution)
Using database functionality:
a notification system that notifies the client application whenever an update occurs
a caching mechanism inside the database that caches those recently changed tables and serves our users when they make similar requests
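The first (counter table) option could look roughly like this on the client side. This is a sketch under assumptions: the counter query and the existing data load are passed in as functions, the tracking table is hypothetical, and the counter must be incremented in the same transaction as the data change (or by a trigger) for the check to be reliable. `VersionedCache` and `CacheSnapshot` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Immutable pairing of loaded data with the counter value it belongs to.
class CacheSnapshot<T> {
    final long version;
    final List<T> data;

    CacheSnapshot(long version, List<T> data) {
        this.version = version;
        this.data = data;
    }
}

// Replacement for a "load once" singleton: reloads only when a change
// counter (read from a hypothetical tracking table) has moved.
class VersionedCache<T> {
    private final LongSupplier currentVersion;  // hypothetical counter query
    private final Supplier<List<T>> loader;     // the existing EJB/DAO call
    private volatile CacheSnapshot<T> snapshot; // swapped atomically for readers

    VersionedCache(LongSupplier currentVersion, Supplier<List<T>> loader) {
        this.currentVersion = currentVersion;
        this.loader = loader;
        this.snapshot = load();
    }

    // Read the counter BEFORE loading: a change that lands mid-load then
    // causes one extra reload later instead of being missed.
    private CacheSnapshot<T> load() {
        long version = currentVersion.getAsLong();
        List<T> data = Collections.unmodifiableList(new ArrayList<>(loader.get()));
        return new CacheSnapshot<>(version, data);
    }

    List<T> get() {
        return snapshot.data;
    }

    // Cheap check (one small query); the full reload happens only on change.
    // Returns true if the data was reloaded.
    synchronized boolean refreshIfChanged() {
        if (currentVersion.getAsLong() == snapshot.version) {
            return false;
        }
        snapshot = load();
        return true;
    }
}
```

The Swing client could call `refreshIfChanged()` every few seconds from a background thread (for example a `ScheduledExecutorService`), so the cheap counter query runs often while the full reload only happens after an actual change.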
Here is our stack:
Our desktop application is a Swing application
Our web application uses JSF
Our business layer for both JSF and Swing is EJB
Our DAO layer for both JSF and Swing is EclipseLink
What do you think is the best practice for solving this problem?
Oracle has a feature called "Database Change Notification" that can be used to be notified when read-mostly tables change. It looks like this feature could be a good fit for your requirement. It is described in Oracle's JDBC developer documentation.
In a nutshell, the JDBC thin driver in your desktop application opens a port, and the Oracle Database connects to that port and uses this connection to push notifications when data changes. You then get a callback through an event/listener API and can refresh your cache.
This notification mechanism is designed for read-mostly data, in other words, data that doesn't change constantly; otherwise it wouldn't be worth caching the data anyway.
Please consider me a novice; this is the first web app I am creating.
I am planning to develop a web application where I expect around 50 users to access the application at the same time.
The web app is developed with Vaadin (for the UI), with the business logic implemented in Java. The database is MySQL. The WAR will be deployed in Tomcat.
So my question is: do I need to modify anything in the Tomcat properties or elsewhere to make the web app a multi-user application (i.e. each user can access and use the application as though they were the only one using it)?
I tried to access a prototype developed with Vaadin in both Chrome and Firefox and could see both sessions running without affecting each other.
But please let me know your suggestions.
You must keep in mind that even though Tomcat and Vaadin manage multiple sessions, your server application will have only one instance. So if you use singletons, static methods, or static fields, use them with care: they should never hold session-dependent content. Try to favour stateless methods over stateful ones.
Apart from that, there shouldn't be any problem.
There should be no need for code changes if you handle sessions and the statefulness of your business logic properly.
There might be some configuration changes, like increasing the database connection pool size; that depends on what kind of connection pooling you are using, what its default size is, etc.
Apart from that it should work just fine.
Vaadin is built on top of Jakarta Servlet technology (formerly known as Java Servlet). See Wikipedia. Indeed, Vaadin is a servlet, a much bigger and more sophisticated servlet than most.
Within a Java Servlet container (engine) such as Apache Tomcat or Eclipse Jetty, any particular servlet has only a single instance running. If three requests from three users arrive at the same time, there are three threads running through that same single instance of that particular servlet. So servlets are inherently a highly threaded environment.
If you share any variables or resources between those threads, you must be very careful. That means mandatory reading, rereading, and fierce study of the book Java Concurrency in Practice by Brian Goetz, et al.
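A small illustration of why shared variables need care: the hypothetical `RequestStats` below is used by every request thread, just as a single servlet instance would be. The plain `int` counter can lose updates under concurrent access, while the `AtomicInteger` cannot. This only demonstrates the most basic form of the pitfall.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical component with a single shared instance, like a servlet:
// every request thread calls recordRequest() on the same object.
class RequestStats {
    // BROKEN under concurrency: ++ is read-modify-write, so two threads
    // can read the same value and one of the increments is lost.
    private int unsafeCount = 0;

    // Safe: incrementAndGet() is a single atomic operation.
    private final AtomicInteger safeCount = new AtomicInteger();

    void recordRequest() {
        unsafeCount++;
        safeCount.incrementAndGet();
    }

    int unsafeCount() {
        return unsafeCount;
    }

    int safeCount() {
        return safeCount.get();
    }
}
```

Run from many threads, `safeCount()` always ends at the exact number of calls, while `unsafeCount()` is often lower. Because the loss depends on timing, a quick manual test with two browsers will rarely reveal it.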
While the Web and HTTP were designed for the stateless delivery of single documents, that original vision has been warped by the desire to build web apps. To maintain state, the servlet container automatically maintains a session for each user. Vaadin represents this session state in its VaadinSession object. All the data in all the forms, along with the business logic, running for each user is maintained as part of that session.
Depending on your particular Vaadin app, and when multiplied by the number of concurrent users, this may add up to a large amount of memory. You should monitor your server to make sure it has enough RAM available.
do I need to modify anything in Tomcat properties or anywhere to make the web app as multi user application (i.e. each users need to access and use application as though they are only one using the application)?
No, there is nothing for you to set or enable. Tracking the requests/responses and the session for each user is the very purpose of a servlet container. From the moment it launches, every servlet container expects multiple users. As a Servlet, Vaadin is built to expect multiple users as well. The only trick is making your own code thread-safe, hence the book suggestion.
I tried to access a prototype developed using Vaadin in both Chrome and Firefox and could see both sessions running without an impact on another.
Concurrency problems can be very tricky to detect and debug. Problems often surface only through the random chance of coincidental timing. You need to focus on designing your code properly in the first place, rather than relying on testing. Again, hence the book recommendation.
Of special note, since you mentioned using a database, are JDBC drivers. Deploying them in a Servlet environment can be tricky. Basically, you should not bundle them within your Vaadin web app's WAR file. Instead, deploy the JDBC driver separately in a shared library folder within Tomcat. If you use Maven to drive your project, tell Maven in the POM file to give the JDBC driver dependency a scope of provided. This has nothing to do with Vaadin specifically; it applies to all servlets. Search Stack Overflow, as this issue has been extensively addressed.
I am a newbie. I have a Java Swing application and it runs great on my machine. I want to access this application on a server via Citrix: when I click the published icon, it should run the main method in the JAR file and let me use the application. The application will access a database on the DB server.
But I want multiple users to access the application at the same time, and that is where my questions are:
I thought of creating n threads for n users (i.e. I can set a limit on concurrent access). But what will the entry point be? I mean, each time I click the published icon, the main method will be invoked.
I can think of separating my user interface from the logic layer, but I have no idea how to do it. Of course, I am following the MVC model. My question is about creating multiple instances of the GUI each time the application is accessed.
And finally, I want to use DB connection pooling. Would this mean that I have to create a separate Java program that creates the pool, and my application would use its data source?
Can anyone please point me in the right direction? I am not looking for specifics, just a general idea of how to build this multi-user application.
Typically, for a multi-user Swing application you would want to separate the "client" part of the application from the "server" application.
This works as follows:
Each user would get their own running instance of the client application. This can be on their own machine.
The server application is a single instance (or maybe a cluster) that accepts connections from multiple clients and talks to the database.
The client applications talk to the server application when they need to access or change data. There are a wide variety of different communication methods you can use.
Optionally, the server application can send notifications to the client (e.g. in situations where data is updated by another client)
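As one possible communication method, here is a minimal sketch using only JDK built-ins: the `com.sun.net.httpserver` server that ships with the JDK, and the `java.net.http.HttpClient` added in Java 11. `DataServer`, `DataClient`, and the `/cities` endpoint are hypothetical, and the response body stands in for a real database query. In this layout the server is also the natural home for the DB connection pool the question asks about, since only the server talks to the database.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Server: one process that owns all database access (and the DB pool).
class DataServer {
    private final HttpServer server;
    private final ExecutorService workers = Executors.newFixedThreadPool(10);

    DataServer(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        server.createContext("/cities", exchange -> {
            // Placeholder for a query through the server's connection pool.
            byte[] body = "Berlin\nParis".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Requests from different clients are handled on these threads.
        server.setExecutor(workers);
    }

    void start() { server.start(); }

    void stop() {
        server.stop(0);
        workers.shutdown();
    }

    int port() { return server.getAddress().getPort(); }
}

// Client: each user's Swing app holds one of these and calls it off the
// Event Dispatch Thread so the UI stays responsive.
class DataClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI base;

    DataClient(URI base) { this.base = base; }

    String fetchCities() throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(base.resolve("/cities")).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Each Citrix session (or user machine) would run its own client, while a single `DataServer` runs on a server machine. A production server would more likely be built on a servlet container or framework and add authentication.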
You can do without the server application if you are happy to let the clients connect to the same database. This is simpler to set up, but has some downsides:
You need to be much more careful about concurrent access to the database / potential corruption from different clients attempting to alter the same data at the same time.
You need to allow connections to your database, from clients that are potentially untrusted. This is a security risk.
Given that you already have a working application, the second option is probably easier for you to move to. Just be aware of the downsides: the first option is a much better architecture in general.
I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every 3 minutes or so, I need a set of jobs to kick off in a web application context that concurrently hit web services to obtain the latest version of the data and push it to a database. The problem is that the database will be heavily used for reading the data to do lots of complex calculations. We are currently using Spring, so I have been looking at Spring Batch to run this process. Does anyone have any suggestions, patterns, or examples of using Spring or other technologies for a similar system?
We have used ServletContextListeners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener fires when the app server starts the application or when the application is restarted. The timer tasks then run on a separate thread that repeats your code at the specified interval.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
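A sketch of the approach described in this answer, assuming a fixed period and a job passed in as a `Runnable`. The servlet API is not part of the JDK, so the listener wiring is shown only in comments; `PeriodicRefresher` is a hypothetical helper.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper owned by a ServletContextListener:
//   contextInitialized(...) -> refresher.start(3 * 60 * 1000L)
//   contextDestroyed(...)   -> refresher.stop()
// Stopping matters: otherwise the timer thread outlives an undeployed app.
class PeriodicRefresher {
    private final Runnable job;
    private final AtomicInteger runs = new AtomicInteger();
    private Timer timer;

    PeriodicRefresher(Runnable job) {
        this.job = job;
    }

    synchronized void start(long periodMillis) {
        stop(); // never leave an old timer running
        timer = new Timer("data-refresh", true); // daemon thread
        // Runs never overlap because a Timer uses a single thread.
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                    runs.incrementAndGet();
                } catch (RuntimeException e) {
                    // An uncaught exception would kill the Timer's thread
                    // and cancel every task scheduled on it.
                    System.err.println("Refresh failed: " + e);
                }
            }
        }, 0, periodMillis);
    }

    synchronized void stop() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }

    int runs() {
        return runs.get();
    }
}
```

A `ScheduledExecutorService` is the more modern alternative to `Timer`, and it can be started and stopped from the listener in the same way.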
Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.
I think it's also worth looking into frameworks like Apache Camel. Also take a look at the so-called Enterprise Integration Patterns. Check the catalog: it might give you some useful vocabulary for thinking about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.