I have a web application written in Java for course scheduling. I use the file system to store my data: I serialize my collections (and the objects in them) and gather them into a single serialized object. Every user has a few files for system data and for backup or prepared table data. My program reads the system-data file when the user logs in and writes it back when the user exits.
The object design is very complex, with many many-to-many relations between my objects. Storing this program's data in SQL would require a lot of join operations and a well-designed schema. Furthermore, a user generally has thousands of pieces of information, including hundreds of courses, teachers, and rooms, and even thousands of constraints. Besides that, there is no connection between users.
I do not see any advantage in using SQL for this program; my system even runs very fast. However, many developers who do not know the structure of my program have advised me to move my data storage to SQL.
What is the advantage of SQL over the file system (with serialized objects), especially for my unusual system?
To name just a few advantages of using a SQL database in general:
Scalability: You can move your data to a separate server(s) to reduce the load on your application server.
Accessibility: Should you wish to release an extension of your program in the future, that program can run independently and still access the same data without you having to worry about locking files etc.
Security: Database systems allow you to configure "privileges" as who is allowed to write data and who is allowed to read data.
Portability: Should you decide in the future to upgrade your program to a different technology, the SQL language is an industry standard that can relatively easily be ported from one database vendor to another. You can also port your main application to another technology independently of your database, provided both technologies have drivers / libraries / connectors / plug-ins / whatever you want to call them to communicate with your database.
Backups: You can easily run a backup while your application is online. With files, you might run into locking issues or have to shut down the application to do a backup.
Human resources: You can employ a database administrator to take care of your databases and reduce the responsibility of your developers/engineers so that they can focus on other tasks.
Monitoring: You can monitor or inspect your data, while your application is live.
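To make the locking point in the Accessibility and Backups items concrete, here is a minimal sketch of what a file-based store has to handle by hand. It is not taken from the application in the question: the class name SerializedStore and its methods are hypothetical. It writes the serialized object to a temporary file and then renames it over the target, so a backup job or a second reader is less likely to see a half-written file.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of the original application.
final class SerializedStore {

    private SerializedStore() {}

    // Serialize to a temp file in the same directory, then rename it over the target.
    static void save(Path target, Serializable data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "store", ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(data);
        }
        try {
            // Whether an existing target is replaced here is implementation-specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic rename (no longer atomic).
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Read the object back; the caller casts it to the expected type.
    static Object load(Path source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(source))) {
            return in.readObject();
        }
    }
}
```

A database gives you this kind of guarantee, and much stronger ones, without extra code, which is part of what the list above is getting at.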
However, an application that uses a SQL database has usually been designed with a relational data structure in mind from the beginning. Without fully understanding your application, it sounds like you have a very complex data structure that might not be easy to migrate to a SQL database. I would recommend that you also have a look at a NoSQL database, which
allows large volumes of structured, semi-structured, and unstructured data
encourages object-oriented programming
is easy to use
and is much more flexible than SQL
Related
My employer has currently given me a project that has me scratching my head about synchronization.
I'm going to first talk about the situation I'm in:
I've been asked to create a PDF report/quotation tool that takes its data from CSV files. The actual database holding the data is used by old IBM software, and for unknown reasons they don't want any direct access to it. So instead of copying the data to another database, they apparently found it fine to create a folder on the server with loads and loads of CSV files. The tool has to load the data into the application, query it, transform it where needed, do calculations, and then return a PDF file to the end user.
There are two problems. First, getting, querying, and calculating things takes a fair amount of time. Second, they want it to be a web app because the business team does not want to install any new software; they have mostly been moving towards doing everything online since the start of the pandemic. Being a web app means that every computation, and likewise all data retrieval, has to be done by the web app.
My question: Is each call to a servlet by a separate user treated as a separate servlet, so that I should only synchronize the methods in the business logic (getting and using the data)? Or should I write some code that sits in the middle of the servlet, receives a user ID (as reference), runs the business logic in a synchronized fashion, and then receives the data and returns the PDF file?
(I hope you get the gist of it...)
Everything will run on Apache Tomcat 8, if that helps. The build is Java 11 LTS.
Sorry, no code yet. But I've made some drawings.
With Java web applications, the usual pattern is for the components to not have conversational state (meaning information specific to a particular user's request). If you need to keep state for a user on the server, you can use the HTTP session. With a SPA or Ajax application it's often easier to keep a lot of that kind of state in the browser. The less state you keep on the server, the easier things are as your application scales: you don't have to pin sessions to servers (messing up load balancing) or copy lots of session state across a cluster.
For simple (non-reactive) web apps that do blocking I/O, each request-response cycle gets its own dedicated thread from Tomcat's pool. That thread delivers the HTTP request to the servlet, handles the business logic and blocks while talking to the database, then carries the HTTP response.
(Reactive web apps are more complex to build: you will need a non-blocking database driver and you will have fewer choices of database, so I would steer clear of those, at least for your first web application.)
The thread pool used by Tomcat has to protect itself from concurrent access, but that doesn't impact your code. Likewise, there are third-party middle-tier caching libraries that have to deal with concurrency, but you can avoid dealing with it directly. All of your logic for one request is confined to one thread, so it doesn't interfere with processing done by other threads unless there are shared mutable data structures. Those data structures would be the part of the application where synchronization might be one of several possible solutions.
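As a rough illustration of that last point, here is a minimal sketch in plain Java (no servlet API). The class SharedCsvCache, its loader function, and handleRequest are hypothetical names, not something from the question. Everything inside handleRequest is local to the calling thread and needs no locking; the only shared mutable structure is the cache, and a ConcurrentHashMap handles that without a synchronized block.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache of loaded CSV content, shared by all request threads.
final class SharedCsvCache {
    private final Map<String, String> contentByFile = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final Function<String, String> loader;

    // The loader stands in for "read and parse this CSV file" (an assumption).
    SharedCsvCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // computeIfAbsent applies the loader at most once per key,
    // even when many threads ask for the same file at the same time.
    String get(String fileName) {
        return contentByFile.computeIfAbsent(fileName, name -> {
            loads.incrementAndGet();
            return loader.apply(name);
        });
    }

    // How many times the loader actually ran.
    int loadCount() {
        return loads.get();
    }

    // Stand-in for the per-request business logic. All variables here live on
    // the request thread's stack, so no synchronization is needed for them.
    static String handleRequest(SharedCsvCache cache, String userId, String fileName) {
        String data = cache.get(fileName);           // shared, but thread-safe
        StringBuilder report = new StringBuilder();  // local to this request
        report.append("report for ").append(userId).append(": ").append(data);
        return report.toString();
    }
}
```

If the data is immutable once loaded, this is often all the synchronization a report generator like this needs; per-user locking in front of the servlet would only serialize requests that do not actually conflict.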
Synchronization or other locking schemes are local to one instance of the application. If you want to stand up multiple instances of this application then you need to be aware each one would be locking separately from the others. So for some things it's better to do locking in the database, since that is shared across webapp instances.
If you can make use of a database to store your data, so that you can rely on the database for caching and indexing, then it seems likely your application should be able to avoid doing a lot of locking.
If you want examples, there are a lot of small examples for building web apps using Spring at https://spring.io/guides. These are Spring Boot applications that are self-hosted, so you can put them together quickly and run them right away.
Going rogue with a database may not be the best course, since databases need looking after by DBAs. My advice is to put together two project plans, one for using a database and one for using the flat files. The flat-file plan will have to allow for addressing issues like caching, indexing data, replication of data from the legacy database, and not having standard tools that generate PDFs from SQL queries. The alternative plan using a database should involve a lot less sorting out of infrastructure and a shorter time until you can get down to cranking out reports.
As an application gets complicated, one thing that changes a lot is the queries, especially complex ones. Wouldn't it be easier to maintain the queries in the database rather than in a resources location inside the package, so that they can be enhanced easily without a code change? What are the drawbacks of this?
You can use stored procedures to save your queries in the database. Then your Java code can just call the procedure from the database instead of building a complex query.
See Wikipedia for a more detailed explanation of stored procedures:
https://en.wikipedia.org/wiki/Stored_procedure
You can find details about the implementation and usage in the documentation of your database system (MySQL, MariaDB, Oracle...).
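On the Java side, calling a stored procedure through JDBC could look roughly like the sketch below. The procedure name find_courses_by_teacher, its single integer parameter, and the title column are made up for illustration; the real names depend on what you define in your database. The helper callSyntax only builds the standard JDBC escape string.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical data-access class; procedure and column names are assumptions.
final class CourseQueries {

    private CourseQueries() {}

    // Builds the JDBC escape syntax for a procedure call, e.g. "{call name(?, ?)}".
    static String callSyntax(String procedure, int paramCount) {
        StringBuilder sb = new StringBuilder("{call ").append(procedure).append('(');
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('?');
        }
        return sb.append(")}").toString();
    }

    // Calls the (hypothetical) procedure and collects one column of its result set.
    // The query text itself lives in the database, not in the Java code.
    static List<String> coursesByTeacher(Connection conn, int teacherId) throws SQLException {
        List<String> titles = new ArrayList<>();
        try (CallableStatement cs = conn.prepareCall(callSyntax("find_courses_by_teacher", 1))) {
            cs.setInt(1, teacherId);
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}
```

With this shape, changing the query means changing the procedure in the database, while the Java code only has to change when the procedure's parameters or result columns change.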
When you decide to move logic to the database, you should use a version control system for databases, like Liquibase: https://www.liquibase.org/get-started/quickstart
You can write the changes to your database code in XML, JSON, or even YAML and check them into your version control system (SVN, Git...). This way you have a history of the changes and can roll back to a previous version of your procedure if something goes wrong.
You also asked why some people use stored procedures and others keep their queries in the code.
Stored procedures can encapsulate the query and provide an interface to the data. They can be faster than queries. That is good.
But there are also problems:
You distribute the business logic of your application across the database and the program code. It can be really troublesome if the logic is spread through all technical layers of your application.
It is not so simple anymore to switch from an Oracle database to MariaDB if you use specific features of the database system. You have to migrate or rewrite the procedures.
You have to integrate Liquibase or another system into your build pipeline to keep track of your database changes.
So it depends on the project and its size whether either of the solutions is better.
I am writing a chat application in Java, and can't seem to find information regarding the runtime differences between using MySQL to store and retrieve account usernames and passwords vs. simple traditional data structures, like HashMaps. Very soon, I would like to store messages in this data structure as well, since they will be linked to accounts. Does using MySQL provide any runtime advantages over a natively coded Java structure like a HashMap, or is the constant-time overhead of using MySQL not worth it? I will be frequently adding to (~5/500 operations), removing from (~1/500 operations), and accessing (~494/500 operations) the information in the database.
** I am aware of the differences in the persistence of data. That is not what I am asking about and it does not matter at all in this context. I am simply asking what the runtime difference will be while the application is running.
MySQL is a database system and a HashMap is a data structure. They are very different things and have completely different uses. Databases allow persistent data (data that still exists after the program ends), while simple data structures only retain their data while the program is running and are cleared when the program ends.
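For comparison, the in-memory side of that contrast could look like the minimal sketch below. The class InMemoryAccounts and its methods are hypothetical, and it assumes passwords are stored as hashes rather than plain text. Each operation is an expected constant-time lookup inside the JVM with no network round trip, but everything is lost when the process stops and nothing is shared with other instances of the application.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory account store; the data disappears when the JVM exits.
final class InMemoryAccounts {
    // username -> password hash (assumed to be hashed elsewhere before storing)
    private final Map<String, String> passwordHashByUser = new ConcurrentHashMap<>();

    // Returns true if the username was free and the account was added.
    boolean register(String username, String passwordHash) {
        return passwordHashByUser.putIfAbsent(username, passwordHash) == null;
    }

    // Returns true if an account with that username existed and was removed.
    boolean remove(String username) {
        return passwordHashByUser.remove(username) != null;
    }

    // Looks up the stored hash, if any.
    Optional<String> lookup(String username) {
        return Optional.ofNullable(passwordHashByUser.get(username));
    }
}
```

ConcurrentHashMap is used instead of a plain HashMap because a chat server will normally touch the map from several threads at once.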
I have a servlet-based application that runs in a Tomcat 7 environment.
This application needs to manage users' files in such a way that they can be accessed in many ways and through different classification methods (for instance time-oriented classification and search, keywords, tags, author, and so on).
So I have a multidimensional search space and I need to organize a database-based grouping system.
Let's focus on a single, specific aspect.
Any user can upload their own files, so I'll have a path in which these files will be saved.
Then I also need a place to store the information associated with the files.
I thought it would be good to separate the files from their associated information (title, ...) and then to create a third entity, a small string that uniquely identifies both the info and the file.
This way, once I know the file ID, I can get both the information (which is stored in a specific file) and the file itself, and I can save this ID in any classification table without copying anything heavy.
So if I have the file ID (fid), I can get the file and the information, and when I have to associate an object with a file, for example, I can simply associate that object with the fid.
Then every user must have their own table that collects the various fids of the files they uploaded.
Therefore I have one table for each user. Then for any other classification dimension I will have N tables (where N is the size of the dimension). So if, for instance, I want to classify files by keyword, I'll need N tables, each for a specific keyword. (It would be too inefficient to search through all the users' files each time I want the files associated with the key AGAA.)
So if I need to show the 50 most recent files associated with the keyword "AGAAA", I need a table for AGAAA, and so on.
This is crazy: as the number of users increases, I get exponentially more tables.
I have heard about the limit on tables per database in MySQL.
Until now I have been using MySQL (MariaDB) with connection pooling.
I thought of splitting tables of different "nature" (i.e. those for keywords, those for time, and so on) into different databases (also in order to organize the contents clearly). But with connection pooling I need to declare the database name in the resource definition, so for different databases I will need different pools.
Now questions.
Using pooling, I must create a different pool resource for each different database access, mustn't I?
If yes, is it good practice to use the same database for all the different kinds of tables?
If not, how can I change the database at runtime?
I thought I could manage different tables with different database systems. For example, I could use SQLite to manage classification tables, MySQL to manage user interaction, and so on. Is this good practice?
Is SQLite in general faster than server-based databases in multi-user applications?
Can I use connection pooling with SQLite? I mean, what are SQLite connections if SQLite has no server? And does it make sense to think about connection pooling?
What database architecture do you suggest for this kind of problem?
thanks
Why would each user or keyword need its own table? Tables can have many rows.
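To illustrate the idea without a database at hand, here is a small in-memory sketch. The class FileIndex, its Row type, and the field names are hypothetical. The point is only that a single collection of rows, each carrying the owner and keyword as a column, answers both "files of user X" and "most recent files for keyword Y" by filtering, with no table per user or per keyword. In a real database the filter would be a WHERE clause backed by an index rather than a scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for ONE association table with many rows.
final class FileIndex {

    // One row: which file, who uploaded it, which keyword, and when.
    static final class Row {
        final String fid;
        final String owner;
        final String keyword;
        final long uploadedAt;

        Row(String fid, String owner, String keyword, long uploadedAt) {
            this.fid = fid;
            this.owner = owner;
            this.keyword = keyword;
            this.uploadedAt = uploadedAt;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    void add(String fid, String owner, String keyword, long uploadedAt) {
        rows.add(new Row(fid, owner, keyword, uploadedAt));
    }

    // Roughly: SELECT fid FROM file_index WHERE keyword = ?
    //          ORDER BY uploaded_at DESC LIMIT ?
    List<String> mostRecentByKeyword(String keyword, int limit) {
        return rows.stream()
                .filter(r -> r.keyword.equals(keyword))
                .sorted(Comparator.comparingLong((Row r) -> r.uploadedAt).reversed())
                .limit(limit)
                .map(r -> r.fid)
                .collect(Collectors.toList());
    }

    // Roughly: SELECT fid FROM file_index WHERE owner = ?
    List<String> byOwner(String owner) {
        return rows.stream()
                .filter(r -> r.owner.equals(owner))
                .map(r -> r.fid)
                .collect(Collectors.toList());
    }
}
```

Adding a user or a keyword then adds rows, not tables, so the number of tables stays fixed as the data grows.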
Using pooling, I must create a different pool resource for each
different database access, mustn't I?
Your question has multiple meanings, but generally you create one pool for one application, and it manages itself.
If yes, is it good practice to use the same database for all the
different kinds of tables? If not, how can I change the database at runtime?
Generally one would use one database for an application.
I thought I could manage different tables with different database
systems. For example, I could use SQLite to manage
classification tables, MySQL to manage user interaction, and so on. Is
this good practice?
You could, but that would be insane.
Is SQLite in general faster than server-based databases in multi-user
applications?
Absolutely not. SQLite can only have one writer at a time, though it is fine for many readers.
Can I use connection pooling with SQLite? I mean, what are SQLite
connections if SQLite has no server? And does it make sense to think
about connection pooling?
I don't know, but you shouldn't use SQLite if you expect multiple concurrent users writing / uploading to the database.
What database architecture do you suggest for this kind of
problem?
I would suggest you use a content repository like Apache Jackrabbit, or a search server like Apache Solr.
I'm working on a school project where the client needs to have multiple users querying and writing to a single data source. The users have access to shared network drives, and all functionality has to be in the client application: the IT department won't allow a service to run on one of their servers, and external server hosting isn't an option.
The amount of data that needs to be stored is actually very small, about 144 rows maximum.
I've looked into embedded databases (SQLite, HSQLDB, ObjectDB, etc.), but they seem like overkill for how little data needs to be saved. It also seemed that with HSQLDB, if anyone accessed the database, it would be completely locked to every other user. Concurrency wouldn't be much of an issue: there will be 5-7 people using the system, and only rarely, a few times a year.
Would using something like XQuery and serializing everything in XML be a viable option, or simply using the Java Serializable API?
A distributed, client-side database writing files to the shared network drive could be a good solution for this use case. Take a look at Cloud DB; it might be what you're looking for.