build database for a huge web portal

I want to build a movie website similar to IMDB.
I can get the website up and running.
From the day I launch, I can keep the data up to date,
but my question is: how do I make the old data available, say from 1900 to 2010?
This is the challenge I am facing. Can anyone share how to go about this?
Which strategy can I follow so that a website has both the old data and the ongoing news?
The technologies I have in mind for this website are Java, MySQL, and PHP.
How do I make the old data available, say from 1900 to 2010? *
Meaning: I would like to upload all the movie data, which is very old.

I don't see the problem here. Just store the data for all movies in the database, and provide a way for the user to:
qualify his/her searches with a date range, and
order the results in date order.
I would like to upload all the movie data, which is very old
And is there any reason that you can't do this? (Apart from the obvious one of not having the data in the first place ...)
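To make the date-range idea concrete, here is a minimal in-memory sketch in Java. The `Movie` class and the `byYearRange` method are hypothetical names made up for illustration; on a real site the same filter and ordering would normally be done by the database query itself rather than in application code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class MovieSearch {
    // Hypothetical record type; a real site would map this from a database row.
    static final class Movie {
        final String title;
        final int year;
        Movie(String title, int year) { this.title = title; this.year = year; }
    }

    // Return the movies released between fromYear and toYear (inclusive), oldest first.
    static List<Movie> byYearRange(List<Movie> movies, int fromYear, int toYear) {
        return movies.stream()
                .filter(m -> m.year >= fromYear && m.year <= toYear)
                .sorted(Comparator.comparingInt(m -> m.year))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Movie> movies = List.of(
                new Movie("Casablanca", 1942),
                new Movie("Inception", 2010),
                new Movie("Metropolis", 1927));
        // Qualify the search with a date range and print the hits in date order.
        for (Movie m : byYearRange(movies, 1900, 1950)) {
            System.out.println(m.year + " " + m.title);
        }
    }
}
```

Old and new records go through exactly the same path here, which is the point: a 1927 film is just another row.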
FOLLOWUP
It sounds like the problem you are worrying about is one of incomplete or bad data. There is no simple solution to that. You probably need to devise a strategy for dealing with it; e.g.
upload everything, and then do batch validation / cleanup runs on the live database
as above, but hide records until they have passed validation
validate each record as you attempt to upload, and put those that fail validation on one side.
You also need to be able to:
identify common validation problems, and perform bulk corrections,
change (e.g. tighten) the validation rules on the fly and revalidate.
But this is all standard stuff for a large data-oriented application. (And if there were easy solutions, everyone and their granny would be scraping the internet and building databases. TANSTAAFL.)
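As a rough illustration of the third strategy (validate each record on upload and put the failures to one side), here is a small Java sketch. Everything in it is an assumption for the example: the `Row` shape, the `BASIC_RULE` predicate, and the year bounds are placeholders, not a recommended rule set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class MovieImport {
    // Hypothetical shape of one uploaded record.
    static final class Row {
        final String title;
        final int year;
        Row(String title, int year) { this.title = title; this.year = year; }
    }

    // Outcome of one validation pass: rows that passed and rows set aside.
    static final class Result {
        final List<Row> accepted = new ArrayList<>();
        final List<Row> rejected = new ArrayList<>();
    }

    // Illustrative rule only: a non-blank title and a year inside arbitrary sanity bounds.
    static final Predicate<Row> BASIC_RULE =
            r -> r.title != null && !r.title.trim().isEmpty() && r.year >= 1800 && r.year <= 2100;

    // Split the rows into accepted and rejected according to the given rule.
    static Result validate(List<Row> rows, Predicate<Row> rule) {
        Result result = new Result();
        for (Row row : rows) {
            (rule.test(row) ? result.accepted : result.rejected).add(row);
        }
        return result;
    }
}
```

Because the rule is passed in as a parameter, tightening it later and revalidating is just another call to `validate` over the rows that were accepted earlier.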

Related

XML vs JSON vs SQLite for only reading data

I have a collection of 350 locations in the United States with each containing about 25 subcategories. The data structure looks something like this:
Location (ex: Albany, NY)
--> Things to do
--> Population
... 23 More
Which of the following would be best for loading this data into the app: JSON, XML, or SQLite? Just to clarify, I don't need to edit this data in any way. I simply need to read it so that the information can be loaded into TextViews.
Edit:
I'm attempting to implement Room and XML and so far the XML seems to be the simplest to implement. Is it bad practice to use the XML solution? It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices. Would it still be a better practice to implement the Room solution?

Undoubtedly, among all of these, a relational database (RDB) is the most efficient one, both in terms of storage and query response. I personally do not see any point in using XML and JSON, as these have traditionally been used for exchanging data and are inefficient for storage and queries.
I would suggest that you evaluate the following:
a) How are you going to store the data: a single file vs. multiple files (for example, by subject)?
b) Are you going to be doing updates on the strings or just appending? (SQL will be better suited for updates, but if it is just reading data after batch processing, flat files might be better suited.)
c) How complex are the queries that you want to implement? XML and SQL are better suited than JSON for queries that might try to address metadata (date stored, original location address, etc.).
Once you determine what you want to optimize: whether it is on adding metadata, fast updates, fast querying, ease of storage, fast retrieval of subject files, etc. then you can decide the tradeoffs with other less important goals. In this specific instance the devil is very much in the details.

In most cases it would be better to use a database because it increases readability and maintainability, especially if you want to show this information inside some kind of list view. If you use JSON or XML, you'll have to parse it or write a lot of code to switch between things or to load them with good performance. Consider the case of using Room, LiveData and a RecyclerView: this will reduce the code you'll need and improve (a lot) the performance and readability of your app code. By the way, you should provide more information about how you want to use this information and where you want to show it. XML (or the Android resource system) should be used if you plan to use the resource system itself, with its qualifiers, to reduce your work. Most of the time JSON is used to communicate with the outside or with another app in an easy way, or for REST requests/responses.

The one option that wouldn't make sense at all for your use case is SQLite. Unless you plan on running specific queries on the data for preprocessing before loading it into your view, it isn't worth the overhead (even though I don't imagine the overhead is much with 350 locations).
XML and JSON serve the same use case without much difference; read up on their specifics on this website: https://www.json.org/xml.html
I would personally go for JSON due to the simplicity of the format.
Edit:
@simo-r's argument is also a valid one with regard to the readability of your code. While there are libraries that can make reading JSON/XML easier, Android has really good SQLite support by default, so it might make sense to use it. Ultimately it comes down to your personal preference and where you see the project growing.

I have a collection of 350 locations in the United States with each containing about 25 subcategories.
The main issue is scalability
Will you keep just a few hundred locations over the next few years, or do you imagine that, if your software becomes successful, your data will grow to many thousands of locations?
If yes: choose SQLite, because it can store many records efficiently. Don't forget to have a good database schema with appropriate indexes. See this and read about database normalization. Also, an SQLite database could later be migrated (with some effort) to PostgreSQL.
If no (your data is just a few megabytes): keep JSON or XML. The data will sit in the page cache.
Consider also YAML, and sometimes a mixed approach.
Don't forget to document how your data is organized and accessed.
See also the data persistence chapter of this draft report

If you are simply going to bind the data to text views, you can just store the text in strings.xml. As simple as that.

Go with JSON.
Advantages :
Low overhead ( Vs SQLite )
Lightweight parsers like Jackson are available, with which you can easily convert your data into a custom object or data structure if you need to.
Maintainable, as most developers understand the format.

I would suggest using JSON. Reason below
JSON vs XML
JSON is more lightweight than XML and takes fewer resources (network and storage), which helps the performance of the app.
JSON parsing is easy and, as mentioned above, it's trivial.
JSON is friendly to javascript, in case it's required.
JSON vs SQLite
A data set of 350 locations with about 25 attributes each can easily be managed with JSON. An RDBMS is not required.
SQLite becomes an overhead. It's an extra layer and layer comes with a cost. Especially if the application is containerized, the architecture becomes complicated. One needs to deal with volume mapping etc, in case of JSON you can keep the data as part of the application code.
Importantly, since data is static, keep the application stateless by keeping the data alongside the codebase. This makes lot more sense from architectural perspective.

Problem
You have a fixed set of information with a simple structure that you wish to deliver to clients.
Questions to Reflect On
Do I expect this information to ever change significantly?
Do I expect to increase the amount of information available?
What kind of help do I have? Do they have a background in software engineering or is it someone of a different profession that has to wear a lot of hats?
What is the scale of the project? Are you expecting a large amount of users or just people interested in a very niche application?
JSON or XML
JSON and XML provide similar services: they are both data interchange formats. If the information is not expected to grow, both might be a great option. If it's public information, just serve these files statically over nginx. You can point a worker with limited software engineering experience to update these files; they're just files in a folder presented in a human-readable format, so it's extremely simple to do. These updates should be minor and infrequent.
JavaScript Object Notation (JSON) Pros
solid browser and backend support
small size and fast parsing by the javascript engine
very human readable, easy for the untrained eye to make changes
Extensible Markup Language (XML) Pros
standard meta-data option
supports namespaces
solid backend support and is often baked into frameworks
This article explains XML and JSON differences really well (in 2020) if these highlights were not sufficient for your investigation.
Database System
There are a plethora of database systems out there. Their job is to efficiently retrieve specific information from a large volume of data stored. The key reason to use databases is scalability. Scalability means a number of things; I view it as adapting to drastic change. If you expect this information to frequently change or grow, go with a database.
Object Relational Mapping (ORM)
Databases can be cumbersome to use. I would recommend using an ORM on top of them. ORMs encapsulate a database and make it more user friendly (in a language-specific way). Room makes sense in your use case, especially for Java Android development. Encapsulation also allows you to migrate to other databases later without changing your code. Here's a good article that discusses Room and SQLite!
Miscellaneous
"Is it bad practice to use an XML solution?"
No. The important thing is that it works, is understandable, and runs efficiently. Just keep in mind that XML and JSON are data interchange formats and they do THAT job well. This Stack Overflow discussion may be helpful to gain a better picture of what that means; be sure to read more than just the accepted answer.
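For reference, reading a small read-only XML file like this can be done with the parser that ships with the JDK. The sketch below is only one possible layout: the element names `locations`, `location` and `field` and the `name` attribute are assumptions for the example, not anything taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LocationXmlReader {
    // Parses <locations><location name="..."><field name="...">text</field></location></locations>
    // into: location name -> (field name -> text). The layout is hypothetical.
    static Map<String, Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Map<String, String>> result = new LinkedHashMap<>();
            NodeList locations = doc.getElementsByTagName("location");
            for (int i = 0; i < locations.getLength(); i++) {
                Element location = (Element) locations.item(i);
                Map<String, String> fields = new LinkedHashMap<>();
                NodeList children = location.getElementsByTagName("field");
                for (int j = 0; j < children.getLength(); j++) {
                    Element field = (Element) children.item(j);
                    fields.put(field.getAttribute("name"), field.getTextContent().trim());
                }
                result.put(location.getAttribute("name"), fields);
            }
            return result;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the call site simple.
            throw new IllegalArgumentException("could not parse locations XML", e);
        }
    }
}
```

With 350 locations the whole document fits comfortably in memory, so a one-time parse into a map at startup is usually enough.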
"It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices."
Although testing for functionality is great, keep in mind that your test is not a load test and does not verify what you're trying to confirm. I would explore load testing; Wikipedia is a good place to start!

Building a database structure in java and save files in XML format

I am new to Java and I am working on a project called "in-memory database server". In this project I am supposed to build the database structure for the tables and the relations between them (I am not going to use any DB language; I have to build the structure myself), and then save these structures to XML files on the server (it's a fixed schema of three tables). Next, I am supposed to handle the CRUD operations on the saved data, sent from clients to the DB server over sockets (using TCP). I must also use a caching method so the data is accessed quickly from memory instead of from the HDD.
When I think about the project as a whole, it looks very complex and I don't know where to start! Should I start with the client or the server?
I tried to divide the problem into smaller problems, so I have these questions that I need answered in order to find a starting point.
How can I build the tables and save them in XML files?
After building the tables' structures, how can I create the relations between the tables (the primary keys, foreign keys, and so on)?
How will the client and server communicate to handle the CRUD operations?
What is the best caching method, and how can I implement it?
I want to start building "Users" table, the client has a GUI Login form that will send the username and password to the server, the server will check them and log the user in.
I know it's a lot of questions, but I need help to understand how the work will be done, I need useful topics and videos, any related link may help me.

I would start by trying to define one of the tables in XML format. Probably starting with an XML Schema definition about the format the XML will take in the "table definition" file(s). Once I had that ready, I would start unit testing the heck out of that/those XML. Only after I had everything working to my satisfaction in the unit testing stage would I introduce any complexity of the web client/server variety. Baby steps win.
EDIT: (Useful links: Google this: "XML Schema definition example")
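As a starting point for the "one table in XML, then unit test it" step, here is a small Java sketch that writes a users table to XML and reads it back using only JDK classes. The layout (`users`/`user` elements with `name` and `hash` attributes) and the class name are made up for this example; the map key plays the role of the primary key.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UsersTableXml {
    // Serialize username -> password hash as <users><user name="..." hash="..."/></users>.
    static String toXml(Map<String, String> users) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("users");
            doc.appendChild(root);
            for (Map.Entry<String, String> e : users.entrySet()) {
                Element user = doc.createElement("user");
                user.setAttribute("name", e.getKey());
                user.setAttribute("hash", e.getValue());
                root.appendChild(user);
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("could not write users table", ex);
        }
    }

    // Read the same layout back into an in-memory map (the "table").
    static Map<String, String> fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> users = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("user");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element user = (Element) nodes.item(i);
                users.put(user.getAttribute("name"), user.getAttribute("hash"));
            }
            return users;
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not read users table", ex);
        }
    }
}
```

A round-trip check such as `fromXml(toXml(table)).equals(table)` makes a natural first unit test before any socket code exists.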

You're new to Java, and you haven't written a database management system before. You've got quite a learning curve ahead of you. Breaking the problem down into manageable chunks is very wise. I'd start reading about principles of DBMS (independent of any implementation language), and then maybe download some open source Java DBMS and study how others have solved the problem.

Two-way sync with salesforce (java)

We would like to start using Salesforce for managing sales contacts, but there are also some business functions regarding contacts that we would like to retain in our current system.
As far as I can see, that means we're going to need a two-way sync, i.e. when anything changes on Salesforce, we need to update it on our system and vice versa.
I'm suggesting some kind of messaging product that can sit in the middle and retry failed messages, because I have a feeling that without that, things are going to get very messy, e.g. when one or the other service is down.
The manager on the project would like to keep it simple and feels that using messages rather than real-time point-to-point calls is overkill, but I feel like without it we're going to be in for a world of pain.
Does anyone have any experience with trying to do two-way syncs? (Actually, even one-way suffers from the same risks, I think.)
Many thanks for your insights.

I can't speak for your system, but on the Salesforce API side, take a look at the getUpdated() and getDeleted() calls, which are designed for data replication. The SOAP API doc has a section that goes into detail about how to use them effectively.

We use Jitterbit to achieve two-way sync between Salesforce and our billing system. Salesforce has a last-modified field and so does our billing system (your system should have one too; if not, add a timestamp field to the table in its SQL storage). The one important thing is to choose one of the keys as primary (either SF_ID or the other system's key) and create that key field in the other system, as it will be used for conflict resolution. The process is simple and has several steps: load all modified SF data into a flat file, load all modified data from the secondary system into another flat file, look for conflicts by comparing the two files over the common key field, notify an admin of any conflicts, and propagate all non-conflicting changes to the other system. We run this process every 10 minutes, and we store the last timestamp on both systems between cycle runs so that we only take records that were modified between two cycles.
If two users edit at the same time, you will either encounter a conflict and resolve it manually, or you will get the "last-saved-wins" outcome.
You also have to cater for new provisions: on the SF side use upsert instead of update (using the external or SF key, depending on which you chose above); on your other side it depends on the system.
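The compare step of a cycle like this can be sketched in a few lines of Java. This is not how Jitterbit does it internally; it is only an illustration of the logic, with hypothetical names (`SyncCycle`, `Plan`) and with each changed record reduced to a key and a string value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SyncCycle {
    // What one cycle decides to do.
    static final class Plan {
        final Map<String, String> toLocal = new HashMap<>();      // changes to apply to the local system
        final Map<String, String> toSalesforce = new HashMap<>(); // changes to apply to Salesforce
        final Set<String> conflicts = new TreeSet<>();            // keys changed on both sides
    }

    // Each map holds common key -> new value for records modified since the last cycle.
    static Plan plan(Map<String, String> changedInSalesforce, Map<String, String> changedLocally) {
        Plan plan = new Plan();
        for (Map.Entry<String, String> e : changedInSalesforce.entrySet()) {
            if (changedLocally.containsKey(e.getKey())) {
                // Modified on both sides in the same window: leave it for an admin.
                plan.conflicts.add(e.getKey());
            } else {
                plan.toLocal.put(e.getKey(), e.getValue());
            }
        }
        for (Map.Entry<String, String> e : changedLocally.entrySet()) {
            if (!changedInSalesforce.containsKey(e.getKey())) {
                plan.toSalesforce.put(e.getKey(), e.getValue());
            }
        }
        return plan;
    }
}
```

This version flags a key as a conflict even when both sides happened to save the same value; whether that is acceptable depends on how noisy the admin notifications may be.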

C#/Java Best practice to save data locally

Let's say I want to write a program in C# .NET for a video club to store clients and their rentals.
What is the best, most modern (and standalone) way to store this data? XML, binary serialization, SQLite, Access?
I will also need to query the data often. For example, a client comes in, I search for him by name, I find him, and I add a new rental for him. Also (now or in the future) there will be data for DVDs too (adding DVDs, and in short there will be a relation between DVDs and clients). It's like a database, but a database is not standalone, and I want my program to work as long as the user has .NET installed on his PC. I could use MySQL, but this needs a MySQL server installed...
What do you think is the best and most modern solution?

Various SQL options are good enough: SQLite, SQL CE, SQL Express.
Access may be the easiest starting solution as you can visualise the data without much trouble.

If it is not critical data, simply store it in an XML or JSON file, using XML-to-object mapping like JAXB or xsd.exe, or JSON-to-object mapping like GSON.
If it is critical, it is best to go with a DB.

Since it's a small solution, use either MS SQL Express or MySQL. Both should be free (as in free beer).
SQL Express is best for .NET, MySQL for Java.

You can use an embedded database. If you don't like SQL databases (some of which can be embedded), you can try one of the many free "NoSQL" databases; http://nosql-database.org/ has a list of 122.

I think the best solution is the simplest which works. I don't think the most modern is necessarily going to be the best. For example, I'd be looking at the number of members. Is this number as big as from a large movie store chain? Or is for a small group of people? If only a small number, is it possible for you to use Excel or similar?
Are you choosing C# because you want to learn it, or are you driven by the end result, i.e. a working and useful asset tracking system (of which many already exist, of course)?
Other than that, many databases can be stored in a single file and are quite easy to create. I think there is a flat file driver for SQLite, and JDBC certainly has one.
You may also wish to consider the need to inform your members of your privacy policy for dealing with their personal information (e.g. how do you keep Mr Bloggs's dirty movie rental collection private?). We often forget this stuff in our eagerness to create a cool application!
Good luck!

MICROS POS Integration

I have a desktop application for managing restaurants front-of-house operations such as reservations, guest data, table turnover, with support for online reservations.
The problem that I am trying to solve is how to capture customer spend and table state by integrating into MICROS. I would like to find out when a table is busy, when a check is printed, what is the total value of the check paid by customer.
Any help in how or where to start would be appreciated. The MICROS website is quite vague as to what can be done.
-Thanks

One way to track this information is to create a polling application that runs on the Micros server. You would need read access to the database, and in the best-case scenario full DBA access. The schema is quite complicated, but if you Google something like "micros pos 3700 schema pdf" you'll come across some resources to get you going. Also, check out http://www.tek-tips.com/ and do some searching for Micros if you go this route. There are examples of SQL there, and other users who have faced the same task of integrating with Micros. You can query things like open checks and when a check was closed. That may give you an idea of when it was printed if you cannot find that out specifically.
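The polling idea boils down to comparing what was open at the previous poll with what is open now. The Java sketch below shows only that comparison; how the set of open check ids is read from the Micros database is deliberately left out, since it depends on the schema, and the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class CheckPoller {
    // Ids that were open at the previous poll but are gone now,
    // i.e. checks that were closed between the two polls.
    static Set<String> closedSince(Set<String> previouslyOpen, Set<String> currentlyOpen) {
        Set<String> closed = new TreeSet<>(previouslyOpen);
        closed.removeAll(currentlyOpen);
        return closed;
    }

    // Ids that are open now but were not at the previous poll,
    // i.e. checks that were opened between the two polls.
    static Set<String> openedSince(Set<String> previouslyOpen, Set<String> currentlyOpen) {
        Set<String> opened = new TreeSet<>(currentlyOpen);
        opened.removeAll(previouslyOpen);
        return opened;
    }
}
```

A check that is opened and closed between two polls never shows up in either result, so the polling interval limits what this approach can see.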

I have never used MICROS specifically, but I have integrated with many systems before. I generally find that if you call the vendor and tell them you want to integrate, they will usually be willing to tell you where their data is stored. Also, using their software for purposes other than what they intended could be copyright infringement unless you ask, and you would unofficially be a data processor for MICROS. You don't want to get sued, so it's probably best to ask.
Generally speaking, though, you can probably find the data you want by performing a single action before you open (so as not to confuse matters) and looking through the files in the install directory until you find information on the action you just performed. Take note and repeat for each action. Then you can watch the directory for changes and process a file whenever it is one of the ones you care about. The best ones are often logs, as they are usually plaintext, updated in real time, and easy to access, and you can usually pick out the patterns you want quite easily.
Keep in mind, though, that some data may only be output in a usable format at the end of the day or of the transaction, so again I really recommend calling and asking.
