Java Social Networking, friends' activity status awareness efficient method - java

Assume you have a social networking website where you can have friends and view their up-to-date activities. The question is: what is the most efficient way (avoiding performance problems) to be informed of their activities right away, such as a profile change, when they are online at the same time as you?
I have worked out two different approaches, but I am not sure which is the more efficient from the DB point of view as well as the Java memory point of view. My approaches are below; please let me know if you have a better one:
1- Using a Java HTTP session listener to get the session of every user and traverse them for updates.
2- Checking the database for new updates every few seconds and then updating the map.

First: you will only know after you have done measurements.
Having said this, there is always a space-time tradeoff: if you store a lot of data in memory, access will be fast, but you will have a large memory footprint. If you go via the DB, you will have a small (Java) footprint, but access will be a lot slower.
So you need to decide what to do. Memory is cheap, so putting data in a memory cache will work nicely. On the other hand, do you really need sub-second updates? Or can a profile update be 20 seconds old before it is detected?
There is a great episode of the SE-Radio podcast on NoSQL databases that talks a lot about those decisions to make: http://www.se-radio.net/2010/05/episode-162-project-voldemort-with-jay-kreps/ (I hope it is this one)
This episode about memory-grids is also quite good: http://www.se-radio.net/2010/11/episode-169-memory-grid-architecture-with-nati-shalom/
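To make the tradeoff concrete, here is a minimal sketch of the polling variant (option 2 in the question) backed by an in-memory map. The class name ActivityCache, the loader supplier (which stands in for the real DB query) and the method names are all assumptions made for illustration; nothing here comes from the original poster's code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical in-memory cache of "user id -> latest activity", refreshed by polling.
class ActivityCache {
    private final Map<Long, String> latestByUser = new ConcurrentHashMap<>();
    // Assumption: the loader stands in for the real database query.
    private final Supplier<Map<Long, String>> loader;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    ActivityCache(Supplier<Map<Long, String>> loader) {
        this.loader = loader;
    }

    // Pull the current state from the backing store into memory.
    void refresh() {
        latestByUser.putAll(loader.get());
    }

    // Poll every intervalSeconds; staleness is bounded by roughly that interval.
    void start(long intervalSeconds) {
        scheduler.scheduleAtFixedRate(this::refresh, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    // Stop the background polling thread.
    void stop() {
        scheduler.shutdownNow();
    }

    // Reads are served from memory and never touch the database.
    String latestActivity(long userId) {
        return latestByUser.get(userId);
    }
}
```

Calling start(20) would give the "20 seconds old" behaviour discussed above: reads stay cheap, and the database sees one query per interval regardless of how many users are online. Whether that is actually faster or cheaper than the session-listener approach is something only measurement will show.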

Related

How to expand code/description to a complex object?

I want to present a list of the names/basic attributes of some complex objects (i.e. they are comprised of multiple collections of other objects) in a recycler view, then get the full object on user selection. For example, the top level objects are "Play Scripts", and each contains a number of "Spoken Lines" spoken by one of the "Actors" associated with the Play Script.
I'm trying to use the Android Architecture components to do this and have (using Florian @ codinginflow.com's tutorials) successfully used Room to create a simplified Play_Script class, DAO and Repository. I've also created some basic REST web services in ASP.Net which can serve up data from a MySQL db.
It strikes me that the path that I am going down will perform poorly and use excessive network bandwidth getting lots of data that I won't use. I'm getting every Play Script (including its Spoken Lines etc) just so that I have the Play Script "Name" and "Description" attributes to populate the Recycler.
In the olden days, I'd just "SELECT ID, Name, Description FROM Play_Script" and once the user had made their choice, I'd use the ID as the key to get everything else that I needed. I suspect that I'm missing something fundamental in the design of my data entities but can't come up with any keywords that would let me search for examples of this common sort of task being done well (/at all).
Please can you help this SO noob with his 1st question?
Cheers,
Z
Update 15 May:
Though I haven't had a response, from what I've been reading in recent weeks (e.g. re Dependency Injection) I suspect that there is no blanket approach for this sort of thing in Android development. It appears that people generally either retrieve extensive data and then use what they require or else build multiple Web Service APIs to return sparse data that includes keys that the client can use to expand when required. So, for example you might make both a "plays_light" and a "plays_detail" Get API.
My solution has been exactly as my May update - i.e. to extend the web API and offer a number of similar calls that return varying granularities of information. It's not particularly elegant and I suspect there may be better ways but it works. In general, I'm finding that the user tends to need less detail in the parent entities and more as we get to individual children/grandchildren.
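For anyone landing here, this is roughly what the "light" versus "detail" split looks like on the Java side. It is only a sketch: PlaySummary, PlayDetail and PlayService are hypothetical names, and the in-memory map stands in for the real web service and database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sparse view: just enough to populate one row of the list.
class PlaySummary {
    final long id;
    final String name;
    final String description;

    PlaySummary(long id, String name, String description) {
        this.id = id;
        this.name = name;
        this.description = description;
    }
}

// Full view: fetched only after the user has picked a row.
class PlayDetail {
    final PlaySummary summary;
    final List<String> spokenLines;

    PlayDetail(PlaySummary summary, List<String> spokenLines) {
        this.summary = summary;
        this.spokenLines = spokenLines;
    }
}

// Hypothetical in-memory stand-in for the two web service calls.
class PlayService {
    private final Map<Long, PlayDetail> store = new LinkedHashMap<>();

    void add(PlayDetail detail) {
        store.put(detail.summary.id, detail);
    }

    // Counterpart of a "plays_light" call: ids, names and descriptions only.
    List<PlaySummary> listSummaries() {
        List<PlaySummary> result = new ArrayList<>();
        for (PlayDetail detail : store.values()) {
            result.add(detail.summary);
        }
        return result;
    }

    // Counterpart of a "plays_detail" call: everything for one id, or null if unknown.
    PlayDetail getDetail(long id) {
        return store.get(id);
    }
}
```

The list screen would call listSummaries() and keep only the ids; getDetail(id) runs once, for the play the user actually selects.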
I do now realise why some apps are so slow though: It's easy to be lazy in the web service design and just return loads of data - only a fragment of which will be used by the client - and justify this by convincing yourself that single API will be universally applicable and thus easier for whoever picks up my code down the line to understand.
Again, it could be my inexperience but I find the local caching of relational data on the Android side retrieved through the API calls quite clunky - lots of storing foreign keys and then re-parsing json to get data into the SQLite tables. I'd hoped Dagger would have been more useful in simplifying this than it has turned out to be so far. I actually unravelled a whole load of Dagger-related code just to preserve my sanity. Not sure I was entirely successful!
Better answers are still very much welcome.
Z

Which is faster: storing a 10^5-size array in an Android app and searching it, or firing search queries and getting the data from the database?

I am currently working on an app that has a table of 15000+ rows and 20+ columns in a database. When a user searches in the app, I can do one of two things:
1. Load all the data from the database when the app starts and then search it in memory for the data the user wants.
2. Convert the user's input into a query, fire it at the database, and retrieve the matching data at that moment.
Additionally, if I store an array of size 10^6*20 in the app, how much space will it take, and how will it affect reloading the app?
And if I fetch from the database instead, what will the worst-case fetch time complexity and data usage be?
Thanks in advance.
With only about 200 MB or less of data either way will be fast, assuming you allocate enough heap to the process. In the modern era that is not much data. Which will be faster? That all depends.
How fast is your I/O system? What database would you use? How would you code your algorithm? Are you doing any processing in parallel? What bugs have you written into your code? What amount of overall program time does access to these data use? What else is running on the system?
Only actual measurement of both approaches under very well controlled conditions that simulate realistic loads will reveal which is faster.
But at what cost? More complex systems carry more risk and cost. Complicated, premature "optimization" makes code brittle and hard to understand. Workarounds take time and effort, and that time and effort is wasted if there is nothing to work around.
So instead of asking which is fast(er), ask what makes sense. What gets the job done at all? What costs the least to do? What's more correct? What minimizes risk?
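If you do want a first number for the in-memory option, a crude timing sketch like the one below is enough to get started. Everything in it is an assumption for illustration: the InMemorySearch class, the synthetic "r<row>c<col>" cell values, and the plain linear scan. A single System.nanoTime() measurement is noisy (JIT warm-up, GC), so treat the result as a rough indication and not as a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for timing an in-memory linear search over a table of strings.
class InMemorySearch {

    // Build a synthetic rows x cols table; each cell holds "r<row>c<col>".
    static List<String[]> buildTable(int rows, int cols) {
        List<String[]> table = new ArrayList<>(rows);
        for (int r = 0; r < rows; r++) {
            String[] row = new String[cols];
            for (int c = 0; c < cols; c++) {
                row[c] = "r" + r + "c" + c;
            }
            table.add(row);
        }
        return table;
    }

    // Linear scan: return every row whose given column equals the wanted value.
    static List<String[]> search(List<String[]> table, int col, String wanted) {
        List<String[]> hits = new ArrayList<>();
        for (String[] row : table) {
            if (wanted.equals(row[col])) {
                hits.add(row);
            }
        }
        return hits;
    }

    // Crude wall-clock timing of one search, in nanoseconds.
    static long timeSearchNanos(List<String[]> table, int col, String wanted) {
        long start = System.nanoTime();
        search(table, col, wanted);
        return System.nanoTime() - start;
    }
}
```

Run the same query against the database under the same conditions and compare; that comparison, not the sketch itself, answers the question.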

Cache update with db changes

We have a Java-based product which keeps a Calculation object in the database as a blob. At runtime we keep it in memory for fast access. Another process updates this Calculation object in the database at regular intervals. What would be the best strategy so that, when the object is updated in the database, the cache evicts the stored object and fetches it again from the database?
I would prefer not to use a caching framework unless it is really necessary.
I would appreciate any response on this.
It is very difficult to give you a good answer without any knowledge of your system architecture, design constraints, IT strategy, etc.
Personally I would use a messaging pattern to solve this issue. A few advantages of that pattern are as follows:
Your system components (Calculation process, update process) can be loosely coupled.
Depending on the implementation of the messaging pattern, you can "connect" many Calculation processes (scaling out) and many update processes (with a master-slave approach).
However, implementing a messaging pattern yourself can be a very challenging task, and I would recommend using one of the existing frameworks or products.
I hope that will help at least a bit.
I did some work similar to your scenario before. Generally there are two ways.
One: the cache holder polls the database regularly, fetches the data it needs and keeps it in memory. The data can be stored in a HashMap or some other collection. This approach is simple and easy to implement, with no extra framework or library needed. But users will have to put up with stale data from time to time. Besides, polling will put a lot of pressure on the DB if the number of pollers is huge or the query is not fast enough. However, it is generally not a bad choice if your real-time requirements are not that strict and the scale of your system is relatively small.
The other approach is that the cache holder subscribes to notifications from the data updater and updates its data after being notified. This provides a better user experience, but it brings more complexity to your system because you have to get some messaging infrastructure, such as JMS, involved. Developing and tuning it is more time-consuming.
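The second approach can be sketched without any messaging product, just to show the shape of it. UpdateNotifier below is a hypothetical in-process stand-in for a real topic (JMS or otherwise), and the loader function stands in for reading the blob from the database; both names are assumptions, not part of any framework.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical in-process stand-in for a message topic.
class UpdateNotifier {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Called by the updater after it has written a new version for the given key.
    void publish(String key) {
        for (Consumer<String> listener : listeners) {
            listener.accept(key);
        }
    }
}

// Cache that evicts an entry when notified and reloads it lazily on the next read.
class CalculationCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: the loader stands in for reading the blob from the database.
    private final Function<String, String> loader;

    CalculationCache(Function<String, String> loader, UpdateNotifier notifier) {
        this.loader = loader;
        notifier.subscribe(key -> cache.remove(key));
    }

    String get(String key) {
        return cache.computeIfAbsent(key, loader);
    }
}
```

With a real broker the publish call would cross process boundaries, which is where the extra infrastructure and tuning effort mentioned above comes in.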
I know I am quite late responding to this, but it might help somebody searching for the same issue.
Here was my problem: I was storing requestPerMinute information in a HashMap in a Java filter which gets loaded when the application starts. The problem is that if somebody updates the DB with new information, the map doesn't know about it.
Solution: I added a variable updateTime to my Java filter which stores when the HashMap was last updated. On every request it checks whether more than 24 hours have passed since then, and if so it reloads the HashMap from the database. So every 24 hours the whole HashMap is refreshed.
My use case did not need real-time updates, so this fits it.
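A minimal sketch of that idea, for reference. The class name ExpiringMapCache, the loader supplier and the injectable clock are my own assumptions (the clock is only there so the expiry can be tested without waiting); the original filter code was not posted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache that reloads the whole map once it is older than maxAgeMillis.
class ExpiringMapCache {
    // Assumption: the loader stands in for the database query.
    private final Supplier<Map<String, Integer>> loader;
    private final long maxAgeMillis;
    // Returns the current time in milliseconds; injectable for testing.
    private final LongSupplier clock;

    private Map<String, Integer> data = new HashMap<>();
    private long lastLoaded;
    private boolean loaded = false;

    ExpiringMapCache(Supplier<Map<String, Integer>> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    // Checked on every read: reload if never loaded or if the data is too old.
    synchronized Integer get(String key) {
        long now = clock.getAsLong();
        if (!loaded || now - lastLoaded > maxAgeMillis) {
            data = new HashMap<>(loader.get());
            lastLoaded = now;
            loaded = true;
        }
        return data.get(key);
    }
}
```

In production the clock would be System::currentTimeMillis and maxAgeMillis would be 24 hours expressed in milliseconds. Note that the request that triggers the reload pays the full cost of the database query.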

Sort a list with SQL or as a collection?

I have some entries with dates in my database. Which is better?
Fetch them with an SQL statement and apply ORDER BY.
Get the list with SQL and order it within the application, with Collections.sort or similar.
Thanks
This is a very broad question that is difficult to answer, and it depends a lot on what you mean by "best".
From a performance perspective, you will simply have to measure to determine which part of your system is the bottleneck. Databases are usually very efficient, but it could still be relevant to off-load that work to the client.
From a separation-of-concerns perspective, it depends on how much the sorting matters in the application and how the application is layered.
Ask yourself: "Where does the knowledge that the data is sorted belong?" and "What would happen if I were to change from relational database storage to something different?"
To some extent, it depends on how many values are in the complete collection. If it is, say, 20-30 values then you can sort anywhere — even a relatively poor sorting algorithm can do that quickly (avoid Stooge Sort though; that's terrible) — as that is the sort of size of data chunk which you might expect to actually fetch in one service response.
But once you get into larger datasets you need to plan much more carefully. In particular, you want to avoid moving data around if you don't have to. If the data is currently only present in the database, you really don't want to fetch it all into the client just to sort it (a relatively expensive operation) and then throw virtually all of it away. It's far better to actually keep the data sorted in the database to start with, so that picking it up in order is trivial; in relational database terms, keeping the data sorted is functionally identical to maintaining an index on the data. Indeed, you can have multiple indices on the data, which can make even rather complex queries quick. (NoSQL DBs are more varied; some even don't support the concept of keeping data sorted.) The downside of maintaining indices is that they take up more space and they take time to maintain, particularly when the data is being created in the first place.
So… to return to your question, you probably want to try to not sort the data in the application: for most data, an appropriate index can be much more efficient as it lets your code not even look at unwanted data. But if you have to fetch it all into your application for some other reason and you can't bring it in pre-sorted, there's no reason to avoid sorting it yourself: Java's sorting algorithms are efficient and stable. But you should measure whether fetching it from the DB in the new order is faster. (The question is whether the DB overheads exceed the super-linear costs of re-sorting; lots of problems are in the domain where “maybe; hard to tell” is the answer.)
The other thing to balance is whether it is simpler for your code to not do sorting itself and instead always delegate that to the DB. Keeping your code simpler (and more bug-free) is a good goal to have…
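For the case where the data is already in the application, sorting it there is a few lines. This is only a sketch: DatedEntry and EntrySorter are hypothetical names, and the database-side equivalent would simply be an ORDER BY on the date column. List.sort is a stable sort for objects, so entries with equal dates keep their original relative order.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row type: a title plus the date it is sorted by.
class DatedEntry {
    final String title;
    final LocalDate date;

    DatedEntry(String title, LocalDate date) {
        this.title = title;
        this.date = date;
    }
}

class EntrySorter {
    // Return a new list sorted by date, oldest first; the input list is left untouched.
    static List<DatedEntry> sortedByDate(List<DatedEntry> entries) {
        List<DatedEntry> copy = new ArrayList<>(entries);
        copy.sort(Comparator.comparing((DatedEntry e) -> e.date));
        return copy;
    }
}
```

Whether this or a re-query with ORDER BY is faster for a given data size is exactly the kind of thing to measure, as noted above.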
Database management systems (DBMS) are optimized for these tasks, so I think you should stick with them. Especially if you are accessing the database from a script written in PHP (or another scripting language), it might be slower to perform that task in the script. You might also hit PHP's memory limit if you sort the array in a script.
I don't mean to raise the question of performance of different programming languages; I just want to point out that it is very good practice to rely on the DBMS whenever you can.
This is a very interesting question to me, and I want to present the other side of the accepted answer, which BTW is a very good answer with which I don't necessarily *dis*agree. Just want to present the other side.
When I started my career, I was working on mainframe DB2, and the old-timers who taught me were VERY INSISTENT that sorting be done OUTSIDE of the db. Their rationale was that it is work that CAN be offloaded, and this leaves the DB free to service other requests.
Of course, it's far more nuanced than this. In general, I'd say the factors you're weighing are:
A) How busy, or central to your system, is your database? If your db is very busy, if you have a lot of OLTP processing on clients or app servers, and your client or application servers have lots of excess capacity, why not sort on the app server or client? Even if it's less efficient, it spreads the work through the system and gets you more throughput from a whole-systems perspective.
B) How big is the sort? It would be silly to, say, blow your call stack or java heap because you sorted a gazillion MB of data.
C) Will sorting in your app or app server cause pauses, latency, etc? In other words, if your particular programming language has REALLY bad sorting libraries, and you don't want to write your own, maybe letting the DB take 0.5 seconds is better than making your application take 5.0 seconds.
So, as with all things, "it depends" ;-). But, I think these are the things upon which it depends.

What database to use?

I'm new to databases, but I think I finally have a situation where flat files won't work.
I'm writing a program to analyze the outcomes of multiplayer games, where each game could have any number of players grouped into any number of teams. I want to allow players to win, tie, or leave partway through the game (and win/lose based on team performance).
I also might want to store historical player ratings (unless it's faster to just recompute that from their game history), so I don't know if that means storing each player's rating alongside each game played, or having a separate table for each player, or what.
I don't see any criteria that impacts database choice, but I'll list the free ones:
PostgreSQL
MySQL
SQL Server Express
Oracle Express
I don't recommend an embedded database like SQLite, because embedded databases make trade-offs in features to accommodate space and size concerns. I don't agree with their belief that data typing should be relaxed; it has led to numerous questions on SO about how to deal with date/time filtering, among others...
You'll want to learn about normalization, getting data to Third Normal Form (3NF), because it minimizes data redundancy and, together with foreign keys, helps maintain referential integrity. For example, your player stats would not be stored in the database; they'd be calculated at the time of the request based on the data on hand.
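As an illustration of "calculated at the time of the request", here is a small sketch that derives win counts from stored game results. The names Outcome, GameResult and PlayerStats are hypothetical, and the list stands in for rows read from a normalized results table (one row per player per game); in practice the same aggregation could be done in SQL.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical outcomes a player can have in one game.
enum Outcome { WIN, TIE, LOSS, LEFT }

// One row of a normalized results table: one player's outcome in one game.
class GameResult {
    final long gameId;
    final String player;
    final Outcome outcome;

    GameResult(long gameId, String player, Outcome outcome) {
        this.gameId = gameId;
        this.player = player;
        this.outcome = outcome;
    }
}

class PlayerStats {
    // Derive wins per player from the game history; nothing is stored redundantly.
    static Map<String, Integer> winsByPlayer(List<GameResult> results) {
        Map<String, Integer> wins = new TreeMap<>();
        for (GameResult result : results) {
            if (result.outcome == Outcome.WIN) {
                wins.merge(result.player, 1, Integer::sum);
            }
        }
        return wins;
    }
}
```

If recomputing ever turns out to be too slow, storing a derived rating becomes a deliberate denormalization that has to be kept in sync with the game history.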
You didn't mention any need for locking mechanisms where multiple users may be competing to write the same data to the same resource (a database record, or a file in the case of flat files) simultaneously. What I would suggest is to get a good book on database design and try to understand the normalization rules in depth. Distributing data across separate tables has a performance impact, but it also affects how easy queries are to construct. This is a very involved topic, and there's no simple answer to it. That's why companies hire database administrators to keep their data structures optimized.
You might want to look at SQLite, if you need a lightweight database engine.
Some good options were mentioned already, but I really think that on the Java platform, H2 is a very good choice. It is perfect for testing (as an in-memory test database), but it also works very well for embedded use cases and as a stand-alone "real database". Plus it is easy to export to a dump file and import from one, to move data around. And it works efficiently too.
It is developed by a very good Java DB guy, and is not his first take, and you can see this from maturity of the project. On top of this it is still being actively developed as well as supported.
A word on why nobody even mentions any of the "NoSQL" databases while you have used it as a tag:
Non-SQL databases are getting a lot of attention (or even outright hype) recently, because of some high-profile use cases, because they're new (and therefore interesting), and because of their promise of incredible scalability (which is "sexy" to programmers). However, only a very few very big players actually need that kind of scalability, and you certainly don't.
Another factor is that SQL databases require you to define your DB schema (the structure of tables and columns) beforehand, and changing it is somewhat problematic (especially if you already have a very large database). Non-SQL databases are more flexible in that regard, but you pay for it with more complex code (e.g. after you introduce a new field, your code needs to be able to deal with elements where it's not yet present). It doesn't sound like you need this kind of flexibility either.
Also try OrientDB. It's free (Apache 2 license), runs everywhere, supports SQL and is really fast. It can insert 1,000,000 records in 6 seconds on common hardware.
