Google App Engine: Query users? - java

I am building an app that lets users sign up, and other users see data about those who have already registered. If I record the userID of everyone who signs up, how can I look up data about those userIDs later?
(I'm using the Java SDK.)

If you have signed-up users and data about them, having some sort of SignedUpUser Entity makes sense and is straightforward. From there it's a matter of arranging to construct indexes that support the types of lookups that you'll be doing (e.g., by name, by recency of activity). At this level, it's not much different from how you'd construct this on top of an RDBMS.
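A minimal sketch of that SignedUpUser idea, written in plain Java so it runs without the App Engine SDK. The record plays the role of the entity keyed by userId, and in-memory maps stand in for the datastore and for the index you would rely on to look users up by name. The names SignedUpUser, lastActive, and SignedUpUserStore are hypothetical. On App Engine, the same shape would be a datastore entity with indexed name and lastActive properties.

```java
import java.time.Instant;
import java.util.*;

// Hypothetical entity: one record per registered userId, holding the
// data other users are allowed to see.
record SignedUpUser(String userId, String name, Instant lastActive) {}

// In-memory stand-in for the datastore. idsByName plays the role of an
// index on "name"; a real datastore would maintain it for you.
class SignedUpUserStore {
    private final Map<String, SignedUpUser> byId = new HashMap<>();
    private final Map<String, Set<String>> idsByName = new HashMap<>();

    void save(SignedUpUser u) {
        SignedUpUser old = byId.put(u.userId(), u);
        if (old != null) {
            // Keep the name index consistent when a user is re-saved.
            idsByName.getOrDefault(old.name(), new HashSet<>()).remove(old.userId());
        }
        idsByName.computeIfAbsent(u.name(), k -> new HashSet<>()).add(u.userId());
    }

    Optional<SignedUpUser> findById(String userId) {
        return Optional.ofNullable(byId.get(userId));
    }

    // Lookup "by name" through the index map.
    List<SignedUpUser> findByName(String name) {
        List<SignedUpUser> result = new ArrayList<>();
        for (String id : idsByName.getOrDefault(name, Set.of())) {
            result.add(byId.get(id));
        }
        return result;
    }

    // Lookup "by recency of activity": most recently active first. In memory
    // this is a sort; on a datastore you would declare an index on lastActive.
    List<SignedUpUser> mostRecentlyActive(int limit) {
        return byId.values().stream()
                .sorted(Comparator.comparing(SignedUpUser::lastActive).reversed())
                .limit(limit)
                .toList();
    }
}
```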

Related

How to apply code logic to database records/values?

I'm maintaining a system (in Java, with Tomcat, Spring MVC, and Hibernate) where I have to set access rules for user groups. These rules are saved in a database (PostgreSQL) as records / rows. The logic is very simple. Each user of a company's team belongs (is connected) to a group, and each group has a set of rules.
I have to allow administrators to configure (through a web application) rules for groups, so that each rule has a logic and this is recognized and reproduced on the server side.
I need to define rules with parameters, such as:
Authentications only on weekends.
Authentications only on weekdays.
Authentications only at a certain time (from time X to time Y).
X authentications per day.
Account expiration from date X.
And so on...
My intention is that the company team can organize itself dynamically, just setting up any rules they want at any time, without the need for maintenance every time their policies change.
I've been searching on Google and found nothing about it. I know I can do this in Java code, but I would have to tie the Java code to rule names stored in the database, which could change in the future (or differ between companies), and this does not seem right to me. I'm not sure whether this approach is correct or preferable (maintainable). I appreciate any suggestions, ideas, or corrections (for real).
Note: Team/group names may change, but their rules should remain the same (if desired).
EDIT
The database is already modeled and ready. Groups and rules are values in two different tables, with no logic at all. Querying these values is trivial. However, as I'm maintaining the web application, I'm in charge of creating the code or procedure that applies logic based on the chosen rule values.
I was very clear in my question, but I will add more details:
Imagine that my clients (companies) want a website (a web application) to manage their employees. Every company has teams of employees (groups), each with its own function. In addition, some employees are hired as temporary employees.
My duty is to restrict access to the accounts of users who are part of company teams. This will allow business leaders to restrict things according to their policies.
For any company, the process works something like this:
The person in charge defines groups (with names and descriptions).
The same person defines restrictions rules for each group.
User accounts are created and linked to groups with rules.
The accounts are given (assigned) to each person on the company team, each according to their function.
Why should this be done?
Management
Control
Security
Speaking more technically now, I do not know where or how I should implement this properly. I know of a way to accomplish this, which is in programming code (Java, in my case), but again, I do not know if this is appropriate.
I also know that it is possible to define users and groups on the database side. But creating and deleting such definitions every time an employee is hired or an employee's term of service expires is not practical. My intention is to minimize how much companies have to spend on maintenance (although sometimes this is obviously impossible).
My question, based on a real case, can be answered by pointing me to an ideal way / approach for this type of scenario, whether the solution is something implemented in the database, something done in the application layer, both, or something else (I do not have the experience to solve this kind of situation properly, so I'm here).
For practical purposes, I have decided to describe what technologies I am using in this system. If you want more information, I'll be happy to show you here.
Also, as this is a question that covers a larger context, not specifically databases, and also not specifically web applications, I have decided to put it here (instead of other StackExchange communities).
Thank you.
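The question has no answer here, but one common shape for what it describes can be sketched in Java. The set of rule types lives in code, while administrators only store a type plus parameter values per group in the database. The RuleType enum, the GroupRule record, and the parameter names ("from", "to", "max", "date") are hypothetical and not part of the poster's schema.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.List;
import java.util.Map;

// Hypothetical rule types: one constant per kind of rule the application can
// enforce. Administrators only pick a type and fill in its parameters.
enum RuleType { WEEKENDS_ONLY, WEEKDAYS_ONLY, TIME_WINDOW, MAX_PER_DAY, EXPIRES_ON }

// One row of a hypothetical group-rule table: rule type + parameter values.
record GroupRule(RuleType type, Map<String, String> params) {}

class RuleEvaluator {
    // A login at 'now' is allowed only if every rule of the group allows it.
    // loginsToday = number of successful logins already made today.
    static boolean allows(List<GroupRule> rules, LocalDateTime now, int loginsToday) {
        for (GroupRule r : rules) {
            if (!allows(r, now, loginsToday)) return false;
        }
        return true;
    }

    static boolean allows(GroupRule r, LocalDateTime now, int loginsToday) {
        DayOfWeek day = now.getDayOfWeek();
        boolean weekend = day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
        return switch (r.type()) {
            case WEEKENDS_ONLY -> weekend;
            case WEEKDAYS_ONLY -> !weekend;
            case TIME_WINDOW -> {
                // Inclusive window "from X to Y", e.g. from=09:00, to=17:00.
                LocalTime from = LocalTime.parse(r.params().get("from"));
                LocalTime to = LocalTime.parse(r.params().get("to"));
                LocalTime t = now.toLocalTime();
                yield !t.isBefore(from) && !t.isAfter(to);
            }
            case MAX_PER_DAY -> loginsToday < Integer.parseInt(r.params().get("max"));
            // The account expires on the given date; logins before it are allowed.
            case EXPIRES_ON -> now.toLocalDate().isBefore(LocalDate.parse(r.params().get("date")));
        };
    }
}
```

With this split, changing a company's policy means editing rows rather than code; only a genuinely new kind of rule needs a new enum constant and a new case.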

How to Convert SQL table into Redis Data

Hi, I am new to Redis and would like some help. I am using Java, SQL Server 2008, and a Redis server. To interact with Redis I am using the Jedis API for Java. I know that Redis stores key-value data: every key has a value.
Problem Background:
I have a table named "user" that stores id, name, email, age, and country; this is the schema of the SQL table. The table already contains some rows. Its primary key is id, which is only used by the database and has no use in my application.
In SQL I can insert a new row, update a row, search for any user, and delete a user.
I want to store this table's data in Redis and then perform similar operations there as well, like search, insert, and delete. If I have a good design for storing this information in both the DB and Redis, these operations should be simple. I may also have multiple tables, so the data should be stored in Redis on a per-table basis.
My Problem
Can you advise me on a design, or point me to information, for converting DB data to Redis and performing all of these operations? I ask because I know Facebook also uses Redis to store data, so how do they store it?
Any help would be much appreciated.
This is a very hard question to answer, as there are multiple ways you could do it.
The best way, in my opinion, would be to use hashes. A hash is basically a nested key-value type, so each key maps to a hash in which you can store the username, password, etc.
One problem is indexing: you need to have an ID stored in the key. For example, each user would have a key like: USER:21414
Second, unless you want to use commands like KEYS or SCAN, you will have to maintain your own list of users to iterate over (only if you need to do that). For this you will need to look at lists or sorted sets.
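Here is a sketch of that layout for the question's user table (id, name, email, age, country), in Java. Each row becomes a hash at USER:<id>, and a sorted set keeps the ids for iteration. An extra hash maps email to id so users can be searched by email; that secondary index is an assumption, not part of the answer. To stay runnable without a server, plain maps stand in for Redis. The comment on each line names the Jedis call you would make instead; check those against your Jedis version.

```java
import java.util.*;

// In-memory stand-in for the few Redis commands this layout needs.
class RedisUserStore {
    static final String ID_INDEX = "USERS";          // sorted set of user ids
    static final String EMAIL_INDEX = "USERS:EMAIL"; // hash: email -> id (assumed extra index)

    static String userKey(long id) { return "USER:" + id; }

    private final Map<String, Map<String, String>> hashes = new HashMap<>();
    private final TreeSet<Long> ids = new TreeSet<>(); // stands in for the sorted set

    // Insert one "row". In this sketch an update is delete + insert, so the
    // email index never points at a stale email.
    void insert(long id, String name, String email, int age, String country) {
        Map<String, String> row = new HashMap<>();
        row.put("name", name);
        row.put("email", email);
        row.put("age", Integer.toString(age));
        row.put("country", country);
        hashes.put(userKey(id), row);                    // jedis.hset(userKey(id), row)
        ids.add(id);                                     // jedis.zadd(ID_INDEX, id, String.valueOf(id))
        hashes.computeIfAbsent(EMAIL_INDEX, k -> new HashMap<>())
              .put(email, Long.toString(id));            // jedis.hset(EMAIL_INDEX, email, String.valueOf(id))
    }

    // Like HGETALL, a missing key yields an empty map.
    Map<String, String> findById(long id) {
        return hashes.getOrDefault(userKey(id), Map.of()); // jedis.hgetAll(userKey(id))
    }

    Map<String, String> findByEmail(String email) {
        String id = hashes.getOrDefault(EMAIL_INDEX, Map.of()).get(email); // jedis.hget(EMAIL_INDEX, email)
        return id == null ? Map.of() : findById(Long.parseLong(id));
    }

    void delete(long id) {
        Map<String, String> row = hashes.remove(userKey(id)); // read email via hget first, then jedis.del(userKey(id))
        ids.remove(id);                                       // jedis.zrem(ID_INDEX, String.valueOf(id))
        if (row != null) {
            hashes.get(EMAIL_INDEX).remove(row.get("email")); // jedis.hdel(EMAIL_INDEX, email)
        }
    }

    // Iterate all users without KEYS/SCAN.
    List<Long> allIds() {
        return new ArrayList<>(ids);                          // jedis.zrange(ID_INDEX, 0, -1)
    }
}
```

For multiple tables, the same pattern repeats with a different key prefix per table.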
To be honest, there is no true answer to this question; SQL-style data does not map onto key-value stores in any direct way. You usually have to do a lot more work yourself.
I would suggest reading as much as you can, and I would start here http://redis.io/commands and here http://redis.io/documentation.
I have no experience using Jedis, so I can't help on that side. If you want an example, I have an open-source social networking site which uses Redis as its sole data store. You can take a look at the code to get some ideas: https://github.com/pjuu/pjuu/blob/master/pjuu/auth/backend.py. It uses Python, but Redis is so easy to use from any language that there will not be much difference.
Edit: My site above no longer solely uses Redis. You will need to check out an older branch, such as 0.4 or 0.3 :)

Exploring user specific data in webapps

I am practicing by designing a simple todo list webapp in which a user can authenticate into the app and save todo list items. The user is also only able to view/edit the todo list items that they added.
This seems to be a general feature (authenticated user only views their own data) in most web applications (or applications in general).
What is important to me is knowing the different options for accomplishing this. I would like a solution that can handle lots of users' data effectively. At the moment I am using a relational database, but NoSQL answers would be useful to me as well.
The following ideas came to mind:
Add a user_id column each time this "feature" is needed.
Add an association table (in the example above a user_todo_list_item table) that associates the data.
Design in such a way that you have a table per user per "feature" ... so you would have a todolist_userABC table. It's an option, but I do not like it much, since a thousand users means a thousand tables?!
Add row-level security to the specific "feature". I am not familiar with how this works, but it seems to be a valid option. I am also not sure whether this is database-vendor specific.
Of these options, I went with the user_id column on the todolist_item table. Although it does the job, I feel that a user_id column might be problematic when reading data once the table gets large enough. One could add an index, I guess, but I am not sure how effective it would be.
What I don't like is that I need a user_id column in every table where I want this type of feature, which doesn't seem right to me. It also seems that when I implement the database layer I would have to add this condition to my queries for every feature (unless I use some AOP).
I had a look around (How does Trello store data in MongoDB? (Collection per board?)), but it does not speak about the techniques regarding user_id columns or things like that. I also tried reading about this in some security frameworks (Spring Security to be specific) but it seems that it only goes into privileges/permissions on a table level and not a row level?
So the question is whether my choice was appropriate and if there are better techniques to do this?
Your choice is the natural thing to do.
The table-per-user is a non-starter (anything that modifies the database structure in response to user action is usually suspect).
Row-level security isn't really an option for webapps - it requires each user session to have a separate, persistent connection to the database, which is rarely practical. And yes, it is vendor-specific.
How you index your tables depends entirely on your usage patterns and the types of queries you want to run. Is 'show all TODOs for a user' a query you want to support (it seems like it would be)? Then an index on the user id is obviously needed.
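A minimal JDBC sketch of that pattern, assuming a hypothetical todo_item(id, user_id, title) table. An index on user_id backs the per-user query, and the only read method always binds the authenticated user's id. The table, index, and method names are illustrative, not from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class TodoQueries {
    // Supports "show all TODOs for a user" without scanning the whole table.
    static final String CREATE_INDEX =
            "CREATE INDEX idx_todo_item_user ON todo_item (user_id)";

    static final String FIND_FOR_USER =
            "SELECT id, title FROM todo_item WHERE user_id = ? ORDER BY id";

    // Every read is scoped by the authenticated user's id, so this method
    // can only ever return rows owned by that user.
    static List<String> titlesFor(Connection conn, long userId) throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_FOR_USER)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}
```

Routing all reads through methods like this is one way to avoid repeating the user_id condition by hand in every query without reaching for AOP.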
Why does having a user_id column seem wrong to you? If you want to restrict access by user, you need to be able to identify which user the record belongs to. That doesn't mean every table needs it, though: for example, if one record composes another (say, your TODOs have 'steps', and each step belongs to a single TODO), only the root of the object graph needs the user id.
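The root-ownership point can be sketched in Java with hypothetical Todo and Step records. Only Todo stores userId, and access to a Step is decided by following its todoId back to the owning Todo. The in-memory maps stand in for the two tables; the SQL in the comment shows the equivalent join.

```java
import java.util.*;

// Only the root of the object graph carries the owner.
record Todo(long id, long userId, String title) {}
// A Step belongs to exactly one Todo and has no userId of its own.
record Step(long id, long todoId, String text) {}

class TodoAccess {
    private final Map<Long, Todo> todos = new HashMap<>();
    private final Map<Long, Step> steps = new HashMap<>();

    void add(Todo t) { todos.put(t.id(), t); }
    void add(Step s) { steps.put(s.id(), s); }

    // Ownership of a step is resolved through its parent Todo. In SQL:
    // SELECT s.* FROM step s JOIN todo_item t ON s.todo_id = t.id
    // WHERE s.id = ? AND t.user_id = ?
    boolean canRead(long userId, long stepId) {
        Step s = steps.get(stepId);
        if (s == null) return false;
        Todo parent = todos.get(s.todoId());
        return parent != null && parent.userId() == userId;
    }
}
```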

How To implement Facebook friends module in Database?

I am developing a Facebook-type application for my institute, and I am stuck at the friends module, i.e. how to know whether particular users are someone's friends.
I googled a lot but didn't get any satisfactory answers.
What I found is: a person will have many friends, and storing users and their friends in a separate table will only increase redundancy and the DB size.
I thought of using a graph with users as vertices and connections as edges.
But how do I implement something like that in a DB?
Or how does Facebook handle such a huge number of relationships?
Personally, I would have a dedicated table for it:
You could have a table with just two columns: userID and friendID
Since the relationships between users in the db will be many-to-many, normalizing them requires a link table, which breaks the relationship into many-to-one-to-many.
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html#03
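A minimal in-memory sketch of that two-column link table in Java, assuming a hypothetical friendship(user_id, friend_id) table. Each friendship is stored in both directions. That is one design choice; storing each pair once with user_id < friend_id is a common alternative that makes lookups check both columns. The comments show the SQL each method corresponds to.

```java
import java.util.*;

// One row of the hypothetical friendship(user_id, friend_id) link table.
record Friendship(long userId, long friendId) {}

class FriendshipTable {
    private final Set<Friendship> rows = new HashSet<>();

    // Two INSERTs: one row per direction.
    void befriend(long a, long b) {
        rows.add(new Friendship(a, b));
        rows.add(new Friendship(b, a));
    }

    // SELECT 1 FROM friendship WHERE user_id = ? AND friend_id = ?
    boolean areFriends(long a, long b) {
        return rows.contains(new Friendship(a, b));
    }

    // SELECT friend_id FROM friendship WHERE user_id = ?
    Set<Long> friendsOf(long user) {
        Set<Long> result = new TreeSet<>();
        for (Friendship f : rows) {
            if (f.userId() == user) result.add(f.friendId());
        }
        return result;
    }

    // Mutual friends: the intersection of two friend lists (a self-join in SQL).
    Set<Long> mutualFriends(long a, long b) {
        Set<Long> result = friendsOf(a);
        result.retainAll(friendsOf(b));
        return result;
    }
}
```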
These kinds of problems are usually solved by using a different type of database. For a social network, a graph database should make sense, as nodes and relationships are first-class citizens in it. There's a social network example for the Neo4j graph database; the full source code of the example is included in the standard download package. I've also written a blog post on this theme, with another example as a starting point.

Google app engine: Poor Performance with JDO + Datastore

I have a simple data model that includes:
USERS: store basic information (key, name, phone number, etc.)
RELATIONS: describe, e.g. a friendship between two users (supplying a relationship_type + two user keys)
COMMENTS: posted by users (key, comment text, user_id)
I'm getting very poor performance, for instance, if I try to print the first names of all of a user's friends. Say the user has 500 friends: I can fetch the list of friend user_ids very easily in a single query. But then, to pull out first names, I have to do 500 back-and-forth trips to the Datastore, each of which seems to take on the order of 30 ms. If this were SQL, I'd just do a JOIN and get the answer out fast.
I understand there are rudimentary facilities for performing two-way joins across un-owned relations in a relaxed implementation of JDO (as described at http://gae-java-persistence.blogspot.com) but they sound experimental and non-standard (e.g. my code won't work in any other JDO implementation).
Worse yet, what if I want to pull out all the comments posted by a user's friends? Then I need to get from User --> Relation --> Comments, i.e. a three-way join, which isn't supported even experimentally. The overhead of 500 round trips to get a friend list plus another 500 trips to see if there are any comments from a user's friends is already enough to push the runtime beyond 30 seconds.
How do people deal with these problems in real-world datastore-backed JDO applications? (Or do they?)
Has anyone managed to extract satisfactory performance from JDO/Datastore in this kind of (very common) situation?
-Bosh
First of all, for objects that are frequently accessed (like users), I rely on memcache. This should speed up your application quite a bit.
If you have to go to the datastore, the right way to do this should be through getObjectsById(). Unfortunately, it looks like GAE doesn't optimize this call. However, a contains() query on keys is optimized to fetch all the objects in one trip to the datastore, so that's what you should use:
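// fetchFriendKeys() stands for however you load the friends' keys
List<Key> myFriendKeys = fetchFriendKeys();
Query query = pm.newQuery(User.class, ":p.contains(key)");
@SuppressWarnings("unchecked")
List<User> friends = (List<User>) query.execute(myFriendKeys);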
You could also rely on the low-level API get() that accepts multiple keys, or do like me and use Objectify.
A totally different approach would be to use an equality filter on a list property. This will match if any item in the list matches. So if you have a friendOf list property in your user entity, you can issue a single Query friendOf == theUser. You might want to check this: http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine
You have to minimize DB reads. That must be a huge focus for any GAE project - anything else will cost you. To do that, pre-calculate as much as you can, especially oft-read information. To solve the issue of reading 500 friends' names, consider that you'll likely be changing the friend list far less than reading it, so on each change, store all names in a structure you can read with one get.
If you absolutely cannot, then you have to tweak each case by hand, e.g. use the low-level API to do a batch get.
Also, optimize for speed rather than data size. Use extra structures as indexes, and save objects in multiple ways so you can read them as quickly as possible. Data is cheap; CPU time is not.
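A sketch of the precompute-on-write idea for the 500-friend-names case, in plain Java with hypothetical names. The friend list changes rarely, so each change rebuilds a stored list of names, and the frequent read becomes a single lookup. The maps stand in for datastore entities.

```java
import java.util.*;

class FriendNameCache {
    private final Map<Long, String> names = new HashMap<>();             // user id -> name
    private final Map<Long, Set<Long>> friends = new HashMap<>();        // user id -> friend ids
    private final Map<Long, List<String>> friendNames = new HashMap<>(); // precomputed, read with one "get"

    void addUser(long id, String name) { names.put(id, name); }

    // Writes are rare: do the extra work here.
    void addFriend(long user, long friend) {
        friends.computeIfAbsent(user, k -> new TreeSet<>()).add(friend);
        rebuild(user);
    }

    private void rebuild(long user) {
        List<String> list = new ArrayList<>();
        for (long f : friends.getOrDefault(user, Set.of())) {
            list.add(names.get(f));
        }
        friendNames.put(user, list);
    }

    // Reads are frequent: a single lookup instead of one per friend.
    List<String> friendNamesOf(long user) {
        return friendNames.getOrDefault(user, List.of());
    }
}
```

The cost moves to writes: if a friend changes their name, every cached list that contains it must be rebuilt, which is the usual trade-off of denormalization.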
Unfortunately, Phillipe's suggestion
Query query = pm.newQuery(User.class, ":p.contains(key)");
is only optimized to make a single query when searching by primary key. Passing in a list of ten non-primary-key values, for instance, gives the following trace:
[Trace screenshot: http://img293.imageshack.us/img293/7227/slowquery.png]
I'd like to be able to bulk-fetch comments, for example, from all of a user's friends. If I store a List on each user, this list can't be longer than 1000 elements (if it's an indexed property of the user), as described at: http://code.google.com/appengine/docs/java/datastore/overview.html .
Seems increasingly like I'm using the wrong toolset here.
-B
Facebook has 28 terabytes of memory cache... However, making 500 trips to memcached isn't very cheap either. It can't be used to store a gazillion small items. "Denormalization" is the key. Such applications do not need to support ad-hoc queries. Compute and store the results directly for the few supported queries.
In your case, you probably have just one type of query: return data of this, that and the others that should be displayed on a user page. You can precompute this big ball of mess, so that later a single query based on userId can fetch it all.
When userA makes a comment to userB, you retrieve userB's big ball of mess, insert userA's comment in it, and save it.
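The "big ball of mess" flow from the last two paragraphs, sketched in plain Java with hypothetical names (PageStore, UserPage, PageComment). Each user's page is one stored object keyed by userId, and a comment is written by loading the target user's page, inserting the comment, and saving the page again.

```java
import java.util.*;

record PageComment(String author, String text) {}

// Everything shown on one user's page, stored as a single object.
class UserPage {
    final List<PageComment> comments = new ArrayList<>();
}

class PageStore {
    private final Map<String, UserPage> pages = new HashMap<>();

    // One lookup by userId fetches the whole page.
    UserPage load(String userId) {
        return pages.computeIfAbsent(userId, k -> new UserPage());
    }

    // userA comments on userB: load B's page, insert the comment, save it.
    void comment(String fromUser, String toUser, String text) {
        UserPage page = load(toUser);
        page.comments.add(0, new PageComment(fromUser, text)); // newest first
        pages.put(toUser, page); // "save": redundant in memory, a put in a real store
    }
}
```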
Of course, there are a lot of problems with this approach. Giant internet companies probably don't have a choice; generic query engines just don't cut it. But for others? Wouldn't you be happier if you could just use the good old RDBMS?
If it is a frequently used query, you can consider preparing indexes for it.
http://code.google.com/appengine/articles/index_building.html
The indexed property limit has now been raised to 5000.
However you can go even higher than that by using the method described in http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine
Basically just have a bunch of child entities for the User called UserFriends, thus splitting the big list and raising the limit to n*5000, where n is the number of UserFriends entities.
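A sketch of splitting one large friend list across UserFriends child entities, in plain Java. split() chunks the ids into shards of at most maxPerShard, and usersWithFriend() mimics the equality filter on a list property, which matches an entity if any element matches. The record and method names are hypothetical. On App Engine, maxPerShard would be bounded by the indexed-property limit mentioned above.

```java
import java.util.*;

// Hypothetical child entity: one slice of a user's friend list.
record UserFriends(String parentUserId, List<String> friendIds) {}

class FriendShards {
    // Split one big list into shards of at most maxPerShard ids each.
    static List<UserFriends> split(String userId, List<String> friendIds, int maxPerShard) {
        List<UserFriends> shards = new ArrayList<>();
        for (int i = 0; i < friendIds.size(); i += maxPerShard) {
            int end = Math.min(i + maxPerShard, friendIds.size());
            shards.add(new UserFriends(userId, List.copyOf(friendIds.subList(i, end))));
        }
        return shards;
    }

    // Like an equality filter "friendIds == X" over all UserFriends entities:
    // returns the parent users whose friend list contains X.
    static Set<String> usersWithFriend(List<UserFriends> allShards, String friendId) {
        Set<String> parents = new TreeSet<>();
        for (UserFriends shard : allShards) {
            if (shard.friendIds().contains(friendId)) parents.add(shard.parentUserId());
        }
        return parents;
    }
}
```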
