How to Convert SQL table into Redis Data - java

Hi, I am new to Redis and would like some help here. I am using Java, SQL Server 2008, and a Redis server; to interact with Redis I am using the Jedis API for Java. I know that Redis is used to store key-value data: every key has a value.
Problem Background:
I have a table named "user" which stores id, name, email, age, and country. This is the schema of the SQL table. The table already has some rows (i.e., some data). The primary key is id, and it is only for DB use; it is of no use to me in the application.
In SQL I can insert a new row, update a row, search for any user, and delete a user.
I want to store this table's data in Redis and then perform similar operations there as well: search, insert, delete. With a good design for storing this info in both the DB and Redis, those operations become simple. Remember that I can have multiple tables as well, so I should probably store data in Redis on a per-table basis.
My Problem
Can you advise me on a design, or any information, for how I can move DB data into Redis and perform all these operations on it? I am asking because I know Facebook also uses Redis to store data, so how do they store it?
Any help would be much appreciated.

This is a very hard question to answer, as there are multiple ways you could do it.
The best way, in my opinion, would be to use hashes. A hash is basically a nested key-value type, so each key would map to a hash in which you can store username, password, etc.
One problem is indexing: you would need to have an ID stored in the key, so each user would have a key like USER:21414.
The second thing is that, unless you want to rely on commands like KEYS or SCAN, you will have to maintain your own list of users to iterate over, if you need to do that at all. For this you will want to look at lists or sorted sets.
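To make that concrete, here is a minimal, untested sketch using Jedis. The class name, the exact key layout, and the user:next_id counter are my own assumptions for illustration, not anything prescribed by Redis:

    import redis.clients.jedis.Jedis;
    import java.util.Map;

    public class UserStore {
        private final Jedis jedis = new Jedis("localhost", 6379);

        // Insert: allocate an id, store the fields in a hash, track the id in a set.
        public long insertUser(String name, String email, int age, String country) {
            long id = jedis.incr("user:next_id");      // auto-incrementing surrogate id
            String key = "USER:" + id;
            jedis.hset(key, "name", name);
            jedis.hset(key, "email", email);
            jedis.hset(key, "age", String.valueOf(age));
            jedis.hset(key, "country", country);
            jedis.sadd("users", String.valueOf(id));   // the index you must maintain yourself
            return id;
        }

        // Search: fetch all fields of one user.
        public Map<String, String> getUser(long id) {
            return jedis.hgetAll("USER:" + id);
        }

        // Delete: remove both the hash and its entry in the index set.
        public void deleteUser(long id) {
            jedis.del("USER:" + id);
            jedis.srem("users", String.valueOf(id));
        }
    }

An update is then just another hset on the same key, and SMEMBERS users gives you all the ids to iterate over instead of reaching for KEYS.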
To be honest there is no single true answer to this question; SQL-style data does not map onto key-value stores in any direct way, so you usually have to do a lot more work yourself.
I would suggest reading as much as you can, starting here: http://redis.io/commands and here: http://redis.io/documentation.
I have no experience using Jedis, so I can't help on that side. If you want an example, I have an open-source social networking site which uses Redis as its sole data store; you can take a look at the code to get some ideas: https://github.com/pjuu/pjuu/blob/master/pjuu/auth/backend.py. It uses Python, but Redis is such an easy thing to use anywhere that there will not be that much difference.
Edit: My site above no longer solely uses Redis. You will need to check out an older branch, such as 0.4 or 0.3 :)

Related

How can I fetch all items from a DynamoDB table without specifying the primary key with java?

I'm fairly new to Amazon's AWS and its API for Java, so I'm not exactly sure what the most efficient approach would be for what I'm trying to do. Basically, I'm trying to set up a database that will store a project's ID, its status, and its bucket. What I'm having trouble with is getting a list of all items without specifying a primary key. Any recommendations?
You can use the Scan operation provided by DynamoDB. It does not need a primary key to operate on, but keep in mind that Scan is very inefficient and consumes more read capacity. Read about Scan in its official documentation:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
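As an illustration, here is a rough sketch of a paginated Scan with the AWS SDK for Java (v1); the table name "projects" is hypothetical. Scan returns at most 1 MB per call, so you loop on LastEvaluatedKey:

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.dynamodbv2.model.ScanRequest;
    import com.amazonaws.services.dynamodbv2.model.ScanResult;
    import java.util.Map;

    public class ScanAll {
        public static void main(String[] args) {
            AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
            Map<String, AttributeValue> lastKey = null;
            do {
                ScanRequest request = new ScanRequest()
                        .withTableName("projects")        // hypothetical table name
                        .withExclusiveStartKey(lastKey);  // null on the first call
                ScanResult result = client.scan(request);
                for (Map<String, AttributeValue> item : result.getItems()) {
                    System.out.println(item);             // one full item per row
                }
                lastKey = result.getLastEvaluatedKey();   // non-null means more pages
            } while (lastKey != null);
        }
    }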
If you regularly need to retrieve all the entries, DynamoDB is probably not the correct choice for storing this data.

Should I use a key/value database to store my API logs?

I get a lot of logs from my API, and I analyse those logs to get interesting information, like how many users the API had this month or what types of activities they performed.
All of the analysis I do depends on a period, so the timestamp is very important to me.
In fact, I currently use indexes on the timestamp. The problem is that the timestamp is a continuous value.
My question is: which database is the more appropriate for my use case?
I have heard about key/value databases; would it make sense to use the timestamp as a key?
Thanks.
This is a two-year-old article from IBM that talks more about SQL implementations, but it is also something to keep in mind when you do a NoSQL implementation:
"Why CURRENT TIMESTAMP produces poor primary keys" - https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/current_timestamp?lang=en
Of course your app would be different, and I'm not sure of the granularity of your time-stamping, but it is possible to have two items logged at the same timestamp.
You might be better off creating some other form of unique-key scheme for your key-value store, adding some sort of serial number per timestamp. So the first item at a timestamp is ".1", the second ".2", etc., giving you a timestamp.serialid format.
The other thought I have is: are you merging API log files from multiple applications, processes, or machines? You might be able to use some sort of elementid.appid.timestamp.serialid scheme to make the key unique.
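As a sketch of what such a key generator could look like in plain Java (all names here are invented for illustration):

    // Builds keys of the form appId.timestampMillis.serial, where the serial
    // number disambiguates entries that fall into the same millisecond.
    public class LogKeyFactory {
        private final String appId;
        private long lastTimestamp = -1;
        private long serial = 0;

        public LogKeyFactory(String appId) {
            this.appId = appId;
        }

        public synchronized String nextKey() {
            long now = System.currentTimeMillis();
            if (now != lastTimestamp) {   // new millisecond: restart the serial counter
                lastTimestamp = now;
                serial = 0;
            }
            return appId + "." + now + "." + serial++;
        }
    }

Calling new LogKeyFactory("billing-api").nextKey() twice in the same millisecond would yield, e.g., billing-api.1397000000123.0 and billing-api.1397000000123.1.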
It all depends on your use case, so I can't say more for sure. I also wonder what you want to do with your key-value store in terms of reads/analysis after the fact, as that might strongly influence your NoSQL solution. If you are planning to do a lot of log analysis, then yes, there's a good reason to put the data into a NoSQL database, especially if you want to do fast analysis of recent data and then push some of the older items back to disk for storage.
As for databases, obviously each vendor will stick up for their product; but choose the best tool for the job. Best to try before you buy, and test things out for your specific setup. I'm from Aerospike, so I'm obviously biased towards it as a Key-Value store: http://www.aerospike.com/
Talked to a Very Smart Guy today, and he also suggested that you might want to use something like "milliseconds since date-time 'x'" as a primary key. Depending on what you are logging, there might still be a chance of collision with that as a primary key.
Therefore, another suggestion would be to take all entries for that primary key (ex: all log entries for that millisecond) and load them into the same record, in a kind of "bucket." You'd need application logic to parse out the multiple log entries under the same primary key, but that's another way to skin the cat.
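As an illustration only (the suggestion above is store-agnostic), the bucket idea might look like this with Redis via Jedis, appending every entry that shares a millisecond to a list under a single key:

    import redis.clients.jedis.Jedis;
    import java.util.List;

    public class LogBuckets {
        private final Jedis jedis = new Jedis("localhost", 6379);

        // Append a log entry to the bucket for its millisecond timestamp.
        public void log(long timestampMillis, String entry) {
            jedis.rpush("log:" + timestampMillis, entry);
        }

        // Read back every entry that landed in the same millisecond.
        public List<String> entriesAt(long timestampMillis) {
            return jedis.lrange("log:" + timestampMillis, 0, -1);
        }
    }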

Performance improvement of queries against an encrypted table without changing the application code

I have tagged this problem with both Oracle and Java because solutions from either side would be accepted.
I am new to Oracle security and have been presented with the problem below. I have done some research on the internet but have had no luck so far. At first I thought Oracle TDE might be helpful, but according to Can Oracle TDE protect data from the DBA? it seems TDE doesn't protect data against the DBA, and that cannot be tolerated here.
Here is the problem:
I have a table containing millions of records. A Java application queries this table using equality or range criteria against a column which is the table's primary key. The primary key column contains sensitive data and has therefore already been encrypted. As a result, queries using plain (i.e., decrypted) values from the application cannot use the primary key's unique index access path. I need to improve query performance without any changes to the application code (the application config can be modified if necessary, but not the code). Any changes on the database side would be OK as long as that column remains encrypted.
Oracle people please: what solution(s) do you suggest for this problem? Can I create an index on decrypted column values and somehow force Oracle to utilize this index? Can I use partitioning, such as hash partitioning? How about views? Any solution at all?
Java people please: I myself have a very vague idea, which is to create a separate application in between (i.e., between the database and the application) which acts as a proxy: it receives queries from the application, replaces the decrypted values with encrypted values, and sends them on to the database; it then receives the response and returns the results back to the application. The proxy should behave like a database, so that the application can connect to it by changing only the connection string in its configuration file. Would this work? How?
Thanks for all your help in advance!
which queries this table using equality or range criteria against a column in the table which is the primary key column of the table
Finding a specific value is simple enough: you can store the data encrypted any way you like, even as a hash, and still retrieve a specific value using an index. But as per my comment elsewhere, you can't do range queries without either:
decrypting each and every row in the table
or
using an algorithm that can be cracked in a few seconds.
Using a linked list (or a related table) to define ordering, instead of an algorithm with intrinsic ordering, would force a brute-force check over a much larger set of values - but it's nowhere near as secure as a properly encrypted value.
It doesn't matter if you use Oracle, Java or pencil and paper. Might be possible using quantum computing - but if you can't afford to ensure the security of your application / pay for good advice from an expert cryptographer, then you certainly won't be able to afford that.
How can I create an index on decrypted column values and somehow force Oracle to utilize this index?
Maybe you could create a function-based index in which you index the decrypted value:
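-- note: Oracle only allows a user-defined function in an index if the function is declared DETERMINISTIC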
create index ix1 on tablename (decryptfunction(pk1));

How to create a "has many" relation between two documents in CouchDB?

Basically I am wondering how you would go about doing in CouchDB what you would do in MySQL: store the username and password in one table and link the user id as a foreign key to another table of tasks.
Should I just use MySQL for the user authentication part and CouchDB to store lots of user-submitted documents, i.e. create a random unique token to link each user to their "documents" in CouchDB?
Also, I am looking to store Java objects in CouchDB and retrieve them to be used directly in my application. Which Java-CouchDB library does this? Ektorp's example seems more complicated compared to couchdb4j.
I do not know Java very well, but I suggest using the simplest tool you can find. CouchDB is very simple, and it is usually most beneficial to access it with simple tools too.
Yes, if you will have many relationships in the data, MySQL will help. However, CouchDB can do some simple has-many queries.
First, there is view collation. You use map/reduce and, for every "child" document, you emit a key pointing to the parent document. When you query with ?key=parent you get the full list of children. (The wiki explains it pretty well.)
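Since Ektorp came up in the question, here is a rough, untested sketch of querying such a view from Java; the database name, design document, view name, and map function are all assumptions for illustration:

    // Assumed view in _design/app, named tasks_by_user, with this map function:
    //   function(doc) { if (doc.type === 'task') { emit(doc.userId, null); } }
    import org.ektorp.CouchDbConnector;
    import org.ektorp.CouchDbInstance;
    import org.ektorp.ViewQuery;
    import org.ektorp.ViewResult;
    import org.ektorp.http.HttpClient;
    import org.ektorp.http.StdHttpClient;
    import org.ektorp.impl.StdCouchDbInstance;

    public class TasksByUser {
        public static void main(String[] args) throws Exception {
            HttpClient httpClient = new StdHttpClient.Builder()
                    .url("http://localhost:5984")
                    .build();
            CouchDbInstance instance = new StdCouchDbInstance(httpClient);
            CouchDbConnector db = instance.createConnector("mydb", true);

            // ?key=parent - every "child" row emitted under this user's id
            ViewQuery query = new ViewQuery()
                    .designDocId("_design/app")
                    .viewName("tasks_by_user")
                    .key("user-42");
            ViewResult result = db.queryView(query);
            result.getRows().forEach(row -> System.out.println(row.getId()));
        }
    }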
Secondly, I suggest the article "What's new in CouchDB 0.11", which shows how to use document _ids to link two documents.
Good luck!

Any references for good data mining tools in Java?

We are working on an internship project for a company. The project itself consists of data mining, and the database we have to work with is huge (gigabytes).
Sad to say, the DB itself is very poorly structured, with inconsistent values and, most importantly, no primary or foreign keys. So in our simple servlet modules that extract and show the inconsistent data, it takes forever for queries to execute and show up in the servlet.
As n00b programmers we do not know about joins and such things in the DB. We are using MySQL as our DB server. The DB is composed of real-time data from telecom towers.
To find sample inconsistencies in table values we use a combination of multiple queries, with the output of one query serving as input to the next, like:
"SELECT DISTINCT tow_id FROM tower_data WHERE time_stamp LIKE ?";
// query for finding tower ids
"SELECT time_stamp FROM tower_data WHERE time_stamp LIKE ? AND param_code = ? AND tow_id = ? GROUP BY time_stamp HAVING COUNT(*) > 1";
// query for finding time stamps with duplicate data
And so on.
Also, there are some 10 tables in the database, and we need to join 2-3 tables to get the values for custom queries.
After finding all the inconsistent values for multiple factors, we have to do data cleansing, noise removal, data prediction, and similar tasks in the next stage.
So we thought we could apply some Java data mining tool, which would in turn apply some algorithm to speed up data retrieval.
Please guide us towards some good data mining tools. Any guidance on optimizing/rewriting the queries would also be highly appreciated.
I'm not 100% sure it will help in your case, but have a look at google-refine...
Since you seem to have a lot of badly structured data, I do not think data-mining will help.
You may consider using Apache Hadoop for going over all this data and finding inconsistencies. You can use Amazon EC2 for a simple and relatively cheap way to run Hadoop. You can also use Hadoop to port the databases to a better schema, provided that you can build one.
EDIT: I guess you can also do some things within MySQL. Use EXPLAIN to find the slow parts of your queries - I believe LIKE is usually slow, and maybe you can reformulate the query into something faster. Maybe you can also sort your table by timestamp and then look at sub-ranges. Again, you first have to have an efficient way to get at the data, and then you can try to mine it. Good luck.
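For example, here is a small JDBC sketch (connection details are placeholders) that runs EXPLAIN on a query against the tower_data table; if the plan shows no usable key, an index on (tow_id, time_stamp) and a range predicate instead of LIKE would be the first things to try:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class ExplainDemo {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                         "jdbc:mysql://localhost:3306/telecom", "user", "pass");
                 Statement st = con.createStatement();
                 // A range predicate can use an index; a leading-wildcard LIKE cannot.
                 ResultSet rs = st.executeQuery(
                         "EXPLAIN SELECT time_stamp FROM tower_data " +
                         "WHERE tow_id = 42 AND time_stamp >= '2011-01-01' " +
                         "AND time_stamp < '2011-01-02'")) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {                       // one row per table in the plan
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        System.out.print(md.getColumnName(i) + "=" + rs.getString(i) + " ");
                    }
                    System.out.println();
                }
            }
        }
    }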
