Secure database storage in Android - java

I'm implementing the in-app billing system for my app and (as the documentation advises) I'd like to add some encryption to the database.
I've tested SQLCipher and I liked it, but since it overrides all the classes in android.database.sqlite.*, your app loses its connection to that part of the SDK, missing out on potentially important future updates and depending on the library's developer (or on yourself, since it's open source).
Another solution I've considered is to add an extra column to the sensitive tables, store a value there (unique for each device), and then ignore any data that doesn't contain that key. But this method is clearly weaker.
How can I improve the security without using SQLCipher? Thanks

If your sensitive table contains some columns that are sensitive and some that aren't, and if you don't use the sensitive columns in a WHERE clause, then you can easily encrypt/decrypt just those sensitive fields on write/read operations without a significant performance cost.
If you need to use sensitive columns in your WHERE clause in a table with just a few records, you can read all the records, decrypt them, and filter in memory.
If none of the above applies, then I have no other suggestion besides SQLCipher.
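To make the first suggestion concrete, here is a minimal sketch of encrypting a single sensitive field before it is written and decrypting it after it is read back. It assumes Java 8+ for java.util.Base64 (on older Android you'd use android.util.Base64 instead), and it deliberately leaves key management out; in a real app the key should come from somewhere like the Android Keystore, never be hard-coded.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

public class FieldCrypto {
    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    public FieldCrypto(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES"); // 16/24/32-byte key
    }

    // Returns base64(iv + ciphertext), suitable for storing in a TEXT column.
    public String encrypt(String plaintext) throws Exception {
        byte[] iv = new byte[16];
        random.nextBytes(iv); // fresh IV per value
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    public String decrypt(String stored) throws Exception {
        byte[] in = Base64.getDecoder().decode(stored);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new IvParameterSpec(Arrays.copyOfRange(in, 0, 16)));
        byte[] pt = cipher.doFinal(Arrays.copyOfRange(in, 16, in.length));
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

You'd call encrypt() on the field right before the INSERT/UPDATE and decrypt() right after reading the cursor, leaving the rest of the row untouched.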

Related

Should I use a key/value database to store my API logs?

I get a lot of logs from my API. I analyse those logs to extract interesting information, like how many users the API had this month or what kinds of activity they perform.
All of the analysis I do depends on a time period, so the timestamp is very important to me.
I currently use indexes on the timestamp. The problem is that the timestamp is continuous.
My question is: which database is the most appropriate for my use case?
I've heard about key/value databases; is it worthwhile to use the timestamp as a key?
Thanks.
This is a two-year-old article from IBM that talks more about SQL implementation, but it is also possibly something to keep in mind when you do a NoSQL implementation:
"Why CURRENT TIMESTAMP produces poor primary keys" - https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/current_timestamp?lang=en
Of course, your app may be different, and I'm not sure of the granularity of your time-stamping, but it is possible to have two items logged at the same timestamp.
You might be better off creating some other form of unique key for your key-value store by adding some sort of serial number per timestamp. So the first item at a timestamp is ".1", the second ".2", etc., giving you a timestamp.serialid format.
The other thought I have is: are you merging API log files from multiple applications/processes or machines? You might be able to use some sort of elementid.appid.timestamp.serialid scheme to make a unique key.
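A small sketch of that per-timestamp serial scheme, independent of any particular key-value store (the class and method names here are illustrative, not from any store's API):

```java
import java.util.HashMap;
import java.util.Map;

// Generates keys of the form appid.timestamp.serial, where the serial
// disambiguates entries that land on the same millisecond.
public class LogKeyFactory {
    private final Map<String, Integer> serials = new HashMap<>();

    public synchronized String keyFor(String appId, long timestampMillis) {
        String prefix = appId + "." + timestampMillis;
        int serial = serials.merge(prefix, 1, Integer::sum); // 1, 2, 3, ...
        return prefix + "." + serial;
    }
}
```

In a distributed setup the counter would of course have to live somewhere shared (or be replaced by the store's own atomic increment), but the key shape is the same.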
It all depends on your use case, so I can't say more for sure. I also wonder what you want to do with your key-value store in terms of reads/analysis after-the-fact, as that might highly alter your NoSQL solution. If you are planning to do a lot of log analysis, then, yes, there's a good reason to put that into a NoSQL database, especially if you want to do something like fast analysis of data, and then push some of the older items back into disk for storage.
As for databases, obviously each vendor will stick up for their product; but choose the best tool for the job. Best to try before you buy, and test things out for your specific setup. I'm from Aerospike, so I'm obviously biased towards it as a Key-Value store: http://www.aerospike.com/
Talked to a Very Smart Guy today, and he also suggested that you might want to use something like "milliseconds since date-time 'x'" as a primary key. Depending on what you are logging, there might still be a chance of collision with that as a primary key.
Therefore, another suggestion would be to take all entries for that primary key (ex: all log entries for that millisecond) and load them into the same record, in a kind of "bucket." You'd need application logic to parse out the multiple log entries under the same primary key, but that's another way to skin the cat.
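The bucket idea can be sketched in a few lines; this is just an in-memory illustration of the record layout, with the store itself stubbed out by a map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All log entries sharing a millisecond go into one record keyed by that
// millisecond; the application parses the list back out on read.
public class MillisecondBuckets {
    private final Map<Long, List<String>> store = new HashMap<>();

    public void append(long millis, String entry) {
        store.computeIfAbsent(millis, k -> new ArrayList<>()).add(entry);
    }

    public List<String> read(long millis) {
        return store.getOrDefault(millis, new ArrayList<>());
    }
}
```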

Is it better to use HBase columns or serialize data using Avro?

I'm working on a project that stores key/value information on users using HBase. We are in the process of redesigning the HBase schema we are using. The two options being discussed are:
Use HBase column qualifiers as names for the keys. This would make rows wide, but very sparse.
Dump all the data into a single column and serialize it using Avro or Thrift.
What are the design tradeoffs of the two approaches? Is one preferable to the other? Are there any reasons not to store the data using Avro or Thrift?
In summary, I lean towards using distinct columns per key.
1) Obviously, you are imposing a requirement that clients use Avro/Thrift, which is another dependency. This dependency may rule out certain tooling, like BI tools which expect to find values in the data without transformation.
2) Under the avro/thrift scheme, you are pretty much forced to bring the entire value across the wire. Depending on how much data is in a row, this may not matter. But if you are only interested in 'city' fields/column-qualifier, you still have to get 'payments', 'credit-card-info', etc. This may also pose a security issue.
3) Updates, if required, will be more challenging with Avro/Thrift. Example: you decide to add a 'hasIphone6' key. With Avro/Thrift, you will be forced to delete the row and create a new one with the added field. Under the column scheme, a new entry is appended with only the new column. For a single row this is no big deal, but if you do this to a billion rows, a big compaction operation will be needed.
4) If configured, you can use compression in HBase, which may outperform Avro/Thrift serialization, since it can compress across a column family instead of just a single record.
5) BigTable implementations like HBase do very well with very wide, sparse tables, so there won't be a performance hit like you might expect.
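Point 2 is easiest to see with a toy comparison, independent of HBase itself: with one column qualifier per key you can fetch just the field you need, while a single serialized blob forces you to pull and decode the whole record to read any field. The "serialization" below is a plain `k=v;` string purely for illustration, not Avro or Thrift:

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaTradeoff {
    // Scheme 1: one column qualifier per key; only the requested cell is read.
    static String readColumn(Map<String, String> row, String qualifier) {
        return row.get(qualifier);
    }

    // Scheme 2: everything packed into one value; the whole blob must be
    // transferred and decoded even to read a single field.
    static String readFromBlob(String blob, String field) {
        for (String pair : blob.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv[0].equals(field)) return kv[1];
        }
        return null;
    }
}
```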
The right answer to this is a bit more complicated, so I'll give you the tl;dr first.
Use Avro/Thrift/Protobuf
You will need to strike a balance between how many fields to pack in a record vs. columns.
You'll typically want to put fields ("keys" in your original question) that are frequently accessed together into something like an avro record because as mentioned by cmonkey you don't want the overhead of retrieving extra data you won't use.
By making your row very wide, you'll increase seek times when fetching a subset of columns because of how HFiles are stored. Again, determining what is optimal comes down to your access patterns.
I would also like to point out that by using something like avro, you're also providing yourself with evolvability. You don't need to delete the row and re-add it with the record containing a new field. Avro has rules for backward-compatibility and forward-compatibility. This actually makes your life much much easier because you can read both new and old records WITHOUT rewriting your data or forcing updates to older client code.
You should nearly always use compression in HBase (SNAPPY is always a good choice).

factors to consider before dropping a column from a table

We are supporting deletion of columns from a table through a Java layer. What factors should be considered before doing so, such as:
1. the number of rows in the table
2. the behavior of different database vendors, etc.
Are you sure you're doing the right thing? Unless you're using throwaway tables for people to mess around with or study, this doesn't sound like good design.
Once tables are defined, their columns shouldn't change. Otherwise you'd get denormalized tables; foreign keys can break, and all hell breaks loose unless you've got pretty good constraints placed on your columns.
Tracking what columns exist and what queries you can execute will put much more burden on your JDBC code than needed.
This isn't a Java or JDBC question, it's more of a database design question. You should speak with your DBA about this.
Consider the constraints on the columns. If you have PKs or FKs on the columns, then they may not (or cannot) be dropped (easily), depending on the DB vendor. While the cols may not be 'used' by the user, other cols/tables may depend on them.
Also, I totally agree with duffymo. VERY dangerous to allow users to choose to drop cols.
Oracle does have the ability to restore dropped cols, but really, do you want to go down that path?
Auto generated drop statements have always been fraught with peril.
The biggest question has little to do with the database and everything to do with clients that use it. Your Java app might be able to check to see if there are any non-null entries in the column, but it can't tell which clients are expecting the column to be there for SELECTs and UPDATEs.
I don't know your precise use case, but I'd say this is usually an activity for a DBA and not a user of your app. I'd advise caution.
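If you do go ahead, the "check for non-null entries" idea mentioned above can be sketched as below. The query builder shows the SELECT you'd run over JDBC first; the table and column names are placeholders, and detecting PK/FK constraints is vendor-specific and not shown:

```java
public class ColumnDropper {
    // SQL to count how many rows actually hold data in the column.
    static String countQuery(String table, String column) {
        return "SELECT COUNT(*) FROM " + table + " WHERE " + column + " IS NOT NULL";
    }

    // Only allow the drop when the column is empty and no constraint depends on it.
    static boolean safeToDrop(long nonNullCount, boolean referencedByConstraint) {
        return nonNullCount == 0 && !referencedByConstraint;
    }
}
```

Even when both checks pass, this still can't tell you which client applications expect the column to exist, which is the real risk raised above.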

Way to know table is modified

There are two different processes developed in Java running independently.
If either process modifies the table, can I get any notification that it happened? My objective is to keep an object always in sync with a table in the database: if any modification happens to the table, I want to update the object.
If the table is modified, can I get any notification of this? Do databases provide any facility like this?
We use SQL Server and have certain triggers that fire when a table is modified and call an external binary. The binary we call sends a Tib rendezvous message to notify other applications that the table has been updated.
However, I'm not a huge fan of this solution - Much better to control writing to your table through one "custodian" process and have other applications delegate to that. To enforce this you could change permissions on your table so that only your custodian process can write to the database.
The other advantage of this approach is being able to provide a caching layer within your custodian process to cater for common access patterns. Granted that a DBMS performs caching anyway, but by offering it at the application layer you will have more control / visibility over it.
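A minimal sketch of that custodian idea, with the database and messaging layer stubbed out (the class and method names are illustrative; a real custodian would write through to the actual table and notify via something like Tib or JMS):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// All writes funnel through one object, which maintains an application-level
// cache and notifies listeners that the table changed.
public class TableCustodian {
    private final Map<String, String> cache = new HashMap<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    public void onChange(Consumer<String> listener) { listeners.add(listener); }

    public synchronized void write(String key, String value) {
        cache.put(key, value);                    // and, in reality, the DB
        listeners.forEach(l -> l.accept(key));    // notify other applications
    }

    public synchronized String read(String key) { return cache.get(key); }
}
```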
No, databases don't provide these services. You have to query periodically to check for modifications, or use some JMS solution to send notifications from one app to another.
You could add a timestamp column (last_modified) to the tables and check it periodically for updates, or use sequence numbers that are incremented on updates (similar in concept to optimistic locking).
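The polling variant can be sketched like this; the supplier stands in for a query such as `SELECT MAX(last_modified) FROM t` (assumed column name), and `poll()` reports whether anything changed since the previous check:

```java
import java.util.function.LongSupplier;

public class ModificationPoller {
    private final LongSupplier versionQuery; // stands in for the SQL query
    private long lastSeen;

    public ModificationPoller(LongSupplier versionQuery) {
        this.versionQuery = versionQuery;
        this.lastSeen = versionQuery.getAsLong();
    }

    // Returns true if the table's version advanced since the last poll.
    public boolean poll() {
        long current = versionQuery.getAsLong();
        boolean changed = current != lastSeen;
        lastSeen = current;
        return changed;
    }
}
```

You'd run poll() on a timer and refresh your in-memory object whenever it returns true.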
You could use jboss cache which provides update mechanisms.
One way you can do this: wrap your database statement in a method that returns true when it completes successfully, and keep that flag in scope in your code so you can check whenever you want whether the table has been modified. Why not try something like this?
If you're willing to take the hack approach, and your database stores tables as files (e.g., MySQL), you could always have something that checks the modification time of the files on disk to see if they've changed.
Of course, for databases like Oracle, where tables are assigned to tablespaces and it is the tablespaces that have storage on disk, this won't work.
(Yes, I know this is a bad approach; that's why I said it's a hack. But we don't know all of the requirements, and if he needs something quick without rewriting the whole application, this would technically work for some databases.)
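For what it's worth, the hack itself is only a few lines of stdlib Java; the path here is a placeholder for something like a MySQL MyISAM per-table data file:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Remembers a table file's modification time and compares on each check.
public class FileWatcher {
    private final Path tableFile;
    private FileTime lastSeen;

    public FileWatcher(Path tableFile) throws Exception {
        this.tableFile = tableFile;
        this.lastSeen = Files.getLastModifiedTime(tableFile);
    }

    public boolean changed() throws Exception {
        FileTime current = Files.getLastModifiedTime(tableFile);
        boolean modified = !current.equals(lastSeen);
        lastSeen = current;
        return modified;
    }
}
```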

What are the various options and their tradeoffs for storing a UUID in a MYSQL table?

I'm planning on using client provided UUID's as the primary key in several tables in a MySQL Database.
I've come across various mechanisms for storing UUID's in a MySQL database but nothing that compares them against each other. These include storage as:
BINARY(16)
CHAR(16)
CHAR(36)
VARCHAR(36)
2 x BIGINT
Are there any better options? How do these options compare against each other in terms of:
storage size?
query overhead? (index issues, joins etc.)
ease of inserting and updating values from client code? (typically Java via JPA)
Are there any differences based on which version of MySQL you're running, or the storage engine? We're currently running 5.1 and planning on using InnoDB. I'd welcome any comments based on practical experience of trying to use UUIDs. Thanks.
I would go with storing it in a BINARY(16) column, if you are indeed set on using UUIDs at all. Something like 2x BIGINT would be quite cumbersome to manage. Also, I've heard of people reversing them, because UUIDs generated on the same machine tend to share the same leading bytes while the differing parts are at the end; if you reverse them, your indexes will be more efficient.
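The main cost of BINARY(16) on the client side is converting to and from raw bytes; with plain Java that's just this round trip (under JPA you'd typically put it in an AttributeConverter):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {
    // UUID -> the 16 raw bytes stored in a BINARY(16) column.
    public static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    // BINARY(16) column value -> UUID.
    public static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```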
Of course, my instinct says that you should be using auto-increment integers unless you have a really good reason to use UUIDs. One good reason is generating unique keys across different databases; another is that you plan to have more records than an INT can hold, although not many applications really need that. There is a lot of efficiency lost when not using integers for your keys, and they're also harder to work with: they are too long to type in, and passing them around in your URLs makes the URLs really long. So go with UUIDs if you need them, but otherwise try to stay away.
I have used UUIDs for smart-client online/offline storage and data synchronization, and for databases that I knew would have to be merged at some point. I have always used char(36) or char(32) (no dashes). You get a slight performance gain over varchar, and almost all databases support char. I have never tried binary or bigint. One thing to be aware of is that char will pad with spaces if you do not fill all 36 or 32 characters. The point being: don't write a unit test that sets the ID of an object to "test" and then try to find it in the database. ;)
