I have a small application with an embedded database. Sometimes I get truncation errors when trying to insert varchars that exceed the maximum size of the corresponding database column.
I wish to detect this before inserting/updating and show a correct message to the user.
Now I presume that there are two possibilities to achieve this.
Get the maximum length of the column of interest through the DatabaseMetaData object. You could reduce the performance hit by using singletons or similar constructions.
Keep the maximum lengths in the Java code (e.g. in a ResourceBundle or Properties file) and check against these values. The downside is of course that the Java code and the database must be kept in sync. This is error prone.
What would be the best approach?
The only answer that won't require maintenance is getting the maximum length of the column of interest at database connect time.
If you use Integer.valueOf(...) you can store the length as an Integer object; the lower values are backed by a shared pool anyway (according to the current JVM specs). This removes much of the memory overhead, as all the columns will eventually refer to the few unique length values your database likely has.
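For illustration, a minimal sketch of that connect-time lookup, caching each column's maximum size in a map (the class and method names here are my own invention):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Invented helper: read every column's maximum size for one table at
// connect time and cache the result for later input validation.
public final class ColumnSizes {

    public static Map<String, Integer> load(Connection conn, String table)
            throws SQLException {
        Map<String, Integer> sizes = new HashMap<String, Integer>();
        DatabaseMetaData meta = conn.getMetaData();
        // null catalog/schema patterns match anything; narrow them if needed.
        ResultSet rs = meta.getColumns(null, null, table, "%");
        try {
            while (rs.next()) {
                sizes.put(rs.getString("COLUMN_NAME"),
                          Integer.valueOf(rs.getInt("COLUMN_SIZE")));
            }
        } finally {
            rs.close();
        }
        return sizes;
    }
}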
Also, digging around in the DatabaseMetaData, I would look for any flags that indicate whether columns would be truncated on inserts larger than the column size. That may provide the switch that tells you whether your check is needed at all.
By putting the values in a property file you make the check easy to implement, but at the cost of possibly letting the values drift out of sync. Such techniques are quick implementations with little up-front cost, but they create latent issues. Whether the issue ever gets raised is unknown, but given enough time, even the remote possibilities are encountered.
A combination of both approaches: at application build time, use DatabaseMetaData to generate the resource bundle automatically.
One solution would be to use a CLOB. I don't know what other requirements you have for this field.
Also, use the smallest max character value you have as a constant in the Java code. This avoids having to keep it in sync with the database, and the limit is more or less arbitrary anyway. Users don't care what the max size is; they just need to know it, or be kept from exceeding it automatically.
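For example, with an invented constant name, the check is trivial:

// Invented constant: the smallest VARCHAR length in the schema.
static final int MAX_INPUT_LENGTH = 255;

static void checkLength(String input) {
    if (input != null && input.length() > MAX_INPUT_LENGTH) {
        throw new IllegalArgumentException(
                "Input must be at most " + MAX_INPUT_LENGTH + " characters");
    }
}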
I'm playing around with ChronicleSet, which is backed by ChronicleMap. I've done some initial testing, and it's pretty good for our needs. Much more efficient in RAM usage than other solutions, with access time a bit slower, but still pretty fast.
However, one thing I'm testing is setting the maximum number of entries, and it doesn't seem to be working as expected.
I'm using the following code:
ChronicleSetBuilder<Long> postalSetBuilder =
    ChronicleSetBuilder.of(Long.class)
        .name("long-map")
        .entries(2_000_000L)
        .maxBloatFactor(1.0);
According to the documentation, this means that the maximum number of entries allowed in this case would be 2M. However, when testing, I can reliably go up to 2x the maximum specified, and a little bit more, until I get an exception like this:
Error: ChronicleMap{name=city-postal-codes-map, file=null, identityHashCode=3213500}: Attempt to allocate #129 extra segment tier, 128 is maximum.
Possible reasons include:
- you have forgotten to configure (or configured wrong) builder.entries() number
- same regarding other sizing Chronicle Hash configurations, most likely maxBloatFactor(), averageKeySize(), or averageValueSize()
- keys, inserted into the ChronicleHash, are distributed suspiciously bad. This might be a DOS attack
In this case, the call to size() on the ChronicleMap object reported 4,079,238. So I'm wondering how I can set an explicit limit (like the 2M specified above) and have Chronicle reliably reject any request to add elements beyond it.
It's not possible to configure an exact limit on the number of entries: because of the shared-nothing segmented storage, there is no single counter of entries. To make Chronicle Map fail close to the configured entries() limit, you should configure allowSegmentTiering(false).
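Applied to the builder from the question, that would look roughly like this (dropping maxBloatFactor(), which should be moot once tiering is disabled):

import net.openhft.chronicle.set.ChronicleSet;
import net.openhft.chronicle.set.ChronicleSetBuilder;

ChronicleSet<Long> postalSet = ChronicleSetBuilder.of(Long.class)
        .name("long-map")
        .entries(2_000_000L)
        .allowSegmentTiering(false) // fail near entries() instead of tiering
        .create();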
My application has a number of objects in an internal list, and I need to be able to log them (e.g. once a second) and later recreate the state of the list at any time by querying the log file.
The current implementation logs the entire list every second, which is great for retrieval because I can simply load the log file, scan through it until I reach the desired time, and load the stored list.
However, the majority of my objects (~90%) rarely change, so it is wasteful in terms of disk space to continually log them at a set interval.
I am considering switching to a "delta" based log where only the changed objects are logged every second. Unfortunately this means it becomes hard to find the true state of the list at any one recorded time, without "playing back" the entire file to catch those objects that had not changed for a while before the desired recall time.
An alternative could be to store (every second) both the changed objects and the last-changed time for each unchanged object, so that a log reader would know where to look for them. I'm worried I'm reinventing the wheel here though — this must be a problem that has been encountered before.
Existing comparable techniques, I suppose, are those used in version control systems, but I'd like a native object-aware Java solution if possible — running git commit on a binary file once a second seems like it's abusing the intention of a VCS!
So, is there a standard way of solving this problem that I should be aware of? If not, any pitfalls that I might encounter when developing my own solution?
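To make the idea concrete, here is a rough sketch of the record layout I have in mind, with all type names invented: every N ticks a full snapshot is written, so a reader never replays more than N seconds of deltas.

import java.util.List;

// Invented types: MyObject is a list element, LogWriter appends records.
enum RecordType { SNAPSHOT, DELTA }

void logTick(long tick, List<MyObject> all, List<MyObject> changed,
             LogWriter out, int snapshotInterval) {
    if (tick % snapshotInterval == 0) {
        out.write(tick, RecordType.SNAPSHOT, all);   // bounded replay cost
    } else {
        out.write(tick, RecordType.DELTA, changed);  // cheap common case
    }
}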
This is a design question involving both Java and MySQL.
The client requires the addition of 14 boolean flags (T/F) to keep track of some new information in an existing class/table.
I can add these flags to the existing table, or I could create a new class and table just for this data. Adding the 14 boolean flags to the existing table will give it quite a few attributes, which I'm inclined to avoid (especially if the number of flags increases over time). Creating a new class/table is cleaner, but is it really necessary in this case?
Alternatively, I could use a 16-bit integer with masks to multiplex the data, so I'm only adding one variable to the existing class/table.
My primary question is this: is it more efficient to store 14 individual boolean variables in a MySQL database and load them into the class, or would it be better to store a single integer and then (in Java) multiplex the flags using bit manipulation (i.e. masks)?
Secondary question: if individual flags are more efficient, is it better to have lots of attributes in one table or to split them? What is the penalty for storing lots of boolean flags in a table that already has quite a few attributes?
If the primary question's answer is "integer + multiplex" then the second question becomes moot.
I personally like to have separate columns. The only place I might consider masking is when the database and the application are running under extreme conditions, or on low-memory and low-storage devices where any use of memory or space is crucial.
1- Space should not be a consideration unless the class/table can grow to huge volumes.
To simulate boolean flags, a TINYINT(1) is enough; all you need is 0/1 values.
2- It becomes much harder for anyone wanting to query the table or write reports against it. And if your client does access the database, I am quite sure masking won't be acceptable in most cases.
3- It will be much harder to build indexes on this column when they are needed, if that is possible at all (depending on the database).
4- Working more and writing more code should not be an issue. You work more now but you will work less in the future. Thinking it is less work for the programmer/DBA is just an illusion, IMHO. Here are some considerations:
a- It will be harder to maintain the code and write database queries. Maybe you do everything in your Java code now, but you never know what the future holds.
b- Structural changes become harder. What if the customer requires removing two flags and adding four? Do you keep the original two bits that held the removed flags and add four new bits? Or do you reuse them for two of the new flags and then add two more bits? How would this affect code that is already written, and how easy would it be to track down all the places in the code that need changing?
In a small application this is not a big problem, but applications grow with time. If the table becomes widely used, this is very dangerous. If you had code working with the 7th and 8th flags, and they were removed and the decision was made (by some other programmer, let's say) to reuse the same bit positions, any code that accessed the 7th and 8th bits would keep functioning (incorrectly) until that was noticed, and it could already have done harmful things before the issue was spotted and fixed. If you had separate columns and dropped them, the error would pop up to the surface on the very first use of that code, as the columns would no longer be there.
c- It will without a doubt be harder for the DBA to write scripts that upgrade the data and/or change the structure. An experienced DBA will not sit and type column names one after the other; they will use their tools to generate scripts. With bit manipulation, they will have to work by hand and make no mistakes in the expressions they produce in the various selects/updates.
5- All of the above is database related. Once the data reaches your application, you are free.
You can read the 14 flags from the database and pack them into an integer, and from then on your code can use bit manipulation on it; you can save time by writing the functions that deal with it once and reusing them. I personally think it's better not to do this here either, but that's your choice.
I know I am not focused and that I might have repeated myself here and there, but I hope I was able to help you see the longer-term considerations that will help you make the right choice for your case.
Take a look at the MySQL SET column type.
You can use EnumSet. It's the best way to emulate flags: much clearer in design, and with almost the same performance as an int. It can easily be translated to an int (to read from and write to the database). For more information, see the "EnumSet" chapter of the book Effective Java.
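A sketch of that translation, with invented flag names:

import java.util.EnumSet;

class Flags {
    // Invented flag names; with 14 flags the packed value fits in an int.
    enum Flag { ACTIVE, VERIFIED, LOCKED /* ... */ }

    static int toBits(EnumSet<Flag> flags) {
        int bits = 0;
        for (Flag f : flags) {
            bits |= 1 << f.ordinal(); // one bit per enum constant
        }
        return bits;
    }

    static EnumSet<Flag> fromBits(int bits) {
        EnumSet<Flag> set = EnumSet.noneOf(Flag.class);
        for (Flag f : Flag.values()) {
            if ((bits & (1 << f.ordinal())) != 0) {
                set.add(f);
            }
        }
        return set;
    }
}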
In the primary question you ask what is more efficient rather than what is better. That complicates the answer.
From the developer's and DBA's point of view, a single column is the more efficient solution: you save space, and using masks can speed up inserts and updates.
From the data analyst's point of view, separate columns are the more efficient solution: each column has a specified role.
As for me, I prefer the masks:
- Fewer changes in code
- Better manageability (the limited capacity of the integer is a risk here)
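For comparison with the EnumSet approach above, the raw-mask variant looks like this (constant names invented):

class FlagMasks {
    // One named constant per flag, all stored in a single integer column.
    static final int FLAG_ACTIVE   = 1 << 0;
    static final int FLAG_VERIFIED = 1 << 1;
    // ... up to 1 << 13 for 14 flags ...

    static boolean isSet(int flags, int mask) { return (flags & mask) != 0; }
    static int set(int flags, int mask)       { return flags | mask; }
    static int clear(int flags, int mask)     { return flags & ~mask; }
}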
I have a Java application that processes this kind of data:
class MyData
{
    Date date;
    double one;
    double two;
    String comment;
}
All the data is stored in CSV format on the hard disk; the maximum size of such a data sequence is ~150 MB, and at the moment I just load it fully into memory and work with it.
Now I have the task of increasing the maximum data sequence to hundreds of gigabytes. I guess I need to use a DB, but I have not worked with databases before.
My questions:
- Which DB is better to choose for my purposes (there will be only one table, with data as above)?
- Which library is better to use to connect Java <-> DB?
- I guess something like a cursor will be used? If so, is there any cursor implementation with good record caching for fast access?

Any other tips & tricks about Java <-> DB are welcome!
Your question is pretty unspecific. There isn't a best of breed - it depends on how much money you have and what kind of hardware.
Since your mapping between Java and the DB is pretty simple, JDBC should be enough. JDBC will create a cursor for you as necessary; just loop over the rows in the ResultSet. Depending on the database, you may need to configure it to use cursors, though.
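A bare-bones sketch of such a loop; the connection details, table name, and the handle() callback are placeholders:

import java.sql.*;

// Placeholder connection details; adjust for your database.
String url = "jdbc:postgresql://localhost/mydb";
try (Connection conn = DriverManager.getConnection(url, "user", "password");
     Statement stmt = conn.createStatement(
             ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
    stmt.setFetchSize(1000); // hint the driver to fetch rows in batches
    try (ResultSet rs = stmt.executeQuery(
            "SELECT date, one, two, comment FROM my_data")) {
        while (rs.next()) {
            // process one row at a time instead of materializing everything
            handle(rs.getDate(1), rs.getDouble(2),
                   rs.getDouble(3), rs.getString(4));
        }
    }
}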
Since you mention "hundreds of gigabytes", that rules out most of the "simple" databases. If you have money, try Oracle. If you don't have money, try MySQL or Postgres.
You can also try JavaDB (also known as Derby). But I'm not sure the performance will be what you need.
Note that they all have their quirks and "features", so expect to spend a couple of weeks to find your way with them.
Depends entirely on what you will be doing with the data. Do you need to index it to retrieve specific records, or are you stream processing the entire data set to generate some statistics (for example)? Does the database need to be accessed concurrently by multiple clients/processes?
Don't rush immediately towards SQL/JDBC, relational databases are powerful, but they add a lot of complexity and are often entirely unnecessary for the task at hand.
Again, depending on what you actually need to do, something like BerkeleyDB may fit the bill, or you may just need a more compact binary message format: check out Protocol Buffers and Kryo.
If you really need to scale things up, look at Hadoop/HDFS for distributed processing (but that's getting rather complicated).
Oh, and generally speaking, JavaDB/Derby tends to suck somewhat.
I would recommend JavaDB. I have used it in a point-of-sale system and it works very well. It is very easy to integrate into your Java application, and you can even package it into the same .jar file if you want.
Using Java DB in Desktop Applications may be a useful article. You will use JDBC to interface with the database from Java, which makes it easy to switch to another database if you don't want to use JavaDB.
You'll want to evaluate several databases (you can get trials of just about any of them if they're not open source/free already). I'd recommend trying Oracle, MySQL/Postgres, and with the size of your data (and its apparent lack of complexity) you might want to consider a data grid as well (GridGain or similar).
Definitely prototype though.
I'd just like to add that the "fastest" database is not necessarily the best.
You also need to take into account:
reliability,
software license cost,
ease of use,
ease of administration,
availability of support,
and so on.
Right now I need to load a huge amount of data from a database into a Vector, but when I had loaded 38,000 rows of data, the program threw an OutOfMemoryError.
What can I do to handle this?
I think there may be a memory leak in my program; what are good methods for detecting it? Thanks.
Provide more memory to your JVM (usually using -Xmx/-Xms) or don't load all the data into memory.
For many operations on huge amounts of data there are algorithms which don't need access to all of it at once. One class of such algorithms is divide-and-conquer algorithms.
If you must have all the data in memory, try caching commonly appearing objects. For example, if you are looking at employee records and they all have a job title, use a HashMap when loading the data and reuse the job titles already found. This can dramatically lower the amount of memory you're using.
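A minimal sketch of that caching trick:

import java.util.HashMap;
import java.util.Map;

// Reuse identical values (e.g. job titles) so thousands of records share
// one String instance instead of each holding its own copy.
Map<String, String> titleCache = new HashMap<String, String>();

String canonical(String title) {
    String cached = titleCache.get(title);
    if (cached == null) {
        titleCache.put(title, title);
        cached = title;
    }
    return cached;
}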
Also, before you do anything, use a profiler to see where memory is being wasted, and to check that things that could be garbage collected aren't being kept alive by stray references. Again, String is a common example: if you take the first 10 chars of a 2000-char string using substring instead of allocating a new String, what you actually hold is a reference to a char[2000] array, with two indices pointing at 0 and 10. Again, a huge memory waster.
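To make that concrete (this applies to JVMs where substring() shares the backing array; readRecord() is a made-up source of a long line):

String record = readRecord(); // imagine a 2000-character string
String leaky = record.substring(0, 10);
// 'leaky' still references the full char[2000] backing array.
String compact = new String(record.substring(0, 10));
// 'compact' has its own 10-char array; the big one can be collected.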
You can try increasing the heap size:
java -Xms<initial heap size> -Xmx<maximum heap size>
Default is
java -Xms32m -Xmx128m
Do you really need to have such a large object stored in memory?
Depending on what you have to do with that data, you might want to split it into smaller chunks.
Load the data section by section, for example by paging the query, as sketched below. This will not let you work on all the data at the same time, but you won't have to change the memory provided to the JVM.
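One simple way to do that, sketched with invented table and column names, is to page through the table with LIMIT/OFFSET (MySQL/Postgres syntax):

import java.sql.*;

void processInChunks(Connection conn, int chunkSize) throws SQLException {
    // ORDER BY keeps the pages stable between queries.
    String sql = "SELECT id, payload FROM records ORDER BY id LIMIT ? OFFSET ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (int offset = 0; ; offset += chunkSize) {
            ps.setInt(1, chunkSize);
            ps.setInt(2, offset);
            int rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows++;
                    // work with rs.getLong("id"), rs.getString("payload")
                }
            }
            if (rows < chunkSize) {
                break; // last (partial or empty) page reached
            }
        }
    }
}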
You could run your code under a profiler to understand how and why the memory is being eaten up. Debug your way through the loop and watch what is being instantiated. There are any number of them: JProfiler, Java Memory Profiler, and so forth.
Maybe optimize your data classes? I've seen a case where someone used Strings in place of primitive types such as int or double for every class member, and it caused an OutOfMemoryError when storing a relatively small number of data objects in memory. Check that you aren't duplicating your objects. And, of course, increase the heap size:
java -Xmx512M (or whatever you deem necessary)
Let your program use more memory or, much better, rethink the strategy. Do you really need so much data in memory?
I know you are trying to read the data into a Vector; otherwise, if you were trying to display it, I would have suggested NatTable. It is designed for reading huge amounts of data into a table.
I believe it might come in handy for another reader here.
Use a memory-mapped file. Memory-mapped files can basically grow as big as you want without hitting the heap. This does require that you encode your data in a decoding-friendly way. (For example, it makes sense to reserve a fixed size for every row in your data, in order to quickly skip a number of rows.)
Preon allows you to deal with that easily. It's a framework that aims to do for binary encoded data what Hibernate has done for relational databases, and JAXB/XStream/XmlBeans for XML.
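Even without Preon, a hand-rolled version of the fixed-size-row idea is fairly small. Sizes here are illustrative, and note that a single mapping is limited to 2 GB, so very large files need several mappings:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Fixed-size rows: row i starts at offset i * ROW_SIZE, so skipping to any
// row is a single multiplication, and the data lives outside the Java heap.
static final int ROW_SIZE = 64; // illustrative row width in bytes

static MappedByteBuffer mapRows(String path, int rowCount) throws Exception {
    RandomAccessFile file = new RandomAccessFile(path, "rw");
    try {
        return file.getChannel()
                   .map(FileChannel.MapMode.READ_WRITE, 0,
                        (long) rowCount * ROW_SIZE);
    } finally {
        file.close(); // the mapping stays valid after the channel is closed
    }
}

static double readFirstField(MappedByteBuffer buf, int row) {
    return buf.getDouble(row * ROW_SIZE); // first 8 bytes of the row
}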