I think it was about two months ago that I found a Google open-source project that can store key-value pairs with high performance, but I forget the name. Could anybody tell me? Or can you suggest something else? I have been using BerkeleyDB, but I found that BerkeleyDB is not fast enough for my program. However, BerkeleyDB is convenient to use because it ships as a Java library jar, which integrates with my program seamlessly. My program is also written in Java.
Two strong competitors in the DHT (Distributed Hash Table) 'market':
Cassandra (created by Facebook, in use by Digg and Twitter)
HBase
Here is a presentation about Cassandra. On slide 20 you'll see some speed benchmarks: 0.12 ms per write.
(You can search around for the whole presentation, including Eric Evans talking)
Nobody mentions LevelDB, and yet this post is at the top when searching for "good key value store". In my experience LevelDB is simply awesome; it's so fast I couldn't believe it.
I've been trying quite a few databases for a task I was doing. I tried:
Windows Azure Table Storage (expensive; value size is capped at 1 MB and each property at 64 KB)
Redis (awesome if you have as much RAM as you please)
MongoDB (awesome as long as there is enough RAM, breaks after that point)
SQL Server (expensive, needs maintenance such as rebuilding indexes, and eventually still not fast enough)
SQLite (free, but not as simple to use as LevelDB and not fast)
LevelDB. If you can model your job as reading large consecutive chunks of data through an iterator, you'll get great speed. Writing is also pretty fast. Combine it with an SSD and you'll love it. (A small usage sketch follows.)
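If you want to try LevelDB from Java, here is a rough sketch. It assumes the iq80 pure-Java port of LevelDB (the JNI wrapper exposes essentially the same interface); the database path and keys are made up for illustration:

```java
import static org.iq80.leveldb.impl.Iq80DBFactory.asString;
import static org.iq80.leveldb.impl.Iq80DBFactory.bytes;
import static org.iq80.leveldb.impl.Iq80DBFactory.factory;

import java.io.File;
import java.util.Map;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;
import org.iq80.leveldb.Options;

public class LevelDbExample {
    public static void main(String[] args) throws Exception {
        Options options = new Options();
        options.createIfMissing(true);
        DB db = factory.open(new File("example-db"), options);
        try {
            // Simple put/get of byte[] keys and values.
            db.put(bytes("user:1"), bytes("alice"));
            System.out.println(asString(db.get(bytes("user:1"))));

            // Iterating consecutive keys is the access pattern LevelDB is best at.
            DBIterator it = db.iterator();
            try {
                for (it.seekToFirst(); it.hasNext(); ) {
                    Map.Entry<byte[], byte[]> entry = it.next();
                    System.out.println(asString(entry.getKey()) + " = " + asString(entry.getValue()));
                }
            } finally {
                it.close();
            }
        } finally {
            db.close();
        }
    }
}
```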
Bigtable?
Redis
http://code.google.com/p/redis/
Maybe you should describe what features you need. If it doesn't need to be distributed (does it?), then I would try using the H2 database. For those who think "it can't be fast because it's using SQL": please note that when using prepared statements, SQL parsing is only done once. Disclaimer: I'm the main author of H2.
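For context, a minimal sketch of using embedded H2 as a key-value table with prepared statements; the JDBC URL, table, and keys are made up for illustration, and the point is that each statement is parsed once and then reused:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class H2KeyValueExample {
    public static void main(String[] args) throws Exception {
        // Embedded, file-based database: no server process needed.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./kvstore")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS kv(k VARCHAR PRIMARY KEY, v VARCHAR)");
            }
            // Prepared once, parsed once, executed many times.
            try (PreparedStatement put = conn.prepareStatement("MERGE INTO kv KEY(k) VALUES(?, ?)");
                 PreparedStatement get = conn.prepareStatement("SELECT v FROM kv WHERE k = ?")) {
                put.setString(1, "greeting");
                put.setString(2, "hello");
                put.executeUpdate();

                get.setString(1, "greeting");
                try (ResultSet rs = get.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(rs.getString(1)); // prints "hello"
                    }
                }
            }
        }
    }
}
```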
Many answers seem to automatically assume a need for distribution, which seems odd given that the question refers to BDB.
With that in mind, beyond Redis and H2 (which are both good), there is also Tokyo Cabinet to consider, which seems to offer benefits over BDB. One more, newer possibility is Krati.
I think you saw Guava or the Google Collections library.
I have an application that makes heavy use of JMS (OpenMQ at the moment). We need durable persistence on some of the messages in flight, and that needs to survive the database blowing up, so we've moved the broker's storage to JDBC as it means that we can replicate and back up a database cluster and know that both the relational and message stores are at the same point in time when restoring from a backup.
We're finding that OpenMQ's JDBC-backed storage is very slow (tens of messages per second). Are there any brokers out there that perform well using JDBC? We're ideally after thousands of messages per second, but could probably tolerate hundreds.
Backing a messaging server with JDBC is never going to be as fast as using an optimized message store. Most of the time a broker simply writes messages to disk and never reads them back, while a traditional database is usually optimized for read performance and complex queries rather than maximum write throughput.
ActiveMQ with a replicated LevelDB store seems a decent fit for your requirements: it automatically replicates the message store to "slave" brokers while still allowing high throughput.
Another option would be to dig into the database side rather than the broker side. You will probably notice similar patterns with other brokers as well if you hook them up to JDBC. Is there a way to make your database handle writes more efficiently? SSD disks? Settings? As I said, by default most databases are tuned towards read performance and complex queries.
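To give an idea of the kind of load test you could run against such a setup, here is a rough JMS producer sketch using the ActiveMQ client. The broker URLs and queue name are invented; PERSISTENT delivery is what forces the broker to write each message to its store:

```java
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PersistentProducer {
    public static void main(String[] args) throws Exception {
        // failover: transport reconnects to whichever replicated broker is currently master.
        ActiveMQConnectionFactory cf =
                new ActiveMQConnectionFactory("failover:(tcp://broker1:61616,tcp://broker2:61616)");
        Connection conn = cf.createConnection();
        conn.start();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("orders");
            MessageProducer producer = session.createProducer(queue);
            // PERSISTENT means the broker must write the message to its store
            // before acknowledging the send.
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);
            for (int i = 0; i < 10_000; i++) {
                producer.send(session.createTextMessage("message " + i));
            }
        } finally {
            conn.close();
        }
    }
}
```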
Does anyone know of a library that will help me build a file store in Java? I am not looking for something like JCR. Rather, I need to build something that stores millions of files/terabytes of data, de-duplicated by hash, with metadata for each file. Metadata might include MIME type, filenames, dates, size, etc. (A single hash might correspond to several filenames, dates, etc.)
I know this is not overly difficult, but I don't want to reinvent the wheel if the wheel already exists. For example, files have to be sorted into a directory hierarchy on disk based on part of the hash, to avoid exceeding the maximum number of files the OS will allow per directory. A web service needs to be written to provide access to the files, some other data structure (an RDBMS?) needs to store the metadata, and a mechanism is needed for loading new content.
Everything I am finding is higher level, JCR or JCR-ish, but I figured it was worth checking with the experts here before going off to build it. Thanks in advance.
Definitely don't reinvent the wheel... an RDBMS might work, but something like Apache Jackrabbit perhaps? If you want really low-level, there is Peter Lawrey's Chronicle.
All the existing content management solutions are too high level for my purposes. SQL Server with FILESTREAM is about as close as I could find to something that promises reasonable performance. However, this is not a difficult thing to build, and then I won't have to rely on a specific RDBMS for the solution, so that is the route I'm going to go with.
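For anyone who lands here later, the hash-sharded directory layout described in the question is only a few dozen lines with plain JDK APIs. A rough sketch (class name and directory layout are invented; the metadata store and web service are left out):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class HashStore {

    private final Path root;

    public HashStore(Path root) {
        this.root = root;
    }

    /** Stores the stream under its SHA-256 hash and returns the hex hash (the store key). */
    public String put(InputStream in) throws Exception {
        Files.createDirectories(root);
        MessageDigest sha = MessageDigest.getInstance("SHA-256");

        // Stream to a temp file while hashing, so large files never sit in memory.
        Path tmp = Files.createTempFile(root, "incoming-", ".tmp");
        try (DigestInputStream din = new DigestInputStream(in, sha)) {
            Files.copy(din, tmp, StandardCopyOption.REPLACE_EXISTING);
        }

        String hex = toHex(sha.digest());

        // Shard by hash prefix (ab/cd/abcdef...) so no single directory
        // ever holds more files than the OS/file system handles comfortably.
        Path target = root.resolve(hex.substring(0, 2))
                          .resolve(hex.substring(2, 4))
                          .resolve(hex);
        Files.createDirectories(target.getParent());

        if (Files.exists(target)) {
            Files.delete(tmp);   // same content already stored: de-duplicated
        } else {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        }
        return hex;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```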
Problem description: I want to load image pixel data into an Excel sheet.
What I have tried: using Apache POI to write the data to Excel, but I found there are some limitations in Apache POI (as elaborated below).
I have come to know of some workarounds, which are tedious on the part of the programmer, and I am not really willing to do that for such a trivial-looking task.
Details:
I have been using Apache POI for quite some time, and I have come across a few limitations:
the whole file is in memory at once, so it can't be used directly for bigger files
(specific to HSSF):
no more than 255 columns
no more than 4000 cell styles
custom colors can't be used directly
My requirement is to read an image (say, 1024x764) pixel by pixel and write the pixel values into rows and columns of the Excel sheet, with every distinct pixel value styled differently.
The problems I have faced are:
an out-of-memory exception while writing to the Excel sheet, because of so many rows/columns and styles
writing logic for reusing styles would slow down the whole program
even if I reuse styles, what to do about the huge number of rows/columns
I have come to know that there are workarounds for these problems:
reusing styles
writing logic for efficient memory usage
But I do not intend to take much pain for a job as simple as this, and since these are not directly limitations of Excel (at least not of .xlsx), I am looking for a library that can do it for me.
Can someone please suggest another library which can do this, or some easier workarounds for these problems? Otherwise I would switch from Java to C#.
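For what it's worth, the two workarounds mentioned above (reusing styles and keeping memory usage bounded) can be sketched with POI's streaming SXSSF API, which is part of POI but not mentioned in the answers here. The file names are made up, and the colour-setting call is deliberately left out because the XSSFColor API differs between POI versions:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.Map;
import javax.imageio.ImageIO;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class PixelsToExcel {
    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("input.png")); // hypothetical input file

        // SXSSF keeps only a sliding window of rows in memory (here 100),
        // which avoids building the whole sheet in RAM.
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        Sheet sheet = wb.createSheet("pixels");

        // Reuse one CellStyle per distinct pixel value instead of one per cell;
        // .xlsx still caps the number of styles, so quantize colours if needed.
        Map<Integer, CellStyle> styleCache = new HashMap<>();

        for (int y = 0; y < img.getHeight(); y++) {
            Row row = sheet.createRow(y);
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y) & 0xFFFFFF;
                Cell cell = row.createCell(x);
                cell.setCellValue(rgb);
                CellStyle style = styleCache.computeIfAbsent(rgb, k -> wb.createCellStyle());
                // Setting the actual fill colour is omitted: it varies between POI versions.
                cell.setCellStyle(style);
            }
        }

        try (FileOutputStream out = new FileOutputStream("pixels.xlsx")) {
            wb.write(out);
        }
        wb.dispose(); // delete the temporary files SXSSF spills rows to
    }
}
```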
In short, nope - the POI libraries are, in my experience, the best ones available for the job. They're not perfect, but I don't know of an alternative that's better. You may want to try checking trunk out and seeing if any of your issues have been resolved there - entirely possible, it's a relatively active project.
The only other thing I'd suggest looking at is the OpenOffice API, but note that requires OO to be installed (or distributed with your app.)
In all honesty though, POI's strength is its cross-platform nature: it's a pure Java implementation with no native components. If you don't care about that and could therefore go with C# and use the native Office APIs, that would surely seem like the logical approach? It seems odd to me that you're not doing this already.
JExcelApi
http://jexcelapi.sourceforge.net/
It works in a declarative way, like Adobe LiveCycle and JReport: you create an XLS template file and in every cell you put a reference to the beans.
Invoking the engine, at the end you get an XLS file.
Sorry for the extreme brevity, but I worked with it many years ago and I don't remember the details; the documentation is on the website.
What is a better NoSQL database for building a system that records advertisement data, with about 50 to 200 million inserts per day? Aggregations of the data will be used to show patterns of how users engage with the ads. I really like MongoDB, but it seems that major industry players are picking Riak for the job. It also seems that Mongo had to iron out some caveats over the last two releases, and the current version looks pretty good for the job. Any ideas?
It seems MongoDB with Hadoop (http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/) fits your data requirements: you can store the data in MongoDB and run aggregation jobs (map/reduce) on a Hadoop cluster.
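If you do go the MongoDB route, inserting this kind of event data from Java could look roughly like the sketch below. The database, collection, and field names are invented, and it assumes the modern MongoDB Java driver:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.InsertManyOptions;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.bson.Document;

public class AdEventWriter {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events =
                    client.getDatabase("ads").getCollection("events");

            List<Document> batch = new ArrayList<>();
            for (int i = 0; i < 1_000; i++) {
                batch.add(new Document("adId", "ad-" + (i % 50))
                        .append("userId", "user-" + i)
                        .append("action", "impression")
                        .append("ts", new Date()));
            }

            // Unordered bulk inserts let the server keep going past individual failures,
            // which matters at tens of millions of writes per day.
            events.insertMany(batch, new InsertManyOptions().ordered(false));
        }
    }
}
```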
I'd use Berkeley DB Java Edition from Oracle.
Very powerful and easy to use. Open source, but not free.
This is not the type of question that can be answered with "Product1" or "Product2"; there is simply too little information given. There is no info about the environment where the system will run, what kind of data will be inserted, or how you are going to aggregate it.
The best way is to try:
write a test using Product1
write the same thing with Product2
start inserting data that looks as close as possible to the data you expect in the real environment
measure speed and whatever other factors you need
Only based on that will you be able to determine what suits you.
Is there an open-source Java implementation of the VCDIFF binary diff format (decoder and encoder)?
There are xdelta and open-vcdiff, but those are both C libraries.
Alternatively, are there other formats/algorithms that one could use to generate diffs for binary files from Java?
You can generate binary diffs using badiff; the website is
http://badiff.org/
and it is available on Maven Central. It's BSD licensed, so friendly for both OSS and commercial use. The algorithm used is a chunked version of the O(ND) diff described in this paper:
http://www.xmailserver.org/diff2.pdf
The diff format isn't particularly compatible with anything else, but it produces some really good and really small diffs.
The library is pretty fast; on my desktop machine it can generate a diff for two random 50MB input streams in 54 seconds. Hopefully that's fast enough; I think it's reasonably impressive since that's a comparison of two token streams of 50 million tokens each. badiff will take advantage of multiple CPU cores when computing diffs.
disclaimer: I'm the author of badiff, so of course I think it's cool. I'm always open to suggestions; things like being able to read/write "standard" binary diff formats sound like cool new features to add in upcoming releases.
I have a decoder for VCDIFF written in C#, which would probably be fairly straightforward to port to Java, if that's any help. It's part of MiscUtil but I don't think it relies on any other bits of MiscUtil (or only minimally, anyway).
Unfortunately I never got round to writing an encoder, which is obviously rather harder - and wasn't necessary in our case (where we needed to apply patches in .NET on a mobile device, but could create them however we wanted at the server).
I have ported MiscUtil's VCDIFF decoder to Java:
https://github.com/xiaxiaocao/jvcdiff
Update: it now also has a VCDIFF encoder.
There is a Java port of xdelta:
http://sourceforge.net/projects/javaxdelta/
But I cannot say anything about its quality; I have not tried it yet.
I have a Java port of open-vcdiff on Github. It's tested against open-vcdiff, but it's not used in production anywhere.