Accumulo, modifying previous keys using the TransformingIterator

I have just started writing my own Java client for Accumulo.
I am able to write and read records, and I now want to modify some existing keys using the TransformingIterator class (https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/user/TransformingIterator.html#TransformingIterator())
It has been a while since I last coded in Java, so I don't really understand how to use this class, and I can't find any good examples or explanations of it.
Does anyone know how to use it?
Thanks.

I'd caution you against trying to use this class on your own. It has a lot of caveats that make it tricky to get correct (not to mention Iterators being tricky already on their own).
Unless you have a very large amount of data (terabytes), it is likely going to be easier to transform your data using some batch-processing tool (e.g. MapReduce) instead of trying to use the TransformingIterator.

Related

Using a SerialBlob vs byte[]

I am using Hibernate to store and retrieve data from a MySQL database. I was using a byte array but came across the SerialBlob class. I can use the class successfully, but I can't seem to find any difference between using a SerialBlob and a byte array. Does anyone know the basic differences, or the situations in which you would want to use a SerialBlob in lieu of a byte[]?

You are right that the SerialBlob is just a thin abstraction around a byte[], but:
Are you working in a team?
Do you sometimes make mistakes?
Are you lazy with writing comments?
Do you sometimes forget what your code from a year ago actually does?
If you answered any of the above questions with a yes, you should probably use SerialBlob.
It's basically the same with any other abstraction around a simple data structure (think ByteBuffer, for example) or another class. You want to use it over byte[], because:
It's more descriptive. A byte[] could be some sort of cache, it could be a circular buffer, it could be some sort of integrity checking mechanism gone wrong. But if you use SerialBlob, it's obvious that this is just a blob of binary data from the database / to be stored in the database.
Instead of manual array handling, you use methods on the class, which is, again, easier to read if you don't know the code. Even trivial array manipulation must be comprehended by the reader of your code. A method with a good name is self-descriptive.
This is helpful for your teammates and also for you when you read this code in a year.
It's more error proof. Every time you write new code, there's a good chance you have introduced a bug. It may not be visible at first, but it is probably in there. The SerialBlob code has been tested by thousands of people around the world, so you are unlikely to run into bugs in it.
Even if you're sure you got your byte array handling right, because it's so straightforward, what if somebody else finds your code in half a year and starts "optimizing" things? What if they reuse an old blob, or mess up your magic array padding? Every single off-by-one error in index manipulation will corrupt your data, and that might not be detected right away (You are writing unit tests, aren't you?).
It restricts you to only a handful of possible interactions. This might actually look like a demerit, but it's not! It ensures you won't be using your blob as a local temporary variable after you're done with it. It ensures you won't try to make a String out of it or anything silly. It makes sure you'll only use it as a blob. Again, clarity and safety.
It's already written and always looks the same. You don't have to write a new implementation for every project, or read ten different implementations in ten different projects. If you ever see a SerialBlob in anyone's project, the usage will be clear to you. Everyone uses the same one.
TL;DR: A few years ago (or maybe still in C), using a byte[] would be OK. In Java (and OOP in general), try to use a specific class designed for the job instead of a primitive (low-level) structure, as it more clearly describes your intent, produces fewer errors, and reduces the length of your code in the long run.
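
As a small illustration of the safety argument above, the sketch below wraps a byte[] in a SerialBlob and then modifies the original array. BlobDemo, wrap, and unwrap are made-up names for this example; the only library calls are the SerialBlob(byte[]) constructor, length(), and getBytes(pos, length), where positions are 1-based.

```java
import javax.sql.rowset.serial.SerialBlob;
import java.sql.SQLException;
import java.util.Arrays;

public class BlobDemo {
    // Wraps raw bytes in a SerialBlob; the constructor copies the array contents.
    static SerialBlob wrap(byte[] raw) throws SQLException {
        return new SerialBlob(raw);
    }

    // Reads the whole blob back as a fresh array; Blob positions start at 1, not 0.
    static byte[] unwrap(SerialBlob blob) throws SQLException {
        return blob.getBytes(1, (int) blob.length());
    }

    public static void main(String[] args) throws SQLException {
        byte[] raw = {1, 2, 3};
        SerialBlob blob = wrap(raw);
        raw[0] = 99; // later changes to the caller's array do not reach the blob
        System.out.println(Arrays.toString(unwrap(blob))); // prints [1, 2, 3]
    }
}
```

With a bare byte[] shared between two pieces of code, the same stray write would silently change the data that ends up in the database.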

SortedBiTreeMultimap data structure in Java?

Is there any Java library with TreeMap-like data structure which also supports all of these:
lookup by value (like Guava's BiMap)
possibility of non-unique keys as well as non-unique values (like Guava's Multimap)
keeps track of sorted values as well as sorted keys
If it exists, it would probably be called SortedBiTreeMultimap, or similar :)
This can be produced using a few data structures together, but I never took time to unite them in one nice class, so I was wondering if someone else has done it already.

I think you are looking for a "Graph". You might be interested in this slightly similar question asked a while ago, as well as this discussion thread on BiMultimaps / Graphs. Google has a BiMultimap in its internal code base, but they haven't yet decided whether to open source it.
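
For reference, the "few data structures together" approach mentioned in the question could look roughly like the sketch below. It is only an illustration using java.util: the class name SortedBiMultimap and its methods are invented here, not taken from any library, and it assumes each key/value pair is stored at most once (set semantics).

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class SortedBiMultimap<K extends Comparable<K>, V extends Comparable<V>> {
    // Two mirrored indexes: key -> sorted values, and value -> sorted keys.
    private final TreeMap<K, TreeSet<V>> forward = new TreeMap<>();
    private final TreeMap<V, TreeSet<K>> inverse = new TreeMap<>();

    public void put(K key, V value) {
        forward.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
        inverse.computeIfAbsent(value, v -> new TreeSet<>()).add(key);
    }

    public void remove(K key, V value) {
        unlink(forward, key, value);
        unlink(inverse, value, key);
    }

    // Removes one association and drops the entry once its set is empty.
    private static <A, B> void unlink(TreeMap<A, TreeSet<B>> map, A from, B to) {
        TreeSet<B> set = map.get(from);
        if (set != null && set.remove(to) && set.isEmpty()) {
            map.remove(from);
        }
    }

    // Sorted values stored under a key (empty if the key is absent).
    public NavigableSet<V> valuesFor(K key) {
        TreeSet<V> set = forward.get(key);
        return set == null ? Collections.<V>emptyNavigableSet()
                           : Collections.unmodifiableNavigableSet(set);
    }

    // Sorted keys that map to a value (the "lookup by value" direction).
    public NavigableSet<K> keysFor(V value) {
        TreeSet<K> set = inverse.get(value);
        return set == null ? Collections.<K>emptyNavigableSet()
                           : Collections.unmodifiableNavigableSet(set);
    }

    public NavigableSet<K> keys() {
        return Collections.unmodifiableNavigableSet(forward.navigableKeySet());
    }

    public NavigableSet<V> values() {
        return Collections.unmodifiableNavigableSet(inverse.navigableKeySet());
    }

    public static void main(String[] args) {
        SortedBiMultimap<String, Integer> m = new SortedBiMultimap<>();
        m.put("b", 2);
        m.put("a", 2);
        m.put("a", 1);
        System.out.println(m.keysFor(2));     // prints [a, b]
        System.out.println(m.valuesFor("a")); // prints [1, 2]
    }
}
```

Every write touches both maps, so put and remove cost two O(log n) updates; that is the price of keeping both directions sorted.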

Simplest way to allow users to specify output format

I have written an application which outputs data as XML. However, it would be nice to allow the user to completely customize the output format so they can more easily integrate it into their applications.
What would be the best way to approach this problem? My initial thoughts are to define a grammar and write a parser from the ground up.
Are there any free Java libraries that can assist in parsing custom scripting(formatting?) languages?
Since I already have the XML, would it be a better approach to just 'convert' this with a search & replace algorithm?
I should specify here that 'users' are other programmers so defining a simple language would be fine, and that the output is potentially recursive (imagine outputting the contents of a directory to XML).
Just looking for general advice in this area before I set off down the wrong track.
EDIT: To clarify... My situation is a bit unique. The application outputs coordinates and other data to be loaded into a game engine. Everybody seems to use a different, completely custom format in their own engine. Most people do not want to implement a JSON parser and would rather use what they already have working. In other words, it is in the interests of my users to have full control over the output, asking them to implement a different parser is not an option.

Have you considered just using a templating engine like Velocity or FreeMarker?
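
To show the templating idea without pulling in a library, here is a minimal hand-rolled sketch of placeholder substitution. It is not the Velocity or FreeMarker API: the ${name} syntax, the TinyTemplate class, and the render method are assumptions made for this example, and a real engine also offers loops and nesting, which recursive output would need.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyTemplate {
    // Matches placeholders such as ${x} (hypothetical syntax for this sketch).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces each known placeholder with its value; unknown ones are left untouched.
    static String render(String template, Map<String, ?> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : String.valueOf(value);
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("name", "spawn");
        values.put("x", 3);
        values.put("y", 4);
        // The template string would come from the user, e.g. from a config file.
        System.out.println(render("entity ${name} at (${x}; ${y})", values));
        // prints: entity spawn at (3; 4)
    }
}
```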

I would create a result bean as a POJO.
Then I would have different classes working on the result bean. That way you can easily add new formats if needed.
E.g.:
Result result = logic.getResult();
XMLOutputter.output(result, "myXMLFile.xml");
Format1Outputter.output(result, "myFormat1File.fo1");
Format2Outputter.output(result, "myFormat2File.fo2");
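
The idea of one result bean with a separate class per format could be sketched roughly as below. Result, ResultFormatter, and both formatter classes are made-up names for illustration, and the formatters return strings rather than writing files, just to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OutputDemo {
    /** Hypothetical result bean; the real one would carry whatever the application computes. */
    static final class Result {
        final String name;
        final int x;
        final int y;

        Result(String name, int x, int y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    /** One implementation per output format; adding a format means adding a class. */
    interface ResultFormatter {
        String format(Result result);
    }

    static final class XmlFormatter implements ResultFormatter {
        @Override
        public String format(Result r) {
            // No XML escaping here; a real formatter would need it.
            return "<point name=\"" + r.name + "\" x=\"" + r.x + "\" y=\"" + r.y + "\"/>";
        }
    }

    static final class CsvFormatter implements ResultFormatter {
        @Override
        public String format(Result r) {
            return r.name + "," + r.x + "," + r.y;
        }
    }

    public static void main(String[] args) {
        Result result = new Result("spawn", 3, 4);
        Map<String, ResultFormatter> formatters = new LinkedHashMap<>();
        formatters.put("xml", new XmlFormatter());
        formatters.put("csv", new CsvFormatter());
        // Each formatter sees the same bean, so formats stay independent of the logic.
        for (Map.Entry<String, ResultFormatter> e : formatters.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().format(result));
        }
    }
}
```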

If you are planning to provide this as an API to multiple parties, I would advise against allowing over-customization. It will add unnecessary complexity to your product and provide just one more place for bugs to be introduced.
Second, it will increase the complexity of your documentation and, as a side effect, likely cause your documentation to fall out of sync with the API in general.
The biggest thing I would suggest considering, in terms of making your stream easier to digest, is making the output available in JSON format, which just about every modern language has good support for (I use Gson for Java, myself).

Offline database in Android

There must be a better way to manage string values than to have a bunch of strings in strings.xml file.
I am looking for a database-like solution; however, I don't want it to connect to a database on the internet. I just need some advanced sorting and categorising to be done, that is all.
I am not very experienced with Java, so pardon me if I just lack the knowledge.
EDIT: It would be nice if I could synchronize a database on the internet with the one on the user's smartphone.
Maybe the effect of synchronization can be achieved by adding additional databases and sending out already modified data.

You can use a SQLite database for your app. See http://developer.android.com/guide/topics/data/data-storage.html#db

If you don't need to save them, you could just use an array of strings, or an ArrayList if the user is entering a bunch of strings or you need dynamic sizing.
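
As a rough sketch of how far plain collections go for sorting and categorising, the example below groups strings by their first letter and sorts each group. StringCatalog and categorize are hypothetical names, and grouping by initial is just one assumed way to categorise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StringCatalog {
    // Groups items by upper-cased first character; groups and their contents are sorted.
    static Map<Character, List<String>> categorize(List<String> items) {
        Map<Character, List<String>> byInitial = new TreeMap<>();
        for (String item : items) {
            if (item.isEmpty()) {
                continue; // nothing to categorise by
            }
            char initial = Character.toUpperCase(item.charAt(0));
            List<String> group = byInitial.get(initial);
            if (group == null) {
                group = new ArrayList<>();
                byInitial.put(initial, group);
            }
            group.add(item);
        }
        for (List<String> group : byInitial.values()) {
            Collections.sort(group, String.CASE_INSENSITIVE_ORDER);
        }
        return byInitial;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("banana", "Apple", "avocado", "Blueberry");
        System.out.println(categorize(items));
        // prints {A=[Apple, avocado], B=[banana, Blueberry]}
    }
}
```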

Well, there are simpler alternatives to SQLite - you should at least consider some kind of ORM for Android.
It'll let you persist Java objects in a few lines of code instead of using SELECT, UPDATE, and marshalling data.
Consider db4o or something like that - it's rather large (~1 MB), though.

Why would you want to use SQLite for this? Remember that using a database connection for this (and only this) takes up resources that could be used elsewhere. If it's strings that are only used as text (which never changes) in your program, you are better off using strings.xml. Not only is it faster, but it is also an Android standard. Besides, if you decide one day to translate your application into a different language, I would guess it is much easier using the strings.xml file.

Creating a custom reader class in java

I am trying to solve a problem from a programming contest. The actual problem is to sort a given list of numbers. I am using an algorithm with O(n log n) complexity, and that is as far as I can optimize the algorithm itself. From the forum I understood that I need faster I/O, for which I should create a new reader class. Input and output go through standard input and standard output.
I want to know how to create a reader class (instead of using the other standard Reader classes).
Thanks in advance!

This question really seems like a "barking up the wrong tree" kind of question. I find it unlikely that you'd be able to subclass Reader and make it run faster, given that you don't know how to do it. If there were an obvious way, wouldn't it already be in Java?
If I/O speed is the problem, perhaps it's the method you're using. There are several different types of Readers, and several algorithms to use them. For example, do you read the whole file at once then parse it, or do you read one line at a time? Some of these options may not even be possible depending on the type of file, size of the file, and other conditions.
If you're trying to solve a problem for a programming contest, solving the actual problem should be all that's required. You shouldn't have to create your own Reader class unless that's a part of the problem being described. Besides, you mention that you're getting your direction from a forum. How do you know they even know what they're talking about?
So, I feel like you're doing something wrong here that's outside the scope of the question you asked.
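
For completeness, a custom reader of the kind the question asks about is often just a thin wrapper that reads the input stream in large chunks and parses integers by hand. The sketch below is one assumed shape of such a class, not a standard API: FastReader and nextInt are made-up names, input is assumed to be ASCII integers separated by whitespace, and overflow is not checked. Whether it helps at all depends on the input size, so it is worth measuring first.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class FastReader {
    private final InputStream in;
    private final byte[] buf = new byte[1 << 16]; // 64 KiB chunk buffer
    private int len = 0; // number of valid bytes in buf
    private int pos = 0; // next unread index in buf

    public FastReader(InputStream in) {
        this.in = in;
    }

    // Returns the next byte as 0..255, or -1 at end of input; refills the buffer when empty.
    private int read() throws IOException {
        if (pos == len) {
            len = in.read(buf, 0, buf.length);
            pos = 0;
            if (len <= 0) {
                len = 0; // keep pos == len so later calls also report end of input
                return -1;
            }
        }
        return buf[pos++] & 0xFF;
    }

    // Parses the next integer, skipping any non-digit separators before it.
    public int nextInt() throws IOException {
        int c = read();
        while (c != -1 && c != '-' && (c < '0' || c > '9')) {
            c = read();
        }
        if (c == -1) {
            throw new EOFException("no more integers");
        }
        boolean negative = (c == '-');
        if (negative) {
            c = read();
        }
        int value = 0;
        while (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            c = read();
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) throws IOException {
        // In a contest this would wrap System.in; a fixed sample keeps the demo self-contained.
        InputStream sample = new ByteArrayInputStream("3\n30 -5 12\n".getBytes());
        FastReader reader = new FastReader(sample);
        int n = reader.nextInt();
        int[] numbers = new int[n];
        for (int i = 0; i < n; i++) {
            numbers[i] = reader.nextInt();
        }
        Arrays.sort(numbers);
        System.out.println(Arrays.toString(numbers)); // prints [-5, 12, 30]
    }
}
```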
