Collections sort - java

While sorting data in a JTable, I ran into a problem when some of the values contain German (accented) characters.
Below is the sample code I tried, followed by the result I need for my application to work properly.
Code:
Vector<String> vec = new Vector<String>();
vec.add("ba~~~");
vec.add("ba~ab~~");
vec.add("ÇÊÕÝ");
vec.add("ÇÊÕÝ~ÂÉĒÐ~~");
System.out.println("Unsorted: "+vec);
Collections.sort(vec);
System.out.println("Sorted: "+vec);
Output:
Unsorted: [ba~~~, ba~ab~~, ÇÊÕÝ, ÇÊÕÝ~ÂÉĒÐ~~]
Sorted: [ba~ab~~, ba~~~, ÇÊÕÝ, ÇÊÕÝ~ÂÉĒÐ~~]
Output that I require:
Unsorted: [ba~~~, ba~ab~~, ÇÊÕÝ, ÇÊÕÝ~ÂÉĒÐ~~]
Sorted: [ba~ab~~, ba~~~, ÇÊÕÝ~ÂÉĒÐ~~, ÇÊÕÝ]
From the output I see that the string 'ba~ab~~' comes before 'ba~~~', which is correct lexicographically. But I cannot understand why 'ÇÊÕÝ~ÂÉĒÐ~~' comes after 'ÇÊÕÝ'. I need the desired output shown above for my application to work correctly.
Am I missing something that I need to provide when comparing strings in other languages?
Any help is highly appreciated.

You can either use one of the built-in collators (Collator.getInstance(Locale.XXX)) and sort with it (Collator implements Comparator), or, if no built-in collation fits, build your own with a RuleBasedCollator and sort with that. For details on the rule-based collator see:
http://docs.oracle.com/javase/7/docs/api/java/text/RuleBasedCollator.html
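As a minimal sketch of the built-in collator route: a Collator can be handed straight to Collections.sort as the comparator. The class name, the helper method and the choice of Locale.GERMAN are assumptions made for illustration; whether any built-in locale yields the exact order the question asks for is not something this sketch claims.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original question or answer.
final class CollatorSortSketch {

    // Returns a sorted copy ordered by the collation rules of the given locale.
    static List<String> sortForLocale(List<String> input, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(input);
        // Collator implements Comparator<Object>, so it can be passed directly.
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // "\u00C4pfel" is "Äpfel"; the escape avoids source-encoding issues.
        List<String> words = Arrays.asList("Zebra", "\u00C4pfel", "Banane");

        List<String> natural = new ArrayList<>(words);
        Collections.sort(natural); // plain code-point order puts the umlaut last

        System.out.println("Natural:  " + natural);
        System.out.println("Collated: " + sortForLocale(words, Locale.GERMAN));
    }
}
```

If no built-in locale fits, the same helper would work with a RuleBasedCollator instance in place of Collator.getInstance(locale).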

Thanks a lot, Akber Choudhry.
Ultimately I found that the ~ (tilde) character is the main culprit. To compare this set of strings, if I use the extended ASCII character 'ÿ' (the maximum value, 255), it solves the problem like a charm.
I found the extended ASCII characters at http://www.ascii-code.com/

Related

how to use cell.getArabicOption(int arabicoption)

I don't know which integer I can pass to this function,
so I have a problem showing Arabic text in the PDF I am creating.
I use the iText library, which provides this function.
If someone knows how to use it, please let me know.
You shouldn't use int values such as 0, 1, 2,... in your code as it will be very hard for people to know what these values mean (just like you currently have no idea which options are available).
Instead you should use constants that are provided by iText. The API documentation informs you that the parameters for the setArabicOptions() method can be a combination of:
ColumnText.AR_NOVOWEL: Eliminate the arabic vowels,
ColumnText.AR_COMPOSEDTASHKEEL: Compose the tashkeel in the ligatures, or
ColumnText.AR_LIG: Do some extra double ligatures.
If you want to know which exact int values correspond with these constants, you can always print them out or look inside the code, but there is no reason to do this.
The different values are actually to be used as flags (or bits). You can combine these values like this:
column.setArabicOptions(
ColumnText.AR_NOVOWEL |
ColumnText.AR_COMPOSEDTASHKEEL |
ColumnText.AR_LIG);
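To see why the constants can be combined with |, here is a small self-contained sketch of the bit-flag pattern. The constant names and values below are placeholders invented for illustration; they are not iText's actual constants, and the isSet helper is hypothetical.

```java
// Generic illustration of option flags; nothing here comes from iText.
final class FlagSketch {

    // Placeholder values: each flag occupies its own bit.
    static final int NO_VOWEL = 1;               // binary 001
    static final int COMPOSED_TASHKEEL = 1 << 1; // binary 010
    static final int EXTRA_LIGATURES = 1 << 2;   // binary 100

    // True when the given flag's bit is present in the combined options value.
    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    public static void main(String[] args) {
        // Bitwise OR merges the flags into a single int.
        int options = NO_VOWEL | EXTRA_LIGATURES;

        System.out.println(isSet(options, NO_VOWEL));          // true
        System.out.println(isSet(options, COMPOSED_TASHKEEL)); // false
        System.out.println(isSet(options, EXTRA_LIGATURES));   // true
    }
}
```

Because each flag has its own bit, the receiving method can test for every option independently, which is why a single int parameter is enough.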

Unexpected results from Metaphone algorithm

I am using phonetic matching for different words in Java. I used Soundex, but it is too crude. I switched to Metaphone and found it to be better. However, when I tested it rigorously, I found some weird behaviour. I want to ask whether that is how Metaphone works or whether I am using it the wrong way. The following example illustrates the problem:
Metaphone meta = new Metaphone();
if (meta.isMetaphoneEqual("cricket","criket")) System.out.println("Match 1");
if (meta.isMetaphoneEqual("cricket","criketgame")) System.out.println("Match 2");
This prints:
Match 1
Match 2
Now "cricket" does sound like "criket", but how come "cricket" and "criketgame" are treated as the same? If someone could explain this, it would be a great help.
Your usage is slightly incorrect. A quick look at the encoded strings and the default maximum code length shows that the latter is 4, which truncates the end of the longer "criketgame":
System.out.println(meta.getMaxCodeLen());
System.out.println(meta.encode("cricket"));
System.out.println(meta.encode("criket"));
System.out.println(meta.encode("criketgame"));
Output (note "criketgame" is truncated from "KRKTKM" to "KRKT", which matches "cricket"):
4
KRKT
KRKT
KRKT
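The effect of the length limit can be reproduced without the library. The sketch below only mimics the truncation step using the codes printed above; the class and method names are hypothetical, and it is not how Metaphone itself is implemented.

```java
// Illustrates why a maximum code length makes different codes compare equal.
final class CodeTruncationSketch {

    // Cuts a phonetic code down to at most maxLen characters.
    static String truncate(String code, int maxLen) {
        return code.length() > maxLen ? code.substring(0, maxLen) : code;
    }

    // Compares two codes after both have been limited to maxLen characters.
    static boolean equalAfterTruncation(String a, String b, int maxLen) {
        return truncate(a, maxLen).equals(truncate(b, maxLen));
    }

    public static void main(String[] args) {
        String cricket = "KRKT";      // code for "cricket" from the output above
        String criketgame = "KRKTKM"; // full code for "criketgame"

        System.out.println(equalAfterTruncation(cricket, criketgame, 4)); // true
        System.out.println(equalAfterTruncation(cricket, criketgame, 8)); // false
    }
}
```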
Solution: Set the maximum code length to something appropriate for your application and the expected input. For example:
meta.setMaxCodeLen(8);
System.out.println(meta.encode("cricket"));
System.out.println(meta.encode("criket"));
System.out.println(meta.encode("criketgame"));
Now outputs:
KRKT
KRKT
KRKTKM
And now your original test gives the expected results:
Metaphone meta = new Metaphone();
meta.setMaxCodeLen(8);
System.out.println(meta.isMetaphoneEqual("cricket","criket"));
System.out.println(meta.isMetaphoneEqual("cricket","criketgame"));
Printing:
true
false
As an aside, you may also want to experiment with DoubleMetaphone, which is an improved version of the algorithm.
By the way, note the caveat from the documentation regarding thread-safety:
The instance field maxCodeLen is mutable but is not volatile, and accesses are not synchronized. If an instance of the class is shared between threads, the caller needs to ensure that suitable synchronization is used to ensure safe publication of the value between threads, and must not invoke setMaxCodeLen(int) after initial setup.

SingleColumnValueFilter has no impact on result

Hi,
this question is pretty similar to "SingleColumnValueFilter not returning proper number of rows".
I use four SingleColumnValueFilters with the operator EQUAL and add them to a FilterList with the operator MUST_PASS_ONE. The number of results is the same as without the FilterList. The value to compare is a byte[] that should be correct, as I just store the values from previous results. (It is an IP address that I convert to an InetAddress, new InetAddress(value as byte[]), when retrieving the data; for the query described I just call InetAddress.getAddress, which returns a byte[].)
Do you have any idea what the problem might be? Am I using the filter incorrectly?
EDIT:
I also used the original values retrieved by the query as the value for the SingleColumnValueFilter, and there was no difference in the results, so the byte[] contents can't be the problem.
I think I can answer this myself; sorry for not debugging and checking the HBase code first.
I just checked the implementation of the comparison (which is lexicographic) and realized that the length is not taken into account; I thought shorter values would be padded with zeros, but unfortunately they are not.
The only reasonable option would be to create a custom comparator (e.g. see "How do you use a custom comparator with SingleColumnValueFilter on HBase?").
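The zero-padding point can be illustrated in plain Java. This is a generic lexicographic comparison written for this note, not HBase's code; the class and method names are hypothetical, and treating the shorter array as smaller when the common prefix matches is an assumption of the sketch.

```java
// Generic unsigned lexicographic comparison of byte arrays (not HBase code).
final class ByteCompareSketch {

    // Negative, zero or positive, like Comparator.compare.
    static int compareLexicographically(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            int a = left[i] & 0xFF; // treat bytes as unsigned, 0..255
            int b = right[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        // No padding is applied: a shorter array never equals a longer one.
        return left.length - right.length;
    }

    public static void main(String[] args) {
        byte[] shortValue = {10, 0};
        byte[] padded = {10, 0, 0, 0};

        // Non-zero: the shorter value is not implicitly filled with zeros.
        System.out.println(compareLexicographically(shortValue, padded));
        // 0: identical content and length.
        System.out.println(compareLexicographically(padded, new byte[] {10, 0, 0, 0}));
    }
}
```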

small java problem

Sorry if my question sounds dumb, but sometimes small things create big problems and take all your time to solve. Thanks to Stack Overflow, I can get advice from the gurus. :)
So here is my problem. I search for a word in a string and put a 1 where that word occurs and a 0 everywhere else.
For example: the search word is DOG and I have the string "never ever let dog bite you", so the string
would be 000100. Now when I try to convert this string into an int, it produces 100 :( which is bad. I also cannot use an int array; I can only use a string, as I am concatenating it and also using it elsewhere in the program.
Now I am sure you are wondering why I want to convert it into an int, so here is my answer. I am using 3 words for each string to make this kind of binary string. So let's say I used three search queries (dog, dog, ever); the three strings would be
000100
000100
010000
Then I want to sum them. It should produce "010200", but instead it produces "10200", which is wrong. :(
Thanks in advance.
Of course the int representation won't retain leading zeros. But you can easily convert back to a String after summing and pad the zeros on the left yourself - just store the maximum length of any string (assuming they can have different lengths). Or if you wanted to get even fancier you could use NumberFormat, but you might find this to be overkill for your needs.
Also, be careful - you will get some unexpected results with this code if any word appears in 10 or more strings.
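A minimal sketch of the "sum, then pad back to the original width" idea follows. The class and method names are made up for illustration, and the input is assumed to contain only decimal digits; the carry caveat above still applies once any position reaches 10.

```java
// Sums digit strings numerically and restores the leading zeros afterwards.
final class PaddedSumSketch {

    static String sumPadded(String... rows) {
        int width = 0;
        long total = 0;
        for (String row : rows) {
            width = Math.max(width, row.length()); // remember the longest input
            total += Long.parseLong(row);          // leading zeros are lost here
        }
        if (width == 0) {
            return ""; // no input at all (or only empty varargs)
        }
        // Left-pad with zeros up to the remembered width, e.g. "%06d".
        return String.format("%0" + width + "d", total);
    }

    public static void main(String[] args) {
        System.out.println(sumPadded("000100", "000100", "010000")); // 010200
    }
}
```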
Looks like you might want to investigate java.util.BitSet.
You could prefix your value with a '1', which would preserve your leading 0's. You can then take that prefix into account when you do your sum at the end.
All of that assumes you work around the overflow issue at 10 that was mentioned in another comment.
Could you store it as a character array instead? You're using an int, which is fine, but you don't really want an int: you want each position to represent a word in the string, turned on or off (1 or 0). Storing them in a character array seems to make more sense.
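One way to act on the per-position idea is to keep a separate counter for every word position, which also sidesteps the carry problem at 10. This is only a sketch: the names are hypothetical, the rows are assumed to contain only the characters '0' and '1', and shorter rows are aligned on the left.

```java
import java.util.Arrays;

// Counts, for each word position, how many rows have a '1' there.
final class PositionCountSketch {

    static int[] countPerPosition(String... rows) {
        int width = 0;
        for (String row : rows) {
            width = Math.max(width, row.length());
        }
        int[] counts = new int[width];
        for (String row : rows) {
            for (int i = 0; i < row.length(); i++) {
                counts[i] += row.charAt(i) - '0'; // '0' adds 0, '1' adds 1
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerPosition("000100", "000100", "010000");
        System.out.println(Arrays.toString(counts)); // [0, 1, 0, 2, 0, 0]
    }
}
```

Since each position has its own int, a word that appears in 10 or more strings simply yields a count of 10 or more in that slot instead of spilling into the neighbouring position.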

Java simple String diff util

I'm looking for a simple Java library or source to highlight the differences between two Strings, case-sensitively. HTML output would be great, but I would be happy just to get the indexes of the differences, something like:
diff("abcd","aacd")
> [2,2]
diff("maniac", "brainiac")
> ["man","brain"] or [0,3] or something like that
The idea is to highlight typos and the like in a Swing program, since the input should follow strict conventions.
Apache Commons Lang has a class called StringUtils with both difference and indexOfDifference methods, which should fulfill your needs.
http://commons.apache.org/lang/
Check it out
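If adding a dependency is not an option, the index of the first difference is easy to compute by hand. The sketch below is a hypothetical stand-in written for this note, not the Commons Lang source; it returns a 0-based index, and -1 when the strings are equal.

```java
// Finds the first position at which two strings differ (case-sensitive).
final class FirstDifferenceSketch {

    static int firstDifference(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same prefix: equal strings give -1, otherwise the shorter length.
        return a.length() == b.length() ? -1 : common;
    }

    public static void main(String[] args) {
        System.out.println(firstDifference("abcd", "aacd"));       // 1
        System.out.println(firstDifference("maniac", "brainiac")); // 0
        System.out.println(firstDifference("abc", "abc"));         // -1
    }
}
```

In a Swing text component, the returned index could serve as the start offset for a highlight, with the end offset left to whatever convention the input is supposed to follow.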
The java-diff project might also be useful.
This is an implementation of the longest common subsequences (LCS) algorithm for Java. The Diff#diff() method returns a list of Difference objects, each of which describes an addition, a deletion, or a change between the two collections.
