Representing a string uniquely using hashCode [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I am trying to create a ZooKeeper node with Unix paths as values (like /x/home/rrs/data0), but this is not allowed.
So I thought of generating a hash code of the path and then using it to create the node.
But I read the following about hash codes:
Hash codes should not be used in a distributed application.
There might be collisions. For example, the Strings "Aa" and "BB" produce the same hashCode: 2112.
Should I go ahead with the hash code, or what other options do I have for my use case?
Also, if I keep the string the same all the time, is it guaranteed to generate the same hashCode every time?

Yes, the same string will always generate the same hash-code.
Hash codes do collide. The idea is that the chance of two different strings colliding is small, but it is not zero, so your application should be able to recover from a collision (or at least not break).
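To make the collision concrete: the Javadoc of String.hashCode() specifies the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in int arithmetic, which is also why the result is the same on every JVM. The sketch below re-implements that formula (`javaStringHash` is a hypothetical helper name, not a JDK method) and checks the "Aa"/"BB" pair from the question.

```java
public class HashCollisionDemo {
    // Re-implements the formula documented for String.hashCode():
    // h = 31 * h + c for each char, with int overflow wrapping around.
    static int javaStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'A' = 65, 'a' = 97: 65 * 31 + 97 = 2112
        // 'B' = 66:           66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(javaStringHash("Aa")); // 2112
        System.out.println("Aa".equals("BB"));    // false
    }
}
```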
What is the nature of the strings? Are they only letters? What is their maximum length? These properties can be used to generate a better hash code. One of the nicest techniques I know of is Zobrist keys; depending on the nature of your strings, this may be an option.
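A minimal sketch of Zobrist-style hashing for strings, assuming (hypothetically) ASCII-only strings of at most 64 characters: every (position, character) pair gets a random 64-bit key generated from a fixed seed, and a string's hash is the XOR of the keys for its characters. The fixed seed is an assumption that keeps hashes identical on every machine. The result is wider than a 32-bit hashCode, but collisions remain possible.

```java
import java.util.Random;

public class ZobristStringHash {
    // Hypothetical limits: ASCII strings of at most 64 characters.
    static final int MAX_LENGTH = 64;
    static final int ALPHABET = 128;

    private final long[][] table = new long[MAX_LENGTH][ALPHABET];

    // java.util.Random's algorithm is fixed by its specification, so the same
    // seed yields the same table (and therefore the same hashes) on every JVM.
    ZobristStringHash(long seed) {
        Random rnd = new Random(seed);
        for (int pos = 0; pos < MAX_LENGTH; pos++) {
            for (int c = 0; c < ALPHABET; c++) {
                table[pos][c] = rnd.nextLong();
            }
        }
    }

    // XOR together one random key per (position, character) pair.
    long hash(String s) {
        if (s.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("string longer than " + MAX_LENGTH);
        }
        long h = 0L;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= ALPHABET) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            h ^= table[i][c];
        }
        return h;
    }

    public static void main(String[] args) {
        ZobristStringHash z = new ZobristStringHash(42L);
        System.out.println(Long.toHexString(z.hash("/x/home/rrs/data0")));
    }
}
```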

This depends on what you are trying to do.
But you're right: Java hashCodes are not designed to be collision free.
If you need some kind of unique identifier, you could apply a cryptographic hash function (e.g. SHA-256 or MD5) to your string. Prefer SHA-256: MD5 has known practical collisions.
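A minimal sketch of that approach with the JDK's MessageDigest (the class and method names are hypothetical). The hex digest contains no '/', which is the path separator in ZooKeeper, so it can serve as a single node name. Because the hash is one-way, one option (an assumption about your use case) is to keep the original path in the node's data.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PathDigest {
    // Returns the SHA-256 digest of the UTF-8 bytes of s as 64 lowercase hex chars.
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/x/home/rrs/data0";
        System.out.println(sha256Hex(path));
    }
}
```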
If your only problem is some characters in the string, just replace them, e.g. with an underscore.
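A sketch of the replacement idea (names are hypothetical). Plain replacement is not reversible and can map two paths to the same name, so the block also shows percent-encoding with URLEncoder as an alternative that keeps distinct paths distinct and can be decoded again. The Charset overloads used here need Java 10 or later, and the sketch assumes '%' is acceptable in your node names.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class NodeNames {
    // Simple replacement: not reversible, and "/a_b" and "/a/b"
    // both become "_a_b", so two paths can collide.
    static String underscored(String path) {
        return path.replace('/', '_');
    }

    // Reversible alternative: percent-encoding turns '/' into "%2F"
    // and '%' into "%25", so distinct paths give distinct names.
    static String encoded(String path) {
        return URLEncoder.encode(path, StandardCharsets.UTF_8);
    }

    static String decoded(String name) {
        return URLDecoder.decode(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String path = "/x/home/rrs/data0";
        System.out.println(underscored(path)); // _x_home_rrs_data0
        System.out.println(encoded(path));     // %2Fx%2Fhome%2Frrs%2Fdata0
    }
}
```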
Depending on what ZooKeeper is used for here, hashCode may not be a problem at all. Ehcache uses it, and it works perfectly well there for distributed hash tables.
It's a pity, but String's hashCode really does generate the same hash code every time for the same string. This is because the algorithm is documented and therefore cannot be changed. (But note: this doesn't cover different Unicode representations of the same text, such as precomposed versus decomposed accented characters. But I think this is not a problem here.)
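A small illustration of the Unicode caveat, using java.text.Normalizer (the class and helper names are hypothetical). Both strings render as an accented "e", yet they are different char sequences with different hash codes; normalizing to NFC before hashing makes them agree.

```java
import java.text.Normalizer;

public class NormalizedHash {
    // Hash the NFC-normalized form, so canonically equivalent strings agree.
    static int normalizedHash(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC).hashCode();
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9"; // e-acute as one code point (U+00E9)
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT (U+0301)
        System.out.println(precomposed.hashCode()); // 233
        System.out.println(decomposed.hashCode());  // 3900 (101 * 31 + 769)
        System.out.println(normalizedHash(precomposed) == normalizedHash(decomposed)); // true
    }
}
```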

Related

Get a String value from its hashCode [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 10 months ago.
Is there an easy way to get the value of a string from its hash code? Let me explain:
I have a string variable whose value is "Hello Guys". It is saved here: String#dye63g. Now I want to get the string value "Hello Guys" back from String#dye63g.

Based on your comments on this question, your question is REALLY asking:
For an object whose class does not override the default toString() from java.lang.Object itself, can I derive the value again?
The answer is a resounding no. It's NOT A HASH of the value. It's the fully qualified class name, a separator (Object.toString() uses @; the question shows #), and then, in hex digits (so 0-9a-f, not y and g as in your example), the object's hashCode(), which for a class that doesn't override hashCode() is the value returned by System.identityHashCode(obj).
This code has no relationship at all with the object's value. Historically it was often derived from the memory address; current HotSpot JVMs generate a pseudo-random number by default.
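To see where a string like String#dye63g comes from, this sketch rebuilds Object.toString() by hand for a class that overrides neither toString() nor hashCode() (the class names are hypothetical). The Object.toString() Javadoc defines the result as getClass().getName() + "@" + Integer.toHexString(hashCode()); nothing in it depends on the object's fields.

```java
public class IdentityToString {
    // A class that overrides neither toString() nor hashCode().
    static class Plain { }

    // Rebuilds what Object.toString() produces for such a class.
    static String expectedToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        Plain p = new Plain();
        System.out.println(p); // "IdentityToString$Plain@" followed by hex digits
        System.out.println(p.toString().equals(expectedToString(p))); // true
    }
}
```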
You can trivially test this:
```java
class Test {
    public static void main(String[] args) {
        System.out.println(System.identityHashCode("Hey, there!"));
    }
}
```
Run that a few times. The same value might come out (it depends on your system and VM implementation), but now run the same code on a different VM, or on a different computer, or reboot and run it again. Different values come out. That should make it rather obvious that you can't.
A second issue is the pigeonhole principle. That hash value is always exactly 32 bits long. Why? Well, System.identityHashCode returns an int, and ints only have 32 bits. Each hex digit (0-9a-f) represents 4 bits, which is why the part after the # is at most 8 characters long: 8 * 4 = 32. (It can be shorter, because leading zeros are dropped.)
You can test that too. Copy/paste the collected works of Shakespeare into a single gigantic string and invoke identityHashCode on that: it's still just a short number.
An int can represent about 4.3 billion (2^32) different values, and that's it. However, there are far more than 4.3 billion possible strings (infinitely many, in fact). Hence there must be infinitely many different strings that all happen to hash to, say, 1234abcd. Thus, given the hash 1234abcd, it is impossible to say which string produced it: infinitely many strings hash to that value. And that's presuming the hashing is 'constant' (the same string always results in the same hash, which is what String's own .hashCode() method does). System.identityHashCode isn't even that; it has no relationship at all to the actual value. The number you get depends, more or less, on the state of the system and the JVM at the time, not on the string.
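The pigeonhole argument also applies to the 'constant' String.hashCode(). Since hash(s + t) = hash(s) * 31^t.length() + hash(t) in int arithmetic, any two equal-length strings with the same hash code can be used interchangeably as building blocks. This sketch (the class name is hypothetical) builds 2^k distinct strings out of "Aa" and "BB" blocks that all share one hash code, so a hash code alone cannot identify its string.

```java
import java.util.ArrayList;
import java.util.List;

public class CollidingStrings {
    // Builds all 2^k strings made of k blocks, each block "Aa" or "BB".
    // "Aa" and "BB" have the same hashCode and the same length, so every
    // such concatenation has the same hashCode as well.
    static List<String> collidingFamily(int k) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < k; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : collidingFamily(3)) { // 8 distinct strings
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```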

How to make a simple public-key cryptographic algorithm? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I want to make a simple public-key (asymmetric) encryption scheme. It doesn't have to be secure; I just want to understand the concepts behind it. For instance, I know simple symmetric ciphers can be made with XOR. I saw in a thread on Stack Exchange that you need to use trapdoor functions, but I can't find much about them. I'd like to take a group of bytes and split them somehow into a public/private key pair. I get the idea of a shared secret. Say I generate the random number 256 (not random at all :P) and split it into 200 and 56. If I XOR with 200, I can only decrypt with 200. I want to be able to split numbers randomly like this so that encryption and decryption are asymmetric.
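For contrast, the symmetric XOR cipher the question mentions can be sketched like this (toy code, not secure; the names are hypothetical). The same key byte both encrypts and decrypts, which is exactly what an asymmetric scheme avoids.

```java
import java.nio.charset.StandardCharsets;

public class XorCipher {
    // XOR every byte with the same key byte. Because (b ^ k) ^ k == b,
    // running the function again with the same key decrypts.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    public static void main(String[] args) {
        byte key = (byte) 200; // the shared secret from the question
        byte[] message = "Hello".getBytes(StandardCharsets.US_ASCII);
        byte[] encrypted = xor(message, key);
        byte[] decrypted = xor(encrypted, key);
        System.out.println(new String(decrypted, StandardCharsets.US_ASCII)); // Hello
    }
}
```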

OK, here's just a simple demo idea, based on addition and the modulo operation.
Let's say we have a modulus, 256 in our example. This is a publicly known, common value.
Let's say you generate a random secret private key in the interval [1, 255], for example pri = 133.
Keep the secret key in your pocket.
Generate a public key, pub = 256 - pri = 123. You can share this public key (123) with the world.
Imagine that a third party does not know how to compute the private key from the public one, so they know only the public key (123).
Someone from the public wants to send you an encrypted ASCII byte. They take their byte and add the public key to it, modulo 256:
encrypted = (input_value + pub) % modulo;
For example, I want to send you the letter "X", ASCII code = 88 in encrypted form.
So, I compute:
(88 + 123) % 256 = 211;
I am sending you the value 211 - encrypted byte.
You decrypt it by the same scheme with your private key:
decrypted = (input_value + pri) % 256 = (211 + 133) % 256 = 88;
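The toy scheme above translates directly into Java. This is a sketch with hypothetical names; like the scheme itself, it is trivially breakable, since pri = 256 - pub.

```java
public class ToyModularScheme {
    static final int MODULUS = 256; // public, shared value

    // pub = 256 - pri, so (x + pub + pri) % 256 == x
    static int publicKeyFor(int pri) {
        return MODULUS - pri;
    }

    static int encrypt(int plain, int pub) {
        return (plain + pub) % MODULUS;
    }

    static int decrypt(int cipher, int pri) {
        return (cipher + pri) % MODULUS;
    }

    public static void main(String[] args) {
        int pri = 133;
        int pub = publicKeyFor(pri);             // 123
        int encrypted = encrypt('X', pub);       // (88 + 123) % 256 = 211
        int decrypted = decrypt(encrypted, pri); // (211 + 133) % 256 = 88
        System.out.println(pub + " " + encrypted + " " + (char) decrypted); // 123 211 X
        // The weakness: anyone can derive the private key from the public one.
        System.out.println(MODULUS - pub);       // 133
    }
}
```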
Of course, the simple key-pair generation in this example is weak, because the algorithm for deriving the private key from the public one is well known: anybody can easily recover the private key from the modulus and the public key (pri = 256 - pub).
In real public-key cryptography, no efficient algorithm for deriving the private key from the public key is known. Theoretically, though, one could be discovered in the future.

This is an area of pure mathematics. There's a book called "The Mathematics of Ciphers"; it's quite short but a good introduction. I do suggest you stay away from implementing your own, though, especially in Java (you want a compiler that targets a real machine for the kind of maths involved, and optimises accordingly). You should ask about this on the Mathematics or Computer Science Stack Exchange sites.
I did get a downvote, so I want to clarify. I'm not being heartless, but cyphers are firmly in the domain of mathematics, not programming (even if it is discrete maths, or the mathsy side of comp-sci). They require a good understanding of algebraic structures and some statistics. It's certainly a fascinating area, and I encourage you to read about it. I do mean the above, though: don't use anything you make. The people who "invent" these cyphers have forgotten more than you or I know; at most, implement exactly what they specify. In Java you ought to expect really poor throughput, by the way. Optimisations involving register pressure and allocation pay huge dividends in cypher throughput, and Java bytecode is stack-based, for starters.
Addendum (circa 6 years on)
Java has improved in some areas since then (I have a compiler fetish, it's proper weird). Looking back, though, I was right but for sort-of the wrong reasons: Java is much easier to attack through timing. I've seen some clever attacks that rely on the behaviour of tracing compilers to work out which version of a piece of software is being used, for example. It's also really hard to deal with Spectre, which isn't going away any time soon (I like caches.... I feel dirty saying that now).
HOWEVER, above all: don't do this yourself! Toy with it AT MOST. It's very much in the domain of mathematics, and I must say it's probably better done on paper, unless you like admiring a terminal with digits spewed all over it.

RSA (http://en.wikipedia.org/wiki/RSA_(algorithm)) is the standard one, on which much of the Internet's security is based.
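For comparison, textbook RSA with the tiny primes 61 and 53 (a small worked example commonly used to teach it) can be sketched with java.math.BigInteger. This is for understanding only: real RSA uses large random primes and a padding scheme such as OAEP, and unpadded "textbook" RSA is insecure. The class name is hypothetical.

```java
import java.math.BigInteger;

public class TextbookRsa {
    // Tiny textbook primes; real RSA keys use primes of 1024 bits or more.
    static final BigInteger P = BigInteger.valueOf(61);
    static final BigInteger Q = BigInteger.valueOf(53);
    static final BigInteger N = P.multiply(Q); // 3233, public modulus
    static final BigInteger PHI =
            P.subtract(BigInteger.ONE).multiply(Q.subtract(BigInteger.ONE)); // 3120
    static final BigInteger E = BigInteger.valueOf(17); // public exponent
    static final BigInteger D = E.modInverse(PHI);      // 2753, private exponent

    // Encryption uses only the public pair (E, N).
    static BigInteger encrypt(BigInteger m) {
        return m.modPow(E, N);
    }

    // Decryption needs D; as far as anyone knows, computing D from (E, N)
    // for large keys requires factoring N (the "trapdoor").
    static BigInteger decrypt(BigInteger c) {
        return c.modPow(D, N);
    }

    public static void main(String[] args) {
        BigInteger m = BigInteger.valueOf(65); // message must be smaller than N
        BigInteger c = encrypt(m);
        System.out.println(c + " -> " + decrypt(c)); // 2790 -> 65
    }
}
```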

What is difference between Java's int[] and i32 in Apache Thrift [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Can anyone help me understand the difference between int[] in Java and i32 (NumericDataArray) in Apache Thrift, and when the latter should be used? Also, is there anything like i32 for strings?

From Thrift Types
Base Types
The base types were selected with the goal of simplicity and clarity rather than abundance, focusing on the key types available in all programming languages.
bool: A boolean value (true or false)
byte: An 8-bit signed integer
i16: A 16-bit signed integer
i32: A 32-bit signed integer
i64: A 64-bit signed integer
double: A 64-bit floating point number
string: A text string encoded using UTF-8 encoding
So i32 is a 32-bit signed integer, which is mapped to Java int.
Thrift has no arrays, but it has container types:
Containers
Thrift containers are strongly typed containers that map to commonly used and commonly available container types in most programming languages.
There are three container types:
list: An ordered list of elements. Translates to an STL vector, Java ArrayList, native arrays in scripting languages, etc.
set: An unordered set of unique elements. Translates to an STL set, Java HashSet, set in Python, etc. Note: PHP does not support sets, so it is treated similar to a List
map: A map of strictly unique keys to values. Translates to an STL map, Java HashMap, PHP associative array, Python/Ruby dictionary, etc.
While defaults are provided, the type mappings are not explicitly fixed. Custom code generator directives have been added to allow substitution of custom types in various destination languages.
Container elements may be of any valid Thrift Type.
These containers are mapped to the corresponding Java List, Set and Map.
So if you are using Thrift and need to transport a collection of int values, you will use a list container of i32, i.e. list<i32>, which results in a java.util.ArrayList<Integer> on the Java side.
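If your existing code works with int[], a small bridge to and from the List<Integer> that a list<i32> field maps to might look like this. The IDL line in the comment and the helper names are hypothetical, and the Thrift-generated class itself is not shown, since its exact shape depends on your IDL and Thrift version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Assumed IDL (hypothetical): struct Data { 1: list<i32> values }
public class ThriftListConversion {
    // int[] -> List<Integer>, e.g. before setting a field generated from list<i32>.
    static List<Integer> toList(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    // List<Integer> -> int[], e.g. after reading such a field.
    static int[] toArray(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] raw = {1, 2, 3};
        List<Integer> forThrift = toList(raw);
        int[] back = toArray(forThrift);
        System.out.println(forThrift + " " + Arrays.toString(back)); // [1, 2, 3] [1, 2, 3]
    }
}
```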
No need to worry about String: there is a base type string in Thrift, which is mapped to java.lang.String. So you just define a string in Thrift and you will have a java.lang.String in your generated Java code.

An int[] is a simple array, normally used for simple things, e.g. storing temporary data.
i32 comes from a third-party library (Apache Thrift), as William said.
If you do not have a very specific need, I recommend you use an int[], but if you need to use i32, take a look at this link: http://people.apache.org/~thejas/thrift-0.9/javadoc/org/apache/thrift/protocol/TType.html

What Java libraries provide the facility to generate a unique random string from a given set of characters?

Say I have this set of characters: [a-zA-Z0-9]
And I need to generate a 4-character string from this set that is less likely to collide.

Apache Commons Lang has a RandomStringUtils class with a method that takes a sequence of characters and a count, and does what you ask. It makes no guarantee of collision avoidance, though, and with only 4 characters, you're going to struggle to achieve that.
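With Commons Lang on the classpath, the method meant here is the RandomStringUtils.random(int count, String chars) overload. A JDK-only sketch of the same idea using java.security.SecureRandom looks like this (the class and method names are hypothetical); it likewise makes no attempt to avoid collisions.

```java
import java.security.SecureRandom;

public class RandomCode {
    static final String CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Picks `length` characters uniformly at random from `chars`.
    // Nothing here prevents two calls from returning the same string.
    static String randomString(int length, String chars) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(chars.charAt(RANDOM.nextInt(chars.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomString(4, CHARS));
    }
}
```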

And I need to generate a 4-character string from this set that is less likely to collide.
Less likely than what? There are 62^4 = 14,776,336 (about 14.8 million) such strings. Due to the birthday paradox, you get about a 50% chance of a collision if you randomly generate roughly 4,500 of them (about 1.18 * sqrt(62^4) = 1.18 * 3,844). If that's not acceptable, no library will help you: you need to use a longer string or establish uniqueness explicitly (e.g. by incrementing an integer and formatting it in base 62).
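Both points can be checked with a short sketch (class and method names are hypothetical). collisionProbability multiplies out the exact birthday-paradox product, and toBase62 formats a counter as a fixed-width base-62 string, which guarantees uniqueness for the first 62^4 values but, unlike random strings, is predictable. The alphabet order is an arbitrary choice.

```java
public class ShortIds {
    static final String BASE62 =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Probability that k strings drawn uniformly from n possibilities are not all distinct:
    // 1 - (1 - 0/n)(1 - 1/n)...(1 - (k-1)/n).
    static double collisionProbability(long k, long n) {
        double allDistinct = 1.0;
        for (long i = 0; i < k; i++) {
            allDistinct *= 1.0 - (double) i / n;
        }
        return 1.0 - allDistinct;
    }

    // Explicit uniqueness: format a counter in base 62, left-padded to `width` chars.
    static String toBase62(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value");
        }
        char[] out = new char[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = BASE62.charAt((int) (value % 62));
            value /= 62;
        }
        if (value != 0) {
            throw new IllegalArgumentException("does not fit in " + width + " chars");
        }
        return new String(out);
    }

    public static void main(String[] args) {
        long n = 62L * 62 * 62 * 62; // 14,776,336 four-character strings
        System.out.printf("3800 strings: %.2f%n", collisionProbability(3800, n));
        System.out.printf("4500 strings: %.2f%n", collisionProbability(4500, n));
        System.out.println(toBase62(0, 4) + " " + toBase62(62, 4)); // 0000 0010
    }
}
```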

If you'd be OK with a longer hash, you'd certainly be able to find some MD5 libraries. MD5 is the most common choice for this kind of task. A lot of web sites have used it to generate password hashes, although it is no longer considered safe for that.

Effective way to handle singular/plural word based on some collection size [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
There are many places in my work projects where I need to display the size of some collection in a sentence. For example, if the collection's size is 5, it will say "5 users". If the size is 1 or 0, it will say "1 user" or "0 user". Right now I'm doing it with if-else statements to decide whether to print the "s" or not, which is tedious.
I'm wondering if there's an open source JSP custom tag library that allows me to accomplish this. I know I can write one myself... basically, it will have 2 parameters like this: <lib:display word="user" collection="userList" />. Depending on the collection size, it will determine whether to append an "s" or not. But then, this implementation is not going to be too robust because I also need to handle "ies" and some words don't use any of those. So, instead of creating a half-baked tool, I'm hoping there's a more robust library I could utilize right away. I'm not too worried about prefixing the word with is/are in this case.
I use Java, by the way.
Thanks much.

Take a look at inflector, a Java project which lets you do Noun.pluralOf("user"), or Noun.pluralOf("user", userList.size()), and which handles a bunch of variations and unusual cases (person->people, loaf->loaves, etc.), as well as letting you define custom mapping rules when necessary.

Hmm, I don't quite see why you need a library for this. I would think the function to do it is trivial:
```java
public String singlePlural(int count, String singular, String plural)
{
    return count == 1 ? singular : plural;
}
```
Calls would look like:
```java
singlePlural(count, "user", "users");
singlePlural(count, "baby", "babies");
singlePlural(count, "person", "people");
singlePlural(count, "cherub", "cherubim");
// ... etc ...
```
Maybe this library does a whole bunch of other things that make it useful. I suppose you could say that it supplies a dictionary of what all the plural forms are, but in any given program you don't care about the plurals of all the words in the language, just the ones you are using in this program. I guess if the word that could be singular or plural is not known at compile time, if it's something entered by the user, then I'd want a third party dictionary rather than trying to build one myself.
Edit
Suddenly it occurs to me that what you were looking for was a function for making plurals generically, embodying a set of rules like "normally just add 's', but if the word ends in 'y' change the 'y' to 'ies', if it ends in 's' change it to 'ses', ..." etc. I think in English that would be impossible for any practical purpose: there are too many special cases, like "person/people" and "child/children" etc. I think the best you could do would be to have a generic "add an 's'" rule, maybe a few other common cases, and then a long list of exceptions. Perhaps in other languages one could come up with a fairly simple rule.
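A minimal sketch of that "generic rule plus exceptions" idea. The rules and the exception map are illustrative assumptions that cover only a few English cases, not a complete grammar, and the names are hypothetical. countNoun uses the same count == 1 test as singlePlural. Map.of needs Java 9 or later.

```java
import java.util.Map;

public class SimplePluralizer {
    // A few irregular forms; a real list would be much longer.
    private static final Map<String, String> EXCEPTIONS = Map.of(
            "person", "people",
            "child", "children",
            "cherub", "cherubim",
            "loaf", "loaves");

    static String pluralize(String singular) {
        String irregular = EXCEPTIONS.get(singular);
        if (irregular != null) {
            return irregular;
        }
        int len = singular.length();
        if (len > 1 && singular.endsWith("y") && "aeiou".indexOf(singular.charAt(len - 2)) < 0) {
            // consonant + "y": baby -> babies (but day -> days)
            return singular.substring(0, len - 1) + "ies";
        }
        if (singular.endsWith("s") || singular.endsWith("x") || singular.endsWith("z")
                || singular.endsWith("ch") || singular.endsWith("sh")) {
            return singular + "es"; // bus -> buses, box -> boxes
        }
        return singular + "s"; // the generic "add an 's'" rule
    }

    static String countNoun(int count, String singular) {
        return count + " " + (count == 1 ? singular : pluralize(singular));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"user", "baby", "day", "bus", "person"}) {
            System.out.println(countNoun(2, w));
        }
    }
}
```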
So as I say, if the word is not known at compile time but comes from some user input, then yes, a third-party dictionary is highly desirable.

This gets complicated in languages other than English, which inflector aims to support in the future.
I am familiar with Czech where user = uživatel and:
1 uživatel
2 uživatelé
3 uživatelé
4 uživatelé
5 uživatelů
...
You can see why programs written with hardcoded singular+plural would get un-i18n-able.
Edit:
Java's ChoiceFormat (in java.text since JDK 1.1, so also available in Java 11) lets you do the following:
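```java
import java.text.ChoiceFormat;

// 1 -> uživatel, more than 1 (up to 4) -> uživatelé, more than 4 -> uživatelů
ChoiceFormat fmt = new ChoiceFormat("1#uživatel|1<uživatelé|4<uživatelů");
System.out.println(fmt.format(1)); // uživatel
System.out.println(fmt.format(4)); // uživatelé
System.out.println(fmt.format(5)); // uživatelů
```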
ChoiceFormat documentation
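ChoiceFormat can also be embedded in a java.text.MessageFormat pattern, which keeps the number and the word together in one localizable string (the MessageFormat Javadoc shows a similar file-count example). In this sketch the pattern constants and the helper name are hypothetical, and the Czech forms are written with Unicode escapes so the source compiles regardless of the file encoding.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class PluralMessages {
    // English: the singular only for exactly 1 ("1<" means "greater than 1").
    static final String EN = "{0,number,integer} {0,choice,0#users|1#user|1<users}";

    // Czech: 1 -> uzivatel, 2-4 -> uzivatele, 0 and 5+ -> uzivatelu (accents escaped).
    static final String CS = "{0,number,integer} "
            + "{0,choice,0#u\u017Eivatel\u016F|1#u\u017Eivatel|1<u\u017Eivatel\u00E9|4<u\u017Eivatel\u016F}";

    static String format(String pattern, Locale locale, int count) {
        return new MessageFormat(pattern, locale).format(new Object[] {count});
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 5}) {
            System.out.println(format(EN, Locale.ENGLISH, n) + " / "
                    + format(CS, Locale.forLanguageTag("cs"), n));
        }
    }
}
```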

This functionality is built into Ruby on Rails. I don't know exactly where, but it should be easy enough to find in the source code, and then you could simply crib the code.
EDIT: Found you some code:
inflector.rb (very helpful comments!)
inflections.rb (extensive word list)
If I remember correctly, it's mainly a matter of appending an "s" to most words, though I believe there is a list (probably hash, err dictionary) of some common exceptions. Notable is the conversion from "person" to "people" :)
You would of course be in for a world of pain if you decided you want to internationalize this to other languages than English. Welcome to the world of highly irregular grammars, and good luck!
