I have to use RenderedImages as parts of keys in a cache (the rest of the key is made up of an X and Y coordinate pair).
Previously, I wrapped the RenderedImage in a custom class with a string field filename and just used the class's filename/toString() to construct a key, but it turns out the keys absolutely must use RenderedImages.
Now, I'm writing the image to a ByteArrayOutputStream and then using that to make a base64 string using DatatypeConverter, but the resulting string from that is massive and really slows down the program.
Is there a good method of creating some kind of string ID from a RenderedImage that doesn't slow everything down too much?
Thanks.
You could take the byte contents of the image and create an SHA digest or MD5 checksum of them. That's going to be effectively unique for distinct images.
See https://howtodoinjava.com/core-java/io/how-to-generate-sha-or-md5-file-checksum-hash-in-java/
If scanning the whole image takes too long, you could hash just the first n bytes.
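A minimal sketch of that idea, assuming the image is backed by a byte DataBuffer (the helper name imageKey is made up for illustration; adjust the cast for int-backed images):

import java.awt.image.DataBufferByte;
import java.awt.image.RenderedImage;
import java.security.MessageDigest;

static String imageKey(RenderedImage img) throws Exception {
    // Digest the raw pixel bytes instead of Base64-encoding the whole image
    byte[] pixels = ((DataBufferByte) img.getData().getDataBuffer()).getData();
    byte[] digest = MessageDigest.getInstance("MD5").digest(pixels);
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
        hex.append(String.format("%02x", b));
    }
    return hex.toString(); // fixed 32-character key
}

The key stays 32 characters no matter how large the image is, so it won't bloat the cache keys the way the Base64 dump does.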
I need some advice. I've built a tool that performs image operations on uploaded pictures and saves the results. Each time an operation is done, it creates an entry in a JSON file in the picture's folder.
So if there is no JSON file it creates a new one, and if there is one it appends the information. The problem is that if someone accidentally adds an image that has already been added, the JSON appends the information again.
It's too much code to post here, but I would be thankful if someone has advice on how to compare the files before appending, or another approach.
"Its too much code to post here", so I am going just to lay down the approach fundamentals :) .
An in memory solution would work if you never use the same folder again after exiting the application. If you count duplicates by a property like picture name, then you can go for a simple solution like a HashMap.
HashMap<String, Boolean> hashMapImages = ...
String newImageName = ...

if (!hashMapImages.containsKey(newImageName)) {
    hashMapImages.put(newImageName, true);
    // ... append to JSON
}
If you actually want pictures that are distinct in content, you have to design a hash function for your images. As an example, you could use a hash function that sums every 256th pixel value. For typical images that is enough to get a distinctive hash value:
int pos = 0;
long hashsum = 0;
while (pos < image.length) {
    hashsum += image[pos];
    pos += 256;
}
long hashKey = hashsum % 65536; // for a 16-bit key
If you plan to reuse the same folder again, construct an additional JSON file that contains just the key values (whichever key you choose to use). Parse this JSON file and check whether you already have the image before appending:
HashMap<String, Boolean> hashMapImages = loadFromJSONContentFolder();
String key = getKey();

if (!hashMapImages.containsKey(key)) {
    hashMapImages.put(key, true);
    // ... append to JSON
}
I know how to generate an MD5 hash for a single string, but what if I want to generate it like this:
String hash1 = GenHash(username:realm:password);
where the username and realm are fixed and the password is read from a text file containing many words, and I want to convert each of the words into an MD5 hash.
How can I do this?
The reason why Neil is so adamant about not using MD5 (and he's right, by the way) is that MD5 is vulnerable to attack. This means that passwords could be recovered if an attacker finds the hash output. Since hashing passwords is generally done so that the preimage (i.e. the password) won't be revealed if the hash is found, you are strongly (oh so strongly) advised against using MD5. (In fact, if you're in business and you use MD5 for passwords, have fun with the negligence lawsuit. [Not legal advice]) Neil recommends some good hash algorithms. (Details of why MD5 is vulnerable are far beyond the scope of this question.)
Okay. Your question is how you hash multiple things together.
To hash multiple things together, you generally either concatenate them or feed them to the hash in order without calling digest() in between. (The order does matter, by the way.) In your case, since you have a wrapper around the hashing function, you would merely concatenate the strings before hashing them:
String username = "myName";
String realm = "Narnia";
String password = "secret";
String hash = GenHash(username + realm + password);
Everyone else is probably using the standard Java MessageDigest, so they would call update() several times and digest() once, after converting their strings to byte arrays. (Note that you'll want to specify an encoding when converting a String to a byte array to ensure it's always done the same way; otherwise your hash function may return different results on different machines.)
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Specify a fixed charset (UTF-8 here; the choice just has to be consistent)
byte[] usernameBytes = username.getBytes(StandardCharsets.UTF_8);
byte[] realmBytes    = realm.getBytes(StandardCharsets.UTF_8);
byte[] passwordBytes = password.getBytes(StandardCharsets.UTF_8);

MessageDigest md = MessageDigest.getInstance("MD5");
md.update(usernameBytes); // updates the digest with these bytes
md.update(realmBytes);
md.update(passwordBytes);
byte[] hashResult = md.digest(); // finalizes and outputs the result
// Convert the byte array to an outputtable form (hex, Base64, or raw binary)
But remember: using Strings presents a security vulnerability!
Strings should never be used for passwords, because String values cannot be guaranteed to be wiped: they live on the heap and are only deallocated whenever the garbage collector gets around to it. An attacker who obtains a memory dump could read the password. (Admittedly you have other problems if an attacker gets a memory dump, but don't make it worse.)
It is preferred that a char[] is used for all passwords or other very-sensitive text values. You would therefore create a new char array of size (username.length() + realm.length() + password.length()) then iterate over each of your strings and add each character to the new array... which you would then hash... then you would wipe all sensitive text values that you are no longer using (by iterating over each array and setting each element equal to (char)0).
Again, you cannot manually delete or wipe a string, but you can manually wipe a char or char[].
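A sketch of the char[] approach just described, still hashing with MessageDigest (the method name hashSensitive is made up for illustration):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

static byte[] hashSensitive(char[] username, char[] realm, char[] password) throws Exception {
    // Concatenate into one char array, never touching String
    char[] combined = new char[username.length + realm.length + password.length];
    int i = 0;
    for (char c : username) combined[i++] = c;
    for (char c : realm)    combined[i++] = c;
    for (char c : password) combined[i++] = c;

    // Encode to bytes with a fixed charset, still without creating a String
    ByteBuffer bb = StandardCharsets.UTF_8.encode(CharBuffer.wrap(combined));
    byte[] bytes = new byte[bb.remaining()];
    bb.get(bytes);
    byte[] digest = MessageDigest.getInstance("MD5").digest(bytes);

    // Wipe everything sensitive that we no longer need
    Arrays.fill(combined, (char) 0);
    Arrays.fill(bytes, (byte) 0);
    return digest;
}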
Why is char[] preferred over String for passwords?
While you're at it, look up password or hash salting. It may be useful for what you're doing.
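To tie this back to the original question, here is a hypothetical loop over a wordlist file (the file name words.txt and the fixed username/realm values are illustrative; and again, MD5 is ill-advised for real passwords):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

MessageDigest md = MessageDigest.getInstance("MD5");
for (String word : Files.readAllLines(Paths.get("words.txt"), StandardCharsets.UTF_8)) {
    // digest() resets the MessageDigest, so it can be reused each iteration
    byte[] digest = md.digest(("myName:Narnia:" + word).getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) hex.append(String.format("%02x", b));
    System.out.println(word + " -> " + hex);
}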
I am developing a Java-based downloader for binary data. This data is transferred via a text-based protocol (UU-encoded). For the networking task the netty library is used. The binary data is split by the server into many thousands of small packets and sent to the client (i.e. the Java application).
From netty I receive a ChannelBuffer object every time a new message (data) is received. Now I need to process that data, beside other tasks I need to check the header of the package coming from the server (like the HTTP status line). To do so I call ChannelBuffer.array() to receive a byte[] array. This array I can then convert into a string via new String(byte[]) and easily check (e.g. compare) its content (again, like comparison to the "200" status message in HTTP).
The software I am writing is using multiple threads/connections, so that I receive multiple packets from netty in parallel.
This usually works fine; however, while profiling the application I noticed that when the connection to the server is good and data comes in very fast, this conversion to a String object becomes a bottleneck. CPU usage is close to 100% in such cases, and according to the profiler a great deal of time is spent calling the String(byte[]) constructor.
I searched for a better way to get from the ChannelBuffer to a String, and noticed the former also has a toString() method. However, that method is even slower than the String(byte[]) constructor.
So my question is: Does anyone of you know a better alternative to achieve what I am doing?
Perhaps you could skip the String conversion entirely? You could have constants holding byte arrays for your comparison values and check array-to-array instead of String-to-String.
Here's some quick code to illustrate. Currently you're doing something like this:
String http200 = "200";
// byte[] -> String conversion happens on every message
String input = new String(channelBuffer.array());
return input.equals(http200);
Maybe this is faster:
// Ideally convert String -> byte[] only once. Store these
// arrays somewhere and look them up instead of recalculating.
final byte[] http200 = "200".getBytes(StandardCharsets.UTF_8); // select the charset the protocol uses!

// The input doesn't have to be converted at all:
byte[] input = channelBuffer.array();
return Arrays.equals(input, http200);
1. Some of the checking you are doing might look only at part of the buffer. If you can use the alternate form of the String constructor, new String(byteArray, offset, length), far fewer bytes get converted to a string. Your example of looking for "200" within the message is a case in point.
2. You might find that you can use the length of the byte array as a clue. If some messages are long and you are looking for a short one, ignore the long ones and don't convert them to characters at all.
3. Along with what #EricGrunzke said, partially inspect the byte buffer to filter out some messages and find that you don't need to convert them from bytes to characters.
4. If your bytes are ASCII characters, the conversion to characters might be quicker if you use charset "ASCII" instead of whatever the default is for your server: new String(bytes, "ASCII"). In fact, you might be able to pick and choose the charset for byte-to-character conversion in some organized fashion that speeds things up.
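For instance, combining points 1 and 4 (the offsets below assume a standard HTTP status line and are only illustrative):

import java.nio.charset.StandardCharsets;

byte[] data = channelBuffer.array();
// Convert only the three status bytes, with an explicit cheap charset:
// in "HTTP/1.1 200 OK", bytes 9..11 hold "200"
String status = new String(data, 9, 3, StandardCharsets.US_ASCII);
boolean ok = "200".equals(status);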
Depending on what you are trying to do there are a few options:
If you are just trying to get the response status, can't you simply call getStatus()? That would probably be faster than extracting a string.
If you are trying to convert the buffer then, assuming you know it will be ASCII (which it sounds like you do), leave the data as byte[] and convert your UUDecode method to work on a byte[] instead of a String (a sketch follows below).
The biggest cost of the string conversion is most likely the copying of the data from the byte array into the internal char array of the String; combined with the charset conversion itself, that is a bunch of work you don't need to do.
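A rough sketch of the second option: a UU-decoder that works directly on ASCII bytes. This is simplified; real uuencoded data has begin/end framing, and this assumes each line starts with the usual length character:

static byte[] uudecodeLine(byte[] line) {
    int len = (line[0] - 32) & 0x3F; // decoded byte count for this line
    byte[] out = new byte[len];
    int o = 0;
    // Each group of 4 input characters encodes 3 output bytes
    for (int i = 1; i + 3 < line.length && o < len; i += 4) {
        int c0 = (line[i]     - 32) & 0x3F;
        int c1 = (line[i + 1] - 32) & 0x3F;
        int c2 = (line[i + 2] - 32) & 0x3F;
        int c3 = (line[i + 3] - 32) & 0x3F;
        if (o < len) out[o++] = (byte) ((c0 << 2) | (c1 >> 4));
        if (o < len) out[o++] = (byte) ((c1 << 4) | (c2 >> 2));
        if (o < len) out[o++] = (byte) ((c2 << 6) | c3);
    }
    return out;
}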
I've got a string which I created with a custom cipher, which can have any char value (0 through 0xFFFF). This string is created by taking an input plaintext and rotating each character by a pseudorandom value, so I have no control over what the output characters might be.
Can I safely store and retrieve this exactly without any issues into a SQLiteDatabase TEXT field?
I think Java uses UTF-16, so I'm somewhat afraid of chars like NULL, END OF TEXT, ESCAPE, ', ", 0xfeff / 0xfffe (BOM) etc. appearing at random places in my string, and I'm not really sure how SQLite will store this internally. If it uses any text-based markers to determine the start and end of fields, I'm afraid this will fail.
Ideally I'd like to get back out the exact same character sequence I put in, so that I can put it through the reverse cipher.
I will be using the managed insert(ContentValues) method of SQLiteDatabase, so I think this would take care of any issues regarding escaping the input string, but I'm still not quite convinced that this can work.
Is this a safe operation, and if not, what else should I do instead to store my encrypted string?
Avoid a cryptographically weak custom cipher that also causes you problems; instead, use Java's built-in capabilities, which can provide you with cryptographically strong encryption.
http://docs.oracle.com/javase/1.4.2/docs/guide/security/jce/JCERefGuide.html#CipherClass
It would be safest to store it as a "blob" -- pretty much identical to a string, only with a separately-specified length.
C strings are generally assumed to be null-terminated.
In SQLite, strings must not contain the end-of-string marker, i.e., a character with value zero.
However, you can store binary data as a blob.
This would look something like this:
SQLiteDatabase db = ...;
byte[] binaryData = ...;

ContentValues values = new ContentValues();
values.put("mycolumn", binaryData);
db.insert("mytable", null, values);

Cursor cursor = db.query("mytable", new String[]{"mycolumn"}, ...);
cursor.moveToFirst(); // position the cursor before reading
byte[] data = cursor.getBlob(0);
A simple and safe way that comes to mind for storing a String you're not sure will survive intact is to store its byte array instead, using:
void put(String key, byte[] value)
byte[] getAsByteArray(String key)
You can convert it to a Base64 string if you really need to store it as a String (but why?) and decode it on the way back out; see the sketch below.
That being said, you shouldn't need to do any of this, because insert() does the escaping for you if you use ContentValues.
ContentValues uses Parcel internally to handle the type conversions.
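If you do go the String route, here is a sketch that survives arbitrary char values, including lone surrogates, by storing the raw UTF-16 code units rather than running them through a charset encoder (java.util.Base64 is Java 8+; on Android you would use android.util.Base64 instead):

import java.nio.ByteBuffer;
import java.util.Base64;

static String toStorable(String ciphertext) {
    // Copy each 16-bit code unit into two bytes, no charset involved
    ByteBuffer bb = ByteBuffer.allocate(ciphertext.length() * 2);
    bb.asCharBuffer().put(ciphertext);
    return Base64.getEncoder().encodeToString(bb.array());
}

static String fromStorable(String stored) {
    byte[] bytes = Base64.getDecoder().decode(stored);
    return ByteBuffer.wrap(bytes).asCharBuffer().toString();
}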
I use GZIPOutputStream or ZIPOutputStream to compress a String (my string.length() is less than 20), but the compressed result is longer than the original string.
On some site, I found people saying that this is because my original string is too short and that GZIPOutputStream only pays off for longer strings.
So, can somebody help me compress a String?
My function looks like this:
String compress(String original) throws Exception {
    // ...
}
Update:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// ZipUtil
public class ZipUtil {
    public static String compress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return str;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(str.getBytes());
        gzip.close();
        return out.toString("ISO-8859-1");
    }

    public static void main(String[] args) throws IOException {
        String string = "admin";
        System.out.println("after compress:");
        System.out.println(ZipUtil.compress(string));
    }
}
The result is a garbled binary string that is longer than the original "admin".
Compression algorithms almost always have some form of space overhead, which means that they are only effective when compressing data which is sufficiently large that the overhead is smaller than the amount of saved space.
Compressing a string which is only 20 characters long is not easy, and it is not always possible. If you have repetition, Huffman coding or simple run-length encoding might be able to compress it, but probably not by very much.
When you create a String, you can think of it as a list of chars; this means that for each character in your String, you need to support all the possible values of char. From the Sun docs:
char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
If you have a reduced set of characters you want to support, you can write a simple compression algorithm, analogous to binary -> decimal -> hex radix conversion. You go from 65,536 (or however many characters your target system supports) down to 26 (alphabetic) / 36 (alphanumeric) etc.
I've used this trick a few times, for example encoding timestamps as text (target base 36+, source base 10); just make sure you have plenty of unit tests!
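A small sketch of that radix trick (the timestamp value is just an example):

// Re-encode a base-10 timestamp in base 36 to shorten it
long timestamp = 1700000000000L;               // 13 decimal digits
String encoded = Long.toString(timestamp, 36); // 8 base-36 digits
long decoded = Long.parseLong(encoded, 36);    // round-trips exactly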
If the passwords are more or less "random", you are out of luck; you will not be able to get a significant reduction in size.
But: why do you need to compress the passwords? Maybe what you need is not compression but some sort of hash value? If you just need to check whether a name matches a given password, you don't need to save the password; you can save the hash of the password. To check whether a typed-in password matches a given name, you build the hash value the same way and compare it to the saved hash. As a hash (Object.hashCode()) is an int, you would be able to store all 20 password hashes in 80 bytes.
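A minimal sketch of that idea (illustrative only; Object.hashCode() can collide, and for real password storage you would use a salted, slow hash such as PBKDF2 or bcrypt):

// Store only the 4-byte int hash of each password, not the password itself
int storedHash = "secret".hashCode();

// Later, check a typed-in password against the stored hash
String typedPassword = "secret";
boolean matches = typedPassword.hashCode() == storedHash;
// 20 passwords -> 20 * 4 = 80 bytes of storage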
Your friend is correct. Both gzip and ZIP are based on DEFLATE. This is a general purpose algorithm, and is not intended for encoding small strings.
If you need this, a possible solution is a custom encoding and decoding HashMap<String, String>. This can allow you to do a simple one-to-one mapping:
HashMap<String, String> toCompressed, toUncompressed;
String compressed = toCompressed.get(uncompressed);
// ...
String uncompressed = toUncompressed.get(compressed);
Clearly, this requires setup, and is only practical for a small number of strings.
Huffman Coding might help, but only if you have a lot of frequent characters in your small String
The DEFLATE algorithm used by ZIP is a combination of LZ77 and Huffman coding. You can use one of these algorithms separately.
The compression is based on two factors:
the repetition of substrings in your original string (LZ77): if there are many repetitions, the compression will be efficient. This algorithm performs well when compressing long plain text, since words are often repeated
the distribution of characters in the compressed string (Huffman): the more unbalanced the character distribution, the more effective the compression
In your case, you should try the LZ77-style algorithm on its own. Used by itself, the string can be compressed without adding meta-information; this is probably better for compressing short strings.
For the Huffman algorithm, the coding tree has to be sent along with the compressed text, so for a small text the result can be larger than the original because of the tree.
Huffman encoding is a sensible option here. Gzip and friends do this, but the way they work is to build a Huffman tree for the input, send that, then send the data encoded with the tree. If the tree is large relative to the data, there may be no net saving in size.
However, it is possible to avoid sending a tree: instead, you arrange for the sender and receiver to already have one. It can't be built specifically for every string, but you can have a single global tree used to encode all strings. If you build it from the same language as the input strings (English or whatever), you should still get good compression, although not as good as with a custom tree for every input.
If you know that your strings are mostly ASCII you could convert them to UTF-8.
byte[] bytes = string.getBytes("UTF-8");
This may reduce the memory size by about 50%. However, you will get a byte array out and not a string. If you are writing it to a file though, that should not be a problem.
To convert back to a String:
private final Charset UTF8_CHARSET = Charset.forName("UTF-8");
...
String s = new String(bytes, UTF8_CHARSET);
You won't see any compression happening for your String, since you need at least a couple of hundred bytes before GZIPOutputStream or ZIPOutputStream achieves any real compression. Your String is too small. (I don't understand why you would need compression for it.)
Check the conclusion of this article:
The article also shows how to compress and decompress data on the fly in order to reduce network traffic and improve the performance of your client/server applications. Compressing data on the fly, however, improves the performance of client/server applications only when the objects being compressed are more than a couple of hundred bytes. You would not be able to observe improvement in performance if the objects being compressed and transferred are simple String objects, for example.
Take a look at the Huffman algorithm.
https://codereview.stackexchange.com/questions/44473/huffman-code-implementation
The idea is that each character is replaced with sequence of bits, depending on their frequency in the text (the more frequent, the smaller the sequence).
You can read your entire text and build a table of codes, for example:
Symbol  Code
a       0
s       10
e       110
m       111
The algorithm builds a symbol tree based on the text input. The more variety of characters you have, the worse the compression will be.
But depending on your text, it could be effective.
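A tiny sketch of encoding with the example table above (the codes are taken from the table, not computed here; Map.of requires Java 9+):

import java.util.Map;

static String encode(String text, Map<Character, String> codes) {
    StringBuilder bits = new StringBuilder();
    for (char c : text.toCharArray()) {
        bits.append(codes.get(c));
    }
    return bits.toString();
}

// Usage:
Map<Character, String> codes = Map.of('a', "0", 's', "10", 'e', "110", 'm', "111");
String bits = encode("same", codes); // "100111110": 9 bits instead of 4 * 16 = 64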