Implementing encryption in Java whilst maintaining bitstring length


I'm trying to figure out a way of implementing Blowfish (or any encryption scheme that will work) in a program I am writing in Java for Android.
I have a sentence, like "I am a dog", which I want to encrypt.
However, before encryption, I encode the sentence with my own 5-bit character representations.
These are of my own making, e.g. 'a' = "00110" and 'the' = "11001".
So now I have an encoding whose length is divisible by 5, and it looks like
"00011101001101011010"
Is there a way to implement Blowfish to encrypt this binary string whilst maintaining the length of the bit string?
i.e. the bit string above is 20 bits long. I want the encrypted bit string to also be 20 bits long.
Is this possible with Blowfish? Is it possible at all?
Thanks for any help!

For any block cipher, the cipher text must be at least as big as the block size. That is 64 bits for Blowfish, which means at least a 64-bit output.
If your plaintext is longer than your block size, then you can get the same cipher text size using cipher text stealing: https://en.m.wikipedia.org/wiki/Ciphertext_stealing
Not sure why you are doing the encoding that way, it certainly does not add to security. Also, Blowfish is a dated algorithm: AES is a better choice, but that has block size 128.
Stream ciphers will allow you to get the exact same cipher text size as plaintext size, but I don't know of any good ones implemented in Java. Whatever you do, stay away from RC4: it has real security problems. See the eSTREAM page for possible stream ciphers that should have adequate security. Also, you must never re-use a key (or, more precisely, a key and IV pair) for a stream cipher.
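As a rough illustration of the length-preserving property, here is a minimal sketch that swaps in AES in CTR mode from the standard JCE (a block cipher run as a stream cipher; this is not one of the eSTREAM ciphers mentioned above). The class name CtrLengthDemo and the packing of the 20-bit string into 3 bytes are assumptions made for the example. Note that the JCE works on whole bytes, so keeping exactly 20 bits would mean XORing only the first 20 keystream bits yourself, and the IV still has to travel with the ciphertext.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

final class CtrLengthDemo {
    /** Encrypts or decrypts with AES-CTR; the output has exactly the input's length. */
    static byte[] ctr(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[16]; // must be fresh for every message under the same key
        random.nextBytes(key);
        random.nextBytes(iv);

        // Assumption: the 20-bit string "00011101001101011010" packed into
        // 3 bytes, with the last 4 bits left as zero filler.
        byte[] plaintext = {(byte) 0x1D, (byte) 0x35, (byte) 0xA0};

        byte[] ciphertext = ctr(Cipher.ENCRYPT_MODE, key, iv, plaintext);
        System.out.println("plaintext bytes:  " + plaintext.length);
        System.out.println("ciphertext bytes: " + ciphertext.length);
    }
}
```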
EDIT: @CommonsWare pointed out a clever solution from Maarten Bodewes. It looks correct to me, but I don't think you will find an implementation that does this out-of-the-box. Keep in mind also that every ciphertext has to be paired with its IV, which is the same length as the block size (64 bits for Blowfish). You should never repeat an IV. My general feeling is that, although it is a clever solution, you are likely to be better off if you do not have to implement something like this yourself (implementing crypto is dangerous: it is easy to lose security properties by making the smallest mistake).

Related

How to determine the output byte[] size when using Apache Crypto?

I am following the Apache Crypto byte array encryption/decryption example from the link below.
https://commons.apache.org/proper/commons-crypto/xref-test/org/apache/commons/crypto/examples/CipherByteArrayExample.html
At line 54 it created a byte[] of size 32, but I don't understand how the author came up with that number. I noticed it is a multiple of 16 which is the key size from line 40. If that is the reason, why not 64, 128, and so on...

It looks like CryptoCipher is a partial mirror of the Cipher instance. Using the Cipher instance requires Apache to implement a provider, and to use Cipher services, the provider needs to be signed. So it makes some sense to have a different class that has about the same API to easily switch between signed and unsigned code.
That out of the way, it seems CryptoCipher lacks a method called Cipher#getOutputSize. That method is used to retrieve the minimum buffer size for an input plaintext. So basically it seems that they've specified a buffer size that is definitely large enough for a ~ 12 byte input string. Then they resize it using Arrays.copyOf(output, updateBytes+finalBytes) later (which will probably result in 16 bytes for ECB or CBC mode).
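For comparison, this is roughly how the buffer would be sized with the standard JCE Cipher, which does offer getOutputSize. It is only a sketch using javax.crypto, not Commons Crypto; the class name OutputSizeDemo and the 12-byte sample input are assumptions for the example.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

final class OutputSizeDemo {
    /** Encrypts with AES-CBC and PKCS#5 padding into a buffer sized by getOutputSize. */
    static byte[] encrypt(byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

            // Ask the cipher how large the output buffer must be for this input length.
            byte[] output = new byte[cipher.getOutputSize(input.length)];
            int written = cipher.doFinal(input, 0, input.length, output, 0);

            // getOutputSize may over-estimate, so trim to what was actually written.
            return Arrays.copyOf(output, written);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[16]; // random per message, stored alongside the ciphertext
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] input = "hello world!".getBytes(StandardCharsets.UTF_8); // 12 bytes
        System.out.println("ciphertext bytes: " + encrypt(key, iv, input).length);
    }
}
```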
By the way, that example is such a piece of crap code that I don't have any high hopes for this library. Look at it and understand that this is not the way to perform cryptography. A key and an IV are not strings; IVs should be random for each usage and included with the ciphertext. I won't even go into the single utility method getUTF8String that does absolutely nothing special. Where is the try-with-resources? Why is the CryptoCipher missing essential methods?

Implementing a Pseudorandom Generator using AES

"A pseudorandom generator (PRG) is a deterministic algorithm that takes a short uniformly distributed string, known as the seed, and outputs a longer string that cannot be efficiently distinguished from a uniformly distributed string of that length." [1]
It is my understanding that we can create pseudorandom generators using stream ciphers. For instance, SCAPI, a Secure Multiparty Computation API, uses RC4 in the following example to create an output of a fixed number of bytes (check out.length):
//Create secret key and out byte array
...
//Create prg using the PrgFactory
PseudorandomGenerator prg = PrgFactory.getInstance().getObject("RC4");
SecretKey secretKey = prg.generateKey(256); //256 is the key size in bits.
//set the key
prg.setKey(secretKey);
//get PRG bytes. The caller is responsible for allocating the out array.
//The result will be put in the out array.
prg.getPRGBytes(out.length, out);
Indeed, pseudorandom generators are particularly useful in some cryptographic protocols (e.g. this protocol) where we need to create a pseudorandom output of bytes, usually of a very large size, fast.
I have actually implemented this protocol using the SCAPI snippet shown above for the PRG part. Yet the authors, instead of using RC4 for their PRG, use AES-128 in CTR mode. That makes sense, since RC4 is known to be broken and AES can easily be used as a stream cipher.
I want to implement a pseudorandom generator using AES in CTR mode in the same fashion as the snippet above, but I'm unable to do so. My problem is not using AES in CTR mode; there are countless examples online. My problem is the out.length part. I don't know how to implement a PRG using AES (or any other cipher for that matter) in a way where I get to choose the exact number of output bytes, like the example above. How can I do this?
Before someone mentions that a hash function can do the same job: indeed, this is basically a hash function, but the problem in this particular protocol is that we need very large outputs (e.g. 32 MB), whereas a hash function usually has a small fixed output size (e.g. 192, 256 or 512 bits).
Finally, this question is not a duplicate of this one, because the latter is about implementing any kind of PRG in Python, whereas this one is about implementing an AES-CTR-based PRG in Java.
Some useful links:
SCAPI's API
SCAPI's source code on PRGs

In CTR mode you simply discard the bytes you don't need from the right-hand side of the last key stream block (the block encryption of the last counter value). Alternatively, you can create the key stream by performing AES-CTR over the right number (out.length) of zero-valued bytes.
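The second suggestion can be sketched with the standard JCE as follows. This is a minimal illustration, not SCAPI's API: the class name AesCtrPrg and the method prgBytes are made up for the example, and the all-zero starting counter is an assumption that is only acceptable if each seed key is used for a single output stream.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

final class AesCtrPrg {
    /**
     * Expands a 16-byte seed (used as an AES-128 key) into outLength
     * pseudorandom bytes by encrypting zero bytes in CTR mode.
     */
    static byte[] prgBytes(byte[] seedKey, int outLength) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            // Assumption: fixed all-zero initial counter, one stream per seed.
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(seedKey, "AES"),
                    new IvParameterSpec(new byte[16]));
            // Encrypting zeros yields the raw key stream; CTR needs no padding,
            // so the result is exactly outLength bytes long.
            return cipher.doFinal(new byte[outLength]);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] seed = new byte[16];
        new SecureRandom().nextBytes(seed);

        byte[] out = new byte[1000];
        System.arraycopy(prgBytes(seed, out.length), 0, out, 0, out.length);
        System.out.println("generated bytes: " + out.length);
    }
}
```

For outputs in the tens of megabytes, the zero-filled input array doubles the memory use; calling cipher.update on a reusable zero buffer in a loop would avoid that.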

ccrypt will not decrypt ccrypt-j encrypted files

I've been trying to fix ccrypt-j, a pure-Java implementation of the Linux ccrypt command. I found that there is some problem with the initialization vector (IV) which means ccrypt will not decrypt anything but its own output.
I modified both libraries so that the same nonce is always fed to both implementations of the Rijndael engine, however, the output IV is always different between implementations, i.e. both libraries always have the same result (because Rijndael is deterministic), but those results are always different.
I know the problem is only the way ccrypt generates the IV since:
ccrypt-j-encrypted files can be decrypted by ccrypt-j
If I substitute the IV (first 32 bytes of the encrypted files) with that of a ccrypt-encrypted file, ccrypt will decrypt it just fine.
Ccrypt uses its own implementation of Rijndael coded in C, while ccrypt-j uses Bouncy Castle's implementation.
EDIT: 04/01/2016
Because the IV is constructed before any data is encrypted (actually, before any data is even read), I believe the problem has to be in the way Rijndael is initialized in Bouncy Castle versus ccrypt's own implementation. I'll try to run the same sequence in both implementations and see what I get.

One half-answer
If you look at the older ccrypt documentation, there are some explanations about the IV. To summarize, 4 bytes are fixed (the magic number), and it has been c051 for a while. Security issues are also discussed.
For the magic number, see:
http://ccrypt.sourceforge.net/faq.html
ccrypt comes from emacs / jka-compr:
http://www.opensource.apple.com/source/emacs/emacs-51/emacs/lisp/jka-compr.el
In ccrypt, the seed is constructed as follows: first, a nonce is
constructed by hashing a combination of the host name, current time,
process id, and an internal counter into a 28-byte value, using a
cryptographic hash function. The nonce is combined with a fixed
four-byte "magic number", and the resulting 32-byte value is encrypted
by one round of the Rijndael block cipher with the given key. This
encrypted block is used as the seed and appended to the beginning of
the ciphertext. The use of the magic number allows ccrypt to detect
non-matching keys before decryption.
The magic number is defined here: http://ccrypt.sourcearchive.com/documentation/1.7-7/ccryptlib_8c-source.html
It seems the magic number doesn't change (it is the same from 1.1 to 1.10; before that, I don't know).
So what?
ccrypt is designed to be compatible with its predecessors (emacs, ...). It can encrypt and decrypt, and is widely used.
So the problem comes from ccrypt-j.
What one can see on SourceForge is two important things:
1. Compatibility
Encrypting a file using ccrypt-j
TODO
Decrypting a file using ccrypt-j
TODO
So what really works?
2. In fact, it uses Bouncy Castle, which is widely used and surely implements the standards well.
So, the conclusion?
You can't expect ccrypt to change.
So: you can decrypt ccrypt output with ccrypt-j,
but if you want to decrypt with ccrypt, you have to constrain what ccrypt-j produces.
I have doubts about your assertion, because it would be magical:
If I substitute the IV (first 32 bytes of the encrypted files) with
that of a ccrypt-encrypted file, ccrypt will decrypt it just fine.
But if it works, why not use that? (Can ccrypt-j also decrypt the result?)
Last advice: contact ccrypt-j support.
Hope it helps.

Cryptanalysis of ciphertext using Java

I'm looking for some ideas on an assignment.
I have 7 ciphertext files, all of which are encrypted using the same symmetric key, which is 3 characters long and is alphabetic. No encryption algorithm is provided but the specs state that it is a home-made algorithm and is naive (whatever that means). My objective is to decrypt these files. I'm merely looking for ideas on the attacks which I can carry out on these files.
So far, I have done a frequency analysis, a brute-force attack to detect a Caesar cipher, Kasiski's method to detect a Vigenère cipher, and a ciphertext XOR to detect a simple version of a stream cipher. I suspect that the files were encrypted using some mix of ciphers.
By the way, the decrypted plaintext is supposed to contain just a plain message, but the ciphertext reveals the use of over 97 different ASCII symbols!
Any general help, ideas or directions are greatly appreciated! Honestly, I'm not expected to decrypt these files, but then I might as well prove my professor wrong with your help. Thanks!
EDIT
I'm looking for attacks on block or stream ciphers. At least that's what I suspect...

The famous Enigma machine used 3-character symmetric alphabetic keys. 97 ASCII symbols? Printable ASCII runs from 32 to 126, giving 95 symbols. The \n and \r add two more for 97, and an end-of-message marker such as \0 would push the count just over that. To put it another way, a naive copy of the early Enigma machines (with a fixed reflector) encrypting Windows-style textual data would match the clues very well.
The Enigma machine has some known flaws. If your professor was being exceptionally kind, he will have replicated the weak procedure used by the German Navy early on: every message was encrypted with a one-time key and, so that the recipient could decrypt it, the one-time key was transmitted twice at the start of the message, encrypted using a standard key. Transmitting it twice gave the cryptanalysts extra context.
The second well-known flaw was that a character never maps to itself. Thus, if you line a candidate plaintext up against the ciphertext, no character may match at the same position.
It is possible to brute force Enigma if you know what the rotors and reflector look like. Without knowing that you have around 10^15 possibilities to explore in this case.

Why not go ahead and get started with brute forcing all of the 26**3 possibilities for each of the most popular symmetric key algorithms:
Twofish
Serpent
AES (Rijndael)
Blowfish
CAST5
And any others you can find.

Since the algorithm is simple and homemade, you might try these naive algorithms:
repeated XOR with the cipher key every 3rd character
repeated XOR with the cipher key every 2nd or 1st character
XOR and rotate/shift: the cipher key is xor'ed with the ciphertext and rotated/shifted
Since you know the plaintext to be regular text, look for patterns in the first few characters of the ciphertext and see whether they can be combined with the cipher key to arrive at an ASCII code for a letter or number.
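The first of these guesses (repeated XOR with the 3-character key) is cheap to test exhaustively, since there are only 26^3 = 17,576 keys. The sketch below tries every key and keeps the one whose output looks most like plain English. Everything in it is an assumption for illustration: the class name XorKeySearch, the restriction to lowercase keys, and the very crude scoring (counting letters and spaces), which would probably need tuning against the real files.

```java
import java.nio.charset.StandardCharsets;

final class XorKeySearch {
    /** XORs data with the key, repeating the key as often as needed. */
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    /** Crude "looks like English" score: the number of letters and spaces. */
    static int score(byte[] candidate) {
        int hits = 0;
        for (byte b : candidate) {
            if (b == ' ' || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')) {
                hits++;
            }
        }
        return hits;
    }

    /** Tries all lowercase 3-letter keys and returns the best-scoring one. */
    static String bestKey(byte[] ciphertext) {
        byte[] key = new byte[3];
        byte[] best = new byte[3];
        int bestScore = -1;
        for (int a = 'a'; a <= 'z'; a++) {
            for (int b = 'a'; b <= 'z'; b++) {
                for (int c = 'a'; c <= 'z'; c++) {
                    key[0] = (byte) a;
                    key[1] = (byte) b;
                    key[2] = (byte) c;
                    int s = score(xor(ciphertext, key));
                    if (s > bestScore) {
                        bestScore = s;
                        best = key.clone();
                    }
                }
            }
        }
        return new String(best, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Self-made sample: encrypt a sentence, then try to recover the key.
        byte[] plain = "i am a dog and the cat is on a mat".getBytes(StandardCharsets.US_ASCII);
        byte[] cipher = xor(plain, "dog".getBytes(StandardCharsets.US_ASCII));
        String key = bestKey(cipher);
        System.out.println("best key: " + key);
        System.out.println(new String(xor(cipher, key.getBytes(StandardCharsets.US_ASCII)),
                StandardCharsets.US_ASCII));
    }
}
```

The same loop can be reused for the other guesses by swapping xor for a different candidate transformation.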

Now, you said that you have done the statistical analysis. If the algorithm is in fact naive, the frequencies of the symbols will not be uniformly distributed: some symbols will occur more often than others. Is that the case? If so, I'd dig from there.
I might as well prove my professor
wrong with your help
With "our help", it would be us proving your professor wrong.

How to use the RIM crypto api for TripleDES encryption with NO Padding

It seems the RIM Crypto API provides only PKCS5 padding for symmetric encryption (3DES), as far as I know. I'm working with JDE 4.6.0.
I'm trying to provide cryptography for a BlackBerry app which needs to be compatible with existing services that already use NoPadding with the standard Java security API.
Is there a way to extend the API to provide for the lacking PADDING modes, or some other hack, to achieve this?

Based on what you've told me, I would use the encrypt function of TripleDESCBCEncryptorEngine to encrypt your blocks.
There is a version of the function that can encrypt multiple blocks at once by specifying the number of blocks.
Here is a reference to that function.
It looks very straightforward, you just pass the key and the IV into the constructor and then proceed to make calls to .encrypt to encrypt the data.
Similarly there is a TripleDESCBCDecryptorEngine here.

I admit to not being familiar at all with the RIM crypto API, but just from reading the documentation it appears that using the BlockEncryptorEngine.encrypt() method gives you the same functionality as the JCE NoPadding transformations for block ciphers. So in your example that would be TripleDESEncryptorEngine.

If you are using CBC chaining mode and can arrange for your input data to have a length multiple of the block size (i.e. multiple of eight, when expressed in bytes, if the block cipher is 3DES) then you just have to remove the last block of the encrypted output.
In CBC encryption, input data (m) is first padded into a message which has a length multiple of the block size (with PKCS#5, by adding between 1 and b bytes, where b is the block length, b=8 for 3DES); then it is split into successive b-bytes blocks. Each of those blocks yields an encrypted block of the same size: the encrypted block for message block i is the result of 3DES applied on the bitwise XOR of message block i and encrypted block i-1. Consequently, if the original message m has a length multiple of b, then PKCS#5 padding adds b bytes, i.e. a full block. By removing the last encrypted block, you obtain what you would have got with no padding at all.
Decryption might be trickier. If the RIM API is stream-oriented (if it can give you some plaintext bytes before having the whole message) then you can feed it null trailing bytes until it has returned your whole message (the extra null bytes will decrypt into random-looking junk; just discard it). If the RIM API is message-oriented, then you will have to use your knowledge of the secret key to rebuild a valid "last block" (the one which you removed during encryption). Namely, with 3DES, this would mean the following: if z is the last encrypted block of the message (the one with "no padding"), then you encrypt an empty message (of no bytes at all) with the same key, using z as the "initial value" (IV). This should result in a single b-byte block, which you just append to the encrypted message. The effect of that extra block is that the decryption engine will "see" a proper PKCS#5 padding and transparently remove it, yielding the data you expect.
All of the above assumes that you are using CBC, which is the most common chaining mode, among those which require padding.
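Both halves of this trick can be checked on a desktop JVM with the standard JCE standing in for the RIM API. The sketch below is only an illustration under that assumption: the class CbcNoPaddingEmulation and its method names are invented, and it uses "DESede/CBC/PKCS5Padding" as the padding-only engine. Encryption drops the final (padding) block; decryption rebuilds that block by encrypting an empty message with z as the IV.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

final class CbcNoPaddingEmulation {
    static final int BLOCK = 8; // 3DES block length in bytes
    static final String PADDED = "DESede/CBC/PKCS5Padding";

    /** Runs one 3DES-CBC operation with the given transformation string. */
    static byte[] run(String transformation, int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(transformation);
            cipher.init(mode, new SecretKeySpec(key, "DESede"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Input length must be a multiple of BLOCK; the full padding block is dropped. */
    static byte[] encryptNoPadding(byte[] key, byte[] iv, byte[] alignedPlaintext) {
        byte[] padded = run(PADDED, Cipher.ENCRYPT_MODE, key, iv, alignedPlaintext);
        return Arrays.copyOf(padded, padded.length - BLOCK);
    }

    /** Rebuilds the dropped padding block, then lets the padded engine strip it. */
    static byte[] decryptNoPadding(byte[] key, byte[] iv, byte[] ciphertext) {
        // z is the last ciphertext block (or the IV if there are no blocks).
        byte[] z = ciphertext.length == 0
                ? iv
                : Arrays.copyOfRange(ciphertext, ciphertext.length - BLOCK, ciphertext.length);
        // Encrypting an empty message with IV = z yields exactly the padding block.
        byte[] paddingBlock = run(PADDED, Cipher.ENCRYPT_MODE, key, z, new byte[0]);

        byte[] full = Arrays.copyOf(ciphertext, ciphertext.length + BLOCK);
        System.arraycopy(paddingBlock, 0, full, ciphertext.length, BLOCK);
        return run(PADDED, Cipher.DECRYPT_MODE, key, iv, full);
    }

    public static void main(String[] args) {
        byte[] key = new byte[24];
        for (int i = 0; i < key.length; i++) key[i] = (byte) (i + 1); // demo key only
        byte[] iv = {1, 2, 3, 4, 5, 6, 7, 8};                         // demo IV only
        byte[] plaintext = "16 bytes of data".getBytes();

        byte[] emulated = encryptNoPadding(key, iv, plaintext);
        byte[] reference = run("DESede/CBC/NoPadding", Cipher.ENCRYPT_MODE, key, iv, plaintext);
        System.out.println("matches NoPadding: " + Arrays.equals(emulated, reference));
        System.out.println(new String(decryptNoPadding(key, iv, emulated)));
    }
}
```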
