Identifying an AES encrypted file


Is there a way to identify or inspect an AES encrypted file based on the file content (like the way a ZIP file can be identified by looking for letters "PK" at the beginning of the file)? Is there any magic number associated with AES encrypted files?
We have multiple files in the workflow repository that are either in plain text (could be Excel, XML, JSON, text etc.) or AES-256 encrypted, and we don't know which ones are AES encrypted. I need to write Java code to identify the AES encrypted files and decrypt them automatically. Thanks!

In the absence of any standard header, you could look at the byte frequency. AES encrypted data (or indeed anything encrypted with a decent algorithm) will appear to be a random sequence of bytes. This means that the distribution of byte values 0-255 will be approximately flat (i.e. all byte values are equally likely).
However, textual documents will mostly contain printable characters - some much more than others. Spaces, newlines, vowels etc will be disproportionately common.
So, you could build histograms of byte counts for your various files, and look for a simple way to classify them into encrypted or not-encrypted. For example, look at the ratio of the total count of the 5 least common byte values and the total count of the 5 most common byte values. I would expect this ratio to be close to 1.0 for an encrypted file, and quite far from 1.0 for a normal textual document (I'm sure there are much more sophisticated statistical metrics that could be used...).
This might not work so well for extremely short documents, of course.
See also:
https://www.researchgate.net/post/How_to_detect_if_data_are_encrypted_or_not
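As a minimal sketch of the histogram idea above: the class name, method names and the 0.5 cut-off are my own assumptions, not anything standard, and the cut-off would need tuning against files you already know to be plain or encrypted.

```java
import java.util.Arrays;

final class ByteHistogram {
    // Ratio of the summed counts of the 5 rarest byte values to the summed
    // counts of the 5 most common ones. Close to 1.0 for data that looks
    // uniformly random, close to 0.0 for typical text.
    static double flatnessRatio(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++;          // treat the byte as unsigned 0-255
        }
        Arrays.sort(counts);             // ascending: rarest first
        long rare = 0;
        long common = 0;
        for (int i = 0; i < 5; i++) {
            rare += counts[i];
            common += counts[255 - i];
        }
        return common == 0 ? 0.0 : (double) rare / common;
    }

    // Hypothetical threshold; tune it on known samples before trusting it.
    static boolean looksEncrypted(byte[] data) {
        return flatnessRatio(data) > 0.5;
    }
}
```

Read the whole file (or a large prefix of it) into the array before calling `looksEncrypted`. Compressed files will probably pass this test as well, so a "true" result only means "looks random", not "is AES".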

AES is a block cipher. On its own, it can only transform a 128 bit value into another seemingly random 128 bit value. In order to encrypt more data, a mode of operation and possibly a padding scheme are added. If you want to go further like producing encrypted files, you really need to define a file format, because that's not provided by the previously mentioned mechanisms.
So, if you say you have an AES-encrypted file, it doesn't mean anything aside from your file being encrypted in some way.
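To illustrate the point that the bare primitive has no file format, here is a small sketch that runs a single 16-byte block through AES with the standard JCE API. The class and method names are made up for the example, and ECB/NoPadding is used only to expose the raw block transform, not as a recommendation.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

final class RawAesBlock {
    // Encrypts exactly one 16-byte block. The result is again exactly
    // 16 bytes: no header, no magic number, nothing that says "AES".
    static byte[] encryptBlock(byte[] key, byte[] block) {
        try {
            Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            return aes.doFinal(block);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Anything beyond those 16 output bytes (an IV, a salt, a version marker) exists only if the program that wrote the file chose to add it.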
The result of modern encryption looks like random noise, so you can compare the Hamming weight of an encrypted file to that of a non-compressed structured file. There will likely be differences as DNA mentioned. Compressed files also look like random noise, but they may contain biases which might be significant enough if the file is long enough.
There are some file formats that contain an identifier describing how the data was encrypted. Most self-made formats don't have anything close to an identifier, because they are written for a specific application and the protocol or file format doesn't change that often. The developer settled for some "cipher suite" and never bothered to make it flexible. If you know the program that the files are produced by, then you can likely find out if they are encrypted. If that program is open source, this is easy. If it is closed source, you can still reverse-engineer it.
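A rough sketch of the Hamming weight comparison, with a class name I made up: it computes the fraction of bits that are set to 1. Encrypted data should come out very close to 0.5; how far a given plain file deviates from that depends on its content, so treat this as one signal among several.

```java
final class BitBalance {
    // Fraction of bits set to 1 across the whole buffer (0.0 to 1.0).
    static double onesFraction(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        long ones = 0;
        for (byte b : data) {
            ones += Integer.bitCount(b & 0xFF);   // count set bits in one byte
        }
        return (double) ones / (8L * data.length);
    }
}
```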

Related

How to encrypt a text so that the length of the output is not more than 100 characters in java

I am trying to encrypt texts (usually with variable length) using Java, but I need to limit the character length of the encrypted output, since I have to send this encrypted text to an API which has this limitation (it does not save anything from the 101st character onwards). I see AES-256 is a secure algorithm and the Java code is here, however even TripleDES produces encrypted output longer than 100 characters (of course, for my inputs which are longer texts). I know it sounds awkward to expect a safe encryption algorithm to create a short encrypted text (at least not longer than the original text), but any idea about how I can solve my problem would be appreciated.
Thank you

Encryption is not a magic compression algorithm. Modern ciphers actually don't compress at all. A symmetric cipher in a mode of operation generally encrypts 1:1 with some overhead for e.g. the IV or authentication tag.
There are some things you can do:
Make sure you use the most efficient encoding and compression algorithms before encrypting;
Make sure that you minimize the overhead of the cipher (i.e. anything that is not 1:1 encryption);
Make sure that you use all the characters of the alphabet that are used for the storage solution (there is a lot of difference in being able to store 100 UTF-8 characters compared to US-ASCII characters, maybe non-printable characters can also be stored);
Use multiple text messages on the storage for a single encrypted message.
Good luck with your 100 character protocol.
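To make the overhead concrete, here is a sketch under assumptions of my own: AES-GCM with a 12-byte IV and a 16-byte tag, and unpadded Base64 as the text encoding. None of these choices come from the question, and the class and method names are hypothetical. With them, 100 characters carry 75 raw bytes, of which 28 are overhead, leaving 47 bytes of plaintext.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

final class ShortCiphertext {
    static final int IV_BYTES = 12;   // assumed GCM nonce size
    static final int TAG_BYTES = 16;  // assumed GCM authentication tag size

    // Returns Base64(IV || ciphertext || tag) without '=' padding characters.
    static String encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BYTES * 8, iv));
            byte[] body = gcm.doFinal(plaintext);   // ciphertext followed by the tag
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return Base64.getEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Largest plaintext, in bytes, whose encoded form fits in maxChars characters.
    static int maxPlaintextBytes(int maxChars) {
        int rawBytes = maxChars * 3 / 4;            // Base64: 4 characters carry 3 bytes
        return Math.max(0, rawBytes - IV_BYTES - TAG_BYTES);
    }
}
```

Compressing the text before encrypting, or using an encoding with a larger alphabet than Base64, would raise that 47-byte budget somewhat, but the fixed overhead stays.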

Most efficient way to write to a file?

I am writing my own image compression program in Java, I have entropy encoded data stored in multiple arrays which I need to write to file. I am aware of different ways to write to file but I would like to know what needs to be taken into account when trying to use the least possible amount of storage. For example, what character set should I use (I just need to write positive and negative numbers), would I be able to write less than 1 byte to a file, should I be using Scanners/BufferedWriters etc. Thanks in advance, I can provide more information if needed.

Read the Java tutorial about IO.
You should
not use Writers and character sets, since you want to write binary data
use a buffered stream to avoid too many native calls and make the write fast
not use Scanners, as they're used to read data, and not write data
And no, you won't be able to write less than a byte in a file. The byte is the smallest amount of information that can be stored in a file.

Compression is almost always more expensive than file IO. You shouldn't worry about the speed of your writes unless you know it's a bottleneck.
I am writing my own image compression program in Java, I have entropy encoded data stored in multiple arrays which I need to write to file. I am aware of different ways to write to file but I would like to know what needs to be taken into account when trying to use the least possible amount of storage.
Write the data in a binary format and it will be the smallest. This is why almost all image formats use binary.
For example, what character set should I use (I just need to write positive and negative numbers),
Character encoding is for encoding characters i.e. text. You don't use these in binary formats generally (unless they contain some text which you are unlikely to do initially).
would I be able to write less than 1 byte to a file,
Technically you can use less than the block size on disk e.g. 512 bytes or 4 KB. You can write any amount less than this but it doesn't use less space, nor would it really matter if it did because the amount of disk is too small to worry about.
should I be using Scanners/BufferedWriters etc.
No, these are for text.
Instead use DataOutputStream and DataInputStream as these are for binary.
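A small sketch of what that could look like, assuming (my assumption, not the asker's actual layout) that the values fit in 16-bit signed shorts and that a 4-byte length prefix is acceptable. The class name is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class CoefficientIo {
    // Writes a 4-byte count followed by each value as a 2-byte signed short.
    static void write(OutputStream target, short[] values) {
        try {
            DataOutputStream out = new DataOutputStream(new BufferedOutputStream(target));
            out.writeInt(values.length);
            for (short v : values) {
                out.writeShort(v);
            }
            out.flush();   // push buffered bytes through to the target
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back what write() produced.
    static short[] read(InputStream source) {
        try {
            DataInputStream in = new DataInputStream(new BufferedInputStream(source));
            short[] values = new short[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readShort();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Pass a `FileOutputStream` or `FileInputStream` to work with a real file. Three values cost 4 + 3 * 2 = 10 bytes this way, regardless of how many digits they have.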

what character set should I use
You would need to write your data as bytes, not chars, so forget about character set.
would I be able to write less than 1 byte to a file
No, this would not be possible. But to produce the bit stream the decoder expects, you might need to assemble a byte from smaller pieces, such as a 5-bit and a 3-bit value, before writing that byte to the file.
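Packing several short bit fields into bytes could be sketched like this. The class is hypothetical, and the bit order (most significant bit first) and zero padding of the last byte are choices I made for the example; your decoder has to agree on both.

```java
import java.io.ByteArrayOutputStream;

final class BitPacker {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current;   // bits collected so far, right-aligned
    private int filled;    // how many bits 'current' holds (0-7)

    // Appends the lowest 'count' bits of 'value', most significant bit first.
    void write(int value, int count) {
        for (int i = count - 1; i >= 0; i--) {
            current = (current << 1) | ((value >>> i) & 1);
            if (++filled == 8) {       // a full byte is ready
                out.write(current);
                current = 0;
                filled = 0;
            }
        }
    }

    // Pads the last partial byte with zero bits and returns everything written.
    byte[] finish() {
        if (filled > 0) {
            out.write(current << (8 - filled));
            current = 0;
            filled = 0;
        }
        return out.toByteArray();
    }
}
```

For example, writing the 5-bit value 10110 followed by the 3-bit value 011 yields the single byte 10110011.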

Java AES key generation

I'm trying to write a simple password manager in java. I would like to encrypt the file with the stored passwords using AES 256 bit encryption. In addition I would like the user to be able to decrypt the file with a password. When reading other posts online almost all of them stress that it is not secure to simply use a password as a key, they mention using random salts to add security. But I do not understand how I can use random salts when generating the key. If I create the key from the user's password and a random salt then when they try to decrypt their file how will I know what the salt was? This has me completely confused.
Currently I run their password through several different hashes using a constant salt at each step. Is this sufficiently secure or am I missing something? Any help on how to securely generate a key from a password would be greatly appreciated! Thanks in advance.

Remember that a salt isn't a secret. You can just append it to the encrypted data. The point of the salt is to prevent somebody from using a pre-computed dictionary of common pieces of data encrypted with common passwords as a way into "cracking" the encrypted file.
By making sure that the salt is random and combining it with the password, you remove the possibility of a dictionary attack because there's (effectively) no chance that a hacker will have a database of data pre-encrypted with your "salt+password". (As a starter, see this page, from one of my tutorials, on salts in password-based encryption.)
You also (effectively) eliminate the problem of collisions: where using the same password on two files may give an attacker a clue to the content if the same block of data occurring in both files looks the same in the encrypted version.
You still usually need to take other precautions, though, simply because a typical password doesn't contain much entropy. For example, 8 perfectly random lower case letters will generate about 38 bits of entropy; 8 lower case letters obeying typical patterns of English will generate about 20 bits of entropy. In other words, of the 2^256 possible keys, in reality typical users will be choosing among some small fraction in the range 2^20-2^40. In the case of a savvy user, the situation gets a little better, but you will be very unlikely to get close to 256 bits of entropy. (Consider that in a "pass phrase", there'll be about 2.5-3 bits of entropy per character, so a 30-character pass phrase gives you about 75 to 90 bits of entropy, and let's be honest, how many people use anything like a 30 character password? 8 perfectly random characters using the 'full' range of printable ASCII will give you about 52 bits.)
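The figures for perfectly random passwords follow from length times log2(alphabet size). A tiny sketch, with a class name of my own choosing, for checking such numbers:

```java
final class PasswordEntropy {
    // Bits of entropy for 'length' symbols drawn uniformly at random
    // from an alphabet of 'alphabetSize' distinct symbols.
    static double bits(int alphabetSize, int length) {
        return length * (Math.log(alphabetSize) / Math.log(2));
    }
}
```

With it, 8 random lower case letters (`bits(26, 8)`) come to roughly 37.6 bits and 8 random printable ASCII characters (`bits(95, 8)`) to roughly 52.6 bits. This only applies to uniformly random choices; human-chosen passwords will score lower than the formula suggests.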
One way of alleviating this situation a little is to transform the password (with salt appended) using a computationally complex one-way function so that it will take a hacker a little longer to try each key that they want to guess. Again, see this page for more details.
To give you a rough idea of the pitfalls of password-based encryption of files, you may also want to have a look at the Arcmexer library I wrote a couple of years ago, which includes a method named isProbablyCorrectPassword(). Combined with a dictionary/algorithm for generating candidate passwords, you can use it to gauge the effectiveness of the above methods (since ZIP file encryption uses a combination of these techniques).
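Putting the salt and the slow one-way function together, a sketch with the standard JCE API might look like this. PBKDF2 with HMAC-SHA256 is one possible choice of function, not necessarily the one the linked page uses, and the salt size, iteration count and class name are assumptions for illustration.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

final class PasswordKey {
    static final int SALT_BYTES = 16;        // assumed salt size
    static final int ITERATIONS = 200_000;   // assumed work factor; tune for your hardware

    // A fresh random salt. It is not secret: store it next to the ciphertext.
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a 256-bit AES key from the password and salt.
    static SecretKeySpec derive(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 256);
            byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
            spec.clearPassword();             // wipe the spec's internal copy
            return new SecretKeySpec(key, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When encrypting, call `newSalt()`, write the salt as the first 16 bytes of the file, then derive the key. When decrypting, read those 16 bytes back and call `derive` with the password the user typed; the same password and salt always give the same key.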

Use this library: http://www.jcraft.com/jsch/
There's a good AES example here:
http://www.jcraft.com/jsch/examples/AES.java.html
A lot of big names use this package: Maven, Eclipse, etc.

Advice on replacing a block of bytes in a file at run time, when the file is read

Folks. I trust that the community will see this as a relevant question. My apologies if not and mods, please close.
I am developing a video playback app with static content for a customer. My customer wants me to implement some basic security to stop someone unpacking the deployed app (it's for Android) and simply copying the MPEGs. My customer has made basic protection a critical requirement and, he's paying the bills :)
The files are too big to decrypt on the fly so I'm considering the following approach. I'd welcome thoughts and suggestions as to alternatives. I am aware of the arguments for and against copy protection schemes and security through obscurity, which my proposed approach uses and my question is not "should I?".
Take a block of bytes, say 256, from somewhere in the header of the MPG. Replace those bytes with random values such that the MPEG won't play without a lot of effort to repair it. Store the original 256 bytes in one of the app's bitmaps such that the bitmap still displays properly. When playing the video, read it in through a byte stream and replace the bytes with their original values before passing them to the output stream.
In summary:
Extract 256 bytes from the header of the MPEG
Store these bytes in a bitmap
Randomise values in the original bytes
At run time, read the 256 bytes back out of the bitmap
Read MPEG through an inputstream using a byte array buffer
Replace randomised bytes with the original values
Stream the input to an outputstream which is the input to the video player.
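The read-and-replace steps could be sketched roughly as follows. The class and method names are hypothetical, and how the original bytes are hidden in and recovered from the bitmap is left out entirely; the sketch assumes they are already available as a byte array together with their offset in the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class HeaderRestorer {
    // Copies 'in' to 'out', substituting 'original' for the bytes that
    // start at stream position 'offset'.
    static void copyRestoring(InputStream in, OutputStream out,
                              long offset, byte[] original) {
        try {
            byte[] buffer = new byte[8192];
            long position = 0;   // stream position of buffer[0]
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Overlap between this chunk and the region to restore.
                long from = Math.max(position, offset);
                long to = Math.min(position + read, offset + original.length);
                if (from < to) {
                    System.arraycopy(original, (int) (from - offset),
                            buffer, (int) (from - position), (int) (to - from));
                }
                out.write(buffer, 0, read);
                position += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The overlap calculation handles the case where the 256 bytes straddle two buffer reads.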
I do recognise at least 2 ways to defeat this, reverse engineering and screen grabbing, but the point is to prevent the average thief simply copying my customer's content with no effort.
Thoughts folks?
Thanks

I would suggest using an encryption/decryption scheme for the entire stream:
Real time video stream decryption is the standard way to deal with this issue. Its processing overhead is negligible when compared to the actual video decoding. For example, each and every single DVD player out there supports the CSS encryption scheme.
While using Java does impose some restrictions, such as the inability to make effective use of various CPU-specific instructions, you should be able to find a decryption algorithm that is not very expensive. I would suggest profiling your application before rejecting stream encryption algorithms out of hand.
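As a starting point for such profiling, here is a sketch of on-the-fly decryption using `CipherInputStream` from the standard JCE. AES in CTR mode is my own pick for the example (the answer does not name an algorithm), the class and method names are hypothetical, and key storage inside the app is not addressed.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;

final class StreamCrypto {
    // Wraps 'source' so bytes are run through AES-CTR as they are read.
    // CTR is symmetric: the same call turns plaintext into ciphertext and back.
    static InputStream aesCtr(InputStream source, byte[] key, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(iv));
            return new CipherInputStream(source, cipher);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience helper for small buffers and for timing experiments.
    static byte[] transform(byte[] data, byte[] key, byte[] iv) {
        try (InputStream in = aesCtr(new ByteArrayInputStream(data), key, iv)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For playback, wrap the video file's input stream with `aesCtr` and hand the result to whatever feeds the player. Note that CTR on its own gives no integrity protection, which may be acceptable for this kind of basic copy deterrent.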
Mangling the header does make some video files hard to read, but far from impossible. Some files have redundant information, others are actually the result of straight-out concatenation which would leave any following segments readable. Some streaming video codecs actually insert enough metadata to rebuild the stream every few seconds. And there are a lot of video formats out there.
In other words there is no way to guarantee that removing any number of bytes from the start of a file would make it unreadable. I also think that imposing on your client a bunch of restrictions w.r.t. the video formats that they can use is not reasonable and limits the future usefulness of your application.

Create a wav with hidden binary data in it and read it (Java)

What I want to do is convert a text string into a WAV file using high frequencies (18500 Hz and above): this will be the encoder.
I also want to create an engine that decodes this text string from a WAV recording, with error control, since I will obviously not be reading the same file but a recording of the sound.
Thanks

An important consideration will be whether or not you want to hide the string into an existing audio file (so it sounds like a normal file, but has an encoded message -- that is called steganography), or whether you will just be creating a file that sounds like gibberish, for the purpose of encoding data only. I'm assuming the latter since you didn't ask to hide a message in an existing file.
So I assume you are not looking for low-level details on writing WAV files (I am sure you can find documentation on how to read and write individual samples to a WAV file). Obviously, the simplest approach would be to simply take each byte of the source string, and store it as a sample in the WAV file (assuming an 8-bit recording. If it's a 16-bit recording, you can store two bytes per sample. If it's a stereo 16-bit recording, you can store four bytes per sample). Then you can just read the WAV file back in and read the samples back as bytes. That's the simple approach but as you say, you want to be able to make a (presumably analog) recording of the sound, and then read it back into a WAV file, and still be able to read the data.
With the approach above, if the analog recording is not exactly perfect (and how could it be), you would lose bytes of the message. This means you need to store the message in such a way that missing bytes, or bytes that have a slight error, are not going to be a problem. How you do this will depend highly upon exactly what sort of "damage" will be happening to the sound file. I would expect two major forms of damage:
"Vertical" damage: A sample (byte) would have a slightly higher or lower value than it originally had.
"Horizontal" damage: Samples may be averaged, stretched or squashed horizontally. From a byte perspective, this means some samples may be repeated, while others may be missing.
To combat this, you need some redundancy in the message. More redundancy means the message will take up more space (be longer), but will be more reliable.
I would recommend thinking about how old (pre-mobile) telephone dial tones worked: each key generated a unique tone and sent it across the wire. The tones are long enough, and far enough apart pitch-wise that they can be distinguished even given the above forms of damage. So, choose two parameters: a) length and b) frequency-delta. For each byte of data, select a frequency, spacing the 256 byte values frequency-delta Hertz apart. Then, generate a sine wave for length milliseconds of that frequency. This encodes a lot more redundancy than the above one-byte-per-sample approach, since each byte takes up many samples, and if you lose some samples, it doesn't matter.
When you read them back in, read every length milliseconds of audio data and then estimate the frequency of the sine wave. Map this onto the byte value with the nearest frequency.
Obviously, longer values of length and further-apart frequency-delta will make the signal more reliable, but require the sound to be longer and higher-frequency, respectively. So you will have to play around with these values to see what works.
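The tone scheme above can be sketched like this. Everything numeric is an assumption to play with: a 44100 Hz sample rate, 200 ms per byte, tones 12 Hz apart starting at the 18500 Hz mentioned in the question (which keeps all 256 tones below the 22050 Hz limit of that sample rate). The class is hypothetical, works on raw samples in the range -1 to 1, and leaves out WAV reading and writing, synchronisation and noise handling.

```java
final class ToneCodec {
    static final int SAMPLE_RATE = 44_100;               // assumed sample rate
    static final double BASE_HZ = 18_500;                // lowest tone
    static final double DELTA_HZ = 12;                   // assumed frequency-delta
    static final int SAMPLES_PER_BYTE = SAMPLE_RATE / 5; // assumed length: 200 ms

    static double frequencyOf(int value) {
        return BASE_HZ + value * DELTA_HZ;
    }

    // Encodes each byte as one sine burst at that byte's frequency.
    static double[] encode(byte[] message) {
        double[] samples = new double[message.length * SAMPLES_PER_BYTE];
        for (int i = 0; i < message.length; i++) {
            double w = 2 * Math.PI * frequencyOf(message[i] & 0xFF) / SAMPLE_RATE;
            for (int n = 0; n < SAMPLES_PER_BYTE; n++) {
                samples[i * SAMPLES_PER_BYTE + n] = Math.sin(w * n);
            }
        }
        return samples;
    }

    // For each burst, correlates against all 256 candidate tones and
    // picks the byte value whose tone matches most strongly.
    static byte[] decode(double[] samples) {
        byte[] message = new byte[samples.length / SAMPLES_PER_BYTE];
        for (int i = 0; i < message.length; i++) {
            int best = 0;
            double bestPower = -1;
            for (int value = 0; value < 256; value++) {
                double w = 2 * Math.PI * frequencyOf(value) / SAMPLE_RATE;
                double re = 0;
                double im = 0;
                for (int n = 0; n < SAMPLES_PER_BYTE; n++) {
                    double x = samples[i * SAMPLES_PER_BYTE + n];
                    re += x * Math.cos(w * n);
                    im += x * Math.sin(w * n);
                }
                double power = re * re + im * im;
                if (power > bestPower) {
                    bestPower = power;
                    best = value;
                }
            }
            message[i] = (byte) best;
        }
        return message;
    }
}
```

This brute-force correlation is slow but easy to follow; a Goertzel filter or an FFT would do the same job faster. A real recording will not start exactly on a burst boundary, so a practical decoder also needs some way to find where the first tone begins.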
Some last thoughts, since your title says "hidden" binary data:
If you really want the data to be "hidden", consider encrypting it before encoding it to audio.
If you want to take the steganography approach, you will have to read up on audio steganography (I imagine you can use the above techniques, but you will have to insert them as extremely low-volume signals on top of the existing sound).
