Dealing with JPEG Files


I am quite new to this, but I tried opening a JPEG file in Notepad and, without making any changes, saved it again under a new name, say new.jpg.
When I open new.jpg, it fails with an error; no viewer can display the image.
What I actually want is to read an image as a purely binary stream that can be stored in a String, and on the other side turn that String back into a stream and save it as a JPEG. I want to do this in Java, but before programming I tried the experiment described above, and it produces the error.

Opening a JPEG file with Notepad and saving it corrupts the file, because Notepad treats the bytes as text and messes up the encoding of some essential JPEG markers.
Try opening your file with a hexadecimal editor instead (I use HexEdit and it works fine).
You should also take a look at the JPEG file structure.

When you save a binary file with Notepad, it changes the encoding of some of the characters; that's why the file is no longer recognised as a valid JPEG.
I doubt there's a fast way to "go back" to the original file, since that involves finding out which bytes were changed.
As for saving it to a string, what do you mean?
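For the Java side of the question, here is a minimal sketch of why a text round trip damages binary data while Base64 does not. The class and method names are made up for illustration, and UTF-8 stands in for whatever encoding a text editor actually applies; Notepad may well use a different one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryAsString {

    // Decodes the bytes as text and encodes them again, which is roughly
    // what a text editor does when it opens and re-saves a file.
    // Assumption: UTF-8 is only a stand-in for the editor's real encoding.
    static byte[] throughTextEditor(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Base64 maps any byte sequence to plain ASCII text without loss.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Reverses toBase64 and returns the original bytes.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Typical first bytes of a JPEG: SOI marker FF D8, then APP0 marker FF E0.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        byte[] resaved = throughTextEditor(jpegStart);
        System.out.println("text round trip intact:   " + Arrays.equals(jpegStart, resaved));

        byte[] restored = fromBase64(toBase64(jpegStart));
        System.out.println("Base64 round trip intact: " + Arrays.equals(jpegStart, restored));
    }
}
```

For a real file, the byte array would come from something like Files.readAllBytes on the image path and be written back with Files.write after decoding.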

Related

Reading Unix executable file

I have a "Unix executable file" with no file extension.
On a Mac, I can see the content in preview mode, but I'm not sure of any other way to view it.
I'm looking for a way to read the content and store it in another location as a JPG or PNG file.
I'm not sure how to read this through Java.
In the Mac terminal, I tried "file filename" and got the following output:
PNG image data, 110 x 103, 8-bit/color RGB, non-interlaced

Whatever is reporting 'unix executable file' is oversimplifying things. It's simply 'a file' (unix has sod all to do with it). The file system has the concept of an 'executable' flag, which you can set or clear on any file, and which is utterly unrelated to whether the file's contents are executable. On top of that, macs and linux can mount DOS file systems (which most USB sticks use, because every OS can deal with them). Those file systems do not have this flag, so 'for convenience' the OS acts as if ALL files have it, and you can't remove it. In other words, it's a lie; forget about that part.
file is just guessing. That is no criticism of file, and the authors of that tool are by no means lazy. Certainty is mathematically impossible: the disk system doesn't know what kind of data a file contains, it just knows that this file has these bytes and ends there. file just looks at the contents and takes a wild stab in the dark. Its wild stabs are decent, but no guarantee. I can make you a file that is BOTH a legal zip file (will unzip and everything just fine) AND equally well a PNG image (renders in browsers, preview, etc). What could file possibly tell you here? Literally completely random garbage is ALSO a valid ISO-8859-1 formatted text file. The only way to know that this is clearly not the intended purpose of the file is to use Artificial Intelligence algorithms to realize that the contents in no way form legible words in any language on the planet. That's a very hard problem and file doesn't try to solve it.
Thus, there's no real way to know if it is a PNG file, if all you have is a file on disk. The file extension is a good hint, but if it's missing, you're just guessing. You can toss it through a PNG reader, and if it doesn't crash, it probably is, but it could just be a picture with random static because it isn't really a PNG file.
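The kind of content-based guess described above can be sketched in Java by comparing the first bytes of the data with the fixed 8-byte signature that PNG files start with. The class and method names are hypothetical, and a match is only a hint, for the reasons already given.

```java
import java.util.Arrays;

class PngSniffer {

    // The 8-byte signature at the start of every PNG file.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    // Returns true if the data begins with the PNG signature.
    // This is a guess based on content, not proof of a valid image.
    static boolean looksLikePng(byte[] header) {
        if (header.length < PNG_SIGNATURE.length) {
            return false;
        }
        byte[] start = Arrays.copyOf(header, PNG_SIGNATURE.length);
        return Arrays.equals(start, PNG_SIGNATURE);
    }

    public static void main(String[] args) {
        byte[] pngLike = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0, 0};
        byte[] jpegLike = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println("png-like:  " + looksLikePng(pngLike));
        System.out.println("jpeg-like: " + looksLikePng(jpegLike));
    }
}
```

In practice the header would be the first eight bytes read from the extension-less file.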
If you want to convert PNG files, ImageIO can do that.
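A minimal sketch of such a conversion with ImageIO, working on byte arrays so that it stays self-contained. The class name and the samplePng helper are made up for illustration; with real files, the bytes would be read from and written to disk.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class PngToJpeg {

    // Decodes image bytes (e.g. PNG) and re-encodes them as JPEG.
    static byte[] convert(byte[] imageBytes) {
        ImageIO.setUseCache(false); // keep intermediate data in memory, not in temp files
        try {
            BufferedImage source = ImageIO.read(new ByteArrayInputStream(imageBytes));
            if (source == null) {
                throw new IllegalArgumentException("no ImageIO reader understands this data");
            }
            // JPEG has no alpha channel, so draw onto a plain RGB image first.
            BufferedImage rgb = new BufferedImage(
                    source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
            Graphics2D g = rgb.createGraphics();
            g.drawImage(source, 0, 0, Color.WHITE, null);
            g.dispose();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(rgb, "jpg", out)) {
                throw new IllegalStateException("no JPEG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: builds a small in-memory PNG to convert.
    static byte[] samplePng(int width, int height) {
        ImageIO.setUseCache(false);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jpeg = convert(samplePng(110, 103));
        System.out.println("JPEG size in bytes: " + jpeg.length);
    }
}
```

If ImageIO.read returns null, no registered reader recognised the data, which is itself a sign that the file may not be the image type it was assumed to be.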
Generally, the process that got you that file usually DOES know the format. For example, if you download it over the web, the web server didn't JUST send those bytes over. It also sent this header: Content-Type: image/png. THAT (and not the file extension) is what is the webserver's canonical truth. If the process that saves this file to disk elected to take that information and toss it in the garbage, well, now you're stuck guessing. If possible, go back to that part of the process and fix it so this info is no longer tossed in the bin. For example, if you have a shell script that uses wget to download a resource and then later on you have no idea if it's a PNG, or a JPG, or the output of a 'file not found' explanatory page in HTML, then fix wget to save that header and react accordingly.

extracting text AND Images from PDF file

I have been banging my head against the wall with this one; I have researched and tried pretty much every library suggested to me. I am currently trying to write a program in Java that will extract text AND images from a PDF file and allow me to write the extracted content to a Word file. I have managed to extract the content using the ICEpdf library; the problem is that I need to be able to write the content in the exact same order as it was read. So, to clarify, I need a library that will help me keep track of where exactly on the page the text and images are situated, so I can put them in the same place in my Word file.

A PDF to Word converter is a horribly complex proposition.
Your best bet will probably be to use Open Office to do it for you and not even try to handle the intermediate steps.
http://www.openoffice.org/api/

Look at this: Advanced PDF parser for Java
Off-topic: to my knowledge there is also a Python parser that more or less converts the PDF to HTML (that way you can keep track of the ordering of the objects within the PDF). I know it's not Java, but you might be able to use the output.
http://www.unixuser.org/~euske/python/pdfminer/index.html

Two applications need to export and import a single file which needs to include data and images, best file type?

I'm making two Java applications: one to collect data, another to use it. The collecting one will import a file from the other; that file will include data and images and will be decrypted.
I'm unsure which file type to use. So far all of the data is in XML, and that works great, but I need the images as well and was hoping not to have to ship them in a folder with path references.
Ideas?

Well, I think the best way is to create your own format (.myformat or .data). The file will in fact be a ZIP file that contains your XML file and the images.
There is no perfect example written in Java as far as I know. However, here are some examples:
Not in Java
The best example is, as @Bolo said, the odt format. OpenOffice writes the document to an XML file and stores the images alongside it, and all of that is wrapped in a single odt file.
The .exe file is another example: the compiled code and the resources are put in a single file. Try opening one with 7-Zip and you'll see.
The Skyrim plugins are .esp files that contain the dds (textures), the scripts, the nifs (meshes)...
In Java
The Minecraft texture packs are ZIP files that contain a .mcmeta file (the info) and the textures (.png).
Jar files are like exe files.
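The ZIP-based container idea can be sketched with java.util.zip from the standard library. The class name, the entry names and the in-memory byte arrays are assumptions for illustration; a real application would write the bundle to a file with whatever extension it likes. InputStream.readAllBytes needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class BundleFormat {

    // Packs named byte arrays (XML, images, ...) into one ZIP archive.
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Read the buffer only after the ZIP stream is closed and complete.
        return buffer.toByteArray();
    }

    // Reads every entry of the archive back into a name-to-bytes map.
    static Map<String, byte[]> unpack(byte[] bundle) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bundle))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                files.put(entry.getName(), zip.readAllBytes()); // reads to the end of this entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("data.xml", "<data><image ref=\"images/a.png\"/></data>".getBytes(StandardCharsets.UTF_8));
        files.put("images/a.png", new byte[] {(byte) 0x89, 'P', 'N', 'G'}); // placeholder bytes, not a real image

        Map<String, byte[]> restored = unpack(pack(files));
        System.out.println("entries: " + restored.keySet());
    }
}
```

The XML can then refer to images by their entry name inside the archive instead of by a path on disk.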

If both programs are in Java, you could also go with serialization, which basically saves an object to a file (the suffix is .ser by convention, I think) and lets you retrieve it later. You should google it; even if it doesn't help right now, it is good to know about.
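A minimal sketch of that approach with the built-in object streams. The Dataset class and its fields are hypothetical stand-ins for whatever the two applications exchange, and the bytes are kept in memory here rather than written to a .ser file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationSketch {

    // Hypothetical payload: the XML text plus one image as raw bytes.
    static class Dataset implements Serializable {
        private static final long serialVersionUID = 1L;
        final String xml;
        final byte[] image;

        Dataset(String xml, byte[] image) {
            this.xml = xml;
            this.image = image;
        }
    }

    // Turns the object into bytes that could be written to a file.
    static byte[] save(Dataset dataset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(dataset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object from bytes produced by save.
    static Dataset load(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Dataset) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Dataset original = new Dataset("<data/>", new byte[] {1, 2, 3});
        Dataset copy = load(save(original));
        System.out.println(copy.xml + " with " + copy.image.length + " image bytes");
    }
}
```

Both applications need the same Dataset class on their classpath, and serialized data should only be loaded from a source you trust.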

I'd suggest using JSON. Gson is a decent library.
You can embed images as byte arrays.
Save the serialized string in a file with a preferred extension, read it from the second application, de-serialize, and reconstruct images.

You can convert binary image data to text with Base64 encoding; this way you can embed your images in XML. See http://en.wikipedia.org/wiki/Base64

Cannot open bitmap files

I have a problem opening bitmap files (width x height) in Windows. The files are generated by a Java program which reads .dat files 4 bytes at a time and writes them out as .bmp files. The weird thing is that if the width of the image is a multiple of 4, the file can be opened (e.g. 400x450). If it's not, I can't open the file and it says drawing failed (e.g. 450x400).
Any idea why this is happening? Thanks a lot.

BMP rows are padded to a multiple of 4 bytes. Make sure you account for that both when writing and when reading; see the description of the BMP format on Wikipedia.
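The padding rule can be written as a small calculation. This sketch assumes uncompressed 24-bit pixels, which the question does not state, and the class and method names are made up.

```java
class BmpRows {

    // Bytes stored per row: the pixel data rounded up to a multiple of 4.
    static int rowSize(int width, int bitsPerPixel) {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Number of filler bytes to write after the pixel data of each row.
    static int padding(int width, int bitsPerPixel) {
        int dataBytes = (width * bitsPerPixel + 7) / 8;
        return rowSize(width, bitsPerPixel) - dataBytes;
    }

    public static void main(String[] args) {
        // Assumption: 24 bits per pixel (3 bytes: blue, green, red).
        System.out.println("width 400: " + rowSize(400, 24) + " bytes per row, padding " + padding(400, 24));
        System.out.println("width 450: " + rowSize(450, 24) + " bytes per row, padding " + padding(450, 24));
        // prints "width 400: 1200 bytes per row, padding 0"
        // prints "width 450: 1352 bytes per row, padding 2"
    }
}
```

With these assumptions, a width of 400 needs no padding, while a width of 450 needs 2 extra bytes per row. That matches the symptom in the question, and the image size recorded in the file header would be rowSize times height.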

How to edit files to change md5 hash without corrupting?

I need to duplicate files of various types and change them slightly so that the copy's MD5 hash doesn't match the original's, while keeping them readable and uncorrupted.
TXT files - that's obvious. I just add a random string to the end of the file.
PDF files - I started looking for a Java library to edit PDF files, but then I happened to open a PDF file in Notepad++ and thought: why not try adding a random string to the end of the unreadable content I see there? To my surprise it worked, and the file wasn't corrupted.
ZIP files - I tried the same thing as with the PDF, and it also worked.
DOCX - the same method stopped working here. Appending just a space (" ") to the end of the binary content of a docx file opened in a text editor corrupts the file.
So what I need is:
Java libraries for modifying Office documents: doc, docx, xls, xlsx, ppt, pptx.
There are still file types whose MD5 hash I need to change but which I don't think are modifiable in Java, such as media files and executables.
So, how can I do what I want with these files? Is there a way to just "touch" the file, change a header or something, and make it non-identical to an untouched one?
edit:
OK, here's the motivation - I want to generate a massive amount of data, as I asked here: How to produce massive amount of data?
At the time of that question, the answers I got were enough, but now they aren't.
I need the data to be non-identical. Pairs of files must fail an MD5 hash comparison.
I can't just generate random strings, because I need to simulate real files and documents.
I can't use existing data dumps, because I need data sets of various sizes that include various file types. I need something that takes the size as input and generates the data for me.
So I figured that I should use a starting data set of all the file types that I eventually need, and just duplicate this data set.

Java libraries for modifying Office documents: doc, docx, xls, xlsx, ppt, pptx.
Apache POI is used to modify MS Office files. Note that the newer formats (xlsx, docx, etc.) are simply ZIP files containing XML. Unzipping them and modifying the plain-text XML might work as well.
The same advice goes for ZIP files: try unzipping and modifying the easiest file.
But what are you actually trying to achieve? Note that appending a random string to the end of the file works only by chance. On another computer or with another version of the software, the file might be considered corrupted...
I would advise you either to store some metadata external to the file rather than comparing MD5 hashes, or to look deeper into the file formats. There are almost always headers and various pieces of metadata hidden in the file (ID3 tags in MP3, EXIF in images, etc.). It is much safer to modify those instead.
Also look for reserved/unused bytes - they are quite common. But again, why are you doing this in the first place?
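As one illustration of changing metadata rather than appending bytes, the sketch below copies a ZIP archive entry by entry and stores a tag in the archive comment, then compares MD5 hashes. It uses only java.util.zip and MessageDigest. The class name, the tag values and the sampleZip helper are made up, and whether every program that reads docx or xlsx files accepts a rewritten archive is an assumption this sketch does not verify. InputStream.transferTo needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class Md5Variant {

    // Returns the MD5 hash of the data as lowercase hex.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies all entries of a ZIP archive in order and sets the archive comment to the tag.
    static byte[] withComment(byte[] zipBytes, String tag) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes));
             ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.setComment(tag);
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                out.putNextEntry(new ZipEntry(entry.getName()));
                in.transferTo(out); // copies the content of the current entry only
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: a tiny archive standing in for a docx or zip file.
    static byte[] sampleZip() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry("word/document.xml"));
            out.write("<document/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = sampleZip();
        System.out.println("copy-1: " + md5Hex(withComment(original, "copy-1")));
        System.out.println("copy-2: " + md5Hex(withComment(original, "copy-2")));
    }
}
```

Giving each copy a different tag yields archives with the same entries but different bytes, so their MD5 hashes differ.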
