EXIFTool JSON to EXIF batch processing - java

I have >400 JPG files and a JSON file for each which contains the image tags, description and title. I've found this command
exiftool -json=picture.json picture.jpg
But I don't want to run this for each and every file.
How can I run this command for the folder containing the JPGs and JSONs or is there another way I can batch process these?
Each JSON file has the same name as its JPG counterpart, so it's easy to identify which files match up to each other.

Assuming your JPGs and JSONs have the same filename but a different extension (e.g. picture001.jpg has an associated picture001.json, etc.), a shell for loop might work.
Assuming you've already cd-ed into the folder and the files aren't nested in subfolders, something like this should work:
for jpg in *.jpg; do exiftool -json="${jpg%.jpg}.json" "$jpg"; done
Note that this isn't tested. I recommend making a copy of your folder and testing there beforehand to make sure you don't irreversibly damage your files.
I've also noticed you're using the java tag. I had to work with EXIF data in Java a while back (on Android then) and I used the JHeader library. If you want to roll your own little Java command line tool, you should be able to use Java's IO classes to traverse your directory and files and the JHeader library to modify the EXIF data.
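If you go the Java route, a minimal sketch could look like the following. It does not use JHeader. Instead it assumes exiftool is installed and on the PATH, and simply runs the command from the question once per JPG. The class and method names (ExifJsonBatch, jsonFor, commandFor) are made up for illustration, and this has not been tried against real photos, so run it on a copy of the folder first.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class ExifJsonBatch {

    // Maps picture001.jpg to picture001.json in the same folder.
    static Path jsonFor(Path jpg) {
        String name = jpg.getFileName().toString();
        String base = name.substring(0, name.lastIndexOf('.'));
        return jpg.resolveSibling(base + ".json");
    }

    // Builds the command line from the question for a single image.
    static List<String> commandFor(Path jpg) {
        return Arrays.asList("exiftool", "-json=" + jsonFor(jpg), jpg.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: the folder is passed as the first argument, else the current folder.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // The "*.jpg" glob does not descend into subfolders.
        try (DirectoryStream<Path> jpgs = Files.newDirectoryStream(dir, "*.jpg")) {
            for (Path jpg : jpgs) {
                if (!Files.exists(jsonFor(jpg))) {
                    System.err.println("No JSON for " + jpg + ", skipping");
                    continue;
                }
                // Assumption: "exiftool" can be found on the PATH.
                int exit = new ProcessBuilder(commandFor(jpg)).inheritIO().start().waitFor();
                if (exit != 0) {
                    System.err.println("exiftool failed for " + jpg + " (exit code " + exit + ")");
                }
            }
        }
    }
}
```

Keeping the command building in its own method makes it easy to print the commands first and check them before anything touches the images.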

Related

processing zipped xml files in hadoop using mapreduce

I have a file structure like this.
a.zip contains a1.zip, a2.zip, a3.zip, and each of these zipped files holds one XML file.
I need to process these XML files. Currently I am extracting the zipped files from a.zip, storing them in HDFS and running an MR job to process a1.zip, a2.zip, ... using a custom input format and record reader.
Can anyone help me with a better solution where I don't have to unzip a.zip and can still process the files in parallel?

Why don't you write a normal Java pre-processor class which you can call from the main program? The steps would be:
1) The pre-processor class programmatically extracts the a.zip file into a temp location.
2) Programmatically add the child zip files to HDFS.
3) Fire the XML processing in the way you are doing now.
4) If you wish, you can extend the pre-processor class to place the XML directly, so that you can keep the XML processing program simpler.
Let me know if something is not clear here.
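For step 4, here is a rough sketch of how such a pre-processor could pull the XML straight out of the nested archives. It is a variation on the steps above, not the same thing: it reads the inner zips in memory by stacking one ZipInputStream on another instead of extracting to a temp location. The class name NestedZipReader is hypothetical, the XML is assumed to be UTF-8, and the upload to HDFS is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class NestedZipReader {

    // Returns the XML text of every inner archive, keyed by "inner.zip!entry.xml".
    static Map<String, String> readInnerXml(InputStream outerZip) throws IOException {
        Map<String, String> xmlByName = new LinkedHashMap<>();
        try (ZipInputStream outer = new ZipInputStream(outerZip)) {
            ZipEntry outerEntry;
            while ((outerEntry = outer.getNextEntry()) != null) {
                if (outerEntry.isDirectory() || !outerEntry.getName().endsWith(".zip")) {
                    continue;
                }
                // Deliberately not closed: closing the inner stream would close the outer one too.
                ZipInputStream inner = new ZipInputStream(outer);
                ZipEntry innerEntry;
                while ((innerEntry = inner.getNextEntry()) != null) {
                    if (innerEntry.getName().endsWith(".xml")) {
                        // Assumption: the XML files are UTF-8 encoded.
                        String xml = new String(readAll(inner), StandardCharsets.UTF_8);
                        xmlByName.put(outerEntry.getName() + "!" + innerEntry.getName(), xml);
                    }
                }
            }
        }
        return xmlByName;
    }

    // Reads the current entry to its end without closing the stream.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Note that this walks a.zip sequentially in one process, so the parallelism still has to come from the MR job that consumes the extracted XML.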

Two applications need to export and import a single file which needs to include data and images, best file type?

I'm making two Java applications: one to collect data, another to use it. The one collecting will be importing a file from the other which will include data and images and will be decrypted.
I'm unsure what file type to use. So far all of the data is in XML and works great, but I need the images too and was hoping not to have to rely on shipping all the images in a folder with a path reference.
Ideas?

Well, I think that the best way is to create your own format (.myformat or .data). This file will in fact be a ZIP file that contains your XML file and images.
There is no perfect example written in Java as far as I know. However, here are some examples:
Not in Java
The best example is, as @Bolo said, the odt format. Indeed, OpenOffice writes the document to an XML file and stores the images alongside it. All of that is wrapped in an odt file.
The .exe file is another example. The compiled code and the resources are put in a single file. Try to open it with 7-Zip, you'll see.
The Skyrim plugins are .esp file that contain the dds, the scripts, the niffs (textures)...
In Java
The Minecraft texture packs are a zip file that contains a .mcmeta file (the info) and the textures (.png).
Jar files are like exe.
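To make the "own format that is really a zip" idea concrete, here is a small sketch of what the export and import sides might share. DataBundle and the entry names are hypothetical, and the encryption mentioned in the question is not handled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DataBundle {

    // Writes every part (e.g. "data.xml", "images/logo.png") into one zip-based file.
    static void write(Path bundle, Map<String, byte[]> parts) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(bundle))) {
            for (Map.Entry<String, byte[]> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey()));
                zip.write(part.getValue());
                zip.closeEntry();
            }
        }
    }

    // Reads all parts back into memory, keyed by entry name.
    static Map<String, byte[]> read(Path bundle) throws IOException {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(bundle))) {
            ZipEntry entry;
            byte[] buffer = new byte[8192];
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                parts.put(entry.getName(), out.toByteArray());
            }
        }
        return parts;
    }
}
```

The XML can then refer to images by their entry name inside the bundle (for example images/logo.png) instead of by a path on disk.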
If both programs are in Java you could also go with serialization, which is basically saving an object as a file (the suffix will be .ser, I think) and then being able to retrieve it. You should google it; even if it won't help right now, it is quite good to know about.

I'd suggest using JSON. Gson is a decent library.
You can embed images as byte arrays.
Save the serialized string in a file with a preferred extension, read it from the second application, de-serialize, and reconstruct the images.

You can convert binary image data to text with Base64 encoding, and this way you can embed your images in XML (see http://en.wikipedia.org/wiki/Base64).
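A minimal sketch of the Base64 route, assuming Java 8 or later for java.util.Base64. The element name "image" and the class Base64Image are made up for illustration, and the name attribute is not escaped, so it is assumed to contain no XML special characters.

```java
import java.util.Base64;

final class Base64Image {

    // Wraps the image bytes as Base64 text inside a (hypothetical) <image> element.
    static String toXmlElement(String name, byte[] imageBytes) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "<image name=\"" + name + "\">" + encoded + "</image>";
    }

    // Turns the Base64 text taken from the element back into the original bytes.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

Keep in mind that Base64 makes the data roughly a third larger, which matters if the images are big.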

Update data documents for JWS deployed app

I have a Swing application that uses many data files, and these data files will change from time to time. How can I load these data files on the client's machine? Is there any way to create a folder-like structure and run a batch file or so? Any help is appreciated.

There are several ways to do this:
1) Assuming you want to ship your application with the data files, you may embed them as a zip/jar in your application jar file. Extract the embedded zip to a temporary local file and use ZipFileSystemProvider to extract the content to some place on the disk. Here is an example of how to extract some content from a zip/jar file embedded in a .jar file downloaded by JWS.
2) Same as 1, but skip the zip stuff and instead provide a list of all the resources you want to extract.
3) One other way is to create the files programmatically using either java.nio.file (Java 7+) or java.io.File.
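Option 1 could be sketched roughly as below. This is not the linked example, just an illustration: the class name EmbeddedDataExtractor is hypothetical, it goes through the public FileSystems.newFileSystem call rather than naming the zip provider class directly, and it assumes Java 8+ for Files.walk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class EmbeddedDataExtractor {

    // Copies the embedded zip to a temp file, opens it as a file system
    // and copies every entry below targetDir.
    static void extract(InputStream embeddedZip, Path targetDir) throws IOException {
        Path tempZip = Files.createTempFile("data", ".zip");
        try {
            Files.copy(embeddedZip, tempZip, StandardCopyOption.REPLACE_EXISTING);
            // The cast avoids an ambiguous overload on newer Java versions.
            try (FileSystem zipFs = FileSystems.newFileSystem(tempZip, (ClassLoader) null)) {
                Path root = zipFs.getPath("/");
                List<Path> entries;
                try (Stream<Path> walk = Files.walk(root)) {
                    entries = walk.collect(Collectors.toList());
                }
                for (Path entry : entries) {
                    // Make the entry path relative so it resolves under targetDir.
                    Path target = targetDir.resolve(root.relativize(entry).toString());
                    if (Files.isDirectory(entry)) {
                        Files.createDirectories(target);
                    } else {
                        Files.createDirectories(target.getParent());
                        Files.copy(entry, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        } finally {
            Files.deleteIfExists(tempZip);
        }
    }
}
```

In the application this would be called with something like MyApp.class.getResourceAsStream("/data.zip"), where both MyApp and /data.zip are placeholders for your own class and resource name.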

How to edit files to change md5 hash without corrupting?

I need to duplicate various kinds of file types, change them a bit so that the original's md5 hash won't match the modified one, but keep them readable and not corrupted.
TXT files - that's obvious. I just add a random string to the end of the file.
PDF file - well I started looking for a java library to edit pdf files, but then I accidentally tried to open a pdf file in notepad++, and thought - why don't I try to add a random string to the end of the not readable content that I see there. Well, to my surprise it worked and the file wasn't corrupted.
ZIP file - I've tried the same that I did with pdf, and it also worked.
DOCX - the same method stopped working here. Appending just a space (" ") to the end of the binary content of a docx file opened in a text editor corrupts the file.
So what I need is:
Java libraries for modifying Office documents: doc, docx, xls, xlsx, ppt, pptx.
There are still file types whose MD5 hash output I need to change, but I don't think they are modifiable in Java - media files, for example, executables, etc.
So, nevertheless, how can I perform what I want on these files? Is there a way to just "touch" the file, change a header or something and make it nonidentical to an untouched one?
edit:
Ok, here's the motivation - I want to generate massive amounts of data as I asked here: How to produce massive amount of data?
At the time of that question, the answers I got there were enough, but now they aren't.
I need the data to be nonidentical. Pairs of files must fail the MD5 hash test.
I can't just generate random strings, because I need to simulate real files and documents.
I can't use existing data dumps, because I need various sizes of these data sets that include various file types. I need something that I'll give as an input the size, and it will generate the data for me.
So I figured that I should use a starting data set of all the file types that I eventually need, and just duplicate this data set.

"java libraries for modifying office documents: doc, docx, xls, xlsx, ppt, pptx."
Apache POI is used to modify MS Office files. Note that the newer formats (xlsx, docx, etc.) are simply ZIP files containing XML. Unzipping them and modifying the plain-text XML might work as well.
The same advice goes for ZIP files: try unzipping and modifying the easiest file.
But what are you actually trying to achieve? Note that randomly attaching some string at the end of the file works only by chance. On another computer or with another version of the software the file might be considered corrupted...
I would advise you to either store some metadata external to the file rather than comparing MD5, or look deeper into the file formats. There are almost always headers and various pieces of metadata hidden in the file (ID3 tags in MP3, EXIF in images, etc.). It is much safer to modify those instead.
Also look for reserved/unused bytes; they are quite common. But again: why are you doing it in the first place?
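For the ZIP-based formats, one way to act on the "unzip and modify" advice without POI is sketched below. To be clear about what it does: instead of editing one of the XML parts, it swaps in a simpler trick and rewrites the archive entry by entry with a different ZIP comment, so the entry contents stay the same while the file bytes change. ZipVariant is a hypothetical name, and whether every Office version accepts a rewritten container with a comment is an assumption I have not tested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipVariant {

    // Copies all entries into a new archive that carries the given comment.
    static byte[] withComment(byte[] zipBytes, String comment) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes));
             ZipOutputStream out = new ZipOutputStream(result)) {
            out.setComment(comment);
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // A fresh ZipEntry avoids reusing the old compressed sizes.
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
        return result.toByteArray();
    }

    // Hex-encoded MD5, only used here to confirm that two variants differ.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Passing a different comment for each copy (a counter or a UUID, say) gives each duplicate its own hash. Open a few of the generated files in the real applications before relying on this for a whole data set.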

Using zip file without extracting in java

I'm facing a problem: we have a .zip file that contains some text files, and I'm using Java to access those files. If a file is not in the .zip file I can read it and print it on my console easily using FileInputStream.
But how do I read a file from inside a .zip file? I use J2SE only.

You should try a ZipInputStream. The interface is a little obtuse, but you can use getNextEntry() to iterate through the items in the .zip file.
As a side note, the Java class-loader does exactly this to load classes from .jar files without extracting them first.

Everything you need is in ZipFile: https://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipFile.html. Google for examples on the web, and if you have specific problems then come back to SO for help.
(The link will eventually break; when it does simply websearch java zipfile.)
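In case the link does break, here is a small sketch of the ZipFile approach using only J2SE classes. ZipTextReader is a made-up name and the text files are assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipTextReader {

    // Reads one text entry straight from the archive, without extracting it to disk.
    static List<String> readLines(Path zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + zipPath);
            }
            List<String> lines = new ArrayList<>();
            // Assumption: the text files are UTF-8 encoded.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(zip.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        }
    }
}
```

To print a file on the console as in the question, loop over the returned lines and pass each one to System.out.println.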
