Binary editing in Java

I have a file that I am trying to binary edit to cut off a header.
I have identified the start address of the actual data I want to keep in the file; however, I am trying to find a way in Java to specify a range of bytes to delete from a file.
At the moment I am reading the file in a (Buffered)FileInputStream, and the only way I can see to cut off the header of this file is to save from my start address to the end of the file in memory, then write that out overwriting the original file.
Is there any functionality to remove bits in files without having to go through the process of creating a whole new file?

There is a method to truncate the file (setLength), but there is no API to remove an arbitrary sequence from inside it.
If the file is so large that rewriting it is a performance issue, I suggest splitting it into several files. Some performance may be gained by using RandomAccessFile to seek to the point of deletion, rewrite from there, and then truncate.
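For illustration, here is a minimal sketch of that seek/rewrite/truncate idea; the removeRange helper and the buffer size are my own assumptions, not part of any standard API:
import java.io.IOException;
import java.io.RandomAccessFile;

public class RemoveRange {
    // Hypothetical helper: removes 'count' bytes starting at offset 'start'
    // by shifting the tail of the file left, then truncating with setLength.
    static void removeRange(String path, long start, long count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            long readPos = start + count; // first byte to keep after the gap
            long writePos = start;        // where the kept bytes move to
            byte[] buf = new byte[8192];
            int n;
            while (true) {
                raf.seek(readPos);
                n = raf.read(buf);
                if (n < 0) {
                    break; // reached end of file
                }
                raf.seek(writePos);
                raf.write(buf, 0, n);
                readPos += n;
                writePos += n;
            }
            raf.setLength(writePos); // truncate the now-duplicated tail
        }
    }
}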

Try this: it uses a RandomAccessFile to wipe out the unneeded part of the file, by first seeking to the start index and then overwriting the unneeded characters from there onwards. Note that this overwrites bytes in place; it does not shorten the file.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class Main {
    public static void main(String[] args) {
        int startIndex = 21;
        int numberOfCharsToRemove = 20;
        // Using a RandomAccessFile, overwrite the part you want to wipe
        // out using the NUL character
        try (RandomAccessFile raf = new RandomAccessFile(new File("/Users/waleedmadanat/Desktop/sample.txt"), "rw")) {
            raf.seek(startIndex);
            for (int i = 1; i <= numberOfCharsToRemove; i++) {
                raf.write('\u0000');
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

I couldn't find any API method to perform what I wanted (which goes with the answers above).
I solved it by just re-writing the file back out to a new file, then replacing the old one with the new one.
I used the following code to perform the replacement:
FileInputStream fin = new FileInputStream(inFile); // the original file
FileOutputStream fout = new FileOutputStream(inFile.getAbsolutePath() + ".tmp");
FileChannel chanOut = fout.getChannel();
FileChannel chanIn = fin.getChannel();
chanIn.transferTo(pos, chanIn.size(), chanOut);
Here pos is my start address for the transfer, which sits directly after the header that I am cutting out of this file.
I have also noticed no slowdowns using this method.
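To finish the replacement step described above, something like the following sketch (assuming the same inFile variable from the snippet, with the channels already closed) can move the temporary file over the original:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch: replace the original file with the trimmed ".tmp" copy.
// Assumes 'inFile' is the java.io.File used in the snippet above.
Files.move(Paths.get(inFile.getAbsolutePath() + ".tmp"),
        inFile.toPath(), StandardCopyOption.REPLACE_EXISTING);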

Related

How can I open a FileInputStream that has its share set to allow ReadWrite?

In .Net I can open a FileStream set to FileAccess.Read, FileShare.ReadWrite. How can I do the same in Java?
Files.newInoutStream() does not appear to support either of these capabilities.
Update: Let me explain why. We have a common use case where our application opens a DOCX file while Word has it open for editing. Due to the locks Word holds on the file, the only way Windows allows this is FileAccess.Read & FileShare.ReadWrite.
And yes, that's dangerous (it would be fine if it were FileShare.Read). But the world is what it is here, and in practice this works great.
But it means I need to find a way in Java to open an InputStream to that file under the constraints that Word holding it open imposes.
There is no 'InoutStream' in Java.
You're probably looking for Files.newByteChannel:
import java.nio.ByteBuffer;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;

class Snippet {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("test.txt");
        try (var channel = Files.newByteChannel(path, StandardOpenOption.WRITE, StandardOpenOption.READ)) {
            ByteBuffer bb = ByteBuffer.allocate(1024);
            channel.read(bb);
            // Note that 'read' reads 1 to x bytes depending on file system and
            // phase of the moon.
            bb.flip();
            System.out.println("Read: " + StandardCharsets.UTF_8.decode(bb));
            bb.clear();
            channel.position(0);
            channel.write(StandardCharsets.UTF_8.encode("Hello, World!"));
            channel.position(0);
            channel.read(bb);
            bb.flip();
            System.out.println("Read: " + StandardCharsets.UTF_8.decode(bb));
        }
    }
}
Make a file named 'test.txt', put in whatever you like, then run this. It'll print whatever is there, then overwrite it with Hello, World!.
Note that the read call is guaranteed to read at least 1 byte, but will not necessarily fill the entire buffer even if the file is that large: the idea is that you read one 'block' that the OS and file system can efficiently transfer in one operation. You'll need to add some loops if you want to read a guaranteed minimum, or even the entire file.
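If you need such a loop, a sketch like this (the readFully name is mine) keeps reading until the buffer is full or the stream ends:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper: keeps calling read until the buffer fills or EOF.
// Returns the number of bytes actually read.
static int readFully(ReadableByteChannel channel, ByteBuffer bb) throws IOException {
    int total = 0;
    while (bb.hasRemaining()) {
        int n = channel.read(bb);
        if (n < 0) {
            break; // end of stream before the buffer filled up
        }
        total += n;
    }
    return total;
}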

Splitting a text file

I have this text file of the format:
Token:A1
sometext
Token:A2
sometext
Token:A3
I want to split this file into multiple files, such that
File 1 contains
A1
sometext
File 2 contains
A2
sometext
I do not have much of an idea about programming or scripting languages as such; what would be the best way to go about this? I was thinking of using Java to solve the problem.
If you want to use Java, I would look into using Scanner in conjunction with File and PrintWriter; with a for loop and some exception handling, you will be good to go.
Import the proper libraries!
import java.io.*;
import java.util.*;
declare the class of course
public class someClass {
    public static void main(String[] args) {
Now here's where stuff starts to get interesting. We use the class File to create a new File object that has the name of the file to be read passed as a parameter. You can put whatever you want there, whether it's a path to the file or just the file name if it's in the same directory as your code.
        File currentFile = new File("new.txt");
        if (currentFile.exists() && currentFile.canRead()) {
            try {
Next we create a Scanner to scan through that newly created File object. The for loop continues as long as the file has new tokens to scan through: .hasNext() returns true only if the input in the scanner has another token. PrintWriter writes and creates the files. I have it set so that it will create the files based on the iteration of the loop (0, 1, 2, 3, etc.), but that can be easily changed (see new PrintWriter(i + ".txt", "UTF-8");).
                Scanner textContents = new Scanner(currentFile);
                for (int i = 0; textContents.hasNext(); i++) {
                    PrintWriter writer = new PrintWriter(i + ".txt", "UTF-8");
                    writer.println(textContents.next());
                    writer.close();
                }
These catch statements are super important! Your code won't even compile without them. If there is an error, they will make sure your code doesn't crash. I left the inside of them empty so you can do what you see fit.
            } catch (FileNotFoundException e) {
                // do something
            } catch (UnsupportedEncodingException i) {
                // do something
            }
        }
    }
}
And that's pretty much it! If you have any questions, be sure to comment!
There is no single best way; it depends on your environment and needs. But for any language, figure out your basic algorithm and try using the best available data structure(s). If you are using Java, consider Guava's Splitter, and do look into its implementation.
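For the specific format in the question, note that Scanner's .next() returns whitespace-separated tokens, so the loop above writes one word per file. A minimal line-based sketch, assuming every section begins with a line starting with "Token:" (file and class names here are placeholders), would look like:
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class TokenSplitter {
    public static void main(String[] args) throws IOException {
        int fileIndex = 0;
        PrintWriter writer = null;
        try (Scanner in = new Scanner(new File("input.txt"), "UTF-8")) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                if (line.startsWith("Token:")) {
                    // Start a new output file at each "Token:" line.
                    if (writer != null) writer.close();
                    writer = new PrintWriter("file" + (++fileIndex) + ".txt", "UTF-8");
                    writer.println(line.substring("Token:".length())); // keep "A1" etc.
                } else if (writer != null) {
                    writer.println(line);
                }
            }
        } finally {
            if (writer != null) writer.close();
        }
    }
}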

Regarding stitching of multiple files into a single file

I work on query latencies and have a requirement where I have several files which contain data. I want to aggregate this data into a single file. I use a naive technique where I open each file and collect all the data in a global file. I do this for all the files, but it is time-consuming. Is there a way to stitch the end of one file to the beginning of another and create a big file containing all the data? I think many people might have faced this problem before. Can anyone kindly help?
I suppose you are currently doing the opening and appending by hand; otherwise I do not know why it would take a long time to aggregate the data, especially since you describe the number of files using multiple and several, which seems to indicate it's not an enormous number.
Thus, I think you are just looking for a way to automatically do the opening and appending for you. In that case, you can use an approach similar to the one below. Note this creates the output file, or overwrites it if it already exists, then appends the contents of all specified files. If you want to call the method multiple times and append to the same file instead of overwriting an existing one, an alternative is to use a FileWriter instead, with true as the second argument to its constructor so it will append to an existing file.
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

void aggregateFiles(List<String> fileNames, String outputFile) {
    PrintWriter writer = null;
    try {
        writer = new PrintWriter(outputFile);
        for (String fileName : fileNames) {
            Path path = Paths.get(fileName);
            String fileContents = new String(Files.readAllBytes(path));
            writer.println(fileContents);
        }
    } catch (IOException e) {
        // Handle IOException
    } finally {
        if (writer != null) writer.close();
    }
}
List<String> files = new ArrayList<>();
files.add("f1.txt");
files.add("someDir/f2.txt");
files.add("f3.txt");
aggregateFiles(files, "output.txt");
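If the inputs are large or binary, reading each file fully into a String is wasteful and can mangle bytes. As a sketch (method name is mine), FileChannel.transferTo can append the raw bytes without buffering whole files in memory:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.List;

// Sketch: append each input file to the output using channel transfers,
// so no file is ever fully loaded into memory.
static void aggregate(List<String> fileNames, String outputFile) throws IOException {
    try (FileOutputStream out = new FileOutputStream(outputFile)) {
        FileChannel outCh = out.getChannel();
        for (String name : fileNames) {
            try (FileInputStream in = new FileInputStream(name)) {
                FileChannel inCh = in.getChannel();
                long pos = 0, size = inCh.size();
                while (pos < size) {
                    pos += inCh.transferTo(pos, size - pos, outCh);
                }
            }
        }
    }
}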

Read zip or jar file without unzipping it first

I'm not looking for any answers that involve opening the zip file in a zip input or output stream. My question is: is it possible in Java to just simply open a jar file like any other file (using a buffered reader/writer), read its contents, and write them somewhere else? For example:
import java.io.*;

public class zipReader {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(System.getProperty("user.home").replaceAll("\\\\", "/") + "/Desktop/foo.zip"));
        BufferedWriter bw = new BufferedWriter(new FileWriter(System.getProperty("user.home").replaceAll("\\\\", "/") + "/Desktop/baf.zip"));
        char[] ch = new char[180000];
        while (br.read(ch) > 0) {
            bw.write(ch);
            bw.flush();
        }
        br.close();
        bw.close();
    }
}
This works on some small zip/jar files, but most of the time it will just corrupt them, making it impossible to unzip or execute them. I have found that setting the size of the char[] to 1 will not corrupt the file, just everything in it, meaning I can open the file in an archive program but all its entries will be corrupted and unusable. Does anyone know how to write the above code so it won't corrupt the file? Also, here is a line from a jar file I tested this on that became corrupted:
nèñà?G¾Þ§V¨ö—‚?‰9³’?ÀM·p›a0„èwåÕüaEܵp‡aæOùR‰(JºJ´êgžè*?”6ftöãÝÈ—ê#qïc3âi,áž…¹¿Êð)V¢cã>Ê”G˜(†®9öCçM?€ÔÙÆC†ÑÝ×ok?ý—¥úûFs.‡
vs the original:
nèñàG¾Þ§V¨ö—‚‰9³’ÀM·p›a0„èwåÕüaEܵp‡aæOùR‰(JºJ´êgžè*?”6ftöãÝÈ—ê#qïc3âi,áž…¹¿Êð)V¢cã>Ê”G˜(†®9öCçM€ÔÙÆC†ÑÝ×oký—¥úûFs.‡
As you can see, either the reader or the writer adds ?'s into the files, and I can't figure out why. Again, I don't want any answers telling me to open it entry by entry; I already know how to do that. If anyone knows the answer to my question, please share it.
Why would you want to convert binary data to chars? I think it will be much better to use InputStream/OutputStream with byte arrays. See http://www.javapractices.com/topic/TopicAction.do?Id=245 for examples.
bw.write(ch) will write the entire array, but read will only fill in some of it and return a number telling you how much. This has nothing to do with zip files, just with how IO works.
You need to change your code to look more like:
int charsRead = br.read(buffer);
if (charsRead >= 0) {
    bw.write(buffer, 0, charsRead);
} else {
    // whatever I do at the end.
}
However, this is only half of your problem. You are also converting bytes to characters and back again, which will corrupt the data in other ways. Stick to streams.
See the ZipInputStream and ZipOutputStream classes.
Edit: use plain FileInputStream and FileOutputStream. I suspect there may be some issues when the reader interprets the bytes as characters.
See also: Standard concise way to copy a file in Java? Since you want to copy the whole file, there is nothing special about it being a zip file.
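Putting those answers together, a byte-oriented version of the question's copy loop (same paths as the question; sketch only) avoids the char conversion entirely and produces an intact archive:
import java.io.*;

public class zipCopier {
    public static void main(String[] args) throws IOException {
        String home = System.getProperty("user.home").replaceAll("\\\\", "/");
        try (InputStream in = new FileInputStream(home + "/Desktop/foo.zip");
             OutputStream out = new FileOutputStream(home + "/Desktop/baf.zip")) {
            byte[] buf = new byte[8192];
            int n;
            // Copy raw bytes; no Reader/Writer, so no charset corruption.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}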

How to estimate zip file size in java before creating it

I have a requirement wherein I have to create a zip file from a list of available files. The files are of different types, like txt, pdf, xml, etc. I am using java.util classes to do it.
The requirement here is to maintain a maximum file size of 5 MB. I should select the files from the list based on timestamp, add files to the zip until the zip file size reaches 5 MB, and skip the remaining files.
Please let me know if there is a way in Java wherein I can estimate the zip file size in advance, without creating the actual file.
Or is there any other approach to handle this?
Wrap your ZipOutputStream in a personalized OutputStream, named here YourOutputStream.
The constructor of YourOutputStream will create another ZipOutputStream (zos2) which wraps a new ByteArrayOutputStream (baos):
public YourOutputStream(ZipOutputStream zos1, int maxSizeInBytes)
When you want to write a file with YourOutputStream, it will first write it to zos2:
public void writeFile(File file) throws ZipFileFullException
public void writeFile(String path) throws ZipFileFullException
etc...
If baos.size() is under maxSizeInBytes, write the file to zos1.
Otherwise, close zos1, baos, and zos2, and throw an exception. For the exception, I can't think of an already existent one; if there is, use it, else create your own IOException subclass, ZipFileFullException.
You need two ZipOutputStreams: one to be written to your drive, and one to check whether your content is over 5 MB.
EDIT: In fact, I checked; you can't remove a ZipEntry easily.
http://download.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#size()
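A rough sketch of that idea, filling in details the answer leaves open (so treat the specifics as assumptions, not a definitive implementation):
import java.io.*;
import java.util.zip.*;

// Hypothetical exception type suggested by the answer above.
class ZipFileFullException extends IOException {}

class YourOutputStream {
    private final ZipOutputStream zos1;                              // real archive on disk
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private final ZipOutputStream zos2 = new ZipOutputStream(baos); // in-memory dry run
    private final int maxSizeInBytes;

    public YourOutputStream(ZipOutputStream zos1, int maxSizeInBytes) {
        this.zos1 = zos1;
        this.maxSizeInBytes = maxSizeInBytes;
    }

    public void writeFile(File file) throws IOException {
        byte[] data = java.nio.file.Files.readAllBytes(file.toPath());
        writeEntry(zos2, file.getName(), data);   // trial write to the counter stream
        if (baos.size() > maxSizeInBytes) {
            zos2.close();
            zos1.close();
            throw new ZipFileFullException();
        }
        writeEntry(zos1, file.getName(), data);   // real write to disk
    }

    private static void writeEntry(ZipOutputStream zos, String name, byte[] data)
            throws IOException {
        zos.putNextEntry(new ZipEntry(name));
        zos.write(data);
        zos.closeEntry();
    }
}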
+1 for Colin Herbert: add files one by one, either backing up the previous step or removing the last file if the archive is too big. I just want to add some details:
Prediction is way too unreliable. E.g. a PDF can contain uncompressed text and compress down to 30% of the original, or it can contain already-compressed text and images, compressing only to 80%. You would need to inspect the entire PDF for compressibility, basically having to compress it.
You could try a statistical prediction, which could reduce the number of failed attempts, but you would still have to implement the above recommendation. Go with the simpler implementation first and see if it's enough.
Alternatively, compress files individually, then pick the files that won't exceed 5 MB when bound together. If unpacking is automated too, you could bind the zip files into a single uncompressed zip file.
There is a better option. Create a dummy LengthOutputStream that just counts the written bytes:
import java.io.IOException;
import java.io.OutputStream;

public class LengthOutputStream extends OutputStream {

    private long length = 0L;

    @Override
    public void write(int b) throws IOException {
        length++;
    }

    public long getLength() {
        return length;
    }
}
You can simply connect the LengthOutputStream to a ZipOutputStream:
public static long sizeOfZippedDirectory(File dir) throws FileNotFoundException, IOException {
    try (LengthOutputStream sos = new LengthOutputStream();
         ZipOutputStream zos = new ZipOutputStream(sos)) {
        ... // Add ZIP entries to the stream
        return sos.getLength();
    }
}
The LengthOutputStream object counts the bytes of the zipped stream but stores nothing, so there is no file size limit. This method gives an accurate size estimate but is almost as slow as creating the ZIP file itself.
I don't think there is any way to estimate the size of the zip that will be created, because zips are processed as streams. It would also not be technically possible to predict the size of the compressed output unless you actually compress it.
I did this once on a project with known input types. We knew that, generally speaking, our data compressed around 5:1 (it was all text). So I'd check the file size and divide by 5...
In this case, the purpose was to check that files would likely be below a certain size. We only needed a rough estimate.
All that said, I have noticed zip applications like 7zip will create a zip file of a certain size (like a CD) and then split the zip off to a new file once it reaches the limit. You could look at that source code. I have actually used the command line version of that app in code before. They have a library you can use as well. Not sure how well that will integrate with Java, though.
For what it is worth, I've also used a library called SharpZipLib. It was very good. I wonder if there is a Java port of it.
Maybe you could add a file each time until you reach the 5 MB limit, and then discard the last file. Like @Gopi, I don't think there is any way to estimate it without actually compressing the file.
Of course, the file size will not increase (or maybe only a little, because of the zip headers?), so at least you have a "worst case" estimation.
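As a rough illustration of such a worst-case bound (the per-entry overhead figures below assume stored, uncompressed entries with plain ASCII names and no extra fields, so treat them as approximations):
import java.io.File;
import java.util.List;

// Sketch: upper bound for a zip where nothing compresses at all.
// A local file header is 30 bytes plus the entry name, a central-directory
// entry is 46 bytes plus the name again, and the archive ends with a
// 22-byte end-of-central-directory record (assuming no extra fields).
static long worstCaseZipSize(List<File> files) {
    long total = 22;
    for (File f : files) {
        total += f.length() + 30 + 46 + 2L * f.getName().length();
    }
    return total;
}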
Just wanted to share how we implemented the manual way:
int maxSizeForAllFiles = 70000; // Read from property
int sizePerFile = 22000; // Read from property
long totalFileSize = 0;
boolean toBeZipped = false;

/**
 * Iterate the attachment list to verify whether a ZIP is required
 */
for (String attachFile : inputAttachmentList) {
    File file = new File(attachFile);
    totalFileSize += file.length();
    /**
     * Is a ZIP required, based on the size of this file?
     */
    if (file.length() >= sizePerFile) {
        toBeZipped = true;
        logger.info("File: "
                + attachFile
                + " Size: "
                + file.length()
                + " File required to be zipped, MAX allowed per file: "
                + sizePerFile);
        break;
    }
}

/**
 * Check if all attachments put together cross MAX_SIZE_FOR_ALL_FILES
 */
if (totalFileSize >= maxSizeForAllFiles) {
    toBeZipped = true;
}

if (toBeZipped) {
    // Zip here, iterating over all attachments
}
