Im trying to extract data from .wotreplay file which is basically replay file of a battle in World Of Tanks battle.
When I put that file in JSON file reader in internet browser i can see some weird symbols and some normal text.
All i need is to read one object( named "vehicles") from that file into Java. Anyone can help?
Link to sample .wotreplay file.
Highlighted data from file that i need to get.
Sorry for my English and thanks for any tips!
Based on this post, I created a simple code example with comments, which should do exactly what you want.
public class Main {
private static ByteBuffer littleEndianBuffer = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
private static final int MAGIC = 0x12323411;
private static ByteBuffer readLittleEndianBuffer(DataInputStream in, int len) throws IOException {
littleEndianBuffer.clear();
byte[] bytes = new byte[len];
int i = in.read(bytes);
if (i < len) {
throw new EOFException();
}
littleEndianBuffer.put(bytes);
littleEndianBuffer.rewind();
return littleEndianBuffer;
}
private static int readLittleEndianInt(DataInputStream in) throws IOException {
return readLittleEndianBuffer(in, 4).getInt();
}
public static void main(String[] args) throws IOException {
// Replace with your own file or maybe argument
File f = new File("D:\\Downloads\\test2.wotreplay");
DataInputStream in = new DataInputStream(new FileInputStream(f));
// This is probably a 'magic' value, which identifies file format. Feel free, to just ignore those bytes
int magic = in.readInt();
System.out.println("Magic matches: " + (magic == MAGIC));
// Seems to contain the value of the winning team (1 or 2)
int winner = readLittleEndianInt(in);
System.out.println("Winner: " + winner);
// The size of JSON string is in little endian order, so we need to reverse it to big endian (96 54 00 00 -> 00 00 54 96)
int size = readLittleEndianInt(in);
System.out.println("Json length: " + size);
// Read all bytes and create string from bytes
byte[] bytes = new byte[size];
System.out.println("Size matches: " + (in.read(bytes) == size));
System.out.println(new String(bytes));
}
}
I would comment to existing answer, but I don't have enough reputation on stack to comment yet.
According to this forum post and the research done by that poster (credits to him/her) you can get the required info by reading the file:
Ignore first 8 bytes (1-8)
Read the next 4 bytes (9-12) as an int. This is the length of the first chunk, which is the one you need. Let's refer to it as chuck1length.
Read the next chuck1length bytes (starting from byte 13) as a String.
Parse the string as JSON.
Done.
I have just developed a python package to do exactly this.
It extracts data from .wotreplay files and stores the results into a local database.
https://pypi.org/project/wotreplay/1.0.0/
Related
This question already has answers here:
What is simplest way to read a file into String? [duplicate]
(9 answers)
Closed 1 year ago.
(The file includes a paragraph)
A rainy day is the bearer of good weather with refrigerating breeze and rain showers. It refreshes everyone by making the climate cool and delightsome and brings in a sigh of relief from the scorching heat. Rainy day gives us relief from the usually hot and humid climate.
Can I just save the entire paragraph to a single String variable (say its called "fullStory")?
File story = new File("story.txt");
//Help read words in file
FileInputStream fileStr1 = new FileInputStream(story);
//Reads one character
InputStreamReader input_1 = new InputStreamReader(fileStr1);
//BufferedReader reads from the input stream
BufferedReader wordReader1 = new BufferedReader(input_1);
This here reads all bytes of a file:
static public byte[] readBytes(final File pFile) throws FileNotFoundException, IOException {
final long size = pFile.length();
if (size > Integer.MAX_VALUE) throw new RuntimeException("File size is too big to be read into array!");
final byte[] ret = new byte[(int) size];
try (FileInputStream fis = new FileInputStream(pFile)) {
fis.read(ret);
}
return ret;
}
So once you get the bytes, you can convert it into a String:
byte[] data = readBytes(file);
String text = new String(text, StandardCharsets.UTF_8);
What I need to mention: this can only read files up to 2 GBytes, because 2^31-1 is the max amount of bytes an int can address and thus a Java array can hold data.
But having one text file with such a size is highly unlikely and by design a problem, so this here is the fastest and easiest way.
Since Java 1.7 there's also Files.readAllBytes(path), but that hits the same restrictions.
After a week of work I designed a binary file format, and made a Java reader for it. It's just an experiment, which works fine, unless I'm using the GZip compression function.
I called my binary type MBDF (Minimal Binary Database Format), and it can store 8 different types:
Integer (There is nothing like a byte, short, long or anything like that, since it is stored in flexible space (bigger numbers take more space))
Float-32 (32-bits floating point format, like java's float type)
Float-64 (64-bits floating point format, like java's double type)
String (A string in UTF-16 format)
Boolean
Null (Just specifies a null value)
Array (Something like java's ArrayList<Object>)
Compound (A String - Object map)
I used this data as test data:
COMPOUND {
float1: FLOAT_32 3.3
bool2: BOOLEAN true
float2: FLOAT_64 3.3
int1: INTEGER 3
compound1: COMPOUND {
xml: STRING "two length compound"
int: INTEGER 23
}
string1: STRING "Hello world!"
string2: STRING "3"
arr1: ARRAY [
STRING "Hello world!"
INTEGER 3
STRING "3"
FLOAT_32 3.29
FLOAT_64 249.2992
BOOLEAN true
COMPOUND {
str: STRING "one length compound"
}
BOOLEAN false
NULL null
]
bool1: BOOLEAN false
null1: NULL null
}
The xml key in a compound does matter!!
I made a file from it using this java code:
MBDFFile.writeMBDFToFile(
"/Users/<anonymous>/Documents/Java/MBDF/resources/file.mbdf",
b.makeMBDF(false)
);
Here, the variable b is a MBDFBinary object, containing all the data given above. With the makeMBDF function it generates the ISO 8859-1 encoded string and if the given boolean is true, it compresses the string using GZip. Then, when writing, an extra information character is added at the beginning of the file, containing information about how to read it back.
Then, after writing the file, I read it back into java and parse it
MBDF mbdf = MBDFFile.readMBDFFromFile("/Users/<anonymous>/Documents/Java/MBDF/resources/file.mbdf");
System.out.println(mbdf.getBinaryObject().parse());
This prints exactly the information mentioned above.
Then I try to use compression:
MBDFFile.writeMBDFToFile(
"/Users/<anonymous>/Documents/Java/MBDF/resources/file.mbdf",
b.makeMBDF(true)
);
I do exactly the same to read it back as I did with the uncompressed file, which should work. It prints this information:
COMPOUND {
float1: FLOAT_32 3.3
bool2: BOOLEAN true
float2: FLOAT_64 3.3
int1: INTEGER 3
compound1: COMPOUND {
xUT: STRING 'two length compound'
int: INTEGER 23
}
string1: STRING 'Hello world!'
string2: STRING '3'
arr1: ARRAY [
STRING 'Hello world!'
INTEGER 3
STRING '3'
FLOAT_32 3.29
FLOAT_64 249.2992
BOOLEAN true
COMPOUND {
str: STRING 'one length compound'
}
BOOLEAN false
NULL null
]
bool1: BOOLEAN false
null1: NULL null
}
Comparing it to the initial information, the name xml changed into xUT for some reason...
After some research I found little differences in binary data between before the compression and after the compression. Such patterns as 110011 change into 101010.
When I make the name xml longer, like xmldm, it is just parsed as xmldm for some reason.
I currently saw the problem only occur on names with three characters.
Directly compressing and decompressing the generated string (without saving it to a file and reading that) does work, so maybe the bug is caused by the file encoding.
As far as I know, the string output is in ISO 8859-1 format, but I couldn't get the file encoding right. When a file is read, it is read as it has to be read, and all the characters are read as ISO 8859-1 characters.
I've some things that could be a reason, I actually don't know how to test them:
The GZip output has a different encoding than the uncompressed encoding, causing small differences while storing as a file.
The file is stored as UTF-8 format, just ignoring the order to be ISO 8859-1 encoding ( don't know how to explain :) )
There is a little bug in the java GZip libraries.
But which one is true, and if none of them is right, what is the true reason for this bug?
I couldn't figure it out right now.
The MBDFFile class, reading and storing the files:
/* MBDFFile.java */
package com.redgalaxy.mbdf;
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class MBDFFile {
public static MBDF readMBDFFromFile(String filename) throws IOException {
// FileInputStream is = new FileInputStream(filename);
// InputStreamReader isr = new InputStreamReader(is, "ISO-8859-1");
// BufferedReader br = new BufferedReader(isr);
//
// StringBuilder builder = new StringBuilder();
//
// String currentLine;
//
// while ((currentLine = br.readLine()) != null) {
// builder.append(currentLine);
// builder.append("\n");
// }
//
// builder.deleteCharAt(builder.length() - 1);
//
//
// br.close();
Path path = Paths.get(filename);
byte[] data = Files.readAllBytes(path);
return new MBDF(new String(data, "ISO-8859-1"));
}
private static void writeToFile(String filename, byte[] txt) throws IOException {
// BufferedWriter writer = new BufferedWriter(new FileWriter(filename));
//// FileWriter writer = new FileWriter(filename);
// writer.write(txt.getBytes("ISO-8859-1"));
// writer.close();
// PrintWriter pw = new PrintWriter(filename, "ISO-8859-1");
FileOutputStream stream = new FileOutputStream(filename);
stream.write(txt);
stream.close();
}
public static void writeMBDFToFile(String filename, MBDF info) throws IOException {
writeToFile(filename, info.pack().getBytes("ISO-8859-1"));
}
}
The pack function generates the final string for the file, in ISO 8859-1 format.
For all the other code, see my MBDF Github repository.
I commented the code I've tried, trying to show what I tried.
My workspace:
- Macbook Air '11 (High Sierra)
- IntellIJ Community 2017.3
- JDK 1.8
I hope this is enough information, this is actually the only way to make clear what I'm doing, and what exactly isn't working.
Edit: MBDF.java
/* MBDF.java */
package com.redgalaxy.mbdf;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
public class MBDF {
private String data;
private InfoTag tag;
public MBDF(String data) {
this.tag = new InfoTag((byte) data.charAt(0));
this.data = data.substring(1);
}
public MBDF(String data, InfoTag tag) {
this.tag = tag;
this.data = data;
}
public MBDFBinary getBinaryObject() throws IOException {
String uncompressed = data;
if (tag.isCompressed) {
uncompressed = GZipUtils.decompress(data);
}
Binary binary = getBinaryFrom8Bit(uncompressed);
return new MBDFBinary(binary.subBit(0, binary.getLen() - tag.trailing));
}
public static Binary getBinaryFrom8Bit(String s8bit) {
try {
byte[] bytes = s8bit.getBytes("ISO-8859-1");
return new Binary(bytes, bytes.length * 8);
} catch( UnsupportedEncodingException ignored ) {
// This is not gonna happen because encoding 'ISO-8859-1' is always supported.
return new Binary(new byte[0], 0);
}
}
public static String get8BitFromBinary(Binary binary) {
try {
return new String(binary.getByteArray(), "ISO-8859-1");
} catch( UnsupportedEncodingException ignored ) {
// This is not gonna happen because encoding 'ISO-8859-1' is always supported.
return "";
}
}
/*
* Adds leading zeroes to the binary string, so that the final amount of bits is 16
*/
private static String addLeadingZeroes(String bin, boolean is16) {
int len = bin.length();
long amount = (long) (is16 ? 16 : 8) - len;
// Create zeroes and append binary string
StringBuilder zeroes = new StringBuilder();
for( int i = 0; i < amount; i ++ ) {
zeroes.append(0);
}
zeroes.append(bin);
return zeroes.toString();
}
public String pack(){
return tag.getFilePrefixChar() + data;
}
public String getData() {
return data;
}
public InfoTag getTag() {
return tag;
}
}
This class contains the pack() method. data is already compressed here (if it should be).
For other classes, please watch the Github repository, I don't want to make my question too long.
Solved it by myself!
It seemed to be the reading and writing system. When I exported a file, I made a string using the ISO-8859-1 table to turn bytes into characters. I wrote that string to a text file, which is UTF-8. The big problem was that I used FileWriter instances to write it, which are for text files.
Reading used the inverse system. The complete file was read into memory as a string (memory consuming!!) and was then being decoded.
I didn't know a file was binary data, where specific formats of them form text data. ISO-8859-1 and UTF-8 are some of those formats. I had problems with UTF-8, because it splitted some characters into two bytes, which I couldn't manage...
My solution to it was to use streams. There exist FileInputStreams and FileOutputStreams in Java, which could be used for reading and writing binary files. I didn't use the streams, as I thought there was no big difference ("files are text, so what's the problem?"), but there is... I implemented this (by writing a new similar library) and I'm now able to pass every input stream to the decoder and every output stream to the encoder. To make uncompressed files, you need to pass a FileOutputStream. GZipped files could use GZipOutputStreams, relying on a FileOutputStream. If someone wants a string with the binary data, a ByteArrayOutputStream could be used. Same rules apply to reading, where the InputStream variant of the mentioned streams should be used.
No UTF-8 or ISO-8859-1 problems anymore, and it seemed to work, even with GZip!
We have a data file for which we need to generate a CRC. (As a placeholder, I'm using CRC32 while the others figure out what CRC polynomial they actually want.) This code seems like it ought to work:
broken:
Path in = ......;
try (SeekableByteChannel reading =
Files.newByteChannel (in, StandardOpenOption.READ))
{
System.err.println("byte channel is a " + reading.getClass().getName() +
" from " + in + " of size " + reading.size() + " and isopen=" + reading.isOpen());
java.util.zip.CRC32 placeholder = new java.util.zip.CRC32();
ByteBuffer buffer = ByteBuffer.allocate (reasonable_buffer_size);
int bytesread = 0;
int loops = 0;
while ((bytesread = reading.read(buffer)) > 0) {
byte[] raw = buffer.array();
System.err.println("Claims to have read " + bytesread + " bytes, have buffer of size " + raw.length + ", updating CRC");
placeholder.update(raw);
loops++;
buffer.clear();
}
// do stuff with placeholder.getValue()
}
catch (all the things that go wrong with opening files) {
and handle them;
}
The System.err and loops stuff is just for debugging; we don't actually care how many times it takes. The output is:
byte channel is a sun.nio.ch.FileChannelImpl from C:\working\tmp\ls2kst83543216xuxxy8136.tmp of size 7196 and isopen=true
finished after 0 time(s) through the loop
There's no way to run the real code inside a debugger to step through it, but from looking at the source to sun.nio.ch.FileChannelImpl.read() it looks like a 0 is returned if the file magically becomes closed while internal data structures are prepared; the code below is copied from the Java 7 reference implementation, comments added by me:
// sun.nio.ch.FileChannelImpl.java
public int read(ByteBuffer dst) throws IOException {
ensureOpen(); // this throws if file is closed...
if (!readable)
throw new NonReadableChannelException();
synchronized (positionLock) {
int n = 0;
int ti = -1;
Object traceContext = IoTrace.fileReadBegin(path);
try {
begin();
ti = threads.add();
if (!isOpen())
return 0; // ...argh
do {
n = IOUtil.read(fd, dst, -1, nd);
} while (......)
.......
But the debugging code tests isOpen() and gets true. So I don't know what's going wrong.
As the current test data files are tiny, I dropped this in place just to have something working:
works for now:
try {
byte[] scratch = Files.readAllBytes(in);
java.util.zip.CRC32 placeholder = new java.util.zip.CRC32();
placeholder.update(scratch);
// do stuff with placeholder.getValue()
}
I don't want to slurp the entire file into memory for the Real Code, because some of those files can be large. I do note that readAllBytes uses an InputStream in its reference implementation, which has no trouble reading the same file that SeekableByteChannel failed to. So I'll probably rewrite the code to just use input streams instead of byte channels. I'd still like to figure out what's gone wrong in case a future scenario comes up where we need to use byte channels. What am I missing with SeekableByteChannel?
Check that 'reasonable_buffer_size' isn't zero.
I have written a Java program to write the ByteArray in to a file. And that resulting ByteArray is a resulting of these three ByteArrays-
First 2 bytes is my schemaId which I have represented it using short data type.
Then next 8 Bytes is my Last Modified Date which I have represented it using long data type.
And remaining bytes can be of variable size which is my actual value for my attributes..
So I have a file now in which first line contains resulting ByteArray which will have all the above bytes as I mentioned above.. Now I need to read that file from C++ program and read the first line which will contain the ByteArray and then split that resulting ByteArray accordingly as I mentioned above such that I am able to extract my schemaId, Last Modified Date and my actual attribute value from it.
I have done all my coding always in Java and I am new to C++... I am able to write a program in C++ to read the file but not sure how should I read that ByteArray in such a way such that I am able to split it as I mentioned above..
Below is my C++ program which is reading the file and printing it out on the console..
int main () {
string line;
//the variable of type ifstream:
ifstream myfile ("bytearrayfile");
//check to see if the file is opened:
if (myfile.is_open())
{
//while there are still lines in the
//file, keep reading:
while (! myfile.eof() )
{
//place the line from myfile into the
//line variable:
getline (myfile,line);
//display the line we gathered:
// and here split the byte array accordingly..
cout << line << endl;
}
//close the stream:
myfile.close();
}
else cout << "Unable to open file";
return 0;
}
Can anyone help me with that? Thanks.
Update
Below is my java code which will write resulting ByteArray into a file and the same file now I need to read it back from c++..
public static void main(String[] args) throws Exception {
String os = "whatever os is";
byte[] avroBinaryValue = os.getBytes();
long lastModifiedDate = 1379811105109L;
short schemaId = 32767;
ByteArrayOutputStream byteOsTest = new ByteArrayOutputStream();
DataOutputStream outTest = new DataOutputStream(byteOsTest);
outTest.writeShort(schemaId);
outTest.writeLong(lastModifiedDate);
outTest.writeInt(avroBinaryValue.length);
outTest.write(avroBinaryValue);
byte[] allWrittenBytesTest = byteOsTest.toByteArray();
DataInputStream inTest = new DataInputStream(new ByteArrayInputStream(allWrittenBytesTest));
short schemaIdTest = inTest.readShort();
long lastModifiedDateTest = inTest.readLong();
int sizeAvroTest = inTest.readInt();
byte[] avroBinaryValue1 = new byte[sizeAvroTest];
inTest.read(avroBinaryValue1, 0, sizeAvroTest);
System.out.println(schemaIdTest);
System.out.println(lastModifiedDateTest);
System.out.println(new String(avroBinaryValue1));
writeFile(allWrittenBytesTest);
}
/**
* Write the file in Java
* #param byteArray
*/
public static void writeFile(byte[] byteArray) {
try{
File file = new File("bytearrayfile");
FileOutputStream output = new FileOutputStream(file);
IOUtils.write(byteArray, output);
} catch (Exception ex) {
ex.printStackTrace();
}
}
It doesn't look like you want to use std::getline to read this data. Your file isn't written as text data on a line-by-line basis - it basically has a binary format.
You can use the read method of std::ifstream to read arbitrary chunks of data from an input stream. You probably want to open the file in binary mode:
std::ifstream myfile("bytearrayfile", std::ios::binary);
Fundamentally the method you would use to read each record from the file is:
uint16_t schemaId;
uint64_t lastModifiedDate;
uint32_t binaryLength;
myfile.read(reinterpret_cast<char*>(&schemaId), sizeof(schemaId));
myfile.read(reinterpret_cast<char*>(&lastModifiedDate), sizeof(lastModifiedDate));
myfile.read(reinterpret_cast<char*>(&binaryLength), sizeof(binaryLength));
This will read the three static members of your data structure from the file. Because your data is variable size, you probably need to allocate a buffer to read it into, for example:
std::unique_ptr<char[]> binaryBuf(new char[binaryLength]);
myfile.read(binaryBuf.get(), binaryLength);
The above are examples only to illustrate how you would approach this in C++. You will need to be aware of the following things:
There's no error checking in the above examples. You'll need to check that the calls to ifstream::read are successful and return the correct amount of data.
Endianness may be an issue, depending on the the platform the data originates from and is being read on.
Interpreting the lastModifiedDate field may require you to write a function to convert it from whatever format Java uses (I have no idea about Java).
I have written a code to split a .gz file into user specified parts using byte[] array. But the for loop is not reading/writing the last part of the parent file which is less than the array size. Can you please help me in fixing this?
package com.bitsighttech.collection.packaging;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.log4j.Logger;
public class FileSplitterBytewise
{
private static Logger logger = Logger.getLogger(FileSplitterBytewise.class);
private static final long KB = 1024;
private static final long MB = KB * KB;
private FileInputStream fis;
private FileOutputStream fos;
private DataInputStream dis;
private DataOutputStream dos;
public boolean split(File inputFile, String splitSize)
{
int expectedNoOfFiles =0;
try
{
double parentFileSizeInB = inputFile.length();
Pattern p = Pattern.compile("(\\d+)\\s([MmGgKk][Bb])");
Matcher m = p.matcher(splitSize);
m.matches();
String FileSizeString = m.group(1);
String unit = m.group(2);
double FileSizeInMB = 0;
try {
if (unit.toLowerCase().equals("kb"))
FileSizeInMB = Double.parseDouble(FileSizeString) / KB;
else if (unit.toLowerCase().equals("mb"))
FileSizeInMB = Double.parseDouble(FileSizeString);
else if (unit.toLowerCase().equals("gb"))
FileSizeInMB = Double.parseDouble(FileSizeString) * KB;
} catch (NumberFormatException e) {
logger.error("invalid number [" + FileSizeInMB + "] for expected file size");
}
double fileSize = FileSizeInMB * MB;
int fileSizeInByte = (int) Math.ceil(fileSize);
double noOFFiles = parentFileSizeInB/fileSizeInByte;
expectedNoOfFiles = (int) Math.ceil(noOFFiles);
int splinterCount = 1;
fis = new FileInputStream(inputFile);
dis = new DataInputStream(new BufferedInputStream(fis));
fos = new FileOutputStream("F:\\ff\\" + "_part_" + splinterCount + "_of_" + expectedNoOfFiles);
dos = new DataOutputStream(new BufferedOutputStream(fos));
byte[] data = new byte[(int) fileSizeInByte];
while ( splinterCount <= expectedNoOfFiles ) {
int i;
for(i = 0; i<data.length-1; i++)
{
data[i] = s.readByte();
}
dos.write(data);
splinterCount ++;
}
}
catch(Exception e)
{
logger.error("Unable to split the file " + inputFile.getName() + " in to " + expectedNoOfFiles);
return false;
}
logger.debug("Successfully split the file [" + inputFile.getName() + "] in to " + expectedNoOfFiles + " files");
return true;
}
public static void main(String args[])
{
String FilePath1 = "F:\\az.gz";
File file= new File(FilePath1);
FileSplitterBytewise fileSplitter = new FileSplitterBytewise();
String splitlen = "1 MB";
fileSplitter.split(file, splitlen);
}
}
I'd suggest to make more methods. You've got a complicated string-handling section of code in split(); it would be best to make a method that takes the human-friendly string as input and returns the number you're looking for. (It would also make it far easier for you to test this section of the routine; there's no way you can test it now.)
Once it is split off and you're writing test cases, you'll probably find that the error message you generate if the string doesn't contain kb, mb, or gb is extremely confusing -- it blames the number 0 for the mistake rather than pointing out the string does not have the expected units.
Using an int to store the file size means your program will never handle files larger than two gigabytes. You should stick with long or double. (double feels wrong for something that is actually confined to integer values but I can't quickly think why it would fail.)
byte[] data = new byte[(int) fileSizeInByte];
Allocating several gigabytes like this is going to destroy your performance -- that's a potentially huge memory allocation (and one that might be considered under control of an adversary; depending upon your security model, this might or might not be a big deal). Don't try to work with the entire file in one piece.
You appear to be reading and writing the files one byte at a time. That's a guarantee to very slow performance. Doing some performance testing for another question earlier today, I found that my machine could read (from a hot cache) 2000 times faster using 131kb blocks than two-byte blocks. One-byte blocks would be even worse. A cold cache would be significantly worse for such small sizes.
fos = new FileOutputStream("F:\\ff\\" + "_part_" + splinterCount + "_of_" + expectedNoOfFiles);
You only appear to ever open one file output stream. Your post probably should have said "only the first works", because it looks like you've not yet tried it on a file that creates three or more pieces.
catch(Exception e)
At this point, you've got the ability to discover errors in your program; you choose to ignore them completely. Sure, you log an error message, but you cannot actually debug your program with the data you log. You should log at a minimum the exception type, message, and maybe even full stack-trace. This combination of data is immensely useful when trying to solve problems, especially in a few months when you've forgotten the details of how it works.
Can you please help me in fixing this?
I would use;
drop the DataInput/OutputStreams, you don't need them.
use in.read(data) to read the whole block instead on one byte at a time. Reading one byte at a time is so much slower!
or read the whole of the data array, you are reading one less.
stop when you reach the end of the file, it might not be a whole multiple of the size.
only write as much as you have read, if your blocks at 1 MB byte there is 100 KB left you should only read/write 100 KB at the end.
close your files when have finished, esp as you have a buffered stream.
you "split" writes everything to the same file (so its not actually splitting) You need to create, write to and close output files in a loop.
don't use fields when you could be/should be using local variables.
would use the length as a long in bytes.
the pattern ignores incorrect input and your pattern doesn't match the test you check for. e.g. your patten allows 1 G or 1 k but these will be treated as 1 MB.