How to extract end of data written inside a file in Java


I create an empty file with the desired length in Android using Java like this:
long length = 10L * 1024 * 1024 * 1024; // 10 GB; the L suffix avoids int overflow
String file = "PATH\\File.mp4";
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "rw");
randomAccessFile.setLength(length);
That code creates a file with the desired length, filled with NULL (0x00) bytes. Then I write data into the file like this:
randomAccessFile.write(DATA);
Now my question is: how can I find where the written data ends inside the file? I have written this function to find the end of the data as fast as possible using binary search:
long extractEndOfData(RandomAccessFile accessFile, long from, long end) throws IOException {
    accessFile.seek(from);
    if (accessFile.read() == 0) {
        //this means no data has been written into the file
        return 0;
    }
    accessFile.seek(end);
    if (accessFile.read() != 0) {
        // the byte at end is data, so the data extends through end
        return end + 1;
    }
    long mid = (from + end) / 2;
    accessFile.seek(mid);
    if (accessFile.read() == 0) {
        return extractEndOfData(accessFile, from, mid - 1);
    } else {
        // read() advanced the file pointer, so this reads the byte at mid + 1
        if (accessFile.read() == 0) {
            return mid + 1;
        } else {
            return extractEndOfData(accessFile, mid + 1, end);
        }
    }
}
and I call it like this to find the end of the data in the file:
long endOfData = extractEndOfData(randomAccessFile, 0, randomAccessFile.length() - 1);
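For comparison, the same search can be written iteratively as a standard "find the first zero byte" binary search. This is a sketch that relies on exactly the assumption the question's function relies on: every data byte is non-zero and everything after the data is zero padding. (Note also that the RandomAccessFile.setLength Javadoc leaves the contents of the extended region undefined; it reads as zeros on common file systems, but that is not guaranteed.) The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class EndOfDataSketch {
    /**
     * Returns the index of the first zero byte, i.e. the number of data bytes.
     * ASSUMPTION: bytes [0, end) are all non-zero and bytes [end, length) are all zero.
     * If the data itself contains a zero byte, the result is meaningless.
     */
    static long endOfData(RandomAccessFile file) throws IOException {
        long lo = 0;              // every byte before lo is known to be data
        long hi = file.length();  // every byte at or after hi is known to be padding
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            file.seek(mid);
            if (file.read() == 0) {
                hi = mid;         // mid is padding: the end is at or before mid
            } else {
                lo = mid + 1;     // mid is data: the end is after mid
            }
        }
        return lo;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("eod", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            f.write(new byte[1024]);               // explicit zero padding
            f.seek(0);
            f.write(new byte[] {1, 2, 3, 4, 5});   // five non-zero data bytes
            System.out.println(endOfData(f));      // prints 5
        }
    }
}
```

The loop keeps the invariant "data before lo, padding from hi on", so it needs one read per step and no special cases; but it fails in exactly the same way as the recursive version once the data starts with, or contains, a zero byte.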
That function works fine for files whose data begins with non-NULL bytes and contains no NULL bytes.
But for some files it does not work, because their data begins with NULL bytes.
What can I do to solve this problem? Thanks a lot.

I think your issue is clear: you will never be able to find out how much data has been written (or where the end of the content is) just by searching for a NULL byte inside the file. The reason is that NULL is a byte with the value 0x00, which appears in all kinds of binary files (though perhaps not in text files), and on the other hand, your file is initialized with NULLs.
What you could do, for example, is store the size of the data written to the file in the first four bytes of the file (or in eight bytes, as a long, since four bytes cannot represent sizes up to 10 GB).
So when writing the DATA to the file, first write the length of it, and then the actual data content.
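A minimal sketch of that idea, assuming the file is pre-allocated and zero-filled as in the question. It uses an 8-byte long header rather than four bytes because the question's file is 10 GB; the class name, method names, and header size are illustrative, not from the question.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class LengthHeaderSketch {
    // Hypothetical layout: [8-byte big-endian length][data...][unused padding]
    static final int HEADER_SIZE = Long.BYTES;

    /** Appends data after the bytes already recorded in the header, then updates the header. */
    static void append(RandomAccessFile file, byte[] data) throws IOException {
        long used = dataLength(file);         // 0 for a freshly zero-filled file
        file.seek(HEADER_SIZE + used);
        file.write(data);
        // Update the header last, so an interrupted write still leaves the old, valid length.
        file.seek(0);
        file.writeLong(used + data.length);   // RandomAccessFile writes longs big-endian
    }

    /** Reads the recorded data length; the data ends at offset HEADER_SIZE + dataLength. */
    static long dataLength(RandomAccessFile file) throws IOException {
        file.seek(0);
        return file.readLong();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("hdr", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            f.write(new byte[4096]);              // pre-allocated, zero-filled
            append(f, new byte[] {0, 0, 7});      // data that starts with NULs
            append(f, new byte[] {0});
            System.out.println(dataLength(f));    // prints 4
        }
    }
}
```

Because the length is stored explicitly, NULL bytes inside the data no longer matter, and finding the end is a single read instead of a search.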
But I am still wondering why you don't initialize the file's size to the size you need.
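To illustrate that alternative: if the file is not pre-sized, RandomAccessFile simply grows as data is written, and length() is then exactly the end of the data, zero bytes included. A minimal sketch; the class and method names are hypothetical. If pre-allocation really is required, one option is to remember getFilePointer() after the last write and persist it, for example in the length header suggested in this answer.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class GrowingFileSketch {
    /** Writes the chunks into an initially empty file and returns the resulting length. */
    static long writeAll(File file, byte[]... chunks) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(file, "rw")) {
            f.setLength(0);                 // start empty: no pre-allocation
            for (byte[] chunk : chunks) {
                f.write(chunk);
            }
            // Without pre-allocation the length equals the number of bytes written,
            // and getFilePointer() reports the same offset at this point.
            return f.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("grow", ".bin");
        tmp.deleteOnExit();
        System.out.println(writeAll(tmp, new byte[] {0, 0}, new byte[] {1, 2, 3})); // prints 5
    }
}
```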

Related

Parsing files over 2.15 GB in Java using Kaitai Struct

I'm parsing large PCAP files in Java using Kaitai-Struct. Whenever the file size exceeds Integer.MAX_VALUE bytes I face an IllegalArgumentException caused by the size limit of the underlying ByteBuffer.
I haven't found references to this issue elsewhere, which leads me to believe that this is not a library limitation but a mistake in the way I'm using it.
Since the problem is caused by trying to map the whole file into the ByteBuffer, I'd think that the solution would be to map only the first region of the file and, as the data is consumed, map again, skipping the data already parsed.
As this is done within the Kaitai Struct runtime library, it would mean writing my own class extending KaitaiStream and overriding the auto-generated fromFile(...) method, and this doesn't really seem like the right approach.
The auto-generated method to parse from a file for the PCAP class is:
public static Pcap fromFile(String fileName) throws IOException {
    return new Pcap(new ByteBufferKaitaiStream(fileName));
}
And the ByteBufferKaitaiStream provided by the Kaitai Struct Runtime library is backed by a ByteBuffer.
private final FileChannel fc;
private final ByteBuffer bb;
public ByteBufferKaitaiStream(String fileName) throws IOException {
    fc = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ);
    // FileChannel.map throws IllegalArgumentException when the size exceeds Integer.MAX_VALUE
    bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
}
This, in turn, is limited by the maximum ByteBuffer size (Integer.MAX_VALUE bytes).
Am I missing some obvious workaround? Is it really a limitation of the Java implementation of Kaitai Struct?

There are two separate issues here:
1. Running Pcap.fromFile() on large files is generally not very efficient, as you'll eventually end up with the whole file parsed into an in-memory array at once. An example of how to avoid that is given in kaitai_struct/issues/255. The basic idea is that you want control over how you read every packet, and then dispose of each packet after you've parsed / accounted for it somehow.
2. The 2 GB limit on Java's mmapped files. To mitigate that, you can use the alternative RandomAccessFile-based KaitaiStream implementation, RandomAccessFileKaitaiStream. It might be slower, but it should avoid the 2 GB problem.
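The first point can be illustrated without Kaitai at all. Below is a plain-Java sketch (not the Kaitai Struct API, and not from kaitai_struct/issues/255) that walks a classic libpcap file one packet at a time from an InputStream, so memory use is bounded by the largest packet and no file offset ever has to fit in an int. It assumes the classic format with magic number 0xa1b2c3d4 (not pcapng or the nanosecond variant) and well-formed input; the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Consumer;

final class PcapStreamSketch {
    /** Streams packet payloads to onPacket and returns the number of packets seen. */
    static long forEachPacket(InputStream raw, Consumer<byte[]> onPacket) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        byte[] global = new byte[24];                       // classic pcap global header
        in.readFully(global);
        int magic = ByteBuffer.wrap(global).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        ByteOrder order;
        if (magic == 0xA1B2C3D4) {
            order = ByteOrder.LITTLE_ENDIAN;                // written by a little-endian host
        } else if (Integer.reverseBytes(magic) == 0xA1B2C3D4) {
            order = ByteOrder.BIG_ENDIAN;
        } else {
            throw new IOException("not a classic pcap file");
        }
        byte[] recordHeader = new byte[16];                 // ts_sec, ts_usec, incl_len, orig_len
        long count = 0;
        while (true) {
            int first = in.read();
            if (first == -1) {
                break;                                      // clean end of file between records
            }
            recordHeader[0] = (byte) first;
            in.readFully(recordHeader, 1, 15);
            int inclLen = ByteBuffer.wrap(recordHeader).order(order).getInt(8);
            byte[] packet = new byte[inclLen];
            in.readFully(packet);
            onPacket.accept(packet);                        // nothing keeps a reference afterwards
            count++;
        }
        return count;
    }
}
```

Usage would look like try (InputStream in = Files.newInputStream(Paths.get("big.pcap"))) { forEachPacket(in, p -> ...); }, where "big.pcap" is a placeholder path. Each packet becomes garbage as soon as the callback returns, which is the "dispose of every packet" idea from point 1.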

This library provides a ByteBuffer implementation that uses long offsets. I haven't tried this approach, but it looks promising. See the section "Mapping Files Bigger than 2 GB":
http://www.kdgregory.com/index.php?page=java.byteBuffer
// Excerpt: the fields _buffers, _segmentSize, _fileSize and MAX_SEGMENT_SIZE are declared elsewhere in the class.
public int getInt(long index)
{
    return buffer(index).getInt();
}
private ByteBuffer buffer(long index)
{
    // pick the mapped segment that contains index, then position within it
    ByteBuffer buf = _buffers[(int)(index / _segmentSize)];
    buf.position((int)(index % _segmentSize));
    return buf;
}
public MappedFileBuffer(File file, int segmentSize, boolean readWrite)
throws IOException
{
    if (segmentSize > MAX_SEGMENT_SIZE)
        throw new IllegalArgumentException(
            "segment size too large (max " + MAX_SEGMENT_SIZE + "): " + segmentSize);
    _segmentSize = segmentSize;
    _fileSize = file.length();
    RandomAccessFile mappedFile = null;
    try
    {
        String mode = readWrite ? "rw" : "r";
        MapMode mapMode = readWrite ? MapMode.READ_WRITE : MapMode.READ_ONLY;
        mappedFile = new RandomAccessFile(file, mode);
        FileChannel channel = mappedFile.getChannel();
        _buffers = new MappedByteBuffer[(int)(_fileSize / segmentSize) + 1];
        int bufIdx = 0;
        for (long offset = 0 ; offset < _fileSize ; offset += segmentSize)
        {
            long remainingFileSize = _fileSize - offset;
            // each mapping spans two segments, so a multi-byte read that starts near
            // the end of one segment still falls inside a single mapped buffer
            long thisSegmentSize = Math.min(2L * segmentSize, remainingFileSize);
            _buffers[bufIdx++] = channel.map(mapMode, offset, thisSegmentSize);
        }
    }
    finally
    {
        // close quietly; the mappings stay valid after the file is closed
        if (mappedFile != null)
        {
            try
            {
                mappedFile.close();
            }
            catch (IOException ignored) { /* */ }
        }
    }
}

Read formatted BLE advertising data in Android Logcat but can't invoke it

I am using the code at https://github.com/davidgyoung/ble-advert-counter/blob/master/app/src/main/java/com/radiusnetworks/blepacketcounter/MainActivity.java
to scan for and read a BLE device's advertising data.
The code works well: I can see the formatted advertising data in Logcat.
But I can't find the corresponding log statement in the code.
I don't see the BluetoothLeScanner class or an onScanResult() method being used anywhere.
I want to obtain the string "ScanResult{mDevice=F3:E5:7F:73:4F:81, mScanRecord=ScanRecord..." to get the formatted data values.
How can I achieve this?
Thanks

I'm not sure about the logs, but here's how you can get the data.
The onLeScan() callback has all the information that is being printed in the logs. To get the device information, you can use the device object from the callback (e.g. device.getAddress()). The scan record is in the callback's scanRecord byte array; you need to parse that array to get the information. I've used the code below to parse the scan information.
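// needs: import java.util.Arrays; import java.util.HashMap; import java.util.Map;
public Map<Integer, String> ParseRecord(byte[] scanRecord) {
    // HashMap, not WeakHashMap: weakly held Integer keys outside the small-integer
    // cache (e.g. 255) can be garbage-collected, silently dropping entries
    Map<Integer, String> ret = new HashMap<>();
    int index = 0;
    while (index < scanRecord.length) {
        int length = scanRecord[index++] & 0xFF; // Java bytes are signed; mask to 0..255
        //Zero value indicates that we are done with the record now
        if (length == 0) break;
        // stop on a truncated structure instead of reading past the array
        if (index + length > scanRecord.length) break;
        int type = scanRecord[index] & 0xFF; // 0xFF (manufacturer data) would otherwise be -1
        //if the type is zero, then we are past the significant section of the data,
        // and we are thus done
        if (type == 0) break;
        byte[] data = Arrays.copyOfRange(scanRecord, index + 1, index + length);
        if (data.length > 0) {
            StringBuilder hex = new StringBuilder(data.length * 2);
            // the data appears to be stored backwards
            for (int bb = data.length - 1; bb >= 0; bb--) {
                hex.append(String.format("%02X", data[bb]));
            }
            ret.put(type, hex.toString());
        }
        index += length;
    }
    return ret;
}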
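To make the record layout concrete: each structure in the scan record is [length][type][data...], where length counts the type byte plus the data, and a zero length marks the start of padding. The sketch below walks that layout and keeps each payload in transmitted order (no reversal). The sample bytes are hypothetical, and the class and field names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AdStructureSketch {
    /** One advertising-data structure: a type byte plus its payload. */
    static final class AdStructure {
        final int type;
        final byte[] data;
        AdStructure(int type, byte[] data) { this.type = type; this.data = data; }
    }

    /** Splits a raw scan record into [length][type][data...] structures. */
    static List<AdStructure> parse(byte[] record) {
        List<AdStructure> out = new ArrayList<>();
        int index = 0;
        while (index < record.length) {
            int length = record[index] & 0xFF;            // covers the type byte plus the data
            if (length == 0) break;                       // zero length: the rest is padding
            if (index + length >= record.length) break;   // truncated structure: stop
            int type = record[index + 1] & 0xFF;          // mask so 0xFF is 255, not -1
            byte[] data = Arrays.copyOfRange(record, index + 2, index + 1 + length);
            out.add(new AdStructure(type, data));
            index += 1 + length;                          // skip the length byte and the structure
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical record: Flags (type 0x01), manufacturer data (type 0xFF), then padding.
        byte[] record = {0x02, 0x01, 0x06, 0x05, (byte) 0xFF, 0x4C, 0x00, 0x02, 0x15, 0x00, 0x00};
        for (AdStructure s : parse(record)) {
            System.out.printf("type=0x%02X data=%s%n", s.type, Arrays.toString(s.data));
        }
    }
}
```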
Refer to the link below to understand BLE advertisement data.
BLE obtain uuid encoded in advertising packet
Hope this helps.

reading UTF-16 produces unexpected results

I use the beaglebuddy Java library in an Android project for reading/writing ID3 tags of mp3 files. I'm having an issue with reading the text that was previously written using the same library and could not find anything related in their docs.
Assume I write the following info:
MP3 mp3 = new MP3(pathToFile);
mp3.setLeadPerformer("Jon Skeet");
mp3.setTitle("A Million Rep");
mp3.save();
Looking at the source code of the library, I see that UTF-16 encoding is explicitly set; internally it calls
protected ID3v23Frame setV23Text(String text, FrameType frameType) {
    return this.setV23Text(Encoding.UTF_16, text, frameType);
}
and
protected ID3v23Frame setV23Text(Encoding encoding, String text, FrameType frameType) {
    ID3v23FrameBodyTextInformation frameBody = null;
    ID3v23Frame frame = this.getV23Frame(frameType);
    if(frame == null) {
        frame = this.addV23Frame(frameType);
    }
    frameBody = (ID3v23FrameBodyTextInformation)frame.getBody();
    frameBody.setEncoding(encoding);
    frameBody.setText(encoding == Encoding.UTF_16?Utility.getUTF16String(text):text);
    return frame;
}
At a later point, I read the data and it gives me some weird Chinese characters:
mp3.getLeadPerformer(); // 䄀 䴀椀氀氀椀漀渀 刀攀瀀
mp3.getTitle(); // 䨀漀渀 匀欀攀攀琀
I took a look at the built-in Utility.getUTF16String(String) method:
public static String getUTF16String(String string) {
    String text = string;
    byte[] bytes = string.getBytes(Encoding.UTF_16.getCharacterSet());
    if(bytes.length < 2 || bytes[0] != -2 || bytes[1] != -1) {
        byte[] bytez = new byte[bytes.length + 2];
        bytes[0] = -2;
        bytes[1] = -1;
        System.arraycopy(bytes, 0, bytez, 2, bytes.length);
        text = new String(bytez, Encoding.UTF_16.getCharacterSet());
    }
    return text;
}
I don't quite get the point of setting the first 2 bytes to -2 and -1 respectively. Is this a pattern stating that the string is UTF-16 encoded?
I also tried calling this method explicitly when reading the data. The result is readable, but it always has some cryptic characters prepended:
Utility.getUTF16String(mp3.getLeadPerformer()); // ��Jon Skeet
Utility.getUTF16String(mp3.getTitle()); // ��A Million Rep
Since the count of those characters seems to be constant, I created a temporary workaround by simply cutting them off.
Fields like "comments" where the author does not explicitly enforce UTF-16 when writing are read without any issues.
I'm really curious about what's going on here and appreciate any suggestions.
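Two facts about the JDK's standard charsets are consistent with what the question shows. First, -2 and -1 are the signed-byte forms of 0xFE 0xFF, the big-endian UTF-16 byte order mark, which Java's "UTF-16" encoder writes at the start of its output. Second, text stored as little-endian UTF-16 but decoded as big-endian turns 'J' (U+004A) into '䨀' (U+4A00) and 'o' (U+006F) into '漀' (U+6F00), which matches the characters in the output above. The sketch below only demonstrates these two charset behaviours; it does not use the beaglebuddy library, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Sketch {
    /** Encodes as little-endian UTF-16 (no BOM), then wrongly decodes as big-endian. */
    static String misreadAsBigEndian(String text) {
        byte[] littleEndian = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(littleEndian, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] withBom = "Jon".getBytes(StandardCharsets.UTF_16);
        // Java's "UTF-16" encoder emits the big-endian BOM 0xFE 0xFF first,
        // which prints as -2 and -1 because Java bytes are signed.
        System.out.println(withBom[0] + " " + withBom[1]);
        // Every character's two bytes end up swapped: U+004A becomes U+4A00, and so on.
        System.out.println(misreadAsBigEndian("Jon Skeet"));
    }
}
```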

SeekableByteChannel.read() always returns 0, InputStream is fine

We have a data file for which we need to generate a CRC. (As a placeholder, I'm using CRC32 while the others figure out what CRC polynomial they actually want.) This code seems like it ought to work:
broken:
Path in = ......;
try (SeekableByteChannel reading =
         Files.newByteChannel (in, StandardOpenOption.READ))
{
    System.err.println("byte channel is a " + reading.getClass().getName() +
        " from " + in + " of size " + reading.size() + " and isopen=" + reading.isOpen());
    java.util.zip.CRC32 placeholder = new java.util.zip.CRC32();
    ByteBuffer buffer = ByteBuffer.allocate (reasonable_buffer_size);
    int bytesread = 0;
    int loops = 0;
    while ((bytesread = reading.read(buffer)) > 0) {
        byte[] raw = buffer.array();
        System.err.println("Claims to have read " + bytesread + " bytes, have buffer of size " + raw.length + ", updating CRC");
        placeholder.update(raw, 0, bytesread); // only the bytes actually read, not the whole array
        loops++;
        buffer.clear();
    }
    // do stuff with placeholder.getValue()
}
catch (all the things that go wrong with opening files) {
    and handle them;
}
The System.err and loops stuff is just for debugging; we don't actually care how many times it takes. The output is:
byte channel is a sun.nio.ch.FileChannelImpl from C:\working\tmp\ls2kst83543216xuxxy8136.tmp of size 7196 and isopen=true
finished after 0 time(s) through the loop
There's no way to run the real code inside a debugger to step through it. However, judging from the source of sun.nio.ch.FileChannelImpl.read(), it looks like 0 is returned if the file somehow becomes closed while internal data structures are being prepared. The code below is copied from the Java 7 reference implementation, with comments added by me:
// sun.nio.ch.FileChannelImpl.java
public int read(ByteBuffer dst) throws IOException {
    ensureOpen(); // this throws if file is closed...
    if (!readable)
        throw new NonReadableChannelException();
    synchronized (positionLock) {
        int n = 0;
        int ti = -1;
        Object traceContext = IoTrace.fileReadBegin(path);
        try {
            begin();
            ti = threads.add();
            if (!isOpen())
                return 0; // ...argh
            do {
                n = IOUtil.read(fd, dst, -1, nd);
            } while (......)
            .......
But the debugging code tests isOpen() and gets true. So I don't know what's going wrong.
As the current test data files are tiny, I dropped this in place just to have something working:
works for now:
try {
    byte[] scratch = Files.readAllBytes(in);
    java.util.zip.CRC32 placeholder = new java.util.zip.CRC32();
    placeholder.update(scratch);
    // do stuff with placeholder.getValue()
}
I don't want to slurp the entire file into memory for the Real Code, because some of those files can be large. I do note that readAllBytes uses an InputStream in its reference implementation, which has no trouble reading the same file that SeekableByteChannel failed to. So I'll probably rewrite the code to just use input streams instead of byte channels. I'd still like to figure out what's gone wrong in case a future scenario comes up where we need to use byte channels. What am I missing with SeekableByteChannel?

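Check that 'reasonable_buffer_size' isn't zero. If the buffer has no space remaining, read() returns 0 immediately without reading anything, so a loop that runs while read() > 0 never executes its body.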
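A minimal sketch of a channel-based CRC loop that avoids that trap, assuming Java 8 or later (for CRC32.update(ByteBuffer)). It rejects a non-positive buffer size up front and loops until read() returns -1, the actual end-of-stream signal, rather than stopping at the first 0. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

final class ChannelCrcSketch {
    static long crcOf(Path file, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            // a zero-capacity buffer makes every read() return 0, so no data would be read
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        CRC32 crc = new CRC32();
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {   // -1 is end of stream; 0 is not
                buffer.flip();                     // expose only the bytes just read
                crc.update(buffer);                // consumes position..limit
                buffer.clear();
            }
        }
        return crc.getValue();
    }
}
```

Flipping the buffer before updating the checksum also means only the bytes actually read are fed to the CRC, rather than the whole backing array.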

Generating a .ov2 file with Java

I am trying to figure out how to create a .ov2 file to add POI data to a TomTom GPS device. The format of the data needs to be as follows:
An OV2 file consists of POI records. Each record has the following data format.
1 BYTE, char, POI status ('0' or '2')
4 BYTES, long, denotes length of the POI record.
4 BYTES, long, longitude * 100000
4 BYTES, long, latitude * 100000
x BYTES, string, label for POI, x == total length - (1 + 3 * 4)
Terminating null byte.
I found the following PHP code that is supposed to take a .csv file, go through it line by line, split each record and then write it into a new file in the proper format. I was hoping someone would be able to help me translate this to Java. I really only need the line I marked with the '--->' arrow. I do not know PHP at all, but everything other than that one line is basic enough that I can look at it and translate it, but I do not know what the PHP functions are doing on that one line. Even if someone could explain it well enough then maybe I could figure it out in Java. If you can translate it directly, please do, but even an explanation would be helpful. Thanks.
<?php
$csv = file("File.csv");
$nbcsv = count($csv);
$file = "POI.ov2";
$fp = fopen($file, "w");
for ($i = 0; $i < $nbcsv; $i++) {
    $table = split(",", chop($csv[$i]));
    $lon = $table[0];
    $lat = $table[1];
    $des = $table[2];
--->$TT = chr(0x02).pack("V",strlen($des)+14).pack("V",round($lon*100000)).pack("V",round($lat*100000)).$des.chr(0x00);
    #fwrite($fp, "$TT");
}
fclose($fp);
?>

Load a file into an array, where each element is a line from the file.
$csv = file("File.csv");
Count the number of elements in the array.
$nbcsv = count($csv);
Open output file for writing.
$file = "POI.ov2";
$fp = fopen($file, "w");
Loop while $i is less than the number of array items, incrementing $i each time.
for ($i = 0; $i < $nbcsv; $i++) {
Right-trim the line (remove trailing whitespace), and split the string by ','. $table is an array of values from the CSV line.
$table = split(",", chop($csv[$i]));
Assign component parts of the table to their own variables by numeric index.
$lon = $table[0];
$lat = $table[1];
$des = $table[2];
The tricky bit.
chr(0x02) is literally the character with code number 2.
pack is a binary processing function. It takes a format and some data.
V = unsigned long (always 32 bit, little-endian byte order).
I'm sure you can work out the maths bits, but you need to convert the results into little-endian 32-bit values.
. is PHP's string concatenation operator.
Finally, the record is terminated with chr(0), a null character.
$TT = chr(0x02).
pack("V",strlen($des)+14).
pack("V",round($lon*100000)).
pack("V",round($lat*100000)).
$des.chr(0x00);
Write it out and close the file.
#fwrite($fp, "$TT");
}
fclose($fp);

The key in Java is to apply the proper byte order, ByteOrder.LITTLE_ENDIAN, to the ByteBuffer.
The whole function:
private static boolean getWaypoints(ArrayList<Waypoint> geopoints, File f)
{
    // try-with-resources closes the stream even if a write fails
    try (FileOutputStream fs = new FileOutputStream(f))
    {
        for (int i = 0; i < geopoints.size(); i++)
        {
            fs.write((byte)0x02);
            String desc = geopoints.get(i).getName();
            // measure the label in bytes (platform default charset), not chars,
            // so the record length matches what is actually written
            byte[] descBytes = desc.getBytes();
            int poiLength = descBytes.length + 14;
            fs.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(poiLength).array());
            int lon = (int)Math.round((geopoints.get(i).getLongitudeE6()/1E6)*100000);
            fs.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(lon).array());
            int lat = (int)Math.round((geopoints.get(i).getLatitudeE6()/1E6)*100000);
            fs.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(lat).array());
            fs.write(descBytes);
            fs.write((byte)0x00);
        }
        return true;
    }
    catch (IOException e)
    {
        return false;
    }
}
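For a self-contained check of the byte layout, here is a sketch that builds a single record in memory, mirroring the PHP pack("V", ...) line: 0x02, then three little-endian 32-bit integers, then the label and a terminating 0x00. The label charset is passed in explicitly because the format description above does not say which encoding TomTom expects; that choice, and the class and method names, are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Ov2Sketch {
    /** Builds one POI record (status 2): 0x02, LE length, LE lon*100000, LE lat*100000, label, 0x00. */
    static byte[] record(double lon, double lat, String label, Charset charset) {
        byte[] labelBytes = label.getBytes(charset);
        int recordLength = 1 + 4 + 4 + 4 + labelBytes.length + 1;   // == strlen($des) + 14 in the PHP
        ByteBuffer buf = ByteBuffer.allocate(recordLength).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 0x02);
        buf.putInt(recordLength);
        buf.putInt((int) Math.round(lon * 100000));
        buf.putInt((int) Math.round(lat * 100000));
        buf.put(labelBytes);
        buf.put((byte) 0x00);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] r = record(1.0, 2.0, "A", StandardCharsets.US_ASCII);
        System.out.println(r.length); // prints 15
    }
}
```

Worked through by hand for longitude 1.0, latitude 2.0 and label "A": the length is 1 + 12 + 1 + 1 = 15 (0F 00 00 00), 100000 = 0x000186A0 is written as A0 86 01 00, and 200000 = 0x00030D40 as 40 0D 03 00.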
