Generating a .ov2 file with Java

I am trying to figure out how to create a .ov2 file to add POI data to a TomTom GPS device. The format of the data needs to be as follows:
An OV2 file consists of POI records. Each record has the following data format.
1 BYTE, char, POI status ('0' or '2')
4 BYTES, long, denotes length of the POI record.
4 BYTES, long, longitude * 100000
4 BYTES, long, latitude * 100000
x BYTES, string, label for POI, x == total length - (1 + 3 * 4)
Terminating null byte.
I found the following PHP code that is supposed to take a .csv file, go through it line by line, split each record and then write it into a new file in the proper format. I was hoping someone would be able to help me translate this to Java. I really only need the line I marked with the '--->' arrow. I do not know PHP at all, but everything other than that one line is basic enough that I can look at it and translate it, but I do not know what the PHP functions are doing on that one line. Even if someone could explain it well enough then maybe I could figure it out in Java. If you can translate it directly, please do, but even an explanation would be helpful. Thanks.
<?php
$csv = file("File.csv");
$nbcsv = count($csv);
$file = "POI.ov2";
$fp = fopen($file, "wb"); // binary mode, so nothing rewrites the bytes on Windows
for ($i = 0; $i < $nbcsv; $i++) {
    $table = explode(",", chop($csv[$i])); // split() no longer exists in PHP 7+
    $lon = $table[0];
    $lat = $table[1];
    $des = $table[2];
    // ---> the line I need translated:
    $TT = chr(0x02).pack("V",strlen($des)+14).pack("V",round($lon*100000)).pack("V",round($lat*100000)).$des.chr(0x00);
    fwrite($fp, $TT); // must not be commented out, or the file stays empty
}
fclose($fp);
?>

Load a file into an array, where each element is a line from the file.
$csv = file("File.csv");
Count the number of elements in the array.
$nbcsv = count($csv);
Open output file for writing.
$file = "POI.ov2";
$fp = fopen($file, "w");
While $i < number of array items, $i++
for ($i = 0; $i < $nbcsv; $i++) {
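Right-trim the line (chop() is an alias of rtrim(), so it strips trailing whitespace such as the newline, not all whitespace), then split the string on ','. $table is an array of values from the CSV line. Note that split() was removed in PHP 7.0; explode(",", ...) does the same job for a plain comma.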
$table = split(",", chop($csv[$i]));
Assign component parts of the table to their own variables by numeric index.
$lon = $table[0];
$lat = $table[1];
$des = $table[2];
The tricky bit.
chr(0x02) is literally the byte with character code 2 (the POI status).
pack is a binary processing function. It takes a format and some data.
V = unsigned long (always 32 bit, little endian byte order).
I'm sure you can work out the maths bits, but you need to convert them into little endian order 32 bit values. The first one, strlen($des)+14, is the record length: 1 status byte + three 4-byte values + the label + 1 null byte.
. is PHP's string concatenation operator.
Finally, the record is terminated with chr(0x00), a null byte.
$TT = chr(0x02).
pack("V",strlen($des)+14).
pack("V",round($lon*100000)).
pack("V",round($lat*100000)).
$des.chr(0x00);
Write it out and close the file. Note that fwrite must not be commented out, or the file stays empty:
fwrite($fp, $TT);
}
fclose($fp);
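The same record can be built in Java with a ByteBuffer switched to little-endian order, which plays the role of pack("V", ...). This is a minimal sketch, not the original author's code: the class and method names are made up for illustration, and it assumes the label should be written as ISO-8859-1 bytes (the PHP code simply writes whatever bytes $des contains, so the right encoding depends on your data and device).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class Ov2Record {

    // Java counterpart of PHP pack("V", n): 4 bytes, little-endian.
    static byte[] packV(int n) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(n).array();
    }

    // Builds one record: status 2, length, lon, lat, label, NUL.
    static byte[] encode(double lon, double lat, String des) {
        byte[] label = des.getBytes(StandardCharsets.ISO_8859_1); // assumed label encoding
        int length = label.length + 14; // 1 + 4 + 4 + 4 + label + 1
        ByteBuffer buf = ByteBuffer.allocate(length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 0x02);                       // chr(0x02)
        buf.putInt(length);                         // pack("V", strlen($des)+14)
        buf.putInt((int) Math.round(lon * 100000)); // pack("V", round($lon*100000))
        buf.putInt((int) Math.round(lat * 100000)); // pack("V", round($lat*100000))
        buf.put(label);                             // $des
        buf.put((byte) 0x00);                       // chr(0x00)
        return buf.array();
    }
}
```

For example, encode(1.0, 2.0, "Hi") yields a 16-byte record whose length field is 16, i.e. strlen("Hi") + 14. Note that the length is taken from the encoded bytes, not from the character count, so it stays correct for non-ASCII labels.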

The key in Java is to set the ByteBuffer's byte order to ByteOrder.LITTLE_ENDIAN: a ByteBuffer is big-endian by default, while the OV2 format stores its 32-bit values little-endian.
The whole function:
// Needs java.io.File, java.io.FileOutputStream, java.io.IOException,
// java.nio.ByteBuffer, java.nio.ByteOrder and java.util.ArrayList.
// Waypoint is the asker's own class (getName(), getLongitudeE6(), getLatitudeE6()).
private static boolean getWaypoints(ArrayList<Waypoint> geopoints, File f)
{
    try (FileOutputStream fs = new FileOutputStream(f)) // closed even if a write fails
    {
        for (Waypoint wp : geopoints)
        {
            // Encode the label once (platform default charset, as before) and use
            // its byte length; the char count is wrong for non-ASCII names.
            byte[] desc = wp.getName().getBytes();
            int poiLength = desc.length + 14; // 1 + 4 + 4 + 4 + label + 1
            int lon = (int) Math.round((wp.getLongitudeE6() / 1E6) * 100000);
            int lat = (int) Math.round((wp.getLatitudeE6() / 1E6) * 100000);

            fs.write(0x02); // POI status byte
            fs.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(poiLength).array());
            fs.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(lon).array());
            fs.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(lat).array());
            fs.write(desc);
            fs.write(0x00); // terminating null byte
        }
        return true;
    }
    catch (IOException e)
    {
        return false;
    }
}
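To check a generated file, the records can be read back with the same little-endian ByteBuffer. This is a sketch based only on the format described in the question (status byte, length, longitude, latitude, label, null byte): it accepts status 0 or 2 as listed there and throws on anything else, decodes labels as ISO-8859-1 (an assumption), and uses a Java 16+ record for the hypothetical Poi type.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Ov2Reader {

    // Hypothetical holder for one decoded POI.
    record Poi(int status, double lon, double lat, String label) {}

    static List<Poi> parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<Poi> out = new ArrayList<>();
        while (buf.remaining() >= 13) { // status + length + lon + lat
            int start = buf.position();
            int status = buf.get() & 0xFF;
            int length = buf.getInt();
            if ((status != 0 && status != 2) || length < 14 || start + length > data.length) {
                throw new IllegalArgumentException("unexpected record at offset " + start);
            }
            double lon = buf.getInt() / 100000.0;
            double lat = buf.getInt() / 100000.0;
            int labelLen = length - 14; // label bytes without the trailing NUL
            String label = new String(data, buf.position(), labelLen, StandardCharsets.ISO_8859_1);
            buf.position(start + length); // skip label and NUL
            out.add(new Poi(status, lon, lat, label));
        }
        return out;
    }

    static List<Poi> read(Path path) throws IOException {
        return parse(Files.readAllBytes(path));
    }
}
```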

Related

How to extract end of data written inside a file in Java

I create an empty file with desired length in Android using Java like this:
long length = 10L * 1024 * 1024 * 1024; // the L suffix matters: in int arithmetic this overflows
String file = "PATH\\File.mp4";
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "rw");
randomAccessFile.setLength(length);
That code creates a file of the desired length, filled with NULL data. Then I write data into the file like this:
randomAccessFile.write(DATA);
Now my question: I want to find the end of the data written into the file. I have written this function to find it as fast as possible with a binary search:
long extractEndOfData(RandomAccessFile accessFile, long from, long end) throws IOException {
accessFile.seek(from);
if (accessFile.read() == 0) {
//this means no data has written into the file
return 0;
}
accessFile.seek(end);
if (accessFile.read() != 0) {
return end + 1;
}
long mid = (from + end) / 2;
accessFile.seek(mid);
if (accessFile.read() == 0) {
return extractEndOfData(accessFile, from, mid - 1);
} else {
if (accessFile.read() == 0) {
return mid + 1;
} else {
return extractEndOfData(accessFile, mid + 1, end);
}
}
}
and I call that function like this to find end of data into the file:
long endOfData = extractEndOfData(randomAccessFile, 0, randomAccessFile.length() - 1);
That function works fine for files whose data begins with non-NULL bytes and contains no NULL bytes in between, like this:
But for some files it does not, because some files begin with NULL data, like this:
What can I do to solve this problem? Thanks a lot.
I think your issue is clear: you will never be able to find out how much data was written (or where the end of the content is) when you are only searching for NULLs inside the file. The reason is that NULL is a byte with the value 0x00, which appears in all kinds of binary files (maybe not text files), and on the other hand your file is initialized with NULLs.
What you could do is for example to store the size of your data written to the file in the first four bytes of the file.
So when writing the DATA to the file, first write the length of it, and then the actual data content.
But I am still wondering why you don't initialize the file's size to the size you need.
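A minimal sketch of the length-header suggestion with RandomAccessFile. Two choices here are assumptions, not part of the answer: the header is eight bytes (a long) rather than four, because the file in the question is 10 GiB and a signed 4-byte int tops out just under 2 GiB; and the header is explicitly zeroed after setLength(), because the RandomAccessFile documentation leaves the contents of the extended part of a file undefined. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class LengthHeader {
    static final int HEADER_SIZE = 8; // one long holding the number of data bytes

    // Preallocates the file and records "0 bytes written" in the header.
    static void create(RandomAccessFile file, long capacity) throws IOException {
        file.setLength(HEADER_SIZE + capacity);
        file.seek(0);
        file.writeLong(0L);
    }

    // Appends data after what is already recorded, then updates the header.
    static void append(RandomAccessFile file, byte[] data) throws IOException {
        long used = dataLength(file);
        file.seek(HEADER_SIZE + used);
        file.write(data);
        file.seek(0);
        file.writeLong(used + data.length);
    }

    // Number of data bytes recorded in the header.
    static long dataLength(RandomAccessFile file) throws IOException {
        if (file.length() < HEADER_SIZE) {
            return 0;
        }
        file.seek(0);
        return file.readLong();
    }

    // Offset one past the last data byte; no scanning for NULLs needed.
    static long endOfData(RandomAccessFile file) throws IOException {
        return HEADER_SIZE + dataLength(file);
    }
}
```

Because the length is stored explicitly, data that itself starts with or contains zero bytes (the case that breaks the binary search) is handled the same as any other data.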

reading UTF-16 produces unexpected results

I use the beaglebuddy Java library in an Android project for reading/writing ID3 tags of mp3 files. I'm having an issue with reading the text that was previously written using the same library and could not find anything related in their docs.
Assume I write the following info:
MP3 mp3 = new MP3(pathToFile);
mp3.setLeadPerformer("Jon Skeet");
mp3.setTitle("A Million Rep");
mp3.save();
Looking at the source code of the library, I see that UTF-16 encoding is explicitly set; internally it calls
protected ID3v23Frame setV23Text(String text, FrameType frameType) {
return this.setV23Text(Encoding.UTF_16, text, frameType);
}
and
protected ID3v23Frame setV23Text(Encoding encoding, String text, FrameType frameType) {
ID3v23FrameBodyTextInformation frameBody = null;
ID3v23Frame frame = this.getV23Frame(frameType);
if(frame == null) {
frame = this.addV23Frame(frameType);
}
frameBody = (ID3v23FrameBodyTextInformation)frame.getBody();
frameBody.setEncoding(encoding);
frameBody.setText(encoding == Encoding.UTF_16?Utility.getUTF16String(text):text);
return frame;
}
At a later point, I read the data and it gives me some weird Chinese characters:
mp3.getLeadPerformer(); // 䨀漀渀 匀欀攀攀琀
mp3.getTitle(); // 䄀 䴀椀氀氀椀漀渀 刀攀瀀
I took a look at the built-in Utility.getUTF16String(String) method:
public static String getUTF16String(String string) {
String text = string;
byte[] bytes = string.getBytes(Encoding.UTF_16.getCharacterSet());
if(bytes.length < 2 || bytes[0] != -2 || bytes[1] != -1) {
byte[] bytez = new byte[bytes.length + 2];
bytes[0] = -2;
bytes[1] = -1;
System.arraycopy(bytes, 0, bytez, 2, bytes.length);
text = new String(bytez, Encoding.UTF_16.getCharacterSet());
}
return text;
}
I'm not quite getting the point of setting the first 2 bytes to -2 and -1 respectively, is this a pattern stating that the string is UTF-16 encoded?
I also tried explicitly calling this method on the data I read back. The result is readable, but it always has some cryptic characters prepended:
Utility.getUTF16String(mp3.getLeadPerformer()); // ��Jon Skeet
Utility.getUTF16String(mp3.getTitle()); // ��A Million Rep
Since the count of those characters seems to be constant, I created a temporary workaround by simply cutting them off.
Fields like "comments" where the author does not explicitly enforce UTF-16 when writing are read without any issues.
I'm really curious about what's going on here and appreciate any suggestions.
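For reference, -2 and -1 as signed Java bytes are 0xFE 0xFF, the big-endian UTF-16 byte order mark (BOM), so the pattern does mark the data as UTF-16 and also states its byte order. The sketch below uses only java.nio.charset.StandardCharsets (not beaglebuddy, so it does not show where in the library the byte order gets lost; the class and method names are hypothetical). It shows that Java's UTF-16 encoder writes that BOM, that the UTF-16 decoder honors either BOM, and that reading big-endian UTF-16 as little-endian turns "Jon" into U+4A00 U+6F00 U+6E00, the same kind of CJK characters as in the output above.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {

    // Java's "UTF-16" encoder writes big-endian data preceded by FE FF (-2, -1).
    static byte[] encodeWithBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Big-endian bytes decoded as little-endian: each byte pair is swapped,
    // so 'J' (0x004A) becomes U+4A00, which renders as a CJK character.
    static String misreadAsLittleEndian(String s) {
        byte[] be = s.getBytes(StandardCharsets.UTF_16BE);
        return new String(be, StandardCharsets.UTF_16LE);
    }

    // The "UTF-16" decoder reads the BOM (FE FF or FF FE), picks the byte order
    // from it and drops it from the result.
    static String decodeHonoringBom(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }
}
```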

Reading binary file of doubles written in Java with ObjectOutputStream in Python with numpy.fromfile

I've written an array of doubles in binary format to a file using ObjectOutputStream's writeDouble() method in Java. When I read this file in Python using numpy.fromfile, it doesn't give me the same values. Moving around in the file with seek() doesn't help either.
If I do the same with 32-bit ints, it works, but there's always some gibberish at the beginning of the file that I have to skip with seek().
Relevant Java code:
//arr is an array of type double
try {
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("data.bin"));
for (int i = 1; i <= 10; i++) {
out.writeDouble(arr[i]);
}
out.close();
} catch (IOException ex) {
Logger.getLogger(Project.class.getName()).log(Level.SEVERE, null, ex);
}
Relevant Python code:
datafile1 = open("data.bin", "rb")
data = np.fromfile(datafile1, dtype=np.float64, count=-1, sep='')
print(data)
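Use DataOutputStream instead: it writes the same raw 8-byte big-endian values, but unlike ObjectOutputStream it adds no stream header or block metadata to the file:
// arr is an array of type double; handle IOException as in the question
try (DataOutputStream dos = new DataOutputStream(new FileOutputStream("data.bin"))) {
    for (int i = 1; i <= 10; i++) {
        dos.writeDouble(arr[i]);
    }
}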
I know it's been a few years. For posterity, here's how I got this to work with DataOutputStream.writeFloat().
As per https://stackoverflow.com/a/27681630, DataOutputStream writes in big-endian byte order. numpy.fromfile with a plain dtype such as np.float32 uses the machine's native byte order, which is little-endian on most current hardware (x86 and typical ARM), so the bytes come out reversed. The solution is to call byteswap() on the resulting array, or to pass an explicitly big-endian dtype such as '>f4'.
np.fromfile('filename', np.float32).byteswap()
As you're working with doubles, pass np.float64 (or '>f8') instead, since writeDouble() writes 8-byte IEEE 754 values. Here's numpy's full list of dtypes:
https://docs.scipy.org/doc/numpy/user/basics.types.html
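Alternatively, the byte order can be fixed on the Java side. This sketch (hypothetical class and method names) writes the doubles as raw little-endian IEEE 754 values through a ByteBuffer, with no header, so on a little-endian machine np.fromfile('data.bin', dtype=np.float64) should read them without byteswap().

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianDoubles {

    // Packs doubles as consecutive 8-byte little-endian values, nothing else.
    static byte[] toLittleEndian(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    static void write(String path, double[] values) throws IOException {
        try (OutputStream os = new FileOutputStream(path)) {
            os.write(toLittleEndian(values));
        }
    }
}
```

The design choice is the same trade-off as byteswap(): either the writer matches the reader's byte order, or the reader is told the writer's byte order; fixing it in Java keeps the Python side to a plain np.float64.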

Problems compressing Excel files in Java

I have some problems compressing Excel files using the Huffman algorithm. My code seems to work with .txt files, but when I try to compress .xlsx files (or older Excel formats), an error occurs.
First of all, I read my file like this:
File file = new File("fileName.xlsx");
byte[] dataOfFile = new byte[(int) file.length()];
DataInputStream dis = new DataInputStream(new FileInputStream(file));
dis.readFully(dataOfFile);
dis.close();
To check that everything looks OK, I use this code:
String entireFileText = new String(dataOfFile, "UTF-8");
for(int i=0;i<dataOfFile.length;i++)
{
System.out.print(dataOfFile[i]);
}
By doing this to a .txt file I get something like this (which seems to be OK):
"7210110810811132119111114108100331310721111193297114101321211111173"
But when I do this on an .xlsx file I get the following, and I think the hyphens cause the errors that occur later in the compression:
"8075342006080003301165490-90122100-1245001908291671111101161011101169584121112101115934612010910832-944240-96020000000000000"... and so on
Anyway, using a string I can map this into a HashMap, where I count the frequency of each character. I have a HashMap:
public static HashMap<Character, Integer> map = new HashMap<>();
public static boolean countHowOftenACharacterAppear(String s1) {
    String s = s1;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        Integer val = map.get(c);
        if (val != null) {
            map.put(c, val + 1);
        }
        else {
            map.put(c, 1);
        }
    }
    return true;
}
When I compress my string I use:
public static String compress(String s) {
String c = new String();
for(int i = 0; i < s.length(); i++)
c = c + fromCharacterToCode.get(s.charAt(i));
return c;
}
fromCharacterToCode is another HashMap, of type:
public static HashMap<Character, String> fromCharacterToCode;
(I'm traversing the table I've built. I don't think this is the problem.)
Anyway, the results from this using the .txt file is:
"01000110110111011011110001101110011011000001000000000"... (PERFECT)
From the .xlsx file:
"10101110110001110null0010000null0011000nullnullnull10110000null00001101011111" ...
I really don't get why I'm getting these nulls for the .xlsx files. I would be very happy if I could get some help here to solve this. Many thanks!!
Your problem is java I/O, well before getting to compression.
First, you don't really need DataInputStream here, but leave that aside. You then convert to String entireFileText assuming the contents of the file is text in UTF-8, whereas data files like .xlsx aren't text at all and many text files even on Windows aren't UTF-8. But you don't seem to use entireFileText, so that may not matter. If you do, and the file isn't plain ASCII text, your compressor will "lose" chunks of it and the output of decompression will be only a fraction of the compression input; that is usually considered unsatisfactory.
Then you extract each byte from dataOfFile. byte in Java is signed; plain ASCII text files will have only "positive" bytes 0x00 to 0x7F (and usually all 0x20 to 0x7E plus 0x09 0x0D 0x0A), but everything else (UTF-8 text, UTF-16 text, data, and executables) will have "negative" bytes 0x80 to 0xFF which come out as -0x80 to -0x01.
Your printout "7210110810811132119111114108100331310721111193297114101321211111173" for "the .txt file" is almost certainly the byte sequence 72=H 101=e 108=l 108=l 111=o 32=space 119=w 111=o 114=r 108=l 100=d 33=! 13=CR 10=LF 72=H 111=o 119=w 32=space 97=a 114=r 101=e 32=space 121=y 111=o 117=u 3=(ETX aka ctrl-C) (how did you get a ctrl-C into a file?! or was it really 30=ctrl-Z? that's somewhat usual for Windows text files)
Someone more familiar with .xlsx format might be able to reconstruct that one, but I can tell you right off the hyphens are due to bytes with negative values, printed in decimal (by default) as -128 to -1.
For a general-purpose compressor, you should never convert to Java chars and Strings; those are designed for text, and not all files are text. Just work with bytes, and if you want them consistently positive, mask with & 0xFF.
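Following that advice, a byte-based frequency table might look like this sketch (class and method names are hypothetical). It indexes a 256-entry array by b & 0xFF, so every possible byte value, including the "negative" ones in .xlsx files, gets a count and can later get a Huffman code. It only replaces the counting step; building the tree and the code table is left as in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteFrequencies {

    // Counts how often each byte value 0..255 occurs; works for any file.
    static long[] count(byte[] data) {
        long[] freq = new long[256];
        for (byte b : data) {
            freq[b & 0xFF]++; // & 0xFF maps -128..-1 to 128..255
        }
        return freq;
    }

    static long[] countFile(Path path) throws IOException {
        return count(Files.readAllBytes(path));
    }
}
```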

Nibble Hex from Java to PHP

I'm translating an app from Java to PHP and I'm running into some trouble.
I have a string like this: 98191107990D0000EF050000789C65970BCCD75318C7CFEFFC ... In Java there's this function, to which I pass this string as a parameter:
private static byte[] decodeNibbleHex(String input)
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
char[] chars = input.toCharArray();
for (int i = 0; i < chars.length - 1; i += 2) {
char[] bChars = new char[2];
bChars[0] = chars[i];
bChars[1] = chars[(i + 1)];
int val = Integer.decode("0x" + new String(bChars)).intValue();
baos.write((byte)val);
}
return baos.toByteArray();
}
But I don't know how to translate this function into PHP. I've tried many times and I'm going crazy! I tried a for loop, with eval("\$hex = 0x" . $dati[$i].$dati[$i+1] . ";"); and with $binary_string = pack("h*" , $dati[$i].$dati[$i+1]); and many other functions...
If someone understands Java and can help me, I will appreciate it!
Thanks, guys!
Take a look here:
http://www.php.net/manual/de/function.hexdec.php#100578
Isn't this exactly what you were looking for?
If I understand your Java function correctly, it takes the string's chars in pairs, treats each pair as a hex byte and puts the bytes in a byte array. PHP has no byte array type, but you can represent arbitrary binary data in ordinary strings. This is my take on your function (I didn't compare it with the Java code's output).
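$str = '98191107990D0000EF050000789C65970BCCD75318C7CFEFFC';
$output = array(); // start empty; "$output[] = array()" would add a stray element
for ($i = 0, $c = strlen($str) - 1; $i < $c; $i += 2) {
    // each pair of hex digits becomes one byte (one char of a PHP string)
    $output[] = chr(intval($str[$i] . $str[$i + 1], 16));
}
print join($output); // binary string, not really useful in an ASCII terminal (-: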
In summary, this seems to be a base16_decode() function. With base16_encode() written as follows, you get back the input string:
function base16_encode($str) {
    $byteArray = str_split($str);
    foreach ($byteArray as &$byte) {
        // %02X gives uppercase hex digits, matching the input string
        $byte = sprintf('%02X', ord($byte));
    }
    unset($byte); // break the reference left by foreach
    return join($byteArray);
}
print base16_encode(join($output)); // should print back the original input
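To compare a PHP port against the Java side, it helps to have reference output from Java. A minimal sketch, assuming Java 17+ for java.util.HexFormat (the class and method names are hypothetical): for even-length input, parseHex produces the same bytes as decodeNibbleHex, which silently ignores a trailing odd character where parseHex throws instead, and formatHex with withUpperCase() turns the bytes back into the original string. On the PHP side, hex2bin($str) also decodes a hex string into a binary string.

```java
import java.util.HexFormat;

public class HexRoundTrip {

    // Same bytes as decodeNibbleHex for even-length input.
    static byte[] decode(String hex) {
        return HexFormat.of().parseHex(hex);
    }

    // Inverse, with uppercase digits like the original input string.
    static String encode(byte[] bytes) {
        return HexFormat.of().withUpperCase().formatHex(bytes);
    }
}
```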
