fwrite() in C & readInt() in Java differ in endianess - java

Native code:
Writing the number 27 using fwrite():
#include <errno.h>
#include <stdio.h>

int main()
{
    int a = 27;
    FILE *fp;

    fp = fopen("/data/tmp.log", "w");
    if (!fp)
        return -errno;
    fwrite(&a, 4, 1, fp);
    fclose(fp);
    return 0;
}
Reading back the data (27) using DataInputStream.readInt():
public int readIntDataInputStream() throws IOException
{
    String filePath = "/data/tmp.log";
    InputStream is = new FileInputStream(filePath);
    DataInputStream dis = new DataInputStream(is);

    int k = dis.readInt();
    Log.i(TAG, "Size : " + k);
    dis.close();
    return 0;
}
Output:
Size : 452984832
In hex that is 0x1B000000.
0x1B is 27, but readInt() is reading the data as big-endian while my native code is writing it as little-endian. So instead of 0x0000001B I get 0x1B000000.
Is my understanding correct? Has anyone come across this problem before?

From the Javadoc for readInt():
This method is suitable for reading bytes written by the writeInt method of interface DataOutput
If you want to read something written by a C program you'll have to do the byte swapping yourself, using the facilities in java.nio. I've never done this, but I believe you would read the data into a ByteBuffer, set the buffer's order to ByteOrder.LITTLE_ENDIAN, and then either create an IntBuffer view over the ByteBuffer if you have an array of values, or just use ByteBuffer#getInt() for a single value.
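As a minimal sketch of that approach (the file path matches the question; reading the whole file with Files.readAllBytes and printing with System.out instead of Log.i are just for illustration):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LittleEndianReader {
    public static void main(String[] args) throws IOException {
        // Read the raw bytes written by the C program.
        byte[] raw = Files.readAllBytes(Paths.get("/data/tmp.log"));

        // Wrap them and declare the byte order they were written in.
        ByteBuffer buf = ByteBuffer.wrap(raw);
        buf.order(ByteOrder.LITTLE_ENDIAN);

        // getInt() now assembles the bytes least-significant first.
        int k = buf.getInt();
        System.out.println("Size : " + k); // prints 27
    }
}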
All that aside, I agree with @EJP that the external format for the data should be big-endian for greatest compatibility.

There are multiple issues in your code:
You assume that the size of int is 4, which is not necessarily true; since you want to deal with 32-bit ints, you should use int32_t or uint32_t.
You must open the file in binary mode to write binary data reliably. The above code would fail on Windows for less trivial output. Use fopen("/data/tmp.log", "wb").
You must deal with endianness. You are using the file to exchange data between different platforms that may have different native endianness and/or endian-specific APIs. Java uses big-endian, aka network byte order, so you should convert the values on the C side with htonl(), or glibc's htobe32() from <endian.h>. This is unlikely to have a significant impact on performance on the PC side, as such a function is usually expanded inline, possibly as a single instruction, and most of the time will be spent waiting for I/O anyway.
Here is a modified version of the code:
#include <endian.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t a = htobe32(27);  /* convert to big-endian (network byte order) */
    FILE *fp = fopen("/data/tmp.log", "wb");
    if (!fp) {
        return errno;
    }
    fwrite(&a, sizeof a, 1, fp);
    fclose(fp);
    return 0;
}

Related

What are the extra bytes in the ZipEntry used for?

The Java library for Zip files has a method on ZipEntry, getExtra(), that returns either a byte[] or null. What are the extra bytes in the ZipEntry used for? I'm aware of this question about archive attributes linked to getExtra(), but it doesn't explain what else the field is used for. Furthermore, that question indicates that some things stored in the extra field cannot be set from Java.
The answer can be found in the first two links in the java.util.zip package documentation.
The basic zip format is described in the PKWARE zip specification. Sections 4.5 and 4.6 describe what the extra data is.
The extra data is a series of zero or more blocks. Each block starts with a little-endian 16-bit ID, followed by a little-endian 16-bit count of the bytes that immediately follow.
The PKWARE specification describes some well known extra data record IDs. The Info-Zip format describes many more.
So, if you wanted to check whether a zip entry includes an ASi Unix Extra Field, you might read it like this:
ByteBuffer extraData = ByteBuffer.wrap(zipEntry.getExtra());
extraData.order(ByteOrder.LITTLE_ENDIAN);

while (extraData.hasRemaining()) {
    int id = extraData.getShort() & 0xffff;
    int length = extraData.getShort() & 0xffff;

    if (id == 0x756e) {
        int crc32 = extraData.getInt();
        short permissions = extraData.getShort();
        int linkLengthOrDeviceNumbers = extraData.getInt();
        int userID = extraData.getChar();
        int groupID = extraData.getChar();

        ByteBuffer linkDestBuffer = extraData.slice();
        linkDestBuffer.limit(length - 14);
        String linkDestination =
            StandardCharsets.UTF_8.decode(linkDestBuffer).toString();
        // Skip past the variable-length data we just sliced,
        // so the next iteration starts at the next block's ID.
        extraData.position(extraData.position() + (length - 14));
        // ...
    } else {
        extraData.position(extraData.position() + length);
    }
}

Can a packed C structure and function be ported to java?

In the past I have written code which handles incoming data from a serial port. The data has a fixed format.
Now I want to migrate this code to java (android). However, I see many obstacles.
The actual code is more complex, but I have a simplified version here:
#define byte unsigned char
#define word unsigned short

#pragma pack(1);
struct addr_t
{
    byte foo;
    word bar;
};
#pragma pack();

bool RxData( byte val )
{
    static byte buffer[20];
    static int idx = 0;

    buffer[idx++] = val;
    return ( idx == sizeof(addr_t) );
}
The RxData function is called every time a byte is received. When the complete chunk of data is in, it returns true.
Some of the obstacles:
The data types used are not available in Java. Other threads recommend using larger data types, but in this case that is not a workable solution.
The size of the structure is in this case exactly 3 bytes. That's also why the #pragma statement is important; otherwise the C compiler might pad the struct for alignment, with a different size as a result.
Java also doesn't have a sizeof operator, and I have found no alternative for this kind of situation.
I could replace the sizeof with a fixed value of 3, but that would be very bad practice IMO.
Is it at all possible to write such a code in java? Or is it wiser to try to add native c source into Android Studio?
Your C code has its problems too. Technically, you do not know how big a char or a short is; you probably want uint8_t and uint16_t respectively. Also, I'm not sure how portable packing is.
In Java, you need a class. The class might as well tell you how many bytes you need to initialise it.
class Addr
{
    private byte foo;
    private short bar;

    public final static int bufferBytes = 3;

    public int getUnsignedFoo()
    {
        return (int)foo & 0xff;
    }

    public int getUnsignedBar()
    {
        return (int)bar & 0xffff;
    }
}
Probably a class for the buffer too although there may already be a suitable class in the standard library.
class Buffer
{
    private final static int maxSize = 20;
    private byte[] bytes = new byte[maxSize];
    private int idx = 0;

    public boolean rxData(byte b)
    {
        bytes[idx++] = b;
        return idx == Addr.bufferBytes;
    }
}
To answer the question about the hard-coded 3: this is actually the better way to do it, because the specification of your protocol should say "one byte for foo and two bytes for bar", not "a packed C struct with a char and a short in it". One way to deserialise the buffer is like this:
public class Addr
{
    // All the stuff from above

    public Addr(byte[] buffer)
    {
        foo = buffer[0];
        bar = someFunctionThatGetsTheEndiannessRight(buffer[1], buffer[2]);
    }
}
I have left the way bar is calculated deliberately vague because it depends on your platform as much as anything. You can do it simply with bit shifts, e.g.
(((short)buffer[1] & 0xff) << 8) | ((short)buffer[2] & 0xff)
However, there are better options available. For example, you can use a java.nio.ByteBuffer, which has the machinery to cope with endianness issues.
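Building on that, here is a minimal sketch of parsing the 3-byte record with a ByteBuffer. The little-endian wire order is an assumption for illustration; use whatever order your serial protocol actually specifies:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class AddrParser {
    // Parses the 3-byte record: one byte for foo, two bytes for bar.
    public static int[] parse(byte[] buffer) {
        ByteBuffer bb = ByteBuffer.wrap(buffer, 0, 3);
        bb.order(ByteOrder.LITTLE_ENDIAN); // assumption: protocol sends bar little-endian

        int foo = bb.get() & 0xff;        // unsigned byte
        int bar = bb.getShort() & 0xffff; // unsigned 16-bit word
        return new int[] { foo, bar };
    }
}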

How to write a big-endian ByteBuffer as little-endian in Java

I currently have a Java ByteBuffer that already has its data in big-endian format. I then want to write it to a binary file as little-endian.
Here's the code, which still writes the file in big-endian:
public void writeBinFile(String fileName, boolean append) throws FileNotFoundException, IOException
{
    FileOutputStream outStream = null;
    try
    {
        outStream = new FileOutputStream(fileName, append);
        FileChannel out = outStream.getChannel();
        byteBuff.position(byteBuff.capacity());
        byteBuff.flip();
        byteBuff.order(ByteOrder.LITTLE_ENDIAN);
        out.write(byteBuff);
    }
    finally
    {
        if (outStream != null)
        {
            outStream.close();
        }
    }
}
Note that byteBuff is a ByteBuffer that has been filled in Big Endian format.
My last resort is a brute force method of creating another buffer and setting that ByteBuffer to little endian and then reading the "getInt" values from the original (big endian) buffer, and "setInt" the value to the little endian buffer. I'd imagine there is a better way...
Endianness has no meaning for a byte[]. Endianness only matters for multi-byte data types like short, int, long, float, or double. The right time to get the endianness right is when you write the original data types to raw bytes, and when you read them back into those types.
If you have a byte[] given to you, you must decode the original data types and re-encode them with the different endianness. I am sure you will agree that this is a) not easy to do or ideal, and b) cannot be done automagically.
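If you do control the writing side, a minimal sketch of getting it right up front (the file name is just for illustration): declare the order on the buffer before putting values, and the bytes are laid out little-endian as they are written, with no post-hoc swapping.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianWriter {
    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 * 3);
        buf.order(ByteOrder.LITTLE_ENDIAN); // declare the order before putting values
        buf.putInt(1).putInt(2).putInt(3);  // each int is encoded little-endian
        buf.flip();

        try (FileOutputStream out = new FileOutputStream("out.bin")) {
            out.getChannel().write(buf);
        }
    }
}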
Here is how I solved a similar problem, where I wanted to get the endianness of the integers I'm writing to an output file correct:
byte[] theBytes = /* obtain a byte array that is the input */
ByteBuffer byteBuffer = ByteBuffer.wrap(theBytes);
ByteBuffer destByteBuffer = ByteBuffer.allocate(theBytes.length);
destByteBuffer.order(ByteOrder.LITTLE_ENDIAN);
IntBuffer destBuffer = destByteBuffer.asIntBuffer();

while (byteBuffer.hasRemaining())
{
    int element = byteBuffer.getInt();
    destBuffer.put(element);
    /* Could write destBuffer int-by-int here, or outside this loop */
}
There might be more efficient ways to do this, but for my particular problem, I had to apply a mathematical transformation to the elements as I copied them to the new buffer. But this should still work for your particular problem.
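One such variation, as a sketch: Integer.reverseBytes swaps one 32-bit word at a time, which avoids setting up an IntBuffer view. This assumes the source buffer holds nothing but whole big-endian ints:

import java.nio.ByteBuffer;

public final class EndianSwap {
    // Copies src into a new buffer, byte-swapping each 32-bit word.
    public static ByteBuffer swapInts(ByteBuffer src) {
        ByteBuffer dest = ByteBuffer.allocate(src.remaining());
        while (src.remaining() >= 4) {
            // Read big-endian, write the byte-reversed value.
            dest.putInt(Integer.reverseBytes(src.getInt()));
        }
        dest.flip();
        return dest;
    }
}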

Converting C++ encryption to Java

I have the following C++ code to cipher a string with XOR.
#define MPI_CIPHER_KEY "qwerty"

Buffer FooClient::cipher_string(const Buffer& _landing_url)
{
    String key(MPI_CIPHER_KEY);
    Buffer key_buf(key.chars(), key.length());
    Buffer landing_url_cipher = FooClient::XOR(_landing_url, key_buf);
    Buffer b64_url_cipher;
    base64_encode(landing_url_cipher, b64_url_cipher);
    return b64_url_cipher;
}
Buffer FooClient::XOR(const Buffer& _data, const Buffer& _key)
{
    Buffer retval(_data);
    unsigned int klen = _key.length();
    unsigned int dlen = _data.length();
    unsigned int k = 0;
    unsigned int d = 0;

    for (; d < dlen; d++)
    {
        retval[d] = _data[d] ^ _key[k];
        k = (++k < klen ? k : 0);
    }
    return retval;
}
I have seen such a Java implementation in this question. Would that work for this case?
String s1, s2;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s1.length() && i < s2.length(); i++)
    sb.append((char)(s1.charAt(i) ^ s2.charAt(i)));
String result = sb.toString();
Or is there an easier way to do it?
It doesn't look the same to me. The C++ version loops across all of _data no matter what the key length is, cycling through _key as necessary (k=(++k<klen?k:0); in the C++ code).
Yours returns as soon as the shorter of the key or the data is exhausted.
Personally, I'd start with the closest literal translation of the C++ to Java that you can do, keeping parameter and local names the same.
Then write unit tests for it that have known inputs and outputs from the C++.
Then start refactoring the Java version to use Java idioms, ensuring the tests still pass.
No - the Java code will only XOR up to the length of the smaller string, whereas the C++ code XORs the entire data. Assuming s1 is your "key", this can be fixed by changing the loop to
for (int i = 0; i < s2.length(); i++)
    sb.append((char)(s1.charAt(i % s1.length()) ^ s2.charAt(i)));
Also, the Base64 encoding of the return value is missing.
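Putting those corrections together, a closer translation might look like the sketch below. It works on bytes rather than chars to mirror the C++ Buffer, and uses java.util.Base64 (Java 8+; on Android, android.util.Base64 would be the alternative). The key comes from the question's macro:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class FooCipher {
    private static final byte[] KEY = "qwerty".getBytes(StandardCharsets.US_ASCII);

    // XOR all of the data, cycling the key, then Base64-encode the result.
    public static String cipherString(byte[] data) {
        byte[] out = new byte[data.length];
        for (int d = 0, k = 0; d < data.length; d++) {
            out[d] = (byte) (data[d] ^ KEY[k]);
            k = (k + 1 < KEY.length) ? k + 1 : 0;
        }
        return Base64.getEncoder().encodeToString(out);
    }
}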

Wav comparison, same file

I'm currently stumped. I've been looking around and experimenting with audio comparison. I've found quite a bit of material, and a ton of references to different libraries and methods to do it.
As of now I've taken Audacity and exported a 3-minute WAV file called "long.wav", then split the first 30 seconds of that into a file called "short.wav". I figured that somewhere along the line I could log the data for each through Java (log.txt) and should be able to see at least some visual similarities among the values... here's some code
Main method:
int totalFramesRead = 0;
File fileIn = new File(filePath);
BufferedWriter writer = new BufferedWriter(new FileWriter(outPath));
writer.flush();
writer.write("");
try {
    AudioInputStream audioInputStream =
        AudioSystem.getAudioInputStream(fileIn);
    int bytesPerFrame =
        audioInputStream.getFormat().getFrameSize();
    if (bytesPerFrame == AudioSystem.NOT_SPECIFIED) {
        // some audio formats may have unspecified frame size
        // in that case we may read any amount of bytes
        bytesPerFrame = 1;
    }
    // Set an arbitrary buffer size of 1024 frames.
    int numBytes = 1024 * bytesPerFrame;
    byte[] audioBytes = new byte[numBytes];
    try {
        int numBytesRead = 0;
        int numFramesRead = 0;
        // Try to read numBytes bytes from the file.
        while ((numBytesRead =
                audioInputStream.read(audioBytes)) != -1) {
            // Calculate the number of frames actually read.
            numFramesRead = numBytesRead / bytesPerFrame;
            totalFramesRead += numFramesRead;
            // Here, do something useful with the audio data that's
            // now in the audioBytes array...
            if (totalFramesRead <= 4096 * 100)
            {
                Complex[][] results = PerformFFT(audioBytes);
                int[][] lines = GetKeyPoints(results);
                DumpToFile(lines, writer);
            }
        }
    } catch (Exception ex) {
        // Handle the error...
    }
    audioInputStream.close();
} catch (Exception e) {
    // Handle the error...
}
writer.close();
Then PerformFFT:
public static Complex[][] PerformFFT(byte[] data) throws IOException
{
    final int totalSize = data.length;
    int amountPossible = totalSize / Harvester.CHUNK_SIZE;

    // When turning into the frequency domain we'll need complex numbers:
    Complex[][] results = new Complex[amountPossible][];

    // For all the chunks:
    for (int times = 0; times < amountPossible; times++) {
        Complex[] complex = new Complex[Harvester.CHUNK_SIZE];
        for (int i = 0; i < Harvester.CHUNK_SIZE; i++) {
            // Put the time-domain data into a complex number with the imaginary part as 0:
            complex[i] = new Complex(data[(times * Harvester.CHUNK_SIZE) + i], 0);
        }
        // Perform FFT analysis on the chunk:
        results[times] = FFT.fft(complex);
    }
    return results;
}
At this point I've tried logging everywhere: audioBytes before transforms, Complex values, and FFT results.
The problem: No matter what values I log, the log.txt of each WAV file is completely different. I'm not understanding it. Given that I took short.wav from long.wav (and they have all the same properties), there should be a very heavy similarity among either the raw WAV byte[] data... or the Complex[][] FFT data... or something thus far.
How can I possibly try to compare these files if the data isn't even close to similar at any point of these calculations?
I know I'm missing quite a bit of knowledge with regards to audio analysis, and this is why I come to the board for help! Thanks for any info, help, or fixes you can offer!!
Have you looked at MARF? It is a well-documented Java library used for audio recognition.
It is used to recognize speakers (for transcription or securing software), but the same features should be usable to classify audio samples. I'm not familiar with it, but it looks like you'd want to use the FeatureExtraction class to extract an array of features from each audio sample and then create a unique ID.
For 16-bit audio, 3e-05 isn't really that different from zero. So a file of zeros is pretty much the same as a file of zeros (maybe missing equality by some tiny rounding errors.)
ADDED:
For your comparison, read in and plot, using some Java plotting library, a portion of each of the two waveforms when they get past the portion that's mostly (close to) zero.
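If it helps with that, here is a rough sketch of pulling raw samples out for plotting or dumping. It assumes 16-bit PCM WAV input (the usual Audacity export) and reads only the first channel of each frame; the file name comes from the question:

import java.io.File;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class SampleDump {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("short.wav"));
        byte[] frame = new byte[in.getFormat().getFrameSize()];
        ByteOrder order = in.getFormat().isBigEndian()
                ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;

        // Print the first 1000 samples of channel 0, one per line,
        // so they can be plotted or diffed against the other file.
        for (int i = 0; i < 1000 && in.read(frame) == frame.length; i++) {
            System.out.println(ByteBuffer.wrap(frame).order(order).getShort());
        }
        in.close();
    }
}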
For debugging, I think you'd be better off using MATLAB to plot things out, since MATLAB is much more powerful for dealing with this kind of problem.
Use wavread to load the file and stft to get the short-time Fourier transform, which is a matrix of complex numbers. Then simply abs(Matrix) to get the magnitude of each complex number, and show the image with imshow(abs(Matrix),[]).
I don't know how you would compare the whole file and the 30-second clip (by looking at the stft image?).
I don't know how you are comparing both audio files, but consider services that offer music recognition (like TrackID or MotoID): they take a small sample of the music you're hearing (10-20 seconds), then process it on their servers. I theorize that they keep a database of patterns (in your case, Fourier transforms) computed from samples that long or shorter. You may need to break your long audio file into chunks the same size as your sample data or smaller. In the first case you may find the specific chunk that best matches the pattern of your sample data; in the second case your smaller chunks may resemble parts of your sample data, and you can calculate the probability that the sample belongs to a given audio file.
I think you are looking for acoustic fingerprinting.
It's hard, and there are libraries to do it.
If you want to implement it yourself, this is a whitepaper on the Shazam algorithm.
