Torch-rnn sample.lua binary file exporting - java

I've been experimenting with Torch for a while now, and I wrote my own audio file format.
In it I want to store my data as raw bytes, so the file uses all 256 possible byte values.
I ran my file through preprocess.py with the 'bytes' encoding, and it threw no exceptions. Training goes well too, but the generated sample data is not raw bytes: some characters are written out literally, and other byte values appear as numbers in brackets.
[158][170][171][147][164][199][201][179][170][185][184][163][134][130][151][164][150][130]xnjlbUQcq]Vg|ysx{[130]|svzv[144][168][152][137]m[136][150][134][135][135][177][167][130][128][150][167][159][146][132][131][135]Wm{[155]}mqm[143]x[138]r[140][131][135]yv[135]}enj[138][145][141][140][150][128]mrj[132]vv[133][150][152][155][136][140][159][149][152][131]{[139]wmTPQ\bqveMYk[128]uvt[141][147][139][132][132][143][143][132][148][178][187][174][166][164][150]zt[137]xeo~xjt|x~zxx[130]tgp}[147][141][137][139]
How could I change sample.lua's output? I did make a change but I do not know Lua. This is what I wrote:
local sample = model:sample(opt)
local out = io.open(opt.output, "wb")
out:write(sample)
out:close()
instead of
local sample = model:sample(opt)
print(sample)
That produced the same output. What can I do to get it working?
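One possible workaround is to post-process the generated text outside of Lua. The sketch below is in Java and rests on assumptions that are not confirmed by the question or by torch-rnn's source: that each bracketed group such as [158] is the decimal value of one byte, and that every other character stands for its own byte value. The class and method names are hypothetical. Note that the format is ambiguous if the data itself contains a literal '[' followed by digits and ']'.

```java
import java.io.ByteArrayOutputStream;

public class BracketDecoder {
    // Returns true if every character in s[from, to) is an ASCII digit.
    static boolean allDigits(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Assumption: "[n]" with 1 to 3 digits and n <= 255 encodes the single byte n;
    // any other character is written as its own low 8 bits.
    static byte[] decode(String sample) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < sample.length()) {
            char c = sample.charAt(i);
            int close = (c == '[') ? sample.indexOf(']', i + 1) : -1;
            if (close > i + 1 && close - i <= 4 && allDigits(sample, i + 1, close)) {
                int value = Integer.parseInt(sample.substring(i + 1, close));
                if (value <= 255) {
                    out.write(value);
                    i = close + 1;
                    continue;
                }
            }
            out.write(c & 0xFF); // literal character
            i++;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = decode("[158][170]xnj[130]");
        System.out.println(bytes.length + " bytes decoded");
    }
}
```

The decoded array can then be written with a FileOutputStream, which writes bytes unchanged.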

Related

How to correctly save stream of bytes to file in Java/Scala? How to fix wrongly saved stream?

Story
While conducting an experiment I was saving a stream of random bytes generated by a hardware RNG device. After the experiment was finished, I realized that the saving method was incorrect. I hope I can find a way to fix the corrupted file so that I get the correct stream of random numbers back.
Example
The story of the problem can be explained in the following simple example.
Let's say I have a stream of random numbers in an input file randomInput.bin. I will simulate the stream of random numbers coming from the hardware RNG device by sending the input file to stdout via cat. I found two ways to save this stream to a file:
A) Harmless saving method
This method gives me exactly the original stream of random Bytes.
import scala.sys.process._
import java.io.File
val res = ("cat randomInput.bin" #> new File(outputFile))!
B) Saving method leading to corruption
Unfortunately, this is the original saving method I chose.
import scala.sys.process._
import java.io.PrintWriter
val randomBits = "cat randomInput.bin".!!
val out = new PrintWriter(outputFile)
out.println(randomBits)
if (out != null) {
  out.close()
  Seq("chmod", "600", outputFile).!
}
The file saved using method B) is still binary; however, it is approximately 2x larger than the file saved by method A). Further analysis shows that the stream of random bits is significantly less random.
Summary
I suspect that saving method B) adds something to almost every byte; however, understanding this is beyond my expertise in Java/Scala I/O.
I would very much appreciate it if somebody explained to me the low-level difference between methods A) and B). The goal is to revert the changes made by saving method B) and recover the original stream of random bytes.
Thank you very much in advance!
The problem is probably that println is meant for text: the bytes are first decoded into a String and then encoded again when written, using a character encoding that needs more than one byte for some or all characters, depending on which encoding is in effect.
If the file is exactly 2x larger than it should be, then you've probably got a null byte every other byte, which could be easy to fix. Otherwise, it may be harder to figure out what you would need to do to recover the binary data. Viewing the corrupted file in a hex editor may help you see what happened. Either way, I think it may be easier to just generate new random data and save it correctly.
Especially if this is for an experiment, if your random data has been corrupted and then fixed, it may be harder to justify that the data is truly random compared to just generating it properly in the first place.
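To illustrate the kind of damage described above, here is a minimal Java sketch of a bytes-to-String-to-bytes round trip. It assumes UTF-8 as the text encoding, which may or may not be the platform default on the machine in the question; the class and method names are made up for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringRoundTrip {
    // Decode bytes as text, then encode that text again with the same charset.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // 0xFF and a lone 0x80 are not valid UTF-8 sequences.
        byte[] original = {0x41, (byte) 0xFF, (byte) 0x80};

        byte[] utf8 = roundTrip(original, StandardCharsets.UTF_8);
        byte[] latin1 = roundTrip(original, StandardCharsets.ISO_8859_1);

        System.out.println(utf8.length);   // 7: each invalid byte became EF BF BD
        System.out.println(latin1.length); // 3: ISO-8859-1 maps every byte to one char
    }
}
```

In this sketch both invalid bytes turn into the same replacement character, so under UTF-8 the original values cannot be told apart afterwards. Whether the real file can be repaired depends on which encoding was actually used.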

Producing checksum in Java

I am implementing code to produce a checksum from a string. I would just like to know the following:
Why is the checksum produced directly from a string different from the checksum produced from a file containing the same string, when the string was manually copied into the file using ctrl+c?
Edit: I'm not asking for the implementation. I'm asking those who may have encountered this why the two checksums differ.
Another example: why is the checksum produced from a file created by code different from the checksum produced from a file created manually, where the string was copy-pasted?
When I compared the two strings using a tool like WinMerge, it showed them as identical.
Any enlightening answers are appreciated.
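No answer is recorded for this question here. As a hedged illustration only: a checksum is computed over bytes, not characters, so text that looks identical can still differ in its encoding, its line endings, or a byte order mark. The Java sketch below uses CRC32 as a stand-in, since the question does not say which checksum is used; the sample string and the class name are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumDemo {
    // CRC32 over a byte array.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        String text = "hello";

        // Same visible text, three different byte sequences.
        long plain = crc32(text.getBytes(StandardCharsets.UTF_8));
        long withNewline = crc32((text + "\r\n").getBytes(StandardCharsets.UTF_8));
        long utf16 = crc32(text.getBytes(StandardCharsets.UTF_16)); // includes a byte order mark

        System.out.println(Long.toHexString(plain));
        System.out.println(Long.toHexString(withNewline));
        System.out.println(Long.toHexString(utf16));
    }
}
```

Comparing the two files in a hex editor, rather than a text diff tool, would show whether one of these differences is present.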

"FF FF" is getting dumped as "FD" on some of the computers with Java Scripting API

I am facing issues with the Java Scripting API together with JavaScript on some of the PCs. After analyzing the dumped file, I noticed that "FF FF" is getting printed as "FD" on some of the PCs. Below is the code snippet:
var outputfile = new RandomAccessFile(f, "rw");
var byte_data_array = getMyByteArrayData(somebytearray);
var data_string = new java.lang.String(byte_data_array);
outputfile.writeBytes(data_string);
You're converting the data from bytes to a String without specifying an encoding (so the locale-dependent platform default encoding is used), then writing it to a file with the writeBytes() method, which the API doc describes as discarding the high-order byte of each character.
What did you expect? I'm actually surprised the result has any resemblance at all to the original data.
What you most likely should do is replace the last two lines with this:
outputfile.write(byte_data_array);
And always remember: bytes are for data, Strings are for text, and if you convert between them, you always need to pay attention to what encoding is used.
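The difference can be reproduced in plain Java. This is a sketch under assumptions: UTF-8 is passed explicitly to stand in for whatever the platform default was on the affected PCs, temporary files replace the real output file, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class WriteBytesDemo {
    // Mirrors the question: bytes -> String -> writeBytes (keeps only the low byte of each char).
    static byte[] viaString(byte[] data) throws IOException {
        File f = File.createTempFile("viaString", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.writeBytes(new String(data, StandardCharsets.UTF_8));
        }
        return Files.readAllBytes(f.toPath());
    }

    // Mirrors the suggested fix: write the byte array unchanged.
    static byte[] direct(byte[] data) throws IOException {
        File f = File.createTempFile("direct", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.write(data);
        }
        return Files.readAllBytes(f.toPath());
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xFF, (byte) 0xFF};
        // Under UTF-8, 0xFF decodes to U+FFFD, whose low byte is 0xFD.
        System.out.printf("%02X%n", viaString(data)[0] & 0xFF);
        System.out.printf("%02X%n", direct(data)[0] & 0xFF);
    }
}
```

With a single-byte default encoding such as ISO-8859-1 the String detour would happen to preserve the data, which may be why only some PCs show the problem.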

ReadLong and WriteLong methods in Java.Random.IO

I am trying to write a long using Java file I/O as follows:
fstreamOut = new FileOutputStream(new File("C:\\Basmah","dataOutput.7"),true);
DataOutputStream out=new DataOutputStream(fstreamOut);
Long p= Long.parseLong(longNumberInString ); // Number of digits for this long key are 7-15
out.writeLong(p);
The problem is that when I write a 7-15 digit number using writeLong, it writes 8 bytes to the file.
Then I am trying to read the same record into my program and decode it
Long l=in.readLong();
but I don't get the same number that I wrote; instead I get an EOF exception.
A long is 64 bits long. That makes 8 bytes. The DataOutputStream's writeLong method writes the binary representation of the long, not the textual one.
Without knowing the code used to read the long value, it's impossible to tell why it doesn't work.
The code given in your example and comment should work. The fact that it doesn't suggests that something else is going on here:
Maybe the writing and reading is happening on different files.
Maybe the file being written is not flushed / closed before you attempt to read it.
Maybe something else is overwriting the file.
Maybe the snippets of code you have provided are different enough to the real code to make a difference.
In the code that attempts to read the file, print what you get when you call f.length().
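For comparison, here is a minimal round trip that closes the output stream before reading and checks the file length. It is a sketch, not the asker's code: it uses a temporary file instead of C:\Basmah\dataOutput.7, overwrites rather than appends, and the class and method names are made up.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class LongRoundTrip {
    // Writes one long, closes the stream, then reads the long back from the same file.
    static long writeThenRead(File f, long value) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeLong(value);
        } // try-with-resources flushes and closes before the read below
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("longDemo", ".bin");
        f.deleteOnExit();
        long readBack = writeThenRead(f, 123456789012345L);
        System.out.println(f.length()); // a long always occupies 8 bytes
        System.out.println(readBack);
    }
}
```

If f.length() in the real program is 0 or not a multiple of 8 at the time of reading, that would point to one of the causes listed above.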

Magic number file checking

I'm attempting to read magic numbers/bytes to check the format of a file. Will reading a file byte by byte work in the same way on a Linux machine?
Edit: The following shows how to get the magic bytes from a class file using an int. I'm trying to do the same for a variable number of bytes.
http://www.rgagnon.com/javadetails/java-0544.html
I'm not sure that I understand what you are trying to do, but it sounds like it isn't the same thing as what the code you are linking to does.
The Java class format is specified to start with a magic number, so that code can only be used to verify whether a file might be a Java class or not. You can't use the same logic and apply it to arbitrary file formats.
Edit: .. or do you only want to check for wav files?
Edit2: DataInputStream always reads multi-byte values in big-endian order, regardless of platform, so you can use DataInputStream.readInt to read the first four bytes of the file and then compare the returned int with 0x52494646 (RIFF as a big-endian integer).
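For a variable number of magic bytes, one option is to read exactly as many bytes as the signature has and compare arrays. The sketch below is an illustration under assumptions: the helper name hasMagic is made up, and an in-memory stream stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MagicCheck {
    // Returns true if the stream starts with the given magic bytes.
    static boolean hasMagic(InputStream in, byte[] magic) throws IOException {
        byte[] head = new byte[magic.length];
        try {
            new DataInputStream(in).readFully(head); // reads exactly magic.length bytes
        } catch (EOFException e) {
            return false; // stream is shorter than the signature
        }
        return Arrays.equals(head, magic);
    }

    public static void main(String[] args) throws IOException {
        byte[] riff = {'R', 'I', 'F', 'F'};
        byte[] fakeWav = "RIFF....WAVE".getBytes(StandardCharsets.US_ASCII);

        System.out.println(hasMagic(new ByteArrayInputStream(fakeWav), riff));

        // The four-byte case can also be checked as a big-endian int.
        int first = new DataInputStream(new ByteArrayInputStream(fakeWav)).readInt();
        System.out.println(first == 0x52494646);
    }
}
```

Because the comparison is byte by byte, the result does not depend on the operating system the code runs on.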
