Different Number of Characters in Java Android InputStream and C++ ifstream - java

I am developing an Android application that reads a JSON text file containing some data. The file is about 300 KB (307,312 bytes) (here). I am also developing a desktop application in C++ that generates, loads, and parses the same JSON text file.
When I open and read it using ifstream in C++, I get the correct string length (307,312), and I can even parse it successfully.
Here is my code in C++:
// requires <fstream>, <iostream>, and <string>
std::string json;
std::string line;
std::ifstream myfile("textfile.txt"); // the file name must be a string literal
if (myfile.is_open()) {
    while (std::getline(myfile, line)) {
        json += line;
        json.push_back('\n');
    }
    if (!json.empty()) {
        json.pop_back(); // drop the '\n' appended after the last line
    }
    myfile.close();
} else {
    std::cout << "Unable to open file";
}
In my Android application, I put the JSON text file in the res/raw folder. When I open and read it using InputStream, the length of the string is only 291,896, and I can't parse it (I parse it through JNI with the same C++ code, though that may not be important).
InputStream is = getResources().openRawResource(R.raw.textfile);
byte[] b = new byte[is.available()];
is.read(b);
in_str = new String(b);
UPDATE:
I have also tried it this way:
InputStream is = getResources().openRawResource(R.raw.textfile);
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String line = reader.readLine();
while(line != null){
in_str += line;
in_str += '\n';
line = reader.readLine();
}
if (in_str != null && in_str.length() > 0) {
in_str = in_str.substring(0, in_str.length()-1);
}
I even tried moving the file from the res/raw folder to the assets folder of the Android project, changing the InputStream line to InputStream is = getAssets().open("textfile.txt"). It still does not work.

Okay, I found the solution. It is an ASCII versus UTF-8 problem.
From here:
UTF-8: variable-length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
ASCII: single-byte encoding.
My file size is 307,312 bytes, and I need one character per byte, so the file should be decoded as ASCII.
With C++ ifstream, the string size is 307,312 (the same as the number of characters under ASCII encoding).
With Java InputStream, the string size is 291,896. I assume this happens because the bytes are decoded as UTF-8 instead, where each multi-byte sequence becomes a single character.
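The gap between the two counts can be reproduced with a minimal Java sketch. The sample text and the class name are made up for illustration; the only point is that a UTF-8 decoder merges each multi-byte sequence into one char, while a single-byte charset yields one char per byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetLengthDemo {
    // Number of chars produced when the bytes are decoded with the given charset.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // Hypothetical sample: U+00E9 (e with acute accent) takes two bytes in UTF-8.
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                                       // 5 bytes
        System.out.println(decodedLength(bytes, StandardCharsets.UTF_8));      // 4 chars
        System.out.println(decodedLength(bytes, StandardCharsets.US_ASCII));   // 5 chars
        System.out.println(decodedLength(bytes, StandardCharsets.ISO_8859_1)); // 5 chars
    }
}
```

Note that decoding as ASCII replaces every non-ASCII byte with U+FFFD, so the char count matches the byte count but the non-ASCII text itself is not preserved.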
So, how do we get ASCII decoding in Java?
Through this thread and this article, we can use InputStreamReader in Java and set its charset to ASCII. Here is my complete code:
String in_str = "";
try{
InputStream is = getResources().openRawResource(R.raw.textfile);
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "ASCII"));
String line = reader.readLine();
while(line != null){
in_str += line;
in_str += '\n';
line = reader.readLine();
}
if (in_str != null && in_str.length() > 0) {
in_str = in_str.substring(0, in_str.length()-1);
}
}catch(Exception e){
e.printStackTrace();
}
If you have the same problem, hope this helps. Cheers.
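As a side note, the first attempt in the question relies on available() and a single read() call, neither of which is guaranteed to cover the whole stream. A minimal sketch of reading everything and decoding with an explicit charset might look like the following. The helper name readAll is made up, and the in-memory stream stands in for the one returned by openRawResource.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamText {
    // Reads the stream to its end and decodes the collected bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // keep only the bytes actually read
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "{\"key\": \"value\"}".getBytes(StandardCharsets.US_ASCII);
        String text = readAll(new ByteArrayInputStream(data), StandardCharsets.US_ASCII);
        System.out.println(text.length() == data.length);
    }
}
```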

Related

Java - read an array of bytes from an InputStream until a certain character

I need to read a Base64-encoded array of bytes from an InputStream.
I have to stop reading when I reach a \n character in the decoded string, and I cannot find an efficient way to do this.
So far I have ended up with something like this, but it does not work as intended: it runs into an exception far too easily and everything gets messed up...
byte buffer[] = new byte[2048];
byte results[] = new byte[2048];
String totalResult = null;
try {
int bytes_read = is.read(buffer);
while (buffer[bytes_read - 1] != '\n') {
bytes_read = is.read(buffer);
String app = new String(buffer, 0, bytes_read);
totalResult += app;
}
String response = Base64.getDecoder().decode(totalResult).toString();
Any ideas? The input stream does not close, so I need to read data from it separated by '\n'.
Rather than reinventing the wheel, consider using (for example) org.apache.commons.codec.binary.Base64InputStream from the Commons Codec project and a BufferedReader (JavaDoc) to wrap your InputStream like so:
try(BufferedReader reader = new BufferedReader(new InputStreamReader(new Base64InputStream(is)))) {
String response = reader.readLine();
...
}
Notes:
Try with resources will automatically close the reader when you're done.
The Base64InputStream will decode Base64-encoded characters on the fly.
BufferedReader.readLine() considers \n, \r, or \r\n to be line separators for the purpose of determining the end of a line.
I am sure other libraries exist that will facilitate the on-the-fly decoding, or you could write a simple InputStreamWrapper yourself.
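If adding Commons Codec is not an option, the JDK's own java.util.Base64 (Java 8 and later) offers a similar wrapper through Base64.Decoder.wrap. The following is a minimal sketch under that assumption; the class and method names are made up, and the in-memory stream stands in for the real one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64LineReader {
    // Decodes Base64 on the fly and returns the first line of the decoded text.
    // The reader is deliberately not closed, because the question's stream must stay open.
    static String readFirstDecodedLine(InputStream encoded) throws IOException {
        InputStream decoded = Base64.getDecoder().wrap(encoded);
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(decoded, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        String payload = Base64.getEncoder()
                .encodeToString("hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        InputStream in = new ByteArrayInputStream(payload.getBytes(StandardCharsets.US_ASCII));
        System.out.println(readFirstDecodedLine(in)); // prints "hello"
    }
}
```

BufferedReader may read ahead past the newline, so in real code the same reader should be kept and reused for the following lines.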

Java buffered base64 encoder for streams

I have lots of PDF files whose content I need to encode using Base64. I have an Akka app which fetches the files as streams and distributes them to many workers, which encode the files and return the Base64 string for each file. I have a basic solution for encoding:
import org.apache.commons.codec.binary.Base64InputStream;
...
Base64InputStream b64IStream = null;
InputStreamReader reader = null;
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
try {
b64IStream = new Base64InputStream(input, true);
reader = new InputStreamReader(b64IStream);
br = new BufferedReader(reader);
String line;
while ((line = br.readLine()) != null) {
sb.append(line);
}
} finally {
if (b64IStream != null) {
b64IStream.close();
}
if (reader != null) {
reader.close();
}
if (br != null) {
br.close();
}
}
It works, but I would like to know what would be the best way that I can encode the files using a buffer and if there is a faster alternative for this.
I tested some other approaches such as:
Base64.getEncoder
sun.misc.BASE64Encoder
Base64.encodeBase64
javax.xml.bind.DatatypeConverter.printBase64
com.google.common.io.BaseEncoding.base64 (Guava)
They are faster but they need the entire file, correct? Also, I do not want to block other threads while encoding 1 PDF file.
Any input is really helpful. Thank you!
Fun fact about Base64: It takes three bytes and converts them into four characters. This means that if you read binary data in chunks whose size is divisible by three, you can feed the chunks to any Base64 encoder, and it will encode them in the same way as if you fed it the entire file.
Now, if you want your output stream to just be one long line of Base64 data - which is perfectly legal - then all you need to do is something along the lines of:
final int BUFFER_SIZE = 3 * 1024; // a multiple of three
StringBuilder result = new StringBuilder();
try (BufferedInputStream in = new BufferedInputStream(input, BUFFER_SIZE)) {
    Base64.Encoder encoder = Base64.getEncoder();
    byte[] chunk = new byte[BUFFER_SIZE];
    int len;
    do {
        len = 0;
        int n;
        // fill the chunk completely unless the stream ends; a single read() may return fewer bytes
        while (len < BUFFER_SIZE && (n = in.read(chunk, len, BUFFER_SIZE - len)) != -1) {
            len += n;
        }
        if (len > 0) {
            // encode only the bytes actually read (matters for the last chunk)
            result.append(encoder.encodeToString(len == BUFFER_SIZE ? chunk : Arrays.copyOf(chunk, len)));
        }
    } while (len == BUFFER_SIZE);
}
This means that only the last chunk may have a length that is not divisible by three and will therefore contain the padding characters.
The above example is with Java 8 Base64, but you can really use any encoder that takes a byte array of an arbitrary length and returns the base64 string of that byte array.
This means that you can play around with the buffer size as you wish.
If you want your output to be MIME compatible, however, you need to have the output separated into lines. In this case, I would set the chunk size in the above example to something that, when multiplied by 4/3, gives you a round number of lines. For example, if you want to have 64 characters per line, each line encodes 64 / 4 * 3, which is 48 bytes. If you encode 48 bytes, you'll get one line. If you encode 480 bytes, you'll get 10 full lines.
So modify the above BUFFER_SIZE to something like 4800. Instead of Base64.getEncoder() use Base64.getMimeEncoder(64,new byte[] { 13, 10}). And then, when it encodes, you'll get 100 full-sized lines from each chunk except the last. You may need to add a result.append("\r\n") to the while loop.
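The MIME variant described above can be sketched roughly as follows. The class and method names are made up, and the fill helper is an assumption added so that every chunk except the last is exactly 4800 bytes. Because the encoder does not end its output with a line break, the separator is appended between chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

final class MimeChunkEncoder {
    private static final byte[] CRLF = {13, 10};
    // 4800 bytes encode to exactly 100 lines of 64 characters.
    private static final int CHUNK_SIZE = 4800;

    // Fills the buffer as far as possible; returns the number of bytes read (0 at end of stream).
    static int fill(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    static String encode(InputStream in) throws IOException {
        Base64.Encoder encoder = Base64.getMimeEncoder(64, CRLF);
        StringBuilder result = new StringBuilder();
        byte[] chunk = new byte[CHUNK_SIZE];
        int len;
        while ((len = fill(in, chunk)) > 0) {
            if (result.length() > 0) {
                result.append("\r\n"); // line break between the output of two chunks
            }
            result.append(encoder.encodeToString(Arrays.copyOf(chunk, len)));
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10000];
        String chunked = encode(new ByteArrayInputStream(data));
        String whole = Base64.getMimeEncoder(64, CRLF).encodeToString(data);
        System.out.println(chunked.equals(whole));
    }
}
```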

Android and Java - Unable to encode - encrypt - decode - decrypt string

My desktop application (written in Java) encrypts a file for an Android application.
A string portion from the entire file:
..."A kerékpár (vagy bicikli) egy emberi erővel hajtott kétkerekű jármű. 19. századi kifejlesztése"...
read from a file:
FileInputStream is = new FileInputStream("cycles.txt");
StringBuilder sb = new StringBuilder();
Charset inputCharset = Charset.forName("ISO-8859-1");
BufferedReader br = new BufferedReader(new InputStreamReader(is, inputCharset));
String read;
while((read=br.readLine()) != null) {
sb.append(read);
}
After reading the entire file, I encrypt it:
String text = sb.toString();
Encrypter er = new Encrypter();
byte[] bEncrypt = er.encrypt(text);
After the encryption I encode it to base64:
bEncrypt = Base64.getEncoder().encode(bEncrypt);
text = new String(bEncrypt, "ISO-8859-1");
After this, the file is saved on my PC's disk:
File file = new File(System.getProperty("user.dir") + "/files/encr_cycles.txt");
try {
PrintWriter pr = new PrintWriter(new FileWriter(file));
pr.print(text);
pr.close();
} catch (IOException e) {
e.printStackTrace();
}
From the Android application I read the file:
BufferedReader reader = new BufferedReader(new InputStreamReader(getAssets().open("files/encr_cycles.txt"), "ISO-8859-1"));
// Read line by line
String mLine;
StringBuilder sb = new StringBuilder();
while ((mLine = reader.readLine()) != null) {
sb.append(mLine);
}
Decode and decrypt:
byte[] dec = Base64.decode(encryptedText, Base64.DEFAULT);
byte[] data= new Decipher().doFinal(dec);
String text= new String(data, "ISO-8859-1");
And the given text is:
"A kerékpár (vagy bicikli) egy emberi er?vel hajtott kétkerek? járm?. 19. századi kifejlesztése"
Note the "?" in the string? Some of the characters aren't decoded correctly.
Q1: What did I do wrong?
Q2: Am I using a wrong charset?
I changed the charset to "UTF-8" all over the applications (desktop & mobile). The problem was with the root file. The file wasn't saved in "UTF-8".
What I did in Eclipse:
Open the root file (.txt) in Eclipse (drag and drop the file into the editor)
Insert your string or make some changes (blank characters) in the file (in my case, the string which cannot be encoded)
Press save (Ctrl + S), and a dialog will prompt you to SAVE AS UTF-8
Remove your line
Save again
The other solution is to save your file automatically while editing:
Window > Preferences > General > Content Types, set UTF-8 as the default encoding for all content types.
Window > Preferences > General > Workspace, set "Text file encoding" to "Other : UTF-8".
Source: How to support UTF-8 encoding in Eclipse
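Why only some characters turned into "?" can be illustrated with a minimal sketch. ISO-8859-1 covers only U+0000 to U+00FF, so é and á survive, while ő (U+0151) and ű (U+0171) cannot be encoded and are replaced. The class and method names below are made up, and the sample word is taken from the text in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    // Encodes the text to bytes and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "er\u0151vel" is the word from the question; the escape keeps this source ASCII-only.
        String word = "er\u0151vel";
        System.out.println(roundTrip(word, StandardCharsets.ISO_8859_1)); // prints "er?vel"
        System.out.println(roundTrip(word, StandardCharsets.UTF_8).equals(word)); // prints "true"
    }
}
```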

efficient text loading without line separators

I have a fairly big (4 MB) text file with data for a monolingual dictionary. Because the explanations of the words are split across multiple lines, I can't read it line by line. On the other hand, I have "###" separators which I can use.
My question is: what is the most efficient way to load this text into a map in Java/Android?
Load the file into a single String and call split("###") on it. This gives you an array of strings split at your separator. 4 MB is fine to load into memory at once.
byte[] encoded = Files.readAllBytes(Paths.get(filePath));
String fileContents = new String(encoded, encoding);
String[] lines = fileContents.split("###");
Update: I am not sure you can use that code to read a file on Android, since it targets Java SE 7. On Android you can use code like this:
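FileInputStream fis = openFileInput(filePath);
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int n;
while ((n = fis.read(buffer)) != -1) {
    // collect raw bytes first, so a multi-byte character split across two reads is not corrupted
    bytes.write(buffer, 0, n);
}
fis.close();
String fileContents = new String(bytes.toByteArray(), encoding);
String[] lines = fileContents.split("###");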
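Turning the split result into a map depends on the entry layout, which the question does not specify. The sketch below assumes, purely for illustration, that the first line of each entry is the headword and the remaining lines are the explanation; the class and method names are made up as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class DictionaryLoader {
    // Assumed layout: "word\nexplanation line 1\nexplanation line 2###next word\n..."
    static Map<String, String> parse(String fileContents) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String raw : fileContents.split(Pattern.quote("###"))) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // skip empty pieces, e.g. around a leading separator
            }
            int firstBreak = entry.indexOf('\n');
            String word = firstBreak < 0 ? entry : entry.substring(0, firstBreak).trim();
            String explanation = firstBreak < 0 ? "" : entry.substring(firstBreak + 1).trim();
            entries.put(word, explanation);
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("apple\na fruit\nthat grows on trees###pear\nanother fruit###");
        System.out.println(map.get("apple"));
    }
}
```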

Read String and bytes from the same file java

I'm looking for a way to switch between reading bytes (as byte[]) and reading lines of Strings from a file. I know that a byte[] can be obtained from a file through a FileInputStream, and a String can be obtained through a BufferedReader, but using both of them at the same time is proving problematic. I know how long the section of bytes is. String encoding can be kept constant from when I write the file. The file type is a custom one that is still in development, so I can change how I write data to it.
How can I read Strings and byte[]s from the same file in java?
Read as bytes. When you have read a sequence of bytes that you know should be a string, place those bytes in an array, put the array inside a ByteArrayInputStream and use that as the underlying InputStream for a Reader to get the bytes as characters, then read those characters to produce a String.
For the later parts of this process see the related SO question on how to create a String from an InputStream.
Read the file as Strings using a BufferedReader then use String.getBytes().
Why not try this:
BufferedReader bufferedReader = null;
try {
bufferedReader = new BufferedReader(new FileReader("testing.txt"));
String line = bufferedReader.readLine();
while(line != null){
byte[] b = line.getBytes();
line = bufferedReader.readLine(); // advance to the next line, otherwise the loop never ends
}
} finally {
if(bufferedReader!=null){
bufferedReader.close();
}
}
or
FileInputStream in = null;
BufferedReader bufferedReader = null;
try {
bufferedReader = new BufferedReader(new FileReader("xanadu.txt"));
String line = bufferedReader.readLine();
while(line != null){
//read your line
line = bufferedReader.readLine(); // advance to the next line, otherwise the loop never ends
}
in = new FileInputStream("xanadu.txt");
int c;
while ((c = in.read()) != -1) {
//read your bytes (c)
}
} finally {
if (in != null) {
in.close();
}
if(bufferedReader!=null){
bufferedReader.close();
}
}
Read everything as bytes from the buffered input stream, and convert the string sections into Strings using the constructor that accepts a byte array:
String string = new String(bytes, offset, length, "US-ASCII");
Depending on how the data are actually encoded, you may need to use "UTF-8" or something else as the name of the charset.
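Since the file format can still be changed, one option is to prefix every section with its length, so the reader always knows how many bytes belong to a string and how many are raw data. The layout below (an int length followed by the bytes) and all names are hypothetical; it is a sketch of the "read everything as bytes" approach, not a prescribed format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class MixedFileDemo {
    // Hypothetical record layout: int length + UTF-8 string bytes, then int length + raw bytes.
    static byte[] writeRecord(String text, byte[] payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] textBytes = text.getBytes(StandardCharsets.UTF_8);
            out.writeInt(textBytes.length);
            out.write(textBytes);
            out.writeInt(payload.length);
            out.write(payload);
        }
        return buffer.toByteArray();
    }

    // Reads a length-prefixed section and decodes it as a UTF-8 string.
    static String readString(DataInputStream in) throws IOException {
        return new String(readBytes(in), StandardCharsets.UTF_8);
    }

    // Reads a length-prefixed section as raw bytes.
    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes); // blocks until the whole section has been read
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = writeRecord("header", new byte[] {1, 2, 3});
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        System.out.println(readString(in));
        System.out.println(readBytes(in).length);
    }
}
```

For a real file, the ByteArrayInputStream would be replaced by a BufferedInputStream around a FileInputStream.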
