I have a fairly big (4 MB) text file with data for a monolingual dictionary. Because the explanations of the words are split across multiple lines, I can't read it line by line. On the other hand, I have "###" separators that I can use.
My question is: what is the most efficient way to load this text into a map in Java/Android?
Load the file into a single String and call split("###") on it. That gives you an array of strings split at your separator. 4 MB is fine to load into memory at once.
byte[] encoded = Files.readAllBytes(Paths.get(filePath));
String fileContents = new String(encoded, encoding);
String[] lines = fileContents.split("###");
Update: I'm not sure you can use that code to read a file on Android, since it needs Java SE 7. On Android you can use code like this:
FileInputStream fis = openFileInput(filePath);
StringBuilder fileContent = new StringBuilder();
byte[] buffer = new byte[1024];
int n;
while ((n = fis.read(buffer)) != -1)
{
    // Caveat: decoding chunk by chunk can split a multi-byte character across two reads
    fileContent.append(new String(buffer, 0, n));
}
fis.close();
String[] lines = fileContent.toString().split("###");
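Since the question asks for a map, here is a minimal sketch of the step after the split. It assumes that the first line of each "###"-separated entry is the headword and the remaining lines are its explanation; the question does not say this, so adjust the parsing to the real layout. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DictionaryLoader {
    // Assumption: each entry looks like "headword\nexplanation line 1\nexplanation line 2".
    static Map<String, String> parse(String fileContents) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String raw : fileContents.split("###")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // skip blank pieces around separators
            }
            int firstBreak = entry.indexOf('\n');
            String word = firstBreak < 0 ? entry : entry.substring(0, firstBreak).trim();
            String explanation = firstBreak < 0 ? "" : entry.substring(firstBreak + 1).trim();
            entries.put(word, explanation);
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("apple\na fruit\nround\n###\nbook\npages\n###");
        System.out.println(map.keySet());
    }
}
```

A LinkedHashMap keeps the entries in file order; a plain HashMap would do if the order does not matter.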
I need to read a Base64-encoded array of bytes from an InputStream.
I have to stop reading when I reach a \n character in the decoded string, but I cannot find an efficient way to do this.
So far I have ended up with something like this, but it does not work as intended: it throws an exception far too easily and messes everything up...
byte buffer[] = new byte[2048];
byte results[] = new byte[2048];
String totalResult = null;
try {
int bytes_read = is.read(buffer);
while (buffer[bytes_read - 1] != '\n') {
bytes_read = is.read(buffer);
String app = new String(buffer, 0, bytes_read);
totalResult += app;
}
String response = Base64.getDecoder().decode(totalResult).toString();
Any ideas? The InputStream does not close, so I need to read data from it in pieces separated by '\n'.
Rather than reinventing the wheel, consider using (for example) org.apache.commons.codec.binary.Base64InputStream from the Commons Codec project and a BufferedReader to wrap your InputStream like so:
try(BufferedReader reader = new BufferedReader(new InputStreamReader(new Base64InputStream(is)))) {
String response = reader.readLine();
...
}
Notes:
Try with resources will automatically close the reader when you're done.
The Base64InputStream will decode Base64-encoded characters on the fly.
BufferedReader.readLine() considers \n, \r or \r\n to be line separators for the purpose of determining the end of a line.
I am sure other libraries exist that will facilitate the on-the-fly decoding, or you could write a simple InputStreamWrapper yourself.
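One such alternative on Java 8 and later is the JDK's own java.util.Base64, whose decoder can wrap an InputStream. The sketch below assumes the decoded payload is UTF-8 text; the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64LineReader {
    // Decodes Base64 on the fly and returns the first decoded line, without its terminator.
    // Assumption: the decoded bytes are UTF-8 text.
    static String readFirstDecodedLine(InputStream encoded) throws IOException {
        InputStream decoded = Base64.getDecoder().wrap(encoded);
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(decoded, StandardCharsets.UTF_8));
        // Deliberately not closed: closing the reader would also close the underlying stream.
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = Base64.getEncoder()
                .encode("first line\nsecond line".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFirstDecodedLine(new ByteArrayInputStream(wire)));
    }
}
```

Because BufferedReader reads ahead, keep one reader per stream and call readLine() on it repeatedly if you need the following lines too; creating a new reader for each line would lose the buffered data.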
I have 2 Java classes. Let them be class A and class B.
Class A gets String input from the user and stores the input as bytes in the file; class B should then read the file and display the bytes as a String.
CLASS A:
File file = new File("C:\\FILE.txt");
file.createNewFile();
FileOutputStream fos = new FileOutputStream(file);
String fwrite = user_input_1+"\n"+user_input_2;
fos.write(fwrite.getBytes());
fos.flush();
fos.close();
In CLASS B, I wrote the code to read the file, but I don't know how to read the file content as bytes.
CLASS B:
fr = new FileReader(file);
br = new BufferedReader(fr);
arr = new ArrayList<String>();
int i = 0;
while((getF = br.readLine()) != null){
arr.add(getF);
}
String[] sarr = (String[]) arr.toArray(new String[0]);
The FILE.txt has the following lines
[B@3ce76a1
[B@36245605
I want both of these lines to be converted into their respective string values and then displayed. How can I do it?
Do you have to save the data as the String representation of a byte[]? Take a look at object serialization (see the Object Serialization Tutorial); with it you don't have to worry about any low-level line-by-line read or write methods.
Since you are writing a byte array through the FileOutputStream, the opposite operation would be to read the file using the FileInputStream, and construct the String from the byte array:
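File file = new File("C:\\FILE.txt");
byte[] bytes = new byte[(int) file.length()];
try (DataInputStream dis = new DataInputStream(new FileInputStream(file))) {
    // readFully keeps reading until the array is full; a single read() may stop early
    dis.readFully(bytes);
}
// Uses the platform default charset, matching the getBytes() call on the writing side
String result = new String(bytes);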
However, there are better ways of writing a String to a file.
You could write it using a FileWriter and read it using a FileReader (possibly wrapping them in the corresponding BufferedWriter/BufferedReader), which avoids creating an intermediate byte array. Or, better yet, use Apache Commons' IOUtils or Google's Guava libraries.
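If Java 7's java.nio.file is available, the same round trip can also be done without any third-party library. This is only a sketch: the class and method names are made up, and UTF-8 is an assumed choice of encoding that must be the same on both the writing and the reading side.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class TextRoundTrip {
    // Writes each string as one line of text, encoded as UTF-8.
    static void writeLines(Path file, String... lines) throws IOException {
        Files.write(file, Arrays.asList(lines), StandardCharsets.UTF_8);
    }

    // Reads the file back as a list of lines, decoding with the same charset.
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("FILE", ".txt");
        writeLines(file, "first input", "second input");
        System.out.println(readLines(file));
        Files.delete(file);
    }
}
```

Writing the strings themselves, rather than the result of calling toString() on a byte[], is what makes the content recoverable: a line such as [B@3ce76a1 is only the array's type and hash code, and the original text cannot be rebuilt from it.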
So, I am developing an Android application that reads a JSON text file containing some data. I have a 300 KB (307,312 bytes) JSON document in a text file. I am also developing a desktop application (C++) to generate, load, and parse the JSON text file.
When I open and read it using ifstream in C++, I get the correct string length (307,312). I even parse it successfully.
Here is my code in C++:
std::string json = "";
std::string line;
std::ifstream myfile("textfile.txt");
if(myfile.is_open()){
while(std::getline(myfile, line)){
json += line;
json.push_back('\n');
}
json.pop_back(); // pop back the last '\n'
myfile.close();
}else{
std::cout << "Unable to open file";
}
In my Android application, I put the JSON text file in the res/raw folder. When I open and read it using InputStream, the length of the string is only 291,896, and I can't parse it (I parse it through JNI with the same C++ code, though that may not matter).
InputStream is = getResources().openRawResource(R.raw.textfile);
byte[] b = new byte[is.available()];
is.read(b);
in_str = new String(b);
UPDATE:
I have also tried it this way:
InputStream is = getResources().openRawResource(R.raw.textfile);
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String line = reader.readLine();
while(line != null){
in_str += line;
in_str += '\n';
line = reader.readLine();
}
if (in_str != null && in_str.length() > 0) {
in_str = in_str.substring(0, in_str.length()-1);
}
I even tried moving it from the res/raw folder to the assets folder of the Android project, changing the InputStream line to InputStream is = getAssets().open("textfile.txt"). It still does not work.
Okay, I found the solution. It is an ASCII versus UTF-8 problem.
From here:
UTF-8 Variable length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
ASCII Single byte encoding
My file is 307,312 bytes, and I basically need one character per byte, so I need to decode the file as ASCII.
When I use C++ ifstream, the string size is 307,312 (the same as the number of characters under ASCII encoding).
Meanwhile, when I use Java InputStream, the string size is 291,896. I assume this happens because the reader decodes the bytes as UTF-8 instead, which collapses each multi-byte sequence into fewer characters.
So, how do I get ASCII decoding in Java?
Following this thread and this article, we can use InputStreamReader in Java and set it to ASCII. Here is my complete code:
String in_str = "";
try{
InputStream is = getResources().openRawResource(R.raw.textfile);
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "ASCII"));
String line = reader.readLine();
while(line != null){
in_str += line;
in_str += '\n';
line = reader.readLine();
}
if (in_str != null && in_str.length() > 0) {
in_str = in_str.substring(0, in_str.length()-1);
}
}catch(Exception e){
e.printStackTrace();
}
If you have the same problem, hope this helps. Cheers.
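To see where the length difference comes from, here is a small sketch that decodes the same bytes with different charsets and compares the resulting string lengths. The class and method names are made up, and the sample text is an arbitrary choice. One caveat: decoding with US-ASCII replaces every non-ASCII byte with the replacement character U+FFFD, so the length matches the byte count but the original non-ASCII characters are lost. ISO-8859-1 also gives one character per byte and keeps each byte value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetLengthDemo {
    // Number of Java chars produced when the bytes are decoded with the given charset.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // "cafe" with an accented e: the accented letter takes two bytes in UTF-8
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes:      " + bytes.length);
        System.out.println("UTF-8:      " + decodedLength(bytes, StandardCharsets.UTF_8));
        System.out.println("US-ASCII:   " + decodedLength(bytes, StandardCharsets.US_ASCII));
        System.out.println("ISO-8859-1: " + decodedLength(bytes, StandardCharsets.ISO_8859_1));
    }
}
```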
I have lots of PDF files whose content I need to encode as Base64. I have an Akka app that fetches the files as streams and distributes them to many workers, which encode the files and return the Base64 string for each file. I have a basic solution for the encoding:
import org.apache.commons.codec.binary.Base64InputStream;
...
Base64InputStream b64IStream = null;
InputStreamReader reader = null;
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
try {
b64IStream = new Base64InputStream(input, true);
reader = new InputStreamReader(b64IStream);
br = new BufferedReader(reader);
String line;
while ((line = br.readLine()) != null) {
sb.append(line);
}
} finally {
if (b64IStream != null) {
b64IStream.close();
}
if (reader != null) {
reader.close();
}
if (br != null) {
br.close();
}
}
It works, but I would like to know what would be the best way that I can encode the files using a buffer and if there is a faster alternative for this.
I tested some other approaches such as:
Base64.getEncoder
sun.misc.BASE64Encoder
Base64.encodeBase64
javax.xml.bind.DatatypeConverter.printBase64Binary
com.google.common.io.BaseEncoding.base64
They are faster but they need the entire file, correct? Also, I do not want to block other threads while encoding 1 PDF file.
Any input is really helpful. Thank you!
Fun fact about Base64: It takes three bytes, and converts them into four letters. This means that if you read binary data in chunks that are divisible by three, you can feed the chunks to any Base64 encoder, and it will encode it in the same way as if you fed it the entire file.
Now, if you want your output stream to just be one long line of Base64 data - which is perfectly legal - then all you need to do is something along the lines of:
private static final int BUFFER_SIZE = 3 * 1024;
try ( BufferedInputStream in = new BufferedInputStream(input, BUFFER_SIZE); ) {
    Base64.Encoder encoder = Base64.getEncoder();
    StringBuilder result = new StringBuilder();
    byte[] chunk = new byte[BUFFER_SIZE];
    int len;
    do {
        // Fill the chunk completely: a single read() may return fewer bytes before end of stream
        len = 0;
        int n;
        while ( len < BUFFER_SIZE && (n = in.read(chunk, len, BUFFER_SIZE - len)) != -1 ) {
            len += n;
        }
        if ( len > 0 ) {
            // Only the last chunk can be shorter than BUFFER_SIZE
            result.append( encoder.encodeToString(Arrays.copyOf(chunk, len)) );
        }
    } while ( len == BUFFER_SIZE );
}
This means that only the last chunk may have a length that is not divisible by three and will therefore contain the padding characters.
The above example is with Java 8 Base64, but you can really use any encoder that takes a byte array of an arbitrary length and returns the base64 string of that byte array.
This means that you can play around with the buffer size as you wish.
If you want your output to be MIME compatible, however, you need to have the output separated into lines. In this case, I would set the chunk size in the above example to something that, when multiplied by 4/3, gives you a round number of lines. For example, if you want to have 64 characters per line, each line encodes 64 / 4 * 3, which is 48 bytes. If you encode 48 bytes, you'll get one line. If you encode 480 bytes, you'll get 10 full lines.
So modify the above BUFFER_SIZE to something like 4800. Instead of Base64.getEncoder() use Base64.getMimeEncoder(64,new byte[] { 13, 10}). And then, when it encodes, you'll get 100 full-sized lines from each chunk except the last. You may need to add a result.append("\r\n") to the while loop.
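The MIME variant described above could be sketched as follows. The class and method names are made up for illustration. Since the MIME encoder does not put a line separator at the end of its output, this sketch adds "\r\n" between chunks rather than after each one, so the result does not end with a stray line break.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

final class MimeChunkedBase64 {
    private static final int LINE_LENGTH = 64;  // characters per output line
    private static final int CHUNK_SIZE = 4800; // 100 lines of 48 input bytes each
    private static final byte[] CRLF = {13, 10};

    // Reads until the chunk is full or the stream ends; returns the number of bytes read.
    private static int fill(InputStream in, byte[] chunk) throws IOException {
        int len = 0;
        int n;
        while (len < chunk.length && (n = in.read(chunk, len, chunk.length - len)) != -1) {
            len += n;
        }
        return len;
    }

    static String encode(InputStream in) throws IOException {
        Base64.Encoder encoder = Base64.getMimeEncoder(LINE_LENGTH, CRLF);
        StringBuilder result = new StringBuilder();
        byte[] chunk = new byte[CHUNK_SIZE];
        int len;
        do {
            len = fill(in, chunk);
            if (len > 0) {
                if (result.length() > 0) {
                    result.append("\r\n"); // separator between the lines of two chunks
                }
                byte[] data = (len == CHUNK_SIZE) ? chunk : Arrays.copyOf(chunk, len);
                result.append(encoder.encodeToString(data));
            }
        } while (len == CHUNK_SIZE);
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10000];
        String chunked = encode(new ByteArrayInputStream(data));
        String whole = Base64.getMimeEncoder(LINE_LENGTH, CRLF).encodeToString(data);
        System.out.println(chunked.equals(whole));
    }
}
```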
I am trying to use a FileInputStream to essentially read in a text file, and then output it in a different text file. However, I always get very strange characters when I do this. I'm sure it's some simple mistake I'm making, thanks for any help or pointing me in the right direction. Here's what I've got so far.
File sendFile = new File(fileName);
FileInputStream fileIn = new FileInputStream(sendFile);
byte buf[] = new byte[1024];
while(fileIn.read(buf) > 0) {
System.out.println(buf);
}
The file it is reading from is just a big text file of regular ASCII characters. Whenever I call System.out.println, however, I get the output [B@a422ede. Any ideas on how to make this work? Thanks
This happens because you are printing a byte array object itself, rather than printing its content. You should construct a String from the buffer and a length, and print that String instead. The constructor to use for this is
String s = new String(buf, 0, len, charsetName);
Above, len should be the value returned by the call of the read() method. The charsetName should represent the encoding used by the underlying file.
If you're reading from a file to another file, you shouldn't convert the bytes to a string at all, just write the bytes read into the other file.
If your intention is to convert a text file from an encoding to another, read from a new InputStreamReader(in, sourceEncoding), and write to a new OutputStreamWriter(out, targetEncoding).
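A minimal sketch of that encoding conversion, working on arbitrary streams. The class and method names are made up for illustration, and the caller is assumed to open and close the streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcoder {
    // Decodes the input with the source charset and re-encodes it with the target charset.
    static void transcode(InputStream in, Charset source, OutputStream out, Charset target)
            throws IOException {
        Reader reader = new InputStreamReader(in, source);
        Writer writer = new OutputStreamWriter(out, target);
        char[] buf = new char[1024];
        int len;
        while ((len = reader.read(buf)) != -1) {
            writer.write(buf, 0, len); // write only the chars actually read
        }
        writer.flush(); // push buffered chars through to the output stream
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1,
                out, StandardCharsets.UTF_8);
        System.out.println(out.size());
    }
}
```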
That's because printing buf prints a reference to the byte array, not the bytes themselves as a String as you might expect. You need new String(buf, 0, len), where len is the count returned by read(), to turn the bytes that were read into a String.
Also consider using BufferedReader rather than creating your own buffer. With it you can just do
String line = new BufferedReader(new FileReader("filename.txt")).readLine();
Your loop should look like this:
int len;
while((len = fileIn.read(buf)) > 0) {
System.out.write(buf, 0, len);
}
You are (a) using the wrong method and (b) ignoring the length returned by read(), other than checking it for < 0. So you are printing junk at the end of each buffer.
An object's default toString() method returns the class name and the object's hash code, not its contents.
byte buf[] is an object, so that is what println prints.
You can print the contents like this:
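File sendFile = new File(fileName);
FileInputStream fileIn = new FileInputStream(sendFile);
byte buf[] = new byte[1024];
int len;
while((len = fileIn.read(buf)) > 0) {
    // Print only the bytes that were actually read, as numeric values
    System.out.println(Arrays.toString(Arrays.copyOf(buf, len)));
}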
or
File sendFile = new File(fileName);
FileInputStream fileIn = new FileInputStream(sendFile);
byte buf[] = new byte[1024];
int len=0;
while((len=fileIn.read(buf)) > 0) {
    for(int i=0;i<len;i++){
        // The cast prints the ASCII character rather than the byte's numeric value
        System.out.print((char) buf[i]);
    }
    System.out.println();
}