I'm trying to read a resource (asdf.txt), but if the file is bigger than 5000 bytes, a run of null characters (for example, 4700 of them) is appended to the end of the content variable. Is there any way to remove them, or to set the right size of the buffer?
Here is the code:
String content = "";
try {
    InputStream in = this.getClass().getResourceAsStream("asdf.txt");
    byte[] buffer = new byte[5000];
    while (in.read(buffer) != -1) {
        content += new String(buffer);
    }
} catch (Exception e) {
    e.printStackTrace();
}
The simplest way is to do the correct thing: Use a Reader to read text data:
public String readFromFile(String filename, String enc) throws Exception {
    Reader in = new InputStreamReader(this.getClass().getResourceAsStream(filename), enc);
    try {
        StringBuffer temp = new StringBuffer(1024);
        char[] buffer = new char[1024];
        int read;
        // append only the characters actually read in this pass
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            temp.append(buffer, 0, read);
        }
        return temp.toString();
    } finally {
        in.close();
    }
}
Note that you definitely should define the encoding of the text file you want to read.
And note that both your code and this example code work equally well on Java SE and J2ME.
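If you'd rather keep a byte-based loop, the minimal fix for the question's code is to honour the count returned by read() and convert the bytes to a String only once, at the end. A sketch (the UTF-8 encoding is an assumption; use whatever the resource actually contains):
InputStream in = this.getClass().getResourceAsStream("asdf.txt");
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[5000];
int read;
while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read); // append only the bytes actually read
}
String content = new String(out.toByteArray(), "UTF-8");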
I have searched a lot for converting bytes to a String, but my problem is a little different, so please read ahead.
Currently I have a gzip file which I can decompress using the code from http://www.mkyong.com/java/how-to-decompress-file-from-gzip-file/.
That code stores the decompressed output in a file, but how do I store it in a variable? I am using this code currently:
public String unGunzipFile(String compressedFile, String decompressedFile) {
    byte[] buffer = new byte[1024];
    try {
        FileInputStream fileIn = new FileInputStream(compressedFile);
        GZIPInputStream gZIPInputStream = new GZIPInputStream(fileIn);
        FileOutputStream fileOutputStream = new FileOutputStream(decompressedFile);
        StringBuffer str = new StringBuffer();
        int bytes_read;
        while ((bytes_read = gZIPInputStream.read(buffer)) > 0) {
            String s = new String(buffer);
            str.append(s);
            fileOutputStream.write(buffer, 0, bytes_read);
        }
        gZIPInputStream.close();
        fileOutputStream.close();
        System.out.println("The file was decompressed successfully!");
        System.out.println(str);
        String final_string = str.toString();
        return final_string;
    } catch (IOException ex) {
        ex.printStackTrace();
        return null;
    }
}
Since I am converting bytes to a String, near the end, when bytes_read is less than 1024, I end up getting some weird data in my StringBuffer. The file has no such data, since fileOutputStream.write(buffer, 0, bytes_read); limits the write to the bytes actually read.
How do I fix this?
Thanks in advance.
Use the String(byte[] bytes, int offset, int length) constructor that lets you specify the length to be converted, i.e.
String s = new String(buffer, 0, bytes_read);
Instead of using String s = new String(buffer), I suggest using
public String(byte[] bytes, int offset, int length)
which might help you.
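Applied to the loop from the question, the fix would look roughly like this (a sketch; note that converting arbitrary byte chunks to a String can still split multi-byte characters across reads, so for non-ASCII text it is safer to wrap the GZIPInputStream in an InputStreamReader):
int bytes_read;
while ((bytes_read = gZIPInputStream.read(buffer)) > 0) {
    str.append(new String(buffer, 0, bytes_read)); // convert only the bytes just read
    fileOutputStream.write(buffer, 0, bytes_read);
}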
I have the following code:
BlobDomain blobDomain = null;
OutputStream out = null;
try {
    blobDomain = new BlobDomain();
    out = blobDomain.getBinaryOutputStream();
    byte[] buffer = new byte[8192];
    int bytesRead = 0;
    while ((bytesRead = in.read(buffer, 0, 8192)) != -1) {
        out.write(buffer, 0, bytesRead);
        String line = (new String(buffer));
        fullText += line;
    }
} catch (Exception e) {
    // do nothing
} finally {
    if (out != null) {
        try {
            out.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
When I print the fullText, what I see for larger files is that the end part of the text is added to fullText again, so the full text has some lines repeated at the end. Any suggestions on what is wrong here?
The reason you are getting this is that you are appending the entire buffer to your String every time. When you reach the end of the file, you may not have read exactly the number of bytes your buffer is sized at, but the old data is still in the buffer and is appended to your String as well.
One option to solve this may be to write your data to a String first and then to write your String to the output stream. This should also be faster than adding to a String after each read.
Save inputStream to String:
java.util.Scanner s = new java.util.Scanner(in).useDelimiter("\\A");
fullText = s.hasNext() ? s.next() : "";
Write String to output stream:
out.write(fullText.getBytes());
If you want to keep your code as-is, construct the String from only the bytes that were actually read. For example:
String line = new String(buffer, 0, bytesRead);
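With that change, and a StringBuilder instead of String concatenation (which re-copies fullText on every iteration), the loop from the question becomes, as a sketch:
StringBuilder fullText = new StringBuilder();
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = in.read(buffer, 0, 8192)) != -1) {
    out.write(buffer, 0, bytesRead);
    fullText.append(new String(buffer, 0, bytesRead)); // no stale bytes from earlier reads
}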
I have the following code, which reads in files as ISO-8859-1, as that's what is required in this application:
private static String readFile(String filename) throws IOException {
    String lineSep = System.getProperty("line.separator");
    File f = new File(filename);
    StringBuffer sb = new StringBuffer();
    if (f.exists()) {
        BufferedReader br =
            new BufferedReader(
                new InputStreamReader(
                    new FileInputStream(filename), "ISO-8859-1"));
        String nextLine = "";
        while ((nextLine = br.readLine()) != null) {
            sb.append(nextLine + " ");
            // note: BufferedReader strips the EOL character.
            // sb.append(lineSep);
        }
        br.close();
    }
    return sb.toString();
}
The problem is that it is pretty slow. I have this function, which is MUCH faster, but I cannot seem to find where to specify the character encoding:
private static String fastStreamCopy(String filename) {
    String s = "";
    FileChannel fc = null;
    try {
        fc = new FileInputStream(filename).getChannel();
        MappedByteBuffer byteBuffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        int size = byteBuffer.capacity();
        if (size > 0) {
            byteBuffer.clear();
            byte[] bytes = new byte[size];
            byteBuffer.get(bytes, 0, bytes.length);
            s = new String(bytes);
        }
        fc.close();
    } catch (FileNotFoundException fnfx) {
        System.out.println("File not found: " + fnfx);
    } catch (IOException iox) {
        System.out.println("I/O problems: " + iox);
    } finally {
        if (fc != null) {
            try {
                fc.close();
            } catch (IOException ignore) {
            }
        }
    }
    return s;
}
Anyone have an idea of where I should be putting the ISO encoding?
From the code you posted, you're not trying to "copy" the stream, but read it into a string.
You can simply provide the encoding in the String constructor:
s = new String(bytes, "ISO-8859-1");
Personally I'd just replace the whole method with a call to the Guava method Files.toString():
String content = Files.toString(new File(filename), StandardCharsets.ISO_8859_1);
If you're using Java 6 or earlier, you'll need to use the Guava field Charsets.ISO_8859_1 instead of StandardCharsets.ISO_8859_1 (which was only introduced in Java 7).
However your use of the term "copy" suggests that you want to write the result to some other file (or stream). If that is true, then you don't need to care about the encoding at all, since you can just handle the byte[] directly and avoid the (unnecessary) conversion to and from String.
You specify the encoding wherever you convert bytes to a String, e.g. s = new String(bytes, encoding); or vice versa.
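In the memory-mapped version from the question you can even skip the intermediate byte[] and decode the mapped buffer directly. A sketch of the relevant lines (ISO-8859-1 maps each byte to exactly one char, so this stays cheap):
MappedByteBuffer byteBuffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
s = Charset.forName("ISO-8859-1").decode(byteBuffer).toString();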
I am writing a function that should detect the charset in use and then convert the input to UTF-8. I am using juniversalchardet, which is a Java port of Mozilla's universalchardet.
This is my code:
private List<List<String>> setProperEncoding(List<List<String>> input) {
    try {
        // Detect used charset
        UniversalDetector detector = new UniversalDetector(null);
        int position = 0;
        while ((position < input.size()) & (!detector.isDone())) {
            String row = null;
            for (String cell : input.get(position)) {
                row += cell;
            }
            byte[] bytes = row.getBytes();
            detector.handleData(bytes, 0, bytes.length);
            position++;
        }
        detector.dataEnd();
        Charset charset = Charset.forName(detector.getDetectedCharset());
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println("Detected charset: " + charset);
        // rewrite input using proper charset
        List<List<String>> newLines = new ArrayList<List<String>>();
        for (List<String> row : input) {
            List<String> newRow = new ArrayList<String>();
            for (String cell : row) {
                //newRow.add(new String(cell.getBytes(charset)));
                ByteBuffer bb = ByteBuffer.wrap(cell.getBytes(charset));
                CharBuffer cb = charset.decode(bb);
                bb = utf8.encode(cb);
                newRow.add(new String(bb.array()));
            }
            newLines.add(newRow);
        }
        return newLines;
    } catch (Exception e) {
        e.printStackTrace();
        return input;
    }
}
My problem is that when I read a file with characters from, for example, the Polish alphabet, letters like ł, ą, ć and similar are replaced by ? and other strange things. What am I doing wrong?
EDIT:
For compilation I am using Eclipse.
The method parameter is the result of reading a MultipartFile: I just use a FileInputStream to get every line and then split every line by some separator (it is prepared for xls, xlsx and csv files). Nothing special there.
First of all, you have your data somewhere in a binary format. For the sake of simplicity, I suppose it comes from an InputStream.
You want to write the output as UTF-8; I suppose it can go to an OutputStream.
I would recommend creating an AutoDetectInputStream:
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import org.mozilla.universalchardet.UniversalDetector;

public class AutoDetectInputStream extends InputStream {
    private InputStream is;
    private byte[] sampleData = new byte[4096];
    private int sampleLen;
    private int sampleIndex = 0;

    public AutoDetectInputStream(InputStream is) throws IOException {
        this.is = is;
        // pre-read the data
        sampleLen = is.read(sampleData);
    }

    public Charset getCharset() {
        // detect the charset from the pre-read sample
        UniversalDetector detector = new UniversalDetector(null);
        detector.handleData(sampleData, 0, sampleLen);
        detector.dataEnd();
        return Charset.forName(detector.getDetectedCharset());
    }

    @Override
    public int read() throws IOException {
        // replay the sample first, then continue with the underlying stream
        if (sampleIndex < sampleLen) {
            return sampleData[sampleIndex++] & 0xFF; // bytes must be returned as 0..255
        }
        return is.read();
    }
}
The second task is quite simple: Java strings are stored internally as UTF-16 chars, independent of any file encoding, so to write UTF-8 you just use an OutputStreamWriter with the UTF-8 charset. So, here's your code:
// open input with the detector stream
// we use BufferedReader so we can read lines
InputStream is = new FileInputStream("in.txt");
AutoDetectInputStream detector = new AutoDetectInputStream(is);
Charset charset = detector.getCharset();
// here we can use the charset to decode the bytes into characters
BufferedReader rdr = new BufferedReader(new InputStreamReader(detector, charset));

// open output to write to
OutputStream os = new FileOutputStream("out.txt");
Writer utf8Writer = new OutputStreamWriter(os, Charset.forName("UTF-8"));

// copy the whole file
String line;
while ((line = rdr.readLine()) != null) {
    // readLine() strips the line break, so add it back
    utf8Writer.append(line).append('\n');
}

// close streams
rdr.close();
utf8Writer.flush();
utf8Writer.close();
So, finally, you have your whole txt file transcoded to UTF-8.
Note that the sample size (4096 bytes here) should be big enough to feed the UniversalDetector.
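If you only need the charset name and not a replayable stream, juniversalchardet can also be fed a byte sample directly. A minimal sketch (the sample parameter is assumed to hold the first few kilobytes of your input):
String detectCharset(byte[] sample) {
    UniversalDetector detector = new UniversalDetector(null);
    detector.handleData(sample, 0, sample.length);
    detector.dataEnd();
    return detector.getDetectedCharset(); // may be null if detection failed
}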
I'm attempting to output a text file to the console with Java. I was wondering what is the most efficient way of doing so?
I've researched several methods; however, it's difficult to discern which is the least performance-impacting solution.
Outputting a text file to the console would involve reading in each line in the file, then writing it to the console.
Is it better to use:
A BufferedReader with a FileReader, reading in lines and doing a bunch of System.out.println calls?
BufferedReader in = new BufferedReader(new FileReader("C:\\logs\\"));
while (in.readLine() != null) {
    System.out.println(blah blah blah);
}
in.close();
A Scanner reading each line in the file and doing System.out.print calls?
while (scanner.hasNextLine()) {
    System.out.println(blah blah blah);
}
Thanks.
If all you want to do is print the contents of a file (and don't want to print the next int/double/etc.) to the console then a BufferedReader is fine.
Your code as it is won't produce the result you're after, though. Try this instead:
BufferedReader in = new BufferedReader(new FileReader("C:\\logs\\log001.txt"));
String line = in.readLine();
while(line != null)
{
System.out.println(line);
line = in.readLine();
}
in.close();
I wouldn't get too hung up about it, though, because the main bottleneck is more likely to be the ability of your console to print the information that Java is sending it.
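For comparison, the Scanner variant from the question would look like this (a sketch; the file name is an assumption):
Scanner scanner = new Scanner(new File("C:\\logs\\log001.txt"));
while (scanner.hasNextLine()) {
    System.out.println(scanner.nextLine());
}
scanner.close();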
If you're not interested in the character-based data the text file contains, just stream it "raw" as bytes.
InputStream input = new BufferedInputStream(new FileInputStream("C:/logs.txt"));
byte[] buffer = new byte[8192];
try {
    for (int length = 0; (length = input.read(buffer)) != -1;) {
        System.out.write(buffer, 0, length);
    }
} finally {
    input.close();
}
This saves the cost of unnecessarily massaging between bytes and characters and also scanning and splitting on newlines and appending them once again.
As to the performance, you may find this article interesting. According to the article, a FileChannel with a 256K byte array, read through a wrapped ByteBuffer and written directly from the byte array, is the fastest way.
FileInputStream input = new FileInputStream("C:/logs.txt");
FileChannel channel = input.getChannel();
byte[] buffer = new byte[256 * 1024];
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
try {
for (int length = 0; (length = channel.read(byteBuffer)) != -1;) {
System.out.write(buffer, 0, length);
byteBuffer.clear();
}
} finally {
input.close();
}
If it's a relatively small file, a one-line Java 7+ way to do this is:
System.out.println(new String(Files.readAllBytes(Paths.get("logs.txt"))));
See https://docs.oracle.com/javase/7/docs/api/java/nio/file/package-summary.html for more details.
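If the file's encoding matters, the charset-explicit String constructor is safer (a sketch; UTF-8 here is an assumption):
System.out.println(new String(Files.readAllBytes(Paths.get("logs.txt")), StandardCharsets.UTF_8));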
Cheers!
If all you want is to dump the file contents to the console as efficiently as possible, with no processing in between, converting the data into characters and finding line breaks is unnecessary overhead. Instead, you can just read blocks of bytes from the file and write them straight out to System.out:
package toconsole;

import java.io.BufferedInputStream;
import java.io.FileInputStream;

public class Main {

    public static void main(String[] args) {
        BufferedInputStream bis = null;
        byte[] buffer = new byte[8192];
        int bytesRead = 0;
        try {
            bis = new BufferedInputStream(new FileInputStream(args[0]));
            while ((bytesRead = bis.read(buffer)) != -1) {
                System.out.write(buffer, /* start */ 0, /* length */ bytesRead);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (bis != null) {
                try { bis.close(); } catch (Exception e) { /* meh */ }
            }
        }
    }
}
In case you haven't come across this kind of idiom before, the statement in the while condition both assigns the result of bis.read to bytesRead and then compares it to -1. So we keep reading bytes into the buffer until we are told that we're at the end of the file. And we use bytesRead in System.out.write to make sure we write only the bytes we've just read, as we can't assume all files are a multiple of 8 kB long!
FileInputStream input = new FileInputStream("D:\\Java\\output.txt");
FileChannel channel = input.getChannel();
byte[] buffer = new byte[256 * 1024];
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
try {
for (int length = 0; (length = channel.read(byteBuffer)) != -1;) {
System.out.write(buffer, 0, length);
byteBuffer.clear();
}
} finally {
input.close();
}
Path temp = Files.move
(Paths.get("D:\\\\Java\\\\output.txt"),
Paths.get("E:\\find\\output.txt"));
if(temp != null)
{
System.out.println("File renamed and moved successfully");
}
else
{
System.out.println("Failed to move the file");
}
}
For Java 11 you can use a more convenient approach:
Files.copy(Path.of("file.txt"), System.out);
Or, for faster output:
var out = new BufferedOutputStream(System.out);
Files.copy(Path.of("file.txt"), out);
out.flush();
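Java 11 also added Files.readString, which is handy when you want the contents as a String rather than streaming them straight to the console (a sketch; it decodes as UTF-8 by default):
String text = Files.readString(Path.of("file.txt")); // UTF-8 by default
System.out.println(text);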