Guava equivalent for IOUtils.toString(InputStream) - java

Apache Commons IO has a nice convenience method IOUtils.toString() to read an InputStream to a String.
Since I am trying to move away from Apache Commons and to Guava: is there an equivalent in Guava? I looked at all classes in the com.google.common.io package and I couldn't find anything nearly as simple.
Edit: I understand and appreciate the issues with charsets. It just so happens that I know that all my sources are in ASCII (yes, ASCII, not ANSI etc.), so in this case, encoding is not an issue for me.

You stated in your comment on Calum's answer that you were going to use
CharStreams.toString(new InputStreamReader(supplier.get(), Charsets.UTF_8))
This code is problematic because the overload CharStreams.toString(Readable) states:
Does not close the Readable.
This means that your InputStreamReader, and by extension the InputStream returned by supplier.get(), will not be closed after this code completes.
If, on the other hand, you take advantage of the fact that you appear to already have an InputSupplier<InputStream> and used the overload CharStreams.toString(InputSupplier<R extends Readable & Closeable>), the toString method will handle both the creation and closing of the Reader for you.
This is exactly what Jon Skeet suggested, except that there isn't actually any overload of CharStreams.newReaderSupplier that takes an InputStream as input... you have to give it an InputSupplier:
InputSupplier<? extends InputStream> supplier = ...
InputSupplier<InputStreamReader> readerSupplier =
CharStreams.newReaderSupplier(supplier, Charsets.UTF_8);
// InputStream and Reader are both created and closed in this single call
String text = CharStreams.toString(readerSupplier);
The point of InputSupplier is to make your life easier by allowing Guava to handle the parts that require an ugly try-finally block to ensure that resources are closed properly.
Edit: Personally, I find the following (which is how I'd actually write it; I was just breaking down the steps in the code above)
String text = CharStreams.toString(
CharStreams.newReaderSupplier(supplier, Charsets.UTF_8));
to be far less verbose than this:
String text;
InputStreamReader reader = new InputStreamReader(supplier.get(),
Charsets.UTF_8);
boolean threw = true;
try {
text = CharStreams.toString(reader);
threw = false;
}
finally {
Closeables.close(reader, threw);
}
Which is more or less what you'd have to write to handle this properly yourself.
Edit: Feb. 2014
InputSupplier and OutputSupplier and the methods that use them have been deprecated in Guava 16.0. Their replacements are ByteSource, CharSource, ByteSink and CharSink. Given a ByteSource, you can now get its contents as a String like this:
ByteSource source = ...
String text = source.asCharSource(Charsets.UTF_8).read();

If you've got a Readable you can use CharStreams.toString(Readable). So you can probably do the following:
String string = CharStreams.toString( new InputStreamReader( inputStream, "UTF-8" ) );
This forces you to specify a character set, which I guess you should be doing anyway.

Nearly. You could use something like this:
InputSupplier<InputStreamReader> readerSupplier = CharStreams.newReaderSupplier
(streamSupplier, Charsets.UTF_8);
String text = CharStreams.toString(readerSupplier);
Personally I don't think that IOUtils.toString(InputStream) is "nice" - because it always uses the default encoding of the platform, which is almost never what you want. There's an overload which takes the name of the encoding, but using names isn't a great idea IMO. That's why I like Charsets.*.
EDIT: Note that the above needs an InputSupplier<InputStream> as the streamSupplier. If you've already got the stream you can implement that easily enough though:
InputSupplier<InputStream> supplier = new InputSupplier<InputStream>() {
@Override public InputStream getInput() {
return stream;
}
};

UPDATE: Looking back, I don't like my old solution. Besides, it is 2013 now and there are better alternatives available for Java 7. So here is what I use now:
InputStream fis = ...;
String text;
try ( InputStreamReader reader = new InputStreamReader(fis, Charsets.UTF_8)){
text = CharStreams.toString(reader);
}
or, with an InputSupplier:
InputSupplier<InputStreamReader> spl = ...
try ( InputStreamReader reader = spl.getInput()){
text = CharStreams.toString(reader);
}

Another option is to read bytes from Stream and create a String from them:
new String(ByteStreams.toByteArray(inputStream))
new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8)
It's not 'pure' Guava, but it's a little bit shorter.
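The two calls above differ only in the charset used to decode the bytes. Here is a minimal JDK-only sketch of why that matters. The class and method names are made up for illustration, and the byte array stands in for whatever ByteStreams.toByteArray(inputStream) would return:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decodes raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // The UTF-8 encoding of "caf\u00e9" (cafe with an acute accent):
    // the accented letter takes two bytes, so the array has five bytes.
    static byte[] sampleBytes() {
        return "caf\u00e9".getBytes(StandardCharsets.UTF_8);
    }
}
```

Decoding those five bytes as UTF-8 gives back the four-character string, while decoding them as ISO-8859-1 gives five characters. That is the kind of silent corruption the overload without a charset risks whenever the platform default differs from the data's encoding.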

Based on the accepted answer, here is a utility method that mimics the behavior of IOUtils.toString() (and an overloaded version with a charset, as well). This version should be safe, right?
public static String toString(final InputStream is) throws IOException{
return toString(is, Charsets.UTF_8);
}
public static String toString(final InputStream is, final Charset cs)
throws IOException{
Closeable closeMe = is;
try{
final InputStreamReader isr = new InputStreamReader(is, cs);
closeMe = isr;
return CharStreams.toString(isr);
} finally{
Closeables.closeQuietly(closeMe);
}
}

There is a much shorter auto-closing solution for the case where the input stream comes from a classpath resource:
URL resource = classLoader.getResource(path);
byte[] bytes = Resources.toByteArray(resource);
String text = Resources.toString(resource, StandardCharsets.UTF_8);
Uses Guava Resources, inspired by IOExplained.

EDIT (2015): Okio is the best abstraction and tools for I/O in Java/Android that I know of. I use it all the time.
FWIW here's what I use.
If I already have a stream in hand, then:
final InputStream stream; // this is received from somewhere
String s = CharStreams.toString(CharStreams.newReaderSupplier(new InputSupplier<InputStream>() {
public InputStream getInput() throws IOException {
return stream;
}
}, Charsets.UTF_8));
If I'm creating a stream:
String s = CharStreams.toString(CharStreams.newReaderSupplier(new InputSupplier<InputStream>() {
public InputStream getInput() throws IOException {
return <expression creating the stream>;
}
}, Charsets.UTF_8));
As a concrete example, I can read an Android text file asset like this:
final Context context = ...;
String s = CharStreams.toString(CharStreams.newReaderSupplier(new InputSupplier<InputStream>() {
public InputStream getInput() throws IOException {
return context.getAssets().open("my_asset.txt");
}
}, Charsets.UTF_8));

For a concrete example, here's how I can read an Android text file asset:
public static String getAssetContent(Context context, String file) {
InputStreamReader reader = null;
InputStream stream = null;
String output = "";
try {
stream = context.getAssets().open(file);
reader = new InputStreamReader(stream, Charsets.UTF_8);
output = CharStreams.toString(reader);
} catch (IOException e) {
e.printStackTrace();
} finally {
if (stream != null) {
try {
stream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return output;
}

Related

Is it possible to convert a ByteArrayOutputStream to a InputStream?

Is it possible to convert a ByteArrayOutputStream to a InputStream? I need it for URLConnection.guessContentTypeFromStream().
I need this because I want to avoid new ByteArrayInputStream(baos.toByteArray()), which makes two new copies of the data.
I checked the javadoc of PipedInputStream, and it seems it can be created only from a PipedOutputStream.
I tried this
try (PipedOutputStream pos = new PipedOutputStream()) {
pos.write(bytes);
try (InputStream is = new PipedInputStream(pos)) {
contentType = URLConnection.guessContentTypeFromStream(is);
}
}
catch (IOException e) {
throw new IOUncheckedException(e);
}
But it gives me
java.io.IOException: Pipe not connected
Is it possible, maybe converting it in an intermediate PipedOutputStream, without creating a copy of the data?
When you have the possibility to replace the ByteArrayOutputStream with a subclass, efficiently reading back is actually very easy:
public class ReadableByteArrayOutputStream extends ByteArrayOutputStream {
public ReadableByteArrayOutputStream() {
}
public ReadableByteArrayOutputStream(int size) {
super(size);
}
public synchronized InputStream read() {
return new ByteArrayInputStream(buf, 0, count);
}
@Override
public synchronized void reset() {
buf = new byte[buf.length];
count = 0;
}
}
While toByteArray() would create a copy of the buffer, ByteArrayInputStream's constructor does not copy the data but stores the array reference. So it's the most efficient way to read the data, supporting all bulk transfer methods, as well as mark/reset.
E.g.
ReadableByteArrayOutputStream os = new ReadableByteArrayOutputStream();
OutputStreamWriter w = new OutputStreamWriter(os, "UTF-8");
w.write("just some text");
w.flush();
System.out.println(new Scanner(os.read()).next());
InputStream is = os.read();
is.mark(9);
byte[] b = new byte[9];
new DataInputStream(is).readFully(b);
System.out.println(new String(b, "UTF-8"));
is.reset();
System.out.println(new String(new DataInputStream(is).readAllBytes(), "UTF-8"));
just
just some
just some text
The input stream returned by read() represents a snapshot of the data written so far, without being affected by subsequent writes. To ensure that the input stream stays consistent, the output stream's reset() method has been overridden, so this output stream will never overwrite previous data.
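The snapshot behaviour can be checked with a small sketch. The subclass is repeated here in compact form (under the hypothetical name SnapshotStream, and with read() declared to return ByteArrayInputStream so the demo needs no checked exceptions) so that the example compiles on its own:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Compact copy of the subclass above, renamed for this illustration.
class SnapshotStream extends ByteArrayOutputStream {
    public synchronized ByteArrayInputStream read() {
        // Shares the current array; only the first 'count' bytes are visible.
        return new ByteArrayInputStream(buf, 0, count);
    }

    @Override
    public synchronized void reset() {
        buf = new byte[buf.length]; // fresh array, so old snapshots keep the old one
        count = 0;
    }
}

final class SnapshotDemo {
    // Takes a snapshot after "abc", then appends, resets and writes again,
    // and finally returns what the snapshot still sees.
    static String snapshotAfterLaterWrites() {
        SnapshotStream os = new SnapshotStream();
        writeAscii(os, "abc");
        ByteArrayInputStream snapshot = os.read();
        writeAscii(os, "def");   // appended beyond the snapshot's limit
        os.reset();              // swaps in a new array
        writeAscii(os, "xyz");   // goes into the new array
        return drain(snapshot);
    }

    static void writeAscii(SnapshotStream os, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        os.write(bytes, 0, bytes.length);
    }

    static String drain(ByteArrayInputStream in) {
        StringBuilder sb = new StringBuilder();
        for (int b = in.read(); b != -1; b = in.read()) {
            sb.append((char) b);
        }
        return sb.toString();
    }
}
```

Under these assumptions, the snapshot taken after the first write is unaffected by the later append, the reset() and the second write.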
Even if this does not answer the question, I solved my problem using mime-util:
public class MyMimeUtil {
static {
MimeUtil.registerMimeDetector(
"eu.medsea.mimeutil.detector.MagicMimeMimeDetector"
);
}
public static String getMime(byte[] bytes) {
Collection<MimeType> mimeTypes = MimeUtil.getMimeTypes(bytes);
MimeType mimeType = mimeTypes.iterator().next();
String res = mimeType.toString();
return res;
}
}

Java: How to create a general method which returns iterator (columnList) of each line of a big data file

This question is more regarding architecture and to enhance my own understanding.
I find myself writing below code over and over whenever I am analyzing tab-delimited data files.
BufferedReader reader = new BufferedReader(new FileReader(new File(filename)));
String line="";
while ( (line=reader.readLine())!=null) {
List<String> columnList = Splitter.on('\t').splitToList(line);
//Do something with columns
}
More info on Splitter
Although nothing is wrong with the above code, I would like to know if there is a way to generalize it so that I can put this piece of code in some utility class and keep calling it.
Because the data files would be gigabytes in size, I don't want to use Files.readLines(); I still want to read one line at a time and process that line before moving to the next line.
Question:
So, is there a way to create something like getFileLineColumnListIterator(String fileName, String delimiter), so that I can simply call .next() on that iterator to get the next line's columnList, while still preserving the original order of lines?
Hopefully, my question is not drifting towards functional programming paradigm.
Extra credit if you could answer how to specify encoding while reading the file as above.
P.S. Please feel free to suggest a better headline for this question; this is the best I could come up with.
To specify encoding you need to use an InputStreamReader:
try (final BufferedReader reader =
new BufferedReader(new InputStreamReader(new FileInputStream(myFile), Charset.forName("UTF-8")))) {
}
To avoid rewriting the code every time use a library such as OpenCSV - do not reinvent the wheel. For example your code does not cope with escaped delimiters or data wrapped in quotes.
With OpenCSV you can do something like this:
try (final CSVReader reader =
new CSVReader(new InputStreamReader(new FileInputStream(myFile), Charset.forName("UTF-8")), '\t')) {
String[] line;
while ((line = reader.readNext()) != null) {
}
}
If you really want to do this yourself ignoring the warnings above and assuming you are using Guava you can do something like this:
public final class TsvProcessor extends AbstractIterator<List<String>> {
private final Splitter splitter = Splitter.on('\t');
private final Scanner s;
public TsvProcessor(final File file, final String charset) throws FileNotFoundException {
s = new Scanner(file, charset);
}
@Override
protected List<String> computeNext() {
// hasNextLine() pairs with nextLine(); hasNext() would skip whitespace-only lines at the end
if (!s.hasNextLine()) {
s.close();
return endOfData();
}
return splitter.splitToList(s.nextLine());
}
}
Usage being:
final Iterator<List<String>> lines = new TsvProcessor(myFile, "UTF-8");
while(lines.hasNext()) {
}
Note, in Java 8 you can use the new Stream API:
final Splitter s = Splitter.on('\t');
Files.lines(myFile.toPath()).map(x -> s.splitToList(x)).forEach(new Consumer<List<String>>() {
@Override
public void accept(final List<String> t) {
//do stuff
}
});
As @JBNizet suggests you could also use the streaming method of Files.readLines that takes a LineProcessor:
Files.readLines(myFile, Charsets.UTF_8, new LineProcessor<T>() {
@Override
public boolean processLine(final String line) throws IOException {
//process line
}
@Override
public T getResult() {
//return result
}
});
You can implement your own LineProcessor and re-use it. Encapsulate the splitting behaviour into that impl.
From the JavaDoc:
Streams lines from a File, stopping when our callback returns false,
or we have read all of the lines.
How about the LineIterator from Apache Commons IO?
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/LineIterator.html
It does exactly what the name suggests and reads a line only when requested.
You could implement the Iterator interface directly, and you could use an InputStreamReader like this (for your encoding) -
String charSet = "UTF-8";
BufferedReader reader = new BufferedReader(
new java.io.InputStreamReader(
new java.io.FileInputStream(filename), charSet
)
);
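As a rough sketch of implementing Iterator directly with only the JDK: the class name DelimitedLineIterator is invented for this example, and wrapping IOException in UncheckedIOException is one possible choice, made because Iterator methods cannot throw checked exceptions. It takes any Reader, so it can be fed the InputStreamReader shown above:

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

final class DelimitedLineIterator implements Iterator<List<String>>, Closeable {
    private final BufferedReader reader;
    private final Pattern delimiter;
    private String nextLine; // look-ahead line; null once the input is exhausted

    DelimitedLineIterator(Reader source, String delimiter) {
        this.reader = new BufferedReader(source);
        // Quote the delimiter so characters like '|' are taken literally.
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
        advance();
    }

    // Reads one line ahead so hasNext() can answer without side effects.
    private void advance() {
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public List<String> next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        // Limit -1 keeps trailing empty columns.
        List<String> columns = Arrays.asList(delimiter.split(nextLine, -1));
        advance();
        return columns;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Only one line is held in memory at a time and lines come back in file order. The caller is responsible for closing the iterator, for example with try-with-resources.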

Java: How to convert a File object to a String object in java? [duplicate]

Possible Duplicate:
How to create a Java String from the contents of a file
I have a html file which I want to use to extract information. For that I am using Jsoup.
Now for using Jsoup, I need to convert the html file into a string. How can I do that?
File myhtml = new File("D:\\path\\report.html");
Now, I want a String object that contains the content inside the html file.
I use Apache Commons IO to read a text file into a single string:
String str = FileUtils.readFileToString(file);
Simple and "clean". You can even set the encoding of the text file with no hassle:
String str = FileUtils.readFileToString(file, "UTF-8");
Use a library like Guava or Commons / IO. They have oneliner methods.
Guava:
Files.toString(file, charset);
Commons / IO:
FileUtils.readFileToString(file, charset);
Without such a library, I'd write a helper method, something like this:
public String readFile(File file, Charset charset) throws IOException {
return new String(Files.readAllBytes(file.toPath()), charset);
}
With Java 7, it's as simple as:
final String EoL = System.getProperty("line.separator");
List<String> lines = Files.readAllLines(Paths.get(fileName),
Charset.defaultCharset());
StringBuilder sb = new StringBuilder();
for (String line : lines) {
sb.append(line).append(EoL);
}
final String content = sb.toString();
However, it does have a few minor caveats (like handling files that do not fit into memory).
I would suggest taking a look at the corresponding section in the official Java tutorial (even if you have prior Java experience).
As others pointed out, you might find some third-party libraries useful (like Apache Commons IO or Guava).
Read the file with a FileInputStream and append the file content to a string.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
public class CopyOffileInputStream {
public static void main(String[] args) {
//File file = new File("./store/robots.txt");
File file = new File("swingloggingsscce.log");
FileInputStream fis = null;
String str = "";
try {
fis = new FileInputStream(file);
int content;
while ((content = fis.read()) != -1) {
// convert to char and display it
str += (char) content;
}
System.out.println("After reading file");
System.out.println(str);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (fis != null)
fis.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
}
By the way, Jsoup has method that takes file: http://jsoup.org/apidocs/org/jsoup/Jsoup.html#parse(java.io.File,%20java.lang.String)
You can copy all contents of myhtml to String as follows:
Scanner myScanner = null;
try
{
myScanner = new Scanner(myhtml);
String contents = myScanner.useDelimiter("\\Z").next();
}
finally
{
if(myScanner != null)
{
myScanner.close();
}
}
Of course, you can add a catch block to handle exceptions properly.
Why not just read the File line by line and add each line to a StringBuffer?
After you reach end of File you can get the String from the StringBuffer.

Java - Reading input from a file. java.io.FilterInputStream.available(Unknown Source)?

I haven't written any Java in years and I went back to refresh my memory with a simple 'read-from-file' example. Here is my code..
import java.io.*;
public class filereading {
public static void main(String[] args) {
File file = new File("C:\\file.txt");
FileInputStream fs = null;
BufferedInputStream bs = null;
DataInputStream ds = null;
try
{
fs = new FileInputStream(file);
bs = new BufferedInputStream(bs);
ds = new DataInputStream(ds);
while(ds.available()!= 0)
{
String readLine = ds.readLine();
System.out.println(readLine);
}
ds.close();
bs.close();
fs.close();
}
catch(FileNotFoundException e)
{
e.printStackTrace();
}
catch(IOException e)
{
e.printStackTrace();
}
}
}
This compiles fine (although apparently ds.readLine() is deprecated), but at runtime, this gives me
Exception in thread "main"
java.lang.NullPointerException at
java.io.FilterInputStream.available(Unknown
Source) at
filereading.main(filereading.java:21)
What gives?
You made a simple typo:
ds = new DataInputStream(ds);
should be
ds = new DataInputStream(bs);
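Your code is initializing the DataInputStream with a null source, since ds hasn't been created yet. The line before it has the same kind of mistake: bs = new BufferedInputStream(bs) wraps a null stream and should be bs = new BufferedInputStream(fs).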
Having said that, Jon Skeet's answer gives a better way to write a file-reading program (and you should always use Readers/Writers rather than Streams when dealing with text).
To read a text file, use BufferedReader - in this case, wrapped round an InputStreamReader, wrapped round a FileInputStream. (This allows you to set the encoding explicitly - which you should definitely do.) You should also close resources in finally blocks, of course.
You should then read lines until readLine() returns null, rather than relying on available() IMO. I suspect you'll find that readLine() was returning null for the last line in the file, even though available() returned 2 to indicate the final \r\n. Just a hunch though.
String line;
while ((line = reader.readLine()) != null)
{
System.out.println(line);
}
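Putting those pieces together, a minimal sketch might look like the following. The class and method names are illustrative, try-with-resources (Java 7+) is used in place of an explicit finally block, and IOException is rethrown unchecked purely to keep the example short:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

final class LineReading {
    // Reads all lines from the stream using the given encoding and closes it afterwards.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            // readLine() returns null only at end of stream; a blank line is "".
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

For a file, the stream would be a new FileInputStream(file); the loop never consults available().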

Most concise way to read the contents of a file/input stream in Java?

What is the most concise way to read the contents of a file or input stream in Java? Do I always have to create a buffer, read (at most) line by line and so on or is there a more concise way? I wish I could do just
String content = new File("test.txt").readFully();
Use the Apache Commons IOUtils package. In particular the IOUtils class provides a set of methods to read from streams, readers etc. and handle all the exceptions etc.
e.g.
InputStream is = ...
String contents = IOUtils.toString(is);
// or
List lines = IOUtils.readLines(is);
I think using a Scanner is quite OK with regards to conciseness of Java on-board tools:
Scanner s = new Scanner(new File("file"));
StringBuilder builder = new StringBuilder();
while(s.hasNextLine()) builder.append(s.nextLine()).append('\n'); // nextLine() strips the line terminator
Also, it's quite flexible, too (e.g. regular expressions support, number parsing).
Helper functions. I basically use a few of them, depending on the situation
cat method that pipes an InputStream to an OutputStream
method that calls cat to a ByteArrayOutputStream and extracts the byte array, enabling quick read of an entire file to a byte array
Implementation of Iterator<String> that is constructed using a Reader; it wraps it in a BufferedReader and readLine's on next()
...
Either roll your own or use something out of commons-io or your preferred utility library.
To give an example of such a helper function:
String[] lines = NioUtils.readInFile(componentxml);
The key is to try to close the BufferedReader even if an IOException is thrown.
/**
* Read lines in a file. <br />
* File must exist
* @param f file to be read
* @return array of lines, empty if file empty
* @throws IOException if a problem occurs while accessing or closing the file
*/
public static String[] readInFile(final File f) throws IOException
{
final ArrayList lines = new ArrayList();
IOException anioe = null;
BufferedReader br = null;
try
{
br = new BufferedReader(new FileReader(f));
String line;
line = br.readLine();
while(line != null)
{
lines.add(line);
line = br.readLine();
}
br.close();
br = null;
}
catch (final IOException e)
{
anioe = e;
}
finally
{
if(br != null)
{
try {
br.close();
} catch (final IOException e) {
anioe = e;
}
}
if(anioe != null)
{
throw anioe;
}
}
final String[] myStrings = new String[lines.size()];
//myStrings = lines.toArray(myStrings);
System.arraycopy(lines.toArray(), 0, myStrings, 0, lines.size());
return myStrings;
}
(If you just want a String, change the function to append each line to a StringBuffer, or a StringBuilder in Java 5 or 6.)
String content = new RandomAccessFile(new File("test.txt"), "r").readUTF();
Unfortunately this only works for files that were written with writeUTF(): readUTF() expects a two-byte length prefix followed by modified UTF-8 data, so on an ordinary text file you will get an EOFException or UTFDataFormatException.
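To illustrate the format involved: readUTF() reads what writeUTF() produces, namely a two-byte length followed by the string in modified UTF-8. The sketch below uses in-memory streams rather than a RandomAccessFile to stay self-contained; the class and method names are invented for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ReadUtfDemo {
    // Encodes a string the way readUTF() expects it.
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the decoded string, or null if the bytes are not in writeUTF() format.
    static String tryDecode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) { // covers EOFException and UTFDataFormatException
            return null;
        }
    }
}
```

Encoding "hello" yields seven bytes (two for the length, five for the text) and decodes back cleanly. Plain text bytes fail because their first two characters are misread as a length.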
You have to create your own function, I suppose. The problem is that Java's read routines (those I know, at least) usually take a buffer argument with a given length.
A solution I saw is to get the size of the file, create a buffer of this size and read the file at once. Hoping the file isn't a gigabyte log or XML file...
The usual way is to have a fixed size buffer or to use readLine and concatenate the results in a StringBuffer/StringBuilder.
I don't think reading using BufferedReader is a good idea if you need the exact file contents, because readLine() returns just the content of each line without the line terminator, so the original line endings are lost. (A line that contains nothing but a newline comes back as an empty string; null is returned only at the end of the stream.)
String org.apache.commons.io.FileUtils.readFileToString(File file)
Pick one from here.
How do I create a Java string from the contents of a file?
The favorite was:
private static String readFile(String path) throws IOException {
FileInputStream stream = new FileInputStream(new File(path));
try {
FileChannel fc = stream.getChannel();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
/* Instead of using default, pass in a decoder. */
return Charset.defaultCharset().decode(bb).toString();
}
finally {
stream.close();
}
}
Posted by erickson
Or the Java 7+ way (Files.readAllBytes was added in Java 7):
try {
String str = new String(Files.readAllBytes(Paths.get("myfile.txt")));
...
} catch (IOException ex) {
Logger.getLogger(getClass().getName()).log(Level.SEVERE, null, ex);
}
One may pass an appropriate Charset to the String constructor.
