Send a text file from URL directly into a scanner - java

I am using Java and currently I can download a text file from the internet, read that file, then send that file into a Scanner. Is it possible to skip writing it to the hard drive and send it straight into the Scanner? I tried changing the code some, but it didn't work.
URL link = new URL("http://shayconcepts.com/programming/ComicDownloader/version.txt");
ReadableByteChannel rbc = Channels.newChannel(link.openStream());//Gets the html page
FileOutputStream fos = new FileOutputStream("version.txt");//Creates the output name of the output file to be saved to the computer
fos.getChannel().transferFrom(rbc, 0, 1 << 24);
fos.close();
Scanner sc = new Scanner(new FileReader("version.txt"));

Yes, it is definitely possible. Just do exactly as you said: feed the input stream obtained from the URL straight into the scanner.
Scanner sc = new Scanner(link.openStream());
Scanner also has a constructor that takes an input stream. That constructor accepts the charset name as a second argument, which is worth using if the text file may be in a different character encoding than the platform default; otherwise you risk Mojibake.
Scanner sc = new Scanner(link.openStream(), "UTF-8");
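A minimal sketch of the same idea with cleanup added. The helper name readFirstLine is hypothetical, and an in-memory stream with made-up content stands in for link.openStream() so the example runs without network access. The try-with-resources block closes the Scanner, which in turn closes the stream it wraps.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class UrlScannerSketch {

    // Hypothetical helper: reads the first line of a stream as UTF-8 text.
    // Pass link.openStream() here to read straight from the URL.
    static String readFirstLine(InputStream in) {
        // Closing the Scanner also closes the wrapped stream.
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            return sc.hasNextLine() ? sc.nextLine() : "";
        }
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded version.txt; the content is made up.
        InputStream fake = new ByteArrayInputStream(
                "1.2.3\nchangelog".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFirstLine(fake));
    }
}
```

Nothing is written to disk here; the Scanner pulls bytes from the stream as it needs them.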

Related

Decode Base64InputStream from String and unzip it with GZIPInputStream in Java

I am trying to create a simple app to decode zipped data from a string. The user pastes the zipped data into one textarea and, after a button click, the data is decoded and shown in another textarea.
If I use files, it works perfectly:
zis = new GZIPInputStream(new Base64InputStream(new FileInputStream(inZippedFile)));
where inZippedFile is a file. The result is then saved to outputFile.
However, if I build the InputStream from a string, it never finishes.
String input = "...";
InputStream in = IOUtils.toInputStream(input, "UTF-8");
zis = new GZIPInputStream(new Base64InputStream(in));
For IOUtils I am using the Apache Commons IO 2.4 jars. Is there anything I am doing wrong?
Thanks
Decoding and unzipping the string data works correctly; there was just a mistake in parsing the data into the correct form, which caused the long run time.
So this works, with no need to set UTF-8:
new GZIPInputStream(new Base64InputStream(IOUtils.toInputStream(input)));
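For comparison, here is a sketch of the same decode-then-unzip step using only the JDK (Java 8 or later). Note the substitution: java.util.Base64 replaces Commons Codec's Base64InputStream, so this is not the code from the question. The helper names gzipToBase64 and base64ToText are hypothetical, and the MIME decoder is chosen on the assumption that text pasted into a textarea may contain line breaks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class Base64GzipSketch {

    // Compresses text with GZIP and Base64-encodes it (produces input for the decoder).
    static String gzipToBase64(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Decodes Base64 text and unzips it, entirely in memory.
    static String base64ToText(String input) throws IOException {
        // The MIME decoder skips line breaks and other non-Base64 characters.
        byte[] zipped = Base64.getMimeDecoder().decode(input);
        try (InputStream zis = new GZIPInputStream(new ByteArrayInputStream(zipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = zis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Assumes the zipped payload is UTF-8 text.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Round-tripping a known string through gzipToBase64 and base64ToText is a quick way to tell whether a hang comes from the stream chain or from how the input was prepared.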

equivalent to Files.readAllLines() for InputStream or Reader?

I have a file that I've been reading into a List via the following method:
List<String> doc = java.nio.file.Files.readAllLines(new File("/path/to/src/resources/citylist.csv").toPath(), StandardCharsets.UTF_8);
Is there any nice (single-line) Java 7/8/nio2 way to pull off the same feat with a file that's inside an executable Jar (and presumably, has to be read with an InputStream)? Perhaps a way to open an InputStream via the classloader, then somehow coerce/transform/wrap it into a Path object? Or some new subclass of InputStream or Reader that contains an equivalent to Files.readAllLines(...)?
I know I could do it the traditional way in a half page of code, or via some external library... but before I do, I want to make sure that recent releases of Java can't already do it "out of the box".
An InputStream represents a stream of bytes. Those bytes don't necessarily form (text) content that can be read line by line.
If you know that the InputStream can be interpreted as text, you can wrap it in an InputStreamReader and use BufferedReader#lines() to consume it line by line.
try (InputStream resource = Example.class.getResourceAsStream("resource")) {
List<String> doc =
new BufferedReader(new InputStreamReader(resource,
StandardCharsets.UTF_8)).lines().collect(Collectors.toList());
}
You can use Apache Commons IOUtils#readLines:
List<String> doc = IOUtils.readLines(inputStream, StandardCharsets.UTF_8);

How to read a CSV file line by line when fields contain line breaks and commas

I have to read a CSV file line by line, replacing line breaks inside fields (Alt+Enter, that is \n\r) with a space and commas outside of fields with Ctrl+A (\001). When I try this with a BufferedReader, it splits the line at \n and does not treat \n\r as a non-line-break character. How can I handle this? I have to do this in Java.
You can read the entire file into a String variable and then use String.replaceAll() to replace the characters as you want:-
File file = new File("abc.csv");
FileInputStream fis = null;
fis = new FileInputStream(file);
byte[] data = new byte[(int) file.length()];
fis.read(data);
fis.close();
String str = new String(data, "UTF-8");
And then replace characters in the String:-
str = str.replaceAll("\r\n", " ");
str = str.replaceAll("[,]", ";");
System.out.println(str);
Then you can create a new file or overwrite the existing file using the new String
I assume you mean "\r\n", not "\n\r". Usually Windows-based programs generate "\r\n" while Linux-based programs use "\n".
BufferedReader.readLine() treats "\n", "\r", and "\r\n" all as line terminators, regardless of the global "line.separator" setting, so it will also split on line breaks that sit inside a field.
On a side note: it is usually worthwhile to use a framework for parsing instead of doing actual line reads, because there are a lot of edge cases (like this one) that a framework usually covers.
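To illustrate the edge case, here is a minimal hand-rolled sketch, not a replacement for a CSV library. It assumes fields that contain line breaks or commas are wrapped in double quotes (the usual CSV convention, which the question does not state), and the name normalize is hypothetical. It reads character by character instead of line by line, so a line break inside a quoted field never ends a record.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CsvNormalizerSketch {

    // Tracks whether the current position is inside a double-quoted field.
    // Line breaks inside quotes become one space; commas outside quotes
    // become the \001 (Ctrl+A) delimiter. Everything else is copied as-is.
    static String normalize(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        boolean lastWasCr = false; // lets "\r\n" inside quotes collapse to one space
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (ch == '"') {
                // An escaped quote ("") toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
                out.append(ch);
            } else if (inQuotes && (ch == '\r' || ch == '\n')) {
                if (!(ch == '\n' && lastWasCr)) {
                    out.append(' ');
                }
            } else if (!inQuotes && ch == ',') {
                out.append('\001');
            } else {
                out.append(ch);
            }
            lastWasCr = (ch == '\r');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: the second field spans two lines and contains a comma.
        String csv = "a,\"line1\r\nline2, still\",c\r\nd,e";
        System.out.println(normalize(new StringReader(csv)));
    }
}
```

For a file, pass a BufferedReader over an InputStreamReader with the file's charset instead of the StringReader.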

Java read utf-8 encoded file, character by character

I have a file saved as utf-8 (saved by my application in fact). How do you read it character by character?
File file = new File(folder+name);
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
DataInputStream dis = new DataInputStream(bis);
The two options seem to be:
char c = dis.readByte()
char c = dis.readChar()
The first option works as long as you only have ASCII characters stored, i.e. English.
The second option reads the first and second byte of the file as one character.
The original file is being written as follows:
File file = File.createTempFile("file", "txt");
FileWriter fstream = new FileWriter(file);
BufferedWriter out = new BufferedWriter(fstream);
You don't want a DataInputStream, that's for reading raw bytes. Use an InputStreamReader, which lets you specify the encoding of the input (UTF-8 in your case).
You should be aware that in the Java world you use streams to process bytes, and readers/writers to process characters. These two are not the same, and you should choose the right one to handle what you have.
Have a look at http://java.sun.com/docs/books/tutorial/i18n/text/stream.html to see how to work with characters in a byte-oriented world.
The Sun Java Tutorial is a highly recommended learning resource.
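A sketch of the InputStreamReader approach described above. The helper name readChars is hypothetical, and an in-memory byte array stands in for the file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8CharReaderSketch {

    // Decodes the stream as UTF-8 and collects each char that is read.
    // Pass new FileInputStream(file) to read the file from the question.
    static List<Character> readChars(InputStream in) throws IOException {
        List<Character> chars = new ArrayList<>();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int c; // read() returns an int so that -1 can signal end of stream
            while ((c = reader.read()) != -1) {
                chars.add((char) c);
            }
        }
        return chars;
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" is 5 characters but 6 bytes in UTF-8.
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readChars(new ByteArrayInputStream(utf8)).size());
    }
}
```

Each call to read() returns one Java char, so a character outside the Basic Multilingual Plane arrives as two chars (a surrogate pair).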
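Use a Reader (e.g. BufferedReader):
Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8")); // FileReader(file) would use the platform default encoding
int c = reader.read(); // returns -1 at end of stream; otherwise cast to char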
You can read individual bytes: in UTF-8, a byte that is less than 128 (i.e. the 8th bit is 0) is a complete single-byte character, while every byte of a multi-byte character has the 8th bit set.
I'm no Java expert, but I would assume that there are better ways. Maybe some way of telling the reader what encoding it is in...
edit: see dmazzoni's answer.

Corrupt file when using Java to download file

This problem seems to happen inconsistently. We are using a java applet to download a file from our site, which we store temporarily on the client's machine.
Here is the code that we are using to save the file:
URL targetUrl = new URL(urlForFile);
InputStream content = (InputStream)targetUrl.getContent();
BufferedInputStream buffered = new BufferedInputStream(content);
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int letter;
while((letter = buffered.read()) != -1)
fos.write(letter);
fos.close();
Later, I try to access that file by using:
ObjectInputStream keyInStream = new ObjectInputStream(new FileInputStream(savedFile));
Most of the time it works without a problem, but every once in a while we get the error:
java.io.StreamCorruptedException: invalid stream header: 0D0A0D0A
which makes me believe that it isn't saving the file correctly.
I'm guessing that the operations you've done with getContent and BufferedInputStream have treated the file like an ASCII file, converting newlines or carriage returns into carriage return + newline (0x0d0a), which has confused ObjectInputStream (which expects serialized data objects).
If you are using an FTP URL, the transfer may be occurring in ASCII mode.
Try appending ";type=I" to the end of your URL.
Why are you using ObjectInputStream to read it?
As per the javadoc:
An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream.
Probably the error comes from the fact you didn't write it with ObjectOutputStream.
Try reading it with FileInputStream only.
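To make the javadoc quote concrete, here is a small sketch with hypothetical helper names. Data written by ObjectOutputStream starts with the magic number 0xACED, and ObjectInputStream checks that header as soon as it is constructed, which is why bytes such as 0D0A0D0A are rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class SerializationHeaderSketch {

    // Serializes a value with ObjectOutputStream and returns the raw bytes.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads one object back; throws StreamCorruptedException on a bad header.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // True if the data starts with the serialization magic number 0xACED.
    static boolean hasSerializationHeader(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xAC
                && (data[1] & 0xFF) == 0xED;
    }
}
```

Running hasSerializationHeader on the first bytes of the saved temp file would show whether the file was already wrong when it was written or only fails when read.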
Here's a sample for binary ( although not the most efficient way )
Here's another used for text files.
There are 3 big problems in your sample code:
You're not just treating the input as bytes
You're needlessly pulling the entire object into memory at once
You're doing multiple method calls for every single byte read and written -- use the array based read/write!
Here's a redo:
URL targetUrl = new URL(urlForFile);
InputStream is = targetUrl.openStream(); // URL has no getInputStream(); openStream() returns the raw byte stream
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int count;
byte[] buff = new byte[16 * 1024];
// Copy in 16 KB chunks instead of one byte per call
while((count = is.read(buff)) != -1) {
    fos.write(buff, 0, count);
}
fos.close();
is.close();
You could also step back from the code and check to see if the file on your client is the same as the file on the server. If you get both files on an XP machine, you should be able to use the FC utility to do a compare (check FC's help if you need to run this as a binary compare as there is a switch for that). If you're on Unix, I don't know the file compare program, but I'm sure there's something.
If the files are identical, then you're looking at a problem with the code that reads the file.
If the files are not identical, focus on the code that writes your file.
Good luck!
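To check the writing side in isolation, as suggested above, here is a sketch that saves a stream byte for byte and can be compared against the source. The helper name saveToTempFile is hypothetical, and Files.copy (Java 7 or later) is a substitution for the manual loops in the answers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

class BinaryDownloadSketch {

    // Copies the stream to a temp file byte for byte and returns the file's path.
    // Pass targetUrl.openStream() to save a real download.
    static Path saveToTempFile(InputStream in) throws IOException {
        Path saved = Files.createTempFile("temp", ".dat");
        try (InputStream source = in) {
            // createTempFile already created the file, so it must be replaced.
            // Files.copy is a raw byte copy; it performs no newline translation.
            Files.copy(source, saved, StandardCopyOption.REPLACE_EXISTING);
        }
        return saved;
    }

    public static void main(String[] args) throws IOException {
        // Made-up payload containing CR/LF bytes that must survive unchanged.
        byte[] payload = {0x0D, 0x0A, (byte) 0xAC, (byte) 0xED, 0x00, 0x05};
        Path saved = saveToTempFile(new ByteArrayInputStream(payload));
        System.out.println(Arrays.equals(payload, Files.readAllBytes(saved)));
        Files.delete(saved);
    }
}
```

If a copy like this still yields a file that differs from the one on the server, the bytes were changed before they reached the client, for example by an ASCII-mode transfer.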
