How to read a text file directly from Internet using Java?

I am trying to read some words from an online text file.
I tried doing something like this
File file = new File("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner scan = new Scanner(file);
but it didn't work, I am getting
http://www.puzzlers.org/pub/wordlists/pocket.txt
as the output and I just want to get all the words.
I know they taught me this back in the day, but I don't remember exactly how to do it now. Any help is greatly appreciated.

Use a URL instead of a File for any access that is not on your local computer.
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner s = new Scanner(url.openStream());
Actually, URL is more generally useful: it also works for local access (use a file: URL), jar files, and just about everything else that can be retrieved somehow.
The code above interprets the file in your platform's default encoding. If you want to use the encoding indicated by the server instead, you have to use a URLConnection and parse its content type, as indicated in the answers to this question.
About your error: make sure your file compiles without any errors; you need to handle the exceptions. Click the red messages shown by your IDE, and it should recommend how to fix them. Do not start a program that does not compile (even if the IDE allows this).
Here it is with some sample exception handling:
try {
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner s = new Scanner(url.openStream());
// read from your scanner
}
catch(IOException ex) {
// there was some connection problem, or the file did not exist on the server,
// or your URL was not in the right format.
// think about what to do now, and put it here.
ex.printStackTrace(); // for now, simply output it.
}
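For the encoding indicated by the server, a minimal sketch (not a definitive implementation) could look like the following. The class name ServerCharsetReader and the helper charsetOf are made up for illustration, and falling back to UTF-8 when the server announces no charset is an assumption you may want to change.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ServerCharsetReader {

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // header value, falling back to UTF-8 when it is missing or unknown.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // unsupported or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole resource as text, decoded with the charset the server announced.
    static String read(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        Charset charset = charsetOf(connection.getContentType());
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

Calling ServerCharsetReader.read(new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt")) would then return the word list as one String, one word per line.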

Try something like this:
URL u = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
InputStream in = u.openStream();
Then use it like any plain old input stream.
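As a rough sketch of what "using it as an input stream" can mean here, the helper below drains any InputStream into a String. The class name StreamText and the method readAll are made up for illustration, and decoding as UTF-8 is an assumption about the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamText {

    // Drains the stream into memory and decodes the bytes as UTF-8
    // (an assumption; use the charset the file is actually written in).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(buffer)) != -1) {
            bytes.write(buffer, 0, n);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The stream is not closed by readAll, so the caller would typically wrap it in try-with-resources, for example try (InputStream in = u.openStream()) { String text = StreamText.readAll(in); }.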

What really worked for me (source: Oracle documentation, "Reading Directly from a URL"):
import java.net.*;
import java.io.*;
public class UrlTextfile {
public static void main(String[] args) throws Exception {
URL oracle = new URL("http://yoursite.com/yourfile.txt");
BufferedReader in = new BufferedReader(
new InputStreamReader(oracle.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}
}

Using Apache Commons IO:
import org.apache.commons.io.IOUtils;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
public static String readURLToString(String url) throws IOException
{
try (InputStream inputStream = new URL(url).openStream())
{
return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
}
}

Use this code to read an Internet resource into a String:
public static String readToString(String targetURL) throws IOException
{
URL url = new URL(targetURL);
BufferedReader bufferedReader = new BufferedReader(
new InputStreamReader(url.openStream()));
StringBuilder stringBuilder = new StringBuilder();
String inputLine;
while ((inputLine = bufferedReader.readLine()) != null)
{
stringBuilder.append(inputLine);
stringBuilder.append(System.lineSeparator());
}
bufferedReader.close();
return stringBuilder.toString().trim();
}
This is based on here.

For an old school input stream, use this code:
InputStream in = new URL("http://google.com/").openConnection().getInputStream();

I did it the following way for an image; you should be able to do the same for text using similar steps.
// folder and name of the image on the PC
File fileObj = new File("C:\\Displayable\\imgcopy.jpg");
// image on the server
URL url = new URL("http://localhost:8181/POPTEST2/imgone.jpg");
// try-with-resources closes both streams even if the copy fails;
// FileOutputStream creates the file if it does not exist yet
try (InputStream webIS = url.openStream();
     FileOutputStream fo = new FileOutputStream(fileObj)) {
    int c;
    // copy byte by byte until read() signals the end of the stream with -1
    while ((c = webIS.read()) != -1) {
        fo.write(c);
    }
}

Alternatively, you can use Guava's Resources object:
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
List<String> lines = Resources.readLines(url, Charsets.UTF_8);
lines.forEach(System.out::println);

The corrected method is deprecated now. It suggests the option
private WeakReference<MyActivity> activityReference;
The solution here will be useful.

Related

Google App Engine - read static files in Java

I have an App Engine servlet which creates objects based on a JSON file (stored in a "Resources" folder under WEB-INF). I handle the parsing of the file in a separate class:
public class EventParser
{
static Gson gson = new Gson();
static String fileName = "/SOServices-war/src/main/webapp/WEB-INF/Resources/mockEvents.json";
static File file = new File(fileName);
public static List<Event> readDataFromJSON() throws IOException
{
InputStreamReader inReader = new InputStreamReader(new FileInputStream(file), "UTF-8");
String stringFromRes = null;
try (BufferedReader br = new BufferedReader(inReader))
{
StringBuilder sb = new StringBuilder();
String sCurrentLine;
while ((sCurrentLine = br.readLine()) != null)
{
sb.append(sCurrentLine);
sb.append(System.lineSeparator());
}
stringFromRes = sb.toString();
}
catch (IOException e)
{
e.printStackTrace();
}
List<Event> events = new ArrayList<Event>();
Type listOfTestObject = new TypeToken<List<Event>>(){}.getType();
events = gson.fromJson(stringFromRes, listOfTestObject);
return events;
}
}
It is important to point out that this used to work using "WEB-INF/" as path, but I've recreated the project (using the new Maven tutorial) and it has a new folder structure: instead of /war/WEB-INF, it looks like /AppName-war/src/main/webapp/WEB-INF. I don't really understand how it could affect things, but I can't seem to get file reading working again.
So far I've tried:
just using " instead of "WEB-INF/"
putting the files under the webapp folder directly
using the old method
using the string provided in this code snippet, but without the slash at the beginning
None of them worked: the files couldn't be found in the local development environment (the app is not deployed yet).
Update
Big thanks to McDowell for providing me the link, it was a bit different, but it helped a lot. I am now calling the readDataFromJSON() method with a servletContext parameter like so:
List<Event> events = EventParser.readDataFromJSON(this.getServletContext());
And then in the parser:
String filePath = context.getRealPath(fileName);
InputStreamReader inReader = new InputStreamReader(new FileInputStream(new File(filePath)), "UTF-8");
This solves my issue.

Cannot find file on netbeans

I'm trying to access a data file to get questions and answers for my "Quiz" application.
If I access the file from the one on my desktop, it works fine. If I drag and drop the file into my netbeans, I cannot seem to access it.
The file is in the package "quiz" along with my other classes.
Here's the code that works but I want to use the netbeans file.
String fileName = "C:/Users/Michael/Desktop/QUIZ.DAT";
try {
//Make fileReader object to read the file
FileReader file = new FileReader(new File(fileName));
BufferedReader fileStream = new BufferedReader(file);
} catch (Exception e) {
System.out.println("File not found");
}
To try and access the file on netbeans I use this but it cannot find it.
String fileName = "quiz/Quiz.DAT";

Try this, where MyClass is the class name. I have assumed the QUIZ.DAT file is in the same package as the class.
InputStream f = MyClass.class.getResourceAsStream("QUIZ.DAT");
BufferedReader bReader = new BufferedReader(new InputStreamReader(f));
StringBuffer sbfFileContents = new StringBuffer();
String line = null;
while ((line = bReader.readLine()) != null) {
sbfFileContents.append(line);
}
System.out.println(sbfFileContents.toString());

JJPA provided proper code, but let me expand on it.
Project
  com.io
    test.txt
  com.root
    AccessFile.java
This is my program structure. I want to access the file from the package io, so here is the code.
package com.root;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
public class AccessFile {
public static void main(String args[]){
try{
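// An absolute resource name (leading slash) also works when the classes are
// packaged in a JAR; a relative name such as "../io/test.txt" may not.
InputStream f = AccessFile.class.getResourceAsStream("/com/io/test.txt");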
BufferedReader bReader = new BufferedReader(new InputStreamReader(f));
StringBuffer sbfFileContents = new StringBuffer();
String line = null;
while ((line = bReader.readLine()) != null) {
sbfFileContents.append(line);
}
bReader.close();
f.close();
System.out.println(sbfFileContents.toString());
}catch(Exception e){
e.printStackTrace();
}
}
}

If you are trying to read a file in your Java project and NetBeans is not able to find it, put the file in the root directory of your project and it should be found.

JAVA: How to check if website document contains a word?

I currently have the following method:
try {
URL url = new URL("http://auth.h.gp/HAKUNA%20MATATA.txt");
Scanner s = new Scanner(url.openStream());
}
catch(IOException ex) {
BotScript.log("Something went wrong =/ Error code:");
ex.printStackTrace();
stop();
}
However, how do I check if it contains a word? I've never worked with Scanners before and I found this snippet online.
Thank you.

Okay, that looks good so far.
You can then use Scanner's next() method to get each word. You can also query hasNext() to see if there's another token available to avoid errors.
boolean foundPumbaa = false;
while (s.hasNext()) {
if (s.next().equalsIgnoreCase("pumbaa")) {
foundPumbaa = true;
System.out.println("We found Pumbaa"); // do something
break;
}
}
if (!foundPumbaa) {
System.out.println("We didn't find Pumbaa");
}
EDIT in response to comment:
Yes, you can turn the text into a String. The best way to do this is probably with a BufferedReader.
From the Java Tutorial, "Reading Directly from a URL":
The following small Java program uses openStream() to get an input
stream on the URL http://www.oracle.com/. It then opens a
BufferedReader on the input stream and reads from the BufferedReader
thereby reading from the URL. Everything read is copied to the
standard output stream:
import java.net.*;
import java.io.*;
public class URLReader {
public static void main(String[] args) throws Exception {
URL oracle = new URL("http://www.oracle.com/");
BufferedReader in = new BufferedReader(
new InputStreamReader(oracle.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}
}
In a real program, instead of declaring main throws Exception, you'd put that in a try-catch block and catch IOException (which also covers MalformedURLException, thrown for a badly formed URL). But this should get you started.
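Once the text is available through a BufferedReader, the word check itself can be done line by line. Here is a minimal sketch; the class name WordSearch and the method containsWord are made up for illustration, and matching whole words case-insensitively is an assumption about what "contains a word" should mean.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;

class WordSearch {

    // Returns true if any line read from the source contains the given word
    // as a whole word, ignoring case. The reader is closed afterwards.
    static boolean containsWord(Reader source, String word) throws IOException {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (pattern.matcher(line).find()) {
                    return true; // stop at the first hit
                }
            }
        }
        return false;
    }
}
```

For the URL case you would pass new InputStreamReader(url.openStream()) as the source. The \b word boundaries only behave as expected when the word starts and ends with a letter, digit or underscore.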

Reading entire html file to String?

Are there better ways to read an entire html file to a single string variable than:
String content = "";
try {
BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
String str;
while ((str = in.readLine()) != null) {
content +=str;
}
in.close();
} catch (IOException e) {
}

You should use a StringBuilder:
StringBuilder contentBuilder = new StringBuilder();
try {
BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
String str;
while ((str = in.readLine()) != null) {
contentBuilder.append(str);
}
in.close();
} catch (IOException e) {
}
String content = contentBuilder.toString();
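Note that readLine() strips the line terminators, so the loop above joins all lines without line breaks. If you are on Java 7 or later and want the file exactly as stored, a sketch using only the standard library could look like this. The class name WholeFile is made up for illustration, and UTF-8 is an assumption about how the file was saved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WholeFile {

    // Reads the entire file into one String, line terminators included.
    // UTF-8 is an assumption about how the file was saved.
    static String read(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }
}
```

With that helper, String content = WholeFile.read("mypage.html"); replaces the whole loop. This loads the complete file into memory, which is fine for typical HTML pages but not for very large files.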

There's the IOUtils.toString(..) utility from Apache Commons.
If you're using Guava there's also Files.readLines(..) and Files.toString(..).

You can use JSoup.
It's a very strong HTML parser for Java.

As Jean mentioned, using a StringBuilder instead of += would be better. But if you're looking for something simpler, Guava, IOUtils, and Jsoup are all good options.
Example with Guava:
String content = Files.asCharSource(new File("/path/to/mypage.html"), StandardCharsets.UTF_8).read();
Example with IOUtils:
// a URL needs a protocol; a bare path would throw MalformedURLException
InputStream in = new URL("file:///path/to/mypage.html").openStream();
String content;
try {
content = IOUtils.toString(in, StandardCharsets.UTF_8);
} finally {
IOUtils.closeQuietly(in);
}
Example with Jsoup:
String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").toString();
or
String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").outerHtml();
NOTES:
Files.readLines() and Files.toString()
These are now deprecated as of Guava release version 22.0 (May 22, 2017).
Files.asCharSource() should be used instead as seen in the example above. (version 22.0 release diffs)
IOUtils.toString(InputStream) and Charsets.UTF_8
Deprecated as of Apache Commons-IO version 2.5 (May 6, 2016). IOUtils.toString should now be passed the InputStream and the Charset as seen in the example above. Java 7's StandardCharsets should be used instead of Charsets as seen in the example above. (deprecated Charsets.UTF_8)

I prefer using Guava:
import com.google.common.base.Charsets;
import com.google.common.io.Files;
import java.io.File;
File file = new File("/path/to/file");
// the charset is an argument of Files.toString, not of the File constructor
String content = Files.toString(file, Charsets.UTF_8);

For string operations, use the StringBuilder or StringBuffer classes to accumulate string data. Do not use += on String objects: String is immutable, so every += creates a new String object, and building a large string this way hurts performance.
Use the .append() method of a StringBuilder/StringBuffer instance instead.

Here's a solution to retrieve the HTML of a web page using only standard Java libraries:
import java.io.*;
import java.net.*;
String urlToRead = "https://google.com";
URL url; // The URL to read
HttpURLConnection conn; // The actual connection to the web page
BufferedReader rd; // Used to read results from the web page
String line; // An individual line of the web page HTML
String result = ""; // A long string containing all the HTML
try {
url = new URL(urlToRead);
conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while ((line = rd.readLine()) != null) {
result += line;
}
rd.close();
} catch (Exception e) {
e.printStackTrace();
}
System.out.println(result);

import org.apache.commons.io.IOUtils;
import java.io.IOException;
try {
var content = new String(IOUtils.toByteArray ( this.getClass().
getResource("/index.html")));
} catch (IOException e) {
e.printStackTrace();
}
// The code above is Java 10 (it uses var) and assumes index.html is available inside the resources folder.

Writing a java program to remove the comments in same java program?

I am writing a Java program to remove the comments in the same Java program.
I am thinking of using a file reader, but I'm not sure whether it will work, because two processes will be using the same file.
But I think that before executing the code, Java will compile the .java file into a .class file.
So if I use a FileReader to edit the .java file, it should not give me an error saying that another process is already using this file.
Am I thinking correctly?
Thanks in advance.

Yes, you can do that without any problems.
Note: Be careful with things like:
String notAComment = "// This is not a comment";
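To illustrate why string literals matter, here is a rough sketch of a character-by-character approach that tracks whether it is inside a literal or a comment. The class name CommentStripper and its strip method are made up for illustration; the sketch does not handle text blocks (""") or Unicode escapes, so treat it as a starting point only.

```java
class CommentStripper {

    private enum State { CODE, STRING, CHAR, LINE_COMMENT, BLOCK_COMMENT }

    // Removes // and /* */ comments but leaves string and char literals alone.
    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            char next = i + 1 < source.length() ? source.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        i++; // skip the second slash
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        i++; // skip the asterisk
                    } else {
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                        out.append(c);
                    }
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    if (c == '\\' && i + 1 < source.length()) {
                        out.append(next); // keep the escaped character as-is
                        i++;
                    } else if ((state == State.STRING && c == '"')
                            || (state == State.CHAR && c == '\'')) {
                        state = State.CODE; // closing quote reached
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') {
                        out.append(c); // the line break itself is kept
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i++; // skip the closing slash
                    }
                    break;
            }
        }
        return out.toString();
    }
}
```

Passing the line above through CommentStripper.strip leaves it unchanged, because the two slashes are seen while the state is STRING.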

If you just want to remove comments from a Java program, why don't you do a simple search and replace using a regex, and convert all comments into an empty string?
Here's a verbose way of doing it, in Java:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.BufferedReader;
class Cleaner{
public static void main( String a[] )
{
String source = readFile("source.java");
System.out.println(source.replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)",""));
}
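static String readFile(String fileName) {
File file = new File(fileName);
// file.length() counts bytes, which is an upper bound on the number of chars
char[] buffer = new char[(int) file.length()];
int i = 0;
// try-with-resources closes the reader even if reading fails
try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
int c = bufferedReader.read();
while (c != -1) {
buffer[i++] = (char) c;
c = bufferedReader.read();
}
} catch (IOException e) {
e.printStackTrace();
}
// only the first i chars were filled; this also avoids trailing NUL characters
return new String(buffer, 0, i);
}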
}

You are right, there are not two processes using the same file: your program will use the .class files and process the .java files. You may want to take a closer look at this page:
Finding Comments in Source Code Using Regular Expressions

Yes, using a FileReader will work. One thing to watch out for is the file encoding, if you might have non-English characters or work across different platforms. In Eclipse and other IDEs you can change the character set of a Java source file to different encodings. If unsure, it might be worth using:
InputStream in = ....
BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"));
..
and likewise when you are writing the output back out, use an OutputStreamWriter with UTF-8.
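Put together, a minimal sketch of reading and writing with an explicit encoding might look like this. The class name Utf8Copy and the method copy are made up for illustration, and the two paths are assumed to be different files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class Utf8Copy {

    // Copies a text file line by line, decoding and encoding explicitly as UTF-8.
    static void copy(String inputPath, String outputPath) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                     new FileInputStream(inputPath), StandardCharsets.UTF_8));
             BufferedWriter w = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(outputPath), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                w.write(line); // a comment-removal step would transform the line here
                w.newLine();   // writes the platform line separator
            }
        }
    }
}
```

Opening a FileOutputStream truncates its target, so writing to the same path you are still reading from would lose the content. Write to a second file (or collect the result in memory first) and replace the original afterwards.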

Have a look at the post Remove comments from String for doing your stuff. You may use either FileReader or java.util.Scanner class to read the file.

It's late, but it may help someone remove all types of comments.
package com.example;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
class CommentRemover {
public static void main(String a[]) {
File file = new File("F:/Java Examples/Sample.java");
String fileString = readLineByLine(file);
fileString = fileString.replaceAll(
"(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)", "");
System.out.println(fileString);
}
private static String readLineByLine(File file) {
String textFile = "";
FileInputStream fstream;
try {
fstream = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(
fstream));
String strLine;
while ((strLine = br.readLine()) != null) {
textFile = textFile + replaceComments(strLine) + "\n";
}
br.close();
} catch (Exception e) {
e.printStackTrace();
}
return textFile;
}
private static String replaceComments(String strLine) {
if (strLine.startsWith("//")) {
return "";
} else if (strLine.contains("//")) {
if (strLine.contains("\"")) {
int lastIndex = strLine.lastIndexOf("\"");
int lastIndexComment = strLine.lastIndexOf("//");
if (lastIndexComment > lastIndex) { // ( "" // )
strLine = strLine.substring(0, lastIndexComment);
}
} else {
int index = strLine.lastIndexOf("//");
strLine = strLine.substring(0, index);
}
}
return strLine;
}
}

I made an open-source library (CommentRemover on GitHub) for this purpose; it can remove single-line and multi-line Java comments.
It supports either removing or keeping TODOs.
It also supports JavaScript, HTML, CSS, Properties, JSP and XML comments.
Here is a little code snippet showing how to use it (there are two types of usage):
First way: InternalPath
public static void main(String[] args) throws CommentRemoverException {
// root dir is: /Users/user/Projects/MyProject
// example for startInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc.. goes like that
.removeTodos(false) // Do Not Touch Todos (leave them alone)
.removeSingleLines(true) // Remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
.setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
Second way: ExternalPath
public static void main(String[] args) throws CommentRemoverException {
// example for externalInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc..
.removeTodos(true) // Remove todos
.removeSingleLines(false) // Do not remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
.setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}

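import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
public class Copy {
    // Copies the input file to the output file, skipping every line that
    // consists of a single /* ... */ comment.
    void RemoveComments(String inputFilePath, String outputFilePath) throws IOException {
        File in = new File(inputFilePath);
        File out = new File(outputFilePath);
        // try-with-resources flushes and closes both files when done
        try (BufferedReader bufferedreader = new BufferedReader(new FileReader(in));
             PrintWriter pw = new PrintWriter(new FileWriter(out))) {
            String line;
            while ((line = bufferedreader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.startsWith("/*") && trimmed.endsWith("*/")) {
                    continue; // whole-line block comment: do not copy it
                }
                pw.println(line);
            }
        }
    }
}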
