Java - Unable to obtain HTML plaintext from website

I have a strange problem. In the past I wrote a program that checks whether a new chapter of a story has come out on fanfiction.net, and that program still works fine (though its GUI leaves a lot to be desired).
However, now that I am trying to make a new version, I can't seem to load the webpage even though I'm using the exact same code (copy-pasted), shown below. When a URL like https://www.fanfiction.net/s/11012678/36 is passed to the nextExists method, it should return true. My old program does, but this one doesn't, even though the code is identical.
The only thing I can think of that might have an effect is that I am using a newer version of Eclipse, which might cause it to mistake the encoding, but I have tried all the common encodings and none of them yields the HTML plaintext.
Does anyone have any idea what might be causing this? It's not a disaster if I can't get this right, but I would like to know for the future in case I run into the same problem again.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
public class Util {
private static final String BEFORE = "<button class=btn TYPE=BUTTON onClick=\"self.location='", AFTER = "'\">Next ></button>", SITE = "fanfiction.net";
public static String readSite(String path) throws Exception{
URL url = new URL(path);
BufferedReader in = null;
String line;
try{
StringBuilder builder = new StringBuilder();
in = new BufferedReader(new InputStreamReader(url.openStream()));
line = in.readLine();
if(line == null){
return null;
}
builder.append(line);
while((line = in.readLine()) != null){
builder.append('\n' + line);
}
return builder.toString();
} finally{
if(in != null){
in.close();
}
}
}
public static String updatePathToEnd(String path) throws Exception{
outer: while(nextExists(path)){
String data = readSite(path);
if(path.contains(SITE)){
String link = path.substring(0, path.indexOf(SITE) + SITE.length()) + data.substring(data.indexOf(BEFORE) + BEFORE.length(), data.indexOf(AFTER));
if(readSite(link) != null) {
path = link;
continue outer;
}
}
}
return path;
}
public static boolean nextExists(String path) throws Exception{
String text = readSite(path);
if(path.contains(SITE)){
return text==null ? false : text.contains(AFTER);
}
return false;
}
}

I tried it in BlueJ and it works perfectly, so the problem seems to be in Eclipse.
Regards
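The thread does not establish the actual cause, so the following is only a diagnostic sketch, not a confirmed fix. It reads the page with an explicit charset (so the IDE's default encoding no longer matters) and reports the HTTP status code instead of silently returning unexpected data. The class name PageFetcher, the UTF-8 charset and the User-Agent value are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Reads the whole stream as text in the given charset, joining lines with '\n'.
    static String readAll(InputStream stream, Charset charset) {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stream, charset))) {
            String line;
            boolean first = true;
            while ((line = in.readLine()) != null) {
                if (!first) {
                    builder.append('\n');
                }
                builder.append(line);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return builder.toString();
    }

    // Fetches a page and throws on any status other than 200, so failures are visible.
    static String fetch(String path) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(path).openConnection();
        // Assumption: some sites may reject Java's default User-Agent; this value is a placeholder.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; ExampleChecker/1.0)");
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + status + " for " + path);
        }
        // Assumption: the page is UTF-8 encoded.
        return readAll(conn.getInputStream(), StandardCharsets.UTF_8);
    }
}
```

If fetch throws with a status such as 403, the difference between the two environments is more likely in the request than in the encoding.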

Related

G suite account get report java sample question

I am trying to use this API to get a report with Java. Here is the link:
https://developers.google.com/admin-sdk/reports/v1/appendix/activity/meet
and here is what I am using now:
public static String getGraph() {
String PROTECTED_RESOURCE_URL = "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/meet?eventName=call_ended&maxResults=10&access_token=";
String graph = "";
try {
URL urUserInfo = new URL(PROTECTED_RESOURCE_URL + "access_token");
HttpURLConnection connObtainUserInfo = (HttpURLConnection) urUserInfo.openConnection();
if (connObtainUserInfo.getResponseCode() == HttpURLConnection.HTTP_OK) {
StringBuilder sbLines = new StringBuilder("");
BufferedReader reader = new BufferedReader(
new InputStreamReader(connObtainUserInfo.getInputStream(), "utf-8"));
String strLine = "";
while ((strLine = reader.readLine()) != null) {
sbLines.append(strLine);
}
graph = sbLines.toString();
}
} catch (IOException ex) {
ex.printStackTrace();
}
return graph;
}
I am pretty sure this is not a smart way to do it, and the string I get back is quite complex. Is there a Java sample that lets me get the data directly instead of using Java's built-in HTTP request classes?
Or is there a class I can import to convert the JSON string into an object?
Can anyone help?
I have been trying this for many days already!
Thanks!

JAVA: How to check if website document contains a word?

I currently have the following method:
try {
URL url = new URL("http://auth.h.gp/HAKUNA%20MATATA.txt");
Scanner s = new Scanner(url.openStream());
}
catch(IOException ex) {
BotScript.log("Something went wrong =/ Error code:");
ex.printStackTrace();
stop();
}
However, how do I check if it contains a word? I've never worked with Scanners before and I found this snippet online.
Thank you.

Okay, that looks good so far.
You can then use Scanner's next() method to get each word. You can also query hasNext() to see if there's another token available to avoid errors.
boolean foundPumbaa = false;
while (s.hasNext()) {
if (s.next().equalsIgnoreCase("pumbaa")) {
foundPumbaa = true;
System.out.println("We found Pumbaa"); // do something
break;
}
}
if (!foundPumbaa) {
System.out.println("We didn't find Pumbaa");
}
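The loop above can also be wrapped in a small reusable method. This is a minimal sketch; the class name WordSearch and method name containsWord are made up for illustration. It is shown with a Scanner over a String, but the same method works with a Scanner created from url.openStream().

```java
import java.util.Scanner;

class WordSearch {
    // Returns true if any whitespace-delimited token equals the word, ignoring case.
    static boolean containsWord(Scanner scanner, String word) {
        while (scanner.hasNext()) {
            if (scanner.next().equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that Scanner splits on whitespace by default, so a token such as "Pumbaa," (with a trailing comma) would not match "pumbaa".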
EDIT in response to comment:
Yes, you can turn the text into a String. The best way to do this is probably with a BufferedReader.
From the Java Tutorial, "Reading Directly from a URL":
The following small Java program uses openStream() to get an input
stream on the URL http://www.oracle.com/. It then opens a
BufferedReader on the input stream and reads from the BufferedReader
thereby reading from the URL. Everything read is copied to the
standard output stream:
import java.net.*;
import java.io.*;
public class URLReader {
public static void main(String[] args) throws Exception {
URL oracle = new URL("http://www.oracle.com/");
BufferedReader in = new BufferedReader(
new InputStreamReader(oracle.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}
}
In a real program, instead of declaring main throws Exception, you'd wrap this in a try-catch block and catch IOException (which also covers MalformedURLException, thrown by the URL constructor). But this should get you started.
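As a rough sketch of what that exception handling could look like, the method below catches the bad-URL case separately from other I/O failures. The class name SafeUrlReader and the choice to return null on failure are assumptions for illustration, not part of the tutorial.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

class SafeUrlReader {
    // Returns the text behind the address, or null if the URL is malformed or unreadable.
    static String readOrNull(String address) {
        try {
            URL url = new URL(address);
            StringBuilder text = new StringBuilder();
            // try-with-resources closes the reader even when reading fails
            try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    text.append(inputLine).append('\n');
                }
            }
            return text.toString();
        } catch (MalformedURLException e) {
            // Must come before IOException, because it is a subclass of it.
            System.err.println("Not a valid URL: " + address);
            return null;
        } catch (IOException e) {
            System.err.println("Could not read " + address + ": " + e.getMessage());
            return null;
        }
    }
}
```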

Getting information from a html file

I'm writing a program that gets information from a page and puts it in an Excel file.
The problem is that I can't find a way to search for the tag containing the specific info.
Here is my code (so far):
private void getAll() throws IOException {
for (int i = 0;i<250;i++){
URL vurl = new URL("http://www.bamart.be/nl/artists/detail/" + i);
BufferedReader reader = new BufferedReader(new InputStreamReader(vurl.openStream()));
String line;
while ((line = reader.readLine()) != null){
if (line.equalsIgnoreCase("<div class=\"subcontent\">")){
System.out.println("Found info!");
}
printInfo(line,i);
}
}
}
private void printInfo(String info,int i){
System.out.println("/***********************************************/");
System.out.println("************\t" + info + "**********************/");
System.out.println("/************" +" Artist page:" + i + " of 999 **********************/" );
}
The println never fires, even though the tag is in the HTML file.

if (line.equalsIgnoreCase("<div class=\"subcontent\">")) { }
This if statement checks for exact equality (ignoring case); however, the line could contain other content as well, such as leading whitespace.
What you might want instead is something like
if (line.toLowerCase().contains("<div class=\"subcontent\">")) { }
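To make the difference concrete, here is a small sketch that can be tried without a network connection. The class name LineMatcher is hypothetical; the tag string is the one from the question.

```java
import java.util.Locale;

class LineMatcher {
    // The tag to look for, written in lower case because the line is lower-cased first.
    private static final String TAG = "<div class=\"subcontent\">";

    // True if the tag appears anywhere in the line, regardless of case or indentation.
    static boolean hasSubcontentDiv(String line) {
        return line.toLowerCase(Locale.ROOT).contains(TAG);
    }
}
```

For an indented line such as a tab followed by the tag and some text, equalsIgnoreCase returns false while hasSubcontentDiv returns true.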

Try using Jsoup starting with this example

Writing a java program to remove the comments in same java program?

I am writing a Java program that removes the comments from its own source file.
I am thinking of using a FileReader, but I'm not sure whether it will work,
because two processes would be using the same file.
But I think that before the code is executed, the Java file is compiled into a .class file.
So if I use a FileReader to edit the .java file, it should not give me an error saying that another process is already using the file.
Am I thinking about this correctly?
Thanks in advance.

Yes, you can do that without any problems.
Note: Be careful with things like:
String notAComment = "// This is not a comment";
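One way to respect that caveat is to scan the source character by character and copy string and character literals verbatim. The following is only a sketch under stated assumptions: the class name CommentStripper is made up, and it does not handle Java text blocks (""") or Unicode escapes.

```java
class CommentStripper {
    // Removes // and /* */ comments while leaving string and char literals untouched.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Copy the literal verbatim, honoring backslash escapes.
                char quote = c;
                out.append(c);
                i++;
                while (i < n) {
                    char d = src.charAt(i);
                    out.append(d);
                    i++;
                    if (d == '\\' && i < n) {
                        out.append(src.charAt(i));
                        i++;
                    } else if (d == quote) {
                        break;
                    }
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip to the end of the line, keeping the newline itself.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip past the closing */ (or to the end if it is missing).
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Whitespace around a removed comment is left in place, so a line that ended in a comment keeps its trailing space.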

If you just want to remove comments from a Java program, why don't you do a simple search and replace using a regex, and convert all comments into an empty string?
Here's a verbose way of doing it, in Java:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.BufferedReader;
class Cleaner{
public static void main( String a[] )
{
String source = readFile("source.java");
System.out.println(source.replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)",""));
}
static String readFile(String fileName) {
File file = new File(fileName);
char[] buffer = null;
try {
BufferedReader bufferedReader = new BufferedReader( new FileReader(file));
buffer = new char[(int)file.length()];
int i = 0;
int c = bufferedReader.read();
while (c != -1) {
buffer[i++] = (char)c;
c = bufferedReader.read();
}
} catch (IOException e) {
e.printStackTrace();
}
return new String(buffer);
}
}

You are right, there are not two processes using the same file: your program runs from the .class files and processes the .java files. You may want to take a closer look at this page:
Finding Comments in Source Code Using Regular Expressions

Yes, using a FileReader will work. One thing to watch out for is the file encoding, if you might have non-English characters or work across different platforms. In Eclipse and other IDEs you can change the character set of a Java source file to a different encoding. If unsure, it might be worth using:
InputStream in = ....
BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"));
..
and likewise when you are writing the output back out, use an OutputStreamWriter with UTF-8.
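A minimal sketch of that read-and-write pairing, with UTF-8 on both sides. The class name Utf8Copy is hypothetical, and the streams are passed in so that the same method works for files (FileInputStream, FileOutputStream) or in-memory data. Assumption: line endings are normalized to '\n' on output.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8Copy {
    // Copies text line by line, decoding and encoding as UTF-8 regardless of platform default.
    static void copy(InputStream in, OutputStream out) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
             BufferedWriter w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                w.write(line);
                w.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```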

Have a look at the post Remove comments from String for doing your stuff. You may use either FileReader or java.util.Scanner class to read the file.

It's late, but this may help someone remove all types of comments.
package com.example;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
class CommentRemover {
public static void main(String a[]) {
File file = new File("F:/Java Examples/Sample.java");
String fileString = readLineByLine(file);
fileString = fileString.replaceAll(
"(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)", "");
System.out.println(fileString);
}
private static String readLineByLine(File file) {
String textFile = "";
FileInputStream fstream;
try {
fstream = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(
fstream));
String strLine;
while ((strLine = br.readLine()) != null) {
textFile = textFile + replaceComments(strLine) + "\n";
}
br.close();
} catch (Exception e) {
e.printStackTrace();
}
return textFile;
}
private static String replaceComments(String strLine) {
if (strLine.startsWith("//")) {
return "";
} else if (strLine.contains("//")) {
if (strLine.contains("\"")) {
int lastIndex = strLine.lastIndexOf("\"");
int lastIndexComment = strLine.lastIndexOf("//");
if (lastIndexComment > lastIndex) { // ( "" // )
strLine = strLine.substring(0, lastIndexComment);
}
} else {
int index = strLine.lastIndexOf("//");
strLine = strLine.substring(0, index);
}
}
return strLine;
}
}

I made an open source library (CommentRemover on GitHub) for this need; it can remove single-line and multi-line Java comments.
It can either remove or keep TODOs.
It also supports JavaScript, HTML, CSS, Properties, JSP and XML comments.
A short snippet showing how to use it (there are two types of usage):
First way: InternalPath
public static void main(String[] args) throws CommentRemoverException {
// root dir is: /Users/user/Projects/MyProject
// example for startInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc.. goes like that
.removeTodos(false) // Do Not Touch Todos (leave them alone)
.removeSingleLines(true) // Remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
.setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
Second way: ExternalPath
public static void main(String[] args) throws CommentRemoverException {
// example for startExternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc..
.removeTodos(true) // Remove todos
.removeSingleLines(false) // Do not remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
.setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}

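import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
public class Copy {
// Copies inputFilePath to outputFilePath, dropping lines that consist only of a one-line /* ... */ comment.
void RemoveComments(String inputFilePath, String outputFilePath) throws IOException {
File in = new File(inputFilePath);
File out = new File(outputFilePath);
// try-with-resources closes both files even if an exception is thrown
try (BufferedReader bufferedreader = new BufferedReader(new FileReader(in));
PrintWriter pw = new PrintWriter(new FileWriter(out))) {
String line;
while ((line = bufferedreader.readLine()) != null) {
String trimmed = line.trim();
// skip lines that are nothing but a one-line block comment (indentation is ignored)
if (trimmed.startsWith("/*") && trimmed.endsWith("*/")) {
continue;
}
pw.println(line);
}
}
}
}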

How to read a text file directly from Internet using Java?

I am trying to read some words from an online text file.
I tried doing something like this
File file = new File("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner scan = new Scanner(file);
but it didn't work. I am getting
http://www.puzzlers.org/pub/wordlists/pocket.txt
as the output and I just want to get all the words.
I know they taught me this back in the day but I don't remember exactly how to do it now, any help is greatly appreciated.

Use a URL instead of a File for any access that is not on your local computer.
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner s = new Scanner(url.openStream());
Actually, URL is even more generally useful: it also works for local access (use a file: URL), jar files, and just about everything else that can be retrieved somehow.
The approach above interprets the file in your platform's default encoding. If you want to use the encoding indicated by the server instead, you have to use a URLConnection and parse its content type, as indicated in the answers to this question.
About your error: make sure your file compiles without any errors; you need to handle the exceptions. Click the red messages shown by your IDE, and it should recommend how to fix them. Do not start a program that does not compile (even if the IDE allows it).
Here it is with some sample exception handling:
try {
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner s = new Scanner(url.openStream());
// read from your scanner
}
catch(IOException ex) {
// there was some connection problem, or the file did not exist on the server,
// or your URL was not in the right format.
// think about what to do now, and put it here.
ex.printStackTrace(); // for now, simply output it.
}
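For the point about using the encoding indicated by the server, the parsing step might look roughly like the sketch below. The class name ContentTypeCharset and the fallback parameter are assumptions for illustration; the header value itself would come from URLConnection.getContentType().

```java
import java.nio.charset.Charset;
import java.util.Locale;

class ContentTypeCharset {
    // Extracts the charset from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1"; returns the fallback if none is usable.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

The returned Charset can then be passed to new Scanner(connection.getInputStream(), charset.name()) or to an InputStreamReader.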

Try something like this:
URL u = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
InputStream in = u.openStream();
Then use it like any plain old input stream.

What really worked for me (source: Oracle documentation, "reading url"):
import java.net.*;
import java.io.*;
public class UrlTextfile {
public static void main(String[] args) throws Exception {
URL oracle = new URL("http://yoursite.com/yourfile.txt");
BufferedReader in = new BufferedReader(
new InputStreamReader(oracle.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}
}

Using Apache Commons IO:
import org.apache.commons.io.IOUtils;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
public static String readURLToString(String url) throws IOException
{
try (InputStream inputStream = new URL(url).openStream())
{
return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
}
}

Use this code to read an Internet resource into a String:
public static String readToString(String targetURL) throws IOException
{
URL url = new URL(targetURL);
BufferedReader bufferedReader = new BufferedReader(
new InputStreamReader(url.openStream()));
StringBuilder stringBuilder = new StringBuilder();
String inputLine;
while ((inputLine = bufferedReader.readLine()) != null)
{
stringBuilder.append(inputLine);
stringBuilder.append(System.lineSeparator());
}
bufferedReader.close();
return stringBuilder.toString().trim();
}
This is based on here.

For an old school input stream, use this code:
InputStream in = new URL("http://google.com/").openConnection().getInputStream();

I did that in the following way for an image, you should be able to do it for text using similar steps.
// folder & name of image on PC
File fileObj = new File("C:\\Displayable\\imgcopy.jpg");
Boolean testB = fileObj.createNewFile();
System.out.println("Test this file eeeeeeeeeeeeeeeeeeee "+testB);
// image on server
URL url = new URL("http://localhost:8181/POPTEST2/imgone.jpg");
InputStream webIS = url.openStream();
FileOutputStream fo = new FileOutputStream(fileObj);
int c = 0;
do {
c = webIS.read();
System.out.println("==============> " + c);
if (c !=-1) {
fo.write((byte) c);
}
} while(c != -1);
webIS.close();
fo.close();
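The loop above reads and writes one byte at a time. A buffered variant of the same idea is sketched below; the class name StreamCopy and the 8192-byte buffer size are arbitrary choices for illustration. It works with any InputStream, including the one returned by url.openStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class StreamCopy {
    // Copies all bytes from in to out in chunks and returns the number of bytes copied.
    // The caller remains responsible for closing both streams.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```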

Alternatively, you can use Guava's Resources object:
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
List<String> lines = Resources.readLines(url, Charsets.UTF_8);
lines.forEach(System.out::println);

The corrected method is deprecated now. It suggests this option instead:
private WeakReference<MyActivity> activityReference;
The solution here will be useful.
