Regular expressions in Java, can't search all HTML

I'm working with Java regular expressions on the Android platform.
I'm trying to search this HTML with a defined regular expression.
Here's my code:
public void mainaaForWWW(String websiteSource) {
    try {
        websiteSource = readDataFromWWW(websiteSource);
    } catch (IOException e1) {
        e1.printStackTrace();
    }
    ArrayList<String> cinemaArray = new ArrayList<String>();
    Pattern sample = Pattern.compile("<div class=\"theatre\">");
    Matcher secuence = sample.matcher(websiteSource);
    try {
        // collect every match of the pattern in the page source
        while (secuence.find()) {
            cinemaArray.add(secuence.group());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    titleTableForWWW = new String[cinemaArray.size()];
    for (int i = 0; i < titleTableForWWW.length; i++)
        titleTableForWWW[i] = cinemaArray.get(i);
}
The problem is quite strange: when I debug the code, the String websiteSource looks fine (the whole HTML file is loaded), but the while loop only runs 4 times, whereas I manually found 11 matches in the HTML document. The regex is deliberately simplified, just to find out what's going on. Any ideas?
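One way to tell whether the regex or the input is at fault is to run the same find() loop on a string whose match count is already known. A minimal sketch (the class name MatchCounter and the sample HTML are illustrative, not from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCounter {
    // Collects every match of the pattern in the input, in order of appearance.
    static List<String> findAll(Pattern pattern, CharSequence input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern theatre = Pattern.compile("<div class=\"theatre\">");
        String html = "<div class=\"theatre\">A</div>\n"
                + "<div class=\"theatre\">B</div>\n"
                + "<div class=\"theatre\">C</div>\n";
        // If the count is right here, the pattern is fine and the input string is the suspect.
        System.out.println(findAll(theatre, html).size());
    }
}
```

If the known string gives the expected count but the downloaded page does not, the next thing to check is how the page was read into the String.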
OK, my bad. I found a solution.
Here's my original code for reading the HTML source into a String:
public String readDataFromWWW(String UrlAdress) throws IOException
{
    String line = null; // starts as null, so the result begins with the text "null"
    URL url = new URL(UrlAdress);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "ISO-8859-2"));
    // note: readLine() is called twice per iteration (condition and body)
    while (rd.readLine() != null) {
        line += rd.readLine();
    }
    System.out.println(line);
    return line;
}
I think reading into a String that way messed something up, so I replaced that method with this one:
public String readDataFromWWW(String UrlAdress) throws IOException
{
    String wyraz = "";
    try {
        String webPage = UrlAdress;
        URL url = new URL(webPage);
        URLConnection urlConnection = url.openConnection();
        InputStream is = urlConnection.getInputStream();
        InputStreamReader isr = new InputStreamReader(is, "ISO-8859-2");
        int numCharsRead;
        char[] charArray = new char[1024];
        StringBuffer sb = new StringBuffer();
        // copy the stream in 1024-char chunks; every char, including newlines, is kept
        while ((numCharsRead = isr.read(charArray)) > 0) {
            sb.append(charArray, 0, numCharsRead);
        }
        isr.close();
        wyraz = sb.toString();
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return wyraz;
}
And everything works fine! Thanks a lot for the clues and help. I think the problem was connected with newlines while building the String, but I'm not quite sure.
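A likely explanation for the original behaviour: the first version calls rd.readLine() twice per loop iteration, once in the while condition and once in the body, so every other line is read and thrown away. Any <div class="theatre"> that sits on a discarded line never reaches the regex. A line-based read works if readLine() is called exactly once per iteration. Sketch below; the helper name readAll is hypothetical, and it takes any Reader so it can be tried without a network:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAll {
    // Reads every line exactly once. readLine() strips the line separator, so '\n' is re-added.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(source)) {
            String line;
            while ((line = rd.readLine()) != null) { // the only readLine() call per iteration
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAll(new StringReader("line 1\nline 2\nline 3\n")));
    }
}
```

With the connection from the question, the call would be readAll(new InputStreamReader(conn.getInputStream(), "ISO-8859-2")). The char-buffer version above avoids the issue in a different way: it never splits the input into lines at all.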

Related

G suite account get report java sample question

I am trying to use this API to get a report with Java. Here is the link:
https://developers.google.com/admin-sdk/reports/v1/appendix/activity/meet
and here is what I am using now:
public static String getGraph() {
    String PROTECTED_RESOURCE_URL = "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/meet?eventName=call_ended&maxResults=10&access_token=";
    String graph = "";
    try {
        // note: this appends the literal text "access_token", not an actual token value
        URL urUserInfo = new URL(PROTECTED_RESOURCE_URL + "access_token");
        HttpURLConnection connObtainUserInfo = (HttpURLConnection) urUserInfo.openConnection();
        if (connObtainUserInfo.getResponseCode() == HttpURLConnection.HTTP_OK) {
            StringBuilder sbLines = new StringBuilder("");
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connObtainUserInfo.getInputStream(), "utf-8"));
            String strLine = "";
            while ((strLine = reader.readLine()) != null) {
                sbLines.append(strLine);
            }
            graph = sbLines.toString();
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    return graph;
}
I am pretty sure this isn't a smart way to do it, and the string I get back is quite complex. Is there any Java sample that gets the data directly instead of using a raw Java HTTP request?
Or is there a class I can import to convert the JSON string into an object?
Can anyone help?
I have been trying this for many days already!
Thanks!
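One concrete problem in the code above is that PROTECTED_RESOURCE_URL + "access_token" appends the literal word access_token rather than a real token. A minimal sketch, assuming a valid OAuth 2.0 access token has already been obtained elsewhere (that step is not shown): it URL-encodes the query parameters and sends the token in an Authorization: Bearer header. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class ReportsRequest {
    static final String BASE =
        "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/meet";

    // Builds base?k1=v1&k2=v2, URL-encoding every key and value (Java 10+ overload).
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    // Sends a GET with the token as a Bearer header and returns the raw JSON body.
    static String get(String url, String accessToken) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + accessToken);
        int code = conn.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + code);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("eventName", "call_ended");
        params.put("maxResults", "10");
        System.out.println(buildUrl(BASE, params));
        // String json = get(buildUrl(BASE, params), "YOUR_ACCESS_TOKEN"); // placeholder token
    }
}
```

This still returns the JSON as a String; turning it into Java objects is usually done with a JSON library such as Gson or Jackson.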

Parsing HTML page: difference in page content between Java code and browser

URL: https://www.bing.com/search?q=vevo+USIV30300367
If I view the source of the above URL (in Internet Explorer 11, for that matter), the substring for the first search result is:
"[h2][a href="https://www.vevo.com/watch/rush/tom-sawyer-(live-exit-stage-left-version)/USIV30300367" h="ID=SERP,5075.1"]Tom [strong]Sawyer (Live Exit Stage Left Version[/strong]) - Rush - [strong]Vevo[/strong][/a][/h2]"
Whereas via Java code, I get this:
"[h2][a href="https://www.vevo.com/watch/rush/tom-sawyer-(live-exit-stage-left-version)/USIV30300367" h="ID=SERP,5077.1"][span dir="ltr"]Tom [strong]Sawyer (Live Exit Stage Left Version[/strong]) - …[/span][/a][/h2]"
The formatting is a bit different (check the [span] tags), but even worse, the video title has been truncated in the search result string (i.e. "Rush - Vevo" became "...").
Why is that? How to fix it?
(NOTE: I am using "[" and "]" in this post as replacements for the original HTML tagging delimiters to avoid my strings being formatted here on SO.)
Below is my Java code:
private String getWebPage(String pageURL, UserAgentBrowser uab)
{
    URL url = null;
    InputStream is = null;
    BufferedReader br = null;
    URLConnection conn = null;
    StringBuilder pagedata = new StringBuilder();
    String contenttype = null, charset = "utf-8";
    String line = null;
    try {
        url = new URL(pageURL);
        conn = url.openConnection();
        conn.addRequestProperty("User-Agent", uab.toString());
        contenttype = conn.getContentType();
        int indexL = contenttype.indexOf("charset=") + 8;
        if (indexL > 7) {
            int indexR = contenttype.indexOf(";", indexL);
            charset = (indexR == -1 ? contenttype.substring(indexL) : contenttype.substring(indexL, indexR));
        }
        is = conn.getInputStream(); // Could throw an IOException
        br = new BufferedReader(new InputStreamReader(is, charset));
        while (true) {
            line = br.readLine();
            if (line == null) break;
            pagedata.append(line);
        }
    } catch (MalformedURLException mue) {
        // mue.printStackTrace();
    } catch (IOException ioe) {
        // ioe.printStackTrace();
    } finally {
        try {
            if (is != null) is.close();
        } catch (IOException ioe) {
            // Nothing to see here
        }
    }
    return (pagedata.length() == 0 ? null : pagedata.toString());
}
And I call it like this:
String pagedata = getWebPage("https://www.bing.com/search?q=vevo+USIV30300367", UserAgentBrowser.INTERNET_EXPLORER);
Where UserAgentBrowser.INTERNET_EXPLORER.toString() equals:
"Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
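This does not explain why Bing sends different markup (that is decided on the server side), but one fragile spot in getWebPage is the charset parsing: getContentType() can return null, which makes contenttype.indexOf throw a NullPointerException, and the charset parameter may be quoted or written as "Charset=". A minimal sketch of a more tolerant parser; the class name ContentTypes and the fallback argument are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypes {
    // Extracts the charset parameter from a Content-Type value, falling back to a default.
    // Tolerates a null header, case differences in "charset=", spaces and surrounding quotes.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) { // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf(null, StandardCharsets.UTF_8));
    }
}
```

In getWebPage, the result could be passed straight to new InputStreamReader(is, charsetOf(conn.getContentType(), StandardCharsets.UTF_8)).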

How to read another remote url android if the first url is not possible

I have this code and it works well for reading a remote file, but I wonder how I could read a second URL if the first one fails.
That is, I read the file at the first URL; if it is available, fine, I continue.
If the first URL cannot be read, then the second URL is accessed.
How can I add a second "backup" URL?
Thanks.
// Code
try {
    // Create a URL for the desired page
    URL url = new URL("http://myurl.com/archive.txt");
    // Read the first four lines returned by the server
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    network1 = in.readLine();
    network2 = in.readLine();
    network3 = in.readLine();
    network4 = in.readLine();
    in.close();
} catch (MalformedURLException e) {
} catch (IOException e) {
}
// Code
}

Use something like this:
String[] readUrl(String urlStr) throws IOException {
    URL url = new URL(urlStr);
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String[] result = new String[4];
        for (int i = 0; i < 4; i++) {
            result[i] = in.readLine();
        }
        return result;
    }
}
String[] tryMultipleUrls(String url1, String url2) {
    try {
        return readUrl(url1);
    } catch (IOException ex) {
        try {
            return readUrl(url2); // first URL failed, fall back to the backup
        } catch (IOException ex2) {
            return null; // both URLs failed
        }
    }
}
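The same idea generalizes to any number of backup URLs: try each one in order and return the first result that does not throw. A minimal sketch; the LineSource interface and firstAvailable are hypothetical names, introduced so the fallback logic can be exercised without a network, and readUrl assumes the files are UTF-8:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;

class FallbackReader {
    // Anything that can fetch the first n lines from a location.
    interface LineSource {
        String[] read(String location, int n) throws IOException;
    }

    // Network implementation: reads the first n lines of a URL (UTF-8 is an assumption).
    static String[] readUrl(String location, int n) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(location).openStream(), StandardCharsets.UTF_8))) {
            String[] result = new String[n];
            for (int i = 0; i < n; i++) {
                result[i] = in.readLine();
            }
            return result;
        }
    }

    // Tries each location in order; returns the first success or rethrows the last failure.
    static String[] firstAvailable(List<String> locations, int n, LineSource source)
            throws IOException {
        IOException last = null;
        for (String location : locations) {
            try {
                return source.read(location, n);
            } catch (IOException e) { // includes MalformedURLException
                last = e;             // remember the failure and try the next location
            }
        }
        throw (last != null) ? last : new IOException("no locations given");
    }
}
```

Usage would look like firstAvailable(List.of(url1, url2), 4, FallbackReader::readUrl). Rethrowing the last exception, rather than returning null, lets the caller see why every URL failed.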

Extract some contents from the url using regular expressions in java

I want to extract some content from this URL, http://www.xyz.com/default.aspx, using a regular expression. Here is my code:
String expr = "..."; // What regular expression should I use here?
Pattern patt = Pattern.compile(expr, Pattern.DOTALL | Pattern.UNIX_LINES);
URL url4 = null;
try {
    url4 = new URL("http://www.xyz.com/default.aspx");
} catch (MalformedURLException e) {
    e.printStackTrace();
}
System.out.println("Text" + url4);
Matcher m = null;
try {
    m = patt.matcher(getURLContent(url4));
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Match" + m);
while (m.find()) {
    // group(1) requires the pattern to contain at least one capturing group
    String stateURL = m.group(1);
    System.out.println("Some Data" + stateURL);
}
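public static CharSequence getURLContent(URL url8) throws IOException {
    URLConnection conn = url8.openConnection();
    // getContentEncoding() returns the Content-Encoding header (e.g. gzip), not the
    // character set, so take the charset from the Content-Type header instead
    String encoding = "ISO-8859-1";
    String contentType = conn.getContentType();
    int idx = (contentType == null) ? -1 : contentType.toLowerCase().indexOf("charset=");
    if (idx >= 0) {
        encoding = contentType.substring(idx + 8).split(";")[0].trim();
    }
    StringBuilder sb = new StringBuilder(16384);
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), encoding))) {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
            System.out.println(line);
            sb.append('\n');
        }
    }
    return sb;
}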
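In the code above, m.group(1) only works if the pattern contains a capturing group; with no group it throws IndexOutOfBoundsException. As a hedged illustration of the regex route (it is brittle on real HTML, which is why a parser is generally preferred), here is a sketch that captures the text of every <h2> element. The class name and sample markup are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtract {
    // (.*?) is capturing group 1; the reluctant quantifier stops at the first </h2>.
    // DOTALL lets '.' match newlines, so a heading split across lines still matches.
    private static final Pattern H2 =
        Pattern.compile("<h2[^>]*>(.*?)</h2>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static List<String> headings(CharSequence html) {
        List<String> out = new ArrayList<>();
        Matcher m = H2.matcher(html);
        while (m.find()) {
            out.add(m.group(1).trim()); // group(1) is the text between the tags
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<div><h2>Services</h2><p>one</p>\n<h2 class=\"x\">\nAbout\n</h2></div>";
        System.out.println(headings(html));
    }
}
```

Without the parentheses in the pattern, the same loop would have to use m.group() (the whole match) instead of m.group(1).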

As @bkent314 mentioned, jsoup is a better and cleaner approach than regular expressions.
If you inspect the source code of that website, you basically want the content of this snippet:
<div class="smallHd_contentTd">
<div class="breadcrumb">...</div>
<h2>Services</h2>
<p>...</p>
<p>...</p>
<p>...</p>
</div>
Using jsoup, your code may look something like this:
Document doc = Jsoup.connect("http://www.ferotech.com/Services/default.aspx").get();
Element content = doc.select("div.smallHd_contentTd").first();
String header = content.select("h2").first().text();
System.out.println(header);
for (Element pTag : content.select("p")) {
System.out.println(pTag.text());
}
Hope this helps.

Return the text of a file as a string? [duplicate]

Possible Duplicate:
How to create a Java String from the contents of a file
Is it possible to process a multi-line text file and return its contents as a string?
If so, please show me how.
For more context: I'm playing around with I/O. I want to open a text file, process its contents, return them as a String, and set the contents of a text area to that string.
Kind of like a text editor.

Use Apache Commons IO's FileUtils.readFileToString.

Check the Java tutorial here:
http://download.oracle.com/javase/tutorial/essential/io/file.html
Path file = ...;
StringBuilder cBuf = new StringBuilder();
try (InputStream in = Files.newInputStream(file);
     BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
    String line = null;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
        cBuf.append(line);
        cBuf.append("\n");
    }
} catch (IOException x) {
    System.err.println(x);
}
// cBuf.toString() will contain the entire file contents
return cBuf.toString();

Something along the lines of:
String result = "";
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        // Here's where you get the lines from your file
        sb.append(line).append("\n");
    }
    result = sb.toString();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
return result;

String data = "";
try (BufferedReader in = new BufferedReader(new FileReader(new File("some_file.txt")))) {
    StringBuilder string = new StringBuilder();
    for (String line = in.readLine(); line != null; line = in.readLine())
        string.append(line).append("\n");
    data = string.toString();
}
catch (IOException ioe) {
    System.err.println("Oops: " + ioe.getMessage());
}
Just remember to import java.io.* first.
This normalizes every line separator in the file to \n, because readLine() strips the original separator and there is no way to tell which one was used.
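On Java 7 and later, the standard library can read a whole file without an explicit loop, and because it never splits into lines, the original line separators are kept. A minimal sketch, assuming the file is UTF-8 (the class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileText {
    // Reads the entire file as one String; "\r\n" and "\n" are preserved exactly as stored.
    static String readText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

A call would look like readText(Paths.get("some_file.txt")), whose result can go straight into the text area. On Java 11+, Files.readString(path) does the same in one call and also defaults to UTF-8.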
