Jsoup Elements maxes at 20 entries - java

The size of my Jsoup Elements seems to be capped at 20, no matter what I do.
The purpose is to:
Pull the specified page
Select the elements I am interested in (timestamps and texts atm)
Combine the selected elements in a new list and print.
But somehow only the first 20 entries are included (compare the console output with the actual page).
Can someone give me any hint on where I am lacking the deeper understanding?
Thank you very much and best wishes,
kw
Console output:
22:59
Mein lieber Arbeitskollege hat mich nach Feierabend noch im Studio besucht...
23:02
Und???
23:04
Ich sag nur Personalküche! :D
23:05
Fühl dich gehighfived! ✋:D
10:30
Haha ich hab eben beim REWE einer Frau mit 2 kleinen Kindern im Wagen 5 Tüten Sticker geschenkt die ich an der Kasse bekommen hab. Die werden sich jetzt den ganzen Tag über das letzte Päckchen streiten. Ich bin so ein teuflisches Genie! 😃😈😈
09:04
Ihr Dorfis könnt ja doch ganz schön gut Party machen
09:55
...und wir wissen das Kühe nicht Lila sind!
00:13
Mein Bett ist viel zu groß um allein drin zu liegen..
00:15
Meins auch
00:16
Wir sind wie die Arschlöcher, die allein mit ihren dicken Autos rumfahren ohne Fahrgemeinschaften zu gründen.
00:20
Bettgemeinschaft?
Code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class SmsGetter
{
    public static void main(String[] args) throws IOException
    {
        String url = "http://www.smsvongesternnacht.de";
        Document doc = Jsoup.connect(url).timeout(30000).get();
        Elements timestamps = doc.select(".sms-tag");
        Elements texts = doc.select(".sms-bubble");
        Elements sms = new Elements(400);
        for(int i=0; i<timestamps.size(); i+=2)
        {
            sms.add(i, timestamps.get(i/2));
            sms.add(i+1, texts.get(i/2));
        }
        for (Element entries:sms)
        {
            System.out.format(" %s", entries.text());
            System.out.println();
        }
    }
}
Edit: Inserted missing line.

I would have selected your elements by the sms-participant class. Having two selects from the document forces you to write that weird for loop, in which you assume that the sms-tag and sms-bubble selections are the same size. That being said, I looked at the page and there are only twenty sms-participant entries shown.
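As a side note on the loop in the question: it advances i by 2 but stops at timestamps.size(), so it appears to visit only about half of the timestamp/text pairs. A minimal sketch of a pairwise merge follows. It uses plain string lists as stand-ins for the Jsoup Elements, and the names PairMerge and interleave are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairMerge {
    // Interleaves two lists as a0, b0, a1, b1, ... and stops at the shorter one,
    // so a size mismatch cannot cause an IndexOutOfBoundsException.
    public static <T> List<T> interleave(List<T> first, List<T> second) {
        int pairs = Math.min(first.size(), second.size());
        List<T> merged = new ArrayList<>(pairs * 2);
        for (int i = 0; i < pairs; i++) {
            merged.add(first.get(i));
            merged.add(second.get(i));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in data; in the question these would come from doc.select(...).
        List<String> timestamps = Arrays.asList("22:59", "23:02", "23:04");
        List<String> texts = Arrays.asList("first text", "second text", "third text");
        for (String entry : interleave(timestamps, texts)) {
            System.out.println(entry);
        }
    }
}
```

Stepping i by 1 and appending both items per iteration keeps the index arithmetic out of the loop body.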

Related

Swing doesn't open the window, even when the code works fine on terminal

I wanted to create a program that generates a random number that the player has to guess. That works great. Next, I wanted to run it in a window instead of the terminal in order to learn GUI programming with Swing, so I added some Swing code.
But here is the issue: the code still works fine in the terminal, but it doesn't open a window like I want.
First, I tried to create a second class for the GUI part, but it didn't work, so I grouped everything in the main method. It works in the terminal but doesn't open the window.
Any help or tips, please?
/**
 * Author: Andres "VongoSanDi" Boulanger
 */
import java.util.Scanner;
import javax.swing.*;
import java.awt.*;
import javax.swing.JPanel;
import javax.swing.BorderFactory;
import java.awt.GridBagConstraints;
import java.awt.GridBagLayout;
import java.awt.Insets;
import javax.swing.JTextField;

public class ChiffreMystere {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int numeroMin = 0, numeroMax = 0, entreeUtilisateur = 0;
        char rejouer = 'O';
        System.out.println("Bienvenue au jeu du chiffre mystère\n");
        do {
            System.out.println("Veuillez choisir un minimum et un maximum pour le jeu.");
            System.out.print("Minimum : ");
            numeroMin = sc.nextInt();
            System.out.print("Maximum : ");
            numeroMax = sc.nextInt();
            int chiffreMystere = (int)(Math.random() * ((numeroMax - numeroMin) + 1)) + numeroMin;
            System.out.print("\nSelon vous, quel est le chiffre mystère : ");
            while(entreeUtilisateur != chiffreMystere) {
                entreeUtilisateur = sc.nextInt();
                if (entreeUtilisateur != chiffreMystere) {
                    System.out.println("\nCe n'est pas le bon chiffre.");
                    System.out.print("Quel est le chiffre mystère ? : ");
                }
            }
            System.out.println("\nBravo, vous avez trouvé le chiffre mystère, qui était le : "+chiffreMystere);
            System.out.print("Voulez-vous rejouer ? (O/N) : ");
            rejouer = sc.next().charAt(0);
        } while(rejouer == 'O');
        JFrame frame = new JFrame("Chiffre mystère");
        frame.setVisible(true);
        frame.setSize(100,200);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        sc.close();
    }
}
Ultimately, I would like to create a second class only for the GUI part and be able to call it from the main method.

Can not identify text in Spanish with Lingpipe

For the past few days I have been developing a Java server that stores a collection of data and identifies its language, and I decided to use LingPipe for that task. However, I am facing an issue: after training the classifier and evaluating it with two languages (English and Spanish), it cannot identify Spanish text, although I got successful results with English and French.
The tutorial that I have followed in order to complete this task is:
http://alias-i.com/lingpipe/demos/tutorial/langid/read-me.html
These are the steps I followed to complete the task:
Steps followed to train a Language Classifier
~1. First, place and unpack the English and Spanish metadata inside a folder named leipzig, as follows (Note: metadata and sentences are provided by http://wortschatz.uni-leipzig.de/en/download):
leipzig //Main folder
    1M sentences //Folder with data of the last trial
        eng_news_2015_1M
        eng_news_2015_1M.tar.gz
        spa-hn_web_2015_1M
        spa-hn_web_2015_1M.tar.gz
    ClassifyLang.java //Custom program to try the trained code
    dist //Folder
        eng_news_2015_300K.tar.gz //unpackaged english sentences
        spa-hn_web_2015_300K.tar.gz //unpackaged spanish sentences
    EvalLanguageId.java
    langid-leipzig.classifier //trained code
    lingpipe-4.1.2.jar
    munged //Folder
        eng //folder containing the sentences.txt for english
            sentences.txt
        spa //folder containing the sentences.txt for spanish
            sentences.txt
    Munge.java
    TrainLanguageId.java
    unpacked //Folder
        eng_news_2015_300K //Folder with the english metadata
            eng_news_2015_300K-co_n.txt
            eng_news_2015_300K-co_s.txt
            eng_news_2015_300K-import.sql
            eng_news_2015_300K-inv_so.txt
            eng_news_2015_300K-inv_w.txt
            eng_news_2015_300K-sources.txt
            eng_news_2015_300K-words.txt
            sentences.txt
        spa-hn_web_2015_300K //Folder with the spanish metadata
            sentences.txt
            spa-hn_web_2015_300K-co_n.txt
            spa-hn_web_2015_300K-co_s.txt
            spa-hn_web_2015_300K-import.sql
            spa-hn_web_2015_300K-inv_so.txt
            spa-hn_web_2015_300K-inv_w.txt
            spa-hn_web_2015_300K-sources.txt
            spa-hn_web_2015_300K-words.txt
~2. Second, unpack the compressed language metadata into the unpacked folder:
unpacked //Folder
    eng_news_2015_300K //Folder with the english metadata
        eng_news_2015_300K-co_n.txt
        eng_news_2015_300K-co_s.txt
        eng_news_2015_300K-import.sql
        eng_news_2015_300K-inv_so.txt
        eng_news_2015_300K-inv_w.txt
        eng_news_2015_300K-sources.txt
        eng_news_2015_300K-words.txt
        sentences.txt
    spa-hn_web_2015_300K //Folder with the spanish metadata
        sentences.txt
        spa-hn_web_2015_300K-co_n.txt
        spa-hn_web_2015_300K-co_s.txt
        spa-hn_web_2015_300K-import.sql
        spa-hn_web_2015_300K-inv_so.txt
        spa-hn_web_2015_300K-inv_w.txt
        spa-hn_web_2015_300K-sources.txt
        spa-hn_web_2015_300K-words.txt
~3. Then munge the sentences of each language to remove the line numbers and tabs and to replace line breaks with single space characters. The output is uniformly written using the UTF-8 Unicode encoding (Note: Munge.java is taken from the LingPipe site).
/-----------------Command line----------------------------------------------/
javac -cp lingpipe-4.1.2.jar: Munge.java
java -cp lingpipe-4.1.2.jar: Munge /home/samuel/leipzig/unpacked /home/samuel/leipzig/munged
----------------------------------------Results-----------------------------
spa
reading from=/home/samuel/leipzig/unpacked/spa-hn_web_2015_300K/sentences.txt charset=iso-8859-1
writing to=/home/samuel/leipzig/munged/spa/spa.txt charset=utf-8
total length=43267166
eng
reading from=/home/samuel/leipzig/unpacked/eng_news_2015_300K/sentences.txt charset=iso-8859-1
writing to=/home/samuel/leipzig/munged/eng/eng.txt charset=utf-8
total length=35847257
/---------------------------------------------------------------/
<---------------------------------Folder------------------------------------->
munged //Folder
    eng //folder containing the sentences.txt for english
        sentences.txt
    spa //folder containing the sentences.txt for spanish
        sentences.txt
<-------------------------------------------------------------------------->
~4. Next, we train the language classifier (Note: TrainLanguageId.java is taken from the LingPipe language ID tutorial).
/---------------Command line--------------------------------------------/
javac -cp lingpipe-4.1.2.jar: TrainLanguageId.java
java -cp lingpipe-4.1.2.jar: TrainLanguageId /home/samuel/leipzig/munged /home/samuel/leipzig/langid-leipzig.classifier 100000 5
-----------------------------------Results-----------------------------------
nGram=100000 numChars=5
Training category=eng
Training category=spa
Compiling model to file=/home/samuel/leipzig/langid-leipzig.classifier
/----------------------------------------------------------------------------/
~5. We evaluated the trained classifier and got the following result, which shows a problem in the confusion matrix: every sample is classified as eng (Note: EvalLanguageId.java is taken from the LingPipe language ID tutorial).
/------------------------Command line---------------------------------/
javac -cp lingpipe-4.1.2.jar: EvalLanguageId.java
java -cp lingpipe-4.1.2.jar: EvalLanguageId /home/samuel/leipzig/munged /home/samuel/leipzig/langid-leipzig.classifier 100000 50 1000
-------------------------------Results-------------------------------------
Reading classifier from file=/home/samuel/leipzig/langid-leipzig.classifier
Evaluating category=eng
Evaluating category=spa
TEST RESULTS
BASE CLASSIFIER EVALUATION
Categories=[eng, spa]
Total Count=2000
Total Correct=1000
Total Accuracy=0.5
95% Confidence Interval=0.5 +/- 0.02191346617949794
Confusion Matrix
reference \ response
,eng,spa
eng,1000,0 <---------- not diagonal sampling
spa,1000,0
Macro-averaged Precision=NaN
Macro-averaged Recall=0.5
Macro-averaged F=NaN
Micro-averaged Results
the following symmetries are expected:
TP=TN, FN=FP
PosRef=PosResp=NegRef=NegResp
Acc=Prec=Rec=F
Total=4000
True Positive=1000
False Negative=1000
False Positive=1000
True Negative=1000
Positive Reference=2000
Positive Response=2000
Negative Reference=2000
Negative Response=2000
Accuracy=0.5
Recall=0.5
Precision=0.5
Rejection Recall=0.5
Rejection Precision=0.5
F(1)=0.5
Fowlkes-Mallows=2000.0
Jaccard Coefficient=0.3333333333333333
Yule's Q=0.0
Yule's Y=0.0
Reference Likelihood=0.5
Response Likelihood=0.5
Random Accuracy=0.5
Random Accuracy Unbiased=0.5
kappa=0.0
kappa Unbiased=0.0
kappa No Prevalence=0.0
chi Squared=0.0
phi Squared=0.0
Accuracy Deviation=0.007905694150420948
Random Accuracy=0.5
Random Accuracy Unbiased=0.625
kappa=0.0
kappa Unbiased=-0.3333333333333333
kappa No Prevalence =0.0
Reference Entropy=1.0
Response Entropy=NaN
Cross Entropy=Infinity
Joint Entropy=1.0
Conditional Entropy=0.0
Mutual Information=0.0
Kullback-Liebler Divergence=Infinity
chi Squared=NaN
chi-Squared Degrees of Freedom=1
phi Squared=NaN
Cramer's V=NaN
lambda A=0.0
lambda B=NaN
ONE VERSUS ALL EVALUATIONS BY CATEGORY
CATEGORY[0]=eng VERSUS ALL
First-Best Precision/Recall Evaluation
Total=2000
True Positive=1000
False Negative=0
False Positive=1000
True Negative=0
Positive Reference=1000
Positive Response=2000
Negative Reference=1000
Negative Response=0
Accuracy=0.5
Recall=1.0
Precision=0.5
Rejection Recall=0.0
Rejection Precision=NaN
F(1)=0.6666666666666666
Fowlkes-Mallows=1414.2135623730949
Jaccard Coefficient=0.5
Yule's Q=NaN
Yule's Y=NaN
Reference Likelihood=0.5
Response Likelihood=1.0
Random Accuracy=0.5
Random Accuracy Unbiased=0.625
kappa=0.0
kappa Unbiased=-0.3333333333333333
kappa No Prevalence=0.0
chi Squared=NaN
phi Squared=NaN
Accuracy Deviation=0.011180339887498949
CATEGORY[1]=spa VERSUS ALL
First-Best Precision/Recall Evaluation
Total=2000
True Positive=0
False Negative=1000
False Positive=0
True Negative=1000
Positive Reference=1000
Positive Response=0
Negative Reference=1000
Negative Response=2000
Accuracy=0.5
Recall=0.0
Precision=NaN
Rejection Recall=1.0
Rejection Precision=0.5
F(1)=NaN
Fowlkes-Mallows=NaN
Jaccard Coefficient=0.0
Yule's Q=NaN
Yule's Y=NaN
Reference Likelihood=0.5
Response Likelihood=0.0
Random Accuracy=0.5
Random Accuracy Unbiased=0.625
kappa=0.0
kappa Unbiased=-0.3333333333333333
kappa No Prevalence=0.0
chi Squared=NaN
phi Squared=NaN
Accuracy Deviation=0.011180339887498949
/-----------------------------------------------------------------------/
~6. Then we tried a real classification with Spanish text:
/-------------------Command line----------------------------------/
javac -cp lingpipe-4.1.2.jar: ClassifyLang.java
java -cp lingpipe-4.1.2.jar: ClassifyLang
/-------------------------------------------------------------------------/
<---------------------------------Result------------------------------------>
Text: Yo soy una persona increíble y muy inteligente, me admiro a mi mismo lo que me hace sentir ansiedad de lo que viene, por que es algo grandioso lleno de cosas buenas y de ahora en adelante estaré enfocado y optimista aunque tengo que aclarar que no lo haré por querer algo, sino por que es mi pasión.
Best Language: eng <------------- Wrong Result
<----------------------------------------------------------------------->
Code for ClassifyLang.java:
import com.aliasi.classify.Classification;
import com.aliasi.classify.Classified;
import com.aliasi.classify.ConfusionMatrix;
import com.aliasi.classify.DynamicLMClassifier;
import com.aliasi.classify.JointClassification;
import com.aliasi.classify.JointClassifier;
import com.aliasi.classify.JointClassifierEvaluator;
import com.aliasi.classify.LMClassifier;
import com.aliasi.lm.NGramProcessLM;
import com.aliasi.util.AbstractExternalizable;
import java.io.File;
import java.io.IOException;
import com.aliasi.util.Files;

public class ClassifyLang {
    public static String text = "Yo soy una persona increíble y muy inteligente, me admiro a mi mismo"
        + " estoy ansioso de lo que viene, por que es algo grandioso lleno de cosas buenas"
        + " y de ahora en adelante estaré enfocado y optimista"
        + " aunque tengo que aclarar que no lo haré por querer algo, sino por que no es difícil serlo. ";
    private static File MODEL_DIR
        = new File("/home/samuel/leipzig/langid-leipzig.classifier");

    public static void main(String[] args)
        throws ClassNotFoundException, IOException {
        System.out.println("Text: " + text);
        LMClassifier classifier = null;
        try {
            classifier = (LMClassifier) AbstractExternalizable.readObject(MODEL_DIR);
        } catch (IOException | ClassNotFoundException ex) {
            // Handle exceptions
            System.out.println("Problem with the Model");
        }
        Classification classification = classifier.classify(text);
        String bestCategory = classification.bestCategory();
        System.out.println("Best Language: " + bestCategory);
    }
}
~7. I also tried with the 1 million sentence data set and with different n-gram sizes, but got the same results.
I would be very thankful for your help.
Well, after days of working on natural language processing, I found a way to determine the language of a text using OpenNLP.
Here is the sample code:
https://github.com/samuelchapas/languagePredictionOpenNLP/tree/master/TrainingLanguageDecOpenNLP
It also contains the training corpus for the model created to make language predictions.
I decided to use OpenNLP for the issue described in this question; this library really has a complete stack of functionality.
Here is the sample for model training:
https://mega.nz/#F!HHYHGJ4Q!PY2qfbZr-e0w8tg3cUgAXg

Copying an Array won't work JAVA

I'm trying to copy or duplicate an array, without success.
for (int i=0;i<x*y+y;i++)
{
    tmpInt = br.read();
    // If i%x is 0, increment brakeY by one to get the number of rows
    if (i%x==0 && brakeY<y-1) brakeY++;
    if (tmpX<=x-1) tmpX++;
    else tmpX = 0;
    // The first time this part runs, the array is read from the text file. The second time, however, it only outputs the current state of the map so that changes are visible.
    spielFeld[tmpX][brakeY] = (char) tmpInt;
    System.out.print(spielFeld[tmpX][brakeY]);
    //System.out.println("----------");
}
I'm trying to copy the array called spielFeld (German for "playing field") with the line spielFeldT = spielFeld.clone(); (spielFeldT = spielFeld didn't work either) so that I can interact with it globally. The results are:
1xwvutsrqpo
2 ü n
3 !öä m
4 " l
5 K §$% k
789abcdefgh
which is exactly how it should look,
but if I try to print the copied array in exactly the same way as I printed this one, something like this appears:
1 ü �
3 !öä �n
4 " � l
5 K §$%� k
6 � fgh
789abcdefgh
789abcdefgh
You can use the System.arraycopy(...) method.
Here is the signature of the method:
public static void arraycopy(Object src, int srcPos, Object dest, int destPos, int length)
For further information, you may want to take a look at this question.
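One point worth checking for a two-dimensional array such as spielFeld: clone() and a single System.arraycopy call copy only the outer array, so the inner rows stay shared between the original and the copy. Whether that explains the garbled output in the question is not certain, but a row-by-row copy rules it out. A minimal sketch, with the class name GridCopy and the method deepCopy chosen purely for illustration:

```java
public class GridCopy {
    // Returns an independent copy of a 2D char array by copying every row.
    public static char[][] deepCopy(char[][] source) {
        char[][] copy = new char[source.length][];
        for (int row = 0; row < source.length; row++) {
            copy[row] = new char[source[row].length];
            System.arraycopy(source[row], 0, copy[row], 0, source[row].length);
        }
        return copy;
    }

    public static void main(String[] args) {
        // Small stand-in grid; the real spielFeld is filled from a file.
        char[][] spielFeld = { {'1', 'x'}, {'2', ' '} };
        char[][] spielFeldT = deepCopy(spielFeld);
        spielFeldT[0][0] = '#';               // change only the copy
        System.out.println(spielFeld[0][0]);  // the original still holds '1'
    }
}
```

The copy has to be taken after the source array has been filled; a copy made earlier will not see later writes.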

How can I reset a if-command? Or jump back to the point before the if-command was executed?

So yesterday I asked a question about some GUI programming. I completely scrapped that, since I found it a little too complicated for me to actually deal with.
Now I am reworking the thing in the console.
And I got myself stuck again. My problem this time: How can I jump back to a point before an if-command was executed?
Direct example:
import java.util.Scanner;
import java.io.*;

public class HBA {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Herzlichen Glückwunsch Anna! Und viel Spaß mit deinem Geschenk!") ;
        System.out.println("Neben diesem Programm befindet sich eine Passwort gesicherte Datei, die einen weiteren Teil deines Geschenks enthällt."
            + "Um an das Passwort zu gelangen wirst du jedoch ein paar ganz besonders schwierige Fragen beantworten müssen!"
            + "Wenn du bereit für das Quiz bist, gib in die Eingabe: 'ok' ein.");
        String OK, Q1, Q2, Q3, Q4, Q5, Q6, Q7;
        BufferedReader repo = null;
        OK = scan.next();
        if (OK == "ok") {
            System.out.println("Alles gut, fangen wir mit etwas leichtem an!");
        }
        else {
            System.out.println("Wie... bei so einer einfachen Sache versagst du schon? Versuchs nochmal!");
        }
        System.out.println("Frage 1: Wer ist Supergeil? \n A: Erik \n B: Anna \n C: The J \n D: Friedrich Liechtenstein");
        mark(0);
        Q1 = scan.next();
        if (Q1 == "D") {
            System.out.println("Richtig! Der erste Buchstabe lautet: S");
        }
        else {
            System.out.println("Leider falsch. Versuch es nochmal.");
            reset();
        }
    }
}
The script works as expected, except when you type a wrong answer in the last part:
        System.out.println("Frage 1: Wer ist Supergeil? \n A: Erik \n B: Anna \n C: The J \n D: Friedrich Liechtenstein");
        mark(0);
        Q1 = scan.next();
        if (Q1 == "D") {
            System.out.println("Richtig! Der erste Buchstabe lautet: S");
        }
        else {
            System.out.println("Leider falsch. Versuch es nochmal.");
            reset();
        }
    }
}
It just ends the script. Instead, it should jump back to the beginning of the if-command.
That means: if the answer to the question printed by System.out.println is D and you type A instead, it should reset the whole thing so that you can try again and give a different answer. How can I achieve that? I read that BufferedReader has mark() and reset() methods, but I don't know if they work the way I expect them to or how I would have to integrate them.
I also thought about using a while or a do loop, but I haven't found a way to do that yet.
Can someone please enlighten me?
Thanks!
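The while/do idea mentioned above could look roughly like the following sketch. It assumes the question should simply be asked again until the correct answer is entered, and it compares strings with equals() rather than ==, because == compares object references in Java. The class name QuizLoop and the method askUntilCorrect are hypothetical, the messages are English stand-ins for the German ones, and main feeds canned input ("A", "B", "D") instead of System.in so that the sketch runs on its own.

```java
import java.util.Scanner;

public class QuizLoop {
    // Repeats the question until the expected answer is entered.
    // Returns the number of attempts that were needed.
    public static int askUntilCorrect(Scanner scan, String question, String expected) {
        int attempts = 0;
        String answer;
        do {
            System.out.println(question);
            answer = scan.next();
            attempts++;
            if (!expected.equals(answer)) {
                System.out.println("Sorry, that is wrong. Try again.");
            }
        } while (!expected.equals(answer));
        return attempts;
    }

    public static void main(String[] args) {
        // Canned input for the demo; replace with new Scanner(System.in) for real use.
        Scanner scan = new Scanner("A\nB\nD\n");
        int attempts = askUntilCorrect(scan,
                "Question 1: Who is Supergeil? \n A: Erik \n B: Anna \n C: The J \n D: Friedrich Liechtenstein",
                "D");
        System.out.println("Correct after " + attempts + " attempts. The first letter is: S");
    }
}
```

A do-while fits here because the question must be asked at least once before the answer can be checked.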

Java says a nonempty file is empty?

I have a particular file that Java says is empty...
Source Code
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class MinimumWorkingExample
{
    public static void main(String[] args) throws FileNotFoundException
    {
        String filename = "/home/tyson/Data/English-French_test/test/test.f";
        Scanner fileIn = new Scanner(new File(filename));
        System.out.println("***START***");
        while(fileIn.hasNextLine())
        {
            System.out.println(fileIn.nextLine());
        }
        System.out.println("***FINISH***");
    }
}
Output
***START***
***FINISH***
...but the file is not empty:
Console
tyson#tyson-desktop:~$ head /home/tyson/Data/English-French_test/test/test.f
<s snum=0001> 2 . </s>
<s snum=0002> 2 . </s>
<s snum=0003> oh , oh ! </s>
<s snum=0004> oh , oh ! </s>
<s snum=0005> oh , oh ! </s>
<s snum=0006> souvenons - nous , monsieur le Orateur , que ce sont ces secteurs de notre soci�t� qui servent de �pine dorsale � notre �conomie . </s>
<s snum=0007> bravo ! </s>
<s snum=0008> bravo ! </s>
<s snum=0009> monsieur le Orateur , ma question se adresse � le ministre charg� de les transports . </s>
<s snum=0010> tous deux poss�dent de nombreuses ann�es de exp�rience dans la fabrication et la distribution de les produits forestiers . </s>
tyson#tyson-desktop:~$
Question
Why is this happening???
Also try Scanner fileIn = new Scanner(new File(filename), "Cp1252"); since Cp1252 is a common encoding for French text and your system default seems to be UTF-8.
The Scanner may run into decoding problems if it tries to read the file as UTF-8 multi-byte sequences.
You may be missing the Scanner's default delimiter, so it sees your whole file as one line without an end, and thus hasNextLine() is false. Make sure that the character you get from
Scanner.delimiter()
is present in your file. If they don't match, you can use
Scanner.useDelimiter("\\s or your regex/string here")
to set it to the correct one.
According to the Java docs, the line separators are any of the ones below. Does your file contain any?
private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]"
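To check the encoding suggestion, it can help to pass the charset explicitly and to look at Scanner.ioException(), because Scanner does not throw on read errors but only records the last one. The sketch below assumes the file is in a single-byte encoding such as Cp1252. The class name CharsetScan, the method readLines, and the temporary demo file are illustrative and not part of the question.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CharsetScan {
    // Reads all lines of a file using an explicitly named charset.
    // Scanner swallows I/O errors, so a recorded exception is rethrown here.
    public static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(file, charsetName)) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
            if (in.ioException() != null) {
                throw in.ioException();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Demo file containing "société" with each 'é' stored as the single byte 0xE9.
        File file = File.createTempFile("cp1252-demo", ".txt");
        file.deleteOnExit();
        byte[] bytes = { 's', 'o', 'c', 'i', (byte) 0xE9, 't', (byte) 0xE9, '\n' };
        Files.write(file.toPath(), bytes);
        System.out.println(readLines(file, "Cp1252"));
    }
}
```

If reading with the wrong charset yields no lines, the rethrown exception should show why, instead of the loop ending silently.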
