I have a list of names in a CSV file and I want to Google-search those names using Java. The problem I am facing is that when I initially run the code I am able to search the query, but partway through the run it starts to throw 503 exceptions, and when I run the code again it throws 503 exceptions from the very beginning. Here is the code that I am using.
public class ExtractInformation
{
    static String firstname, middlename, lastname;
    public static final int PAGE_NUMBERS = 10;

    public static void readCSV()
    {
        boolean first = true;
        try
        {
            String splitBy = ",";
            BufferedReader br = new BufferedReader(new FileReader("E:\\KOLDump\\names.csv"));
            String line = null;
            String site = null;
            while((line = br.readLine()) != null)
            {
                if(first)
                {
                    first = false;
                    continue;
                }
                String[] b = line.split(splitBy);
                firstname = b[0];
                middlename = b[1];
                lastname = b[2];
                String name = null;
                if(middlename == null || middlename.length() == 0)
                {
                    name = firstname+" "+lastname+" OR "+lastname+" "+firstname.charAt(0);
                }
                else
                {
                    name = firstname+" "+lastname+" OR "+lastname+" "+firstname.charAt(0)+" OR "+firstname+" "+middlename.charAt(0)+". "+lastname;
                }
                BufferedReader brs = new BufferedReader(new FileReader("E:\\KOLDump\\site.csv"));
                while((site = brs.readLine()) != null)
                {
                    if(first)
                    {
                        first = false;
                        continue;
                    }
                    String[] s = site.split(splitBy);
                    String siteName = s[0];
                    siteName = siteName.replace("www.", "");
                    siteName = siteName.replace("http://", "");
                    getDataFromGoogle(name.trim(), siteName.trim());
                }
                brs.close();
            }
            //br.close();
        }
        catch(Exception e)
        {
            System.out.println("unable to read file...some problem in the csv");
        }
    }

    public static void main(String[] args)
    {
        readCSV();
    }

    private static void getDataFromGoogle(String query, String siteName)
    {
        Set<String> result = new HashSet<String>();
        String request = "http://www.google.co.in/search?q="+query+" "+siteName;
        try
        {
            Document doc = Jsoup.connect(request).userAgent("Chrome").timeout(10000).get();
            Element query_results = doc.getElementById("ires");
            Elements gees = query_results.getElementsByClass("g");
            for(Element gee : gees)
            {
                Element h3 = gee.getElementsByTag("h3").get(0);
                String annotation = h3.getElementsByTag("a").get(0).attr("href");
                if(annotation.split("q=", 2)[1].contains(siteName))
                {
                    System.out.println(annotation.split("q=", 2)[1]);
                }
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
Any suggestions on how to get rid of these exceptions would be really helpful.
If you wait a little, do the 503s go away? If so, then you're probably being rate-limited by Google: https://support.google.com/gsa/answer/2686272?hl=en
You may need to put some kind of delay between requests.
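For example, here is a minimal sketch of a backoff wrapper around the Jsoup call from the question (the class name, retry count, and delay values are arbitrary illustration, not a known-good recipe; note also that scraping Google result pages is against their terms of service, and the Custom Search API is the supported route for volume queries):

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PoliteFetch {
    // Fetch a URL, backing off and retrying when the server answers 503.
    static Document fetchWithBackoff(String url, int maxRetries) throws Exception {
        long delayMs = 2000; // arbitrary starting delay
        for (int attempt = 0; ; attempt++) {
            try {
                return Jsoup.connect(url).userAgent("Chrome").timeout(10000).get();
            } catch (HttpStatusException e) {
                if (e.getStatusCode() != 503 || attempt >= maxRetries) {
                    throw e; // not rate limiting, or out of retries
                }
                Thread.sleep(delayMs);
                delayMs *= 2; // exponential backoff: 2s, 4s, 8s, ...
            }
        }
    }
}

Sleeping a second or two between successful requests (not only after failures) also makes it less likely that the limiter triggers at all.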
Just wondering what I have done wrong here: I'm getting an error in the method setLine(), which is:
error: incompatible types: String[] cannot be converted to State[]
I'm not too sure what to do to fix it, since I need the line to be split and stored in that State array, so I can determine whether it is a state or a location when reading from a CSV file.
public static void readFile(String inFilename)
{
    FileInputStream fileStrm = null;
    InputStreamReader rdr;
    BufferedReader bufRdr;
    int stateCount = 0, locationCount = 0;
    String line;
    try
    {
        fileStrm = new FileInputStream(inFilename);
        rdr = new InputStreamReader(fileStrm);
        bufRdr = new BufferedReader(rdr);
        line = bufRdr.readLine();
        while (line != null)
        {
            if (line.startsWith("STATE"))
            {
                stateCount++;
            }
            else if (line.startsWith("LOCATION"))
            {
                locationCount++;
            }
            line = bufRdr.readLine();
        }
        fileStrm.close();
        State[] state = new State[stateCount];
        Location[] location = new Location[locationCount];
    }
    catch (IOException e)
    {
        if (fileStrm != null)
        {
            try { fileStrm.close(); } catch (IOException ex2) { }
        }
        System.out.println("Error in file processing: " + e.getMessage());
    }
}

public static void processLine(String csvRow)
{
    String thisToken = null;
    StringTokenizer strTok;
    strTok = new StringTokenizer(csvRow, ":");
    while (strTok.hasMoreTokens())
    {
        thisToken = strTok.nextToken();
        System.out.print(thisToken + " ");
    }
    System.out.println("");
}

public static void setLine(State[] state, Location[] location, int stateCount, int locationCount, String line)
{
    int i;
    state = new State[stateCount];
    state = line.split("="); // <--- ERROR
    for (i = 0; i < stateCount; i++)
    {
    }
}

public static void writeOneRow(String inFilename)
{
    FileOutputStream fileStrm = null;
    PrintWriter pw;
    try
    {
        fileStrm = new FileOutputStream(inFilename);
        pw = new PrintWriter(fileStrm);
        pw.println();
        pw.close();
    }
    catch (IOException e)
    {
        if (fileStrm != null)
        {
            try
            {
                fileStrm.close();
            }
            catch (IOException ex2)
            {}
        }
        System.out.println("Error in writing to file: " + e.getMessage());
    }
}
This error means exactly what it says: 'String[] cannot be converted to State[]'. It is just like trying to store an Integer in a String; the types have no relation to each other (no parent -> child).
So if you want to solve your problem, you need a method that converts the String[] into a State[]. Something like this:
private State[] toStateArray(String[] strings){
    final State[] states = new State[strings.length];
    for(int i = strings.length - 1; i >= 0; i--){
        states[i] = new State(strings[i]); // here you have to decide how to convert String to State
    }
    return states;
}
I use LanguageTool for some spellchecking and spell correction functionality in my application.
The LanguageTool documentation describes how to exclude words from spell checking (by calling the addIgnoreTokens(...) method of the spell-checking rule you're using).
How do you add words (e.g., from a specific dictionary) to spell checking? That is, can LanguageTool fix misspellings by suggesting words from my specific dictionary?
Unfortunately, I don't think the API supports this. Without the API, you can add words to spelling.txt to get them accepted and used as suggestions. With the API, you might need to extend MorfologikSpellerRule and change that part of the code. (Disclosure: I'm the maintainer of LanguageTool.)
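For the non-API route, spelling.txt is a plain word list: one word per line, and lines starting with # are comments (for en-US it sits under the language's hunspell resource folder; treat the exact path as version-dependent). A minimal example with made-up entries:

# project-specific terms, accepted and offered as suggestions
MyProductName
frobnicate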
I had a similar requirement, which was to load some custom words into the dictionary as "suggested words", not just "ignored words". In the end I extended MorfologikSpellerRule to do this:
Create a class MorfologikSpellerRuleEx extending MorfologikSpellerRule, override the method match(), and write your own initSpeller() for creating the spellers.
Then, for the language tool, add this custom speller rule in place of the existing one.
Code:
Language lang = new AmericanEnglish();
JLanguageTool langTool = new JLanguageTool(lang);
langTool.disableRule("MORFOLOGIK_RULE_EN_US");
try {
    MorfologikSpellerRuleEx spellingRule = new MorfologikSpellerRuleEx(JLanguageTool.getMessageBundle(), lang);
    spellingRule.setSpellingFilePath(spellingFilePath);
    // spellingFilePath is a file with my own words + words from /hunspell/spelling_en-US.txt
    langTool.addRule(spellingRule);
} catch (IOException e) {
    e.printStackTrace();
}
The code of my custom MorfologikSpellerRuleEx:
public class MorfologikSpellerRuleEx extends MorfologikSpellerRule {

    private String spellingFilePath = null;
    private boolean ignoreTaggedWords = false;

    public MorfologikSpellerRuleEx(ResourceBundle messages, Language language) throws IOException {
        super(messages, language);
    }

    @Override
    public String getFileName() {
        return "/en/hunspell/en_US.dict";
    }

    @Override
    public String getId() {
        return "MORFOLOGIK_SPELLING_RULE_EX";
    }

    @Override
    public void setIgnoreTaggedWords() {
        ignoreTaggedWords = true;
    }

    public String getSpellingFilePath() {
        return spellingFilePath;
    }

    public void setSpellingFilePath(String spellingFilePath) {
        this.spellingFilePath = spellingFilePath;
    }

    private void initSpellerEx(String binaryDict) throws IOException {
        String plainTextDict = null;
        if (JLanguageTool.getDataBroker().resourceExists(getSpellingFileName())) {
            plainTextDict = getSpellingFileName();
        }
        if (plainTextDict != null) {
            BufferedReader br = null;
            if (this.spellingFilePath != null) {
                try {
                    br = new BufferedReader(new FileReader(this.spellingFilePath));
                }
                catch (Exception e) {
                    br = null;
                }
            }
            if (br != null) {
                speller1 = new MorfologikMultiSpeller(binaryDict, br, plainTextDict, 1);
                speller2 = new MorfologikMultiSpeller(binaryDict, br, plainTextDict, 2);
                speller3 = new MorfologikMultiSpeller(binaryDict, br, plainTextDict, 3);
                br.close();
            }
            else {
                speller1 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 1);
                speller2 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 2);
                speller3 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 3);
            }
            setConvertsCase(speller1.convertsCase());
        } else {
            throw new RuntimeException("Could not find ignore spell file in path: " + getSpellingFileName());
        }
    }

    private boolean canBeIgnored(AnalyzedTokenReadings[] tokens, int idx, AnalyzedTokenReadings token)
            throws IOException {
        return token.isSentenceStart() || token.isImmunized() || token.isIgnoredBySpeller() || isUrl(token.getToken())
                || isEMail(token.getToken()) || (ignoreTaggedWords && token.isTagged()) || ignoreToken(tokens, idx);
    }

    @Override
    public RuleMatch[] match(AnalyzedSentence sentence) throws IOException {
        List<RuleMatch> ruleMatches = new ArrayList<>();
        AnalyzedTokenReadings[] tokens = getSentenceWithImmunization(sentence).getTokensWithoutWhitespace();
        // lazy init
        if (speller1 == null) {
            String binaryDict = null;
            if (JLanguageTool.getDataBroker().resourceExists(getFileName())) {
                binaryDict = getFileName();
            }
            if (binaryDict != null) {
                initSpellerEx(binaryDict); // here's the change
            } else {
                // should not happen, as we only configure this rule (or rather its subclasses)
                // when we have the resources:
                return toRuleMatchArray(ruleMatches);
            }
        }
        int idx = -1;
        for (AnalyzedTokenReadings token : tokens) {
            idx++;
            if (canBeIgnored(tokens, idx, token)) {
                continue;
            }
            // if we use token.getToken() we'll get ignored characters inside and the speller
            // will choke
            String word = token.getAnalyzedToken(0).getToken();
            if (tokenizingPattern() == null) {
                ruleMatches.addAll(getRuleMatches(word, token.getStartPos(), sentence));
            } else {
                int index = 0;
                Matcher m = tokenizingPattern().matcher(word);
                while (m.find()) {
                    String match = word.subSequence(index, m.start()).toString();
                    ruleMatches.addAll(getRuleMatches(match, token.getStartPos() + index, sentence));
                    index = m.end();
                }
                if (index == 0) { // tokenizing char not found
                    ruleMatches.addAll(getRuleMatches(word, token.getStartPos(), sentence));
                } else {
                    ruleMatches.addAll(getRuleMatches(word.subSequence(index, word.length()).toString(),
                            token.getStartPos() + index, sentence));
                }
            }
        }
        return toRuleMatchArray(ruleMatches);
    }
}
I want to get the user1.name variable from public static void FileRead() into private void jButton1ActionPerformed, but it doesn't see user1.name. Can you explain how I can use that variable?
public static class Users {
    public String name;
    public String password;

    Users(String name1, String password1) {
        name = name1;
        password = password1;
    }
}

public static void FileRead() {
    try {
        BufferedReader in = new BufferedReader(new FileReader("C:/Users/B_Ali/Documents/NetBeansProjects/JavaApplication20/UserNamePassword.txt"));
        String[] s1 = new String[5];
        String[] s2 = new String[5];
        int i = 0;
        while ((s1[i] = in.readLine()) != null) {
            s1[i] = s2[i];
            i = i + 1;
            if (i == 1) {
                Users user1 = new Users(s2[0], s2[1]);
            }
            else if (i == 3) {
                Users user2 = new Users(s2[2], s2[3]);
            }
            else if (i == 5) {
                Users user3 = new Users(s2[4], s2[5]);
            }
        }
        in.close();
    }
    catch (FileNotFoundException ex) {
        Logger.getLogger(LoginScreen.class.getName()).log(Level.SEVERE, null, ex);
    }
    catch (IOException ex) {
        Logger.getLogger(LoginScreen.class.getName()).log(Level.SEVERE, null, ex);
    }
}

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
    JOptionPane.showMessageDialog(null, user1.name);
    // TODO add your handling code here:
}
Where should I declare it?
Edit: If I declare it as you said before, it becomes meaningless, because I want to use the user1.name that is set inside FileRead().
Declare it globally, as a field of the class. Since FileRead() is static, the fields must be static as well, and inside FileRead() you have to assign to those fields rather than declare new local variables (a local Users user1 = ... shadows the field, which is why your dialog still can't see it). Your read loop also copied the empty s2 array back over the line it had just read; reading straight into the array fixes that:

public static class Users {
    public String name;
    public String password;

    Users(String name1, String password1) {
        name = name1;
        password = password1;
    }

    public String getName() {
        return this.name;
    }

    public String getPassword() {
        return this.password;
    }
}

static Users user1, user2, user3;

public static void FileRead() {
    try {
        BufferedReader in = new BufferedReader(new FileReader("C:/Users/B_Ali/Documents/NetBeansProjects/JavaApplication20/UserNamePassword.txt"));
        String[] s2 = new String[6]; // six lines: three name/password pairs
        int i = 0;
        while (i < s2.length && (s2[i] = in.readLine()) != null) {
            i = i + 1;
            if (i == 2) {
                user1 = new Users(s2[0], s2[1]); // assign the field; don't re-declare it
            }
            else if (i == 4) {
                user2 = new Users(s2[2], s2[3]);
            }
            else if (i == 6) {
                user3 = new Users(s2[4], s2[5]);
            }
        }
        in.close();
    }
    catch (FileNotFoundException ex) {
        Logger.getLogger(LoginScreen.class.getName()).log(Level.SEVERE, null, ex);
    }
    catch (IOException ex) {
        Logger.getLogger(LoginScreen.class.getName()).log(Level.SEVERE, null, ex);
    }
}

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
    JOptionPane.showMessageDialog(null, user1.getName());
    // TODO add your handling code here:
}
You need to declare the variables you need outside of the try clause, like:

String global = null;
try {
    global = "abc";
    throw new Exception();
}
catch (Exception e) {
    if (global != null) {
        // your code
    }
}
What I am trying to achieve is to save pairs of adjacent words in a sentence and, if the first word is already there, to keep a list of following words against it.
Since there could be many millions of pairs (my data set file is very large), I opted for OrientDB. I don't know if I am approaching it correctly, but OrientDB is very slow: after 8 hours of running, it has only made pairs for 12000 sentences.
As far as I have checked, the major slowdown is in browsing the cluster.
Attached is my code; if anyone can give any pointers on my approach, please do.
public static void main(String[] args) {
    // TODO Auto-generated method stub
    Main m = new Main();
    m.openDatabase();
    m.readFile("train_v2.txt");
    m.closeDatabase();
}
}

class Main {
    ODatabaseDocumentTx db;
    Map<String, Object> index;
    List<Object> list = null;
    String pairing[];
    ODocument doc;

    Main() {
    }

    public void closeDatabase() {
        if (!db.isClosed()) {
            db.close();
        }
    }

    void openDatabase() {
        db = new ODatabaseDocumentTx("local:/databases/model").open("admin", "admin");
        doc = new ODocument("final");
    }

    public void readFile(String filename) {
        InputStream ins = null;   // raw byte-stream
        Reader r = null;          // cooked reader
        int i = 1;
        BufferedReader br = null; // buffered for readLine()
        try {
            String s;
            ins = new FileInputStream(filename);
            r = new InputStreamReader(ins, "UTF-8"); // leave charset out for default
            br = new BufferedReader(r);
            while ((s = br.readLine()) != null) {
                System.out.println("" + i);
                createTermPair(s.replaceAll("[^\\w ]", "").trim());
                i++;
            }
        } catch (Exception e) {
            System.err.println(e.getMessage()); // handle exception
        } finally {
            closeDatabase();
            if (br != null) {
                try {
                    br.close();
                } catch (Throwable t) { /* ensure close happens */ }
            }
            if (r != null) {
                try {
                    r.close();
                } catch (Throwable t) { /* ensure close happens */ }
            }
            if (ins != null) {
                try {
                    ins.close();
                } catch (Throwable t) { /* ensure close happens */ }
            }
        }
    }

    private void createTermPair(String phrase) {
        phrase = phrase + " .";
        String[] word = phrase.split(" ");
        for (int i = 0; i < word.length - 1; i++) {
            if (!word[i].trim().equalsIgnoreCase("")
                    && !word[i + 1].trim().equalsIgnoreCase("")) {
                String wordFirst = word[i].toLowerCase().trim();
                String wordSecond = word[i + 1].toLowerCase().trim();
                String pair = wordFirst + " " + wordSecond;
                checkForPairAndWrite(pair);
            }
        }
    }

    private void checkForPairAndWrite(String pair) {
        try {
            pairing = pair.trim().split(" ");
            if (!pairing[1].equalsIgnoreCase(" ")) {
                index = new HashMap<String, Object>();
                for (ODocument docr : db.browseCluster("final")) {
                    list = docr.field(pairing[0]);
                }
                if (list == null) {
                    list = new ArrayList<>();
                }
                list.add("" + pairing[1]);
                if (list.size() >= 1)
                    index.put(pairing[0], list);
                doc.fields(index);
                doc.save();
            }
            // for (int i = 0; i < list.size(); i++) {
            //     System.out.println("" + list.get(i));
            // }
        } catch (Exception e) {
        }
        return;
    }
}
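For reference, the slowdown the question observes comes from checkForPairAndWrite: it iterates the entire "final" cluster for every single pair, so the total work grows roughly quadratically with the number of pairs. A minimal in-memory sketch of the same pairing logic (a hypothetical class, assuming the map fits in the heap; otherwise batch it into OrientDB with an index on the first word instead of scanning the cluster per pair):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PairAccumulator {
    // first word -> list of the words that followed it
    private final Map<String, List<String>> pairs = new HashMap<>();

    void addPair(String first, String second) {
        // constant-time lookup; this replaces the per-pair
        // browseCluster("final") scan in checkForPairAndWrite
        pairs.computeIfAbsent(first, k -> new ArrayList<>()).add(second);
    }

    Map<String, List<String>> result() {
        return pairs;
    }
}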
I wrote a simple Java application and I have a problem; please help me.
I have a file (JUST AN EXAMPLE):
1.TXT
-------
SET MRED:NAME=MRED:0,MREDID=60;
SET BCT:NAME=BCT:0,NEPE=DCS,T2=5,DK0=KOR;
CREATE LCD:NAME=LCD:0;
-------
And this is my source code:
import java.io.IOException;
import java.io.*;
import java.util.StringTokenizer;

class test1 {
    private final int FLUSH_LIMIT = 1024 * 1024;
    private StringBuilder outputBuffer = new StringBuilder(FLUSH_LIMIT + 1024);

    public static void main(String[] args) throws IOException {
        test1 p = new test1();
        String fileName = "i:\\1\\1.txt";
        File file = new File(fileName);
        BufferedReader br = new BufferedReader(new FileReader(file));
        String line;
        while ((line = br.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line, ";|,");
            while (st.hasMoreTokens()) {
                String token = st.nextToken();
                p.processToken(token);
            }
        }
        p.flushOutputBuffer();
    }

    private void processToken(String token) {
        if (token.startsWith("MREDID=")) {
            String value = getTokenValue(token, "=");
            outputBuffer.append("MREDID:").append(value).append("\n");
        } else if (token.startsWith("DK0=")) {
            String value = getTokenValue(token, "=");
            outputBuffer.append("DK0=:").append(value).append("\n");
        } else if (token.startsWith("NEPE=")) {
            String value = getTokenValue(token, "=");
            outputBuffer.append("NEPE:").append(value).append("\n");
        }
        if (outputBuffer.length() > FLUSH_LIMIT) {
            flushOutputBuffer();
        }
    }

    private String getTokenValue(String token, String find) {
        int start = token.indexOf(find) + 1;
        int end = token.length();
        String value = token.substring(start, end);
        return value;
    }

    private void flushOutputBuffer() {
        System.out.print(outputBuffer);
        outputBuffer = new StringBuilder(FLUSH_LIMIT + 1024);
    }
}
I want this output:
MREDID:60
DK0=:KOR
NEPE:DCS
But this application shows me this:
MREDID:60
NEPE:DCS
DK0=:KOR
Please tell me how I can handle this, because DK0 must come first in the output. And this is just a sample; my real application has 14000 lines.
Thanks...
Instead of outputting the value when you read it, put it in a hashmap. Once you've read your entire file, output in the order you want by getting the values from the hashmap.
Use a Hashtable (or a HashMap, as above) to store the values, and print from it in the desired order after parsing all tokens.

// initialize the table
Hashtable<String, String> ht = new Hashtable<>();
// instead of outputBuffer.append, put the values into the table like
ht.put("NEPE", value);
ht.put("DK0", value); // etc.
// print the values after the while loop
System.out.println("MREDID:" + ht.get("MREDID"));
System.out.println("DK0:" + ht.get("DK0"));
System.out.println("NEPE:" + ht.get("NEPE"));
Create a class, something like

class Data {
    private int mredid;
    private String nepe;
    private String dk0;

    public void setMredid(int mredid) {
        this.mredid = mredid;
    }

    public void setNepe(String nepe) {
        this.nepe = nepe;
    }

    public void setDk0(String dk0) {
        this.dk0 = dk0;
    }

    public String toString() {
        String ret = "MREDID:" + mredid + "\n";
        ret = ret + "DK0=:" + dk0 + "\n";
        ret = ret + "NEPE:" + nepe + "\n";
        return ret; // the original was missing this return (and the closing brace)
    }
}
Then change processToken to

private void processToken(String token) {
    // note: as written this appends a Data block per token, with unmatched fields
    // left at their defaults; to emit one complete block per input line, keep a
    // single Data instance per line and append it after the line's last token
    Data data = new Data();
    if (token.startsWith("MREDID=")) {
        String value = getTokenValue(token, "=");
        data.setMredid(Integer.parseInt(value));
    } else if (token.startsWith("DK0=")) {
        String value = getTokenValue(token, "=");
        data.setDk0(value);
    } else if (token.startsWith("NEPE=")) {
        String value = getTokenValue(token, "=");
        data.setNepe(value);
    }
    outputBuffer.append(data.toString());
    if (outputBuffer.length() > FLUSH_LIMIT) {
        flushOutputBuffer();
    }
}