Parsing Google news in Java


What is the best way to do that?
I want to parse the news and then filter the items by keyword to find the matches.
Has someone already done this? And is it legal?

You can use the Google News RSS feed at http://news.google.com/?output=rss. It returns the news inside an rss element, with HTML tags in the content. Then either write custom code to read and parse the XML, or use an existing RSS reading library such as https://github.com/vgrec/SimpleRssReader
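The "custom code" route can be sketched roughly as follows. This is a minimal example, assuming a standard RSS layout in which each item element has title and link children; the class RssKeywordFilter, its method names, and the sample feed are made up for illustration and are not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
class RssKeywordFilter {

    /** Returns "title | link" for every item whose title contains the keyword (case-insensitive). */
    static List<String> matchingItems(InputStream rss, String keyword) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(rss);
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String title = childText(item, "title");
            if (title.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(title + " | " + childText(item, "link"));
            }
        }
        return matches;
    }

    /** Text of the first descendant element with the given tag name, or "" if there is none. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed so the example runs without network access.
        String sample = "<rss><channel>"
                + "<item><title>Sample headline about Java</title><link>http://example.com/a</link></item>"
                + "<item><title>Weather update</title><link>http://example.com/b</link></item>"
                + "</channel></rss>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(matchingItems(in, "java"));
    }
}
```

For a live feed, the same method could be given the stream returned by new URL("https://news.google.com/rss").openStream() instead of the in-memory sample.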

I have written a function that does this; each call returns the link and title of a random news item.
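import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Random;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Document (with append) is assumed to be org.bson.Document from the MongoDB driver;
// the DOM document type is written fully qualified to avoid the name clash.
public Document getNews() throws IOException, SAXException, ParserConfigurationException {
    URL rssUrl = new URL("https://news.google.com/rss");
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    org.w3c.dom.Document doc;
    // Let failures propagate instead of printing them and continuing with null references.
    try (InputStream in = rssUrl.openStream()) {
        doc = builder.parse(in);
    }
    NodeList items = doc.getElementsByTagName("item");
    if (items.getLength() == 0) {
        throw new IOException("The feed contains no item elements");
    }
    // Pick one item at random.
    Element item = (Element) items.item(new Random().nextInt(items.getLength()));
    Document news = new Document();
    news.append("title", getValue(item, "title"));
    news.append("link", getValue(item, "link"));
    return news;
}

private String getValue(Element parent, String nodeName) {
    // getTextContent() returns the element's text; Node.toString() is implementation-specific.
    return parent.getElementsByTagName(nodeName).item(0).getTextContent();
}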

Related

java managed server bean crashes

Why would I not be able to call this multiple times?
private Document getStationery(String txtStationery,Database mailDB){
    try {
        View mailView = mailDB.getView("(Stationery)");
        DocumentCollection dc = mailView.getAllDocumentsByKey("Memo Stationery");
        Document tmpdoc;
        Document doc = dc.getFirstDocument();
        while (doc != null) {
            if(doc.getItemValueString("MailStationeryName").equals(txtStationery))
            {
                return doc;
            }
            tmpdoc = dc.getNextDocument();
            doc.recycle();
            doc = tmpdoc;
        }
    } catch (NotesException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return null;
}
It crashes on the second call below. Is it something to do with not recycling?
public void send() throws NotesException, IOException, Exception{
Session session = getCurrentSession();
Database userDB = getUserDatabase();
Database mailbox = session.getDatabase("", "mail1.box");
Document stationeryDoc1 = getStationery("Test1",userDB);
Document stationeryDoc2 = getStationery("Test2",userDB);

You could try without recycling at all (generally not a good idea, but here it may help to rule out other problems), or recycle the objects in the getStationery() method properly, beginning with the Document, then the DocumentCollection, and finally the View. At the moment, the only object you recycle is the previous Document object in the while loop.

Querying mongoDB using the mlab data API

I'm trying to extract a single document using a manually coded variable.
Here I am trying to find the document whose name field equals siteName.
    String baseUrl = String.format("https://api.mlab.com/api/1/databases/%s/collections/%s?q=",DB_NAME,COLLECTION_SITE_NAME );
    StringBuilder stringBuilder = new StringBuilder(baseUrl);
    try {
        String first = URLEncoder.encode("{","UTF-8");
        String second = URLEncoder.encode("}","UTF-8");
        String point = URLEncoder.encode(":","UTF-8");
        String URL = first+"\"name\""+point+siteName+second;
        stringBuilder.append(URL);
        stringBuilder.append("&apiKey="+API_KEY);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return stringBuilder.toString();
}
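One thing that stands out in the snippet above is that siteName is appended without quotes and without URL encoding, so unless siteName already carries its own quotes the q parameter is not valid JSON. The following is only a sketch of one way to build the URL, assuming that is the problem and that the query should look like {"name":"..."}; the class and method names are hypothetical, and the database, collection, and key values are passed in as parameters rather than taken from the constants in the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only.
class MlabQueryUrl {

    /** Builds a URL whose q parameter is the URL-encoded JSON query {"name":"<siteName>"}. */
    static String build(String dbName, String collection, String siteName, String apiKey)
            throws UnsupportedEncodingException {
        // Escape backslashes and double quotes so the value stays a valid JSON string.
        String jsonValue = siteName.replace("\\", "\\\\").replace("\"", "\\\"");
        String query = "{\"name\":\"" + jsonValue + "\"}";
        // Encode the whole query once instead of encoding single characters by hand.
        return String.format(
                "https://api.mlab.com/api/1/databases/%s/collections/%s?q=%s&apiKey=%s",
                dbName, collection,
                URLEncoder.encode(query, "UTF-8"),
                URLEncoder.encode(apiKey, "UTF-8"));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Placeholder values, not real credentials.
        System.out.println(build("db", "sites", "my site", "KEY"));
    }
}
```

Encoding the complete JSON text in one call also takes care of the double quotes and of any spaces or special characters inside the site name.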

After making changes in XML file, old data is loading in Java application

I am making some changes to an embedded XML file in my Java application. I have some fields, a LOAD button and a SAVE button. After clicking the save button I can see the XML file updating, but after clicking the load button the old values are being loaded to the fields.
Here is my code:
public class MyLoad_SaveSampleProject {
    public String field1 = "";
    public String field2 = "";
    public void loadSampleProject() {
        InputStream file = MyLoad_SaveSampleProject.class.getResourceAsStream("/main/resources/otherClasses/projects/SampleProject.xml");
        try {
            DocumentBuilderFactory DocBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder DocBuilder = DocBuilderFactory.newDocumentBuilder();
            Document Doc = DocBuilder.parse(file);
            NodeList list = Doc.getElementsByTagName("*"); //create a list with the elements of the xml file
            for (int i=0; i<list.getLength(); i++) {
                Element element = (Element)list.item(i);
                if (element.getNodeName().equals("field1")) {
                    field1 = element.getChildNodes().item(0).getNodeValue().toString();
                } else if (element.getNodeName().equals("field2")) {
                    field2 = element.getChildNodes().item(0).getNodeValue().toString();
                }
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
    public void saveSampleProject(String field1Str, String field2Str) {
        InputStream file = MyLoad_SaveSampleProject.class.getResourceAsStream("/main/resources/otherClasses/projects/SampleProject.xml");
        try {
            DocumentBuilderFactory DocBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder DocBuilder = DocBuilderFactory.newDocumentBuilder();
            Document Doc = DocBuilder.parse(file);
            NodeList list = Doc.getElementsByTagName("*"); //create a list with the elements of the xml file
            for (int i=0; i<list.getLength(); i++) {
                Node thisAttribute = list.item(i);
                if (thisAttribute.getNodeName().equals("field1")) {
                    thisAttribute.setTextContent(field1Str);
                } else if (thisAttribute.getNodeName().equals("field2")) {
                    thisAttribute.setTextContent(field2Str);
                }
            }
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            DOMSource source = new DOMSource(Doc);
            StreamResult result = new StreamResult(new File("src/main/resources/otherClasses/projects/SampleProject.xml"));
            transformer.transform(source, result);
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (TransformerException tfe) {
            tfe.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (SAXException sae) {
            sae.printStackTrace();
        }
    }
    public String returnField1() {
        return field1;
    }
    public String returnField2() {
        return field2;
    }
}
And this is my default XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><Strings>
<field1>string1</field1>
<field2>string2</field2>
</Strings>
When the save button is pressed I am using the saveSampleProject method. When the load button is pressed I am using the loadSampleProject method and then I am getting the field values with the returnField1 and returnField2 methods.
I have no idea of what could be wrong with what I'm doing. I would appreciate any suggestions.

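Most probably getResourceAsStream() is not reading the file you just wrote: it resolves the resource from the classpath (typically a copy in the build output directory or the jar, which may also be cached), while the save method writes to src/main/resources through File. Since you are already using File in the save method, load the data through an InputStream opened on the same File object instead of loading it as a resource.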
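A minimal sketch of that suggestion, assuming the XML lives in an ordinary file on disk and that both operations go through the same File. The class FileBackedProject and its methods are hypothetical and only illustrate the idea; they are not the code from the question.

```java
import java.io.File;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper: load and save both use the same File, never the classpath.
class FileBackedProject {
    private final File file;

    FileBackedProject(File file) {
        this.file = file;
    }

    /** Returns the text of the first element with the given tag, or null if it is absent. */
    String load(String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        Node node = doc.getElementsByTagName(tag).item(0);
        return node == null ? null : node.getTextContent();
    }

    /** Replaces the text of the first element with the given tag and writes the file back. */
    void save(String tag, String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        Node node = doc.getElementsByTagName(tag).item(0);
        if (node == null) {
            throw new IllegalArgumentException("No <" + tag + "> element in " + file);
        }
        node.setTextContent(value);
        try (OutputStream out = Files.newOutputStream(file.toPath())) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        // Temporary file standing in for SampleProject.xml.
        File tmp = File.createTempFile("SampleProject", ".xml");
        Files.write(tmp.toPath(),
                "<Strings><field1>string1</field1><field2>string2</field2></Strings>"
                        .getBytes(StandardCharsets.UTF_8));
        FileBackedProject project = new FileBackedProject(tmp);
        project.save("field1", "changed");
        System.out.println(project.load("field1")); // prints "changed"
        tmp.delete();
    }
}
```

Because load re-parses the file on every call, it always sees what the last save wrote.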

nullpointerexception while trying to read from xml file with dom parser

I am trying to read from an XML file but I get a NullPointerException.
This is the XML file:
<war>
    <missileLaunchers>
        <launcher id="L101" isHidden="false">
            <missile id="M1" destination="Sderot" launchTime="2" flyTime="12" damage="1500"/>
            <missile id="M2" destination="Beer-Sheva" launchTime="5" flyTime="7" damage="2000"/>
        </launcher>
        <launcher id="L102" isHidden="true">
            <missile id="M3" destination="Ofakim" launchTime="4" flyTime="3" damage="5000"/>
            <missile id="M4" destination="Beer-Sheva" launchTime="9" flyTime="7" damage="1000"/>
        </launcher>
    </missileLaunchers>
    <missileDestructors >
        <destructor id="D201">
            <destructdMissile id="M1" destructAfterLaunch="4"/>
            <destructdMissile id="M3" destructAfterLaunch="7" />
            <destructdMissile id="M4" destructAfterLaunch="2"/>
        </destructor>
        <destructor id="D202">
            <destructdMissile id="M2" destructAfterLaunch="3"/>
        </destructor>
    </missileDestructors>
    <missileLauncherDestructors >
        <destructor type="plane" >
            <destructedLanucher id="L101" destructTime="4"/>
        </destructor>
        <destructor type="ship">
            <destructedLanucher id="L102" destructTime="8" />
            <destructedLanucher id="L102" destructTime="12"/>
        </destructor>
    </missileLauncherDestructors>
</war>
and this is the code:
public class XmlReader
{
    File fXmlFile=null;
    DocumentBuilderFactory dbFactory=null;
    DocumentBuilder dBuilder=null;
    Document doc=null;
    public XmlReader(String filePath) throws ClassNotFoundException
    {
        if(filePath!=null)
        {
            this.fXmlFile = new File(filePath);
            dbFactory = DocumentBuilderFactory.newInstance();
            try {
                dBuilder = dbFactory.newDocumentBuilder();
            } catch (ParserConfigurationException e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }
            try {
                doc = dBuilder.parse(fXmlFile);
                doc.getDocumentElement().normalize();
            } catch (SAXException | IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        else System.out.println("Xml file not found");
    }
    //gets value by tag name
    private static String getTagValue(String tag, Element element) {
        if(element.hasChildNodes())
        {
            NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
            Node node = (Node) nodeList.item(0);
            if(node==null)
                return null;
            return node.getNodeValue();
        }
        else return element.getNodeValue();
    }
    //launcher
    public List<Launcher> readLauncher() throws Exception
    {
        List<Launcher> launcherList = new ArrayList<Launcher>();
        try
        {
            NodeList nList = doc.getElementsByTagName("launcher");
            for(int i=0;i<nList.getLength();i++)
            {launcherList.add(getLauncher(nList.item(i)));}
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        return launcherList;
    }
    //builds the object
    private static Launcher getLauncher(Node node)
    {
        //XMLReaderDOM domReader = new XMLReaderDOM();
        Launcher launcher = new Launcher();
        if (node.getNodeType() == Node.ELEMENT_NODE)
        {
            Element element = (Element) node;
            // launcher.setIsHidden(Boolean.parseBoolean(getTagValue("isHidden", element)));
            // launcher.setId(getTagValue("id", element));
            System.out.println("id = "+getTagValue("id", element));
            System.out.println("ishidden = "+getTagValue("isHidden", element));
        }
        return launcher;
    }
}
And this is the stack trace:
java.lang.NullPointerException
at XmlReader.getTagValue(XmlReader.java:56)
at XmlReader.getLauncher(XmlReader.java:96)
at XmlReader.readLauncher(XmlReader.java:78)
at Program.main(Program.java:27)
I cannot change the format of the XML file.
It seems to fail when it tries to get the actual value of the node's fields, or so I assume.
I don't understand the reason, though: when I check the size of the node list it looks fine and gives me 2.

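The problem is the line below:
System.out.println("id = " + getTagValue("id", element));
where getTagValue("id", element) calls
NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
Here element.getElementsByTagName("id") returns an empty list, because id is an attribute of launcher and not a child element. item(0) on that list is therefore null, and calling getChildNodes() on it throws the NullPointerException.
The value should be read from the attribute instead:
// gets the value of the attribute with the given name
private static String getTagValue(String tag, Element element) {
    return element.getAttributeNode(tag).getValue();
}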

You are calling getElementsByTagName() in getTagValue, but you are trying to retrieve attributes of the tag. You need to call getAttribute() instead. For example:
element.getAttribute(attributeName)
where attributeName is "id" or "isHidden". This will return the value as a String and can be returned directly with no further processing.
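A small, self-contained sketch of reading the launcher attributes this way. The class LauncherAttributes and its method are hypothetical, and the XML is passed in as a string only to keep the example runnable without a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
class LauncherAttributes {

    /** Returns one "id hidden=..." entry per launcher element, read from its attributes. */
    static List<String> describeLaunchers(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList launchers = doc.getElementsByTagName("launcher");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < launchers.getLength(); i++) {
            Element launcher = (Element) launchers.item(i);
            // getAttribute returns "" (not null) when the attribute is missing.
            String id = launcher.getAttribute("id");
            boolean hidden = Boolean.parseBoolean(launcher.getAttribute("isHidden"));
            result.add(id + " hidden=" + hidden);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<war><missileLaunchers>"
                + "<launcher id=\"L101\" isHidden=\"false\"/>"
                + "<launcher id=\"L102\" isHidden=\"true\"/>"
                + "</missileLaunchers></war>";
        System.out.println(describeLaunchers(xml)); // prints [L101 hidden=false, L102 hidden=true]
    }
}
```

Unlike getAttributeNode(name).getValue(), getAttribute(name) does not throw when the attribute is absent, which may or may not be what you want for required attributes.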

SAXException unable to get document encoding

I'm trying to make an application that displays a news feed from a website. I get the input stream and parse it into a document using SAX, but the parser throws a SAXException saying it is unable to determine the type of encoding of the stream. Before that I tried putting the website's feed manually into an XML file and reading the file, and that worked, but when streaming directly from the Internet it throws that exception. This is my code:
public final class MyScreen extends MainScreen {
    protected static RichTextField RTF = new RichTextField("Plz Wait . . . ",
            Field.FIELD_BOTTOM);
    public MyScreen() {
        // Set the displayed title of the screen
        super(Manager.NO_VERTICAL_SCROLL);
        setTitle("Yalla Kora");
        Runnable R = new Runnable();
        R.start();
        add(RTF);
    }
    private class Runnable extends Thread {
        public Runnable() {
            // TODO Auto-generated constructor stub
            ConnectionFactory factory = new ConnectionFactory();
            ConnectionDescriptor descriptor = factory
                    .getConnection("http://www.yallakora.com/arabic/rss.aspx?id=0");
            HttpConnection httpConnection;
            httpConnection = (HttpConnection) descriptor.getConnection();// Connector.open("http://www.yallakora.com/pictures/main//2011/11/El-Masry-807-11-2011-21-56-7.jpg");
            Manager mainManager = getMainManager();
            RichList RL = new RichList(mainManager, true, 2, 1);
            InputStream input;
            try {
                input = httpConnection.openInputStream();
                Document document;
                DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
                        .newInstance();
                DocumentBuilder docBuilder;
                try {
                    docBuilder = docBuilderFactory.newDocumentBuilder();
                    docBuilder.isValidating();
                    try {
                        document = docBuilder.parse(input);
                        document.getDocumentElement().normalize();
                        NodeList item = document.getElementsByTagName("item");
                        int k = item.getLength();
                        for (int i = 0; i < k; i++) {
                            Node value = item.item(i);
                            NodeList Data = value.getChildNodes();
                            Node title = Data.item(0);
                            Node link = Data.item(1);
                            Node date = Data.item(2);
                            Node discription = Data.item(5);
                            Node Discription = discription.getFirstChild();
                            String s = Discription.getNodeValue();
                            int mm = s.indexOf("'><BR>");
                            int max = s.length();
                            String imagelink = s.substring(0, mm);
                            String Khabar = s.substring(mm + 6, max);
                            String Date = date.getFirstChild().getNodeValue();
                            String Title = title.getFirstChild().getNodeValue();
                            String Link = link.getFirstChild().getNodeValue();
                            ConnectionFactory factory1 = new ConnectionFactory();
                            ConnectionDescriptor descriptor1 = factory1
                                    .getConnection(imagelink);
                            HttpConnection httpConnection1;
                            httpConnection1 = (HttpConnection) descriptor1
                                    .getConnection();
                            InputStream input1;
                            input1 = httpConnection1.openInputStream();
                            byte[] bytes = IOUtilities.streamToBytes(input1);
                            Bitmap bitmap = Bitmap.createBitmapFromBytes(bytes,
                                    0, -1, 1);
                            ;
                            RL.add(new Object[] { bitmap, Title, Khabar, Date });
                            add(new RichTextField(link.getNodeValue(),
                                    Field.NON_FOCUSABLE));
                        }
                        RTF.setText("");
                    } catch (SAXException e) {
                        // TODO Auto-generated catch block
                        RTF.setText("SAXException " + e.toString());
                        e.printStackTrace();
                    }
                } catch (ParserConfigurationException e) {
                    // TODO Auto-generated catch block
                    RTF.setText("ParserConfigurationException " + e.toString());
                    e.printStackTrace();
                }
            } catch (IOException e) {
                RTF.setText("IOException " + e.toString());
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
}
Any ideas?

I recommend restructuring this code into at least two parts.
I would create a download function that is given a URL and downloads the bytes associated with that URL. This should open and close the connection, and just return either the bytes downloaded or an error indication.
I would use this download processing as a 'function call' to download your XML bytes. Then parse the bytes that are obtained, feeding them directly into your parser. If the data is properly constructed XML, it will have a header indicating the encoding used, so you do not need to worry about that; the parser will cope.
Once you have this parsed, then use the download function again to download the bytes associated with any images you want.
Regarding the SAX processing, have you reviewed this question:
parse-xml-inputstream-in-blackberry-java-application
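The download-then-parse split described in this answer could look roughly like the sketch below. Note the assumption: it uses java.net.URL from standard Java rather than the BlackBerry ConnectionFactory from the question, purely to keep the example self-contained, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only.
class DownloadThenParse {

    /** Reads a stream to the end and returns its bytes; the caller closes the stream. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Opens the URL, downloads everything, and closes the connection before returning. */
    static byte[] download(String url) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            return readAll(in);
        }
    }

    /** Parses already-downloaded bytes; the parser reads the encoding from the XML header. */
    static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for downloaded feed bytes, so no network access is needed here.
        byte[] feed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><rss><channel><item/><item/></channel></rss>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parse(feed);
        System.out.println(doc.getElementsByTagName("item").getLength()); // prints 2
    }
}
```

With this split, the same download method can be reused for the image bytes, and a parsing failure can be investigated by inspecting or saving the downloaded bytes first.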
