How to generate RSS for news sites programmatically in Java/J2EE?

How do I generate RSS for news sites programmatically? I don't know how to start.

I learned how to write RSS from this article:
http://www.petefreitag.com/item/465.cfm
You can also just go to an RSS feed you like and press "View Source". Then have your Java application produce XML with the same structure as the XML you see, only with your own data.
When you finish, use one of many RSS Validators to validate your RSS.
It's easier than it first looks...
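To make the "reproduce the XML" step a bit more concrete, here is a minimal sketch that writes an RSS 2.0 document with the JDK's StAX writer (javax.xml.stream), which escapes the text content for you. The class name RssSketch, the buildFeed helper, and the example.com URLs are made up for illustration. Real feeds usually carry more elements (pubDate, guid and so on), so treat this as a starting point rather than a complete generator.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class: builds a small RSS 2.0 feed as a String.
class RssSketch {

    // Each item is {title, link, description}; all names here are illustrative.
    static String buildFeed(String title, String link, String description, String[][] items) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("rss");
            xml.writeAttribute("version", "2.0");
            xml.writeStartElement("channel");
            writeTextElement(xml, "title", title);
            writeTextElement(xml, "link", link);
            writeTextElement(xml, "description", description);
            for (String[] item : items) {
                xml.writeStartElement("item");
                writeTextElement(xml, "title", item[0]);
                writeTextElement(xml, "link", item[1]);
                writeTextElement(xml, "description", item[2]);
                xml.writeEndElement(); // item
            }
            xml.writeEndElement(); // channel
            xml.writeEndElement(); // rss
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write the feed", e);
        }
        return out.toString();
    }

    // Writes <name>text</name>; writeCharacters escapes &, < and >.
    private static void writeTextElement(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }

    public static void main(String[] args) {
        // Placeholder data; in a real application this would come from your news store.
        String[][] items = {
            {"First story", "http://example.com/news/1", "Prices rose & demand fell"},
            {"Second story", "http://example.com/news/2", "A short summary"}
        };
        System.out.println(buildFeed("Example News", "http://example.com/", "Latest headlines", items));
    }
}
```

In a servlet or JSP you would write the returned string to the response with a suitable XML content type, then run the result through a validator as suggested above.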

This code shows how to query a database to generate arbitrary XML from a JSP, manually.
It's not RSS, but the idea might be helpful to you.
private String ExecQueryGetXml(java.sql.PreparedStatement stmt, String rowEltName) {
    String result= "<none/>";
    String item;
    java.sql.ResultSet resultSet;
    java.sql.ResultSetMetaData metaData ;
    StringBuffer buf = new StringBuffer();
    int i;
    try {
        resultSet = stmt.executeQuery();
        metaData= resultSet.getMetaData();
        int numberOfColumns = metaData.getColumnCount();
        String[] columnNames = new String[numberOfColumns];
        for( i = 0; i < numberOfColumns; i++)
            columnNames[i] = metaData.getColumnLabel(i+1);
        try {
            // if ((root!=null) && (!root.equals("")))
            //     buf.append('<').append(root).append('>').append('\n');
            // each row is an element, each field a sub-element
            while ( resultSet.next() ) {
                // open the row elt
                buf.append(' ').append('<').append(rowEltName).append(">\n");
                for( i= 0; i < numberOfColumns; i++) {
                    item = resultSet.getString(i+1);
                    if(item==null) continue;
                    buf.append(" <").append(columnNames[i]).append('>');
                    // check for CDATA required here?
                    buf.append(item);
                    buf.append("</").append(columnNames[i]).append(">\n");
                }
                buf.append("\n </").append(rowEltName).append(">\n");
            }
            // conditionally close the row elt
            // if ((root!=null) && (!root.equals("")))
            //     buf.append("</").append(root).append(">\n");
            result= buf.toString();
        }
        catch(Exception e1) {
            System.err.print("\n\n----Exception (2): failed converting ResultSet to xml.\n");
            System.err.print(e1);
            result= "<error><message>Exception (2): " + e1.toString() + ". Failed converting ResultSet to xml.</message></error>\n";
        }
    }
    catch(Exception e2) {
        System.err.print("\n\n----Exception (3).\n");
        System.err.print("\n\n----query failed, or getMetaData failed.\n");
        System.err.print("\n\n---- Exc as string: \n" + e2);
        System.err.print("\n\n---- Exc via helper: \n" +
            dinoch.demo.ExceptionHelper.getStackTraceAsString(e2));
        result= "<error><message>Exception (3): " + e2 + ". query failed, or getMetaData() failed.</message></error>\n";
    }
    return result;
}

How about using a library such as Rome or jrss?

Related

getting first level of categorisation from Notes view

I have a categorized Notes view. Let's say the first categorized column is TypeOfVehicle, the second categorized column is Model, and the third categorized column is Manufacturer.
I would like to collect only the values for the first category and return them as a JSON object.
I am facing two problems:
- I cannot read the value for the category: the column values are empty, and when I try to access the underlying document it is null.
- The script won't hop over to the next category/sibling on the same level.
Can someone explain what I am doing wrong here?
private Object getFirstCategory() {
JsonJavaObject json = new JsonJavaObject();
try{
String server = null;
String filepath = null;
server = props.getProperty("server");
filepath = props.getProperty("filename");
Database db;
db = utils.getSession().getDatabase(server, filepath);
if (db.isOpen()) {
View vw = db.getView("transport");
if (null != vw) {
vw.setAutoUpdate(false);
ViewNavigator nav;
nav = vw.createViewNav();
JsonJavaArray arr = new JsonJavaArray();
Integer count = 0;
ViewEntry tmpentry;
ViewEntry entry = nav.getFirst();
while (null != entry) {
Vector<?> columnValues = entry.getColumnValues();
if(entry.isCategory()){
System.out.println("entry notesid = " + entry.getNoteID());
Document doc = entry.getDocument();
if(null != doc){
if (doc.hasItem("TypeOfVehicle ")){
System.out.println("category has not " + "TypeOfVehicle ");
}
else{
System.out.println("category IS " + doc.getItemValueString("TypeOfVehicle "));
}
} else{
System.out.println("doc is null");
}
JsonJavaObject row = new JsonJavaObject();
JsonJavaObject jo = new JsonJavaObject();
String TypeOfVehicle = String.valueOf(columnValues.get(0));
if (null != TypeOfVehicle ) {
if (!TypeOfVehicle .equals("")){
jo.put("TypeOfVehicle ", TypeOfVehicle );
} else{
jo.put("TypeOfVehicle ", "Not categorized");
}
} else {
jo.put("TypeOfVehicle ", "Not categorized");
}
row.put("request", jo);
arr.put(count, row);
count++;
tmpentry = nav.getNextSibling(entry);
entry.recycle();
entry = tmpentry;
} else{
//tmpentry = nav.getNextCategory();
//entry.recycle();
//entry = tmpentry;
}
}
json.put("data", arr);
vw.setAutoUpdate(true);
vw.recycle();
}
}
} catch (Exception e) {
OpenLogUtil.logErrorEx(e, JSFUtil.getXSPContext().getUrl().toString(), Level.SEVERE, null);
}
return json;
}
What you're doing wrong is trying to treat any single view entry as both a category and a document. A single view entry can only be one of a category, a document, or a total.
If you have an entry for which isCategory() returns true, then for the same entry:
- isDocument() will return false.
- getDocument() will return null.
- getNoteID() will return an empty string.
If the only thing you need is top-level categories, then get the first entry from the navigator and iterate over entries using nav.getNextSibling(entry) as you're already doing, but:
- Don't try to get documents, note ids, or fields.
- Use entry.getColumnValues().get(0) to get the value of the first column for each category.
- If the view contains any uncategorised documents, it's possible that entry.getColumnValues().get(0) might throw an exception, so you should also check that entry.getColumnValues().size() is at least 1 before trying to get a value.
If you need any extra data beyond just top-level categories, then note that subcategories and documents are children of their parent categories.
- If an entry has a subcategory, nav.getChild(entry) will get the first subcategory of that entry.
- If an entry has no subcategories, but is a category which contains documents, nav.getChild(entry) will get the first document in that category.

Nested loop creates duplicate entries when run, cannot find issue?

When the code is run, the nested loop occasionally creates duplicate entries in the system. I have spent a while looking through this but still can't find what is causing it. I would greatly appreciate any help.
for(int i = 0; i < subWorkItemElement.getChildNodes().getLength(); i++) {
Boolean test = false;
WorkItemCommon existingChild = null;
String summary = null;
if(subWorkItemElement.getChildNodes().item(i).getNodeName().equals("workitem")) {
// We know it's a work item - but is it in the existing list?
Element childWorkItem = (Element) subWorkItemElement.getChildNodes().item(i);
for(int j = 0; j < subWorkItemElement.getChildNodes().getLength(); j++) {
if(childWorkItem.getChildNodes().item(j) instanceof Element) {
if(((Element)childWorkItem.getChildNodes().item(j)).getNodeName().equals("details")) {
summary = ((Element) childWorkItem.getChildNodes().item(j)).getElementsByTagName("summary")
.item(0).getTextContent();
for(String k : userInfoHashMap.keySet()) {
summary = summary.replace("${" + k + "}", userInfoHashMap.get(k));
}
if(childHashTable.containsKey(summary)) {
test = true;
existingChild = childHashTable.get(summary);
IWorkItem workItem = existingChild.getWorkItem();
System.out.println("INFO: The task with summary \"" + summary + "\" already exists. Skipping creation.");
System.out.println("this task is work item: " + workItem.getId());
//either check the tasks in the xml for updated details and then modify the existing workitem
//or just modify the work item without checking for updates
makeChildTask(childWorkItem, existingChild, childHashTable, userInfoHashMap, workItemHashMap, rtc, false);
break;
}
}
}
}
if(!test) {
System.out.println("INFO: The task with summary " + summary + " does not currently exist. Creating.");
makeChildTask(childWorkItem, thisItem, childHashTable, userInfoHashMap, workItemHashMap, rtc, true);
} else makeFromExistingChildTask(childWorkItem, existingChild, userInfoHashMap, workItemHashMap, rtc);
}
}
You are possibly (I'm not sure what makeChildTask() does) changing the XML structure while iterating through the list of children. While not necessarily incorrect, this can mean that entries get inserted while you process the list. Since you call subWorkItemElement.getChildNodes().getLength() on every iteration instead of caching it, the length might change between loop iterations.
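If that turns out to be the cause, one way to rule it out is to copy the matching children into a plain list before the loop starts. The NodeList returned by getChildNodes() is live, so nodes added during processing show up in it, whereas a copied list stays fixed. This is only a sketch using the JDK's org.w3c.dom classes: ChildSnapshot, newEmptyDocument and snapshotChildElements are hypothetical names, and the appendChild call in main merely stands in for whatever makeChildTask() might do to the tree.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: iterate over a fixed copy of the children, not the live NodeList.
class ChildSnapshot {

    static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies the direct child elements with the given tag name into a new list.
    static List<Element> snapshotChildElements(Element parent, String tagName) {
        NodeList live = parent.getChildNodes(); // live view: reflects later insertions
        List<Element> copy = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            Node node = live.item(i);
            if (node instanceof Element && node.getNodeName().equals(tagName)) {
                copy.add((Element) node);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Document doc = newEmptyDocument();
        Element root = doc.createElement("workitems");
        doc.appendChild(root);
        root.appendChild(doc.createElement("workitem"));
        root.appendChild(doc.createElement("workitem"));

        // The loop runs once per original child, even though the tree grows meanwhile.
        for (Element item : snapshotChildElements(root, "workitem")) {
            root.appendChild(doc.createElement("workitem")); // stand-in for a mutation
        }
        System.out.println("children after loop: " + root.getChildNodes().getLength());
    }
}
```

Taking a similar snapshot of childWorkItem's own children for the inner loop would also avoid bounding that loop by the length of subWorkItemElement's child list, which is what the code in the question does.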

How to fetch data of multiple HTML tables through Web Scraping in Java

I was trying to scrape data from a website, and to some extent I succeeded. The problem is that the web page I am trying to scrape contains multiple HTML tables. When I execute my program, it only writes the data of the first table to the CSV file and does not retrieve the other tables. My Java code is as follows.
public static void parsingHTML() throws Exception {
//tbodyElements = doc.getElementsByTag("tbody");
for (int i = 1; i <= 1; i++) {
Elements table = doc.getElementsByTag("table");
if (table.isEmpty()) {
throw new Exception("Table is not found");
}
elements = table.get(0).getElementsByTag("tr");
for (Element trElement : elements) {
trElement2 = trElement.getElementsByTag("tr");
tdElements = trElement.getElementsByTag("td");
File fold = new File("C:\\convertedCSV9.csv");
fold.delete();
File fnew = new File("C:\\convertedCSV9.csv");
FileWriter sb = new FileWriter(fnew, true);
//StringBuilder sb = new StringBuilder(" ");
//String y = "<tr>";
for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
//Element tdElement1 = it.next();
//final String content2 = tdElement1.text();
if (it.hasNext()) {
sb.append("\r\n");
}
for (Iterator<Element> it2 = trElement2.iterator(); it.hasNext();) {
Element tdElement2 = it.next();
final String content = tdElement2.text();
//stringjoiner.add(content);
//sb.append(formatData(content));
if (it2.hasNext()) {
sb.append(formatData(content));
sb.append(" , ");
}
if (!it.hasNext()) {
String content1 = content.replaceAll(",$", " ");
sb.append(formatData(content1));
//it2.next();
}
}
System.out.println(sb.toString());
sb.flush();
sb.close();
}
System.out.println(sampleList.add(tdElements));
}
}
}
My analysis is that the loop only checks the tr and td elements. After the first table there is a style sheet on the HTML page. Maybe the loop breaks because of the style sheet; I think that's why it does not proceed to the next table.
P.S.: Here's the link to the page I am trying to scrape:
http://www.mufap.com.pk/nav_returns_performance.php?tab=01
What you do just at the beginning of your code will not work:
// loop just once, why
for (int i = 1; i <= 1; i++) {
    Elements table = doc.getElementsByTag("table");
    if (table.isEmpty()) {
        throw new Exception("Table is not found");
    }
    elements = table.get(0).getElementsByTag("tr");
Here you loop just once, read all table elements, and then process all tr elements of the first table you find. So even if you looped more than once, you would always process the first table.
You will have to iterate over all table elements, e.g.
for (Element table : doc.getElementsByTag("table")) {
    for (Element trElement : table.getElementsByTag("tr")) {
        // process "td"s and so on
    }
}
Edit: Since you're having trouble with the code above, here's a more thorough example. Note that I'm using Jsoup to read and parse the HTML (you didn't specify what you are using).
Document doc = Jsoup
        .connect("http://www.mufap.com.pk/nav_returns_performance.php?tab=01")
        .get();
for (Element table : doc.getElementsByTag("table")) {
    for (Element trElement : table.getElementsByTag("tr")) {
        // skip header "tr"s and process only data "tr"s
        if (trElement.hasClass("tab-data1")) {
            StringJoiner tdj = new StringJoiner(",");
            for (Element tdElement : trElement.getElementsByTag("td")) {
                tdj.add(tdElement.text());
            }
            System.out.println(tdj);
        }
    }
}
This will concatenate and print all data cells (those in rows having the class tab-data1). You will still have to modify it to write to your CSV file, though.
Note: in my tests this processes 21 tables, 243 trs and 2634 tds.
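For the remaining "write to your CSV file" step, here is a rough sketch using only the JDK. It assumes you collect the cell texts of each data row into a List<String> instead of printing them. The class name CsvSketch, its helper methods, the output file name converted.csv, and the sample rows are all made up for illustration. Cells containing commas or quotes are wrapped in double quotes, which a plain join with "," does not handle.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: writes rows of cell texts as CSV with basic quoting.
class CsvSketch {

    // Quotes a cell if it contains a comma, a double quote or a line break.
    static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n") || cell.contains("\r")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    static String toCsvLine(List<String> cells) {
        return cells.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }

    // Opens the file once, writes every row, and closes it via try-with-resources.
    static void writeCsv(Path target, List<List<String>> rows) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                out.write(toCsvLine(row));
                out.write("\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder rows; in the scraper these would be the td texts of each data tr.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Fund A", "1,234.50", "0.12"),
                Arrays.asList("Fund \"B\"", "98.10", "-0.03"));
        writeCsv(Paths.get("converted.csv"), rows);
    }
}
```

Opening the writer once, outside the row loop, also avoids the pattern in the question where the output file is deleted and reopened for every tr element.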

Strange error in apache tomcat?

I am using Apache Tomcat with Eclipse Kepler. This is my JSP file, which runs a Java class that queries TDB using SPARQL. JSP file:
<%@ page import="check.test4query" %>
<% test4query demo = new test4query();
test4query dem = new test4query();
String[] id =new String[20];
String[] dat =new String[20];
int i;
demo.mai("SELECT ?x WHERE { ?y <TO:> 'hjcooljohny75#gmail.com' . ?y <SUB:> ?x} LIMIT 20 ");
for(i=0;i<20;i++)
{ id[i]=test4query.arr[i];
id[i] = id[i].substring(0, Math.min(id[i].length(), 30));
}
for(i=0;i<20;i++)
{ //id[i]=test4query.arr[i];
out.println("<tr>"+"&nbsp&nbsp&nbsp&nbsp"+id[i]+"<hr style='border-color:#E6E6E6;padding:0px;margin:0px'>"+"</tr>");
}
%>
this is my test4query :
public static String[] arr=new String[20];
public void mai (String s) {
//String s;
//load the dataset
//String query1;
//query1="hjcooljohny75#gmail.com";
//query1 = (String)(subjectentry.getText());
// s="SELECT ?x WHERE { ?y <TO:> '"+query1+"' . ?y <SUB:> ?x} LIMIT 20 ";
System.out.println(s);
String directory = "EMAILADDRESS" ;
Dataset ds = TDBFactory.createDataset(directory) ;
Model model = ds.getDefaultModel() ;
ds.begin(ReadWrite.READ) ;
QueryExecution qExec = QueryExecutionFactory.create(s, ds) ;
int i=0;
try{
ResultSet rs = qExec.execSelect() ;
String x=rs.toString();
while (rs.hasNext()) {
QuerySolution qs = rs.next();
String rds;
if(qs.get("x")!=null)
rds = qs.get("x").toString();
else rds="hi";
// String em = (String)rs.getString();
if(rds==null)
break;
//System.out.println(rds);
arr[i] = rds;
i++;
//for (int i =0; i < arr.length; i++){
}
} finally
{qExec.close() ;
ds.commit();
ds.end();
}
for( i=0;i<20;i++)
System.out.println(arr[i]);
//arr[0]="hi";
// return arr;
// try {
// ResultSetFormatter.out(rs) ;
// } finally { qExec.close() ; }
// Another query - same view of the data.
}
The problem is that when I start the Tomcat server, the page runs perfectly and shows the results, but if I then refresh the page it shows this error:
com.hp.hpl.jena.tdb.transaction.TDBTransactionException: Not in a transaction
com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction.get(DatasetGraphTransaction.java:106)
com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction.get(DatasetGraphTransaction.java:40)
com.hp.hpl.jena.sparql.core.DatasetGraphTrackActive.getDefaultGraph(DatasetGraphTrackActive.java:91)
com.hp.hpl.jena.sparql.core.DatasetImpl.getDefaultModel(DatasetImpl.java:103)
check.test4query.mai(test4query.java:59)
org.apache.jsp.grayscale.gmail_005flike_jsp._jspService(gmail_005flike_jsp.java:210)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:432)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:403)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:347)
javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
This happens every time: when I restart the server it shows the correct answer the first time and then the error. I don't know what the problem is or how to correct it.
This is the wrong way round:
Model model = ds.getDefaultModel() ;
ds.begin(ReadWrite.READ) ;
Try instead:
ds.begin(ReadWrite.READ) ;
Model model = ds.getDefaultModel() ;
The problem is that you are leaking resources and not cleaning up after yourself.
You start a transaction with your ds.begin(ReadWrite.READ) call, but you never close the transaction with a ds.end() call, which is why subsequent calls fail with the error you see. You also never close your QueryExecution.
You should use a try { } catch { } finally { } to ensure that you clean up the relevant resources.

Java Bean not working as expected

OK, I have a JSP running the following script section.
<% irCollection mgrq = new irCollection();
mgrq.setMgrid("Chris Novish");
mgrq.populateCollection();
int pagenum;
if (request.getParameter("p") != null) {
String pagedatum=request.getParameter("p");
pagenum = Integer.parseInt(pagedatum);
} else { pagenum = 0; }
for (int i=0;i<10;i++) {
int rownum = pagenum * 10 + i;
InquireRecord currec = mgrq.getCurRecords(rownum);
out.println(currec.getID()); %>
irCollection has an ArrayList property that stores several InquireRecord objects. It gets this data from a database, using the mgrid (set in line 2 there) as the matching term.
But I'm getting an IndexOutOfBounds exception on what appears here as line 11.
I've done some tests, and I'm pretty sure that it's because populateCollection() isn't getting things done. I have a getSize method that gives me a size of 0.
I made a test class in Eclipse to make sure all my methods were working:
package com.serco.inquire;
public class test {
public static void main (String[] args) {
String mgr = "Chris Novish";
irCollection bob = new irCollection();
bob.setMgrid(mgr);
bob.populateCollection();
InquireRecord fred = bob.getCurRecords(1);
System.out.println(fred.getID());
}
}
That test class produces exactly what I'd expect.
Other than the names of some of the local variables, I can't see what I'm doing differently in the JSP.
So... tell me, what noobish mistake did I make?
For the sake of being thorough, here's the populateCollection() method:
public void populateCollection() {
try {
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
String filename = "inquire.mdb";
String database = "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=";
database+= filename.trim() + ";DriverID=22;READONLY=true}";
Connection con = DriverManager.getConnection( database ,"","");
Statement s = con.createStatement();
s.execute ("SELECT * FROM inquiries WHERE manager = '" + mgrid + "'");
ResultSet rs = s.getResultSet();
int cur;
if (rs != null) {
while (rs.next()) {
cur = rs.getRow();
cur -- ;
int curID = rs.getInt("ID");
this.newIR.setID(curID);
String cursub = rs.getString("submitter");
this.newIR.setSubmitter(cursub);
this.iRecords.add(cur, this.newIR);
}
this.size = iRecords.size();
this.pages = this.size / 10;
int remain = this.size % 10;
if (remain > 0) { this.pages++; }
} else { System.out.println("no records."); }
}
catch (Throwable e) {
System.out.println(e);
}
}
Your IndexOutOfBounds exception is probably being caused by the value of rownum being passed to mgrq.getCurRecords().
Your test code proves nothing because there you're calling getCurRecords() with a constant which is probably always valid for your system and will never cause the exception.
My suggestion is to step through the code in your JSP with a debugger, or even simply to print out the value of your variables (especially pagedatum, pagenum and rownum) in your JSP code.
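Independent of why the collection ends up empty, the JSP loop could guard against indexes that do not exist. The following is only a sketch with made-up names (PagingSketch, parsePage, pageOf); it works on a plain List rather than your irCollection class, and it treats a missing or non-numeric p parameter as page 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: safe page parsing and slicing for a list of records.
class PagingSketch {
    static final int PAGE_SIZE = 10;

    // Returns 0 for a missing, negative or non-numeric page parameter.
    static int parsePage(String raw) {
        if (raw == null) {
            return 0;
        }
        try {
            return Math.max(0, Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Returns the records on the given page, or an empty list if the page is past the end.
    static <T> List<T> pageOf(List<T> records, int pageNum) {
        long start = (long) Math.max(0, pageNum) * PAGE_SIZE; // long avoids int overflow
        if (start >= records.size()) {
            return new ArrayList<>();
        }
        int from = (int) start;
        int to = Math.min(from + PAGE_SIZE, records.size());
        return new ArrayList<>(records.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(101, 102, 103); // placeholder record IDs
        for (Integer id : pageOf(ids, parsePage("0"))) {
            System.out.println(id);
        }
        System.out.println("page 5 has " + pageOf(ids, 5).size() + " records");
    }
}
```

With a guard like this, an empty collection produces an empty page instead of an exception, which makes it easier to see that the real problem is in how the collection gets populated.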
Is your JSP snippet correct? It looks like you opened a brace for the
for (int i=0;i<10;i++) {
loop, but I don't see a closing brace for it at all. Can you check whether that is the case and, if so, fix the code appropriately?
