I'm trying to build an application that displays a news feed from a website. I open the input stream and parse it into a document using SAX, but the parser throws a SAXException saying it is unable to determine the encoding of the stream. Earlier I copied the website's feed into an XML file by hand and parsed that file, and it worked. When I stream directly from the Internet, the exception is thrown. This is my code:
public final class MyScreen extends MainScreen {
protected static RichTextField RTF = new RichTextField("Plz Wait . . . ",
public MyScreen() {
// Set the displayed title of the screen
setTitle("Yalla Kora");
Runnable R = new Runnable();
private class Runnable extends Thread {
public Runnable() {
// TODO Auto-generated constructor stub
ConnectionFactory factory = new ConnectionFactory();
ConnectionDescriptor descriptor = factory
HttpConnection httpConnection;
httpConnection = (HttpConnection) descriptor.getConnection();// Connector.open("http://www.yallakora.com/pictures/main//2011/11/El-Masry-807-11-2011-21-56-7.jpg");
Manager mainManager = getMainManager();
RichList RL = new RichList(mainManager, true, 2, 1);
InputStream input;
try {
input = httpConnection.openInputStream();
Document document;
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
DocumentBuilder docBuilder;
try {
docBuilder = docBuilderFactory.newDocumentBuilder();
try {
document = docBuilder.parse(input);
NodeList item = document.getElementsByTagName("item");
int k = item.getLength();
for (int i = 0; i < k; i++) {
Node value = item.item(i);
NodeList Data = value.getChildNodes();
Node title = Data.item(0);
Node link = Data.item(1);
Node date = Data.item(2);
Node discription = Data.item(5);
Node Discription = discription.getFirstChild();
String s = Discription.getNodeValue();
int mm = s.indexOf("'><BR>");
int max = s.length();
String imagelink = s.substring(0, mm);
String Khabar = s.substring(mm + 6, max);
String Date = date.getFirstChild().getNodeValue();
String Title = title.getFirstChild().getNodeValue();
String Link = link.getFirstChild().getNodeValue();
ConnectionFactory factory1 = new ConnectionFactory();
ConnectionDescriptor descriptor1 = factory1
HttpConnection httpConnection1;
httpConnection1 = (HttpConnection) descriptor1
InputStream input1;
input1 = httpConnection1.openInputStream();
byte[] bytes = IOUtilities.streamToBytes(input1);
Bitmap bitmap = Bitmap.createBitmapFromBytes(bytes,
0, -1, 1);
RL.add(new Object[] { bitmap, Title, Khabar, Date });
add(new RichTextField(link.getNodeValue(),
} catch (SAXException e) {
// TODO Auto-generated catch block
RTF.setText("SAXException " + e.toString());
} catch (ParserConfigurationException e) {
// TODO Auto-generated catch block
RTF.setText("ParserConfigurationException " + e.toString());
} catch (IOException e) {
RTF.setText("IOException " + e.toString());
// TODO Auto-generated catch block
Any ideas?
I recommend restructuring this code into at least two parts.
I would create a download function that is given a URL and downloads the bytes associated with that URL. This should open and close the connection, and just return either the bytes downloaded or an error indication.
I would call this download function to fetch your XML bytes, then feed those bytes directly into your parser. If the data is properly constructed XML, it will have a header declaring the encoding used, so you do not need to worry about that; the parser will cope.
Once you have this parsed, then use the download function again to download the bytes associated with any images you want.
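The two-step structure described above might look roughly like the following sketch in standard Java SE. The class name `FeedFetcher` and the methods `readAll` and `parseXml` are illustrative names, not part of the original code. On BlackBerry the stream would come from `HttpConnection.openInputStream()` rather than from the in-memory sample used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class FeedFetcher {
    /** Reads the whole stream into memory and always closes it. */
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream input = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }

    /** Parses bytes that were already downloaded; the parser reads the
     *  encoding from the XML declaration at the start of the data. */
    static Document parseXml(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xmlBytes));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample feed standing in for the downloaded data.
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss><channel><item><title>First</title></item>"
                + "<item><title>Second</title></item></channel></rss>";
        byte[] bytes = readAll(new ByteArrayInputStream(sample.getBytes("UTF-8")));
        Document doc = parseXml(bytes);
        System.out.println("items: " + doc.getElementsByTagName("item").getLength());
    }
}
```

Keeping the bytes in hand also makes it easy to inspect the first few of them when the parser complains, which helps to check whether the server returned XML at all.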
Regarding the SAX processing, have you reviewed this question:
Why would I not be able to call this multiple times?
private Document getStationery(String txtStationery,Database mailDB){
try {
View mailView = mailDB.getView("(Stationery)");
DocumentCollection dc = mailView.getAllDocumentsByKey("Memo Stationery");
Document tmpdoc;
Document doc = dc.getFirstDocument();
while (doc != null) {
return doc;
tmpdoc = dc.getNextDocument();
doc = tmpdoc;
} catch (NotesException e) {
// TODO Auto-generated catch block
return null;
It crashes on the second call below. Is it something to do with not recycling?
public void send() throws NotesException, IOException, Exception{
Session session = getCurrentSession();
Database userDB = getUserDatabase();
Database mailbox = session.getDatabase("", "mail1.box");
Document stationeryDoc1 = getStationery("Test1",userDB);
Document stationeryDoc2 = getStationery("Test2",userDB);
You could try without recycling at all (generally not a good idea, but here it may be helpful to rule out other problems), or recycle the objects in the getStationery() method properly, beginning with the Document, then the DocumentCollection, and finally the View. At the moment, the only object you recycle is the previous Document object in the while loop.
I need to create a diff between two HTML documents in my app. I found a library called DaisyDiff that can do it. It has an API that looks like this:
/**
 * Diffs two html files, outputting the result to the specified consumer.
 */
public static void diffHTML(InputSource oldSource, InputSource newSource,
ContentHandler consumer, String prefix, Locale locale)
throws SAXException, IOException
I know absolutely nothing about SAX and I can't figure out what to pass as the third argument. After poking through https://code.google.com/p/daisydiff/source/browse/trunk/daisydiff/src/java/org/outerj/daisy/diff/Main.java I wrote this method:
protected String doInBackground(String... params)
try {
String oldFileName = params[0],
newFileName = params[1];
ByteArrayOutputStream os = new ByteArrayOutputStream();
FileInputStream oldis = null, newis = null;
oldis = openFileInput(oldFileName);
newis = openFileInput(newFileName);
SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory
TransformerHandler result = tf.newTransformerHandler();
result.setResult(new StreamResult(os));
DaisyDiff.diffHTML(new InputSource(oldis), new InputSource(newis), result, "", Locale.getDefault());
Log.d("diff", "output length = " + os.size());
return os.toString("Utf-8");
}catch (Exception e){
return e.toString();
I have no idea if that even makes sense. It doesn't work: nothing is written to the output. Please help me with this. Thanks in advance.
According to how HtmlTestFixture.diff is coded up (inside src/test/java of DaisyDiff), you need to give it instructions on how the result should be formatted. Have you tried adding the setOutputProperty(...) calls below?
// @Test comes from TestNG and is not related to DaisyDiff
public void daisyDiffTest() throws Exception {
String html1 = "<html><body>var v2</body></html>";
String html2 = "<html> \n <body> \n Hello world \n </body> \n </html>";
try {
StringWriter finalResult = new StringWriter();
SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler result = tf.newTransformerHandler();
result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
result.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
result.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
result.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
result.setResult(new StreamResult(finalResult));
ContentHandler postProcess = result;
DaisyDiff.diffHTML(new InputSource(new StringReader(html1)), new InputSource(new StringReader(html2)), postProcess, "test", Locale.ENGLISH);
} catch (SAXException e) {
// TODO Auto-generated catch block
} catch (IOException e) {
// TODO Auto-generated catch block
Done this way, my output is as follows. Now I can stick this into an HTML file, include the right css and js files and have a pretty output.
<span class="diff-html-removed" id="removed-test-0" previous="first-test" changeId="removed-test-0" next="added-test-0">var v2</span><span class="diff-html-added" previous="removed-test-0" changeId="added-test-0" next="last-test"> </span><span class="diff-html-added" id="added-test-0" previous="removed-test-0" changeId="added-test-0" next="last-test">Hello world </span>
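To see what the third argument does in isolation, here is a minimal sketch that uses only the JDK and feeds SAX events by hand into a TransformerHandler, in place of DaisyDiff. The class name `HandlerDemo` and the method `render` are made up for illustration. DaisyDiff would be the one calling `startElement`, `characters` and so on against the handler.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class HandlerDemo {
    /** Serializes one <span> element, written as SAX events, to a String. */
    static String render(String text) throws Exception {
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = tf.newTransformerHandler();
        // Tell the serializer how to format what it receives.
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "class", "class", "CDATA", "diff-html-added");

        // These calls are what a SAX producer such as a diff engine would emit.
        handler.startDocument();
        handler.startElement("", "span", "span", attrs);
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement("", "span", "span");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Hello world"));
    }
}
```

The handler is both the SAX ContentHandler that receives events and the serializer that writes them to the configured result, which is why it can be passed directly as the consumer.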
I am making some changes to an embedded XML file in my Java application. I have some fields, a LOAD button and a SAVE button. After clicking the save button I can see the XML file updating, but after clicking the load button the old values are being loaded to the fields.
Here is my code:
public class MyLoad_SaveSampleProject {
public String field1 = "";
public String field2 = "";
public void loadSampleProject() {
InputStream file = MyLoad_SaveSampleProject.class.getResourceAsStream("/main/resources/otherClasses/projects/SampleProject.xml");
try {
DocumentBuilderFactory DocBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder DocBuilder = DocBuilderFactory.newDocumentBuilder();
Document Doc = DocBuilder.parse(file);
NodeList list = Doc.getElementsByTagName("*"); //create a list with the elements of the xml file
for (int i=0; i<list.getLength(); i++) {
Element element = (Element)list.item(i);
if (element.getNodeName().equals("field1")) {
field1 = element.getChildNodes().item(0).getNodeValue().toString();
} else if (element.getNodeName().equals("field2")) {
field2 = element.getChildNodes().item(0).getNodeValue().toString();
} catch (Exception e) {
public void saveSampleProject(String field1Str, String field2Str) {
InputStream file = MyLoad_SaveSampleProject.class.getResourceAsStream("/main/resources/otherClasses/projects/SampleProject.xml");
try {
DocumentBuilderFactory DocBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder DocBuilder = DocBuilderFactory.newDocumentBuilder();
Document Doc = DocBuilder.parse(file);
NodeList list = Doc.getElementsByTagName("*"); //create a list with the elements of the xml file
for (int i=0; i<list.getLength(); i++) {
Node thisAttribute = list.item(i);
if (thisAttribute.getNodeName().equals("field1")) {
} else if (thisAttribute.getNodeName().equals("field2")) {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(Doc);
StreamResult result = new StreamResult(new File("src/main/resources/otherClasses/projects/SampleProject.xml"));
transformer.transform(source, result);
} catch (ParserConfigurationException pce) {
} catch (TransformerException tfe) {
} catch (IOException ioe) {
} catch (SAXException sae) {
public String returnField1() {
return field1;
public String returnField2() {
return field2;
And this is my default XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><Strings>
When the save button is pressed I am using the saveSampleProject method. When the load button is pressed I am using the loadSampleProject method and then I am getting the field values with the returnField1 and returnField2 methods.
I have no idea of what could be wrong with what I'm doing. I would appreciate any suggestions.
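Most probably, calling getResourceAsStream() gives you a cached or stale copy of the resource: resources are loaded from the classpath (typically the build output directory or the jar), not from the src/ path that your save method writes to. Since you are using File in the save method, try to get the InputStream for loading from the same File object, and not as a resource.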
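As a rough sketch of that suggestion, the following reads and writes through the same File, so a load always sees the last save. The class name `ProjectFile` and its methods are hypothetical. The element names `Strings` and `field1` are taken from the question, and a temporary file stands in for SampleProject.xml.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ProjectFile {
    /** Writes a small <Strings><field1>...</field1></Strings> document to the file. */
    static void save(File file, String field1) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("Strings");
        Element field = doc.createElement("field1");
        field.setTextContent(field1);
        root.appendChild(field);
        doc.appendChild(root);
        try (OutputStream out = new FileOutputStream(file)) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    /** Reads field1 back from the same file on disk, not from the classpath. */
    static String load(File file) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(file);
        return doc.getElementsByTagName("field1").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("SampleProject", ".xml");
        save(file, "old value");
        save(file, "new value");
        System.out.println(load(file)); // reads what the second save wrote
        file.delete();
    }
}
```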
public class SOAPClient implements Runnable {
/**
 * endpoint url, the address where soap xml will be sent. It is hard coded
 * now, later on to be made configurable
 */
private String endpointUrl = "";
/**
 * This is for debugging purposes. Message and response are written to the
 * fileName
 */
static String fileName = "";
/**
 * serverResponse This is a string representation of the response received
 * from server
 */
private String serverResponse = null;
public String tempTestStringForDirectory = "";
/**
 * A single file or a folder may be provided
 */
private File fileOrFolder;
public SOAPClient(String endpointURL, File fileOrFolder) {
this.endpointUrl = endpointURL;
this.fileOrFolder = fileOrFolder;
serverResponse = null;
/**
 * Creates a SOAPMessage out of a file that is passed
 * @param fileAddress - Contents of this file are read and a SOAPMessage is
 * created that will get sent to the server. This is a helper method. Is
 * this step (method, conversion) necessary? set tempSoapText = XML String,
 * currently getting from file, but it can be a simple string
 */
private SOAPMessage xmlStringToSOAPMessage(String fileAddress) {
// Picking up this string from file right now
// This can come from anywhere
String tempSoapText = readFileToString(fileAddress);
SOAPMessage soapMessage = null;
try {
// Create SoapMessage
MessageFactory msgFactory = MessageFactory.newInstance();
SOAPMessage message = msgFactory.createMessage();
SOAPPart soapPart = message.getSOAPPart();
// Load the SOAP text into a stream source
byte[] buffer = tempSoapText.getBytes();
ByteArrayInputStream stream = new ByteArrayInputStream(buffer);
StreamSource source = new StreamSource(stream);
ByteArrayOutputStream out = new ByteArrayOutputStream();
// Set contents of message
soapMessage = message;
} catch (SOAPException e) {
System.out.println("soapException xmlStringToSoap()");
System.out.println("SOAPException : " + e);
} catch (IOException e) {
System.out.println("IOException xmlStringToSoap()");
System.out.println("IOException : " + e);
return soapMessage;
/**
 * Reads the file passed and creates a string. fileAddress - Contents of
 * this file are read into a String
 */
private String readFileToString(String fileAddress) {
FileInputStream stream = null;
MappedByteBuffer bb = null;
String stringFromFile = "";
try {
stream = new FileInputStream(new File(fileAddress));
FileChannel fc = stream.getChannel();
bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
stringFromFile = Charset.defaultCharset().decode(bb).toString();
} catch (IOException e) {
System.out.println("readFileToString IOException");
} finally {
try {
} catch (IOException e) {
System.out.println("readFileToString IOException");
return stringFromFile;
/**
 * soapXMLtoEndpoint sends the soapXMLFileLocation to the endpointURL
 */
public void soapXMLtoEndpoint(String endpointURL, String soapXMLFileLocation) throws SOAPException {
SOAPConnection connection = SOAPConnectionFactory.newInstance().createConnection();
SOAPMessage response = connection.call(xmlStringToSOAPMessage(soapXMLFileLocation), endpointURL);
SOAPBody responseBody = response.getSOAPBody();
SOAPBodyElement responseElement = (SOAPBodyElement) responseBody.getChildElements().next();
SOAPElement returnElement = (SOAPElement) responseElement.getChildElements().next();
if (responseBody.getFault() != null) {
System.out.println("fault != null");
System.out.println(returnElement.getValue() + " " + responseBody.getFault().getFaultString());
} else {
serverResponse = returnElement.getValue();
System.out.println("\nfault == null, got the response properly.\n");
/**
 * This is for debugging purposes. Writes string to a file.
 * @param message Contents to be written to file
 * @param fileName the name of the file
 */
private static void toFile(String message, String fileName) {
try {
FileWriter fstream = new FileWriter(fileName);
System.out.println("printing to file: ".concat(fileName));
BufferedWriter out = new BufferedWriter(fstream);
} catch (Exception e) {
System.out.println("toFile() Exception");
System.err.println("Error: " + e.getMessage());
/**
 * Using dom to parse the xml. Getting both orderID and the description.
 * @param xmlToParse XML in String format to parse. Gets the orderID and
 * description. Is the error handling required? What if orderID or
 * description isn't found in the xmlToParse? Use setters and getters?
 * @param fileName only for debugging, it can be safely removed any time.
 */
private void domParsing(String xmlToParse, String fileName) {
if (serverResponse == null) {
} else {
try {
System.out.println("in domParsing()");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
System.out.println("serverResponse contains fault");
Document doc = dBuilder.parse(new InputSource(new StringReader(serverResponse)));
NodeList orderNodeList = doc.getElementsByTagName("Order");
if (orderNodeList.getLength() > 0) {
tempTestStringForDirectory = tempTestStringForDirectory + "\n Got order\n" + "\n" + fileName + "\n" + "got order\n";
for (int x = 0; x < orderNodeList.getLength(); x++) {
NodeList descriptionNodeList = doc.getElementsByTagName("Description");
if (descriptionNodeList.getLength() > 0) {
System.out.println("getting description");
String tempDescriptionString = descriptionNodeList.item(0).getTextContent();
tempTestStringForDirectory = tempTestStringForDirectory + "\n Got description" + "\n" + fileName + "\n" + tempDescriptionString + "\n";
} catch (Exception e) {
System.out.println("domParsing() Exception");
/**
 * Reads a single file or a whole directory structure
 */
private void listFilesForFolder(final File fileOrFolder) {
String temp = "";
if (fileOrFolder.isDirectory()) {
for (final File fileEntry : fileOrFolder.listFiles()) {
if (fileEntry.isDirectory()) {
} else {
if (fileEntry.isFile()) {
temp = fileEntry.getName();
try {
soapXMLtoEndpoint(endpointUrl, fileEntry.getAbsolutePath()); // avoids the hard-coded "\\" separator
domParsing(serverResponse, fileEntry.getName());
} catch (SOAPException e) {
if (fileOrFolder.isFile()) {
temp = fileOrFolder.getName();
System.out.println("this is a file");
try {
soapXMLtoEndpoint(endpointUrl, fileOrFolder.getAbsolutePath());
} catch (SOAPException e) {
domParsing(serverResponse, temp);
public void run() {
toFile(tempTestStringForDirectory, "test.txt");
public static void main(String[] args) {
String tempURLString = ".../OrderingService";
String tempFileLocation = "C:/Workspace2/Test5/";
SOAPClient soapClient = new SOAPClient(tempURLString, new File(tempFileLocation));
Thread thread = new Thread(soapClient);
System.out.println("program ended");
I think n threads for n files would be bad. Wouldn't that crash the system, or give a "too many threads" error?
I'm trying to make my program multithreaded, and I don't know what I am missing. My program has logic to tell whether a single file or a directory was passed. One thread is fine if a single file is passed, but what should I do if a directory is passed? Do I need to create threads in my listFilesForFolder method? Are threads always started from the main method, or can they be started from other methods? Also, this program is going to be used by other people, so it should be my job to handle the threads properly. All they should have to do is use my program. So I feel that the thread logic should not belong in the main method but rather in listFilesForFolder, which is the starting point of my program. Thank you for your help.
From what I have seen, most download managers will try to download at most around 3 files at a time, plus or minus two. I suggest you do the same. Essentially, you could do something like this (pseudocode):
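// Set up a list of objects (the files to process), shared by all threads
String[] fileList;
int nextFileIndex = 0;
Mutex mutex
String next_object(void){
String nextFile = "";
try {
mutex.acquire();
if (nextFileIndex<fileList.length)
nextFile = fileList[nextFileIndex++];
mutex.release();
} catch(InterruptedException ie) {
// interrupted while waiting for the mutex: report no more work
}
return nextFile;
}
Each thread :
String nextFile;
do {
//Get nextFile
nextFile = next_object();
// process nextFile here if it is not empty
} while (!nextFile.equals(""))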
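In Java, the same idea of a small, fixed number of workers pulling from a shared list can be sketched with an ExecutorService instead of a hand-written mutex. This is a swapped-in technique, not the pseudocode above made literal. `BoundedWorkers`, `processAll` and `handle` are hypothetical names, and `handle` is a placeholder for the real per-file work such as sending the SOAP request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedWorkers {
    /** Placeholder for the real per-file work. */
    static String handle(String fileName) {
        return "done: " + fileName;
    }

    /** Processes every file name using at most maxThreads worker threads. */
    static List<String> processAll(List<String> fileNames, int maxThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String name : fileNames) {
                Callable<String> task = () -> handle(name);
                pending.add(pool.submit(task)); // queued until a worker is free
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // waits for that task to finish
            }
            return results;
        } finally {
            pool.shutdown(); // lets the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = Arrays.asList("a.xml", "b.xml", "c.xml", "d.xml");
        for (String line : processAll(files, 3)) {
            System.out.println(line);
        }
    }
}
```

With this shape the thread handling lives inside `processAll`, so callers only pass in the list of files and never touch threads themselves.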
I have an XML document which I parse to get the data between the nodes. However, this data is surrounded by HTML tags. I create another XML document and put this data in it. Now I have to parse it again to get the proper HTML syntax.
Please help.
public class XMLfunctions {
public final static Document XMLfromString(String xml){
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xml));
doc = db.parse(is);
} catch (ParserConfigurationException e) {
System.out.println("XML parse error: " + e.getMessage());
return null;
} catch (SAXException e) {
System.out.println("Wrong XML file structure: " + e.getMessage());
return null;
} catch (IOException e) {
System.out.println("I/O exception: " + e.getMessage());
return null;
return doc;
/** Returns element value
 * @param elem element (it is XML tag)
 * @return Element value otherwise empty String
 */
public final static String getElementValue( Node elem ) {
Node kid;
if( elem != null){
if (elem.hasChildNodes()){
for( kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling() ){
if( kid.getNodeType() == Node.TEXT_NODE ){
return kid.getNodeValue();
return "";
/*Start Parsing Body */
public static String getBodyXML(String id){
String line = null;
try {
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost httpPost = new HttpPost(""+id+"&version=2.2&start=0&rows=10&indent=on");
HttpResponse httpResponse = httpClient.execute(httpPost);
HttpEntity httpEntity = httpResponse.getEntity();
line = EntityUtils.toString(httpEntity);
} catch (UnsupportedEncodingException e) {
line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
} catch (MalformedURLException e) {
line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
} catch (IOException e) {
line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
String st= ParseXMLBodyNode(line,"doc");
return st;
public static String ParseXMLBodyNode(String str,String node){
String xmlRecords = str;
String results = "";
String[] result = new String [1];
StringBuffer sb = new StringBuffer();
StringBuffer text = new StringBuffer();
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlRecords));
Document doc = db.parse(is);
NodeList indiatimes1 = doc.getElementsByTagName(node);
sb.append("<results count=");
for (int i = 0; i < indiatimes1.getLength(); i++) {
Node node1 = indiatimes1.item(i);
if (node1.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node1;
NodeList nodelist = element.getElementsByTagName("str");
Element element1 = (Element) nodelist.item(0);
NodeList title = element1.getChildNodes();
for(int j=0; j<title.getLength();j++){
String tmpText = html2text(text.toString());
result[i] = title.item(0).getNodeValue();
} catch (Exception e) {
System.out.println("Exception........"+results );
return sb.toString();
public static String html2text(String html) {
String pText = Jsoup.clean(html, Whitelist.basic());
return pText;
My class which initiates the process:
public class NewsDetails extends ListActivity{
/** Called when the activity is first created. */
public void onCreate(Bundle savedInstanceState) {
protected void onStart() {*/
Intent myIntent = getIntent();
String id = myIntent.getStringExtra("content_id");
String title = myIntent.getStringExtra("title");
ArrayList<HashMap<String, String>> mylist = new ArrayList<HashMap<String, String>>();
String xml = XMLfunctions.getBodyXML(id);
Document doc = XMLfunctions.XMLfromString(xml);
int numResults = XMLfunctions.numResults(doc);
if((numResults <= 0)){
Toast.makeText(NewsDetails.this, "No Result Found", Toast.LENGTH_LONG).show();
NodeList nodes = doc.getElementsByTagName("result");
for (int i = 0; i < nodes.getLength(); i++) {
HashMap<String, String> map = new HashMap<String, String>();
map.put("title", title);
Element e = (Element)nodes.item(i);
map.put("news", XMLfunctions.getValue(e, "news"));
ListAdapter adapter = new SimpleAdapter(this, mylist , R.layout.list_item, new String[] { "title", "news" }, new int[] { R.id.item_title, R.id.item_subtitle });
final ListView lv = getListView();
Sample XML which I get after converting with jsoup:
<results count="1">
<ul><li><p>as part of its growth plan,</p></li><li><p>in a bid to achieve the target</p></li><li><p>it is pointed out that most of ccl's production came from opencast mines and only 2 mt from underground (ug) mines. ccl is now trying to increase the share underground production. the board of ccl has, thus, approved the introduction of continuous mine in chiru ug at a cost of about rs 145 crore to raise this mine's production from 2 mt to 8 mt per annum.</p></li><li><p>mr ritolia said that.</p></li></ul>
I want to extract the content between the news tags. This XML is fed to the XMLfromString(String xml) function in the XMLfunctions class, which then returns only "<" and drops the rest of the body.
I'm not able to get the body with its HTML tags, which I need for formatting.
One option is to use an XML CDATA section, as in:
<news><![CDATA[<ul><li><p>as part of its growth plan,</p></li><li><p>in a bid to achieve the target</p></li><li><p>it is pointed out that most of ccl's production came from opencast mines and only 2 mt from underground (ug) mines. ccl is now trying to increase the share underground production. the board of ccl has, thus, approved the introduction of continuous mine in chiru ug at a cost of about rs 145 crore to raise this mine's production from 2 mt to 8 mt per annum.</p></li><li><p>mr ritolia said that.</p></li></ul>]]></news>
Then your parser will not treat the HTML tags as XML and will give you access to the raw content of the element. The other option is to encode the HTML tags, i.e. convert every < into &lt;, > into &gt;, & into &amp;, etc. For more on encoding see here
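Both options can be tried with a short sketch like the one below. It assumes the HTML sits inside a `news` element, as the question describes. The class name `NewsBody` and the method `extractNews` are illustrative only. `getTextContent()` returns the element's text with CDATA sections and entities already resolved, so either form yields the raw HTML string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NewsBody {
    /** Returns the text of the first <news> element, HTML markup included. */
    static String extractNews(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("news").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Option 1: wrap the HTML in a CDATA section.
        String withCdata = "<results count=\"1\"><news><![CDATA[<ul><li><p>hi</p></li></ul>]]></news></results>";
        // Option 2: escape the markup characters as entities.
        String escaped = "<results count=\"1\"><news>&lt;ul&gt;&lt;li&gt;&lt;p&gt;hi&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;</news></results>";

        System.out.println(extractNews(withCdata));
        System.out.println(extractNews(escaped));
    }
}
```

Reading only the first text child, as getElementValue does in the question, can miss content when the parser splits the text into several nodes, which is one reason to prefer `getTextContent()` here.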