Changing the encoding used by Xml.parse() with SaxFeedParser (Java)

I am trying to load a Hebrew RSS feed using the following call:
Xml.parse(_InputStream, Xml.Encoding.ISO_8859_1, root.getContentHandler());
which is taken from an article on the IBM site.
I would like to use another encoding, such as "ISO8859_8", rather than one of:
Xml.Encoding.ISO_8859_1,
Xml.Encoding.US_ASCII,
Xml.Encoding.UTF_16,
Xml.Encoding.UTF_8
Thanks a lot!

You can't, because Xml.Encoding is an enum with a fixed set of values. You'll have to use one of the other Xml.parse() overloads (for example the one that takes a Reader) or, if you can, get the RSS feed producer to output UTF-8.
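One way to read "use one of the other methods" is to decode the bytes yourself and hand the parser a Reader instead of an InputStream. The sketch below shows the idea with the JDK's own SAX parser so that it runs off-device; on Android the equivalent step would be passing the same InputStreamReader to Xml.parse(Reader, ContentHandler). The class name HebrewFeedReader, the method readTitles, and the assumption that the feed keeps its text in title elements are all hypothetical, and the sketch assumes the runtime ships the ISO-8859-8 charset.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: decodes the stream with an explicit charset,
// then lets the SAX parser consume characters instead of raw bytes.
class HebrewFeedReader {

    static List<String> readTitles(InputStream in, String charsetName) {
        final List<String> titles = new ArrayList<>();

        // Collects the text content of every <title> element.
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName) && text != null) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };

        // The InputStreamReader does the byte-to-char decoding, so the
        // parser never has to know about the charset.
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(reader), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
        return titles;
    }
}
```

A call such as HebrewFeedReader.readTitles(stream, "ISO-8859-8") would then replace the Xml.Encoding argument with an ordinary charset name.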

Related

How can I efficiently extract text from a bunch of web pages without extra information

I have a list of around 1 million web pages and I want to efficiently extract just the text from them. Currently I use the requests library to fetch the HTML of each page and the BeautifulSoup library in Python to get the text out of it. This approach extracts some extra information in addition to the text, for example any JavaScript that is included in the body.
Could you please suggest a suitable and efficient way to do this? I looked at Scrapy, but it looks like it crawls one specific website. Can we pass it a list of specific web pages to get information from?
Thank you in advance.
Yes, you can use Scrapy to crawl a set of URLs in a generic fashion.
You simply need to set them on the start_urls list attribute of your spider, or reimplement the start_requests spider method to yield requests from any data source, and then implement your parse callback to perform the generic content extraction you want.
You can use html-text to extract text from them, and regular Scrapy selectors to extract additional data like the one you mention.
In Scrapy you can set up your own parser, e.g. Beautiful Soup, and call it from your parse method.
To extract text from generic pages I traverse the body only and exclude comments and similar string types, as well as some tags such as script and style:
import re
import bs4

# soup is an already parsed BeautifulSoup document
snippets = []
for snippet in soup.find('body').descendants:
    if isinstance(snippet, bs4.element.NavigableString) \
            and not isinstance(snippet, EXCLUDED_STRING_TYPES) \
            and snippet.parent.name not in EXCLUDED_TAGS:
        # collapse any run of Unicode whitespace into a single space
        snippet = re.sub(UNICODE_WHITESPACES, ' ', snippet)
        snippet = snippet.strip()
        if snippet != '':
            snippets.append(snippet)
with
EXCLUDED_STRING_TYPES = (bs4.Comment, bs4.CData, bs4.ProcessingInstruction, bs4.Declaration)
EXCLUDED_TAGS = ['script', 'noscript', 'style', 'pre', 'code']
UNICODE_WHITESPACES = re.compile(u'[\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004'
                                 u'\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]+')

Generating Java code for a JSON to XML transformation

Do you know of any program that can generate Java code for a JSON to XML transformation? Altova MapForce offers this, but unfortunately when I try to use it, it returns an error:
Altova_Hierarchical_JSON.mfd: Mapping validation failed - 1 error(s), 0 warning(s)
Altova_Hierarchical: JSON components are not supported for Java.
The output component Altova_Hierarchical has no output file name set. A default file name will be used.
The second thing that is important to me is the ability to create a graphical mapping between JSON and XML.
Thanks for the reply.
The underscore-java library has a static method U.xmlToJson(xml). I am the maintainer of the project.
Input:
<anyxml>data</anyxml>
Output:
{
"anyxml": "data",
"#omit-xml-declaration": "yes"
}

Get all values using GET parameters in Java

I'm passing some values in a URL from Flex to Java, for example:
URL format:
../mahesh/initUser.do?method=fwdAccDetails&securityId=mUuB3/p/ky5JhZPY5T8Znf01YCcIarIalQiGEXPMMsOkWDX+KtT4fx2gMML+uup8
Afterwards I try to get the "securityId" value in Java like this:
request.getParameter("securityId")
But I only get the following value:
mUuB3/p/ky5JhZPY5T8Znf01YCcIarIalQiGEXPMMsOkWDX KtT4fx2gMML uup8
The + symbol is replaced by a space on the Java side.
Here is my Flex code:
navigateToURL(new URLRequest('../mahesh/initUser.do?method=fwdAccDetails&securityId=' + value), '_self');
I don't get the full value. Can anyone help me get the correct value in Java?
You should use the encodeURIComponent() function to properly encode your securityId.
value = encodeURIComponent(value);
navigateToURL(new URLRequest('../mahesh/initUser.do?method=fwdAccDetails&securityId=' + value), '_self');
That way your String will be correct on the Java side.
If you want to read more about proper escaping, have a look at "When are you supposed to use escape instead of encodeURI / encodeURIComponent?" (the same arguments apply to Flex and JavaScript).
I just resolved my issue with the following code in Java:
URLDecoder.decode(param1AfterEncoding.replace("+", "%2B"), "UTF-8").replace("%2B", "+")
It is working fine now. I don't know whether other special characters will work; I will check that later.
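The behaviour in this thread follows from form-style URL decoding, where a literal + in a query string stands for a space. The sketch below reproduces it with java.net.URLDecoder and shows that a value which was percent-encoded first survives the round trip. The class name PlusSignDemo and the shortened sample value are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo of why "+" arrives as a space and how encoding avoids it.
class PlusSignDemo {

    // Percent-encodes a raw value for use in a query string ("+" becomes "%2B").
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decodes a query-string value the way form decoding does ("+" becomes " ").
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "mUuB3/p+KtT4+uup8"; // shortened, made-up sample value

        // Sent unencoded: every "+" is read back as a space.
        System.out.println(decode(raw));

        // Sent encoded: the original value is recovered unchanged.
        System.out.println(decode(encode(raw)));
    }
}
```

Encoding on the sending side keeps the fix in one place, whereas the replace("+", "%2B") workaround has to assume the value never contains a genuine space.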

Parsing a String representation of XML

From a String like
<TestData Identifier=\"Test\" requiredAttribute=\"Present\"></TestData> <TestData Identifier=\"Test1\" requiredAttribute=\"Present1\"></TestData> <TestData Identifier=\"Test2\" requiredAttribute=\"Present2\"></TestData> <TestData Identifier=\"Test3\" requiredAttribute=\"Present3\"></TestData>
what's the best way to get the values of the requiredAttribute attributes, i.e. (Present, Present1, Present2, ...)?
You can look into JAXB unmarshalling. Check out this page for more details; it should point you to what you need:
http://jaxb.java.net/tutorial/section_3_1-Unmarshalling-and-Using-the-Data.html#Unmarshalling and Using the Data
For basic XML parsing like this, I've found NanoXML (http://nanoxml.cyberelf.be/) to be about the easiest and most lightweight.
Working with XML in Java sends you down a long road to pain if you start using all the other libraries.
That's not XML - but you could do it with regex or by converting it to XML and parsing it out. The latter is probably more expensive. It depends on what the actual test data is and your requirements for it.
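As the last answer notes, the string has several top-level elements and so is not a well-formed XML document by itself. A minimal sketch of the "convert it to XML and parse it" route, using only the JDK's DOM parser, is to wrap the fragments in a synthetic root element first. The class name TestDataAttributes and the wrapper element name root are assumptions made for this example, and it assumes the input carries no XML declaration of its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads requiredAttribute from a run of <TestData> fragments.
class TestDataAttributes {

    static List<String> requiredAttributes(String fragments) {
        // A document needs exactly one root element, so add one around the fragments.
        String wrapped = "<root>" + fragments + "</root>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));

            NodeList nodes = doc.getElementsByTagName("TestData");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                values.add(element.getAttribute("requiredAttribute"));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Input is not parseable as XML fragments", e);
        }
    }
}
```

For a handful of attributes a regular expression would also work, but the parser-based version keeps working if attribute order or quoting changes.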

Converting an HTTP response (Java "Properties" stream format) into an NSDictionary

I am working on an iPhone application that sends an HTTP request and receives a response.
The format of the response is a key/value format compatible with the Java "Properties" stream format.
I want to store the response in an NSDictionary. Could you suggest a way to do this?
Thank you.
sangee
Edit:
Thanks, guys, for the quick replies!
Are there any other ways to store them in an NSDictionary?
I just want to store the album names and descriptions in an array like this:
mutablearray = [wrwr, dsf, my album];
Could you please let me know whether this is possible?
Thanks again!
This is the response I got for my HTTP request:
GR2PROTO
debug_album= debug_gallery_version= debug_user=admin debug_user_type=Gallery_User debug_user_already_logged_in= server_version=2.12 status=0 status_text=Login successful.
#GR2PROTO debug_album= debug_gallery_version= debug_user=admin debug_user_type=Gallery_User debug_user_already_logged_in=1
album.name.1=wrwr album.title.1=wrwr album.summary.1= album.parent.1=0 album.resize_size.1=640 album.thumb_size.1=100 album.perms.add.1=true album.perms.write.1=true album.perms.del_item.1=true album.perms.del_alb.1=true album.perms.create_sub.1=true album.info.extrafields.1=Description
album.name.2=dsf album.title.2=dsf album.summary.2= album.parent.2=0 album.resize_size.2=640 album.thumb_size.2=100 album.perms.add.2=true album.perms.write.2=true album.perms.del_item.2=true album.perms.del_alb.2=true album.perms.create_sub.2=true album.info.extrafields.2=Description
album.name.3=my album album.title.3=my album album.summary.3= album.parent.3=0 album.resize_size.3=640 album.thumb_size.3=100 album.perms.add.3=true album.perms.write.3=true album.perms.del_item.3=true album.perms.del_alb.3=true album.perms.create_sub.3=true album.info.extrafields.3=Description
If you can, I would recommend serializing the data as JSON (or XML, if you have to) and parsing it using TouchJSON or a similar parser. If you really can't, then you'll have to implement your own parser--take a look at NSScanner.
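To make the "own parser" route concrete, here is the extraction logic sketched in Java, where java.util.Properties already understands this format; an NSScanner-based parser on the iPhone would have to follow the same steps (skip comment lines starting with #, split each line at the first =, then collect the album.name.N keys in order). The sketch assumes the real response has one key=value pair per line, and the class name AlbumNames is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: pulls album.name.1, album.name.2, ... out of a
// response in Java "Properties" format (assumes one key=value per line).
class AlbumNames {

    static List<String> albumNames(String response) {
        Properties props = new Properties();
        try {
            // Lines starting with '#' are treated as comments and skipped.
            props.load(new StringReader(response));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read response", e);
        }

        // Album keys are numbered from 1 upward; stop at the first gap.
        List<String> names = new ArrayList<>();
        for (int i = 1; props.containsKey("album.name." + i); i++) {
            names.add(props.getProperty("album.name." + i));
        }
        return names;
    }
}
```

The resulting list corresponds to the mutable array the question asks for; the description fields could be collected the same way from their own numbered keys.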
Look at NSStream and the Stream Programming Guide for Cocoa.
Back in the day when Java was fully integrated into Cocoa, NSStream mapped onto Java streams. It still might. IIRC, (it's been a while) NSStream will return a properly populated NSDictionary from a Java stream.
Edit:
It looks like the text returned is just a space delimited hash which is the Java version of dictionary. It takes the form of key=value space key=value. The only tricky part is that some of the hashes are nested.
The first line for example is nested:
debug_album{
    debug_gallery_version{
        debug_user=admin
        debug_user_type=Gallery_User
        debug_user_already_logged_in{
            server_version=2.12
            status=0
            status_text=Login successful.
        }
    }
}
You need a recursive scanner to parse that. The "key=space" pattern indicates a nested dictionary.
