Converting an HTTP response (Java "Properties" stream format) into an NSDictionary - java

I am working on an iPhone application that sends an HTTP request and reads the response.
The response is in a key/value format compatible with the Java "Properties" stream format.
I want to store the response in an NSDictionary. Could you suggest a way to do this?
Thank you.
sangee
Edit:
Thanks, guys, for the quick replies!
Are there any other ways to store the values in an NSDictionary?
I just want to store the album name and description in an array like this:
mutablearray = [wrwr, dsf, my album];
Could you please let me know whether this is possible?
Thanks again!
This is the response I got for my HTTP request:
GR2PROTO
debug_album= debug_gallery_version= debug_user=admin debug_user_type=Gallery_User debug_user_already_logged_in= server_version=2.12 status=0 status_text=Login successful.
#GR2PROTO debug_album= debug_gallery_version= debug_user=admin debug_user_type=Gallery_User debug_user_already_logged_in=1
album.name.1=wrwr album.title.1=wrwr album.summary.1= album.parent.1=0 album.resize_size.1=640 album.thumb_size.1=100 album.perms.add.1=true album.perms.write.1=true album.perms.del_item.1=true album.perms.del_alb.1=true album.perms.create_sub.1=true album.info.extrafields.1=Description
album.name.2=dsf album.title.2=dsf album.summary.2= album.parent.2=0 album.resize_size.2=640 album.thumb_size.2=100 album.perms.add.2=true album.perms.write.2=true album.perms.del_item.2=true album.perms.del_alb.2=true album.perms.create_sub.2=true album.info.extrafields.2=Description
album.name.3=my album album.title.3=my album album.summary.3= album.parent.3=0 album.resize_size.3=640 album.thumb_size.3=100 album.perms.add.3=true album.perms.write.3=true album.perms.del_item.3=true album.perms.del_alb.3=true album.perms.create_sub.3=true album.info.extrafields.3=Description

If you can, I would recommend serializing the data as JSON (or XML, if you have to) and parsing it using TouchJSON or a similar parser. If you really can't, then you'll have to implement your own parser--take a look at NSScanner.

Look at NSStream and the Stream Programming Guide for Cocoa.
Back in the day when Java was fully integrated into Cocoa, NSStream mapped onto Java streams. It still might. IIRC, (it's been a while) NSStream will return a properly populated NSDictionary from a Java stream.
Edit:
It looks like the text returned is just a space-delimited hash, which is the Java version of a dictionary. It takes the form key=value, space, key=value. The only tricky part is that some of the hashes are nested.
The first line for example is nested:
debug_album{
    debug_gallery_version{
        debug_user=admin
        debug_user_type=Gallery_User
        debug_user_already_logged_in{
            server_version=2.12
            status=0
            status_text=Login successful.
        }
    }
}
You need a recursive scanner to parse that. The "key=space" pattern indicates a nested dictionary.
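For comparison, here is a rough sketch of how the same data reads on the Java side. It assumes the server really sends one key=value pair per line, as the "Properties" format in the question suggests, and that the spaces in the pasted response are just flattened line breaks. The class and method names are made up for illustration. It collects album.name.1, album.name.2, ... into a list, which is the array the question asks for; a port to NSDictionary would follow the same steps.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of any library.
class AlbumNames {
    // Loads a Properties-format response and returns the values of
    // album.name.1, album.name.2, ... until the first missing index.
    static List<String> albumNames(String response) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(response));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> names = new ArrayList<>();
        for (int i = 1; props.containsKey("album.name." + i); i++) {
            names.add(props.getProperty("album.name." + i));
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed layout: one pair per line; "#GR2PROTO" is treated as a comment.
        String response = "#GR2PROTO\n"
                + "album.name.1=wrwr\n"
                + "album.title.1=wrwr\n"
                + "album.name.2=dsf\n"
                + "album.name.3=my album\n"
                + "status=0\n";
        System.out.println(albumNames(response)); // [wrwr, dsf, my album]
    }
}
```

If the response genuinely arrives on a single space-separated line, this flat approach will not work as-is, because values such as "my album" contain spaces themselves.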

Related

How can I efficiently extract text from a bunch of web pages without extra information

I have a list of around 1 million web pages, and I want to efficiently extract just the text from those pages. Currently I am using the BeautifulSoup library in Python to get the text from the HTML, and an HTTP request to fetch the HTML of each page. This approach extracts some extra information in addition to the text, for example any JavaScript that appears in the body.
Could you please suggest a suitable and efficient way to do this? I looked at Scrapy, but it looks like it crawls a specific website. Can we pass it a list of specific web pages to get information from?
Thank you in advance.
Yes, you can use Scrapy to crawl a set of URLs in a generic fashion.
You simply need to set them on the start_urls list attribute of your spider, or reimplement the start_requests spider method to yield requests from any data source, and then implement your parse callback to perform the generic content extraction you want.
You can use html-text to extract text from them, and regular Scrapy selectors to extract additional data like the one you mention.
In Scrapy you can plug in your own parser, e.g. Beautiful Soup, and call it from your parse method.
To extract text from generic pages I traverse the body only, and exclude comments etc. as well as some tags like script and style:
snippets = []  # collected text fragments; soup is the parsed BeautifulSoup document
for snippet in soup.find('body').descendants:
    # keep plain strings only, skipping comments/CDATA and excluded parent tags
    if isinstance(snippet, bs4.element.NavigableString) \
            and not isinstance(snippet, EXCLUDED_STRING_TYPES) \
            and snippet.parent.name not in EXCLUDED_TAGS:
        snippet = re.sub(UNICODE_WHITESPACES, ' ', snippet)
        snippet = snippet.strip()
        if snippet != '':
            snippets.append(snippet)
with
import re
import bs4

EXCLUDED_STRING_TYPES = (bs4.Comment, bs4.CData, bs4.ProcessingInstruction, bs4.Declaration)
EXCLUDED_TAGS = ['script', 'noscript', 'style', 'pre', 'code']
UNICODE_WHITESPACES = re.compile(u'[\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004'
                                 u'\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]+')

Generating Java code for a JSON to XML transformation

Do you know of any program that can generate Java code for a JSON to XML transformation? Altova MapForce offers this, but unfortunately when I try to use it, it returns an error:
Altova_Hierarchical_JSON.mfd: Mapping validation failed - 1 error(s), 0 warning(s)
Altova_Hierarchical: JSON components are not supported for Java.
The output component Altova_Hierarchical has no output file name set. A default file name will be used.
The second thing that is important to me is the ability to create a graphical mapping between JSON and XML.
Thanks for the reply.
There is the underscore-java library, which has a static method U.xmlToJson(xml). I am the maintainer of the project.
Input:
<anyxml>data</anyxml>
Output:
{
  "anyxml": "data",
  "#omit-xml-declaration": "yes"
}

Workaround on Scraping HTML by diving into js source code

I learned about jSoup recently and would like to dive deeper into it. However, I have run into an obstacle handling web pages that use JavaScript (I have no knowledge of JS yet :/).
I have read that HtmlUnit would be the correct tool for performing web browser actions, but I figured that I would need no knowledge of JS if I could find the JSON object that the web page obtains through its JavaScript.
For example, this page:
Among the source files, one of them is tooltips.js. In this file, the variable rgNeededFeeds is built and passed to the method LoadHeropediaData(), which generates the full URL for getting the JSON object.
URL = URL + 'jsfeed/heropediadata?feeds='+strFeeds+'&v=3633666222511362823&l=english';
I could not work out what strFeeds actually is. I have tried various combinations but none of them work (each returned an empty array...). Or is my guess totally off?
What I actually need is the data displayed at the top when you click on one of the "items". The info in the "hover" would do too, but it lacks the "recipe" info. I'm presuming that by getting the JSON object from the full URL above, basically all of that data should be in the JSON.
Anyway, this is only based on what I understand from staring at those source files for hours. Do correct me if I'm wrong. (I'm working in Java, by the way.)
P.S. I would also like to take this opportunity to express my thanks to Balusc, who has been everywhere whenever I have had doubts about jSoup. :>
strFeeds is nothing but one of these two strings: itemdata or abilitydata.
You can find this in tooltips.js at lines 38-45:
var rgNeededFeeds = [];
$.each( [ 'item', 'ability' ],
    function( i, ttType ){
        icons = GetIconCollection( ttType );
        if ( icons.length ){
            rgNeededFeeds.push( ttType+'data' );
            //..............
        }
    }
)
ttType is the current value in an iteration over the array [ 'item', 'ability' ]; it is concatenated with the string data and pushed into the array rgNeededFeeds.
The function LoadHeropediaData is called at the end of the function above with rgNeededFeeds as its parameter:
LoadHeropediaData( rgNeededFeeds );
Side note: if you are going to start scraping websites, learning JavaScript will be MANDATORY.
NOTE: you're right, the JSON contains all the information needed...
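Since you are working in Java, the same string building can be sketched roughly as below. The class and method names are made up, and the base URL is only a placeholder because the page address is not given here. Unlike the JavaScript, this sketch skips the icons.length check and simply turns every tooltip type into a feed name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the string handling in tooltips.js.
class HeropediaFeedUrl {
    // Each tooltip type becomes "<type>data", e.g. "item" -> "itemdata".
    static List<String> neededFeeds(List<String> tooltipTypes) {
        List<String> feeds = new ArrayList<>();
        for (String type : tooltipTypes) {
            feeds.add(type + "data");
        }
        return feeds;
    }

    // Appends the path and query string quoted in the question to a base URL.
    static String feedUrl(String baseUrl, String strFeeds) {
        return baseUrl + "jsfeed/heropediadata?feeds=" + strFeeds
                + "&v=3633666222511362823&l=english";
    }

    public static void main(String[] args) {
        String baseUrl = "http://example.invalid/"; // placeholder, not the real site
        for (String feed : neededFeeds(Arrays.asList("item", "ability"))) {
            System.out.println(feedUrl(baseUrl, feed));
        }
    }
}
```

The resulting URL can then be fetched with whatever HTTP client you already use and the body handed to a JSON parser.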

Parsing a String representation of XML

From a String like
<TestData Identifier=\"Test\" requiredAttribute=\"Present\"></TestData> <TestData Identifier=\"Test1\" requiredAttribute=\"Present1\"></TestData> <TestData Identifier=\"Test2\" requiredAttribute=\"Present2\"></TestData> <TestData Identifier=\"Test3\" requiredAttribute=\"Present3\"></TestData>
what's the best way to get the values of the requiredAttribute attributes, i.e. (Present, Present1, Present2, ...)?
You can look into JAXB unmarshalling. Check out this page for more details; it should point you to what you need:
http://jaxb.java.net/tutorial/section_3_1-Unmarshalling-and-Using-the-Data.html#Unmarshalling and Using the Data
For basic XML parsing like this, I've found NanoXML (http://nanoxml.cyberelf.be/) to be about the easiest and most lightweight.
Working with XML in Java sends you down a long road of pain if you start using all the other libraries.
That's not XML - but you could do it with regex or by converting it to XML and parsing it out. The latter is probably more expensive. It depends on what the actual test data is and your requirements for it.
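A minimal sketch of the "convert it to XML and parse it" route, using only the JDK: wrap the fragment in a made-up root element so it becomes a well-formed document, then read the attribute from each TestData element. It assumes the backslashes in the question are just Java string escaping, so the actual string contains plain double quotes. The class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class RequiredAttributes {
    // Returns the requiredAttribute value of every TestData element, in document order.
    static List<String> requiredAttributes(String fragment) {
        try {
            // The wrapper element gives the sibling elements a single root.
            String xml = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("TestData");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(((Element) nodes.item(i)).getAttribute("requiredAttribute"));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<TestData Identifier=\"Test\" requiredAttribute=\"Present\"></TestData> "
                + "<TestData Identifier=\"Test1\" requiredAttribute=\"Present1\"></TestData>";
        System.out.println(requiredAttributes(fragment)); // [Present, Present1]
    }
}
```

A regex would be shorter, but the DOM route keeps working if the attribute order or the whitespace changes.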

Changing Encoding while Xml.parse() with SaxFeedParser Java

I am trying to load a Hebrew RSS feed using the following:
Xml.parse(_InputStream, Xml.Encoding.ISO_8859_1 , root.getContentHandler());
taken from the IBM site.
I would like to use another encoding, such as "ISO8859_8", rather than one of:
Xml.Encoding.ISO_8859_1,
Xml.Encoding.US_ASCII,
Xml.Encoding.UTF_16,
Xml.Encoding.UTF_8
Thanks a lot!
You can't, because Xml.Encoding is an enum. You'll have to use one of the other methods or, if you can, get the RSS feed producer to output UTF-8.
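One way to read "one of the other methods" is to decode the bytes yourself and hand the parser characters instead of bytes. The sketch below uses the JDK's SAX parser so that it runs outside Android; on Android the same Reader could, as far as I recall, be passed to the Xml.parse(Reader, ContentHandler) overload instead. The class and method names are made up, and it assumes the feed really is encoded as ISO-8859-8 and that the JVM supports that charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class HebrewFeed {
    // Decodes the stream with the given charset, then concatenates the text of all <title> elements.
    static String titleText(InputStream in, String charsetName) {
        final StringBuilder title = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) inTitle = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) inTitle = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) title.append(ch, start, length);
            }
        };
        try {
            // The Reader does the byte-to-character decoding, so the parser never guesses.
            Reader reader = new InputStreamReader(in, charsetName);
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        } catch (Exception e) {
            throw new IllegalStateException("feed could not be parsed", e);
        }
        return title.toString();
    }

    public static void main(String[] args) {
        String word = "\u05e9\u05dc\u05d5\u05dd"; // a Hebrew word, written with escapes
        byte[] bytes = ("<rss><title>" + word + "</title></rss>")
                .getBytes(Charset.forName("ISO-8859-8"));
        String parsed = titleText(new ByteArrayInputStream(bytes), "ISO-8859-8");
        System.out.println(parsed.equals(word));
    }
}
```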
