I need to build a web service that analyzes SEO. The service will show how often a site is updated, so I need to figure out how to get the posted dates or the update frequency from a website's HTML.
For example, on http://googletesting.blogspot.com/ I can get the date from the tag <span>Wednesday, June 04, 2014</span>. Other websites don't use the same tags and date format, so I can't use the same code to detect their dates.
Dates can have very different formats in different locales, and month names can be written as text or as numbers. I need to match as many dates as possible. Also, sometimes a string that looks like a date isn't the posted date at all; it's just part of an article's text.
My algorithm
I try to get the posted date of every post, then calculate the update frequency from those dates.
For example: first post on 30 May 2012, second post on 29 May 2012, third post on 28 May 2012.
From that I would conclude that this website is updated daily.
In the end, I want to know if each website updates:
Yearly
Monthly
Weekly
Daily
How do I reliably get this from any website?
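The algorithm described in the question (collect the posted dates, then derive a frequency from them) could be sketched in Java roughly as follows. It sorts the dates, takes the median gap in days and buckets it. The class name and the thresholds (1.5, 10 and 45 days) are hypothetical choices, not something from the question; the median is used instead of the mean so that a single long break doesn't skew the result.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UpdateFrequency {

    public enum Frequency { DAILY, WEEKLY, MONTHLY, YEARLY }

    /** Classifies a site's update frequency from the posted dates found on it. */
    public static Frequency classify(List<LocalDate> postDates) {
        if (postDates.size() < 2) {
            throw new IllegalArgumentException("Need at least two post dates");
        }
        // Sort oldest first so every gap is non-negative
        List<LocalDate> sorted = new ArrayList<>(postDates);
        Collections.sort(sorted);

        // Days between consecutive posts
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            gaps.add(ChronoUnit.DAYS.between(sorted.get(i - 1), sorted.get(i)));
        }

        // Median gap: robust against one unusually long pause
        Collections.sort(gaps);
        int n = gaps.size();
        double median = (n % 2 == 1)
                ? gaps.get(n / 2)
                : (gaps.get(n / 2 - 1) + gaps.get(n / 2)) / 2.0;

        // Hypothetical thresholds, in days
        if (median <= 1.5) return Frequency.DAILY;
        if (median <= 10) return Frequency.WEEKLY;
        if (median <= 45) return Frequency.MONTHLY;
        return Frequency.YEARLY;
    }
}
```

With the three dates from the question (30, 29 and 28 May 2012) both gaps are 1 day, so the sketch classifies the site as DAILY.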
Instead of parsing the dates in the page, you could download the homepage and store it, then come back every day, download it again and check whether it changed. This approach works even for sites that don't publish any dates on their homepage, though it takes longer to get an answer.
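A minimal sketch of the change-detection idea, assuming you fetch the homepage HTML yourself once a day (the download step is left out). It keeps a SHA-256 hash of the previous snapshot and reports whether the new one differs; `HomepageChangeTracker` is a hypothetical name. Hashing raw HTML will also flag changes caused by rotating ads, counters or embedded timestamps, so in practice you may need to strip those parts before hashing.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HomepageChangeTracker {

    private String lastHash; // null until the first snapshot has been recorded

    /** Records today's snapshot; returns true if it differs from the previous one. */
    public boolean recordSnapshot(String html) {
        String hash = sha256Hex(html);
        boolean changed = lastHash != null && !lastHash.equals(hash);
        lastHash = hash;
        return changed;
    }

    /** SHA-256 of the UTF-8 bytes, as 64 lowercase hex digits. */
    public static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%064x", new BigInteger(1, hash));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```

Counting how many daily snapshots report a change over a month or so gives a rough update frequency without parsing any dates.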
Another approach would be to download the site's RSS feed, if it has one. The example site you gave has an XML feed at http://feeds.feedburner.com/blogspot/RLXA?format=xml
RSS feeds are meant to be machine-readable, and their dates are in a consistent format.
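A sketch of pulling post dates out of a feed using only the JDK's DOM parser. It assumes the feed is either RSS 2.0 (`<item><pubDate>` in RFC 822/1123 format) or Atom (`<entry><published>` in RFC 3339 format); which of the two a given Feedburner URL serves is not something this sketch checks. `FeedDates` is a hypothetical name, and `RFC_1123_DATE_TIME` only understands `GMT` or numeric offsets, so feeds using zone abbreviations such as `PDT` would need extra handling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class FeedDates {

    /** Returns the publication date of every RSS item or Atom entry in the feed. */
    public static List<OffsetDateTime> postDates(String feedXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // Atom elements live in a namespace
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }

        List<OffsetDateTime> dates = new ArrayList<>();

        // RSS 2.0: <item><pubDate>Wed, 04 Jun 2014 10:00:00 GMT</pubDate></item>
        NodeList pubDates = doc.getElementsByTagNameNS("*", "pubDate");
        for (int i = 0; i < pubDates.getLength(); i++) {
            Node node = pubDates.item(i);
            // Skip the channel-level pubDate; only items are posts
            if (!"item".equals(node.getParentNode().getLocalName())) continue;
            String text = node.getTextContent().trim();
            dates.add(OffsetDateTime.parse(text, DateTimeFormatter.RFC_1123_DATE_TIME));
        }

        // Atom: <entry><published>2014-06-04T09:30:00.000-07:00</published></entry>
        NodeList published = doc.getElementsByTagNameNS("*", "published");
        for (int i = 0; i < published.getLength(); i++) {
            dates.add(OffsetDateTime.parse(published.item(i).getTextContent().trim()));
        }
        return dates;
    }
}
```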
You also say that you are using Java. I've found that Java's date parsing libraries are not very flexible: they force you to know the exact format of a date before you parse it. I have written a free, open-source, flexible date-time parser in Java that you could try: http://ostermiller.org/utils/DateTimeParse.html Once you've found the dates on the page (for example, by looking at what comes after "posted on"), you could use my flexible parser to parse dates in a variety of formats.
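The Ostermiller parser linked above is one option. As a swapped-in alternative using only `java.time`, here is a minimal sketch that tries a fixed list of patterns in order and returns the first successful parse. `MultiFormatDateParser` and its pattern list are hypothetical; the list only covers the formats mentioned in the question and would have to grow per site and per locale.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class MultiFormatDateParser {

    // Tried in order; the first pattern that parses wins
    private static final List<DateTimeFormatter> FORMATS = List.of(
            build("EEEE, MMMM dd, yyyy"), // Wednesday, June 04, 2014
            build("d MMM yyyy"),          // 30 May 2012
            build("dMMM yyyy"),           // 30May 2012
            build("yyyy-MM-dd"),          // 2012-05-30
            build("MM/dd/yyyy"));         // 05/30/2012 (US order)

    private static DateTimeFormatter build(String pattern) {
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()   // accept "may" as well as "May"
                .appendPattern(pattern)
                .toFormatter(Locale.ENGLISH);
    }

    /** Returns the parsed date, or empty if no known pattern matches. */
    public static Optional<LocalDate> parse(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(text.trim(), format));
            } catch (DateTimeParseException e) {
                // Not this pattern; try the next one
            }
        }
        return Optional.empty();
    }
}
```

Order matters: a string like `04/05/2014` would match both `MM/dd/yyyy` and `dd/MM/yyyy`, so whichever comes first wins. That ambiguity can't be resolved by pattern matching alone; you need to know (or guess) the site's locale.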
I am currently developing a Java application based on Google Custom Search API, using their Java libraries.
According to Google's documentation, they associate a date with each indexed web page:
Page Dates: Google estimates the date for a page based on the URL, title, byline date and other features. This date can be used with the sort operator using the special structured data type date, as in &sort=date.
I want to retrieve the date associated with each result returned for a given request. However, I didn't find anything about this in Google's documentation: there are parameters for sorting results by date or restricting them to a certain period of time, but nothing for retrieving the dates themselves. I couldn't find any reference to this problem on the web either.
So, I am turning to SO to ask these questions:
Is it even possible to do that through Google's API? How?
Otherwise, is there a workaround?
My primary need is to get DAU, MAU, crash percent, availability, rating, etc. for any custom time period (e.g. the last 2 days, the last week, Date1 to Date2). So far I have been getting these values manually from the Crash Trends page in the dashboard by setting custom dates.
I wanted to automate this, so I started using the REST API. The documentation seemed pretty vague, and the only endpoint I found that returns something related to what I'm looking for is "apps", but it provides very limited details and no way to set custom dates.
API request I used: https://developers.crittercism.com:443/v1.0/apps?attributes=appName,crashPercent,mau,rating
Am I missing something in the documentation?
Can someone tell me how I can get the details I want via the REST API?
Mainly the crash trends details like AVAILABILITY/CRASH PERCENT/DAU/MAU etc., for custom date intervals (no longer than a month). Thanks!
I am a product manager at Crittercism.
The developers.crittercism.com/v1.0/apps endpoint gives you a snapshot of the app data along with some other properties (a link to the app store, the icon URL, etc.).
For your requirement, you should use this endpoint:
developers.crittercism.com/v1.0/errorMonitoring/graph
You can use it to retrieve the following metrics:
dau
mau
rating
crashes
crashPercent
appLoads
affectedUsers
affectedUserPercent
You can query two time ranges: 1 day (1440 minutes) and 1 month (43200 minutes).
Here is the documentation for this http://docs.crittercism.com/api/api.html#!/errorMonitoring/graph
Hope this helps.
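A sketch of building a request to the errorMonitoring/graph endpoint from the answer above, using Java 11's `java.net.http.HttpRequest`. The metric names and the two durations come from the answer; the POST method, the Bearer `Authorization` header and the JSON body shape (`params` with `appId`, `graph`, `duration`) are assumptions, so check them against the linked documentation before relying on them. `CrittercismGraphRequest` is a hypothetical name, and actually sending the request (for example with `HttpClient.send`) is left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Set;

public class CrittercismGraphRequest {

    static final String ENDPOINT = "https://developers.crittercism.com/v1.0/errorMonitoring/graph";
    // Metrics listed in the answer
    static final Set<String> METRICS = Set.of("dau", "mau", "rating", "crashes", "crashPercent",
            "appLoads", "affectedUsers", "affectedUserPercent");
    static final int ONE_DAY_MINUTES = 1440;
    static final int ONE_MONTH_MINUTES = 43200;

    /** ASSUMED body shape; appId is inserted unescaped, so it must not contain quotes. */
    static String body(String appId, String graph, int durationMinutes) {
        if (!METRICS.contains(graph)) {
            throw new IllegalArgumentException("Unknown metric: " + graph);
        }
        if (durationMinutes != ONE_DAY_MINUTES && durationMinutes != ONE_MONTH_MINUTES) {
            throw new IllegalArgumentException("Duration must be 1440 or 43200 minutes");
        }
        return String.format("{\"params\":{\"appId\":\"%s\",\"graph\":\"%s\",\"duration\":%d}}",
                appId, graph, durationMinutes);
    }

    /** Builds (but does not send) the request; the auth scheme is an assumption. */
    static HttpRequest request(String token, String appId, String graph, int durationMinutes) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(appId, graph, durationMinutes)))
                .build();
    }
}
```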
I have a Java web app where one of the features I need is to fetch an existing date and let the user edit it by selecting another one. This works really well with a jQuery DatePicker I found, combined with a regular text input field that triggers the datepicker.
My existing dates (GregorianCalendar objects) are loaded from a database, put into JavaBeans, and then loaded into the JSP page and the specific input field.
However, Java's SimpleDateFormat, the JSTL formatting tag library, and the DatePicker's formatDate function do not interpret format patterns the same way. MM in Java and JSTL gives "03", while in the DatePicker it gives "March". I get the desired "03" with the pattern mm in the DatePicker, but in JSTL mm outputs the object's minutes, if any.
I figure there are two options. Either I create two patterns that both format to the same desired string, or I use the DatePicker's formatDate function to format my Java date object when the data is loaded.
I would much prefer a single global pattern string for my datepicker, because I want to be able to change the pattern without breaking the code or creating too many dependencies. Therefore, option 2 is what I want.
Right now, this is my code for getting the formatted date String, using tag libraries:
<fmt:formatDate type="date"
pattern="${appdata.dateFormat}"
value="${myBeanObject.gregorianCalendarObject.time}">
</fmt:formatDate>
I want something like:
$.datepicker.formatDate('${appdata.dateFormat}',
new Date(${myBeanObject.gregorianCalendarObject.timeInMillis}));
However, with my limited jQuery/JavaScript experience, I don't know how or where to put this code, or how to invoke the function correctly. Everything I find about invoking scripts seems to involve an onclick handler or a "ready" function. I just want to get the string while the page is being built.
I realise that my problem is more basic than my title suggests, but if anyone has another idea for getting Java's and the DatePicker's formatting patterns to work together, I would be happy to hear it. If not, how can I invoke the script and get the string I want?
Thanks in advance.
EDIT
So far I have added a small converter to my application-wide singleton data object. When the dateFormat string is set, it automatically converts it into the equivalent jQuery DatePicker pattern. But to make this really work, I would need to add many more translations besides the month and year patterns.
public void setDateFormat(String dateFormat) {
    this.dateFormat = dateFormat;
    // Java "yyyy" (4-digit year) is written "yy" in the DatePicker
    datePickerFormat = dateFormat.replace("yyyy", "yy");
    // Java "MM" (2-digit month) is "mm" in the DatePicker; this also turns text-month
    // patterns such as "MMM" into "mmm", which the DatePicker does not read as a month name
    datePickerFormat = datePickerFormat.replace("M", "m");
}
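Building on the EDIT above, here is a sketch of a token-based translator instead of chained replacements. It scans runs of identical pattern letters, so `MM` and `MMM` are handled as distinct tokens rather than letter by letter, and it copies quoted literals through unchanged, since both formats use `'...'` for literal text and `''` for a single quote. The mapping covers only date fields (year, month, day, day name) as I understand the jQuery UI Datepicker format tokens; time fields have no Datepicker equivalent, so the sketch throws for them. `DatePatternTranslator` is a hypothetical name.

```java
public class DatePatternTranslator {

    /** Translates a SimpleDateFormat date pattern into a jQuery UI Datepicker pattern. */
    public static String toDatePickerPattern(String javaPattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < javaPattern.length()) {
            char c = javaPattern.charAt(i);
            if (c == '\'') {
                // Quoted literal: same syntax in both formats, so copy it unchanged
                int j = i + 1;
                while (j < javaPattern.length()) {
                    if (javaPattern.charAt(j) == '\'') {
                        boolean escapedQuote = j + 1 < javaPattern.length()
                                && javaPattern.charAt(j + 1) == '\'';
                        if (!escapedQuote) break;
                        j++; // skip the second quote of a '' pair
                    }
                    j++;
                }
                int end = Math.min(j + 1, javaPattern.length());
                out.append(javaPattern, i, end);
                i = end;
            } else if (Character.isLetter(c)) {
                // Pattern letter: measure the run, e.g. "MMM" has count 3
                int j = i;
                while (j < javaPattern.length() && javaPattern.charAt(j) == c) j++;
                out.append(translate(c, j - i));
                i = j;
            } else {
                // '@' and '!' are special in the Datepicker, so quote them; other symbols are literal
                if (c == '@' || c == '!') out.append('\'').append(c).append('\'');
                else out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static String translate(char letter, int count) {
        switch (letter) {
            case 'y': return count == 2 ? "y" : "yy";   // 2-digit vs 4-digit year
            case 'M':
                if (count >= 4) return "MM";             // full month name
                if (count == 3) return "M";              // short month name
                return count == 2 ? "mm" : "m";          // numeric month
            case 'd': return count >= 2 ? "dd" : "d";    // day of month
            case 'E': return count >= 4 ? "DD" : "D";    // full vs short day name
            default:
                throw new IllegalArgumentException(
                        "No Datepicker equivalent for pattern letter '" + letter + "'");
        }
    }
}
```

For example, `dd/MM/yyyy` becomes `dd/mm/yy` and `EEEE, MMMM dd, yyyy` becomes `DD, MM dd, yy`, so a single Java pattern stored in the application data could drive both the JSTL tag and the datepicker.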
Can anyone suggest a Java library that can parse date/time calendar events from unstructured text?
Examples:
Starts 10pm Tonight! Sunday feb 10th => 10/Feb/2013 10pm
tomorrow (feb 10th) => 10/Feb/2013
Sunday Feb 10\r\nwith daily screenings till Feb 16th
and so on
The input comes from users, so it may be in any format.
I started by identifying all the possible tokens and matching them with regular expressions.
I'm wondering if someone can suggest a Java library that might actually help with the parsing.
I went through other posts on SO, but they suggest techniques rather than libraries.
Thanks
You could take some of the trunk source from Apache OpenNLP (natural language processing) at http://opennlp.apache.org/, or set up a callable RESTful web service by running OpenNLP on your server. The benefit of using out-of-the-box OpenNLP is that you get entity extractors for dates, times, organizations, locations and people through the nameFinder interface. You could also build a training file with examples of the typical context around the items of interest, labeled with their entity types, and train the NLP model on it to get a better hit rate for your context. I have a working example of a C# NLP in the apps section of my portfolio at http://www.augmentedintel.com/apps/csharpnlp/extract-names-from-text.aspx.
UTAH (https://github.com/sonalake/utah-parser) is able to handle generic parsing of unstructured text into maps. Once you've done that you should be able to throw that into a formatter.
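For comparison with the library suggestions, the hand-rolled regex approach described in the question can be sketched with `java.time` as follows. It only recognizes "month name + day" fragments such as `feb 10th` or `Feb 16`, and takes the year as a parameter because the examples don't contain one; relative words like "tonight" or "tomorrow" and times like "10pm" are not handled. `EventDateExtractor` is a hypothetical name.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EventDateExtractor {

    // Month name (short or full) followed by a day with an optional ordinal suffix,
    // e.g. "feb 10th", "Feb 16", "February 1st"
    private static final Pattern MONTH_DAY = Pattern.compile(
            "\\b(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?"
            + "|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)\\.?\\s+"
            + "(\\d{1,2})(?:st|nd|rd|th)?\\b",
            Pattern.CASE_INSENSITIVE);

    /** Finds "month day" fragments in free text; the year must be supplied by the caller. */
    public static List<LocalDate> extractDates(String text, int year) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher m = MONTH_DAY.matcher(text);
        while (m.find()) {
            Month month = monthFromName(m.group(1));
            int day = Integer.parseInt(m.group(2));
            try {
                dates.add(LocalDate.of(year, month, day));
            } catch (DateTimeException e) {
                // e.g. "feb 30": looks like a date but isn't one, so skip it
            }
        }
        return dates;
    }

    private static Month monthFromName(String name) {
        // The first three letters identify the month uniquely
        String prefix = name.substring(0, 3).toUpperCase(Locale.ROOT);
        for (Month month : Month.values()) {
            if (month.name().startsWith(prefix)) return month;
        }
        throw new IllegalStateException("Unexpected month name: " + name);
    }
}
```

On the question's third example with year 2013, this finds 10 Feb 2013 and 16 Feb 2013. The hard part the question describes (guessing the year, resolving "tonight", rejecting date-like words) is exactly what this sketch leaves out, which is where an NLP library earns its keep.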
I have gone through the Android tutorials on using network resources, but they didn't give me what I need. I'm looking for a way to get at the content inside a web page; the tutorials only show how to connect to a URL, which seems simple enough.
I am trying to make a currency converter app, and for that I need to get the exchange rates. How exactly do I do that? This webpage gives a decent number of exchange rates, and I want to use them in my app. How can I do it?
For example, the user selects a "from" and a "to" currency in the app, and the conversion should happen instantly. So I will have to fetch the exchange rates beforehand and store them in the database. If the user is offline, the app should use the last updated values.
Please help!
I would use an API, such as this free, open-source one: http://josscrowcroft.github.com/open-exchange-rates/ to get the exchange rates, since it would be impossible, or at least extremely difficult, to parse the data from the URL you provided.
The API I suggested above returns the rates you need in JSON format, which can easily be parsed in Java.
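A minimal sketch of the offline fallback described in the question: keep the last successfully fetched rates and convert through a common base currency, so that the converted amount is amount × rate[to] / rate[from]. It assumes the JSON has already been parsed into a map from currency code to "units per 1 unit of the base currency" (for example USD); persisting the map to the database is left out. `ExchangeRateCache` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class ExchangeRateCache {

    // Units of each currency per 1 unit of the base currency
    private final Map<String, BigDecimal> rates = new HashMap<>();
    private Instant lastUpdated; // null until the first successful fetch

    /** Replaces the cached rates after a successful fetch. */
    public synchronized void update(Map<String, BigDecimal> freshRates, Instant fetchedAt) {
        rates.clear();
        rates.putAll(freshRates);
        lastUpdated = fetchedAt;
    }

    /** When the cached rates were fetched, so the UI can show how stale they are. */
    public synchronized Instant lastUpdated() {
        return lastUpdated;
    }

    /** Converts via the base currency: amount * rate[to] / rate[from]. */
    public synchronized BigDecimal convert(BigDecimal amount, String from, String to) {
        BigDecimal fromRate = rates.get(from);
        BigDecimal toRate = rates.get(to);
        if (fromRate == null || toRate == null) {
            throw new IllegalArgumentException("No cached rate for " + from + " or " + to);
        }
        return amount.multiply(toRate).divide(fromRate, MathContext.DECIMAL64);
    }
}
```

If a fetch fails because the device is offline, simply don't call `update`; the previous rates and their timestamp stay in place, which gives the "last updated values" behavior the question asks for.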