Text mining and advanced search solution for SharePoint [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Within my organization, we maintain a SharePoint site that stores a large number of files related to previous and ongoing projects. These files can be Word, PDF and PowerPoint files. We are interested in building a solution with the following functionality:
1) Advanced search: return the set of files that match the keywords entered by the user. Ideally, the content in the returned files that directly relates to the search keywords would be marked in some way (for example, with color).
2) Let users perform some types of analysis on the SharePoint site, such as social network analysis of the people who authored the SharePoint files.
Is there any commercial software or open source library that can handle these types of tasks?

This response assumes you are using SharePoint 2010 or 2013.
Consider using faceted search. If you have an Enterprise CAL, you can easily set this up. The trick is making sure the metadata for the facets is available. This would give you the search behavior you're looking for, but not the interaction and tagging.
For that, it would be best to create a custom solution and leverage term sets in managed metadata. In SharePoint 2010 there is conditional formatting that you could use for color coding; however, this is deprecated in 2013.
I hope these directions are helpful, but ultimately you are likely going to need a combination of custom code and event handlers.

Related

Is there a Java library for all HTML tag names? [closed]

Closed 5 years ago.
This post was edited and submitted for review 11 months ago and failed to reopen the post:
Original close reason(s) were not resolved
I was wondering whether there is any Java library that contains all HTML tags. I am writing Selenium tests for fairly complex web sites using the Java binding, and I often need to find an element by tag name. I thought it would be nice to have a class with constants referring to each tag name. Since there is a finite list of HTML tags, I'm thinking this must already exist. I could start writing my own, of course, but why reinvent the wheel if there is one out there? I have checked the Selenium Java API documentation but can't find any. Any suggestions?

No, I do not believe such a library is currently available.
Although there is a finite number of standard tags in HTML, HTML also allows user-defined tags. Different versions of HTML (the current one is HTML5) also support different sets of tags. For example the tag is no longer supported in HTML5, but does exist in older versions of HTML. These two factors make it difficult to create a definitive library of all tags.
The best option would probably be to create your own, personalized library for the project.
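A minimal sketch of such a project-specific list, assuming an enum is an acceptable home for the constants. The name `HtmlTag` and the selection of tags are illustrative only, not part of Selenium or any existing library; the list is meant to be extended as the tests need it.

```java
import java.util.Locale;

/** Hand-picked tag names for a test suite; extend as the project needs. */
enum HtmlTag {
    A, BUTTON, DIV, FORM, INPUT, LI, SELECT, SPAN, TABLE, TD, TR, UL;

    /** Lower-case tag name in the form a tag-name locator expects, e.g. "div". */
    String tagName() {
        return name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Print every constant next to the tag name it stands for.
        for (HtmlTag tag : values()) {
            System.out.println(tag + " -> " + tag.tagName());
        }
    }
}
```

In a Selenium test this could be passed to the existing locator, for example `driver.findElements(By.tagName(HtmlTag.TR.tagName()))`. Custom elements used by the site under test can be added as further constants, although names containing a hyphen would need an explicit string field rather than the derived name.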

Create PDFs on Google App Engine [duplicate]

Closed 7 years ago.
Since GAE has severe restrictions like - "A Java application cannot use any classes used to write to the filesystem"...
Is there a good Java PDF library that can write the PDF to memory for streaming to the cloud?

You can now use iText without limitations. Since version 5.2.0, a patch is no longer needed.
Have a look at the following post for an example: Generate PDF using GAE and iText
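Independent of the library chosen, the general pattern for avoiding the filesystem is to hand the generator a memory-backed stream. The sketch below shows only that pattern: `StreamWriter` and `render` are hypothetical names introduced here, and the placeholder writer emits plain bytes, not a valid PDF. A real PDF library that accepts an `OutputStream` would take its place.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class InMemoryDocument {
    /** Hypothetical hook: anything that can write a document to a stream. */
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the writer against a memory buffer and returns the collected bytes. */
    static byte[] render(StreamWriter writer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writer.writeTo(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder bytes only; a PDF library would write to 'out' here instead.
        byte[] bytes = render(out ->
                out.write("placeholder document body".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(bytes.length + " bytes held in memory, no file written");
    }
}
```

With iText 5, for instance, `PdfWriter.getInstance(document, out)` accepts any `OutputStream`, so it could be called inside the lambda. The resulting byte array can then be written to a servlet response with content type `application/pdf`.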

According to this thread on google groups (requires authentication), PDFjet can be used on GAE (it has been slightly modified to replace files by streams at a few places). As they say in the thread:
It's a quite low-level library but should be ok for simple tasks.
As of now, both iText and JasperReports are listed as incompatible on the "Will it play in App Engine" page due to the dependence on several classes that are not in the JRE class whitelist.
Update (2010/09/26): As mentioned by Guido in a comment (and I thank him for that), some people claim they have an iText patch to make it compatible with GAE. Worth the try if you want to play with iText.

Extracting data from Wikipedia [closed]

Closed 8 years ago.
I am creating a Spring application and I have the need to integrate with Wikipedia. In particular, I would like to extract data on a given (large) set of Cities, e.g. country, website and coordinates.
I am trying to understand which libraries or frameworks can be useful, but the big issue I am dealing with is that there is no reference structure for the pages I would like to extract information from. The main problem is not that some information is missing, which would be totally acceptable, but rather the city representation changes from city to city. E.g. the DBPedia ontology (e.g. http://dbpedia.org/ontology/City) does not reflect what I can extract via SPARQL query from dbpedia.org/sparql. This way, I don't know how to extract the data I need systematically (i.e. for my entire set).
Is there any technology that can support my task of extracting data on a predefined set of cities?
One solution could be to put some Natural Language Processing in place to search for the required information in the entire Wikipedia page, but that requires a lot of effort if I have to write it on my own.
Another solution could be to leverage a source of structured data that has pre-processed Wikipedia for me and given some structure to the information it contains, but I could not find one.
A third solution could be to make different queries to Wikipedia, but I cannot figure out a way to extract the information I need via the Wikipedia APIs.

Data from Wikipedia is being transferred to Wikidata. Using its API, you could get what you want. If you want a shortcut, you could use the Wikidata query tool: http://wdq.wmflabs.org/api_documentation.html
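As a rough sketch of the API route, the following builds a request to the Wikidata `wbgetentities` action for one item and, optionally, fetches the raw JSON. The class name, the User-Agent string and the choice of Berlin (item `Q64`) are assumptions made for illustration. In the returned claims, country is property `P17`, official website is `P856` and coordinate location is `P625`; parsing the JSON is left to whichever JSON library the application already uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class WikidataLookup {
    static final String API = "https://www.wikidata.org/w/api.php";

    /** Builds a wbgetentities request URL for the statements (claims) of one item. */
    static String claimsUrl(String itemId) {
        return API + "?action=wbgetentities&props=claims&format=json&ids="
                + URLEncoder.encode(itemId, StandardCharsets.UTF_8);
    }

    /** Fetches the raw JSON response body for the given URL (needs network access). */
    static String fetch(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "city-data-sketch/0.1 (illustrative example)")
                .GET()
                .build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while calling Wikidata", e);
        }
    }

    public static void main(String[] args) {
        // Q64 is the Wikidata item for Berlin; only the URL is printed here.
        System.out.println(claimsUrl("Q64"));
    }
}
```

Because every city item uses the same property identifiers, the same request shape works across the whole set of cities, which addresses the concern that the page structure changes from city to city.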

I'm not a Java guy, but I did something like this in .NET.
You need some kind of web scraping framework.
In .NET there is HtmlAgilityPack. You fetch the page and then walk through its elements with, for example, XPath. Of course, you need to know where on the page the information is. That could be the tags around the heading, the text and so on.
For Java, the frameworks I just found were:
Tag Soup
HtmlUnit
Web-Harvest
jARVEST
jsoup
Jericho HTML Parser
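The XPath idea described above can be sketched with the XML and XPath classes that ship with the JDK. This is only an illustration under a strong assumption: the input is well-formed XHTML. Real Wikipedia pages are not guaranteed to parse as XML, which is the gap the lenient parsers listed above fill. The sample markup, the class name and the `infobox` class attribute are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathExtract {
    /** Parses well-formed markup and returns the text of the first XPath match. */
    static String extract(String xhtml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc)
                    .trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page; the row label decides which cell is read.
        String page = "<html><body><h1>Berlin</h1>"
                + "<table class='infobox'>"
                + "<tr><th>Country</th><td>Germany</td></tr>"
                + "<tr><th>Website</th><td>berlin.de</td></tr>"
                + "</table></body></html>";
        System.out.println(extract(page, "//table[@class='infobox']//tr[th='Country']/td"));
    }
}
```

Selecting the row by its label rather than by position is what makes the expression tolerant of pages where the rows appear in a different order.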

Are there any ad hoc report builder libraries available in Java/Groovy that generate SQL queries based on the dimensions/measures selected? [closed]

Closed 5 years ago.
Ideally, the library would take a description of which tables, columns and aggregations each dimension or measure maps to. Then, given the list of selected dimensions and measures, it would generate the SQL query or queries.

What you probably need is a generic layer for accessing the underlying analytical database, such as OLAP4J, which provides an API layer over the underlying analytical databases.

I haven't used it myself, but I've heard good things about Pentaho. Java based and open source.

See also this question: Java Business Intelligence framework with ad-hoc web reporting? and the linked Jasper plugin for Groovy. However, ad hoc query support is rare, and I am currently facing the same problem.
I think that ad hoc queries for BIRT and JasperSoft are offered only in the "Enterprise" (read "commercial") editions. I am trying to implement it so that the creator of the report can provide specially marked parameters and the end user can choose to include or exclude these parameters. This is not particularly "ad hoc", but it will be enough for my customers' requirements.

If you are looking to generate queries easily against several databases, you can try Active Query Builder. It is graphical (it lets you drop tables in), dead simple to put in your program, and customizable to some extent. Coupled with JasperReports or with a simple grid, it may help you do what you want.
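For a sense of what the mapping described in the question involves, here is a minimal hand-rolled sketch, not taken from any of the libraries named above. It assumes a single fact table; the class name and the table, column and field names in the example are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AdHocQueryBuilder {
    private final String table;
    private final Map<String, String> dimensions = new LinkedHashMap<>();
    private final Map<String, String> measures = new LinkedHashMap<>();

    AdHocQueryBuilder(String table) {
        this.table = table;
    }

    /** Registers a dimension name and the column it maps to. */
    AdHocQueryBuilder dimension(String name, String column) {
        dimensions.put(name, column);
        return this;
    }

    /** Registers a measure name and the aggregate expression it maps to. */
    AdHocQueryBuilder measure(String name, String aggregate) {
        measures.put(name, aggregate);
        return this;
    }

    /** Generates one grouped SELECT for the chosen dimensions and measures. */
    String build(List<String> chosenDimensions, List<String> chosenMeasures) {
        List<String> select = new ArrayList<>();
        List<String> groupBy = new ArrayList<>();
        for (String name : chosenDimensions) {
            String column = lookup(dimensions, name);
            select.add(column + " AS " + name);
            groupBy.add(column);
        }
        for (String name : chosenMeasures) {
            select.add(lookup(measures, name) + " AS " + name);
        }
        if (select.isEmpty()) {
            throw new IllegalArgumentException("nothing selected");
        }
        String sql = "SELECT " + String.join(", ", select) + " FROM " + table;
        return groupBy.isEmpty() ? sql : sql + " GROUP BY " + String.join(", ", groupBy);
    }

    // Only registered names are accepted, so user input never reaches the SQL text directly.
    private static String lookup(Map<String, String> mapping, String name) {
        String target = mapping.get(name);
        if (target == null) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return target;
    }

    public static void main(String[] args) {
        AdHocQueryBuilder sales = new AdHocQueryBuilder("sales")
                .dimension("region", "region_name")
                .dimension("year", "order_year")
                .measure("revenue", "SUM(amount)");
        System.out.println(sales.build(List.of("region"), List.of("revenue")));
    }
}
```

Joins across several tables, filters and database-specific dialects are where this quickly stops being small, which is the part that existing tools would cover.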

How to create a webpage similar to Google's online spreadsheet? [closed]

Closed 7 years ago.
I need to create a webpage quite similar to Google's great Docs spreadsheet, but in my case I cannot use theirs.
Also there is no need to have the full feature set of the Google Docs spreadsheet (which is really big!).
But my minimal feature set is:
- change, add, delete cells and content
- change of formats like color, size, font
- the functions sum(), avg(), count()
My preferred tools are JSP, Tomcat and jQuery. The server-side representation does not matter and could be XML, text, database tables or something else.
I am quite sure that there are good open solutions out there that fit my requirements and that I could use as a starting point, but my problem is finding them.
Searching for "Google spreadsheet alternative" did not work very well.
Any hint or link is appreciated.
Thank you.
Alex

Take a look at primefaces sheet. If you like jQuery and use JSP, you should consider learning JSF and use primefaces since primefaces already heavily uses jQuery, meaning if you have to personalize the behavior of its controls, you are already presented with a familiar interface.
Primefaces sheet seems to provide what you're looking for, which is to say, something similar to an excel sheet.
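Whichever front end is used, the three functions in the question's minimal feature set are small on the server side. The sketch below is independent of PrimeFaces and assumes cells are stored in a map keyed by references such as "A1"; the class name `MiniSheet` and that storage model are illustrative choices, and formatting and formula parsing are left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MiniSheet {
    // Numeric cell values keyed by reference, e.g. "A1"; absent key means empty cell.
    private final Map<String, Double> cells = new HashMap<>();

    void set(String ref, double value) {
        cells.put(ref, value);
    }

    void clear(String ref) {
        cells.remove(ref);
    }

    /** Number of non-empty cells among the given references. */
    long count(List<String> refs) {
        return refs.stream().filter(cells::containsKey).count();
    }

    /** Sum over the non-empty cells among the given references. */
    double sum(List<String> refs) {
        return refs.stream().filter(cells::containsKey).mapToDouble(cells::get).sum();
    }

    /** Average over the non-empty cells; fails if none of them has a value. */
    double avg(List<String> refs) {
        long n = count(refs);
        if (n == 0) {
            throw new ArithmeticException("avg over an empty range");
        }
        return sum(refs) / n;
    }

    public static void main(String[] args) {
        MiniSheet sheet = new MiniSheet();
        sheet.set("A1", 2);
        sheet.set("A2", 4);
        List<String> range = List.of("A1", "A2", "A3"); // A3 is left empty
        System.out.println("sum=" + sheet.sum(range)
                + " avg=" + sheet.avg(range)
                + " count=" + sheet.count(range));
    }
}
```

Skipping empty cells in all three functions keeps `avg` consistent with `sum` and `count`, which is one reasonable convention among several.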
