Java library for text analysis and counts [closed] - java

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I need a stable Java library that I can pass a huge string to (e.g., a few chapters from Moby Dick) and get "word count"-like stats:
Number of paragraphs
Number of sentences
Number of words
Number of characters
Preferably something internationalizable/localizable but not required. I figured Apache Commons would have something like this, but after a thorough search it does not.
I could write this myself but it would probably be buggy and take a lot of time; plus I don't want to reinvent the wheel if it already exists. I am thinking of using Apache Tika but cannot confirm if it will do what I need. It seems to handle word count, but not the others. Thanks in advance.

Take a look at Apache Tika. It might meet your requirements.
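If an extra dependency turns out to be unnecessary, the four counts from the question can be approximated with the JDK's own locale-aware `java.text.BreakIterator`. The sketch below is only an illustration: the class `TextStats` and its method names are hypothetical, and the paragraph rule (blocks of text separated by at least one blank line) is an assumption that may not match every input.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical helper (not from any library): rough text statistics using only the JDK. */
final class TextStats {

    /** Assumption: a paragraph is a block of text separated from the next by a blank line. */
    static int countParagraphs(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // "\r" counts as whitespace, so Windows line endings are covered as well.
        return trimmed.split("\\n\\s*\\n").length;
    }

    /** Counts sentence segments that contain something other than whitespace. */
    static int countSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (!text.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Counts word segments that begin with a letter or digit, skipping spaces and punctuation. */
    static int countWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                count++;
            }
        }
        return count;
    }

    /** Counts Unicode code points, so a surrogate pair is counted as one character. */
    static int countCharacters(String text) {
        return text.codePointCount(0, text.length());
    }
}
```

Passing a different `Locale` changes the boundary rules, which covers the "localizable" wish to some extent. Abbreviations such as "Mr." can still confuse sentence detection, so the results are best treated as estimates.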

Related

Website to practice Java coding assignments for interview [closed]

Closed 9 years ago.
I am looking for a website where I could practice the kind of Java coding assignments that often come up in job interviews. I mean those tasks where you need to calculate prime numbers, implement some sorting, or do something with an Array, List or a Map. I'm quite an experienced Java developer, but such tasks can sometimes be tricky :)
Do you know any free websites that could help?
Thanks.
Go to interviewstreet. Companies often use it as a first technical screen.

Why has Apache commons-math based its Fraction on the int type? [closed]

Closed 9 years ago.
Why has Apache commons-math3 based its Fraction on the int type?
Are there any reasons to use int instead of long? Do we get some performance gains here? Aren't longs processed at the same speed as ints on modern CPUs?
I think this decision only gives us unneeded limitations.
Please correct me if I am mistaken.
If you want arbitrary precision, use BigFraction. Many platforms, Android in particular, have tight memory constraints and may be less efficient at 64-bit computations. Additionally, any performance improvements to long may not have been available when Fraction was originally written, and it may not be changeable now without breaking API compatibility.
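The range trade-off can be illustrated without the library itself. The sketch below uses only the JDK; `FractionRangeDemo` and its methods are hypothetical names, and it only shows how the product of two numerators can overflow `int`, still fit in `long`, and need `BigInteger` beyond that.

```java
import java.math.BigInteger;

/** Hypothetical demo class: how far int, long and BigInteger go for numerator products. */
final class FractionRangeDemo {

    /** Returns true if a * b can be represented as an int without overflow. */
    static boolean productFitsInInt(int a, int b) {
        try {
            Math.multiplyExact(a, b); // throws ArithmeticException on int overflow
            return true;
        } catch (ArithmeticException overflow) {
            return false;
        }
    }

    /** Widening to long before multiplying cannot overflow for two int operands. */
    static long productAsLong(int a, int b) {
        return (long) a * b;
    }

    /** Arbitrary precision: never overflows, at the cost of object allocation. */
    static BigInteger productAsBigInteger(long a, long b) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
    }
}
```

For example, 100,000 × 100,000 does not fit in an `int` but is an ordinary `long`, while doubling `Long.MAX_VALUE` needs `BigInteger`.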

Pros and Cons of using regex in Java [closed]

Closed 11 years ago.
Can someone give a comprehensive list of the pros and cons of using regular expressions in Java programming?
Pro: when regular expressions do what you need.
Con: when they don't.
Other than that, the question as stated is mostly ideological.
Pros:
They are an effective way to match against input.
They are easily configurable and can be separated from code.
Cons:
They can be hard to read.
They are not performant. If performance is a concern, do not use them.
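A small sketch of the points above, assuming a simple `yyyy-MM-dd` date check as the use case (the class `RegexSketch` and its pattern are illustrative only). Compiling the `Pattern` once and reusing it avoids paying the compilation cost on every call, and the pattern string could just as well be loaded from configuration, which is what "separated from code" amounts to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical example class: a precompiled, reusable regular expression. */
final class RegexSketch {

    // Compiled once; Pattern instances are immutable and safe to share between threads.
    private static final Pattern ISO_DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    /** Returns true if the whole string looks like yyyy-MM-dd (digits only, no range checks). */
    static boolean isIsoDate(String s) {
        return ISO_DATE.matcher(s).matches();
    }

    /** Extracts the year via the first capture group, or fails if the input does not match. */
    static int yearOf(String s) {
        Matcher m = ISO_DATE.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a yyyy-MM-dd date: " + s);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

Even this short pattern shows the readability cost: the doubled backslashes required by Java string literals make it harder to scan than the equivalent hand-written checks.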
Pro: It works and it's simple.
Con: There are none.
Why ask? Perhaps you have something more specific you'd like to know?

java: csv...do I need a library? [closed]

Closed 10 years ago.
Do I need a library if I only need to write a CSV-formatted file? I don't need to read or parse it.
No, you don't. And even reading/parsing can be easily done with a plain JRE.
CSV is a plain (ASCII) text format with only a few rules:
rows (objects) are separated with a \n
columns (fields, attributes) are separated with a delimiter char (usually a comma, but define whatever you need)
row and column delimiters must not be part of the field values
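The rules above can be sketched in a few lines of plain Java. The class `CsvSketch` is hypothetical. It also goes one step beyond the last rule: instead of forbidding delimiters inside values, it wraps such fields in double quotes and doubles any embedded quotes, which is the convention RFC 4180 describes and which most spreadsheet tools accept.

```java
import java.util.List;

/** Hypothetical minimal CSV writer: comma-delimited, "\n" row separator. */
final class CsvSketch {

    /** Quotes a field only when it contains a comma, a quote or a line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the escaped fields of one row with commas (no trailing newline). */
    static String toLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    /** Renders all rows, terminating each one with "\n". */
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(toLine(row)).append('\n');
        }
        return sb.toString();
    }
}
```

Writing the result to disk is then a matter of passing the string to a `Writer`. Choosing the character encoding explicitly is advisable if the values are not pure ASCII.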
Unless it's a really trivial part of your application and you're absolutely sure you won't ever need to parse a CSV file, you need a CSV-serialization library.
I have tried openCSV and I'm pretty happy using it. Of course you can write your own class to handle this serialization, but a library always comes with more features at the expense of an extra dependency...

What HTML parsing libraries do you recommend in Java [closed]

Closed 10 years ago.
I want to parse some HTML in order to find the values of some attributes/tags etc.
What HTML parsers do you recommend? Any pros and cons?
NekoHTML, TagSoup, and JTidy will allow you to parse HTML and then process with XML tools, like XPath.
I have tried HTML Parser, which is dead simple.
Do you need to do a full parse of the HTML? If you're just looking for specific values within the contents (a specific tag/param), then a simple regular expression might be enough, and could very well be faster.
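As a rough sketch of the regular-expression route, assuming the markup is simple and under your control: the hypothetical `HrefExtractor` below pulls double-quoted `href` values out of `<a>` tags. It will miss single-quoted or unquoted attributes and knows nothing about comments or scripts, so it is no substitute for a real parser on arbitrary HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts double-quoted href values from anchor tags. */
final class HrefExtractor {

    // "<a" followed by a word boundary, any other attributes, then href="...".
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href values in document order; an empty list if none are found. */
    static List<String> hrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }
}
```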
