Converting PHP function to Java - java

I've been trying to convert some PHP code to Java, but it's not working as intended. After a few iterations of the first loop I get a "String index out of range" error on the line char nextchar = inprogresskey.charAt(ranpos);
The PHP code is:
function munge($address)
{
$address = strtolower($address);
$coded = "";
$unmixedkey = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.#";
$inprogresskey = $unmixedkey;
$mixedkey="";
$unshuffled = strlen($unmixedkey);
for ($i = 0; $i <= strlen($unmixedkey); $i++)
{
$ranpos = rand(0,$unshuffled-1);
$nextchar = $inprogresskey{$ranpos};
$mixedkey .= $nextchar;
$before = substr($inprogresskey,0,$ranpos);
$after = substr($inprogresskey,$ranpos+1,$unshuffled-($ranpos+1));
$inprogresskey = $before.''.$after;
$unshuffled -= 1;
}
$cipher = $mixedkey;
$shift = strlen($address);
for ($j=0; $j<strlen($address); $j++)
{
if (strpos($cipher,$address{$j}) == -1 )
{
$chr = $address{$j};
$coded .= $address{$j};
}
else
{
$chr = (strpos($cipher,$address{$j}) + $shift) % strlen($cipher);
$coded .= $cipher{$chr};
}
}
$txt = "<script type=\"text/javascript\" language=\"javascript\">\n";
$txt .= "\ncoded = \"" . $coded . "\"\n" .
" key = \"".$cipher."\"\n".
" shift=coded.length\n".
" link=\"\"\n".
" for (i=0; i<coded.length; i++) {\n" .
" if (key.indexOf(coded.charAt(i))==-1) {\n" .
" ltr = coded.charAt(i)\n" .
" link += (ltr)\n" .
" }\n" .
" else { \n".
" ltr = (key.indexOf(coded.charAt(i))-
shift+key.length) % key.length\n".
" link += (key.charAt(ltr))\n".
" }\n".
" }\n".
"document.write(\"<a href='mailto:\"+link+\"'>\"+link+\"</a>\")\n" .
"\n".
"//-"."->\n" .
"<" . "/script><noscript>N/A" .
"<"."/noscript>";
return $txt;
}
And my Java code is:
private String encryptEmail(String email)
{
String address = email.toLowerCase();
String coded = "";
String unmixedkey = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.#";
String inprogresskey = unmixedkey;
String mixedkey = "";
int unshuffled = unmixedkey.length();
for (int i = 0; i <= unmixedkey.length(); i++) {
Random random = new Random();
int ranpos = random.nextInt(unshuffled - 1);
char nextchar = inprogresskey.charAt(ranpos);
mixedkey += nextchar;
String before = StringUtils.substring(inprogresskey, 0, ranpos);
String after = StringUtils.substring(inprogresskey, ranpos + 1, unshuffled - (ranpos + 1));
inprogresskey = before + "" + after;
unshuffled -= 1;
}
String cipher = mixedkey;
int shift = address.length();
for (int j = 0; j < address.length(); j++) {
int chr = -1;
if (StringUtils.indexOf(cipher, address.substring(j - 1, j)) == -1) {
coded += address.charAt(j);
} else {
chr = (cipher.charAt(j + shift)) % cipher.length();
coded += cipher.charAt(chr);
}
}
StringBuilder sb = new StringBuilder();
sb.append("<script type=\"text/javascript\">\n");
sb.append("var coded = \"" + coded + "\";\n");
sb.append("var key = \"" + cipher + "\";\n");
sb.append("var shift = coded.length;\n");
sb.append("var link = \"\";\n");
sb.append("for (i = 0; i < coded.length; i++) {\n");
sb.append(" if (key.indexOf(coded.charAt(i))==-1) {\n");
sb.append(" ltr = coded.charAt(i);\n");
sb.append(" link += (ltr);\n");
sb.append(" }\n");
sb.append(" else {\n");
sb.append(" ltr = (key.indexOf(coded.charAt(i))-shift+key.length) % key.length;\n");
sb.append(" link += (key.charAt(ltr));\n");
sb.append(" }");
sb.append("}");
sb.append("document.write(\"<a rel='nofollow' href='mailto:\" + link + \"'>\" + link + \"</a>\");\n");
sb.append("</script>");
return sb.toString();
}
Am I missing out on some functions (charAt, indexOf)?
Thanks

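int ranpos = random.nextInt(unshuffled - 1);
On the last pass unshuffled is 1, so you are calling nextInt(1 - 1), and nextInt(0) is not a valid call. Also, nextInt(n) returns a value from 0 to n - 1, so the equivalent of PHP's rand(0, $unshuffled - 1) is nextInt(unshuffled).
char nextchar = inprogresskey.charAt(ranpos);
This is the line that throws the error. PHP's substr takes a start and a length, but StringUtils.substring takes a start and an end index, so "after" is cut short, inprogresskey shrinks faster than unshuffled, and ranpos ends up past the end of the string.
What you need to do is:
change your for loop to for (int i = 0; i < unmixedkey.length(); i++)
and inside the loop use the lines below
int ranpos = random.nextInt(unshuffled);
String after = StringUtils.substring(inprogresskey, ranpos + 1);
Below is the corrected for loop code.
Random random = new Random(); // create once, outside the loop
for (int i = 0; i < unmixedkey.length(); i++) {
// nextInt(n) returns 0..n-1, matching PHP's rand(0, n-1)
int ranpos = random.nextInt(unshuffled);
char nextchar = inprogresskey.charAt(ranpos);
mixedkey += nextchar;
String before = StringUtils.substring(inprogresskey, 0, ranpos);
// everything after ranpos; the second argument is a start index, not a length
String after = StringUtils.substring(inprogresskey, ranpos + 1);
inprogresskey = before + after;
unshuffled -= 1;
}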

I suspect that unshuffled is equal to 0 on the last time through the loop, and so charAt(-1) is failing.
You should take a look at Java IDEs like Eclipse and the debugger. Adding breakpoints will enable you to step through the code as it runs, and see the values of all variables, which would be the quickest way of solving this sort of issue in future.
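For reference, here is a minimal sketch of the shuffle and shift steps of the PHP munge() function in plain Java, without StringUtils. The class name EmailMunger and its method names are made up for illustration, and decode() is only there to mirror what the generated JavaScript does so the round trip can be checked. Treat it as one possible port under those assumptions, not the definitive one.

```java
import java.util.Random;

/** Sketch of the PHP munge() cipher step in plain Java (class and method names are illustrative). */
final class EmailMunger {
    static final String UNMIXED_KEY =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.#";

    /** Builds a random permutation of UNMIXED_KEY by repeatedly removing one random character. */
    static String shuffleKey(Random random) {
        StringBuilder remaining = new StringBuilder(UNMIXED_KEY);
        StringBuilder mixed = new StringBuilder();
        while (remaining.length() > 0) {
            // nextInt(n) yields 0..n-1, like PHP's rand(0, n - 1)
            int pos = random.nextInt(remaining.length());
            mixed.append(remaining.charAt(pos));
            remaining.deleteCharAt(pos);
        }
        return mixed.toString();
    }

    /** Shifts each character found in the key forward by address.length() positions. */
    static String encode(String address, String key) {
        String lower = address.toLowerCase();
        int shift = lower.length();
        StringBuilder coded = new StringBuilder();
        for (int j = 0; j < lower.length(); j++) {
            char c = lower.charAt(j);
            int index = key.indexOf(c);
            if (index == -1) {
                coded.append(c); // not in the key (for example '@'): copy as is
            } else {
                coded.append(key.charAt((index + shift) % key.length()));
            }
        }
        return coded.toString();
    }

    /** Mirrors the generated JavaScript: shifts each character back again. */
    static String decode(String coded, String key) {
        int shift = coded.length();
        StringBuilder link = new StringBuilder();
        for (int i = 0; i < coded.length(); i++) {
            char c = coded.charAt(i);
            int index = key.indexOf(c);
            if (index == -1) {
                link.append(c);
            } else {
                // floorMod keeps the index non-negative even when shift exceeds the key length
                link.append(key.charAt(Math.floorMod(index - shift, key.length())));
            }
        }
        return link.toString();
    }

    public static void main(String[] args) {
        String key = shuffleKey(new Random());
        String coded = encode("Someone@Example.com", key);
        System.out.println(coded + " -> " + decode(coded, key));
    }
}
```

Using a StringBuilder and deleteCharAt avoids the before/after substring arithmetic entirely, which is where the PHP and Java argument conventions differ.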

Related

Parse string containing javascript

I have a string:
2 + 2 = ${2 + 2}
This is a ${"string"}
This is an object: ${JSON.stringify({a: "B"})}
This should be "<something>": ${{
abc: "def",
cba: {
arr: [
"<something>"
]
}
}.cba.arr[0]}
This should ${"${also work}"}
And after parsing it I should get something like that:
2 + 2 = 4
This is a string
This is an object: {"a":"B"}
This should be "<something>": <something>
This should ${also work}
I need help implementing this in Java; all I really need is to get what is between ${ and the matching }.
I tried using a regular expression, \${(.+?)}, but it fails when the expression inside contains }
After a bit of testing, I've ended up with this:
ScriptEngine scriptEngine = new ScriptEngineManager(null).getEngineByName("JavaScript");
String str = "2 + 2 = ${2 + 2}\n" +
"This is a ${\"string\"}\n" +
"This is an object: ${JSON.stringify({a: \"B\"})}\n" +
"This should be \"F\": ${var test = {\n" +
" a : {\n" +
" c : \"F\"\n" +
" }\n" +
"};\n" +
"test.a.c\n" +
"}\n" +
"This should ${\"${also work}\"}"; // String to be parsed
StringBuffer result = new StringBuffer();
boolean dollarSign = false;
int bracketsOpen = 0;
int beginIndex = -1;
int lastEndIndex = 0;
char[] chars = str.toCharArray();
for(int i = 0; i < chars.length; i++) { // i is for index
char c = chars[i];
if(dollarSign) {
if(c == '{') {
if(beginIndex == -1) {
beginIndex = i + 1;
}
bracketsOpen++;
} else if(c == '}') {
if(bracketsOpen > 0) {
bracketsOpen--;
}
if(bracketsOpen <= 0) {
int endIndex = i;
String evalResult = ""; // evalResult is the replacement value
try {
evalResult = scriptEngine.eval(str.substring(beginIndex, endIndex)).toString(); // Using script engine as an example; str.substring(beginIndex, endIndex) is used to get string between ${ and }
} catch (ScriptException e) {
e.printStackTrace();
}
result.append(str.substring(lastEndIndex, beginIndex - 2));
result.append(evalResult);
lastEndIndex = endIndex + 1;
dollarSign = false;
beginIndex = -1;
bracketsOpen = 0;
}
} else if(beginIndex == -1) { // '$' was not followed by '{', so this is not a placeholder
dollarSign = false;
}
} else {
if(c == '$') {
dollarSign = true;
}
}
}
result.append(str.substring(lastEndIndex));
System.out.println(result.toString());
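The brace counting can also be separated from the evaluation step. Below is a minimal sketch that only collects the text inside each balanced ${ ... } placeholder. PlaceholderScanner and extract are hypothetical names, and the sketch assumes braces inside the expression are balanced, so an unmatched brace inside a JavaScript string literal would still confuse it.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper: collects the text inside each balanced ${ ... } placeholder. */
final class PlaceholderScanner {
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < input.length() - 1) {
            if (input.charAt(i) == '$' && input.charAt(i + 1) == '{') {
                int depth = 1;          // the '{' that opened the placeholder
                int start = i + 2;
                int j = start;
                while (j < input.length() && depth > 0) {
                    char c = input.charAt(j);
                    if (c == '{') {
                        depth++;
                    } else if (c == '}') {
                        depth--;
                    }
                    j++;
                }
                if (depth != 0) {
                    break;              // no matching closing brace: stop scanning
                }
                found.add(input.substring(start, j - 1));
                i = j;                  // continue after the closing brace
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("2 + 2 = ${2 + 2} and ${{a: {b: 1}}.a.b}"));
    }
}
```

Each extracted string can then be handed to whatever evaluates it, for example the ScriptEngine used in the code above.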

Replace characters with substring using a loop Java

I am trying to replace what is between each pair of brackets using a loop and an array. array1a and array1b hold the indices where the brackets open and close. I want to take the number between the two brackets, increment it by one, and replace the value currently there. The string text is a list (such as "list item (0) list item (10) list item (1023)"), so I want a loop that increments each value separately rather than setting every bracketed value to the same value. I hope this makes sense!
String text = myString.getText();
for (int x = 0; x < 10; x++) {
array2[x] = text.substring(array1a[x], array1b[x]);
array2[x] = array2[x] + 1;
array3[x] = "(" + array2[x] + ")";
String text2 = text.replaceAll("\\(.*\\)", array3[x]);
myString.setText(text2);
}
Full Code:
public class CreateVideoList extends JFrame implements ActionListener {
JButton play = new JButton("Play Playlist");
JButton addVideo = new JButton("Add Video");
TextArea playlist = new TextArea(6, 50);
JTextField videoNo = new JTextField(2);
private int x = 0;
@Override
public void actionPerformed(ActionEvent e) {
String key = videoNo.getText();
String name = VideoData.getName(key);
String director = VideoData.getDirector(key);
Integer playCount = VideoData.getPlayCount(key);
String text = playlist.getText();
String rating = CheckVideos.stars(VideoData.getRating(key));
String output = name + " - " + director + "\nRating: "
+ rating
+ "\nPlay Count: " + playCount;
String newItem = key + " " + name + " - " + director + " ("
+ playCount + ") " + "\n";
String addToList = "";
String[] array3 = new String[100];
if ("Add Video".equals(e.getActionCommand())) {
if (Character.isDigit(text.charAt(0)) == false) {
playlist.setText("");
}
if (addToList.indexOf(key) == -1) {
addToList += addToList + newItem;
playlist.append(addToList);
array3[x] = key;
x++;
} else if (addToList.indexOf(key) != -1) {
JOptionPane.showMessageDialog(CreateVideoList.this,
"This video is already in the playlist. Please select a"
+ " different video.", "Add to playlist error", JOptionPane.INFORMATION_MESSAGE);
}
}
if ("Play Playlist".equals(e.getActionCommand())) {
Integer length = (text.length());
int counta = 0;
Integer[] array1a = new Integer[100];
Integer[] array1b = new Integer[100];
String strPlayCount = "";
for (x = 0; x < length; x++) {
if (text.charAt(x) == '(') {
counta++;
array1a[counta - 1] = x;
array1a[counta - 1] = array1a[counta - 1] + 1;
}
if (text.charAt(x) == ')') {
array1b[counta - 1] = x;
array1b[counta - 1] = array1b[counta - 1];
}
}
String[] array2 = new String[counta];
String[] array4 = new String[100];
for (int y = 0; y < counta; y++) {
array2[y] = text.substring(array1a[y], array1b[y]);
array2[y] = array2[y] + 1;
playCount = Integer.parseInt(array2[y]);
array4[y] = "(" + array2[y] + ")";
String text2 = text.replaceAll("\\(.*\\)", array4[y]);
playlist.setText(text2);
}
}
}
Replace
array2[x] = array2[x] + 1;
array3[x] = "(" + array2[x] + ")";
with
Integer n = Integer.parseInt(array2[x]) + 1;
array3[x] = "(" + n.toString() + ")";
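Note that replaceAll swaps every match for the same replacement, so even with the numeric fix each pass overwrites all the counts with one value. One way around that, shown here as a sketch rather than a drop-in change to the code above, is to let a Matcher visit each "(number)" and build the result piece by piece. PlayCountIncrementer and incrementAll are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper: adds one to every "(number)" in a string. */
final class PlayCountIncrementer {
    private static final Pattern COUNT = Pattern.compile("\\((\\d+)\\)");

    static String incrementAll(String text) {
        Matcher matcher = COUNT.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // parse first so that + is numeric addition, not string concatenation
            int next = Integer.parseInt(matcher.group(1)) + 1;
            matcher.appendReplacement(result, "(" + next + ")");
        }
        matcher.appendTail(result); // copy whatever follows the last match
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(incrementAll("list item (0) list item (10) list item (1023)"));
        // prints: list item (1) list item (11) list item (1024)
    }
}
```

With this approach the index arrays are not needed, because the matcher keeps track of where each bracketed number starts and ends.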

Split 2 string and join

I have two strings which I want to join as per my requirements. Say I have
String sa = "as,asd,asdf";
String qw = "12,123,1234";
String[] separated = sa.split(",");
String[] separateds = qw.split(",");
StringBuffer sb = new StringBuffer();
for (int i = 0; i < separateds.length; i++)
{
if (separated.length == i + 1)
{
sb.append(separated[i] + "(" + separateds[i] + ")");
} else
{
sb.append(separated[i] + "(" + separateds[i] + "),");
}
}
deleteListItem.list_summ.setText(sb.toString());
It gives as(12),asd(123),asdf(1234)
But the problem is that the input can also be like
String sa = "as,asdf";
String qw = "12,123,1234";
In that case I want
as(12),asdf(123),1234
Try this code:
String sa = "as,asd";
String qw = "12,123,1234";
String[] separated = sa.split(",");
String[] separateds = qw.split(",");
StringBuffer sb = new StringBuffer();
for (int i = 0; i < separateds.length; i++) {
if (separated.length == i + 1) {
if(separated.length == i) {
sb.append(separateds[i] + "");
} else {
sb.append(separated[i] + "(" + separateds[i] + ")");
}
} else {
if(separated.length == i) {
sb.append("," + separateds[i]);
} else {
sb.append(separated[i] + "(" + separateds[i] + "),");
}
}
}
deleteListItem.list_summ.setText(sb.toString());
// Answer : as(12),asd(123),1234
String sa = "as,asd,asdf";
String qw = "12,123,1234";
String[] separated = sa.split(",");
String[] separateds = qw.split(",");
StringBuffer sb = new StringBuffer();
// first loop through separated, starting with a comma
for (int i = 0; i < separated.length; i++) {
sb.append(",").append(separated[i]).append("(").append(separateds[i]).append(")");
}
// append remaining items in separateds
for (int i = separated.length; i < separateds.length; i++) {
sb.append(",").append(separateds[i]);
}
deleteListItem.list_summ.setText(sb.toString().substring(1)); // remove starting comma
If the lengths of the strings are the same, do the join:
if (separated.length == i + 1 && (separated[i].length() == separateds[i].length()))

KeyTyped event - not able to read the last char of a string

I've been working on a password strength checker. To give the user visual feedback on the breakdown of points as the password is typed, I use a KeyTyped event, analyze the string, and start awarding points once the minimum length is reached. Here's what part of the analysis looks like:
if (in.matches("[a-z]+")){
lowerPenalty = -15;
}
if (in.matches("[0-9]+")){
numPenalty = -15;
}
for(int i=0;i<inLen;i++){
if ((in.charAt(i) + "").matches("[A-Z]")){
upperCounter++;
upperBonus = upperBonus + 4;
}
However, when I run the program, it doesn't consider the last character of the password typed in by the user, so the corresponding counter is not incremented.
For example, the numCounter in the Number Bonus row is still at '1' instead of '2'. I've tried using a KeyPressed event, but the problem persists.
Please help.
As requested, here's the keyListener code:
input.addKeyListener(new KeyAdapter(){
@Override
public void keyTyped(KeyEvent e1){
if ((int)e1.getKeyChar() == 8){
count = -1;
baseScore = 0;
lenBonus = 0;
upperBonus = 0;
upperCounter = 0;
numBonus = 0;
numCounter = 0;
symBonus = 0;
symCounter = 0;
comBonus = 0;
lowerPenalty = 0;
numPenalty = 0;
comBonus = 0;
totalScore = 0;
input.setText("");
strength_lbl.setText("Enter a random password");
strength_lbl.setBackground(Color.LIGHT_GRAY);
}
count++;
Counter.setText(count+"");
analyzeStr(input.getText());
baseScore_txt.setText(baseScore+"" );
lowerPen_txt.setText(lowerPenalty+"");
numonlyPen_txt.setText(numPenalty+"");
upperBonus_txt.setText(upperBonus+" [" + (upperCounter) + "x4]");
numBonus_txt.setText(numBonus+" [" + numCounter + "x5]");
symBonus_txt.setText(symBonus+" [" + symCounter + "x5]");
comBonus_txt.setText(comBonus+"");
totalScore = baseScore + lenBonus + upperBonus + numBonus + symBonus + comBonus + lowerPenalty + numPenalty;
totalScore_txt.setText(totalScore+"");
if (totalScore>=1 && totalScore<50){
strength_lbl.setText("Weak!");
strength_lbl.setBackground(Color.red);
}
if (totalScore>=50 && totalScore<75){
strength_lbl.setText("Average!");
strength_lbl.setBackground(Color.orange);
}
if (totalScore>=75 && totalScore<100 ){
strength_lbl.setText("Strong!");
strength_lbl.setBackground(Color.cyan);
}
if (totalScore>=100){
strength_lbl.setText("Secure!");
strength_lbl.setBackground(Color.green);
}
}
});
As requested, here's the analyzeStr method:
public void analyzeStr(String str){
String in = input.getText();
int inLen = input.getText().length();
if (count == 1){
strength_lbl.setBackground(Color.RED);
strength_lbl.setText("At least 8 characters please!");
}
if (input.getText().length()<8){
lengthBonus_txt.setText("0");
}
else{
lengthBonus_txt.setText(lenBonus +" [" + (count-8) + "x3]");
}
if (count==8){
baseScore = 50;
if (in.matches("[a-z]+")){
lowerPenalty = -15;
}
if (in.matches("[0-9]+")){
numPenalty = -15;
}
for(int i=0;i<inLen;i++){
if ((in.charAt(i) + "").matches("[A-Z]")){
upperCounter++;
upperBonus = upperBonus + 4;
}
if ((in.charAt(i) + "").matches("[0-9]")){
numCounter++;
numBonus = numBonus + 5;
}
if ((in.charAt(i) + "").matches("[!,@,#,$,%,^,&,*,?,_,~]")){
symCounter++;
symBonus = symBonus + 5;
}
}
}
if (count>8){
lenBonus = lenBonus + 3;
lengthBonus_txt.setText(lenBonus+" [" + (inLen-7) + "x3]");
if ((in.charAt(inLen-1) + "").matches("[A-Z]")){
upperCounter++;
upperBonus = upperBonus + 4;
}
if ((in.charAt(inLen-1) + "").matches("[0-9]")){
numCounter++;
numBonus = numBonus + 5;
}
if ((in.charAt(inLen-1) + "").matches("[!,@,#,$,%,^,&,*,?,_,~]")){
symCounter++;
symBonus = symBonus + 5;
}
}
if (count>=8){
if (in.matches("[A-Z][0-9][!,@,#,$,%,^,&,*,?,_,~]")){
comBonus = 25;
}
if (in.matches("[0-9][A-Z][!,@,#,$,%,^,&,*,?,_,~]")){
comBonus = 25;
}
if (in.matches("[!,@,#,$,%,^,&,*,?,_,~][0-9][A-Z]")){
comBonus = 25;
}
if (in.matches("[!,@,#,$,%,^,&,*,?,_,~][A-Z][0-9]")){
comBonus = 25;
}
if (in.matches("[!,@,#,$,%,^,&,*,?,_,~][A-Z][0-9]")){
comBonus = 25;
}
if (in.matches("[A-Z][!,@,#,$,%,^,&,*,?,_,~][0-9]")){
comBonus = 25;
}
if (in.matches("[0-9][!,@,#,$,%,^,&,*,?,_,~][A-Z]")){
comBonus = 25;
}
comBonus = 25;
}
}
}
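A likely cause is timing: in Swing, keyTyped listeners generally run before the text field has inserted the typed character, so input.getText() inside keyTyped can be one character behind. A DocumentListener on the field's document is notified after the text has changed. Independently of that, the counters above are updated one key event at a time, which is easy to get out of step. The sketch below recomputes the counts from the whole string on every call. PasswordStats is a hypothetical name, the symbol set assumes '@' was intended and leaves out the comma, and the weights 4, 5 and 5 are taken from the code above.

```java
/** Illustrative helper: recomputes the character counts from the whole password each time. */
final class PasswordStats {
    private static final String SYMBOLS = "!@#$%^&*?_~"; // assumed symbol set

    final int upper;
    final int digits;
    final int symbols;

    PasswordStats(String password) {
        int u = 0;
        int d = 0;
        int s = 0;
        for (int i = 0; i < password.length(); i++) {
            char c = password.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                u++;
            } else if (c >= '0' && c <= '9') {
                d++;
            } else if (SYMBOLS.indexOf(c) >= 0) {
                s++;
            }
        }
        upper = u;
        digits = d;
        symbols = s;
    }

    /** Sum of the upper-case, number and symbol bonuses (weights 4, 5 and 5). */
    int bonus() {
        return upper * 4 + digits * 5 + symbols * 5;
    }

    public static void main(String[] args) {
        PasswordStats stats = new PasswordStats("Passw0rd!9");
        System.out.println(stats.upper + " upper, " + stats.digits + " digits, "
                + stats.symbols + " symbols, bonus " + stats.bonus());
    }
}
```

Because nothing is carried over between calls, deleting characters or pasting text cannot leave the counters stale, and the reset branch for backspace is no longer needed for these values.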

Sentence comparison with NLP

I used LingPipe for sentence detection, but I don't know whether there is a better tool. As far as I understand, there is no way to compare two sentences and see if they mean the same thing.
Is there any other good source with a pre-built method for comparing two sentences to see whether they are similar?
My requirement is as below:
String sent1 = "Mary and Meera are my classmates.";
String sent2 = "Meera and Mary are my classmates.";
String sent3 = "I am in Meera and Mary's class.";
// several sentences will be formed and basically what I need to do is
// this
boolean bothAreEqual = compareOf(sent1, sent2);
sop(bothAreEqual); // should print true
boolean bothAreEqual = compareOf(sent2, sent3);
sop(bothAreEqual);// should print true
How to test whether two sentences mean the same thing is too open-ended a question.
However, there are methods for comparing two sentences to see how similar they are. There are many possible definitions of similarity that can be tested with pre-built methods.
See for example http://en.wikipedia.org/wiki/Levenshtein_distance
Distance between
'Mary and Meera are my classmates.'
and 'Meera and Mary are my classmates.':
6
Distance between
'Mary and Meera are my classmates.'
and 'Alice and Bobe are not my classmates.':
14
Distance between
'Mary and Meera are my classmates.'
and 'Some totally different sentence.':
29
code:
public class LevenshteinDistance {
private static int minimum(int a, int b, int c) {
return Math.min(Math.min(a, b), c);
}
public static int computeDistance(CharSequence str1,
CharSequence str2) {
int[][] distance = new int[str1.length() + 1][str2.length() + 1];
for (int i = 0; i <= str1.length(); i++){
distance[i][0] = i;
}
for (int j = 0; j <= str2.length(); j++){
distance[0][j] = j;
}
for (int i = 1; i <= str1.length(); i++){
for (int j = 1; j <= str2.length(); j++){
distance[i][j] = minimum(
distance[i - 1][j] + 1,
distance[i][j - 1] + 1,
distance[i - 1][j - 1]
+ ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));
}
}
int result = distance[str1.length()][str2.length()];
//log.debug("distance:"+result);
return result;
}
public static void main(String[] args) {
String sent1="Mary and Meera are my classmates.";
String sent2="Meera and Mary are my classmates.";
String sent3="Alice and Bobe are not my classmates.";
String sent4="Some totally different sentence.";
System.out.println("Distance between \n'"+sent1+"' \nand '"+sent2+"': \n"+computeDistance(sent1, sent2));
System.out.println("Distance between \n'"+sent1+"' \nand '"+sent3+"': \n"+computeDistance(sent1, sent3));
System.out.println("Distance between \n'"+sent1+"' \nand '"+sent4+"': \n"+computeDistance(sent1, sent4));
}
}
Here is what I have come up with. This is just a substitute until I get to the real thing, but it might be of some help to people out there.
package com.examples;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import com.aliasi.sentences.MedlineSentenceModel;
import com.aliasi.sentences.SentenceModel;
import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
import com.aliasi.tokenizer.Tokenizer;
import com.aliasi.tokenizer.TokenizerFactory;
import com.aliasi.util.Files;
public class SentenceWordAnalysisAndLevenshteinDistance {
private static int minimum(int a, int b, int c) {
return Math.min(Math.min(a, b), c);
}
public static int computeDistance(CharSequence str1, CharSequence str2) {
int[][] distance = new int[str1.length() + 1][str2.length() + 1];
for (int i = 0; i <= str1.length(); i++) {
distance[i][0] = i;
}
for (int j = 0; j <= str2.length(); j++) {
distance[0][j] = j;
}
for (int i = 1; i <= str1.length(); i++) {
for (int j = 1; j <= str2.length(); j++) {
distance[i][j] = minimum(
distance[i - 1][j] + 1,
distance[i][j - 1] + 1,
distance[i - 1][j - 1]
+ ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0
: 1));
}
}
int result = distance[str1.length()][str2.length()];
return result;
}
static final TokenizerFactory TOKENIZER_FACTORY = IndoEuropeanTokenizerFactory.INSTANCE;
static final SentenceModel SENTENCE_MODEL = new MedlineSentenceModel();
public static void main(String[] args) {
try {
ArrayList<String> sentences = null;
sentences = new ArrayList<String>();
// Reading from text file
// sentences = readSentencesInFile("D:\\sam.txt");
// Giving sentences
// ArrayList<String> sentences = new ArrayList<String>();
sentences.add("Mary and Meera are my classmates.");
sentences.add("Mary and Meera are my classmates.");
sentences.add("Meera and Mary are my classmates.");
sentences.add("Alice and Bobe are not my classmates.");
sentences.add("Some totally different sentence.");
// Self-implemented
wordAnalyser(sentences);
// Internet referred
// levenshteinDistance(sentences);
} catch (Exception e) {
// TODO: handle exception
e.printStackTrace();
}
}
private static ArrayList<String> readSentencesInFile(String path) {
ArrayList<String> sentencesList = new ArrayList<String>();
try {
System.out.println("Reading file from : " + path);
File file = new File(path);
String text = Files.readFromFile(file, "ISO-8859-1");
System.out.println("INPUT TEXT: ");
System.out.println(text);
List<String> tokenList = new ArrayList<String>();
List<String> whiteList = new ArrayList<String>();
Tokenizer tokenizer = TOKENIZER_FACTORY.tokenizer(
text.toCharArray(), 0, text.length());
tokenizer.tokenize(tokenList, whiteList);
System.out.println(tokenList.size() + " TOKENS");
System.out.println(whiteList.size() + " WHITESPACES");
String[] tokens = new String[tokenList.size()];
String[] whites = new String[whiteList.size()];
tokenList.toArray(tokens);
whiteList.toArray(whites);
int[] sentenceBoundaries = SENTENCE_MODEL.boundaryIndices(tokens,
whites);
System.out.println(sentenceBoundaries.length
+ " SENTENCE END TOKEN OFFSETS");
if (sentenceBoundaries.length < 1) {
System.out.println("No sentence boundaries found.");
return new ArrayList<String>();
}
int sentStartTok = 0;
int sentEndTok = 0;
for (int i = 0; i < sentenceBoundaries.length; ++i) {
sentEndTok = sentenceBoundaries[i];
System.out.println("SENTENCE " + (i + 1) + ": ");
StringBuffer sentenceString = new StringBuffer();
for (int j = sentStartTok; j <= sentEndTok; j++) {
sentenceString.append(tokens[j] + whites[j + 1]);
}
System.out.println(sentenceString.toString());
sentencesList.add(sentenceString.toString());
sentStartTok = sentEndTok + 1;
}
} catch (IOException e) {
// TODO: handle exception
e.printStackTrace();
}
return sentencesList;
}
private static void levenshteinDistance(ArrayList<String> sentences) {
System.out.println("\nLevenshteinDistance");
for (int i = 0; i < sentences.size(); i++) {
System.out.println("Distance between \n'" + sentences.get(0)
+ "' \nand '" + sentences.get(i) + "': \n"
+ computeDistance(sentences.get(0),
sentences.get(i)));
}
}
private static void wordAnalyser(ArrayList<String> sentences) {
System.out.println("No.of Sentences : " + sentences.size());
List<String> stopWordsList = getStopWords();
List<String> tokenList = new ArrayList<String>();
ArrayList<List<String>> filteredSentences = new ArrayList<List<String>>();
for (int i = 0; i < sentences.size(); i++) {
tokenList = new ArrayList<String>();
List<String> whiteList = new ArrayList<String>();
Tokenizer tokenizer = TOKENIZER_FACTORY.tokenizer(sentences.get(i)
.toCharArray(), 0, sentences.get(i).length());
tokenizer.tokenize(tokenList, whiteList);
System.out.print("Sentence " + (i + 1) + ": " + tokenList.size()
+ " TOKENS, ");
System.out.println(whiteList.size() + " WHITESPACES");
filteredSentences.add(filterStopWords(tokenList, stopWordsList));
}
for (int i = 0; i < sentences.size(); i++) {
System.out.println("\n" + (i + 1) + ". Comparing\n '"
+ sentences.get(0) + "' \nwith\n '" +
sentences.get(i)
+ "' : \n");
System.out.println(filteredSentences.get(0) + "\n and \n"
+ filteredSentences.get(i));
System.out.println("Percentage of similarity: "
+ calculateSimilarity(filteredSentences.get(0),
filteredSentences.get(i))
+ "%");
}
}
private static double calculateSimilarity(List<String> list1,
List<String> list2) {
int length1 = list1.size();
int length2 = list2.size();
int count1 = 0;
int count2 = 0;
double result1 = 0.0;
double result2 = 0.0;
int least, highest;
if (length2 > length1) {
least = length1;
highest = length2;
} else {
least = length2;
highest = length1;
}
// computing result1
for (String string1 : list1) {
if (list2.contains(string1))
count1++;
}
result1 = (count1 * 100.0) / length1; // floating-point division keeps the fractional part
// computing result2
for (String string2 : list2) {
if (list1.contains(string2))
count2++;
}
result2 = (count2 * 100.0) / length2; // floating-point division keeps the fractional part
double avg = (result1 + result2) / 2;
return avg;
}
private static List<String> getStopWords() {
String stopWordsString = ".,a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your";
List<String> stopWordsList = new ArrayList<String>();
List<String> stopWordTokenList = new ArrayList<String>();
List<String> whiteList = new ArrayList<String>();
Tokenizer tokenizer = TOKENIZER_FACTORY.tokenizer(
stopWordsString.toCharArray(), 0, stopWordsString.length());
tokenizer.tokenize(stopWordTokenList, whiteList);
for (int i = 0; i < stopWordTokenList.size(); i++) {
// System.out.println((i + 1) + ":" + tokenList.get(i));
if (!stopWordTokenList.get(i).equals(",")) {
stopWordsList.add(stopWordTokenList.get(i));
}
}
System.out.println("No.of stop words: " + stopWordsList.size());
return stopWordsList;
}
private static List<String> filterStopWords(List<String> tokenList,
List<String> stopWordsList) {
List<String> filteredSentenceWords = new ArrayList<String>();
for (String sentenceToken : tokenList) {
if (!stopWordsList.contains(sentenceToken)) {
filteredSentenceWords.add(sentenceToken);
}
}
return filteredSentenceWords;
}
}
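The word-overlap idea can also be tried without LingPipe. The sketch below is a different, simpler measure than the one above: a plain Jaccard similarity over lower-cased word sets, with no stop-word filtering. WordOverlap and its method names are hypothetical. It ignores word order, so it treats the first two example sentences as identical, but it knows nothing about meaning.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative helper: word-set (Jaccard) similarity, ignoring case, punctuation and word order. */
final class WordOverlap {
    static Set<String> words(String sentence) {
        Set<String> result = new HashSet<>();
        // split on anything that is not a letter, digit or apostrophe
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Returns |A intersect B| / |A union B|, a value between 0.0 and 1.0. */
    static double similarity(String a, String b) {
        Set<String> wordsA = words(a);
        Set<String> wordsB = words(b);
        if (wordsA.isEmpty() && wordsB.isEmpty()) {
            return 1.0; // two empty sentences are treated as identical
        }
        Set<String> shared = new HashSet<>(wordsA);
        shared.retainAll(wordsB);
        Set<String> all = new HashSet<>(wordsA);
        all.addAll(wordsB);
        return (double) shared.size() / all.size();
    }

    public static void main(String[] args) {
        String sent1 = "Mary and Meera are my classmates.";
        String sent2 = "Meera and Mary are my classmates.";
        String sent3 = "I am in Meera and Mary's class.";
        System.out.println(similarity(sent1, sent2));
        System.out.println(similarity(sent1, sent3));
    }
}
```

A threshold on the returned value could stand in for the compareOf method from the question, though where to put that threshold depends on the data.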
