Collection of variants of words using special symbols [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I am working on a project related to spam filtering. Many of you might be aware of this technique used by spammers:
writing "items" as "|tems" (pipe instead of i)
$ale instead of sale
h0t instead of hot (zero instead of letter 'o')
and so on.
Is there a database of all such symbol-substituted variants of words? Or does anyone know a good strategy for tackling this trick?
Currently, I simply replace '#' with 'a', '|' with 'i', '$' with 's', and so on. I'd appreciate your views on this issue. Please help.
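A minimal sketch of the character-substitution approach described above, in Java. The class name `LeetNormalizer` and the substitution table are assumptions for illustration: '#' to 'a', '|' to 'i' and '$' to 's' come from the question, the rest are common guesses. Note that a fixed table is ambiguous ('1' might stand for 'i' or 'l'), which is one reason to combine it with fuzzy matching.

```java
import java.util.Map;

/** Hypothetical helper: undoes common symbol-for-letter substitutions before spam filtering. */
public class LeetNormalizer {
    // Assumed substitution table; extend it with whatever variants your spam corpus shows.
    private static final Map<Character, Character> SUBSTITUTIONS = Map.of(
            '#', 'a',
            '@', 'a',
            '|', 'i',
            '1', 'i',
            '!', 'i',
            '$', 's',
            '5', 's',
            '0', 'o',
            '3', 'e');

    /** Replaces each mapped symbol with its letter and lower-cases every other character. */
    public static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            Character replacement = SUBSTITUTIONS.get(c);
            sb.append(replacement != null ? replacement.charValue() : Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("|tems on $ale, h0t deals")); // items on sale, hot deals
    }
}
```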

It seems you are taking a message as the starting point and trying to transform it.
Another approach could be to start by defining a list of words that are likely to be disguised (sale, viagra, etc.) and then generate all possible similar words. As a measure of similarity, you can use the Levenshtein distance.
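A sketch of the Levenshtein idea, assuming a hypothetical watch list of suspicious words. Rather than generating every variant up front, it computes the edit distance between each incoming token and each watched word at filtering time, which yields the same matches for a given threshold. The names `SpamWordMatcher` and `closestWatchedWord` are illustrative.

```java
import java.util.List;

public class SpamWordMatcher {
    /** Dynamic-programming Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the first watched word within maxDistance edits of the token, or null if none. */
    public static String closestWatchedWord(String token, List<String> watchList, int maxDistance) {
        String lower = token.toLowerCase();
        for (String word : watchList) {
            if (levenshtein(lower, word) <= maxDistance) return word;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> watchList = List.of("sale", "hot", "items"); // hypothetical watch list
        System.out.println(closestWatchedWord("$ale", watchList, 1));  // sale
        System.out.println(closestWatchedWord("h0t", watchList, 1));   // hot
        System.out.println(closestWatchedWord("hello", watchList, 1)); // null
    }
}
```

With short words, a threshold of 1 also catches innocent words ("hat" is one edit from "hot"), so the threshold probably needs to depend on word length.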

Related

Java: get Plural/Singular for a given String/Word [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm trying to match a word against some hard-coded values. Say I have the word 'revenue': 'revenues' should also be a match. The same goes for 'liability' and 'liabilities'.
What approach should we take here? Thanks in advance.
I have tried my own algorithm, but it is very difficult to maintain a word library with each word's plural or singular form.
If you don't want to maintain a full dictionary, you could implement some general rules plus a dictionary of exceptions to those rules.
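A sketch of the rules-plus-exceptions idea. The `Singularizer` class, its three suffix rules and its short exception list are assumptions for illustration, not a complete model of English plurals (for example, "movies" would wrongly become "movy" until it is added as an exception).

```java
import java.util.Map;

/** Hypothetical singularizer: a few general English plural rules plus a dictionary of exceptions. */
public class Singularizer {
    // Irregular forms that the rules below would get wrong (illustrative, not exhaustive).
    private static final Map<String, String> EXCEPTIONS = Map.of(
            "children", "child",
            "people", "person",
            "men", "man",
            "news", "news",
            "series", "series");

    public static String singular(String word) {
        String w = word.toLowerCase();
        String irregular = EXCEPTIONS.get(w);
        if (irregular != null) return irregular;              // exceptions win over rules
        if (w.endsWith("ies") && w.length() > 3) {
            return w.substring(0, w.length() - 3) + "y";      // liabilities -> liability
        }
        if (w.endsWith("sses") || w.endsWith("xes") || w.endsWith("ches") || w.endsWith("shes")) {
            return w.substring(0, w.length() - 2);            // boxes -> box, classes -> class
        }
        if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 1) {
            return w.substring(0, w.length() - 1);            // revenues -> revenue
        }
        return w;                                             // assume it is already singular
    }

    /** Two words "match" if they reduce to the same singular form. */
    public static boolean sameWord(String a, String b) {
        return singular(a).equals(singular(b));
    }

    public static void main(String[] args) {
        System.out.println(sameWord("revenue", "revenues"));      // true
        System.out.println(sameWord("liability", "liabilities")); // true
    }
}
```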
But these are all quick and hacky solutions. Depending on how good the result must be, other approaches are also available, such as machine learning or the language services offered by cloud providers like AWS or Azure.
You might want to look at Lucene's PorterStemmer. The idea is to compare the stems of the two words instead of comparing singulars and plurals. You can read more about it here.
Here is the maven dependency and below is an example:
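import org.tartarus.snowball.ext.PorterStemmer; // Snowball Porter stemmer shipped with Lucene's analysis module

public class StemDemo {
    public static void main(String[] args) {
        PorterStemmer stemmer = new PorterStemmer();
        stemmer.setCurrent("liability");   // load the word to stem
        stemmer.stem();                    // stem it in place
        System.out.println(stemmer.getCurrent());
        stemmer.setCurrent("liabilities");
        stemmer.stem();
        System.out.println(stemmer.getCurrent());
    }
}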
This prints the same stem for both words.

java programming homework 2015 [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am having some issues with this homework question and was wondering if anyone has any ideas or insights into the answer. I'm really struggling.
You are to amend the program you created last week, this time however the program will ask you to
type the names that need to be stored. Once this is complete the program will display the names in
alphabetical order and display the number of characters for each name and the number of vowels
each name has.
Since this is homework, I'll give you a hint. If you've studied the Java Collections, you can store the names in alphabetical order. If you haven't got to collections yet, then simply save them as Strings as they come in.
When it's time to display them in alphabetical order you could:
- sort them before you display them
- or brute-force it: loop through your stored names and display all the A's, then loop through again picking up the B's, and so on.
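If you take the "sort them first" route, a minimal sketch could look like this (the class name `NameSorter` is hypothetical). `String.CASE_INSENSITIVE_ORDER` keeps names in dictionary order regardless of capitalization, so "bob" still sorts after "Alice".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

public class NameSorter {
    /** Returns a sorted copy so the caller's list keeps its original input order. */
    public static List<String> sortedNames(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        List<String> names = new ArrayList<>();
        System.out.println("Enter names, one per line (blank line to finish):");
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) break; // a blank line ends the input
            names.add(line);
        }
        for (String name : sortedNames(names)) {
            System.out.println(name);
        }
    }
}
```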
As for counting vowels, look at the Java documentation for the methods of the String class. Hint: look for methods that return an "int".
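Following that hint, `String.indexOf(int)` returns an int (-1 when the character is absent), so a string of vowels can act as a lookup table. A sketch, assuming 'y' is not counted as a vowel; `NameStats` is a hypothetical name.

```java
public class NameStats {
    private static final String VOWELS = "aeiouAEIOU"; // assumption: 'y' is not treated as a vowel

    /** Counts characters of name that appear in VOWELS, using indexOf as the membership test. */
    public static int countVowels(String name) {
        int count = 0;
        for (int i = 0; i < name.length(); i++) {
            if (VOWELS.indexOf(name.charAt(i)) >= 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Elizabeth";
        System.out.println(name + ": " + name.length() + " characters, " + countVowels(name) + " vowels");
    }
}
```

For "Elizabeth" this prints `Elizabeth: 9 characters, 4 vowels`.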

Algorithm for finding a sequence of characters [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to write a program that:
accepts a string of characters (input 1), i.e. a DNA sequence;
the number of characters should be given as input before the string is entered;
accepts a number, then reads that many characters (input 2);
finds all the possible 'mutations' of input 2 and checks whether any of them occur in input 1.
Mutations are strings that can be made from input 2. For example, for AGGT the mutations are GAGT, GGAT, TGGA, AGGT, ATGG, and AGTG.
A sample run of the program would look like this:
2
6 4
ATGGAT
AGGT
3
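A sketch of one reading of the problem. Judging from the example, the mutations of AGGT are exactly the strings obtained by swapping the characters at any two positions; that interpretation is an assumption, but it reproduces the listed mutations and the final 3 in the sample run (the windows ATGG, TGGA and GGAT of ATGGAT all match). The purpose of the leading 2 in the sample is not stated, so it is not handled here. The class name `MutationCounter` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class MutationCounter {
    /** All strings formed by swapping the characters at two different positions of pattern. */
    public static Set<String> mutations(String pattern) {
        Set<String> result = new HashSet<>(); // a set drops duplicates such as swapping two equal Gs
        char[] chars = pattern.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            for (int j = i + 1; j < chars.length; j++) {
                char[] copy = chars.clone();
                char tmp = copy[i];
                copy[i] = copy[j];
                copy[j] = tmp;
                result.add(new String(copy));
            }
        }
        return result;
    }

    /** Counts windows of text (overlaps allowed) that equal one of the pattern's mutations. */
    public static int countOccurrences(String text, String pattern) {
        Set<String> targets = mutations(pattern);
        int m = pattern.length();
        int count = 0;
        for (int start = 0; start + m <= text.length(); start++) {
            if (targets.contains(text.substring(start, start + m))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(mutations("AGGT"));                  // the six mutations (set order may vary)
        System.out.println(countOccurrences("ATGGAT", "AGGT")); // 3
    }
}
```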
Usually, when you ask a question on this site, you should show the effort you put into your work and tell us exactly where you think your attempt does not work.
I am not giving you an answer; I'm giving you something better: the opportunity to learn more about the subject. You can learn about regular expressions in Java here: http://docs.oracle.com/javase/tutorial/essential/regex/
If you update your question to show what you did, people may be more inclined to help you. Stack Overflow has a great community of programmers who are willing to help you, as long as you show them respect by showing where you failed rather than directly asking for answers to problems like this!
Learning about regex will definitely help you solve this. If you build a method using regex and matches() and have issues with it, post what you did and what the method was intended to do!
Hope it helps you :)
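Along the lines of the regex suggestion, one way to count overlapping occurrences is to put all candidate strings into an alternation inside a zero-width lookahead, so the matcher can report a hit at every start position. This sketch hard-codes the six mutations from the question's example; `MutationRegex` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MutationRegex {
    /** Counts overlapping occurrences of any of the alternatives in text. */
    public static int countOverlapping(String text, Set<String> alternatives) {
        if (alternatives.isEmpty()) return 0; // an empty alternation would match everywhere
        StringBuilder alternation = new StringBuilder();
        for (String alt : alternatives) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(alt)); // treat each candidate literally
        }
        // (?=(...)) matches an empty string at each start position, so overlapping hits are all counted.
        Matcher matcher = Pattern.compile("(?=(" + alternation + "))").matcher(text);
        int count = 0;
        while (matcher.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        Set<String> mutations = new TreeSet<>(Set.of("GAGT", "GGAT", "TGGA", "AGGT", "ATGG", "AGTG"));
        System.out.println(countOverlapping("ATGGAT", mutations)); // 3
    }
}
```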

Searching a set of keywords in a large string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have to search a text file or a large string to check whether the text contains any of a set of keywords (possibly millions). If it does, I have to highlight whichever keywords matched. What approach should I take? Does Lucene provide a solution for this?
You've tagged your question with Elasticsearch. If you're open to using ES, I think Percolation with highlighting may fit what you need. You could register each keyword as a separate query with the percolator and then run each document or string through it. It will return a list of the queries that matched, and you can combine it with highlighting.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
http://blog.qbox.io/elasticsesarch-percolator
You can use Lucene's ShingleFilter.
You will find lots of examples online; here is one: http://www.massapi.com/class/sh/ShingleFilter.html
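The idea behind ShingleFilter is to turn the token stream into word n-grams ("shingles") so that multi-word keywords can be looked up as single units. The sketch below illustrates that idea in plain Java rather than through Lucene's API: it builds shingles of up to `maxShingleSize` tokens, looks each one up in a hash set of lower-cased keywords, and records character offsets for highlighting. The class and record names are hypothetical, the letters-and-digits tokenizer is a simplifying assumption, and the record needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShingleKeywordMatcher {
    /** A matched keyword and its [start, end) character offsets in the text, for highlighting. */
    public record Hit(String keyword, int start, int end) {}

    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+"); // runs of letters/digits

    public static List<Hit> findKeywords(String text, Set<String> keywords, int maxShingleSize) {
        // Tokenize, remembering each token's character offsets.
        List<String> tokens = new ArrayList<>();
        List<int[]> offsets = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase());
            offsets.add(new int[] {m.start(), m.end()});
        }

        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder shingle = new StringBuilder();
            // Grow the shingle one token at a time: tokens i..i+n-1 joined by single spaces.
            for (int n = 1; n <= maxShingleSize && i + n <= tokens.size(); n++) {
                if (n > 1) shingle.append(' ');
                shingle.append(tokens.get(i + n - 1));
                String candidate = shingle.toString();
                if (keywords.contains(candidate)) {
                    hits.add(new Hit(candidate, offsets.get(i)[0], offsets.get(i + n - 1)[1]));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("lucene", "machine learning"); // keywords stored lower-case
        String text = "Does Lucene support machine learning?";
        for (Hit hit : findKeywords(text, keywords, 3)) {
            System.out.println(hit.keyword() + " at [" + hit.start() + ", " + hit.end() + ")");
        }
    }
}
```

`maxShingleSize` should be at least the word count of the longest keyword. Because each lookup is a hash-set probe, the work grows with the length of the text times `maxShingleSize` rather than with the number of keywords.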

Simulating a given boolean expression in the form of circuits using Java [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am trying to write a Java program that takes an expression as an input string (of the form R1*R2 + R2*R3) and then displays the corresponding circuit.
For example, the output should be something like this:
Add a resistor R1.
To its end, Add a resistor R2.
Start a new connection.
Add a resistor R2.
To its end, Add a resistor R3.
I would like to represent it as a diagram, but for simplicity I'd first like to produce text output like the above. Please suggest an algorithm!
As I see it, this involves solving two fairly non-trivial tasks:
You need to parse the expression. I bet you'd like to support parenthesized expressions as well, such as (R1+R2)*R3. This calls for a proper parser; I'd recommend looking into parser generators such as JFlex/JavaCup or ANTLR.
If you want to display the circuit to the user in a sensible way, you need to lay out the components nicely in two dimensions. This is basically a graph problem, which existing libraries can solve. Still, as I said, it may not be trivial to implement.
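For a grammar this small, a hand-written recursive-descent parser is a common alternative to the parser generators mentioned above. The sketch assumes, based on the example output, that '*' means series and '+' means parallel, and that resistors are named R followed by digits; `CircuitParser` is a hypothetical name. It only builds a nested description; turning that into step-by-step instructions or a 2D layout is a separate step.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical recursive-descent parser for resistor expressions, assuming
 * '*' means series and '+' means parallel:
 *   expr   := term ('+' term)*
 *   term   := factor ('*' factor)*
 *   factor := 'R' digits | '(' expr ')'
 */
public class CircuitParser {
    private final String src;
    private int pos;

    private CircuitParser(String src) { this.src = src.replaceAll("\\s+", ""); }

    public static String parse(String expression) {
        CircuitParser p = new CircuitParser(expression);
        String result = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("Unexpected input at " + p.pos);
        return result;
    }

    private String expr() { // parallel branches separated by '+'
        List<String> branches = new ArrayList<>();
        branches.add(term());
        while (peek() == '+') { pos++; branches.add(term()); }
        return branches.size() == 1 ? branches.get(0) : "parallel(" + String.join(", ", branches) + ")";
    }

    private String term() { // series parts separated by '*'
        List<String> parts = new ArrayList<>();
        parts.add(factor());
        while (peek() == '*') { pos++; parts.add(factor()); }
        return parts.size() == 1 ? parts.get(0) : "series(" + String.join(", ", parts) + ")";
    }

    private String factor() { // a single resistor or a parenthesized sub-expression
        if (peek() == '(') {
            pos++;
            String inner = expr();
            if (peek() != ')') throw new IllegalArgumentException("Expected ')' at " + pos);
            pos++;
            return inner;
        }
        if (peek() != 'R') throw new IllegalArgumentException("Expected resistor at " + pos);
        int start = pos++;
        while (Character.isDigit(peek())) pos++;
        if (pos == start + 1) throw new IllegalArgumentException("Expected resistor number at " + pos);
        return src.substring(start, pos);
    }

    /** Next character, or the NUL character at end of input. */
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    public static void main(String[] args) {
        System.out.println(parse("R1*R2 + R2*R3")); // parallel(series(R1, R2), series(R2, R3))
        System.out.println(parse("(R1+R2)*R3"));    // series(parallel(R1, R2), R3)
    }
}
```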
