Looking for similar strings in a string array [duplicate] - java

This question already has answers here:
How to search an array for a part of string?
(6 answers)
Closed 1 year ago.
I have a string array. For example:
["Tartrazine","Orange GGN", "Riboflavin-5-Phosphate"]
And I have a string. For example:
"Riboflvin"
I want to find the most similar string in the array and return it if one exists.
So I need this output:
"Riboflavin-5-Phosphate"
But if the array looks like this:
["Tartrazine","Orange GGN", "Quinoline"]
I want something like this output:
"No similar strings found"
I tried using the FuzzyWuzzy library, but it produces a lot of false positives.

You can use the String#contains method, shortening the search string one character at a time whenever the full string is not found:
// requires: import java.util.Arrays;
String[] arr = {"Tartrazine", "Orange GGN", "Riboflavin-5-Phosphate"};
String element = "Riboflvin";
boolean found = false;
for (int i = 0; i < element.length(); i++) {
    // take a shorter prefix if nothing was found at the previous step
    String part = element.substring(0, element.length() - i);
    // if any string from the array contains this prefix
    if (Arrays.stream(arr).anyMatch(str -> str.contains(part))) {
        System.out.println("Found part: " + part);
        // then print these strings one by one
        Arrays.stream(arr).filter(str -> str.contains(part))
                .forEach(System.out::println);
        found = true;
        break;
    }
}
// if nothing found
if (!found) {
    System.out.println("No similar strings found");
}
Output:
Found part: Ribofl
Riboflavin-5-Phosphate

Well, it depends what you want to do exactly.
There are a couple of things you can do. You can check whether the collection contains an exact match of the string you are looking for by calling list.contains("yourStr") on the list directly. You could also check each value to see whether it contains a certain substring, like so:
for (String s : list) {
    if (s.contains(subStr)) {
        return s;
    }
}
Otherwise, if you really want to check similarity, it becomes a bit more complicated. Then we have to answer the question: "how similar is similar enough?" I think this post has a decent answer to that problem: Similarity String Comparison in Java
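One way to make "similar enough" concrete is an edit-distance cutoff. The following is only a sketch under stated assumptions, not part of either answer above: it computes the smallest Levenshtein distance between the search term and any substring of each candidate (the usual approximate-substring variant of the edit-distance table), and accepts the best candidate only if that distance stays within a cutoff. The class name FuzzySearch, the method names, and the cutoff of 2 are hypothetical choices.

```java
import java.util.Optional;

class FuzzySearch {

    // Smallest edit distance between `query` and ANY substring of `text`.
    // Row 0 of the table stays all zeros, so a match may start anywhere in `text`.
    static int bestSubstringDistance(String query, String text) {
        int m = query.length();
        int n = text.length();
        int[] prev = new int[n + 1]; // zero-initialized by Java
        int[] curr = new int[n + 1];
        for (int i = 1; i <= m; i++) {
            curr[0] = i; // i query characters against an empty piece of text
            for (int j = 1; j <= n; j++) {
                int cost = query.charAt(i - 1) == text.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,
                        Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // The match may also end anywhere, so take the minimum of the last row.
        int best = prev[0];
        for (int j = 1; j <= n; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    // Returns the closest candidate, or empty if none is within maxDistance (assumed cutoff).
    static Optional<String> findMostSimilar(String[] candidates, String query, int maxDistance) {
        String bestCandidate = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = bestSubstringDistance(query, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                bestCandidate = candidate;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(bestCandidate) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] first = {"Tartrazine", "Orange GGN", "Riboflavin-5-Phosphate"};
        String[] second = {"Tartrazine", "Orange GGN", "Quinoline"};
        System.out.println(findMostSimilar(first, "Riboflvin", 2).orElse("No similar strings found"));
        System.out.println(findMostSimilar(second, "Riboflvin", 2).orElse("No similar strings found"));
    }
}
```

With a cutoff of 2, the first array yields Riboflavin-5-Phosphate and the second falls through to the fallback message. The comparison is case-sensitive; lowercasing both sides first would be a natural adjustment, and the right cutoff depends on how long the search terms are.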


Compare one string of first file to all strings of second file, in Java

I am working in Java and have two PDF files.
I extract the title from the first file and the references section from the second file.
I want to check whether the whole title appears in the references section.
My problem is how to hold the whole title in one variable and then search for it in the whole references section.
This is the relevant part of the code:
PDFUtil pdfUtil = new PDFUtil();
String a = pdfUtil.getText("9.pdf");
String Title = a.substring(0,68);
System.out.println("The title part: "+Title);
String b = pdfUtil.getText("333.pdf");
String Refer = b.substring(b.indexOf("Reference")+0,b.length());
if ("Reference".equalsIgnoreCase("Reference")) {
System.out.println("The References part of the second file is: "+Refer);
System.out.println();
}
if (Title.contains(Refer)) {
System.out.println("Found ");
}
the output part:
The title part: Customized Efficient Collection of Big Data for Advertising Services
The References part of the second file is: [1] J. Han, H. Pei, and Y. Yin.” Mining, Frequent Patterns without Candidate Generation” In: Proc. (all the reference part)
I tried a lot of methods, but the result is always false, even when the exact title is in the references section.
Any ideas?
Is there another method besides contains that I could use to search?
Thanks.
Could you please add more details? In general, the main problem could be one of the following:
1. Extra whitespace in one of the two strings, for example "abcd" and "abcd ". These two strings are not equal.
2. The strings are not extracted correctly, so could you please add more details about how you are extracting the data from the PDF?
3. If the strings are not English, you may have a problem with encoding.
Here is some code that might help:
String a = "stack overflow";
String b = "tack";
// solution 1: exact substring test
System.out.println(a.contains(b));
// solution 2: count, for each character of b, how often it occurs in a
int counter = 0;
for (int i = 0; i < b.length(); i++) {
    for (int j = 0; j < a.length(); j++) {
        if (b.substring(i, i + 1).equals(a.substring(j, j + 1))) {
            counter++;
        }
    }
}
if (counter >= b.length()) {
    System.out.println("string found ");
}
counter = 0;
// solution 3 (fuzzy): check that every character of b occurs somewhere in a
int index = 0;
for (int i = 0; i < b.length(); i++) {
    index = a.indexOf(b.substring(i, i + 1));
    if (index != -1) {
        counter++;
    }
}
if (counter < b.length()) {
    System.out.println("string not found ");
} else {
    System.out.println("string found ");
}
Solutions 2 and 3 only compare individual characters, so treat them as loose (fuzzy) checks rather than exact substring tests. The main pitfall with
String.contains(String)
is the relative size of the two strings. In a.contains(b), if a is shorter than b the result is always false, so make sure the longer string is the one you call contains on. You can check the lengths before calling .contains:
if (a.length() >= b.length()) {
    System.out.println(a.contains(b));
}
Here is another solution that compares substrings manually:
String a = "Iam in the world of abc";
String b = "world";
// <= so that a match at the very end of a is also checked
for (int i = 0; i <= a.length() - b.length(); i++) {
    if (a.substring(i, b.length() + i).equals(b)) {
        System.out.println("true s");
        System.out.println(a.substring(i, b.length() + i));
    }
}
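A hedged sketch of the whitespace point raised in this answer: text extracted from a PDF often contains line breaks and repeated spaces, so it can help to normalize both strings before searching. It also matters which side contains is called on: the longer text (the references) has to be the receiver and the shorter one (the title) the argument. The class TitleSearch and its method names are hypothetical, and the sample reference string is made up purely for illustration.

```java
import java.util.Locale;

class TitleSearch {

    // Collapse every run of whitespace (spaces, tabs, line breaks) to one space,
    // trim the ends, and lowercase for a case-insensitive comparison.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // True if `needle` occurs inside `haystack` after normalization.
    static boolean containsNormalized(String haystack, String needle) {
        return normalize(haystack).contains(normalize(needle));
    }

    public static void main(String[] args) {
        // Made-up sample data: trailing space in the title, line break in the references.
        String title = "Customized Efficient Collection of Big Data for Advertising Services ";
        String references = "[7] A. Author. Customized Efficient Collection\nof Big Data  for Advertising Services. 2016.";
        System.out.println(references.contains(title));            // false
        System.out.println(containsNormalized(references, title)); // true
    }
}
```

Normalization of this kind will not help if the extraction itself drops or reorders text, which is why seeing the raw extracted strings is still the first debugging step.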

Trying to reverse a string on Java. Why can't I get the reversed text as my only output? [duplicate]

This question already has answers here:
Reverse a string in Java
(36 answers)
Closed 5 years ago.
I can't get my function to return the reversed string. I keep getting the original string with the reversed string appended to it.
P.S. This is obviously a question from someone new. Cut me some slack and save me from the horribly demoralizing downvote.
int i;
reverse = reverse.replaceAll("[^a-zA-Z]+", "").toLowerCase();
for (i = reverse.length() - 1; i >= 0; i--) {
reverse = reverse + reverse.charAt(i);
}
return reverse;
}
You need another String (or some other storage) to build your return value. Consider "a": starting from the end of the String you append "a" to it, and thus get "aa". Instead, I would use a StringBuilder, and the entire method might be written like
return new StringBuilder(reverse.replaceAll("[^a-zA-Z]+", "")
.toLowerCase()).reverse().toString();
Change the snippet to:
String reverse_string = "";
for (i = reverse.length() - 1; i >= 0; i--) {
    reverse_string += reverse.charAt(i);
}
return reverse_string;
You will need a separate String variable to construct the newly reversed string.
Why not just let the existing Java api do this for you?
String someString = "somestring123";
System.out.println(new StringBuilder(someString).reverse().toString());
Output:
321gnirtsemos

Get the index of a character or a substring by reversely loop in Python

Here is what I want to do, written in Java; however, I need to implement it in Python.
String foo = "Foo's bar";
for (int i = foo.length() - 1; i >= 0; i--) {
    if (foo.charAt(i) == '\'') {
        String bar = foo.substring(0, i);
        break;
    }
}
What I want to do is search for a key character from the last character back to the very first one. Once I find it, I need to take the substring up to that position (imagine getting someone's name from a phrase).
Try rfind(): https://docs.python.org/2/library/stdtypes.html#str.rfind
foo = "Foo's bar"
index_to = foo.rfind("'", 0, len(foo))
print(foo[0:index_to])
Output:
Foo
What you want to achieve is quite easy. Splitting the string gives you a list of elements; pick the first one.
"Foo's bar".split("\'")[0]

String Manipulation Comparison [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
EDITED
I have strings like these:
"A-B-C-D"
"B-C-A"
"D-A"
"B-A-D"
"D-A-B-C"
My problem is this: if the user input is "A-C" or "C-A", the output should be 1, 2, 5, because those strings contain both A and C. If the user input is any of "A-B-D", "B-A-D", or "A-D-B", the output should be 1, 4, 5. I hope that clears up the question.
Note:
The search sequence depends on user input, and I want the search to be efficient because I have ten thousand strings; I want to use as few loops as possible.
This might change depending on whether you need the string to match in the exact order given, but REALLY simply...
public class Simple {
public static void main(String[] args) {
System.out.println("1. " + matches("A-B-C-D"));
System.out.println("2. " + matches("B-C-A"));
System.out.println("3. " + matches("D-A"));
System.out.println("4. " + matches("B-A-D"));
System.out.println("5. " + matches("D-A-B-C"));
}
public static boolean matches(String value) {
return value.contains("A") && value.contains("C");
}
}
Which outputs
1. true
2. true
3. false
4. false
5. true
Extended example using variable matchers
So, the basic idea is to provide some kind of list of values to be matched against. This example simply uses a String varargs (or String array), but it wouldn't be hard to make it use something like List<String>.
public class Simple {
public static void main(String[] args) {
String[] match = new String[]{"A", "D", "C"};
System.out.println("1. " + matches("A-B-C-D", match));
System.out.println("2. " + matches("B-C-A", match));
System.out.println("3. " + matches("D-A", match));
System.out.println("4. " + matches("B-A-D", match));
System.out.println("5. " + matches("D-A-B-C", match));
}
public static boolean matches(String value, String... matches) {
boolean doesMatch = true;
for (String match : matches) {
if (!value.contains(match)) {
doesMatch = false;
break;
}
}
return doesMatch;
}
}
This outputs...
1. true
2. false
3. false
4. false
5. true
Use an array and go through each index and see if it contains "C-A" or "A-C" then if it does print the number.
String stringArray[] = {"A-B-C-D", "B-C-A", "D-A", "B-A-D", "D-A-B-C"};
for(int i = 0; i < stringArray.length; i++) {
String pattern = ".*C-.*A.*";
String pattern2 = ".*A-.*C.*";
if(stringArray[i].matches(pattern) || stringArray[i].matches(pattern2))
System.out.println(i + 1);
}
Edit: This applies to an older version of the OP that was unclear about finding the sequence in order; and so this searches for sequences in order, which isn't correct now.
There are many options. Below I outline one approach that tokenizes the strings first, and another that uses a simple regex generated from the input string.
Approach 1: Parsing Strings
Start by parsing each String into an array of substrings; that will make all of this easier to work with. You may want to parse each string once when you originally read it, instead of every time you need it:
String myString = "A-B-C-D";
String[] sequence = myString.split("-");
Next, consider using a List<String> instead of a String[], because it will make the rest of this a bit easier (you'll see). So, instead of the above:
String myString = "A-B-C-D";
List<String> sequence = Arrays.asList(myString.split("-"));
Now the problem becomes checking if two of these arrays match:
public static boolean containsSequence (List<String> searchIn, List<String> searchFor) {
}
You need to check both directions, but you can simply reverse the array and reduce this problem further to just checking the forward direction (there are certainly ways to do this and avoid the copy but they can get complicated and it's only worth it if you have high performance requirements):
public static boolean containsSequence (List<String> searchIn, List<String> searchFor) {
// first check forward
if (containsSequenceForward(searchIn, searchFor))
return true;
// now check in reverse
List<String> reversedSearchFor = new ArrayList<String>(searchFor);
Collections.reverse(reversedSearchFor);
return containsSequenceForward(searchIn, reversedSearchFor);
}
public static boolean containsSequenceForward (List<String> searchIn, List<String> searchFor) {
}
// usage example:
public static void example () {
List<String> searchIn = Arrays.asList("D-A-B-C".split("-"));
List<String> searchFor = Arrays.asList("A-C".split("-"));
boolean contained = containsSequence(searchIn, searchFor);
}
Now you just need to implement containsSequenceForward. I'd like you to do this yourself, but I will provide an algorithm as a hint:
Start at the beginning of searchIn and searchFor.
Go through searchIn one element at a time.
When you find the current element of searchFor in searchIn, advance searchFor to next element.
If you hit the end of searchFor you've found the sequence.
If you hit the end of searchIn but not searchFor, then the sequence doesn't match.
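The steps above can be sketched as follows. This is one possible reading of the hint, not the answer author's own code; the wrapper class SequenceMatcher is a hypothetical name, and an empty searchFor is assumed to match everything.

```java
import java.util.Arrays;
import java.util.List;

class SequenceMatcher {

    // True if the elements of searchFor appear in searchIn in the same order,
    // possibly with other elements in between.
    static boolean containsSequenceForward(List<String> searchIn, List<String> searchFor) {
        int next = 0; // index of the element of searchFor we are still looking for
        for (String element : searchIn) {
            if (next == searchFor.size()) {
                break; // reached the end of searchFor: sequence found
            }
            if (element.equals(searchFor.get(next))) {
                next++; // found it, advance to the next element of searchFor
            }
        }
        return next == searchFor.size();
    }

    public static void main(String[] args) {
        List<String> searchIn = Arrays.asList("D-A-B-C".split("-"));
        System.out.println(containsSequenceForward(searchIn, Arrays.asList("A", "C"))); // true
        System.out.println(containsSequenceForward(searchIn, Arrays.asList("C", "A"))); // false
    }
}
```

The second call returns false on its own; it is the reversed check in containsSequence that turns "C-A" into a match for "D-A-B-C".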
Now you have the ability to check whether one sequence contains another in either direction. To apply it to your entire collection, I recommend pre-parsing all of the strings into a List<String> once at the start; then you can go through each of them using the above algorithm.
There are many alternative options. For example, you could use indexOf on searchIn to find each element of searchFor and make sure the indices are in increasing order.
Approach 2: Regular Expressions
Another option here is to use a regular expression to find the search sequence in the source string. You can build the regular expression dynamically from the search sequence quite easily:
String searchIn = "D-C-B-A";
String searchFor = "C-A";
String searchForPattern = searchFor.replace("-", ".*"); // yields "C.*A"
if (searchIn.matches(".*" + searchForPattern + ".*"))
/* then it matches forwards */;
Then to match in reverse, if the forward match fails, you can just reverse searchFor and repeat:
String searchForReverse = new StringBuilder(searchFor).reverse().toString();
String searchForReversePattern = searchForReverse.replace("-", ".*"); // yields "A.*C"
if (searchIn.matches(".*" + searchForReversePattern + ".*"))
/* then it matches backwards */;
Note that this particular regex solution assumes that each element is only one character long.
Also both of the above approaches assume case-sensitive matches -- to make the first case-insensitive I would just convert the strings to lowercase before parsing. For the second you can use a case-insensitive regex.
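As a sketch of how the regex approach might be extended past those two limitations: splitting the search sequence into tokens, quoting each token with Pattern.quote, and compiling with Pattern.CASE_INSENSITIVE handles multi-character elements and case-insensitive matching. The class RegexSequence and its method names are hypothetical. Like the snippets above, it does not anchor tokens at the "-" separators, so a token can still match inside a longer element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexSequence {

    // Joins the quoted tokens with ".*" so they must appear in the given order.
    static Pattern buildPattern(List<String> tokens) {
        String body = tokens.stream()
                .map(Pattern::quote) // escape regex metacharacters in each token
                .collect(Collectors.joining(".*"));
        return Pattern.compile(".*" + body + ".*", Pattern.CASE_INSENSITIVE);
    }

    // True if searchFor occurs in searchIn forwards or backwards.
    static boolean matchesEitherDirection(String searchIn, String searchFor) {
        List<String> tokens = new ArrayList<>(Arrays.asList(searchFor.split("-")));
        if (buildPattern(tokens).matcher(searchIn).matches()) {
            return true;
        }
        Collections.reverse(tokens); // reverse the token order, not the characters
        return buildPattern(tokens).matcher(searchIn).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesEitherDirection("D-C-B-A", "c-a"));    // true
        System.out.println(matchesEitherDirection("AB-CD-EF", "EF-AB")); // true
        System.out.println(matchesEitherDirection("D-C-B-A", "C-X"));    // false
    }
}
```

Reversing the token list instead of the raw string is what keeps multi-character elements such as "AB" intact.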
Hope that helps. Work it out on a piece of paper if you have to.
The general take home point here is it helps to reduce these problems to their smallest components first.
Call this function on every string you wish to check. If it returns true, add that string to your result set.
boolean matches(String s, char[] chars) {
for(char c : chars) {
if (s.indexOf(c) == -1) {
return false;
}
}
return true;
}
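Since the question mentions ten thousand strings and repeated searches, one option is to split every stored string into a set of elements once and reuse those sets for each query. This is a sketch under that assumption, not a benchmarked solution; the class ElementIndex and its method names are hypothetical. It ignores element order, which is consistent with the expected outputs given in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ElementIndex {

    private final List<Set<String>> parsed = new ArrayList<>();

    // Split every stored string once, up front.
    ElementIndex(List<String> data) {
        for (String s : data) {
            parsed.add(new HashSet<>(Arrays.asList(s.split("-"))));
        }
    }

    // Returns the 1-based positions of all stored strings that contain
    // every element of the query, in any order.
    List<Integer> search(String query) {
        Set<String> wanted = new HashSet<>(Arrays.asList(query.split("-")));
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < parsed.size(); i++) {
            if (parsed.get(i).containsAll(wanted)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ElementIndex index = new ElementIndex(
                Arrays.asList("A-B-C-D", "B-C-A", "D-A", "B-A-D", "D-A-B-C"));
        System.out.println(index.search("C-A"));   // [1, 2, 5]
        System.out.println(index.search("A-D-B")); // [1, 4, 5]
    }
}
```

Each query still visits every stored entry once, but the splitting work is no longer repeated per search.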

Searching for a String in a String array [duplicate]

This question already has answers here:
Checking if String x equals any of the Strings from String[]
(12 answers)
Closed 9 years ago.
While writing my program I have run into another newbie roadblock.
if(StringTerm[0].equals("wikipedia"))
{
StringBuilder SearchTermBuilder = new StringBuilder();
for(int i = 1; i < StringTerm.length; i++)
{
SearchTermBuilder.append(StringTerm[i] + " ");
}
// This is the string it outputs.
WIKI_ID = "Wikipedia";
SearchTerm = SearchTermBuilder.toString();
SearchTermFull = WikiBaseLinkReference.WIKI_WIK + SearchTermBuilder.toString();
}
This code checks for input from a console command "/wiki" and checks to see if the first string after the word "wiki" matches "wikipedia" and if so, it builds a string to match what I want it to do.
This is all well and good, and the program works fine, but I want users to be able to use different keywords to get the same results.
For Example: If I type /wiki wikipedia, it would do the same as /wiki pediawiki
If I made an array of different names called WIKIPEDIA
public static String WIKIPEDIA[] = {"wikipedia","pediawiki"};
How would I tell the if statement to check whether the text entered equals one of the strings inside my array? Every time I try to use an || (or) operator, it throws errors.
Thanks in advance.
You need a version of "any":
public boolean any(String[] array, String s) {
    for (String value : array) {
        if (s.equals(value)) { return true; }
    }
    return false;
}
Then pass the entered word rather than a literal:
if (any(WIKIPEDIA, StringTerm[0])) {
    // build the search term as before
}
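For comparison, the same check can be written without a hand-rolled loop. This is a sketch that assumes the WIKIPEDIA array from the question and an input word such as StringTerm[0]; the class KeywordCheck and the method isWikipediaKeyword are hypothetical names.

```java
import java.util.Arrays;

class KeywordCheck {

    static final String[] WIKIPEDIA = {"wikipedia", "pediawiki"};

    // True if `term` equals any keyword in WIKIPEDIA, ignoring case.
    static boolean isWikipediaKeyword(String term) {
        return Arrays.stream(WIKIPEDIA).anyMatch(keyword -> keyword.equalsIgnoreCase(term));
    }

    public static void main(String[] args) {
        System.out.println(isWikipediaKeyword("pediawiki"));  // true
        System.out.println(isWikipediaKeyword("Wikipedia"));  // true
        System.out.println(isWikipediaKeyword("wiktionary")); // false
    }
}
```

For an exact, case-sensitive match, Arrays.asList(WIKIPEDIA).contains(term) does the same job in one call.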
