Parsing String by pattern of substrings - java

I need to parse a formula and get all the variables that were used. The list of variables is available. For example, the formula looks like this:
String f = "(Min(trees, round(Apples1+Pears1,1)==1&&universe==big)*number";
I know that possible variables are:
String[] vars = {"trees","rivers","Apples1","Pears1","Apricots2","universe","galaxy","big","number"};
I need to get the following array:
String[] varsInF = {"trees", "Apples1","Pears1", "universe", "big","number"};
I believe the split method would work here, but I can't figure out the regexp required for this.

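No need for any regex pattern - just check which of the supported vars is contained in the given string. Note that contains matches substrings, so a variable such as big would also be reported for a formula that only contains bigger: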
List<String> varsInf = new ArrayList<>();
for(String var : vars)
if(f.contains(var))
varsInf.add(var);
Using Stream<> you can:
String[] varsInf = Arrays.stream(vars).filter(f::contains).toArray(String[]::new);

Assuming a "variable" is a run of one or more alphanumeric characters, you can split on non-alphanumeric characters, i.e. [^\w]+ (note that \w also matches the underscore), and then collect the matching tokens by iteration or with a filter:
Set<String> varSet = new HashSet<>(Arrays.asList(vars));
List<String> result = new ArrayList<>();
for (String s : f.split("[^\\w]+")) {
if (varSet.contains(s)) {
result.add(s);
}
}
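As a variation on the tokenizing idea above, here is a minimal sketch that scans for \w+ tokens with a Matcher instead of splitting, keeps only known variables, and drops duplicates while preserving the order of first appearance. The class and method names (FormulaVariables, usedVariables) are made up for illustration, and Set.of assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaVariables {
    // A token is a run of word characters (letters, digits, underscore).
    private static final Pattern TOKEN = Pattern.compile("\\w+");

    // Hypothetical helper: returns the known variables found in the formula,
    // in order of first appearance and without duplicates.
    static List<String> usedVariables(String formula, Set<String> known) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = TOKEN.matcher(formula);
        while (m.find()) {
            String token = m.group();
            if (known.contains(token)) {
                found.add(token);
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String f = "(Min(trees, round(Apples1+Pears1,1)==1&&universe==big)*number";
        Set<String> vars = Set.of("trees", "rivers", "Apples1", "Pears1", "Apricots2",
                "universe", "galaxy", "big", "number");
        // prints [trees, Apples1, Pears1, universe, big, number]
        System.out.println(usedVariables(f, vars));
    }
}
```

Because whole tokens are compared, a formula containing only bigger would not report big.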

Related

Java-Stream & Optional - Find a value that matches to a stream-element or provide a Default value

I have a Dictionary object which consists of several entries:
record Dictionary(String key, String value, String other) {};
I would like to replace the words in a given String that are present as a "key" in one of the dictionaries with the corresponding value. I can achieve it like this, but I guess there must be a better way to do this.
An example:
> Input: One <sup>a</sup> Two <sup>b</sup> Three <sup>D</sup> Four
> Output: One [a-value] Two [b-value] Three [D] Four
The code to be improved:
public class ReplaceStringWithDictionaryEntries {
public static void main(String[] args) {
List<Dictionary> dictionary = List.of(new Dictionary("a", "a-value", "a-other"),
new Dictionary("b", "b-value", "b-other"));
String theText = "One <sup>a</sup> Two <sup>b</sup> Three <sup>D</sup> Four";
Matcher matcher = Pattern.compile("<sup>([A-Za-z]+)</sup>").matcher(theText);
StringBuilder sb = new StringBuilder();
int matchLast = 0;
while (matcher.find()) {
sb.append(theText, matchLast, matcher.start());
Optional<Dictionary> dict = dictionary.stream().filter(f -> f.key().equals(matcher.group(1))).findFirst();
if (dict.isPresent()) {
sb.append("[").append(dict.get().value()).append("]");
} else {
sb.append("[").append(matcher.group(1)).append("]");
}
matchLast = matcher.end();
}
if (matchLast != 0) {
sb.append(theText.substring(matchLast));
}
System.out.println("Result: " + sb.toString());
}
}
Output:
Result: One [a-value] Two [b-value] Three [D] Four
Do you have a more elegant way to do this?
Since Java 9, Matcher#replaceAll can accept a callback function to return the replacement for each matched value.
String result = Pattern.compile("<sup>([A-Za-z]+)</sup>").matcher(theText)
.replaceAll(mr -> "[" + dictionary.stream().filter(f -> f.key().equals(mr.group(1)))
.findFirst().map(Dictionary::value)
.orElse(mr.group(1)) + "]");
Create a map from your list, using key as the key and value as the value. Use the Matcher#appendReplacement method to replace matches, looking each one up with Map.getOrDefault and using the group(1) value as the default. Use String#join to wrap the replacement in square brackets:
public static void main(String[] args) {
List<Dictionary> dictionary = List.of(
new Dictionary("a", "a-value", "a-other"),
new Dictionary("b", "b-value", "b-other"));
Map<String,String> myMap = dictionary.stream()
.collect(Collectors.toMap(Dictionary::key, Dictionary::value));
String theText = "One <sup>a</sup> Two <sup>b</sup> Three <sup>D</sup> Four";
Matcher matcher = Pattern.compile("<sup>([A-Za-z]+)</sup>").matcher(theText);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
matcher.appendReplacement(sb,
String.join("", "[", myMap.getOrDefault(matcher.group(1), matcher.group(1)), "]"));
}
matcher.appendTail(sb);
System.out.println(sb.toString());
}
record Dictionary( String key, String value, String other) {};
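The two answers above can also be combined: a Map lookup inside the Java 9+ Matcher#replaceAll callback. The sketch below is one possible way to do that, not the original authors' code; the class name SupReplacer and the method replace are hypothetical. It wraps the replacement in Matcher.quoteReplacement, because replaceAll and appendReplacement treat $ and \ in the replacement text as special, which matters if a dictionary value could contain them.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SupReplacer {
    private static final Pattern SUP = Pattern.compile("<sup>([A-Za-z]+)</sup>");

    // Hypothetical helper: replaces each <sup>key</sup> with [value],
    // falling back to [key] when the key is not in the dictionary.
    static String replace(String text, Map<String, String> dict) {
        return SUP.matcher(text).replaceAll(mr -> {
            String key = mr.group(1);
            // quoteReplacement keeps '$' and '\' in values literal.
            return Matcher.quoteReplacement("[" + dict.getOrDefault(key, key) + "]");
        });
    }

    public static void main(String[] args) {
        Map<String, String> dict = Map.of("a", "a-value", "b", "b-value");
        // prints One [a-value] Two [b-value] Three [D] Four
        System.out.println(replace(
                "One <sup>a</sup> Two <sup>b</sup> Three <sup>D</sup> Four", dict));
    }
}
```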
Map vs List
As @Chaosfire has pointed out in the comments, a Map is a more suitable collection for this task than a List, because it eliminates the need to iterate over the collection to access a particular element:
Map<String, Dictionary> dictByKey = Map.of(
"a", new Dictionary("a", "a-value", "a-other"),
"b", new Dictionary("b", "b-value", "b-other")
);
I would also recommend wrapping the Map in a class in order to provide convenient access to the string values of the dictionary. Otherwise we are forced to check that the dictionary returned from the map is not null, and only then make a call to obtain the required value, which is inconvenient. A utility class can provide the target value in a single method call.
To avoid complicating the answer, I will not implement such a utility class. For simplicity I'll go with a Map<String,String>, which serves the same purpose here: it provides the value in a single call.
public static final Map<String, String> dictByKey = Map.of(
"a", "a-value",
"b", "b-value"
);
Pattern.splitAsStream()
We can replace the while-loop with a stream created via splitAsStream().
In order to separate the string values enclosed in tags (<sup>text</sup>) from the rest, we can make use of the special constructs called lookbehind (?<=</sup>) and lookahead (?=<sup>).
(?<=foo) - matches a position that is immediately preceded by foo (i.e. right after foo);
(?=foo) - matches a position that is immediately followed by foo (i.e. right before foo).
For more information, have a look at this tutorial
The pattern "(?=<sup>)|(?<=</sup>)" would match a position in the given string right before the opening tag and immediately after the closing tag. So when we apply this pattern splitting the string with splitAsStream(), it would produce a stream containing elements like "<sup>a</sup>" enclosed with tags, and plain string like "One", "Two", "Three".
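To make the effect of the zero-width split concrete, here is a small sketch that only collects the pieces produced by splitAsStream(). The class name SplitDemo and the method pieces are made up for illustration. Note that the plain-text pieces keep their surrounding spaces, since the split positions consume no characters.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitDemo {
    // Split right before an opening tag or right after a closing tag.
    static final Pattern TAG_BOUNDARY = Pattern.compile("(?=<sup>)|(?<=</sup>)");

    // Hypothetical helper: returns the pieces the pattern splits the text into.
    static List<String> pieces(String text) {
        return TAG_BOUNDARY.splitAsStream(text).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The three pieces are "One ", "<sup>a</sup>" and " Two".
        pieces("One <sup>a</sup> Two")
                .forEach(p -> System.out.println("\"" + p + "\""));
    }
}
```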
Note that in order to reuse the pattern without recompiling, it can be declared on a class level:
public static final Pattern pattern = Pattern.compile("(?=<sup>)|(?<=</sup>)");
The final solution results in a lean and simple stream:
public static void foo(String text) {
String result = pattern.splitAsStream(text)
.map(str -> getValue(str)) // or MyClass::getValue
.collect(Collectors.joining());
System.out.println(result);
}
Instead of tackling conditional logic inside a lambda, it's often better to extract it into a separate method (sure, you can use a ternary operator and place this logic right inside the map operation in the stream if you wish instead of having this method, but it'll be a bit messy):
public static String getValue(String str) {
if (str.matches("<sup>\\p{Alpha}+</sup>")) {
String key = str.replaceAll("<sup>|</sup>", "");
return "[" + dictByKey.getOrDefault(key, key) + "]";
}
return str;
}
main()
public static void main(String[] args) {
foo("One <sup>a</sup> Two <sup>b</sup> Three <sup>D</sup> Four");
}
Output:
One [a-value] Two [b-value] Three [D] Four

generate and output all possible combinations, from three List<String> only once per unique item

I'm trying to generate all possible combinations of sentences. As variables I have two Strings, one String will be the subject, e.g. health and one will be an object, such as fruit, but then I will have a List<String> of values associated with one "head" word, so in keeping with the two components just mentioned, they would be associated with the list [improve, change, alter, modify]. I want to generate all possible combinations of these sentences and add each one to the List<Sentences> with something like:
Sentence example_sentence = new Sentence(verb, object, subject);
sentences.add(example_sentence);
Right now the larger function in which this is taking place looks like this:
public Sentence dataPreprocessing(String raw_subject, String raw_object, String raw_verb, List<Sentence> sentences) throws IOException {
WordNet wordnet = new WordNet();
String verb = wordnet.getStem(raw_verb);
String object = wordnet.getStem(raw_object);
String subject = wordnet.getStem(raw_subject);
List<String> verb_hypernym_container = new ArrayList<>();
verb_hypernym_container = wordnet.getHypernyms(verb, POS.VERB);
//wordnet.getHypernyms(this.object, POS.NOUN);
//wordnet.getHypernyms(this.subject, POS.NOUN);
Sentence return_sentence = new Sentence( verb, object, subject );
return return_sentence;
}
How can I most effectively achieve this goal of generating all possible sentences?
Since you have a fixed number of lists, the simplest way is to just use nested loops:
List<Sentence> sentences = new ArrayList<>();
for(String verb_hypernym : wordnet.getHypernyms(verb, POS.VERB))
for(String object_hypernym : wordnet.getHypernyms(object, POS.NOUN))
for(String subject_hypernym : wordnet.getHypernyms(subject, POS.NOUN))
sentences.add(new Sentence(verb_hypernym, object_hypernym, subject_hypernym));
return sentences;
Or, to avoid calling getHypernyms more often than necessary:
List<String> verb_hypernyms = wordnet.getHypernyms(verb, POS.VERB);
List<String> object_hypernyms = wordnet.getHypernyms(object, POS.NOUN);
List<String> subject_hypernyms = wordnet.getHypernyms(subject, POS.NOUN);
for(String verb_hypernym : verb_hypernyms)
for(String object_hypernym : object_hypernyms)
for(String subject_hypernym : subject_hypernyms)
sentences.add(new Sentence(verb_hypernym, object_hypernym, subject_hypernym));
return sentences;
Once you have a list of nouns and verbs you could use streams to return a list of sentences. This also gives you a chance to remove any duplicates, sort or anything else you need to do to the stream.
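// Lambda parameters are named after the list they iterate over, and the
// constructor arguments follow the question's order: (verb, object, subject).
// distinct() only removes duplicates if Sentence implements equals() and hashCode().
List<Sentence> sentences = subjectList.stream()
    .flatMap(subject -> verbList.stream()
        .flatMap(verb -> objectList.stream()
            .map(object -> new Sentence(verb, object, subject))))
    .distinct()
    .collect(Collectors.toList());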
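For reference, here is a self-contained sketch of the stream approach that runs without WordNet. It assumes a Sentence type with the (verb, object, subject) order from the question and models it as a record (Java 16+), so equals() and hashCode() come for free and distinct() works. The class name SentenceCombinations and the sample word lists are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class SentenceCombinations {
    // Stand-in for the question's Sentence class; a record supplies equals/hashCode.
    record Sentence(String verb, String object, String subject) {}

    // Builds every (verb, object, subject) combination exactly once.
    static List<Sentence> combine(List<String> verbs, List<String> objects,
                                  List<String> subjects) {
        return verbs.stream()
                .flatMap(v -> objects.stream()
                        .flatMap(o -> subjects.stream()
                                .map(s -> new Sentence(v, o, s))))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Sentence> sentences = combine(
                List.of("improve", "change", "alter", "modify"),
                List.of("fruit"),
                List.of("health"));
        sentences.forEach(System.out::println);
    }
}
```

With four verbs, one object and one subject this yields four sentences; duplicate entries in any input list are collapsed by distinct().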

Java string join to string

Assume I have a List of Strings and I want to loop over this List and add each element to a variable of type String.
List<String> words;
//assume words have 5 elements
String summary;//variable where i want to keep all elements
for (String word : words){
//here i want to add new word to the variable summary
}
As far as I know, Java always creates a new String object. Even if I try to change the value, a new object will be created anyway, am I right?
So here is the question: how do I join all elements of the List into one variable?
On any version of Java:
Apache Commons has a class StringUtils that has a join method:
String result = StringUtils.join(words, ",");
On Java 8, you can do this natively. See this article.
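As a sketch of what "natively" can look like on Java 8 or later, the standard library offers String.join and Collectors.joining. The class and method names below (JoinWords, joinWithComma, joinDecorated) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoinWords {
    // String.join places the delimiter between the elements.
    static String joinWithComma(List<String> words) {
        return String.join(",", words);
    }

    // Collectors.joining can additionally add a prefix and a suffix.
    static String joinDecorated(List<String> words) {
        return words.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "two", "three");
        System.out.println(joinWithComma(words));  // prints one,two,three
        System.out.println(joinDecorated(words));  // prints [one, two, three]
    }
}
```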
Use a StringBuilder to join all the words.
List<String> words;
//assume words have 5 elements
StringBuilder summary = new StringBuilder();
for (String word : words){
summary.append(word);
}
Finally, get the joined String with summary.toString().
Note: if you know roughly how many characters will be appended to the StringBuilder, it is more efficient to use the constructor that takes an initial capacity:
summary = new StringBuilder(size);
I think the easiest solution would be to use a StringBuilder:
String summary;
List<String> words = ...;
StringBuilder builder = new StringBuilder();
for (String word : words) {
builder.append(word);
}
summary = builder.toString();
The simplest way would be to use an existing functionality, for example Apache Common's StringUtils.join(). If that's not possible, this will work:
StringBuilder sb = new StringBuilder();
for (String word : words) {
sb.append(word);
}
String summary = sb.toString();
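You can try the following code:
List<String> words = ...;
// Start from an empty string; starting from null would prepend the text "null".
String summary = "";
for (String word : words)
{
    // Note: this leaves a trailing " , " after the last word.
    summary = summary + word + " , ";
}
System.out.println("List items : " + summary);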
An alternative to Apache commons (StringUtils) is Guava's Joiner.
For example:
List<String> words = new ArrayList<>();
words.add("word");
words.add("anotherWord");
String joinedWords = Joiner.on(",").join(words);
This may also be useful if you're not able to use Java 8.
See wiki:
https://github.com/google/guava/wiki/StringsExplained
As Eran suggests, use a StringBuilder, or just use simple concatenation:
List<String> words;
//assume words have 5 elements
String summary = "";//variable where i want to keep all elements
for (String word : words){
summary = summary + word;
}
Have you looked at
StringUtils.join()
as a possible solution?
String summary = words.stream().collect( Collectors.joining() );
You can simply add.
summary += word;

Place all text in quotes into ArrayList

I'm looking for an easy way to take a string and have all values in quotes placed into an ArrayList
Eg
The "car" was "faster" than the "other"
I would like to have an ArrayList that contains
car, faster, other
I think I might need to use RegEx for this but I'm wondering if there is another simpler way.
Using a regex, it is actually quite easy. Note: this solution supposes that there cannot be nested quotes:
private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");
// ...
public List<String> getQuotedWords(final String input)
{
// Note: Java 7 type inference used; in Java 6, use new ArrayList<String>()
final List<String> ret = new ArrayList<>();
final Matcher m = QUOTED.matcher(input);
while (m.find())
ret.add(m.group(1));
return ret;
}
The regex is:
" # find a quote, followed by
([^"]+) # one or more characters not being a quote, captured, followed by
" # a quote
Of course, since this is in a Java string quotes need to be quoted... Hence the Java string for this regex: "\"([^\"]+)\"".
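On Java 9 or later, the same regex can be used without an explicit loop via Matcher#results(), which streams the matches. This is a sketch of that variant rather than part of the original answer; the class name QuotedWords and the method extract are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedWords {
    // A quote, one or more non-quote characters (captured), then a quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    // Hypothetical helper: returns the text of every quoted section, in order.
    static List<String> extract(String input) {
        return QUOTED.matcher(input)
                .results()                  // Stream<MatchResult>, Java 9+
                .map(mr -> mr.group(1))     // keep only the captured text
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [car, faster, other]
        System.out.println(extract("The \"car\" was \"faster\" than the \"other\""));
    }
}
```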
Use this script to parse the input:
public static void main(String[] args) {
String input = "The \"car\" was \"faster\" than the \"other\"";
List<String> output = new ArrayList<String>();
Pattern pattern = Pattern.compile("\"\\w+\"");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
output.add(matcher.group().replaceAll("\"",""));
}
}
Output list contains:
[car,faster,other]
You can use the substringsBetween method of Apache Commons StringUtils:
String[] arr = StringUtils.substringsBetween(input, "\"", "\"");
List<String> list = new ArrayList<String>(Arrays.asList(arr));

error in java programming:

paramList = new ArrayList<String>();
paramList.add(line.split(","));
When I used this, it gives me the error:
cannot find symbol
symbol: method add(java.lang.String[])
location: interface java.util.List<java.lang.String>
With this, I would get output like: "abc" "bvc" "ebf". Can I then use trim to get rid of the quotes, so that the output becomes: abc bvc ebf?
Input: "AAH196","17:13:00","02:49:00",287,166.03,"Austin","TX","Virginia Beach","VA"
Output: AAH196 17:13:00 02:49:00 287 166.03 Austin TX Virginia Beach VA
I need to remove the quotes around the words and the commas between them. I want to store this output so that JDBC can then insert the data into the tables of my MySQL database.
paramList.add() wants a String but line.split(",") returns String[]
The two are not equivalent.
Maybe you want something like:
paramList = Arrays.asList(line.split(",")); // java.util.Arrays; returns a fixed-size list
or
paramList = new ArrayList<String>();
for(String s : line.split(",")){
paramList.add(s);
}
As for the added question, there are lots of ways to skin a cat.
If the words are ALWAYS surrounded by quotes then you can do something like:
paramList = new ArrayList<String>();
for (String s : line.split(",")) {
    // Drop the first and the last character of each field.
    paramList.add(s.substring(1, s.length() - 1));
}
This removes the first and last character of each String, which in that case are always the quotes.
If you need something more flexible (for instance, this solution would ruin strings that aren't surrounded by quotes), then you should read up on regex and Java's Pattern class to see what suits your needs.
As for the solution that you provided, trim() will only remove surrounding whitespace.
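For the sample input in the question, where some fields are quoted and some (287, 166.03) are not, a minimal sketch could strip the quotes only when both are present. It assumes that no field contains a comma inside its quotes; a real CSV parser would be needed otherwise. The class name CsvLine and the method parse are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLine {
    // Hypothetical helper: splits on commas and removes one pair of
    // surrounding double quotes from a field if it has them.
    // Assumption: quoted fields never contain commas.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(",")) {
            String s = raw.trim();
            if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
                s = s.substring(1, s.length() - 1);
            }
            fields.add(s);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"AAH196\",\"17:13:00\",\"02:49:00\",287,166.03,"
                + "\"Austin\",\"TX\",\"Virginia Beach\",\"VA\"";
        // prints AAH196 17:13:00 02:49:00 287 166.03 Austin TX Virginia Beach VA
        System.out.println(String.join(" ", parse(line)));
    }
}
```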
import java.util.ArrayList;
class TestGenericArray {
public static void main(String[] args) {
String[] stringArray = {"Apple", "Banana", "Orange"};
ArrayList<String[]> arrayList = new ArrayList<String[]>();
arrayList.add(stringArray);
}
}
