Is there a way to put lines on a vector? - java

I have a problem: I'm making a simple program in Java that reads regular expressions like these from a .txt file:
Name:{[A-z][0-9]}
Phone:{[0-9]}
but I'm looking for a way to split off the name of each regex and put it in a vector:
(0) [Name:]
(1) [Phone:]
and another vector that contains the regular expressions, like this:
(0) [[A-z][0-9]]
(1) [[0-9]]
The main idea is to split each line of the .txt file into the regex name and the regex, storing them in separate vectors, but the problem is that the file can contain any number (n) of expressions.
Is there an example of how to do this?
Thank you.

Rather than storing two parallel vectors, I'd recommend using a Map data structure for this purpose. If you need to preserve the order of elements from the file for some reason, you can use a LinkedHashMap. For example:
Map<String, String> regexesByName = new LinkedHashMap<>();
for (String line : linesFromFile) {
int index = line.indexOf(':');
if (index < 0) {
throw new IllegalArgumentException("Cannot parse: " + line);
}
String name = line.substring(0, index);
String regex = line.substring(index + 1);
regexesByName.put(name, regex);
}
Then to enumerate all the name/regex pairs, you can do:
for (Map.Entry<String, String> entry : regexesByName.entrySet()) {
// entry.getKey() is the name, entry.getValue() is the regex
}
And to look up the regex for a given name, just do:
String regex = regexesByName.get(name);
If what you want is to later convert the regexes into compiled Pattern objects, that's easily doable as well:
Map<String, Pattern> compiledRegexesByName = new LinkedHashMap<>();
for (Map.Entry<String, String> entry : regexesByName.entrySet()) {
compiledRegexesByName.put(
entry.getKey(), Pattern.compile(entry.getValue()));
}
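One caveat with the question's input: each regex is wrapped in braces ({[A-z][0-9]}), and Java's regex syntax treats { as the start of a repetition such as {2,3}, so passing the raw value to Pattern.compile throws a PatternSyntaxException. Below is a minimal, self-contained sketch of the Map approach that strips the braces first, assuming every regex is wrapped in exactly one pair of braces. The class name RegexFile, its method names and the file name "regexes.txt" are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RegexFile {
    // Parses lines like "Name:{[A-z][0-9]}" into name -> regex, keeping file order.
    // Assumption: each regex is wrapped in a single pair of braces, which is stripped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> regexesByName = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;  // skip blank lines
            int index = line.indexOf(':');
            if (index < 0) {
                throw new IllegalArgumentException("Cannot parse: " + line);
            }
            String name = line.substring(0, index).trim();
            String regex = line.substring(index + 1).trim();
            if (regex.startsWith("{") && regex.endsWith("}")) {
                regex = regex.substring(1, regex.length() - 1);
            }
            regexesByName.put(name, regex);
        }
        return regexesByName;
    }

    // Compiles every regex, keeping the same order as the file.
    static Map<String, Pattern> compileAll(Map<String, String> regexesByName) {
        Map<String, Pattern> compiled = new LinkedHashMap<>();
        regexesByName.forEach((name, regex) -> compiled.put(name, Pattern.compile(regex)));
        return compiled;
    }

    public static void main(String[] args) {
        // In a real program the lines could come from
        // java.nio.file.Files.readAllLines(java.nio.file.Paths.get("regexes.txt"))
        // ("regexes.txt" is a hypothetical file name).
        List<String> lines = List.of("Name:{[A-z][0-9]}", "Phone:{[0-9]}");
        Map<String, Pattern> patterns = compileAll(parse(lines));
        patterns.forEach((name, p) -> System.out.println(name + " -> " + p.pattern()));
    }
}
```

If you really need the two parallel vectors from the question, new ArrayList<>(map.keySet()) and new ArrayList<>(map.values()) give them in file order.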

Related

How can I use Java Stream to find the average of all values that share a key?

I'm having a lot of trouble trying to average the values of a map in Java. My method takes in a text file and computes the average length of the words starting with each letter (case-insensitively, across all words in the text file).
For example, let's say I have a text file that contains the following:
"Apple arrow are very common Because bees behave Cant you come home"
My method currently returns:
{A=5, a=8, B=7, b=10, c=10, C=5, v=4, h=4, y=3}
It groups the words by their first letter, but it adds up their lengths (Integer::sum) instead of averaging them, and it is still case sensitive.
It should return:
{a=4.3, b=5.5, c=5.0, v=4.0, h=4.0, y=3}
This is what I have so far.
public static Map<String, Integer> findAverageLength(String filename) {
Map<String, Integer> wordcount = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
try
{
Scanner in = new Scanner(new File(filename));
List<String> wordList = new ArrayList<>();
while (in.hasNext())
{
wordList.add(in.next());
}
wordcount = wordList.stream().collect(Collectors.toConcurrentMap(w->w.substring(0,1), w -> w.length(), Integer::sum));
System.out.println(wordcount);
}
catch (IOException e)
{
System.out.println("File: " + filename + " not found");
}
return wordcount;
}
You are almost there.
You could try the following.
We group by the first character of the word, converted to lowercase. This lets us collect into a Map<Character, …>, where the key is the first letter of each word. A typical map entry would then look like
a = [ Apple, arrow, are ]
Then, the average of each group of word lengths is calculated, using the averagingDouble method. A typical map entry would then look like
a = 4.33333333
Here is the code:
// groupingBy and averagingDouble are static imports from
// java.util.stream.Collectors
Map<Character, Double> map = Arrays.stream(str.split(" "))
.collect(groupingBy(word -> Character.toLowerCase(word.charAt(0)),
averagingDouble(String::length)));
Note that, for brevity, I left out additional things like null checks, empty strings and Locales.
Also note that this code was heavily improved responding to the comments of Olivier Grégoire and Holger below.
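For reference, here is the snippet above as a complete program. It is a sketch that assumes words are separated by runs of whitespace (split("\\s+") instead of split(" ")); the three-argument groupingBy with TreeMap::new is an optional addition that keeps the letters sorted. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.averagingDouble;
import static java.util.stream.Collectors.groupingBy;

class AverageWordLength {
    // Maps each lowercase initial letter to the average length of the words starting with it.
    static Map<Character, Double> byInitial(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())          // guards against empty input
                .collect(groupingBy(
                        word -> Character.toLowerCase(word.charAt(0)),
                        TreeMap::new,                     // sorted keys: a, b, c, ...
                        averagingDouble(String::length)));
    }

    public static void main(String[] args) {
        String str = "Apple arrow are very common Because bees behave Cant you come home";
        System.out.println(byInitial(str));
    }
}
```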
You can try the following:
String str = "Apple arrow are very common Because bees behave Cant you come home";
Map<String, Double> map = Arrays.stream(str.split(" "))
.collect(Collectors.groupingBy(s -> String.valueOf(Character.toLowerCase(s.charAt(0))),
Collectors.averagingDouble(String::length)));
The split method splits the string into an array of strings using the delimiter " ". You then want to group the words and average their lengths, hence the use of the Collectors.groupingBy method with the downstream collector Collectors.averagingDouble(String::length). Finally, given the constraints you described, the grouping key has to be the lowercase (or uppercase) version of the first char in the String (i.e., Character.toLowerCase(s.charAt(0))).
Then print the map:
map.entrySet().forEach(System.out::println);
If you do not need to keep the map structure you can do it in one go:
Arrays.stream(str.split(" "))
.collect(Collectors.groupingBy(s -> String.valueOf(Character.toLowerCase(s.charAt(0))), Collectors.averagingDouble(String::length)))
.entrySet().forEach(System.out::println);
Just convert the first letter, which you obtain using substring, to a single case. Upper or lower, it doesn't matter.
w.substring(0,1).toLowerCase()
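You've defined a case-insensitive map, but you haven't used it: the collector builds a new map that replaces it. Try Collectors.toMap(w->w.substring(0,1), w -> w.length(), Integer::sum, () -> new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER)), or just Collectors.toMap(w->w.toUpperCase().substring(0,1), w -> w.length(), Integer::sum). Note that, like the original code, both of these add up the word lengths rather than averaging them.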
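Since the original method reads a file and returns String keys, here is a sketch of the whole method rewritten along the lines of the answers above, assuming the file is UTF-8 text with whitespace-separated words. Files.lines replaces the Scanner, and an IOException is rethrown as an UncheckedIOException instead of being printed; both are choices made for this sketch, not part of the question. The class name WordStats is illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordStats {
    // Average word length per first letter, ignoring case, for every word in the file.
    static Map<String, Double> findAverageLength(String filename) {
        try (Stream<String> lines = Files.lines(Paths.get(filename))) {
            return lines
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    .filter(word -> !word.isEmpty())
                    .collect(Collectors.groupingBy(
                            // "A" and "a" end up under the same key
                            word -> word.substring(0, 1).toLowerCase(Locale.ROOT),
                            TreeMap::new,
                            Collectors.averagingInt(String::length)));
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + filename, e);
        }
    }
}
```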

Extract string and number combination from a larger string

I have a number of large strings looking like this:
"text(24), text_2(5), text_4(822)..."
I'm trying to check if a specific text exists and get the corresponding value.
Is there a quick way to do this?
Edit:
I have an array with all possible text values. At the moment I use foreach to check for text values.
I have the string text_2 and what I need is the corresponding 5 as an integer.
You can use a regex to extract all the text elements from the String and store them in a map, e.g.:
String s = "text(24), text_2(5), text_4(822)";
Pattern pattern = Pattern.compile("([a-zA-Z]*(_)?[0-9]*\\([0-9]+\\))");
Matcher matcher = pattern.matcher(s);
Map<String, Integer> valuesMap = new HashMap<>();
while(matcher.find()){
String[] tokens = matcher.group().split("(?=\\([0-9]+\\),?)");
String key = tokens[0];
Integer value = Integer.parseInt(tokens[1].substring(1, tokens[1].length() - 1));
valuesMap.put(key, value);
}
System.out.println(valuesMap);
Once done, you can call valuesMap.get("text_2"); to get the corresponding value. This is how the above example works:
It finds each token of the form <text>(<value>) in the string.
It then splits each token into text and value and puts them into the Map.
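A slightly simpler variant of the regex approach puts the text and the number into two capturing groups, so no second split is needed. This is a sketch assuming the text part consists only of letters, digits and underscores (\w), as in the example string; the class name TextValues is illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextValues {
    // Group 1 is the text (letters, digits, underscore), group 2 the number in parentheses.
    private static final Pattern ENTRY = Pattern.compile("(\\w+)\\((\\d+)\\)");

    static Map<String, Integer> parse(String s) {
        Map<String, Integer> valuesMap = new HashMap<>();
        Matcher matcher = ENTRY.matcher(s);
        while (matcher.find()) {
            valuesMap.put(matcher.group(1), Integer.parseInt(matcher.group(2)));
        }
        return valuesMap;
    }

    public static void main(String[] args) {
        Map<String, Integer> valuesMap = parse("text(24), text_2(5), text_4(822)");
        System.out.println(valuesMap.get("text_2"));
    }
}
```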
Since you need to do this a number of times, I suggest splitting the string once and building a map from each text to its value, which is O(n). After that, each lookup is O(1) on average if you use a HashMap.
String text = "text(24), text_2(5), text_4(822)";
Map<String, Integer> map = new HashMap<>();
String[] split = text.split(", ");
for(String s:split){
//search for the position of "(" and ")"
int start = 0;
int end = s.length()-1;
while(s.charAt(start) != '(')
start++;
while(s.charAt(end) != ')')
end--;
//put string and matching value in the map
map.put(s.substring(0, start), Integer.parseInt(s.substring(start+1, end)));
}
System.out.println(map);
I also ran a benchmark on a string containing 10,000 entries, and this approach was about 4 times faster than a regex approach (38 ms vs. 163 ms).
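The same build-once idea can use indexOf and lastIndexOf in place of the manual while loops, and a small wrapper can answer the original question of whether a given text exists. This is a sketch assuming every entry has the form text(number) and entries are separated by ", "; the class name TextIndex and returning null for a missing text are arbitrary choices for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class TextIndex {
    private final Map<String, Integer> map = new HashMap<>();

    // Builds the map once: O(n) in the length of the input string.
    TextIndex(String text) {
        for (String s : text.split(", ")) {
            int start = s.indexOf('(');
            int end = s.lastIndexOf(')');
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("Malformed entry: " + s);
            }
            map.put(s.substring(0, start), Integer.parseInt(s.substring(start + 1, end)));
        }
    }

    boolean contains(String key) {
        return map.containsKey(key);
    }

    // Average O(1) lookup; null if the text is not present.
    Integer valueOf(String key) {
        return map.get(key);
    }

    public static void main(String[] args) {
        TextIndex index = new TextIndex("text(24), text_2(5), text_4(822)");
        System.out.println(index.contains("text_2") + " " + index.valueOf("text_2"));
    }
}
```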

Regex pattern for String with multiple leading and trailing ones and zeroes

I have search strings in the format below:
Search String
111651311
111651303
4111650024
4360280062
20167400
They need to be matched against the sequences of numbers below:
001111651311000
001111651303000
054111650024000
054360280062000
201674000000000
Note that the search strings have been padded with additional digits on either side.
I have tried the regex below in Java to match the search strings, but it only works for some of them.
Pattern pattern = Pattern.compile("([0-9])\1*"+c4MIDVal+"([0-9])\1*");
Any advice?
Update
I've added the code I used below; it might clarify what I'm trying to do.
Code Snippet
public void compare(String fileNameAdded, String fileNameToBeAdded){
List<String> midListAdded = readMID.readMIDAdded(fileNameAdded);
HashMap<String, String> midPairsToBeAdded = readMID.readMIDToBeAdded(fileNameToBeAdded);
List <String []> midCaptured = new ArrayList<String[]>();
for (Map.Entry<String, String> entry: midPairsToBeAdded.entrySet()){
String c4StoreKey = entry.getKey();
String c4MIDVal = entry.getValue();
Pattern pattern = Pattern.compile("([0-9]?)\\1*"+c4MIDVal+"([0-9]?)\\2*");
for (String mid : midListAdded){
Matcher match = pattern.matcher(mid);
// logger.info("Match Configured MID :: "+ mid+ " with Pattern "+"\\*"+match.toString()+"\\*");
if (match.find()){
midCaptured.add(new String []{ c4StoreKey +"-"+c4MIDVal, mid});
}
}
}
logger.info(midCaptured.size()+ " List of Configured MIDs ");
for (String [] entry: midCaptured){
logger.info(entry[0]+ "- "+entry[1] );
}
}
You need to refer to the second capturing group (\2) in the second part, and you also need to make the patterns inside both capturing groups optional.
Pattern pattern = Pattern.compile("([0-9]?)\\1*"+c4MIDVal+"([0-9]?)\\2*");
What is the problem with using the String.contains() method?
"001111651311000".contains("111651311"); // true
"201674000000000".contains("111651311"); // false

How to get the specific part of a string based on condition?

I have a requirement to get the substring of a string based on a condition.
String str = "ABC::abcdefgh||XYZ::xyz";
If the input is "ABC", check whether str contains it, and if it is present, print abcdefgh.
In the same way, if input is "XYZ", then it should print xyz.
How can I achieve this with string manipulation in Java?
If I've guessed the format of your String correctly, then you could split it into tokens with something like this:
String[] tokens = str.split("\\|\\|"); // split() takes a regex, so each | must be escaped
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
if(key.equals(input))
{
// input being the user's typed in value.
return value;
}
}
But let's have a think for a minute. Why keep this in a String when a HashMap is a much cleaner solution to your problem? Stick the String into a config file, and on load, some code can perform a similar task:
Map<String, String> inputMap = new HashMap<String, String>();
String[] tokens = str.split("\\|\\|");
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
inputMap.put(key, value);
}
Then when the user types something in, it's as easy as:
return inputMap.get(input);
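Putting the map-based approach together as a runnable sketch. It assumes the input is exactly key::value pairs joined by ||; Pattern.quote turns "||" into a literal delimiter, and the split limit of 2 keeps any further "::" inside the value. The class name and the "not found" default are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class KeyValueLookup {
    // Parses "ABC::abcdefgh||XYZ::xyz" into {ABC=abcdefgh, XYZ=xyz}.
    static Map<String, String> parse(String str) {
        Map<String, String> inputMap = new HashMap<>();
        for (String token : str.split(Pattern.quote("||"))) {
            String[] keyValue = token.split("::", 2);   // split only at the first "::"
            if (keyValue.length == 2) {
                inputMap.put(keyValue[0], keyValue[1]);
            }
        }
        return inputMap;
    }

    public static void main(String[] args) {
        Map<String, String> inputMap = parse("ABC::abcdefgh||XYZ::xyz");
        System.out.println(inputMap.getOrDefault("ABC", "not found"));
        System.out.println(inputMap.getOrDefault("XYZ", "not found"));
    }
}
```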
The idea is that you should split your string on both delimiters, "::" and "||", i.e. whichever of them is encountered is treated as a delimiter. The best way to achieve that, I think, is with a regular expression.
String str = "ABC::abcdefgh||XYZ::xyz";
String[] parts = str.split("[::]|[/||]");
Map<String, String> map = new HashMap<String, String>();
for (int i = 0; i < parts.length - 2; i += 4) {
if (!parts[i].equals("")) {
map.put(parts[i], parts[i + 2]);
}
}
Short and concise, and your code is ready. The for loop seems weird; if anyone comes up with a better regex for splitting (one that gets rid of the empty strings), it will become cleaner. I'm not a regex expert, so any suggestions are welcome.
Use the contains method to see if it has the substring: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#contains%28java.lang.CharSequence%29
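One way to avoid the empty strings is alternation instead of character classes: "::|\\|\\|" matches either the two-character "::" or the two-character "||", so the parts alternate strictly between keys and values. A sketch, assuming every key has a value; the class name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class AlternationSplit {
    static Map<String, String> parse(String str) {
        // "::" or a literal "||" (each | is escaped because | means alternation in a regex)
        String[] parts = str.split("::|\\|\\|");
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < parts.length; i += 2) {
            map.put(parts[i], parts[i + 1]);   // even index = key, odd index = value
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("ABC::abcdefgh||XYZ::xyz").get("XYZ"));
    }
}
```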
You could do it as follows:
String[] parts = str.split("\\|\\|"); // | must be escaped because split() takes a regex
if (parts[0].startsWith("ABC")) {
String[] values = parts[0].split("::");
System.out.println(values[1]);
} else {
if (parts[1].startsWith("XYZ")) {
String[] values = parts[1].split("::");
System.out.println(values[1]);
}
}
The above code first checks whether the first section starts with ABC. If so, it prints the value and stops. If not, it checks whether the second section of the string starts with XYZ and prints that value instead. You can change it to suit your needs.

Best way to retrieve a value from a string java

If I am being passed a string that contains comma-delimited key-value pairs like this:
seller=1000,country="canada",address="123 1st st", etc.
It seems like there must be a better way than parsing it and then iterating through the result.
What is the best way to retrieve a value from this string based on the key name in Java?
Since release 10, Google Guava provides a MapSplitter class that does exactly this kind of thing:
Map<String, String> params = Splitter
.on(",")
.withKeyValueSeparator("=")
.split("k1=v1,k2=v2");
You can create your own CSV parser. It's not very complicated, but there are a few corner cases to be careful with, assuming of course you are using the standard CSV format.
But why reinvent the wheel?
You can try looking up a CSV parser like
OpenCSV
SuperCSV
Apache Commons CSV
There are others, look around I'm sure you will find one that suits your needs.
Usually you will want to parse the string into a map because you will be pulling various values perhaps multiple times, so it often makes sense to pay the parsing cost up-front.
If not, then here is how I would solve the problem (assuming you want to differentiate between int values and String values):
public Object pullValue(String pairs, String key) {
boolean returnString = false;
int keyStart = pairs.indexOf(key + "=");
if (keyStart < 0) {
logger.error("Key " + key + " not found in key-value pairs string");
return null;
}
int valueStart = keyStart + key.length() + 1;
if (pairs.charAt(valueStart) == '"') {
returnString = true;
valueStart++; // Skip past the quote mark
}
int valueEnd;
if (returnString) {
valueEnd = pairs.indexOf('"', valueStart);
if (valueEnd < 0) {
logger.error("Unmatched double quote mark extracting value for key " + key);
return null;
}
return pairs.substring(valueStart, valueEnd);
} else {
valueEnd = pairs.indexOf(',', valueStart);
if (valueEnd < 0) { // If this is the last key value pair in string
valueEnd = pairs.length();
}
return Integer.decode(pairs.substring(valueStart, valueEnd));
}
}
Note that this solution assumes no spaces between the key, the equals sign, and the value. If these are possible you will have to create some code to travel the string between them.
Another solution is to use a regular expression parser. You could do something like (this is untested):
// Pattern.quote stops regex metacharacters in the key from being interpreted
Pattern lookingForString = Pattern.compile(Pattern.quote(key) + "[ \t]*=[ \t]*[\"]([^\"]+)[\"]");
Pattern lookingForInt = Pattern.compile(Pattern.quote(key) + "[ \t]*=[ \t]*([^,]+)");
Matcher stringFinder = lookingForString.matcher(pairs);
Matcher intFinder = lookingForInt.matcher(pairs);
if (stringFinder.find()) {
return stringFinder.group(1);
} else if (intFinder.find()) {
return Integer.decode(intFinder.group(1).trim());
} else {
logger.error("Could not extract value for key " + key);
return null;
}
HTH
For separating the string by commas, the other posters are correct: it is best to use a CSV parser (your own or an off-the-shelf one). Things like commas inside quotes can lead to a lot of unanticipated problems.
Once you have each separate token in the form:
key = "value"
I think it is easy enough to look for the first index of '='. Then the part before that will be the key, and the part after that will be the value. Then you can store them in a Map<String, String>.
This is assuming that your keys will be simple enough, and not contain = in them etc. Sometimes it's enough to take the simple route when you can restrict the problem scope.
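A minimal sketch of both steps without a library, assuming values may be wrapped in double quotes, quoted values never contain escaped quotes, and keys contain no =. It splits on commas only when outside quotes, then splits each token at the first = and strips the surrounding quotes. The class name PairParser and the "apt 4" sample value (used to show a comma inside quotes) are made up; for real CSV data with escaping rules, one of the parsers mentioned above is the safer choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PairParser {
    // Splits on commas that are not inside double quotes.
    static List<String> splitOutsideQuotes(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : s.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }

    // Turns each key=value token into a map entry, splitting at the first '='.
    static Map<String, String> parse(String s) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String token : splitOutsideQuotes(s)) {
            int eq = token.indexOf('=');
            if (eq < 0) continue;                 // skip tokens without a key
            String key = token.substring(0, eq).trim();
            String value = token.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            map.put(key, value);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("seller=1000,country=\"canada\",address=\"123 1st st, apt 4\"");
        System.out.println(map.get("address"));
    }
}
```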
If you just want one value out of such a string, you can use String's indexOf() and substring() methods:
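String getValue(String str, String key)
{
int keyIndex = str.indexOf(key + "=");
if(keyIndex == -1) return null;
// Assumes the value is quoted, e.g. country="canada"
int startIndex = str.indexOf("\"", keyIndex);
if(startIndex == -1) return null;
// Search after the opening quote, otherwise indexOf finds the same quote again
int endIndex = str.indexOf("\"", startIndex + 1);
if(endIndex == -1) return null;
String value = str.substring(startIndex + 1, endIndex);
return value;
}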
First, you should use a CSV parsing library to parse the comma-separated values. Correctly parsing CSV data isn't as trivial as it first seems, and there are lots of good arguments for not reinventing that wheel.
This will also future-proof your code, and it is code you don't have to test or maintain.
I know the temptation to do something like data.split(","); is strong, but it is a fragile, brittle solution. To take just one example: what if any of the values contain a ','?
Second, parse the pairs. Again, the temptation to use String.split("=") will be strong, but it is brittle if the right-hand side of the = contains an = itself.
I am not a blind proponent of regular expressions, but used with restraint they can be just the right tool for the job. Here is a regular expression to parse the name-value pairs.
The regular expression ^(.*?)\s*=\s*("?([^"]*)"?|"(.*)")$ works even when there are multiple double quotes on the right-hand side of the name-value pair.
The reluctant (.*?) matches only what is to the left of the first =, and everything else goes to the right-hand side. The optional " characters are stripped from string values (group 3, or group 4 when the value itself contains quotes), while non-quoted number values still match.
Given a List<String> list of the encoded name-value pairs:
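final Pattern p = Pattern.compile("^(.*?)\\s*=\\s*(\"?([^\"]*)\"?|\"(.*)\")$");
final Map<String, String> map = new HashMap<String, String>(list.size());
for (final String nvp : list)
{
final Matcher m = p.matcher(nvp);
if (!m.matches()) continue; // skip entries that are not name=value
final String name = m.group(1);
// group 3 holds simple (optionally quoted) values, group 4 values with embedded quotes
final String value = m.group(3) != null ? m.group(3) : m.group(4);
map.put(name, value);
System.out.format("name = %s | value = %s%n", name, value);
}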
Use yourdata.split(",") and you will get a String[]. Then call entry.split("=", 2) on each entry to separate the property from its value (the limit of 2 keeps any further = inside the value).
Ideally, you should move this data into a Properties instance. You can then easily save it to and load it from XML (storeToXML/loadFromXML), and it has other useful methods.
Note: I am assuming you are savvy enough to understand that this solution won't work if the values contain the separator (i.e., the comma)...
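A sketch of the Properties route, assuming no value contains a comma and stripping the surrounding double quotes from values (a choice made here, not something the answer specifies). storeToXML and loadFromXML are standard java.util.Properties methods; a byte array stands in for a file, and the class name PairsToProperties is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

class PairsToProperties {
    static Properties parse(String data) {
        Properties props = new Properties();
        for (String entry : data.split(",")) {
            String[] kv = entry.split("=", 2);       // property name, value
            if (kv.length < 2) continue;
            String value = kv[1].trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            props.setProperty(kv[0].trim(), value);
        }
        return props;
    }

    // Writes the properties as XML and reads them back.
    static Properties roundTripXml(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.storeToXML(out, "key-value pairs");
            Properties loaded = new Properties();
            loaded.loadFromXML(new ByteArrayInputStream(out.toByteArray()));
            return loaded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = roundTripXml(parse("seller=1000,country=\"canada\",address=\"123 1st st\""));
        System.out.println(props.getProperty("country"));
    }
}
```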
