This is a continuation of my earlier post. My code:
public class Main {
static String theFile = "C:\\Users\\Pc\\Desktop\\textfile.txt";
public static boolean validate(String input) {
boolean status = false;
String REGEX = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
status = true;
} else {
status = false;
}
return status;
}
public static void main(String[] args) {
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(theFile));
String line;
int count = 0;
while ((line = br.readLine()) != null) {
String[] arr = line.split("#");
for (int x = 0; x < arr.length; x++) {
if (arr[x].equals(validate(theFile))) {
count++;
}
System.out.println("no of matches " + count);
}
}
} catch (IOException e) {
e.printStackTrace();
}
Main.validate(theFile);
}
}
It shows this result:
no of matches 0
no of matches 0
no of matches 0
no of matches 0
And this is the text in my input file:
sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com
My output should be three emails, because the last string is not in a standard email format.
I know I'm not supposed to pass (arr[x].validate(theFile)))
I have always used this:
public static bool Validate(string email)
{
string expressions = @"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";
return Regex.IsMatch(email, expressions);
}
Note: I also have a function that "cleans" the string in case there are multiple @ symbols.
Edit: Here is how I clean out extra @ symbols. Note that this keeps the FIRST @ it finds and just removes the rest. This function should be used BEFORE you run the validation function on the string.
public static string CleanEmail(string input)
{
string output = "";
try
{
if (input.Length > 0)
{
string first_pass = Regex.Replace(input, @"[^\w\.@-]", "");
List<string> second_pass = new List<string>();
string third_pass = first_pass;
string final_pass = "";
if (first_pass.Contains("@"))
{
second_pass = first_pass.Split('@').ToList();
if (second_pass.Count >= 2)
{
string second_pass_0 = second_pass[0];
string second_pass_1 = second_pass[1];
third_pass = second_pass_0 + "@" + second_pass_1;
second_pass.Remove(second_pass_0);
second_pass.Remove(second_pass_1);
}
}
if (second_pass.Count > 0)
{
final_pass = third_pass + string.Join("", second_pass.ToArray());
}
else
{
final_pass = third_pass;
}
output = final_pass;
}
}
catch (Exception Ex)
{
}
return output;
}
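Since the question itself is about Java, here is a hedged Java port of the C# CleanEmail function above. It is a sketch that assumes the same rules (strip everything except word characters, '.', '@' and '-', then keep only the first '@'); the class name EmailCleaner is made up. One difference to be aware of: Java's \w matches only ASCII word characters by default, while the .NET \w is Unicode-aware.

```java
public class EmailCleaner {
    // Sketch of a Java port of the C# CleanEmail: keeps word characters,
    // '.', '@' and '-', then keeps the first '@' and drops any later ones.
    public static String cleanEmail(String input) {
        if (input == null) {
            return "";
        }
        // Remove every character that is not a word char, '.', '@' or '-'
        String firstPass = input.replaceAll("[^\\w.@-]", "");
        int at = firstPass.indexOf('@');
        if (at < 0) {
            return firstPass;
        }
        // Keep everything up to and including the first '@', strip '@' after it
        return firstPass.substring(0, at + 1)
                + firstPass.substring(at + 1).replace("@", "");
    }

    public static void main(String[] args) {
        System.out.println(cleanEmail("john@@doe@example.com"));
    }
}
```

As with the C# version, run it before validating: "john@@doe@example.com" becomes "john@doeexample.com".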
There are several errors in your code:
if (arr[x].equals(validate(theFile))) checks whether the mail address string equals the boolean value you get from the validate() method. This will never be the case. Also, you pass theFile (the file path) to validate() instead of the address arr[x].
In the validate() method, if you only want to check whether the string matches a regex, you can simply use String.matches(regex), so you only need one line of code (not really an error, but it's more elegant this way).
After splitting your input string (the line), there is whitespace left over, because you only split at the # symbol. You can either trim() each string afterwards to remove it (see the code below) or split() at \\s*#\\s* instead of just #.
Here is an example with all the fixes (I left out the part where you read the file and used a string with your mail addresses instead):
public class Main {
private static final String PATTERN_MAIL
= "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@" + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
public static boolean validate(String input) {
return input.matches(PATTERN_MAIL);
}
public static void main(String[] args) {
String line = "sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com";
String[] arr = line.split("#");
int count = 0;
for (int x = 0; x < arr.length; x++) {
if (validate(arr[x].trim())) {
count++;
}
System.out.println("no of matches " + count);
}
}
}
It prints:
no of matches 1
no of matches 2
no of matches 3
no of matches 4
EDIT: If the pattern is not supposed to match the last mail address, you'll have to change the pattern. Right now it matches all of them.
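One possible reading of the asker's rule, which is an assumption since the question only says the last address "is not a standard email format", is that the first domain label must have at least two characters, so y.com fails. The sketch below applies that assumption, splits at \\s*#\\s* as suggested in the answer, and prints the count once instead of once per candidate. EmailCount and countValid are hypothetical names.

```java
import java.util.regex.Pattern;

public class EmailCount {
    // The answer's pattern with one assumed change: the first domain label
    // must have at least 2 characters, so "fhfbs@y.com" is rejected.
    private static final Pattern STRICT_MAIL = Pattern.compile(
            "^[_A-Za-z0-9+-]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]{2,}(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    public static int countValid(String line) {
        int count = 0;
        // Splitting on the separator plus surrounding whitespace makes trim() unnecessary
        for (String candidate : line.split("\\s*#\\s*")) {
            if (STRICT_MAIL.matcher(candidate).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com";
        // Printed once, after counting, rather than once per candidate
        System.out.println("no of matches " + countValid(line));
    }
}
```

With the sample line this prints "no of matches 3". If a different rule is intended, only the domain part of STRICT_MAIL needs to change.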
Related
I need to search another string character by character for the characters of a certain String and, whenever a character matches, collect that character.
I'm doing it this way:
public String searchForSignature(String texto2) throws NoSuchAlgorithmException {
String myString = "", foundString = "";
myString = "aeiousrtmvb257";
for (int i = 0; i < texto2.length() || i <= 1000; i++) {
char c = texto2.charAt(i);
for (int j = 0; j < myString.length(); j++) {
if (c == myString.charAt(j)) {
foundString = foundString + c;
}
}
}
return foundString;
}
I would like to improve the performance. I saw that there are ways to do this using regular expressions, but as I am still a beginner I could not get it to work. This is what I tried:
public String searchForSignature2(String texto2) {
Pattern pattern = Pattern.compile("aeiousrtmvb257");
Matcher matcher = pattern.matcher(texto2);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
return matcher.group(1).toString();
}
It does not return anything.
Edit:
I guess I was not very clear in the question.
What I actually need is to get all the characters of the String that are among the characters in "aeiousrtmvb257".
I did it this way and it now seems OK; I just do not know whether the performance is satisfactory:
public String searchForSignature2(String texto2) {
String foundString = "";
Pattern pattern = Pattern.compile("[aeiousrtmvb257]");
Matcher matcher = pattern.matcher(texto2);
while (matcher.find()) {
System.out.println(matcher.group());
foundString+=matcher.group();
}
return foundString;
}
As far as I understood your question, using Pattern and Matcher should do the trick:
Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class SignatureSearch {
// Character class: matches any single one of the listed characters
private static final Pattern PATTERN_TO_FIND = Pattern.compile("[aeiousrtmvb257]");
public static void main(String[] args) {
System.out.println(searchForSignature2("111aeiousrtmvb257111"));
}
public static String searchForSignature2(String texto2) {
Matcher matcher = PATTERN_TO_FIND.matcher(texto2);
StringBuilder result = new StringBuilder();
// Append each matching character in the order it appears
while (matcher.find()) {
result.append(matcher.group());
}
return result.toString();
}
}
Output
aeiousrtmvb257
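I don't know the reason behind texto2.length() || i <= 1000 (with ||, the loop keeps running until i exceeds 1000 even for shorter strings, so charAt() throws a StringIndexOutOfBoundsException), but based on the logic in your method I would suggest the following solution: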
public static void main(String... args) {
System.out.println(searchForSignature("hello"));
}
public static String searchForSignature(String texto2) {
String myString = "aeiousrtmvb257";
StringBuilder builder = new StringBuilder();
for (char s : texto2.toCharArray()) {
if (myString.indexOf(s) != -1) {
builder.append(s);
}
}
return builder.toString();
}
Output: eo
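If performance matters for long inputs, one option (a sketch, not benchmarked here) is to precompute a lookup table for the fixed set of characters, so each input char is checked with a single array access instead of a regex match or an indexOf() scan. SignatureFilter and IS_SIGNATURE_CHAR are made-up names:

```java
public class SignatureFilter {
    private static final String SIGNATURE_CHARS = "aeiousrtmvb257";
    // One flag per possible char value (65536 entries), filled once at class load
    private static final boolean[] IS_SIGNATURE_CHAR = new boolean[Character.MAX_VALUE + 1];

    static {
        for (char c : SIGNATURE_CHARS.toCharArray()) {
            IS_SIGNATURE_CHAR[c] = true;
        }
    }

    // Keeps, in order, every char of texto2 that appears in SIGNATURE_CHARS
    public static String searchForSignature(String texto2) {
        StringBuilder builder = new StringBuilder(texto2.length());
        for (int i = 0; i < texto2.length(); i++) {
            char c = texto2.charAt(i);
            if (IS_SIGNATURE_CHAR[c]) {
                builder.append(c);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(searchForSignature("hello"));
    }
}
```

It returns the same result as the indexOf() version ("eo" for "hello"); whether it is measurably faster depends on input size, so it is worth measuring before switching.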
I don't quite get why you would print the string that you found. If you just want to know whether the whole sequence occurs in the input, this works:
public static String searchForSignature2(String texto2) {
String maaString = "aeiousrtmvb257";
String toSearch = ".*" + maaString +".*";
boolean b = Pattern.matches(toSearch, texto2);
return b ? maaString : "";
}
public static void main(String[] args)
{
String input = "4erdhrAW BLBAJJINJOI WETSEKMsef saemfosnens3bntu is5o3n029j29i30kwq23eki4"+
"maoifmakakmkakmsmfajiwfuanyi gaeniygaenigaenigeanige anigeanjeagjnageunega"+
"movmmklmklzvxmkxzcvmoifsadoi asfugufngs"+
"wpawfmaopfwamopfwampfwampofwampfawmfwamokfesomk"+
"3rwq3rqrq3rqetgwtgwaeiousrtmvb2576266wdgdgdgdgd";
String myString = searchForSignature2(input);
System.out.println(myString);
}
You need to add .* on both sides because Pattern.matches() must match the entire input; .* allows any characters before and after the sequence.
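To illustrate the difference between matches() and find(), here is a small sketch; SignatureCheck and its method names are hypothetical. Pattern.quote() makes the signature be treated literally, (?s) lets . also match line breaks, and for a plain literal String.contains() needs no regex at all:

```java
import java.util.regex.Pattern;

public class SignatureCheck {
    static final String SIGNATURE = "aeiousrtmvb257";

    // matches() must consume the whole input, hence ".*" on both sides
    public static boolean viaMatches(String text) {
        return Pattern.matches("(?s).*" + Pattern.quote(SIGNATURE) + ".*", text);
    }

    // find() searches anywhere in the input, so no ".*" is needed
    public static boolean viaFind(String text) {
        return Pattern.compile(Pattern.quote(SIGNATURE)).matcher(text).find();
    }

    // For a fixed literal, String.contains avoids regex entirely
    public static boolean viaContains(String text) {
        return text.contains(SIGNATURE);
    }

    public static void main(String[] args) {
        String input = "3rwq3rqrq3rqetgwtgwaeiousrtmvb2576266wdgdgdgdgd";
        System.out.println(viaMatches(input) + " " + viaFind(input) + " " + viaContains(input));
    }
}
```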
I am writing a spell checker that takes a text file as input and outputs the file with spelling corrected.
The program should preserve formatting and punctuation.
I want to split the input text into a list of string tokens such that each token is a run of one or more word, punctuation, whitespace, or digit characters.
For example:
Input:
words.txt:
asdf don't ]'.'..;'' as12....asdf.
asdf
Input as list:
["asdf" , " " , "don't" , " " , "]'.'..;''" , " " , "as" , "12" ,
"...." , "asdf" , "." , "\n" , "asdf"]
Words like won't and i'll should be treated as a single token.
Having the data in this format would allow me to process the tokens like so:
String output = "";
for(String token : tokens) {
if(isWord(token)) {
if(!inDictionary(token)) {
token = correctSpelling(token);
}
}
output += token;
}
So my main question is: how can I split a string of text into a list of substrings as described above? Thank you.
The main difficulty here is finding the regex that matches what you consider to be a "word". In my example I consider ' to be part of a word if it is preceded by a letter or if the following character is a letter:
public static void main(String[] args) {
String in = "asdf don't ]'.'..;'' as12....asdf.\nasdf";
//The pattern:
Pattern p = Pattern.compile("[\\p{Alpha}][\\p{Alpha}']*|'[\\p{Alpha}]+");
Matcher m = p.matcher(in);
//If you want to collect the words
List<String> words = new ArrayList<String>();
StringBuilder result = new StringBuilder();
//Now find something from the start
int pos = 0;
while(m.find(pos)) {
//Add everything from starting position to beginning of word
result.append(in.substring(pos, m.start()));
//Handle dictionary logic
String token = m.group();
words.add(token); //not used actually
if(!inDictionary(token)) {
token = correctSpelling(token);
}
//Add to result
result.append(token);
//Repeat from end position
pos = m.end();
}
//Append remainder of input
result.append(in.substring(pos));
System.out.println("Result: " + result.toString());
}
Because I like solving puzzles, I tried the following and I think it works fine:
public class MyTokenizer {
private final String str;
private int pos = 0;
public MyTokenizer(String str) {
this.str = str;
}
public boolean hasNext() {
return pos < str.length();
}
public String next() {
int type = getType(str.charAt(pos));
StringBuilder sb = new StringBuilder();
while(hasNext() && (str.charAt(pos) == '\'' || type == getType(str.charAt(pos)))) {
sb.append(str.charAt(pos));
pos++;
}
return sb.toString();
}
private int getType(char c) {
String sc = Character.toString(c);
if (sc.matches("\\d")) {
return 0;
}
else if (sc.matches("\\w")) {
return 1;
}
else if (sc.matches("\\s")) {
return 2;
}
else if (sc.matches("\\p{Punct}")) {
return 3;
}
else {
return 4;
}
}
public static void main(String... args) {
MyTokenizer mt = new MyTokenizer("asdf don't ]'.'..;'' as12....asdf.\nasdf");
while(mt.hasNext()) {
System.out.println(mt.next());
}
}
}
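Another option, sketched here under my own assumptions about what counts as a word (runs of letters joined by single apostrophes, using the Unicode letter class \p{L} rather than \p{Alpha}), is a single pattern with one alternative per token class. Because the four classes cover every character, repeated find() calls return back-to-back tokens, and joining them gives back the original text. RegexTokenizer is a made-up name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizer {
    // One alternative per token class, tried left to right:
    // letters joined by single apostrophes (don't, i'll), digits,
    // whitespace, and a run of anything else (punctuation, symbols, stray ').
    private static final Pattern TOKEN = Pattern.compile(
            "\\p{L}+(?:'\\p{L}+)*|\\d+|\\s+|[^\\p{L}\\d\\s]+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        // The classes cover every character, so matches are contiguous
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("asdf don't ]'.'..;'' as12....asdf.\nasdf")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

For the example input this yields the same 13 tokens as the list in the question.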
I want to check whether a letter is an emoji. I've found some similar questions on SO and found this regex:
private final String emo_regex = "([\\u20a0-\\u32ff\\ud83c\\udc00-\\ud83d\\udeff\\udbb9\\udce5-\\udbb9\\udcee])";
However, when I do the following with a sentence like the one below:
for (int k=0; k<letters.length;k++) {
if (letters[k].matches(emo_regex)) {
emoticon.add(letters[k]);
}
}
It doesn't add any letters with any emoji. I've also tried with a Matcher and a Pattern, but that didn't work either. Is there something wrong with the regex or am I missing something obvious in my code?
This is how I get the letter:
sentence = "Jij staat op 10 π"
String[] letters = sentence.split("");
The final emoji should be recognized and added to emoticon.
You could use the emoji4j library. The following should solve the issue:
String htmlifiedText = EmojiUtils.htmlify(text);
// regex to identify html entitities in htmlified text
Matcher matcher = htmlEntityPattern.matcher(htmlifiedText);
while (matcher.find()) {
String emojiCode = matcher.group();
if (isEmoji(emojiCode)) {
emojis.add(EmojiUtils.getEmoji(emojiCode).getEmoji());
}
}
I created this function to check whether a given String consists only of emojis.
In other words, if the String contains any character not covered by the regex, it returns false.
private static boolean isEmoji(String message){
return message.matches("(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|" +
"[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|" +
"[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|" +
"[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|" +
"[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|" +
"[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|" +
"[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|" +
"[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|" +
"[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|" +
"[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|" +
"[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)+");
}
Example usage:
public static int detectEmojis(String message){
int len = message.length(), NumEmoji = 0;
// if the given String is only emojis.
if(isEmoji(message)){
for (int i = 0; i < len; i++) {
// if the charAt(i) is an emoji by itself -> ++NumEmoji
if (isEmoji(message.charAt(i)+"")) {
NumEmoji++;
} else {
// maybe the emoji is of size 2 - so let's check.
if (i < (len - 1)) { // some Emojis are two characters long in java, e.g. a rocket emoji is "\uD83D\uDE80";
if (Character.isSurrogatePair(message.charAt(i), message.charAt(i + 1))) {
i += 1; //also skip the second character of the emoji
NumEmoji++;
}
}
}
}
return NumEmoji;
}
return 0;
}
The function above takes a string (consisting only of emojis) and returns the number of emojis in it. I wrote it with the help of other answers I found here on Stack Overflow.
Those emojis are two chars long in Java (a UTF-16 surrogate pair), but split("") splits between every single char, so none of the resulting letters can be the emoji you are looking for.
Instead, you could try splitting between words:
for (String word : sentence.split(" ")) {
if (word.matches(emo_regex)) {
System.out.println(word);
}
}
But of course this will miss emojis that are attached to a word or to punctuation.
Alternatively, you could just use a Matcher to find every part of the sentence that matches the regex:
Matcher matcher = Pattern.compile(emo_regex).matcher(sentence);
while (matcher.find()) {
System.out.println(matcher.group());
}
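A sketch that keeps the question's regex but iterates over code points instead of chars, so that a surrogate pair is tested as one unit. It assumes Java 8+ for String.codePoints(); EmojiLetters and emojisIn are made-up names:

```java
import java.util.ArrayList;
import java.util.List;

public class EmojiLetters {
    // The regex from the question, unchanged
    private static final String EMO_REGEX =
            "([\\u20a0-\\u32ff\\ud83c\\udc00-\\ud83d\\udeff\\udbb9\\udce5-\\udbb9\\udcee])";

    public static List<String> emojisIn(String sentence) {
        List<String> emoticon = new ArrayList<>();
        // Walk code points rather than chars, so a surrogate pair stays together
        sentence.codePoints().forEach(codePoint -> {
            String letter = new String(Character.toChars(codePoint));
            if (letter.matches(EMO_REGEX)) {
                emoticon.add(letter);
            }
        });
        return emoticon;
    }

    public static void main(String[] args) {
        System.out.println(emojisIn("Jij staat op 10 \uD83D\uDE01"));
    }
}
```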
You can use the Character class to determine whether a letter is part of a surrogate pair. It has some helpful methods for dealing with emoji symbols stored as surrogate pairs, for example:
String text = "π©";
if (text.length() > 1 && Character.isSurrogatePair(text.charAt(0), text.charAt(1))) {
int codePoint = Character.toCodePoint(text.charAt(0), text.charAt(1));
char[] c = Character.toChars(codePoint);
}
Try the simple-emoji-4j project.
Compatible with Emoji 12.0 (2018.10.15)
Usage is as simple as:
EmojiUtils.containsEmoji(str)
It's worth bearing in mind that Java code can be written in Unicode. So you can just do:
@Test
public void containsEmoji_detects_smileys() {
assertTrue(containsEmoji("This π is a smiley "));
assertTrue(containsEmoji("This π is a different smiley"));
assertFalse(containsEmoji("No smiley here"));
}
private boolean containsEmoji(String s) {
String pattern = ".*[ππ].*";
return s.matches(pattern);
}
See, however, the question "Should source code be saved in UTF-8 format?" for a discussion of whether that's a good idea.
You can split a String into Unicode codepoints in Java 8 using String.codePoints(), which returns an IntStream. That means you can do something like:
Set<Integer> emojis = new HashSet<>();
emojis.add("π".codePointAt(0));
emojis.add("π".codePointAt(0));
String s = "1π34π5";
s.codePoints().forEach( codepoint -> {
System.out.println(
new String(Character.toChars(codepoint))
+ " "
+ emojis.contains(codepoint));
});
... prints ...
1 false
π true
3 false
4 false
π true
5 false
Of course, if you prefer not to have literal Unicode characters in your code, you can just put the code point numbers in your set:
emojis.add(0x1F601);
Here you go:
for (String word : sentence.split("")) {
if (word.matches(emo_regex)) {
System.out.println(word);
}
}
This is how Telegram does it:
private static boolean isEmoji(String message){
return message.matches("(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|" +
"[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|" +
"[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|" +
"[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|" +
"[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|" +
"[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|" +
"[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|" +
"[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|" +
"[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|" +
"[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|" +
"[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)+");
}
It is at line 21,026 of ChatActivity.
Unicode has an entire document on this (Unicode Technical Standard #51, "Unicode Emoji"). Emojis and emoji sequences are a lot more complicated than just a few character ranges. There are emoji modifiers (for example, skin tones), regional indicator pairs (country flags), and some special sequences like the pirate flag.
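To see why per-character ranges fall short, here is a small illustration (the two sample sequences are my own choice): a flag is a pair of regional indicator code points, and a skin-tone emoji is a base emoji followed by a modifier, so each is displayed as one emoji but is two code points and four Java chars.

```java
public class EmojiLengths {
    // Regional indicators D (U+1F1E9) and E (U+1F1EA): rendered as one flag
    public static final String FLAG_DE = "\uD83C\uDDE9\uD83C\uDDEA";
    // Waving hand (U+1F44B) followed by the medium skin tone modifier (U+1F3FD)
    public static final String WAVE_MEDIUM = "\uD83D\uDC4B\uD83C\uDFFD";

    public static void main(String[] args) {
        for (String s : new String[] {FLAG_DE, WAVE_MEDIUM}) {
            // length() counts UTF-16 chars; codePointCount counts code points
            System.out.println(s + ": " + s.length() + " chars, "
                    + s.codePointCount(0, s.length()) + " code points");
        }
    }
}
```

Both lines report 4 chars and 2 code points, so a check that looks at a single code point at a time sees two separate symbols.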
You can use Unicode's emoji data files to reliably find emoji characters and emoji sequences. This will work even as new complex emojis are added:
import java.net.URL;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class EmojiCollector {
private static String emojiSequencesBaseURI;
private final Pattern emojiPattern;
public EmojiCollector()
throws IOException {
List<String> alternatives = new ArrayList<>();
appendSequencesFrom(
uriOfEmojiSequencesFile("emoji-sequences.txt"),
alternatives);
appendSequencesFrom(
uriOfEmojiSequencesFile("emoji-zwj-sequences.txt"),
alternatives);
// Regex alternation is leftmost-first, so try the longest sequences first;
// otherwise a ZWJ or skin-tone sequence would be cut off after its first
// code point. List.sort is stable, so file order is kept within a length.
alternatives.sort((a, b) -> Integer.compare(codePointCount(b), codePointCount(a)));
emojiPattern = Pattern.compile(String.join("|", alternatives));
}
// Number of code points an alternative matches: a bracketed range is one,
// otherwise one per \x{...} escape.
private static int codePointCount(String alternative) {
if (alternative.startsWith("[")) {
return 1;
}
return alternative.split("\\\\x\\{", -1).length - 1;
}
private void appendSequencesFrom(String sequencesFileURI,
List<String> alternatives)
throws IOException {
Path sequencesFile = download(sequencesFileURI);
Pattern range =
Pattern.compile("^(\\p{XDigit}{4,6})\\.\\.(\\p{XDigit}{4,6})");
Matcher rangeMatcher = range.matcher("");
try (BufferedReader sequencesReader =
Files.newBufferedReader(sequencesFile)) {
String line;
while ((line = sequencesReader.readLine()) != null) {
if (line.trim().isEmpty() || line.startsWith("#")) {
continue;
}
int semicolon = line.indexOf(';');
if (semicolon < 0) {
continue;
}
String codepoints = line.substring(0, semicolon);
StringBuilder alternative = new StringBuilder();
if (rangeMatcher.reset(codepoints).find()) {
String start = rangeMatcher.group(1);
String end = rangeMatcher.group(2);
alternative.append("[\\x{").append(start).append("}");
alternative.append("-\\x{").append(end).append("}]");
} else {
Scanner scanner = new Scanner(codepoints);
while (scanner.hasNext()) {
String codepoint = scanner.next();
alternative.append("\\x{").append(codepoint).append("}");
}
}
alternatives.add(alternative.toString());
}
}
}
private static String uriOfEmojiSequencesFile(String baseName)
throws IOException {
if (emojiSequencesBaseURI == null) {
URL readme = new URL(
"https://www.unicode.org/Public/UCD/latest/ReadMe.txt");
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(readme.openStream(), "UTF-8"))) {
String line;
while ((line = reader.readLine()) != null) {
if (line.startsWith("Public/emoji/")) {
emojiSequencesBaseURI =
"https://www.unicode.org/" + line.trim();
if (!emojiSequencesBaseURI.endsWith("/")) {
emojiSequencesBaseURI += "/";
}
break;
}
}
}
if (emojiSequencesBaseURI == null) {
// Where else can we get this reliably?
String version = "15.0";
emojiSequencesBaseURI =
"https://www.unicode.org/Public/emoji/" + version + "/";
}
}
return emojiSequencesBaseURI + baseName;
}
private static Path download(String uri)
throws IOException {
Path cacheDir;
String os = System.getProperty("os.name");
String home = System.getProperty("user.home");
if (os.contains("Windows")) {
Path appDataDir;
String appData = System.getenv("APPDATA");
if (appData != null) {
appDataDir = Paths.get(appData);
} else {
appDataDir = Paths.get(home, "AppData");
}
cacheDir = appDataDir.resolve("Local");
} else if (os.contains("Mac")) {
cacheDir = Paths.get(home, "Library", "Application Support");
} else {
cacheDir = Paths.get(home, ".cache");
String cacheHome = System.getenv("XDG_CACHE_HOME");
if (cacheHome != null) {
Path dir = Paths.get(cacheHome);
if (dir.isAbsolute()) {
cacheDir = dir;
}
}
}
String baseName = uri.substring(uri.lastIndexOf('/') + 1);
Path dataDir = cacheDir.resolve(EmojiCollector.class.getName());
Path dataFile = dataDir.resolve(baseName);
if (!Files.isReadable(dataFile)) {
Files.createDirectories(dataDir);
URL dataURL = new URL(uri);
try (InputStream data = dataURL.openStream()) {
Files.copy(data, dataFile);
}
}
return dataFile;
}
public Collection<String> getEmojisIn(String letters) {
Collection<String> emoticons = new ArrayList<>();
Matcher emojiMatcher = emojiPattern.matcher(letters);
while (emojiMatcher.find()) {
emoticons.add(emojiMatcher.group());
}
return emoticons;
}
public static void main(String[] args)
throws IOException {
EmojiCollector collector = new EmojiCollector();
for (String arg : args) {
Collection<String> emojis = collector.getEmojisIn(arg);
System.out.println(arg + " => " + String.join("", emojis));
}
}
}
Here's some Java logic based on the java.lang.Character API that, in my experience, fairly reliably tells an emoji apart from mere 'special characters' and non-Latin alphabets. Give it a try.
import static java.lang.Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS;
import static java.lang.Character.UnicodeBlock.MISCELLANEOUS_TECHNICAL;
import static java.lang.Character.UnicodeBlock.VARIATION_SELECTORS;
import static java.lang.Character.codePointAt;
import static java.lang.Character.codePointBefore;
import static java.lang.Character.isSupplementaryCodePoint;
import static java.lang.Character.isValidCodePoint;
public boolean checkStringEmoji(String someString) {
if(!someString.isEmpty() && someString.length() < 5) {
int firstCodePoint = codePointAt(someString, 0);
int lastCodePoint = codePointBefore(someString, someString.length());
if (isValidCodePoint(firstCodePoint) && isValidCodePoint(lastCodePoint)) {
if (isSupplementaryCodePoint(firstCodePoint) ||
isSupplementaryCodePoint(lastCodePoint) ||
Character.UnicodeBlock.of(firstCodePoint) == MISCELLANEOUS_SYMBOLS ||
Character.UnicodeBlock.of(firstCodePoint) == MISCELLANEOUS_TECHNICAL ||
Character.UnicodeBlock.of(lastCodePoint) == VARIATION_SELECTORS
) {
// string is emoji
return true;
}
}
}
return false;
}
I'm stuck on a problem where I read a file line by line and store each line as a string in an ArrayList. I then loop through that list to find the int that comes after "rings=". The pattern I use is "(?<=rings=)[0-9]{1}". The following code has a print statement to show me this int, but it is never reached, which suggests the method does not find the int. Here are example lines it gets the int from:
//Event=ThermostatDay,time=12000
//Event=Bell,time=9000,rings=5
//Event=WaterOn,time=6000
The code is:
for (int i = 0; i < fileToArray.size(); i++) {
try {
String friskForX = fileToArray.get(i).toString();
Matcher xTimeSeeker = rinngerPat.matcher(friskForX);
if (xTimeSeeker.group() != null) {
System.out.println("will ring more then once ");
xTimesRing = xTimeSeeker.group();
int xTimeSeekerInt = Integer.parseInt(xTimesRing);
System.out.println(xTimeSeekerInt);
}
}
//this catches it but does nothing since some files might not have x value.
catch (IllegalStateException e) { }
}
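Change your pattern to .*rings=(\\d+).* and, more importantly, call find() (or matches()) before group(): calling group() on a Matcher that has not matched yet throws an IllegalStateException, which your empty catch block silently swallows.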
String pattern=".*rings=(\\d+).*";
Pattern rinngerPat = Pattern.compile(pattern);
Then group(1) will contain your number:
for(String line : fileToArray) {
Matcher xTimeSeeker = rinngerPat.matcher(line);
if(xTimeSeeker.find()) {
xTimesRing = xTimeSeeker.group(1);
int xTimeSeekerInt = Integer.parseInt(xTimesRing);
System.out.println(xTimeSeekerInt);
}
}
Self-contained example:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RingsExample {
public static void main(String[] args) {
List<String> fileToArray = new ArrayList<>();
fileToArray.add("Event=ThermostatDay,time=12000");
fileToArray.add("Event=Bell,time=9000,rings=5");
fileToArray.add("Event=WaterOn,time=6000");
String pattern=".*rings=(\\d+).*";
Pattern rinngerPat = Pattern.compile(pattern);
String xTimesRing;
for(String line : fileToArray) {
Matcher xTimeSeeker = rinngerPat.matcher(line);
// find() must succeed before group(1) may be called
if(xTimeSeeker.find()) {
xTimesRing = xTimeSeeker.group(1);
int xTimeSeekerInt = Integer.parseInt(xTimesRing);
System.out.println(xTimeSeekerInt);
}
}
}
}
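For what it's worth, the original lookbehind approach also works once find() is called before group(). Here is a sketch that uses \d+ instead of [0-9]{1} so that multi-digit counts are captured too; RingsLookbehind and ringCounts are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RingsLookbehind {
    // The question's lookbehind, with \d+ instead of [0-9]{1} for multi-digit counts
    private static final Pattern RINGS = Pattern.compile("(?<=rings=)\\d+");

    public static List<Integer> ringCounts(List<String> lines) {
        List<Integer> counts = new ArrayList<>();
        for (String line : lines) {
            Matcher m = RINGS.matcher(line);
            // group() is only valid after a successful find() or matches()
            if (m.find()) {
                counts.add(Integer.parseInt(m.group()));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("Event=ThermostatDay,time=12000");
        lines.add("Event=Bell,time=9000,rings=5");
        lines.add("Event=WaterOn,time=6000");
        System.out.println(ringCounts(lines));
    }
}
```

Lines without rings= are simply skipped, so no try/catch is needed.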
I have a couple of similar strings. I want to extract the numbers from them, add the numbers, and convert the result back to the same string format.
And the logic should be generic, i.e., it should work for any given strings.
Example:
String s1 = "1/9"; String s2 = "12/4"; The total of the above two Strings should be "13/13" (String again)
I know how to extract numbers from any given String. I referred: How to extract numbers from a string and get an array of ints?
But I don't know how to put them up back again to the same String format.
Can anyone please help me with this?
Note: the string format can be anything, I have just taken an example for explanation.
Take a look at this:
public class StringTest {
public static void main(String[] args) {
String divider = "/";
String s1 = "1/9";
String s2 = "12/4";
String[] fragments1 = s1.split(divider);
String[] fragments2 = s2.split(divider);
int first = Integer.parseInt(fragments1[0]);
first += Integer.parseInt(fragments2[0]);
int second = Integer.parseInt(fragments1[1]);
second += Integer.parseInt(fragments2[1]);
String output = first + divider + second;
System.out.println(output);
}
}
The code prints:
13/13
Using a regex (and Markus' code)
public class StringTest {
public static void main(String[] args) {
String s1 = "1/9";
String s2 = "12&4";
String[] fragments1 = s1.split("[^\\d]");
String[] fragments2 = s2.split("[^\\d]");
int first = Integer.parseInt(fragments1[0]);
first += Integer.parseInt(fragments2[0]);
int second = Integer.parseInt(fragments1[1]);
second += Integer.parseInt(fragments2[1]);
String divider = "/"; // output separator; the split regex accepts any non-digit
String output = first + divider + second;
System.out.println(output);
}
}
From here you should be able to join the numbers back together from an array. If you're getting super fancy, you'll need to use regular expression capture groups and store the captured delimiters somewhere.
First, split your strings into matches and non-matches:
public static class Token {
public final String text;
public final boolean isMatch;
public Token(String text, boolean isMatch) {
this.text = text;
this.isMatch = isMatch;
}
@Override
public String toString() {
return text + ":" + isMatch;
}
}
public static List<Token> tokenize(String src, Pattern pattern) {
List<Token> tokens = new ArrayList<>();
Matcher matcher = pattern.matcher(src);
int last = 0;
while (matcher.find()) {
if (matcher.start() != last) {
tokens.add(new Token(src.substring(last, matcher.start()), false));
}
tokens.add(new Token(src.substring(matcher.start(), matcher.end()), true));
last = matcher.end();
}
if (last < src.length()) {
tokens.add(new Token(src.substring(last), false));
}
return tokens;
}
Once this is done, you can create lists you can iterate over and process.
For example, this code:
Pattern digits = Pattern.compile("\\d+");
System.out.println(tokenize("1/2", digits));
...outputs:
[1:true, /:false, 2:true]
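A compact variant of processing matches and non-matches in order is Matcher.appendReplacement, which copies the text between matches for you. The sketch below assumes both strings have the same layout and the same number of numbers; SumSameFormat and add are made-up names. StringBuffer is used because the StringBuilder overloads of appendReplacement only exist from Java 9.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SumSameFormat {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Assumes s2 contains at least as many numbers as s1 (otherwise others.get(i)
    // throws); the text between the numbers is taken from s1.
    public static String add(String s1, String s2) {
        List<Integer> others = new ArrayList<>();
        Matcher m2 = NUMBER.matcher(s2);
        while (m2.find()) {
            others.add(Integer.parseInt(m2.group()));
        }
        Matcher m1 = NUMBER.matcher(s1);
        StringBuffer out = new StringBuffer();
        int i = 0;
        while (m1.find()) {
            int sum = Integer.parseInt(m1.group()) + others.get(i++);
            // Copies the text before this number, then the sum in its place
            m1.appendReplacement(out, Integer.toString(sum));
        }
        // Copies whatever follows the last number
        m1.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(add("1/9", "12/4"));
    }
}
```

For "1/9" and "12/4" this prints 13/13, and it also handles longer layouts such as "abc 11:4 xyz 10:9".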
Quick and dirty, and it doesn't rely on knowing which separator is used. You have to make sure that m1.group(2) and m2.group(2), which hold the separator, are equal.
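import java.util.regex.Matcher;
import java.util.regex.Pattern;
import static java.lang.Integer.parseInt;
public class SumParts {
public static void main(String[] args) {
String s1 = "1/9";
String s2 = "12/4";
// number, separator (any run of non-digits), number
Pattern p = Pattern.compile("(\\d+)(\\D+)(\\d+)");
Matcher m1 = p.matcher(s1);
Matcher m2 = p.matcher(s2);
if (!m1.matches() || !m2.matches()) {
throw new IllegalArgumentException("Unexpected format");
}
int sum1 = parseInt(m1.group(1)) + parseInt(m2.group(1));
int sum2 = parseInt(m1.group(3)) + parseInt(m2.group(3));
System.out.printf("%s%s%s%n", sum1, m1.group(2), sum2);
}
}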
Consider a function like this:
public String format(int first, int second, String separator){
return first + separator + second;
}
then:
System.out.println(format(6, 13, "/")); // prints "6/13"
Thanks @remus. Reading your logic, I was able to build the following code. It solves the problem for any given strings that have the same format.
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
ArrayList<Integer> numberList1 = new ArrayList<Integer>();
ArrayList<Integer> numberList2 = new ArrayList<Integer>();
ArrayList<Integer> outputList = new ArrayList<Integer>();
String str1 = "abc 11:4 xyz 10:9";
String str2 = "abc 9:2 xyz 100:11";
String output = "";
// Extracting numbers from the two similar string
Pattern p1 = Pattern.compile("-?\\d+");
Matcher m = p1.matcher(str1);
while (m.find()) {
numberList1.add(Integer.valueOf(m.group()));
}
m = p1.matcher(str2);
while (m.find()) {
numberList2.add(Integer.valueOf(m.group()));
}
// Numbers extracted. Printing them
System.out.println("List1: " + numberList1);
System.out.println("List2: " + numberList2);
// Adding the respective indexed numbers from both the lists
for (int i = 0; i < numberList1.size(); i++) {
outputList.add(numberList1.get(i) + numberList2.get(i));
}
// Printing the summed list
System.out.println("Output List: " + outputList);
// Splitting string to segregate numbers from text and getting the format
String[] template = str1.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
// building the string back using the summed list and format
int counter = 0;
for (String tmp : template) {
if (Test.isInteger(tmp)) {
output += outputList.get(counter);
counter++;
} else {
output += tmp;
}
}
// Printing the output
System.out.println(output);
}
public static boolean isInteger(String s) {
try {
Integer.parseInt(s);
} catch (NumberFormatException e) {
return false;
}
return true;
}
}
output:
List1: [11, 4, 10, 9]
List2: [9, 2, 100, 11]
Output List: [20, 6, 110, 20]
abc 20:6 xyz 110:20