How to count the number of symbols like #, + etc. in Java

I'm trying to write code to count the number of letters, characters, spaces and symbols in a String, but I don't know how to count symbols.
Is there any such function available in Java?

That very much depends on your definition of the term symbol.
A straightforward solution could be something like
Set<Character> SYMBOLS = Set.of('#', ' ' /* , .... */);
int count = 0;
for (int i = 0; i < someString.length(); i++) {
    if (SYMBOLS.contains(someString.charAt(i))) count++;
}
That iterates over the chars of someString and checks whether each char can be found in the predefined SYMBOLS set.
Alternatively, you could use a regular expression to define "symbols", or you can rely on a variety of existing definitions. If you check the regex Pattern syntax for Java, you can find
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
for example, and various other shortcuts that already denote this or that set of characters.
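To see how much the definition matters, here is a small sketch that counts "symbols" under two regex definitions: \W on its own (which also counts spaces, because a space is not a word character) and [^\w\s] (neither a word character nor whitespace). It counts by deleting every match and comparing lengths. The class and method names are hypothetical, and note that \w includes _, so an underscore is never counted as a symbol with either pattern.

```java
// Hypothetical helper class: counts "symbols" under two different regex definitions.
class SymbolDefinitions {
    // Number of chars removed when every match of the regex is deleted.
    static int countByRegex(String s, String regex) {
        return s.length() - s.replaceAll(regex, "").length();
    }

    // \W = [^a-zA-Z_0-9], so spaces count as "symbols" too.
    static int countNonWord(String s) {
        return countByRegex(s, "\\W");
    }

    // [^\w\s] = not a word character and not whitespace.
    static int countNonWordNonSpace(String s) {
        return countByRegex(s, "[^\\w\\s]");
    }

    public static void main(String[] args) {
        String s = "a#b + c_1";
        System.out.println(countNonWord(s));         // 4: '#', ' ', '+', ' '
        System.out.println(countNonWordNonSpace(s)); // 2: '#', '+'
    }
}
```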

Please post what you have tried so far.
If you need the count of each individual character, you'd better iterate over the string and use a map to track each character with its count.
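A minimal sketch of the map approach, assuming a TreeMap so the keys come out sorted (a HashMap works the same way if order doesn't matter). Map.merge inserts 1 for a new character or adds 1 to the existing count. The class name CharFrequency is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: per-character frequency table for a string.
class CharFrequency {
    static Map<Character, Integer> count(String s) {
        // TreeMap keeps the keys sorted, which makes the printed result easy to read.
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("a+b+#")); // {#=1, +=2, a=1, b=1}
    }
}
```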
Or
You can use a regex if just the overall count is enough, like below:
while (matcher.find()) { count++; }
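The loop above needs a Matcher built from some pattern. As one possible definition of "symbol" (an assumption, not the only choice), this sketch uses the POSIX class \p{Punct}, which Java's Pattern supports and which covers the ASCII punctuation characters. The class name RegexSymbolCount is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts every match of a "symbol" pattern.
class RegexSymbolCount {
    // \p{Punct} is the POSIX punctuation class: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    private static final Pattern SYMBOL = Pattern.compile("\\p{Punct}");

    static int count(String str) {
        Matcher matcher = SYMBOL.matcher(str);
        int count = 0;
        while (matcher.find()) {
            count++; // one match per symbol character
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("abcd!##")); // 3
    }
}
```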

One way of doing it would be to just iterate over the String and compare each character with its ASCII value:
String str = "abcd!##";
for (int i = 0; i < str.length(); i++)
{
    if (str.charAt(i) == 33) // 33 is the ASCII code of '!'
        System.out.println("Found !");
}
Look up ASCII values here: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
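Extending that idea to count all ASCII punctuation, a sketch could test the four code ranges that hold the printable non-alphanumeric characters (33 to 47, 58 to 64, 91 to 96, 123 to 126). This assumes "symbol" means exactly those characters, so _ counts as a symbol here, unlike with \W. The class name AsciiSymbolCount is hypothetical.

```java
// Hypothetical helper: counts ASCII punctuation by comparing character codes.
class AsciiSymbolCount {
    static boolean isAsciiSymbol(char c) {
        return (c >= 33 && c <= 47)    // ! " # $ % & ' ( ) * + , - . /
            || (c >= 58 && c <= 64)    // : ; < = > ? @
            || (c >= 91 && c <= 96)    // [ \ ] ^ _ `
            || (c >= 123 && c <= 126); // { | } ~
    }

    static int count(String str) {
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            if (isAsciiSymbol(str.charAt(i))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("abcd!##")); // 3
    }
}
```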

Related

Regex: Remove repeated letters in Arabic text with the StringToWordVector filter

I want to remove repeated Arabic letters in my text. How can I do this using regex in Java? I've tried different regexes, but they remove all the Arabic letters from my text! Please help.
BTW, I am using the regex with the StringToWordVector filter, as is done here.
This is how I applied it:
filter.setStopwordsHandler(new RegExStopwords("([^\\u1F600-\\u1F6FF\\s].*|[A-Za-z0-9].*|[٠-٩].*|[\\u0617-\\u061A\\u064B-\\u0652].*|[ؐ-ًؚٟ].*|[/(آ|إ|أ)/g, 'ا']|[/(ة)/g, 'ه']|[/(ئ|ؤ)/g, 'ء']|[/(ى)/g, 'ي']|[/([^\\u0621-\\u063A\\u0641-\\u064A\\u0660-\\u0669])/g, '']"));
I also tried the answers mentioned there with the .replaceAll() function, but they did not work for me, or rather I did not know how to fit them into my code correctly.
I would rather use a loop:
String str = "hello";
char prevChar = '\0'; // a char unlikely to appear in the text, so the first char is always kept
String result = "";
for (char ch : str.toCharArray()) {
    if (ch != prevChar)
        result += ch; // appending a char to a String concatenates it for us
    prevChar = ch;
}
This produces helo (the repeated l is removed).
EDIT:
If you would like to use a filter, the correct regex should be
/(.)(?<=\1{2,})/ig
(tested on refiddle set to .NET, since it doesn't offer Java)
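The first group, (.), captures any single character.
The lookbehind (?<=\1{2,}) breaks down as follows:
\1 is a backreference: it matches the same character that the first group ((.)) captured.
{2,} means two or more repetitions of it.
?<= makes the group a lookbehind: it checks that the text just before the current position (including the character just captured) matches \1{2,}, without consuming anything.
So the pattern matches a character only when it is at least the second in a run of identical characters; replacing every match with an empty string leaves one copy of each run, which is what you want.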
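Java's regex engine requires a lookbehind to have a bounded maximum length, so the .NET pattern above may be rejected by java.util.regex.Pattern as written. A commonly used Java alternative (a different pattern from the one above, not the answerer's) captures a character and matches the rest of its run with a backreference, then replaces the whole run with the captured character. Be aware that it collapses every doubled character, including legitimate ones. The class name CollapseRepeats is hypothetical.

```java
// Hypothetical helper: collapses runs of the same character to a single character.
class CollapseRepeats {
    static String collapse(String s) {
        // (.) captures one character; \1+ matches one or more further copies of it.
        // Each whole run is replaced by the captured character ($1).
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("hello"));   // helo
        System.out.println(collapse("aaabbbc")); // abc
    }
}
```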
For more help, try these Stack Overflow links
Regular expression to match any character being repeated more than 10 times
Match Sequence using RegEx After a Specified Character
Best of luck!

Java Split regex

Given a string S, find the number of words in that string. For this problem a word is defined by a string of one or more English letters.
Note: A space or any of the special characters ![,?._'#+] acts as a delimiter.
Input Format: The string will only contain lower case English letters, upper case English letters, spaces, and these special characters: ![,?._'#+].
Output Format: On the first line, print the number of words in the string. The words don't need to be unique. Then, print each word in a separate line.
My code:
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
String regex = "( |!|[|,|?|.|_|'|#|+|]|\\\\)+";
String[] arr = str.split(regex);
System.out.println(arr.length);
for(int i = 0; i < arr.length; i++)
System.out.println(arr[i]);
When I submit the code, it passes just over half of the test cases. I do not know what the test cases are. I'm asking for help with Murphy's law: in which situations will the regex I implemented fail?
You don't escape some special characters in your regex. Let's start with []. Since you don't escape them, the part [|,|?|.|_|'|#|+|] is treated like a set of characters |,?._'#+. This means that your regex doesn't split on [ and ].
For example, x..]y+[z is split into x, ]y and [z.
You can fix that by escaping those characters. That will force you to escape more of them and you end up with a proper definition:
String regex = "( |!|\\[|,|\\?|\\.|_|'|#|\\+|\\])+";
Note that instead of defining alternatives, you could use a set (a character class), which makes your regex easier to read. Remember to include the space, which is also a delimiter:
String regex = "[ !\\[,?._'#+\\]]+";
In this case you only need to escape [ and ].
UPDATE:
There's also a problem with a leading special character (as in your example ".Hi?there[broski.]#####"). You need to split on it, but that produces an empty string as the first element of the result. I don't think there's a way to use the split function without producing it, but you can work around it by removing the first delimiter group before splitting, using the same regex:
String[] arr = str.replaceFirst(regex, "").split(regex);
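Since the problem defines a word as one or more English letters, another option (a different technique from split) is to match the words directly instead of splitting on delimiters. This sketch also covers an empty or delimiter-only input, where split still returns a one-element array containing an empty string and the count would come out as 1 instead of 0. The class name WordFinder is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects words (runs of English letters) instead of splitting on delimiters.
class WordFinder {
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    static List<String> words(String str) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(str);
        while (m.find()) {
            result.add(m.group()); // each match is one word; delimiters are simply skipped
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = words(".Hi?there[broski.]#####");
        System.out.println(words.size()); // 3
        words.forEach(System.out::println);
    }
}
```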

Determine if a String contains an odd number of quotation marks

I'm trying to write a regular expression that can determine whether a string contains an odd number of " (quotation mark) characters.
An answerer on this question has accomplished something very similar: determining whether a string of letters contains an odd number of a certain letter. However, I am having trouble adapting it to my problem.
What I have so far, which doesn't quite work:
String regexp = "(\\b[^\"]*\"(([^\"]*\"){2})*[^\"]*\\b)";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher("bbacac");
if(matcher.find()){
System.out.println("Found");
}
else
System.out.println("Not Found");
Regex is a fairly poor solution for this. (Edit: I originally wrote this because I thought you were talking about nesting, not pair matching.)
Iterating over all characters in the string, counting instances of " would be a faster and more efficient way to achieve this.
int quoteCount = 0;
for(char ch : inputString.toCharArray())
{
if(ch == '"') quoteCount++;
}
boolean even = quoteCount % 2 == 0;
If you want a regex, this is simple to accomplish:
boolean oddQuotes = subjectString.matches("[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*");
Explanation: (without all the Java quote escapes):
[^"]*" # Match any number of non-quote characters, then a quote
(?: # Now match an even number of quotes by matching:
[^"]*" # any number of non-quote characters, then a quote
[^"]*" # twice
)* # and repeat any number of times.
[^"]* # Finally, match any remaining non-quote characters
So far, this is probably slower than a simple "count the quotes" solution. But we can do one better: we can design the regex to also handle escaped quotes, i.e., not to count a quote if it's preceded by an odd number of backslashes:
boolean oddQuotes = subjectString.matches("(?:\\\\.|[^\\\\\"])*\"(?:(?:\\\\.|[^\\\\\"])*\"(?:\\\\.|[^\\\\\"])*\")*(?:\\\\.|[^\\\\\"])*");
Now admittedly, this looks horrible, but mainly because of Java's string escaping rules. The actual regex is straightforward:
(?: # Match either
\\. # an escaped character
| # or
[^\\"] # a character except backslash or quote
)* # any number of times.
" # Then match a quote.
(?: # The rest of the regex works just the same way (as above)
(?:\\.|[^\\"])*"
(?:\\.|[^\\"])*"
)*
(?:\\.|[^\\"])*
Don't use regex for this. Just iterate through the characters in the string and count the " characters. It's a single O(n) pass and will be more efficient.
Above all, it's simple, and it makes the solution much easier to read than some obscure regex pattern.
boolean odd = false;
for (int i = 0; i < s.length(); i++) {
    if (s.charAt(i) == '"') odd = !odd; // flip the parity on every quote
}
Or, use a regex, replace everything except for quotation marks with empty strings, and check the length of the result.
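A minimal sketch of that replace-and-measure idea: [^"] matches every character that is not a quotation mark, so deleting those matches leaves only the quotes. The class name QuoteParity is hypothetical.

```java
// Hypothetical helper: odd/even quote check by deleting everything except quotes.
class QuoteParity {
    static boolean hasOddQuotes(String s) {
        // Remove every non-quote character; what remains is one '"' per quote in s.
        String onlyQuotes = s.replaceAll("[^\"]", "");
        return onlyQuotes.length() % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasOddQuotes("say \"hi"));   // true
        System.out.println(hasOddQuotes("say \"hi\"")); // false
    }
}
```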
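You can use split and check whether the number of elements in the returned array is even or odd to gauge the odd- or even-ness of that character's frequency. Pass a negative limit so that split keeps trailing empty strings; the array then always has exactly one more element than there are quotes:
String s = ".. what ever is in your string";
String[] parts = s.split("\"", -1); // limit -1 keeps trailing empty strings
if (parts.length % 2 == 0) {
    // parts.length == quotes + 1, so the string has an odd number of quotes
} else {
    // the string has an even number of quotes
}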
I would have to say it's probably better to just count the "s manually, but if you really want a regular expression, here is one that should work:
"(^(([^\"]*\"){2})*[^\"]*$)"
I just anchored the expression to the start and end of the string and made sure there are only pairs of "s, blindly absorbing anything that is not a " between them. Note that a match therefore means the count is even, so negate the result to detect an odd count.

How to tokenize in java without using the java.util tokenizer?

Consider the following as tokens:
+, -, ), (
alphabetic characters and underscore
integer
Implement:
1. getToken() - returns a string corresponding to the next token
2. getTokPos() - returns the position of the current token in the input string
Example input: (a+b)-21)
Output: (| a| +| b| )| -| 21| )|
Note: Cannot use the java string tokenizer class
Work in progress - Successfully tokenized +,-,),(. Need to figure out characters and numbers:
OUTPUT: +|-|+|-|(|(|)|)|)|(| |
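java.util.StringTokenizer is a legacy class; its Javadoc discourages its use in new code.
Tokenizing Strings in Java is much easier with String.split(), available since Java 1.4:
String[] tokens = "(a+b)-21)".split("[-+()]"); // '-' goes first in the class so it isn't read as a range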
If it is a homework, you probably have to reimplement a "split" method:
read the String character by character
if the character is not a special char, add it to a buffer
when you encounter a special char, add the buffer content to a list and clear the buffer
Since it is (probably) homework, I'll let you implement it.
Java lets you examine the characters in a String one by one with the charAt method. So use that in a for loop and examine each character. When you encounter a TOKEN you wrap that token with the pipes and any other character you just append to the output.
public static final char PLUS_TOKEN = '+';
// add all tokens the same way
public static final char MINUS_TOKEN = '-';
public static final char OPEN_PAREN_TOKEN = '(';
public static final char CLOSE_PAREN_TOKEN = ')';
public String doStuff(String input)
{
    StringBuilder output = new StringBuilder();
    for (int index = 0; index < input.length(); index++)
    {
        char current = input.charAt(index);
        // compare the current character with all tokens
        if (current == PLUS_TOKEN || current == MINUS_TOKEN
                || current == OPEN_PAREN_TOKEN || current == CLOSE_PAREN_TOKEN)
        {
            // when you see a token you need to append the pipes (|) around it
            output.append('|');
            output.append(current);
            output.append('|');
        }
        else
        {
            // just add to new output
            output.append(current);
        }
    }
    return output.toString();
}
If it's not a homework assignment use String.split(). If is a homework assignment, say so and tag it so that we can give the appropriate level of help (I did so for you, just in case...).
Because the string needs to be cut in several different ways, not just on whitespace or parentheses, using the String.split method with any of these symbols will not work: split removes the character used as a separator. You could try to split on the empty string, but that wouldn't keep compound tokens like 21 together. To parse this string correctly, you will effectively need to implement your own tokenizer. Think about how you could tell that you had a complete token if you looked at the string one character at a time. You could start a string that collects characters until you have identified a complete token, then remove those characters from the original and return the collected string. Starting from there, you can build a basic tokenizer.
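A minimal sketch of that character-at-a-time idea, assuming ASCII letters/underscore and ASCII digits are the only multi-character tokens, whitespace is skipped, and any other character is rejected. Instead of removing characters from the original string, it keeps an index into it, which amounts to the same thing. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a hand-written tokenizer that reads one character at a time.
class SimpleTokenizer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '+' || c == '-' || c == '(' || c == ')') {
                tokens.add(String.valueOf(c)); // single-character token
                i++;
            } else if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (isAlpha(c) || isDigit(c)) {
                boolean digits = isDigit(c);
                int start = i;
                // Keep collecting while the next char is of the same kind (all digits, or all letters/underscore).
                while (i < input.length() && (digits ? isDigit(input.charAt(i)) : isAlpha(input.charAt(i)))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    // ASCII letters and underscore form "alpha" tokens.
    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(String.join("| ", tokenize("(a+b)-21)")) + "|"); // (| a| +| b| )| -| 21| )|
    }
}
```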
If you'd rather learn how to make a full strength tokenizer, most of them are defined by creating a regular expression that only matches the tokens.
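As a hedged sketch of the regex-driven variant, a single pattern with one alternative per token kind can drive Matcher.find(), and Matcher.start() gives the token position, which maps naturally onto the getToken()/getTokPos() pair from the question. Assumptions: getToken() returns null when the input is exhausted, and characters that belong to no token are silently skipped by find(). The class name RegexTokenizer is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a tokenizer driven by one regex that matches only the tokens.
class RegexTokenizer {
    // Alternatives: identifiers (letters/underscore), integers, or one of + - ( )
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_]+|[0-9]+|[-+()]");

    private final Matcher matcher;
    private int tokPos = -1; // start index of the current token; -1 before the first call

    RegexTokenizer(String input) {
        this.matcher = TOKEN.matcher(input);
    }

    // Returns the next token, or null when the input is exhausted.
    String getToken() {
        if (!matcher.find()) {
            return null;
        }
        tokPos = matcher.start();
        return matcher.group();
    }

    // Index in the input string where the current token starts.
    int getTokPos() {
        return tokPos;
    }

    public static void main(String[] args) {
        RegexTokenizer t = new RegexTokenizer("(a+b)-21)");
        for (String tok = t.getToken(); tok != null; tok = t.getToken()) {
            System.out.println(tok + " at " + t.getTokPos());
        }
    }
}
```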

Java, Make sure a String contains only alphanumeric, spaces and dashes

In Java, I need to make sure a String only contains alphanumeric, space and dash characters.
I found the class org.apache.commons.lang.StringUtils and the almost adequate method isAlphanumericSpace(String)... but I also need to include dashes.
What is the best way to do this? I don't want to use Regular Expressions.
You could use:
StringUtils.isAlphanumericSpace(string.replace('-', ' '));
Hum... just program it yourself using String.charAt(int); it's pretty easy...
Iterate through all chars in the string using a position index, then compare each one using the fact that the ASCII characters 0 to 9, a to z and A to Z use consecutive codes, so you only need to check that character x numerically satisfies one of the conditions:
between '0' and '9'
between 'a' and 'z'
between 'A' and 'Z'
a space ' '
a hyphen '-'
Here is a basic code sample (using CharSequence, which lets you pass a String but also a StringBuilder as arg):
public boolean isValidChar(CharSequence seq) {
int len = seq.length();
for(int i=0;i<len;i++) {
char c = seq.charAt(i);
// Test for all positive cases
if('0'<=c && c<='9') continue;
if('a'<=c && c<='z') continue;
if('A'<=c && c<='Z') continue;
if(c==' ') continue;
if(c=='-') continue;
// ... insert more positive character tests here
// If we get here, we had an invalid char, fail right away
return false;
}
// All seen chars were valid, succeed
return true;
}
Just iterate through the string, using the character-class methods in java.lang.Character to test whether each character is acceptable or not. Which is presumably all that the StringUtils methods do, and regular expressions are just a way of driving a generalised engine to do much the same.
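A minimal sketch of that approach with java.lang.Character. Note that Character.isLetterOrDigit accepts any Unicode letter or digit (for example é), which is broader than the ASCII-only range checks in the previous answer; whether that is wanted is an assumption to check against your requirements. The class name AllowedChars is hypothetical.

```java
// Hypothetical helper using java.lang.Character tests instead of a regex.
class AllowedChars {
    static boolean isAlphanumericSpaceOrDash(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Character.isLetterOrDigit accepts any Unicode letter or digit, not just ASCII.
            if (!Character.isLetterOrDigit(c) && c != ' ' && c != '-') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumericSpaceOrDash("abc-123 XYZ")); // true
        System.out.println(isAlphanumericSpaceOrDash("abc_123"));     // false
    }
}
```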
You have two options:
1. Compose a list of chars that CAN be in the string, then loop over the string checking to make sure each character IS in the list.
2. Compose a list of chars that CANNOT be in the string, then loop over the string checking to make sure each character IS NOT in the list.
Choose whatever option is quicker to compose the list.
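A sketch of option 1, assuming the allowed set is ASCII letters, digits, space and dash: the list is a plain String and indexOf reports whether each character is on it. Option 2 would use a list of forbidden characters and return false when indexOf finds a match instead. The class name AllowList is hypothetical.

```java
// Hypothetical sketch of option 1: list the characters that CAN appear, then check each one.
class AllowList {
    private static final String ALLOWED =
            "abcdefghijklmnopqrstuvwxyz"
            + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "0123456789"
            + " -";

    static boolean containsOnlyAllowed(String s) {
        for (int i = 0; i < s.length(); i++) {
            // indexOf returns -1 when the character is not in the list
            if (ALLOWED.indexOf(s.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyAllowed("Order-66 ok")); // true
        System.out.println(containsOnlyAllowed("50%"));         // false
    }
}
```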
Definitely use a regular expression. There's no point in writing your own system when a very comprehensive one is already in place for this exact task. If you need to learn or brush up on regex, check out this website, it's great: http://regexr.com
Challenge yourself on this one.
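If you do go the regex route this answer recommends, the whole check can be a single String.matches call. This sketch assumes ASCII letters and digits only (unlike Character.isLetterOrDigit); the dash is placed last in the character class so it is treated as a literal rather than a range. The class name RegexCheck is hypothetical.

```java
// Hypothetical helper: the same rule expressed as a regular expression.
class RegexCheck {
    static boolean isAlphanumericSpaceOrDash(String s) {
        // String.matches must match the entire string.
        // '-' is placed last in the class so it is a literal dash rather than a range.
        return s.matches("[A-Za-z0-9 -]*");
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumericSpaceOrDash("Order-66 ok")); // true
        System.out.println(isAlphanumericSpaceOrDash("50%"));         // false
    }
}
```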
