Given a string find the first embedded occurrence of an integer - java

This was asked in an interview:
Given any string, return the first occurrence of an integer.
For example
Str98 then it should return 98
Str87uyuy232 -- it should return 87
My answer was to loop through the string and compare each character with the numeric characters, as in
if ((c >= '0') && (c <= '9'))
Then I got the index of the number, parsed it and returned it. Somehow he was not convinced.
Can anyone share the best possible solution?

With a regex, it's pretty simple:
String s = "Str87uyuy232";
Matcher matcher = Pattern.compile("\\d+").matcher(s);
matcher.find(); // returns false if the string contains no digits
int i = Integer.parseInt(matcher.group());
(Thanks to Eric Mariacher)
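The snippet above assumes the string always contains a digit. A minimal sketch of the same regex idea that also covers the no-match case is below; the class name FirstIntRegex and the choice of OptionalInt as return type are my own assumptions, not part of the original answer.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstIntRegex {
    // Compiled once; \d+ matches one or more consecutive digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits as an int, or empty if there is none.
    // Integer.parseInt still throws NumberFormatException if the run overflows an int.
    static OptionalInt firstInt(String s) {
        Matcher m = DIGITS.matcher(s);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group()));
    }

    public static void main(String[] args) {
        System.out.println(firstInt("Str87uyuy232"));
        System.out.println(firstInt("no digits here"));
    }
}
```

Returning an empty OptionalInt instead of throwing is only one option; returning null or a default value would work as well, depending on what the caller expects.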

Using java.util.Scanner :
int res = new Scanner("Str87uyuy232").useDelimiter("\\D+").nextInt();
The purpose of a Scanner is to extract tokens from an input (here, a String). Tokens are sequences of characters separated by delimiters. By default, the delimiter of a Scanner is the whitespace, and the tokens are thus whitespace-delimited words.
Here, I use the delimiter \D+, which means "anything that is not a digit". The tokens that our Scanner can read in our string are "87" and "232". The nextInt() method will read the first one.
nextInt() throws java.util.NoSuchElementException if there is no token to read. Call the method hasNextInt() before calling nextInt(), to check that there is something to read.
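The hasNextInt() check described above could look roughly like this. It is only a sketch: the class name ScannerFirstInt and the OptionalInt return type are assumptions made for the example.

```java
import java.util.OptionalInt;
import java.util.Scanner;

class ScannerFirstInt {
    // Uses "anything that is not a digit" as the delimiter, so tokens are digit runs.
    static OptionalInt firstInt(String s) {
        try (Scanner sc = new Scanner(s).useDelimiter("\\D+")) {
            // hasNextInt() is false when there is no token, or when the
            // first token does not fit into an int.
            return sc.hasNextInt() ? OptionalInt.of(sc.nextInt()) : OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstInt("Str87uyuy232"));
        System.out.println(firstInt("Str"));
    }
}
```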

There are two issues with this solution.
Consider the test cases - there are 2 characters '8' and '7', and they both form the integer 87 that you should be returning. (This is the main issue)
This is somewhat pedantic: in Java, char values are UTF-16 code units, so '0' through '9' are always contiguous and the range check does work. Still, I imagine interviewers like to see the intent stated through the library, and Character.isDigit also accepts digits from other scripts. A better solution would be
if (Character.isDigit(c)) { ... }
There are plenty of different ways to do this. My first thought would be:
int i = 0;
while (i < string.length() && !Character.isDigit(string.charAt(i))) i++;
int j = i;
while (j < string.length() && Character.isDigit(string.charAt(j))) j++;
return Integer.parseInt(string.substring(i, j)); // end index is exclusive, so no off-by-one; throws NumberFormatException if there are no digits
Of course, as mentioned in the comments, using the regex functionality in Java is likely the best way to do this. But of course many interviewers ask you to do things like this without libraries, etc...

String input = "Str87uyuy232";
Matcher m = Pattern.compile("[^0-9]*([0-9]+).*").matcher(input);
if (m.matches()) {
System.out.println(m.group(1));
}

Just in case you want a solution without regex or other utilities, here you go:
public static Integer returnInteger(String s)
{
    if (s == null)
        return null;
    else
    {
        char[] characters = s.toCharArray();
        Integer value = null;
        boolean isPrevDigit = false;
        for (int i = 0; i < characters.length; i++)
        {
            if (!isPrevDigit)
            {
                // Still looking for the first digit
                if (Character.isDigit(characters[i]))
                {
                    isPrevDigit = true;
                    value = Character.getNumericValue(characters[i]);
                }
            }
            else
            {
                // Inside a digit run: accumulate, or stop at the first non-digit
                if (Character.isDigit(characters[i]))
                {
                    value = (value * 10) + Character.getNumericValue(characters[i]);
                }
                else
                {
                    break;
                }
            }
        }
        return value;
    }
}

You could go to a lower level too. A quick look at ASCII values reveals that alphabetical characters start at 65. Digits go from 48 to 57. With that being the case, you can simply 'and' each character with 127 and see if the result falls within the range 48 to 57.
char[] s = "Str87uyuy232".toCharArray();
String fint = "";
boolean foundNum = false;
for (int i = 0; i < s.length; i++)
{
    int test = s[i] & 127;
    if (test < 58 && test > 47)
    {
        fint += s[i];
        foundNum = true;
    }
    else if (foundNum)
        break;
}
System.out.println(fint);
Doing this wouldn't be good for the real world (different character sets), but as a puzzle solution it is fun.

Related

Java: Split string by number of characters but with guarantee that string will be split only after whitespace

I want to achieve something like this.
String str = "This is just a sample string";
List<String> strChunks = splitString(str,8);
and strChunks should look like:
"This is ","just a ","sample ","string"
Please note that string like "sample " have only 7 characters as with 8 characters it will be "sample s" which will break down my next word "string".
Also we can go with the assumption that a word will never be larger than second argument of method (which is 8 in example) because in my use case second argument is always static with value 32000.
The obvious approach that I can think of is looping through the given string, breaking it after 8 chars and then searching backwards from that point for the nearest whitespace, and then repeating the same thing for the remaining string.
Is there a more elegant way to achieve the same? Is there any utility method already available in a standard third-party library such as Guava or Apache Commons?
Splitting on "(?<=\\G.{7,}\\s)" produces the result that you need (demo).
\\G means the end of previous match; .{7,} means seven or more of any characters; \\s means a space character.
Not a standard method, but this might suit your needs
See it on http://ideone.com/2RFIZd
public static List<String> splitString(String str, int chunksize) {
    char[] chars = str.toCharArray();
    ArrayList<String> list = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    int count = 0;
    for (char character : chars) {
        if (count < chunksize - 1) {
            builder.append(character);
            count++;
        }
        else {
            // Chunk is full: close it at the next space
            if (character == ' ') {
                builder.append(character);
                list.add(builder.toString());
                count = 0;
                builder.setLength(0);
            }
            else {
                builder.append(character);
                count++;
            }
        }
    }
    list.add(builder.toString());
    builder.setLength(0);
    return list;
}
Please note, I used the human notation for string length, because that's what your sample reflects (8 = position 7 in the string). That's why the chunksize - 1 is there.
This method takes 3 milliseconds on a text the size of http://catdir.loc.gov/catdir/enhancements/fy0711/2006051179-s.html
Splitting String using method 1.
String text = "This is just a sample string";
List<String> strings = new ArrayList<String>();
int index = 0;
while (index < text.length()) {
    strings.add(text.substring(index, Math.min(index + 8, text.length())));
    index += 8;
}
for (String s : strings) {
    System.out.println("[" + s + "]");
}
Splitting String using Method 2
String[] s = text.split("(?<=\\G.{" + 8 + "})");
for (int i = 0; i < s.length; i++) {
    System.out.println("[" + s[i] + "]");
}
This uses a hacked reduction to get it done without much code:
String str = "This is just a sample string";
List<String> parts = new ArrayList<>();
parts.add(Arrays.stream(str.split("(?<= )"))
    .reduce((a, b) -> {
        if (a.length() + b.length() <= 8)
            return a + b;
        parts.add(a);
        return b;
    }).get());
See demo using edge case input (that breaks some other answers!)
This splits after each space, then either joins up parts or adds to the list depending on the length of the pair.
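For comparison, the "obvious approach" from the question (cut at the limit, then back up to the last space inside the window) can be sketched without regex or streams. The class and method names below are made up for the example, and the hard cut for a word longer than the limit is my own assumption, since the question says that case never occurs.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceChunker {
    // Splits str into chunks of at most maxLen characters, breaking only
    // directly after a space whenever the window contains one.
    static List<String> split(String str, int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < str.length()) {
            int end = Math.min(start + maxLen, str.length());
            if (end < str.length()) {
                // Look for the last space inside the current window.
                int lastSpace = str.lastIndexOf(' ', end - 1);
                if (lastSpace >= start) {
                    end = lastSpace + 1; // keep the space at the end of the chunk
                }
                // Otherwise there is no space in the window: hard cut at maxLen.
            }
            chunks.add(str.substring(start, end));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("This is just a sample string", 8));
    }
}
```

Because every chunk is a plain substring and nothing is dropped, concatenating the chunks gives back the original string.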

How to count number of letters in sentence

I'm looking for a simple way to find the number of letters in a sentence.
All I found during research were ways to count one specific letter, but not letters of all kinds.
How do I do that?
What I currently have is:
sentence = the sentence I get from the main method
count = the number of letters I want give back to the main method
public static int countletters(String sentence) {
// ....
return(count);
}
You could manually parse the string and count number of characters like:
int count = 0;
for (int i = 0; i < value.length(); i++) {
    char c = value.charAt(i);
    // Count only the Latin letters A-Z and a-z
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
        count++;
    }
}
//return count
One way to do this is to strip every unwanted character from the String and then check its length. This could look like this:
public static void main(String[] args) throws Exception {
    final String sentence = " Hello, this is the 1st example sentence!";
    System.out.println(countletters(sentence));
}
public static int countletters(String sentence) {
    final String onlyLetters = sentence.replaceAll("[^\\p{L}]", "");
    return onlyLetters.length();
}
The stripped String looks like:
Hellothisisthestexamplesentence
And the length of it is 31.
This code uses String#replaceAll which accepts a Regular Expression and it uses the category \p{L} which matches every letter in a String. The construct [^...] inverts that, so it replaces every character which is not a letter with an empty String.
Regular Expressions can be expensive in terms of performance. If you need the best possible performance, you can try other methods, like iterating over the String, but this solution has much cleaner code. So if clean code counts more for you here, feel free to use it.
Also mind that \\p{L} detects Unicode letters, so this will also correctly treat letters from different alphabets, like Cyrillic. Other solutions currently only support Latin letters.
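If the regex cost matters, the same Unicode-aware count can be sketched by iterating over code points. Character.isLetter covers the same general categories as \p{L}; the class name LetterCount is just a placeholder for the example.

```java
class LetterCount {
    // Counts Unicode letters without building an intermediate String.
    // codePoints() is used instead of chars() so that letters outside the
    // Basic Multilingual Plane are counted once rather than as two chars.
    static long countLetters(String sentence) {
        return sentence.codePoints()
                       .filter(Character::isLetter)
                       .count();
    }

    public static void main(String[] args) {
        System.out.println(countLetters(" Hello, this is the 1st example sentence!"));
    }
}
```

For the example sentence above this gives the same 31 as the replaceAll version.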
SMA's answer does the job, but it can be slightly improved:
public static int countLetters(String sentence) {
    int count = 0;
    for (int i = 0; i < sentence.length(); i++)
    {
        // Upper-casing first means a single range check covers both cases
        char c = Character.toUpperCase(sentence.charAt(i));
        if (c >= 'A' && c <= 'Z')
            count++;
    }
    return count;
}
This is much easier if you use a stream with a method reference:
long count = sentence.chars().filter(Character::isLetter).count();
working example here: ideone
Use the .length() method to get the length of the string; the length is the number of characters it contains (Java strings have no null terminator).
If you wish to exclude spaces, do something like
String input = "The quick brown fox";
int count = 0;
for (int i = 0; i < input.length(); i++) {
    if (input.charAt(i) != ' ') {
        ++count;
    }
}
System.out.println(count);
If you wish to exclude other whitespace characters as well, use a regex; you can refer to this question for more details.
import java.util.Scanner;
public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String str = sc.nextLine();
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            if (Character.isLetter(str.charAt(i)))
                count++;
        }
        System.out.println(count);
    }
}

Finding the number of words in a string [duplicate]

I can't seem to figure out why this doesn't work, but I may have just missed some simple logic. The method doesn't seem to find the last word when there isn't a space after it, so I'm guessing something is wrong with i == itself.length() - 1, but it seems to me that it would return true: you're on the last character and it isn't whitespace.
public void numWords()
{
    int numWords = 0;
    for (int i = 1; i <= itself.length() - 1; i++)
    {
        if ((i == (itself.length() - 1) || itself.charAt(i) <= ' ') && itself.charAt(i - 1) > ' ')
            numWords++;
    }
    System.out.println(numWords);
}
itself is the string. I am comparing the characters the way I am because that's how it is shown in the book, but please let me know if there are better ways.
Naïve approach: treat everything that has a space following it as a word. With that, simply count the number of elements as the result of a String#split operation.
public int numWords(String sentence) {
    if (null != sentence) {
        return sentence.split("\\s").length;
    } else {
        return 0;
    }
}
Try,
int numWords = (itself==null) ? 0 : itself.split("\\s+").length;
So basically it seems that what you're trying to do is count all chunks of whitespace in a string. I'll fix up your code and use my head compiler to help you out with the problems you're experiencing.
public void numWords()
{
    int numWords = 0;
    // Don't check the last character as it doesn't matter if it's ' '
    for (int i = 1; i < itself.length() - 1; i++)
    {
        // If this char is a space and the previous one isn't, a word has just ended
        if (itself.charAt(i) == ' ' && itself.charAt(i - 1) != ' ') {
            numWords++;
        }
    }
    System.out.println(numWords);
}
This is a very naive algorithm and fails in a few cases, if the string ends in multiple spaces for example 'hello world ', it would count 3 words.
Note that if I was going to implement such a method I would go with a regex approach similar to Makoto's answer in order to simplify the code.
The following code fragment does the job better:
if (sentence == null) {
    return 0;
}
sentence = sentence.trim();
if ("".equals(sentence)) {
    return 0;
}
return sentence.split("\\s+").length;
The regex \\s+ works correctly when words are separated by several spaces. trim() removes trailing and leading spaces. The additional empty-string check prevents a result of 1 for an empty string.
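Wrapped into a complete method, the fragment above might look like the sketch below. The class name WordCount and the method name countWords are placeholders chosen for the example.

```java
class WordCount {
    // Counts whitespace-separated words; null, empty and blank input give 0.
    static int countWords(String sentence) {
        if (sentence == null) {
            return 0;
        }
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            // Without this check, "".split("\\s+") yields one empty element.
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello world "));
        System.out.println(countWords("   "));
    }
}
```

The difference from the single-character pattern shows up with repeated spaces: "a  b".split("\\s") produces an empty element between the two spaces, so its length is 3 rather than 2.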

How to remove surrogate characters in Java?

I am facing a situation where I get surrogate characters in text that I am saving to MySQL 5.1. As UTF-16 is not supported there, I want to remove these surrogate pairs manually with a Java method before saving the text to the database.
I have written the following method for now and I am curious to know if there is a direct and optimal way to handle this.
Thanks in advance for your help.
public static String removeSurrogates(String query) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < query.length() - 1; i++) {
        char firstChar = query.charAt(i);
        char nextChar = query.charAt(i+1);
        if (Character.isSurrogatePair(firstChar, nextChar) == false) {
            sb.append(firstChar);
        } else {
            i++;
        }
    }
    if (Character.isHighSurrogate(query.charAt(query.length() - 1)) == false
            && Character.isLowSurrogate(query.charAt(query.length() - 1)) == false) {
        sb.append(query.charAt(query.length() - 1));
    }
    return sb.toString();
}
Here's a couple things:
Character.isSurrogate(char c):
A char value is a surrogate code unit if and only if it is either a low-surrogate code unit or a high-surrogate code unit.
Checking for pairs seems pointless, why not just remove all surrogates?
x == false is equivalent to !x
StringBuilder is better in cases where you don't need synchronization (like a variable that never leaves local scope).
I suggest this:
public static String removeSurrogates(String query) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < query.length(); i++) {
        char c = query.charAt(i);
        // !isSurrogate(c) in Java 7
        if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
            sb.append(c);
        }
    }
    return sb.toString();
}
Breaking down the if statement
You asked about this statement:
if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
    sb.append(c);
}
One way to understand it is to break each operation into its own function, so you can see that the combination does what you'd expect:
static boolean isSurrogate(char c) {
    return Character.isHighSurrogate(c) || Character.isLowSurrogate(c);
}
static boolean isNotSurrogate(char c) {
    return !isSurrogate(c);
}
...
if (isNotSurrogate(c)) {
    sb.append(c);
}
Java strings are stored as sequences of 16-bit chars, but what they represent is sequences of Unicode characters. In Unicode terminology, they are stored as code units, but model code points. Thus, it's somewhat meaningless to talk about removing surrogates, which don't exist in the character / code point representation (unless you have rogue single surrogates, in which case you have other problems).
Rather, what you want to do is to remove any characters which will require surrogates when encoded. That means any character which lies beyond the basic multilingual plane. You can do that with a simple regular expression:
return query.replaceAll("[^\u0000-\uffff]", "");
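The same idea, dropping every code point beyond the basic multilingual plane, can also be sketched without a regex by streaming code points. This is an illustration rather than a drop-in replacement: the class name BmpOnly is made up, and unlike the char-by-char versions it leaves an unpaired surrogate in place, because codePoints() reports a lone surrogate as its own BMP-range value.

```java
class BmpOnly {
    // Keeps only code points that fit into a single 16-bit char.
    static String removeSupplementary(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(Character::isBmpCodePoint)   // true for U+0000..U+FFFF
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is written as the surrogate pair \uD83D\uDE00
        System.out.println(removeSupplementary("a\uD83D\uDE00b"));
    }
}
```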
Why not simply
for (int i = 0; i < query.length(); i++) {
    char c = query.charAt(i);
    if (!Character.isHighSurrogate(c) && !Character.isLowSurrogate(c))
        sb.append(c);
}
You should probably replace them with "?" instead of outright erasing them.
Just curious. If char is high surrogate is there a need to check the next one? It is supposed to be low surrogate. The modified version would be:
public static String removeSurrogates(String query) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < query.length(); i++) {
        char ch = query.charAt(i);
        if (Character.isHighSurrogate(ch))
            i++; // skip the next char as it's supposed to be a low surrogate
        else
            sb.append(ch);
    }
    return sb.toString();
}
If you want to remove them, all these solutions are useful,
but if you want to replace them, the code below is better:
StringBuffer sb = new StringBuffer();
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (Character.isHighSurrogate(c)) {
        sb.append('*');
    } else if (!Character.isLowSurrogate(c)) {
        sb.append(c);
    }
}
return sb.toString();

Java Stringbuilder.replace

Consider the following inputs:
String[] input = {"a9", "aa9", "a9a9", "99a99a"};
What would be the most efficient way, whilst using a StringBuilder, to replace any letter directly prior to a nine with the next letter after it in the alphabet?
After processing these inputs the output should be:
String[] output = {"b9", "ab9", "b9b9", "99b99a"};
I've been scratching my head for a while and the StringBuilder.setCharAt was the best method I could think of.
Any advice or suggestions would be appreciated.
Since you have to look at every character, you'll never perform better than linear in the size of the buffer. So you can just do something like
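for (int i = 1; i < buffer.length(); ++i) // Note this starts at "1"
    if (buffer.charAt(i) == '9' && Character.isLetter(buffer.charAt(i - 1))) // leave a '9' that precedes another '9' alone
        buffer.setCharAt(i - 1, (char) (buffer.charAt(i - 1) + 1));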
You can use the following code:
String[] input = {"a9", "aa9", "a9a9", "99a99a", "z9", "aZ9"};
String[] output = new String[input.length];
Pattern pt = Pattern.compile("([a-z])(?=9)", Pattern.CASE_INSENSITIVE);
for (int i = 0; i < input.length; i++) {
    Matcher mt = pt.matcher(input[i]);
    StringBuffer sb = new StringBuffer();
    while (mt.find()) {
        char ch = mt.group(1).charAt(0);
        // Wrap around at the end of the alphabet
        if (ch == 'z') ch = 'a';
        else if (ch == 'Z') ch = 'A';
        else ch++;
        mt.appendReplacement(sb, String.valueOf(ch));
    }
    mt.appendTail(sb);
    output[i] = sb.toString();
}
System.out.println(Arrays.toString(output));
OUTPUT:
[b9, ab9, b9b9, 99b99a, a9, aA9]
You want to use a very simple state machine. For each character you're looping through in the input string, keep track of a boolean. If the character is a 9, set the boolean to true. If the character is a letter add one to the letter and set the boolean to false. Then add the character to the output stringbuilder.
For input you use a Reader. For output use a StringBuilder.
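One possible reading of the state machine described above is sketched below. To keep it short it works on a String and walks from right to left, so the boolean records whether the character to the right was a 9; the Reader-based input is left out. The class name, the ASCII-only letter check and the wrap from z to a are my own assumptions.

```java
class LetterBeforeNine {
    // Replaces every ASCII letter that directly precedes a '9' with the next letter.
    static String bump(String input) {
        StringBuilder sb = new StringBuilder(input);
        boolean nineFollows = false; // state: was the character to the right a '9'?
        for (int i = sb.length() - 1; i >= 0; i--) {
            char c = sb.charAt(i);
            if (nineFollows && isAsciiLetter(c)) {
                sb.setCharAt(i, next(c));
            }
            nineFollows = (c == '9');
        }
        return sb.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Assumption: wrap around at the end of the alphabet, as the regex answer does.
    private static char next(char c) {
        if (c == 'z') return 'a';
        if (c == 'Z') return 'A';
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a9", "aa9", "a9a9", "99a99a"}) {
            System.out.println(bump(s));
        }
    }
}
```

Like the other loop-based answers, this is a single pass over the buffer, so it stays linear in the input length.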
Use a one-token lookahead parser technique. Here is some pseudo-ish code:
for (int index = 0; index < buffer.length(); ++index)
{
    if (index < buffer.length() - 1)
    {
        // Look one character ahead; only letters are bumped
        if (buffer.charAt(index + 1) == '9' && Character.isLetter(buffer.charAt(index)))
        {
            char current = (char) (buffer.charAt(index) + 1); // this is probably not the best technique for this.
            buffer.setCharAt(index, current);
        }
    }
}
Another solution is, for example, to use
StringUtils.indexOf(String str, char searchChar, int startPos)
in the way Ernest Friedman-Hill pointed out. Take this as an experimental example, not the most performant one.
