JAVA: Space delimiting all non-numerical characters in a String

I am having some trouble modifying Strings to be space delimited, in the special case of adding spaces around all non-numerical characters.
My code must take a string representing a math equation and split it up into its individual parts. It does so using spaces as delimiters between values. This part works great if the string is already delimited.
The problem is that I do not always get a space delimited input. To deal with this, I want to first insert these spaces so that the array is created properly.
What my code must do is take any character that is NOT a number, and add a space before and after it.
Something like this:
3*24+321 becomes 3 * 24 + 321
or
((3.0)*(2.5)) becomes ( ( 3.0 ) * ( 2.5 ) )
Obviously I need to avoid inserting spaces inside the numbers, or 2.5 becomes 2 . 5 and then gets entered into the array as 3 elements, which it is not.
So far, I have tried using
String InputLineDelmit = InputLine.replaceAll("\\B", " ");
which successfully changes a string of all letters "abcd" to "a b c d"
But it makes mistakes when it runs into numbers. Using this method, I have gotten that:
(((1)*(2))) becomes ( ( (1) * (2) ) ) ---- * The numbers must be separate from parens
12.7+3.1 becomes 1 2.7+3.1 ----- * 12.7 is split
51/3 becomes 5 1/3 ----- * same issue
and 5*4-2 does not change at all.
So, I know that \D can be used as a regular expression for all non-numbers in java. However, my attempts to implement this (by replacing, or combining it with \B above) have led either to compiler errors or it REPLACING the char with a space, not adding one.
EDIT:
==== Answered! ====
It won't let me add my own answer because I'm new, but an edit to neo108's code below (which, itself, does not work) did the job. What I did was change it to check isDigit, not isLetter, and then do nothing in that case (or in the special case of a decimal point, for doubles). Otherwise, the character gets a space added on either side.
public static void main(String[] args){
    String formula = "12+((13.0)*(2.5)-17*2)+(100/3)-7";
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < formula.length(); i++){
        char c = formula.charAt(i);
        char cdot = '.';
        if(Character.isDigit(c) || c == cdot) {
            builder.append(c);
        }
        else {
            builder.append(" "+c+" ");
        }
    }
    System.out.println("OUTPUT:" + builder);
}
OUTPUT: 12 + ( ( 13.0 ) * ( 2.5 ) - 17 * 2 ) + ( 100 / 3 ) - 7
However, any ideas on how to do this more succinctly, and also a decent explanation of StringBuilders, would be appreciated. Namely what is with this limit of 16 chars that I read about on javadocs, as the example above shows that you CAN have more output.
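On the two open points, here is a hedged sketch rather than a definitive answer. A regex replacement can do the same job in one expression, and the 16 mentioned in the Javadoc is only the initial capacity of an empty StringBuilder, not a limit: the internal buffer grows as you append. The class and method names below are made up for illustration.

```java
class SpaceDelimitSketch {
    // Surround every character that is neither a digit nor '.' with spaces,
    // then trim the ends and collapse runs of whitespace into single spaces.
    static String delimit(String formula) {
        return formula.replaceAll("[^0-9.]", " $0 ").trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(delimit("3*24+321"));      // 3 * 24 + 321
        System.out.println(delimit("((3.0)*(2.5))")); // ( ( 3.0 ) * ( 2.5 ) )

        StringBuilder builder = new StringBuilder();
        System.out.println(builder.capacity());       // 16: initial capacity, not a maximum
        builder.append(delimit("12+((13.0)*(2.5)-17*2)+(100/3)-7"));
        // The buffer has grown to hold everything that was appended.
        System.out.println(builder.length() <= builder.capacity()); // true
    }
}
```

The result can then be passed to split("\\s+") to get the array of tokens.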

Something like this should work...
String formula = "Ab((3.0)*(2.5))";
StringBuilder builder = new StringBuilder();
for (int i = 0; i < formula.length(); i++){
    char c = formula.charAt(i);
    if(Character.isLetter(c)) {
        builder.append(" "+c+" ");
    } else {
        builder.append(c);
    }
}

Define the operations in your math equation + - * / () etc
Convert your equation string to char[]
Traverse through the char[] one char at a time and append the read char to a StringBuilder object.
If you encounter any character matching one of the defined operations, then add a space before and after that character and append the result to the StringBuilder object.
Well, this is one of the algorithms you can implement. There might be other ways of doing it as well.
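The steps above can be sketched roughly as follows. The operator set "+-*/()" and the names OperatorSpacer and space are assumptions made for this example, not something given in the question.

```java
class OperatorSpacer {
    // Assumed set of operator characters; extend it as needed (e.g. '^' or '%').
    private static final String OPERATORS = "+-*/()";

    static String space(String equation) {
        StringBuilder builder = new StringBuilder();
        for (char c : equation.toCharArray()) {
            if (OPERATORS.indexOf(c) >= 0) {
                // Operator: pad it with a space on both sides.
                builder.append(' ').append(c).append(' ');
            } else {
                // Anything else (digits, '.', letters) is copied unchanged.
                builder.append(c);
            }
        }
        // Adjacent operators produce double spaces, so collapse them.
        return builder.toString().trim().replaceAll(" +", " ");
    }
}
```

Note that this treats every '-' as an operator, so a leading negative number such as -7 comes out as "- 7"; handling unary minus would need extra logic.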

Related

replacing all cases of ISO Control characters in a string with "CTRL"

static String clean(String identifier) {
    String firstString = "";
    for (int i = 0; i < identifier.length(); i++)
        if (Character.isISOControl(identifier.charAt(i))){
            firstString = identifier.replaceAll(identifier.charAt(i),
                    "CTRL");
        }
    return firstString;
}
The logic behind the code above is to replace all instances of ISO Control characters in the string 'identifier' with "CTRL". I'm however faced with this error: "char cannot be converted to java.lang.String"
Can someone help me to solve and improve my code to produce the right output?

String#replaceAll expects a String as parameter, but it has to be a regular expression. Use String#replace instead.
EDIT: I hadn't seen that you want to replace a character with a string. In that case, you can use this version of String#replace, but you need to convert the character to a String, e.g. by using Character.toString.
Update
Example:
String text = "AB\003DE";
text = text.replace(Character.toString('\003'), "CTRL");
System.out.println(text);
// gives: ABCTRLDE
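To cover every ISO control character in one pass, rather than one known character, a sketch along these lines may help. It builds the result with a StringBuilder instead of calling replace repeatedly; the class name ControlCleaner is just a placeholder.

```java
class ControlCleaner {
    // Returns a copy of the input with each ISO control character replaced by "CTRL".
    static String clean(String identifier) {
        StringBuilder result = new StringBuilder(identifier.length());
        for (int i = 0; i < identifier.length(); i++) {
            char c = identifier.charAt(i);
            if (Character.isISOControl(c)) {
                result.append("CTRL");
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }
}
```

Calling ControlCleaner.clean("AB\003DE") gives ABCTRLDE, the same as the example above, and a string without control characters comes back unchanged.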

Code points, and Control Picture characters
I can add two points:
The char type is essentially broken since Java 2, and legacy since Java 5. Best to use code point integers when working with individual characters.
Unicode defines characters for display as placeholders for control characters. See Control Pictures section of one Wikipedia page, and see another page, Control Pictures.
For example, the NULL character at code point 0 decimal has a matching SYMBOL FOR NULL character at 9,216 decimal: ␀. To see all the Control Picture characters, use this PDF section of the Unicode standard specification.
Get an array of the code point integers representing each of the characters in your string.
int[] codePoints = myString.codePoints().toArray() ;
Loop those code points. Replace those of interest.
Here is some untested code.
int[] replacedCodePoints = new int[ codePoints.length ] ;
int index = 0 ;
for ( int codePoint : codePoints )
{
    if( codePoint >= 0 && codePoint <= 32 ) // 32 is SPACE, so you may want to use 31 depending on your context.
    {
        replacedCodePoints[ index ] = codePoint + 9_216 ; // 9,216 is the offset to the beginning of the Control Picture character range defined in Unicode.
    } else if ( codePoint == 127 ) // DEL character.
    {
        replacedCodePoints[ index ] = 9_249 ;
    } else // Any other character, we keep as-is, no replacement.
    {
        replacedCodePoints[ index ] = codePoint ;
    }
    index ++ ; // Set up the next loop.
}
Convert code points back into text. Use StringBuilder#appendCodePoint to build up the characters of text. You can use the following stream-based code as boilerplate. For explanation, see this Question.
String result =
Arrays
.stream( replacedCodePoints )
.collect( StringBuilder::new , StringBuilder::appendCodePoint , StringBuilder::append )
.toString();
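For what it's worth, the same idea can be written as a single stream pipeline. This is only a sketch under a few assumptions: Java 8 or later, SPACE is left alone (so the cutoff is 31 rather than 32), and the name ControlPictures is invented for the example.

```java
class ControlPictures {
    // Maps C0 control characters (0-31) and DEL (127) to their Unicode Control Picture symbols.
    static String toControlPictures(String input) {
        return input.codePoints()
                .map(cp -> cp <= 0x1F ? cp + 0x2400   // 0x2400 == 9,216, start of the Control Pictures block
                         : cp == 0x7F ? 0x2421        // 0x2421 == 9,249, SYMBOL FOR DELETE
                         : cp)                        // everything else is kept as-is
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }
}
```

With this, a string containing NUL, TAB and DEL comes back with ␀, ␉ and ␡ in their places.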

How to convert next character pair to a hex integer

I have a hex string in which I first have to reverse each pair of characters and then convert that pair to a hex integer. I am trying to convert it as shown below, but it gives an error. Any idea how to do that? Thanks in advance.
Here is the complete string: http://pastebin.com/1cSCyD78
Sample string
String str = "031890";
Error Message :
java.lang.NumberFormatException: Invalid int: "0x30"
Java Code
for ( int start = 0; start < str.length(); start += 2 ) {
    try {
        String thisByte = new StringBuilder(str.substring(start, start+2)).reverse().toString();
        thisByte = "0x" + thisByte;
        int value = Integer.parseInt(thisByte, 16);
        char c = (char) value;
        System.out.println(c);
    } catch(Exception e) {
        Log.e("MainActivity", e.getMessage());
    }
}
Update
StringBuilder output = new StringBuilder();
for ( int start = 0; start < str.length(); start += 2 ) {
    try {
        String thisByte = new StringBuilder(str.substring(start, start+2)).reverse().toString();
        output.append((char)Integer.parseInt(thisByte, 16));
    } catch(Exception e) {
        Log.e("MainActivity", e.getMessage());
    }
}
Yes, I tried without prepending "0x" to the string, and now my output looks weird.

Your string looks like you need 4 digits from your string per character, not 2.
Given that you interpret 2 digits as a character, though, at first glance, the output you showed in the picture does seem to match the string you posted on pastebin. You do get things that look like words in the output, so it's not totally off, and the gaps between the letters come from every second pair of 2 digits being '00'.
Not sure where this string came from, but if it was also generated by converting characters in some String to Bytes, it might make sense that it's 4 digits per character, since, for example, Java's chars are 16 bits (i.e. 2 bytes, i.e. 4 digits in your String) that encode the actual unicode symbol they represent in UTF-16.
If you are working off specs that someone else provided you with, maybe when they said "2 BYTES", they actually meant "two 8-bit numbers", which correspond to 4 digits (four 4-bit nibbles) in your hex string.
But your string looks like it contains binary data as well, not just characters. Do you know what you are actually expecting to see as the answer?
Update (as per comment request):
It's a trivial change to your code, but here it is:
StringBuilder output = new StringBuilder();
for ( int start = 0; start < str.length(); start += 4 ) {
    try {
        String thisByte = new StringBuilder(str.substring(start, start+4)).reverse().toString();
        output.append((char)Integer.parseInt(thisByte, 16));
    } catch(Exception e) {
        Log.e("MainActivity", e.getMessage());
    }
}
All I did was replace "2" with "4". :)
Update (as per chat):
The code posted here to convert the hex-string into characters (using 4 digits per character) seems to work fine, but the hex-string does not seem to follow the convention the OP is expecting based on the specifications of the data, which caused part of the confusion.
A side-note:
If this is a public application, it is highly risky to include unencrypted SQL statements in the network traffic. If these statements are part of a request and get executed on the server, a hacker can use this to perform unwanted operations on the underlying data (e.g. stealing all phone numbers in the database). If it is merely some debug-/log-information sent to the client, it's still not a good idea as it may give hints to a hacker about the structure of your database and the way you access it, significantly simplifying a potential SQL injection attack.

Try without prepending a "0x" to the string. This prefix is only for the compiler. It's actually a shortcut for saying to use 16 as the radix.
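A small sketch of the difference, assuming plain Java with no Android logging. Integer.parseInt with a radix expects bare digits, while Integer.decode is the standard method that understands the "0x" prefix. The helper decodeSwappedPairs is a hypothetical name for the two-characters-per-pair loop from the question.

```java
class HexPairs {
    // Reverses each two-character pair and interprets it as one hex value.
    static String decodeSwappedPairs(String hex) {
        StringBuilder out = new StringBuilder();
        for (int start = 0; start + 2 <= hex.length(); start += 2) {
            String swapped = new StringBuilder(hex.substring(start, start + 2)).reverse().toString();
            out.append((char) Integer.parseInt(swapped, 16)); // no "0x" prefix here
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(Integer.parseInt("30", 16)); // 48
        System.out.println(Integer.decode("0x30"));     // 48
        System.out.println(decodeSwappedPairs("0313")); // 01
    }
}
```

The loop condition start + 2 <= hex.length() also avoids a StringIndexOutOfBoundsException when the input has an odd number of characters.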

Counting the occurrences of string in Java using string.split()

I'm new to Java. I thought I would write a program to count the occurrences of a character or a sequence of characters in a sentence. I wrote the following code. But I then saw there are some ready-made options available in Apache Commons.
Anyway, can you look at my code and say if there is any rookie mistake? I tested it for a couple of cases and it worked fine. I can think of one case where if the input is a big text file instead of a small sentence/paragraph, the split() function may end up being problematic since it has to handle a large variable. However this is my guess and would love to have your opinions.
private static void countCharInString() {
    //Get the sentence and the search keyword
    System.out.println("Enter a sentence\n");
    Scanner in = new Scanner(System.in);
    String inputSentence = in.nextLine();
    System.out.println("\nEnter the character to search for\n");
    String checkChar = in.nextLine();
    in.close();
    //Count the number of occurrences
    String[] splitSentence = inputSentence.split(checkChar);
    int countChar = splitSentence.length - 1;
    System.out.println("\nThe character/sequence of characters '" + checkChar + "' appear(s) '" + countChar + "' time(s).");
}
Thank you :)

Because of edge cases, split() is the wrong approach.
Instead, use replaceAll() to remove all other characters then use the length() of what's left to calculate the count:
int count = input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
FYI, the regex created (for example when check = 'xyz') looks like ".*?(xyz|$)", which means "everything up to and including 'xyz' or end of input", and each match is replaced by the captured text (either 'xyz', or nothing if it's end of input). This leaves just a string of 0-n copies of the check string. Then dividing by the length of check gives you the total.
To protect against the check being null or zero-length (causing a divide-by-zero error), code defensively like this:
int count = check == null || check.isEmpty() ? 0 : input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
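One caveat with concatenating check into the pattern is that regex metacharacters such as "." or "$" in the search text change its meaning. A hedged variant, assuming that literal matching is what is wanted, quotes the text with Pattern.quote and uses the (?s) flag so that "." also spans line breaks. The class name RegexCount is made up here.

```java
import java.util.regex.Pattern;

class RegexCount {
    // Counts non-overlapping literal occurrences of check in input.
    static int count(String input, String check) {
        if (input == null || check == null || check.isEmpty()) {
            return 0; // also avoids dividing by zero
        }
        String literal = Pattern.quote(check); // treat check as plain text, not as a regex
        String onlyMatches = input.replaceAll("(?s).*?(" + literal + "|$)", "$1");
        return onlyMatches.length() / check.length();
    }
}
```

For example, count("a.b.c", ".") yields 2, whereas the unquoted pattern would treat "." as "any character".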

A flaw that I can immediately think of is the case where your inputSentence consists only of a single occurrence of checkChar. In this case split() will return an empty array and your count will be -1 instead of 1.
An example interaction:
Enter a sentence
onlyme
Enter the character to search for
onlyme
The character/sequence of characters 'onlyme' appear(s) '-1' time(s).
A better way would be to use the .indexOf() method of String to count the occurrences like this:
int count = 0;
int i = 0;
while ((i = inputSentence.indexOf(checkChar, i)) != -1) {
    count++;
    i = i + checkChar.length();
}

split is the wrong approach for a number of reasons:
String.split takes a regular expression
Regular expressions have characters with special meanings, so you cannot use it for all characters (without escaping them). This requires an escaping function.
Performance: String.split is optimized for single characters. If this were not the case, you would be creating and compiling a regular expression every time. Still, String.split creates one object for the String[] and one object for each String in it, every time that you call it. And you have no use for these objects; all you want to know is the count. Although a future all-knowing HotSpot compiler might be able to optimize that away, the current one does not - it is roughly 10 times as slow as simply counting characters as below.
It will not count correctly if you have repeating instances of your checkChar
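To make these points concrete, here is a tiny sketch of the split-based count from the question (wrapped in a hypothetical countViaSplit method) with a few inputs where it goes wrong. The behaviour follows from String.split treating its argument as a regex and dropping trailing empty strings.

```java
class SplitPitfalls {
    // The counting approach from the question: number of pieces minus one.
    static int countViaSplit(String sentence, String check) {
        return sentence.split(check).length - 1;
    }

    public static void main(String[] args) {
        System.out.println(countViaSplit("abcc", "c"));        // 0, although "c" occurs twice
        System.out.println(countViaSplit("a.b.c", "."));       // -1, because "." is a regex wildcard
        System.out.println(countViaSplit("onlyme", "onlyme")); // -1, as described above
    }
}
```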
A better approach is much simpler: just go and count the characters in the string that match your checkChar. If you think about the steps you need to take to count characters, that's what you'd end up with by yourself:
public static int occurrences(String str, char checkChar) {
    int count = 0;
    for (int i = 0, l = str.length(); i < l; i++) {
        if (str.charAt(i) == checkChar)
            count++;
    }
    return count;
}
If you want to count the occurrences of a sequence of characters, it becomes slightly trickier to write with some efficiency because you don't want to create a new substring every time.
public static int occurrences(String str, String checkChars) {
    int count = 0;
    int offset = 0;
    while ((offset = str.indexOf(checkChars, offset)) != -1) {
        offset += checkChars.length();
        count++;
    }
    return count;
}
That's still 10-12 times as fast as String.split() when matching a two-character string.
Warning: Performance timings are ballpark figures that depend on many circumstances. Since the difference is an order of magnitude, it's safe to say that String.split is slower in general. (Tests performed on jdk 1.8.0-b28 64-bit, using 10 million iterations, verified that results were stable and the same with and without -Xcomp, after performing tests 10 times in the same JVM instance.)

Checking if a character is an integer or letter

I am modifying a file using Java. Here's what I want to accomplish:
if an & symbol, along with an integer, is detected while being read, I want to drop the & symbol and translate the integer to binary.
if an & symbol, along with a (random) word, is detected while being read, I want to drop the & symbol and replace the word with the integer 16, and if a different string of characters is being used along with the & symbol, I want to set the number 1 higher than integer 16.
Here's an example of what I mean. If a file is inputted containing these strings:
&myword
&4
&anotherword
&9
&yetanotherword
&10
&myword
The output should be:
&0000000000010000 (which is 16 in decimal)
&0000000000000100 (or the number '4' in decimal)
&0000000000010001 (which is 17 in decimal, since 16 is already used, so 16+1=17)
&0000000000001001 (or the number '9' in decimal)
&0000000000010010 (which is 18 in decimal, or 17+1=18)
&0000000000000110 (or the number '10' in decimal)
&0000000000010000 (which is 16 because value of myword = 16)
Here's what I tried so far, but haven't succeeded yet:
for (i=0; i<anyLines.length; i++) {
char[] charray = anyLines[i].toCharArray();
for (int j=0; j<charray.length; j++)
if (Character.isDigit(charray[j])) {
anyLines[i] = anyLines[i].replace("&","");
anyLines[i] = Integer.toBinaryString(Integer.parseInt(anyLines[i]);
}
else {
continue;
}
if (Character.isLetter(charray[j])) {
anyLines[i] = anyLines[i].replace("&","");
for (int k=16; j<charray.length; k++) {
anyLines[i] = Integer.toBinaryString(Integer.parseInt(k);
}
}
}
}
I hope that I am articulate enough. Any suggestions on how to accomplish this task?

Character.isLetter() //tests to see if it is a letter
Character.isDigit() //tests to see if it is a digit

It looks like something you could match against a regex. I don't know Java but you should have at least one regex engine at your disposal. Then the regex would be:
regex1: &(\d+)
and
regex2: &(\w+)
or
regex3: &(\d+|\w+)
In the first case, if regex1 matches, you know you ran into a number, and that number is in the first capturing group (e.g. match.group(1)). If regex2 matches, you know you have a word. You can then look up that word in a dictionary and see what its associated number is or, if it is not present, add it to the dictionary and associate it with the next free number (16 + the dictionary size before the word is added).
regex3 on the other hand will match both numbers and words, so it's up to you to see what's in the capturing group (it's just a different approach).
If neither of the regexes matches, then you have an invalid sequence, or you need some other action. Note that \w in a regex only matches word characters (i.e. letters, digits, _ and, depending on the engine, possibly a few other characters), so &çSomeWord or &*SomeWord won't match at all, while the captured group in &Hello.World would be just "Hello".
Regex libs usually provide a length for the matched text, so you can move i forward by that much in order to skip already matched text.
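In Java, the regex3 variant could look roughly like the sketch below, using java.util.regex and a Map as the dictionary. It only resolves each &-token to its number; lines that do not match are skipped, which is an assumption, since the question does not say what should happen to them. AmpersandResolver and resolve are invented names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandResolver {
    private static final Pattern TOKEN = Pattern.compile("&(\\d+|\\w+)");

    static List<Integer> resolve(List<String> lines) {
        Map<String, Integer> wordValues = new HashMap<>();
        List<Integer> values = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TOKEN.matcher(line.trim());
            if (!m.matches()) {
                continue; // not an &-token; skipped in this sketch
            }
            String token = m.group(1);
            if (token.matches("\\d+")) {
                values.add(Integer.parseInt(token)); // a plain number
            } else {
                Integer value = wordValues.get(token);
                if (value == null) {
                    value = 16 + wordValues.size(); // next free number, starting at 16
                    wordValues.put(token, value);
                }
                values.add(value);
            }
        }
        return values;
    }
}
```

For the sample input from the question this yields 16, 4, 17, 9, 18, 10, 16; converting each value to a binary string is a separate step.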

You have to somehow tokenize your input. It seems you are splitting it into lines and then analyzing each line individually. If this is what you want, okay. If not, you could simply search for & (indexOf('&')) and then somehow determine what the next token is (either a number or a "word", however you want to define word).
What do you want to do with input which does not match your pattern? Neither the description of the task nor the example really covers this.
You need to have a dictionary of already read strings. Use a Map<String, Integer>.

I would post this as a comment, but don't have the ability yet. What is the issue you are running into? Error? Incorrect Results? 16's not being correctly incremented? Also, the examples use a '%' but in your description you say it should start with a '&'.
Edit2: Was thinking it was line by line, but re-reading indicates you could be trying to find say "I went to the &store" and want it to say "I went to the &000010000". So you would want to split by whitespace and then iterate through and pass the strings into your 'replace' method, which is similar to below.
Edit1: If I understand what you are trying to do, code like this should work.
Map<String, Integer> usedWords = new HashMap<String, Integer>();
List<String> output = new ArrayList<String>();
int wordIncrementer = 16;
String[] arr = test.split("\n");
for(String s : arr)
{
    if(s.startsWith("&"))
    {
        String line = s.substring(1).trim(); //Removes &
        try
        {
            Integer lineInt = Integer.parseInt(line);
            output.add("&" + Integer.toBinaryString(lineInt));
        }
        catch(Exception e)
        {
            System.out.println("Line was not an integer. Parsing as a String.");
            String outputString = "&";
            if(usedWords.containsKey(line))
            {
                outputString += Integer.toBinaryString(usedWords.get(line));
            }
            else
            {
                outputString += Integer.toBinaryString(wordIncrementer);
                usedWords.put(line, wordIncrementer++);
            }
            output.add(outputString);
        }
    }
    else
    {
        continue; //Nothing indicating that we should parse the line.
    }
}

How about this?
String input = "&myword\n&4\n&anotherword\n&9\n&yetanotherword\n&10\n&myword";
String[] lines = input.split("\n");
int wordValue = 16;
// to keep track of words that are already used
Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
for (String line : lines) {
    // if line doesn't begin with &, then ignore it
    if (!line.startsWith("&")) {
        continue;
    }
    // remove &
    line = line.substring(1);
    Integer binaryValue = null;
    if (line.matches("\\d+")) {
        binaryValue = Integer.parseInt(line);
    }
    else if (line.matches("\\w+")) {
        binaryValue = wordValueMap.get(line);
        // if the map doesn't contain the word value, then assign and store it
        if (binaryValue == null) {
            binaryValue = wordValue;
            wordValueMap.put(line, binaryValue);
            wordValue++;
        }
    }
    // I'm using Commons Lang's StringUtils.leftPad(..) to create the zero padded string
    String out = "&" + StringUtils.leftPad(Integer.toBinaryString(binaryValue), 16, "0");
    System.out.println(out);
}
Here's the printout:-
&0000000000010000
&0000000000000100
&0000000000010001
&0000000000001001
&0000000000010010
&0000000000001010
&0000000000010000
Just FYI, the binary value for 10 is "1010", not "110" as stated in your original post.
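If pulling in Commons Lang is not an option, the zero padding can also be done with the JDK alone. This is just one possible sketch: it pads with spaces via String.format and then swaps them for zeros. The name toBinary16 is hypothetical.

```java
class BinaryPad {
    // Left-pads the binary representation of value with zeros to at least 16 digits.
    static String toBinary16(int value) {
        String bits = Integer.toBinaryString(value);
        // %16s right-aligns the text in a field of width 16, padding with spaces.
        return String.format("%16s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println("&" + toBinary16(16)); // &0000000000010000
        System.out.println("&" + toBinary16(10)); // &0000000000001010
    }
}
```

Values that need more than 16 bits, including negative numbers, are not truncated; they simply come out longer.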

Wind blowing on String

I have a basic idea of how to do this task, but I'm not sure if I'm doing it right. We have a class WindyString with a method blow. After calling it:
System.out.println(WindyString.blow(
"Abrakadabra! The second chance to pass has already BEGUN! "));
we should obtain something like this :
               e          a  e     a       a  ea y
 br k d br ! Th  s c nd ch nc  t  p ss h s  lr  d  B G N!
A  a a a  a       e o           o       a           E U
So, in a nutshell: in every second word we pick out all the vowels and move them one line up, and in the remaining words we move the vowels one line down.
I know I should split the string into tokens with a tokenizer or the split method, but what next? Create 3 arrays, one representing each row?

Yes, that's probably an easy (not very performant) way to solve the problem.
Create 3 arrays; one is filled with the actual data and 2 arrays are filled (Arrays.fill) with ' '.
Then iterate over the array containing the actual data, and keep an integer of which word you're currently at and a boolean if you already matched whitespace.
While iterating, you check if the character is a vowel or not. If it's a vowel, check the word-count (oddness/evenness) and place it in the first or third array.
When you reach a whitespace, set the boolean and increase the word count. If you reach another whitespace, check whether the whitespace is already set: if so, continue. If you match a non-whitespace, reset the whitespace boolean.
Then join all arrays together and append a new-line character between each joined array and return the string.
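A minimal sketch of this three-array approach could look like the following. It assumes that 'y' counts as a vowel (the expected output in the question moves the y of "already" up) and that the first word's vowels go down; the class name WindyStringSketch is made up.

```java
import java.util.Arrays;

class WindyStringSketch {
    private static final String VOWELS = "aeiouyAEIOUY"; // 'y' included by assumption

    static String blow(String s) {
        char[] top = new char[s.length()];
        char[] middle = s.toCharArray();
        char[] bottom = new char[s.length()];
        Arrays.fill(top, ' ');
        Arrays.fill(bottom, ' ');

        int wordIndex = -1;     // index of the word we are currently in
        boolean inWord = false; // false while we are on whitespace
        for (int i = 0; i < middle.length; i++) {
            char c = middle[i];
            if (Character.isWhitespace(c)) {
                inWord = false;
                continue;
            }
            if (!inWord) { // first character of a new word
                inWord = true;
                wordIndex++;
            }
            if (VOWELS.indexOf(c) >= 0) {
                // Words 1, 3, 5, ... drop their vowels; words 2, 4, 6, ... lift them.
                if (wordIndex % 2 == 0) {
                    bottom[i] = c;
                } else {
                    top[i] = c;
                }
                middle[i] = ' ';
            }
        }
        return new String(top) + "\n" + new String(middle) + "\n" + new String(bottom);
    }
}
```

Because each vowel is written at the same index it had in the input, the three rows stay aligned and have equal length, even with repeated whitespace.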

The simplest way is to use regex. This should be instructive:
static String blow(String s) {
    String vowels = "aeiouAEIOU";
    String middle = s.replaceAll("[" + vowels + "]", " ");
    int flip = 0;
    String[] side = { "", "" };
    Scanner sc = new Scanner(s);
    for (String word; (word = sc.findInLine("\\s*\\S*")) != null; ) {
        side[flip] += word.replaceAll(".", " ");
        side[1-flip] += word.replaceAll("[^" + vowels + "]", " ");
        flip = 1-flip;
    }
    return String.format("|%s|%n|%s|%n|%s|", side[0], middle, side[1]);
}
I added the | characters in the output to show that this processes excess whitespaces correctly -- all three lines are guaranteed the same length, taking care of leading blanks, trailing blanks, or even ALL blanks input.
If you're not familiar with regular expressions, this is definitely a good one to start learning with.
The middle is simply the original string with all vowels replaced with spaces.
Then, side[0] and side[1] are the top and bottom lines respectively. We use the Scanner to extract every word (preserving leading and trailing spaces). The way we process each word is that in one side, everything is replaced by blanks; in the other, only non-vowels are replaced by blanks. We flip sides with every word we process.
