Splitting a string in the middle of a words issue

I have automated some flow of filling in a form from a website by taking the fields data from a csv.
Now, for the address there are 3 fields in the form:
Address 1 ____________
Address 2 ____________
Address 3 ____________
Each field has a limit of 35 characters, so whenever I reach 35 characters I continue the address string in the next address field.
The issue is that my current solution cuts a word in two when it straddles the 35-character limit. For instance, if the word 'barcelona' is in the string and 'o' is the 35th character, Address 2 will start with 'na'.
In that case I want to detect that the 35th character falls in the middle of a word and move the whole word to the next field.
This is my current solution:
private def enterAddress(purchaseInfo: PurchaseInfo) = {
  val webElements = driver.findElements(By.className("address")).toList
  val strings = purchaseInfo.supplierAddress.grouped(35).toList
  strings.zip(webElements).foreach {
    case (text, webElement) => webElement.sendKeys(text)
  }
}
I would appreciate some help here, preferably in Scala, but Java will be fine as well :)
Thanks a lot!

Since you said you'd accept Java code as well... the following code will wrap a given input string to several lines of a given maximum length:
import java.util.ArrayList;
import java.util.List;

public class WordWrap {
    public static void main(String[] args) {
        String input = "This is a rather long address, somewhere in a small street in Barcelona";
        List<String> wrappedLines = wrap(input, 35);
        for (String line : wrappedLines) {
            System.out.println(line);
        }
    }

    private static List<String> wrap(String input, int maxLength) {
        String[] words = input.split(" ");
        List<String> lines = new ArrayList<String>();
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (sb.length() == 0) {
                // Note: Will not work if a *single* word already exceeds maxLength
                sb.append(word);
            } else if (sb.length() + word.length() < maxLength) {
                // Use < maxLength as we add +1 space.
                sb.append(" " + word);
            } else {
                // Line is full
                lines.add(sb.toString());
                // Restart
                sb = new StringBuilder(word);
            }
        }
        // Add the last line
        if (sb.length() > 0) {
            lines.add(sb.toString());
        }
        return lines;
    }
}
Output:
This is a rather long address,
somewhere in a small street in
Barcelona
This is not necessarily the best approach, but I guess you'll have to adapt it to Scala anyway.
If you prefer a library solution (because... why re-invent the wheel?) you can also have a look at WordUtils.wrap() from Apache Commons.

Words in the English language are delimited by space (or other punctuation, but that is irrelevant in this case unless you actually want to wrap lines based on that), and there are a couple of options for using this to your advantage:
One thing you could potentially do is take a substring of 35 characters from your string, use String.lastIndexOf to figure out where the space is, and add only up to that space to your address line, then repeating the process starting from that space character until you have entered the string.
Another method (showcased in Marvin's answer) is to just use String.split on spaces and concatenate them back together until the next word would cause the string to exceed 35 characters.
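The lastIndexOf approach described above could be sketched roughly as follows. This is only an illustration: the class name AddressSplitter and the method split are made up for this example, and cutting an overlong word hard at the limit is an assumption about what you want in that case.

```java
import java.util.ArrayList;
import java.util.List;

class AddressSplitter {

    // Splits text into lines of at most maxLength characters, breaking at the
    // last space that fits. A single word longer than maxLength is cut hard
    // (an assumption made for this sketch).
    static List<String> split(String text, int maxLength) {
        List<String> lines = new ArrayList<>();
        String rest = text.trim();
        while (rest.length() > maxLength) {
            // Look backwards from index maxLength: a space exactly there means
            // the first maxLength characters end on a word boundary.
            int cut = rest.lastIndexOf(' ', maxLength);
            if (cut <= 0) {
                // No space in range: hard-cut the overlong word.
                lines.add(rest.substring(0, maxLength));
                rest = rest.substring(maxLength).trim();
            } else {
                lines.add(rest.substring(0, cut).trim());
                rest = rest.substring(cut + 1).trim();
            }
        }
        if (!rest.isEmpty()) {
            lines.add(rest);
        }
        return lines;
    }

    public static void main(String[] args) {
        String address = "This is a rather long address, somewhere in a small street in Barcelona";
        for (String line : split(address, 35)) {
            System.out.println(line);
        }
    }
}
```

For the sample address this yields the same three lines as the word-by-word answer. If the form only has three fields, you would still need to decide what to do when more than three lines come back.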

Related

Java change full name to initial. last name

I have a database of player names that I need to convert before I can work with them further (for example, I need Antonio Brown converted to A. Brown). My problem is that some names consist of only a first name (for example Antonio), so I get an ArrayIndexOutOfBoundsException: 1. Is there another way to get what I want, and why does it still split even with the if condition?
if (spalte[1].contains(" ")) {
    String[] me = spalte[0].split(" ", 2);
    String na = me[0].substring(0);
    name = na + ". " + me[1];
} else {
    name = spalte[1];
}

Firstly, I highly recommend you to keep your code formatted and variables named properly. It helps not only others to understand a snippet better but also makes debugging a bit easier.
While working with arrays and String::split, you have to be careful with indices because it is easy to run past the end of the array.
Do you need to make the code handle multiple spaces: Antonio Light Brown -> A. L. Brown? The steps are simple and practically the same for any number of names:
Split by a space delimiter
Shorten the n-1 first partitions
Concatenate the String back
Here is the code:
String[] split = name.trim().split("\\s+");            // Trim the ends and split on runs of whitespace to avoid empty parts
StringBuilder sb = new StringBuilder();                // StringBuilder builds the String
for (int i = 0; i < split.length; i++) {               // Iterate the parts
    if (i < split.length - 1) {                        // If not the last part
        sb.append(split[i].charAt(0)).append(". ");    // Append the first letter and a dot
    } else {
        sb.append(split[i]);                           // Or else keep the entire word
    }
}
System.out.println(sb.toString());                     // StringBuilder::toString returns a composed String
Hypothetically: How would you handle names such as O'Neil or de Anthony? You can include the conditional concatenation in the for-loop.

Java splitting string at index without cutting the word [duplicate]

This question already has answers here:
Large string split into lines with maximum length in java
(8 answers)
Closed 4 years ago.
I was just wondering if there is an API or some quick and easy way to split a String at a given index into a String[] array, but if there is a word at that index, move it to the next String.
So let's say I have a string: "I often used to look out of the window, but I rarely do that anymore"
The length of that string is 68 and I have to cut it at 36, which in this sentence is the letter n, but it should not split the word there, so the array would be ["I often used to look out of the", "window, but I rarely do that anymore"].
And if the new sentence is longer than 36, it should be split as well, so if I had a slightly longer sentence: "I often used to look out of the window, but I rarely do that anymore, even though I liked it"
the result would be ["I often used to look out of the", "window, but I rarely do that anymore", ",even though I liked it"]

Here's an old-fashioned, non-stream, non-regex solution:
public static List<String> chunk(String s, int limit)
{
    List<String> parts = new ArrayList<String>();
    while(s.length() > limit)
    {
        int splitAt = limit-1;
        for(;splitAt>0 && !Character.isWhitespace(s.charAt(splitAt)); splitAt--);
        if(splitAt == 0)
            return parts; // can't be split
        parts.add(s.substring(0, splitAt));
        s = s.substring(splitAt+1);
    }
    parts.add(s);
    return parts;
}
This doesn't trim additional spaces either side of the split point. Also, if a string cannot be split, because it doesn't contain any whitespace in the first limit characters, then it gives up and returns the partial result.
Test:
public static void main(String[] args)
{
    String[] tests = {
        "This is a short string",
        "This sentence has a space at chr 36 so is a good test",
        "I often used to look out of the window, but I rarely do that anymore, even though I liked it",
        "I live in Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch",
    };
    int limit = 36;
    for(String s : tests)
    {
        List<String> chunks = chunk(s, limit);
        for(String st : chunks)
            System.out.println("|" + st + "|");
        System.out.println();
    }
}
Output:
|This is a short string|
|This sentence has a space at chr 36|
|so is a good test|
|I often used to look out of the|
|window, but I rarely do that|
|anymore, even though I liked it|
|I live in|

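This repeatedly (greedily) matches between 1 and size characters and requires whitespace or the end of the string after each match.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static List<String> chunk(String s, int size) {
    List<String> chunks = new ArrayList<>(s.length() / size + 1);
    // (?:\\s|$) consumes the whitespace after each chunk, or matches the end of the string
    Pattern pattern = Pattern.compile(".{1," + size + "}(?:\\s|$)");
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        chunks.add(matcher.group().trim()); // drop the consumed whitespace
    }
    return chunks;
}
Note that it doesn't work if there's a long run of characters (> size) without whitespace: the start of that run is silently skipped.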

Java extracting substring from sentences

There are combinations of words such as is, is not, and does not contain. We have to match these words in a sentence and split the sentence at them.
Input: if name is tom and age is not 45 or name does not contain tom then let me know.
Expected output:
If name is
tom and age is not
45 or name does not contain
tom then let me know
I tried the code below to split and extract, but "is" also occurs inside "is not", which my code is not able to handle:
public static void loadOperators(){
operators.add("is");
operators.add("is not");
operators.add("does not contain");
}
public static void main(String[] args) {
loadOperators();
for(String s : operators){
System.out.println(str.split(s).length - 1);
}
}

Since a word can occur multiple times, and since is and is not are different operators for you, split won't solve your use case. Ideally you would iterate:
1. Find the index of the 'operator'.
2. Search for the next space or word.
3. Then update your string to the substring from that index to length-1.

I am not entirely sure about what you try to achieve, but let's give it a shot.
For your case, a simple "workaround" might work just fine:
Sort the operators by their length, descending. This way the "largest match" will be found first. You can define "largest" as either literally the longest string or, preferably, the number of words (number of spaces contained), so that "is a" has precedence over "contains".
You'll need to make sure that no matches overlap though, which can be done by comparing all matches' start and end indices and discarding overlaps by some criterion, like first match wins.
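A rough sketch of the longest-first idea above, using a single regular expression. The names OperatorSplitter and splitAfterOperators are invented for this example, and it sorts by plain string length rather than by word count, which is enough for the operators in the question. Matcher.find() scans left to right and never returns overlapping matches, so "first match wins" comes for free.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperatorSplitter {

    // Cuts the input after every operator, trying longer operators first so
    // that "is not" is never mistaken for "is".
    static List<String> splitAfterOperators(String input, List<String> operators) {
        List<String> sorted = new ArrayList<>(operators);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Build \b(?:does not contain|is not|is)\b from the quoted operators.
        StringBuilder alternation = new StringBuilder();
        for (String operator : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(operator));
        }
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");

        List<String> parts = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        int start = 0;
        while (matcher.find()) {
            // Each part runs from the end of the previous operator to the end of this one.
            parts.add(input.substring(start, matcher.end()).trim());
            start = matcher.end();
        }
        String tail = input.substring(start).trim();
        if (!tail.isEmpty()) {
            parts.add(tail);
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        List<String> operators = Arrays.asList("is", "is not", "does not contain");
        for (String part : splitAfterOperators(input, operators)) {
            System.out.println(part);
        }
    }
}
```

The \b word boundaries keep "is" from matching inside words such as "this"; whether that is the right rule for free-form user input is a separate question.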

This code does what you seem to be wanting to do (or what I guessed you are wanting to do):
public static void main(String[] args) {
List<String> operators = new ArrayList<>();
operators.add("is");
operators.add("is not");
operators.add("does not contain");
String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
List<String> output = new ArrayList<>();
int lastFoundOperatorsEndIndex = 0; // First start at the beginning of input
for (String operator : operators){
int indexOfOperator = input.indexOf(operator); // Find current operator's position
if (indexOfOperator > -1) { // If operator was found
int thisOperatorsEndIndex = indexOfOperator + operator.length(); // Get length of operator and add it to the index to include operator
output.add(input.substring(lastFoundOperatorsEndIndex, thisOperatorsEndIndex).trim()); // Add operator to output (and remove trailing space)
lastFoundOperatorsEndIndex = thisOperatorsEndIndex; // Update startindex for next operator
}
}
output.add(input.substring(lastFoundOperatorsEndIndex, input.length()).trim()); // Add rest of input as last entry to output
for (String part : output) { // Output to console
System.out.println(part);
}
}
But it is highly dependent on the order of the sentence and the operators. If we're talking about user input, the task will be much more complicated.
A better method using regular expressions (regExp) would be:
public static void main(String... args) {
// Define inputs
String input1 = "if name is tom and age is not 45 or name does not contain tom then let me know.";
String input2 = "the name is tom and he is 22 years old but the name does not contain jack, but merry is 24 year old.";
// Output split strings
for (String part : split(input1)) {
System.out.println(part.trim());
}
System.out.println();
for (String part : split(input2)) {
System.out.println(part.trim());
}
}
private static String[] split(String input) {
// Define list of operators - 'is not' has to precede 'is'!!
String[] operators = { "\\sis not\\s", "\\sis\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
// Concatenate operators to regExp-String for search
StringBuilder searchString = new StringBuilder();
for (String operator : operators) {
if (searchString.length() > 0) {
searchString.append("|");
}
searchString.append(operator);
}
// Replace all operators by operator+\n and split resulting string at \n-character
return input.replaceAll("(" + searchString.toString() + ")", "$1\n").split("\n");
}
Notice the order of the operators! 'is' has to come after 'is not' or 'is not' will always be split.
You can prevent this by using a negative lookahead for the operator 'is'.
So "\\sis\\s" would become "\\sis(?! not)\\s" (reading like: "is", not followed by a " not").
A minimalist version (String.join requires JDK 1.8+) could look like this:
private static String[] split(String input) {
String[] operators = { "\\sis(?! not)\\s", "\\sis not\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
return input.replaceAll("(" + String.join("|", operators) + ")", "$1\n").split("\n");
}

Word Count no duplicates

Here is my word count program in Java. I need to rework it so that something, something; something? something! and something count as one word. That means it should not count the same word twice, regardless of case and punctuation.
import java.util.Scanner;
public class WordCount1
{
public static void main(String[]args)
{
final int Lines=6;
Scanner in=new Scanner (System.in);
String paragraph = "";
System.out.println( "Please input "+ Lines + " lines of text.");
for (int i=0; i < Lines; i+=1)
{
paragraph=paragraph+" "+in.nextLine();
}
System.out.println(paragraph);
String word="";
int WordCount=0;
for (int i=0; i<paragraph.length()-1; i+=1)
{
if (paragraph.charAt(i) != ' ' || paragraph.charAt(i) !=',' || paragraph.charAt(i) !=';' || paragraph.charAt(i) !=':' )
{
word= word + paragraph.charAt(i);
if(paragraph.charAt(i+1)==' ' || paragraph.charAt(i) ==','|| paragraph.charAt(i) ==';' || paragraph.charAt(i) ==':')
{
WordCount +=1;
word="";
}
}
}
System.out.println("There are "+WordCount +" words ");
}
}

Since this is homework, here are some hints and advice.
There is a clever little method called String.split that splits a string into parts, using a separator specified as a regular expression. If you use it the right way, this will give you a one line solution to the "word count" problem. (If you've been told not to use split, you can ignore that ... though it is the simple solution that a seasoned Java developer would consider first.)
Format / indent your code properly ... before you show it to other people. If your instructor doesn't deduct marks for this, he / she isn't doing his / her job properly.
Use standard Java naming conventions. The capitalization of Lines is incorrect. It could be LINES for a manifest constant or lines for variable, but a mixed case name starting with a capital letter should always be a class name.
Be consistent in your use of white space characters around operators (including the assignment operator).
It is a bad idea (and completely unnecessary) to hard-wire the number of lines of input that the user must supply. And you are not dealing with the case where he / she supplies fewer than 6 lines.

You should just remove punctuation and change to a single case before doing further processing. (Be careful with locales and unicode)
Once you have broken the input into words, you can count the number of unique words by passing them into a Set and checking the size of the set.
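As a hedged illustration of the two hints above (normalize, then collect into a Set), here is a minimal sketch. The class and method names are made up, and treating every non-letter character as a separator is an assumption: it also splits "don't" into "don" and "t" and ignores digits.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class DistinctWordCount {

    // Counts distinct words, ignoring case and punctuation.
    static int countDistinct(String text) {
        Set<String> words = new HashSet<>();
        // Lower-case with a fixed locale, then split on every run of non-letters.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!word.isEmpty()) { // split yields "" when the text starts with a separator
                words.add(word);
            }
        }
        return words.size();
    }

    public static void main(String[] args) {
        String text = "something, something; Something? SOMETHING! and something";
        System.out.println("Distinct words: " + countDistinct(text));
    }
}
```

Locale.ROOT is used so that lower-casing does not depend on the machine's default locale.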

Here You Go. This Works. Just Read The Comments And You Should Be Able To Follow.
import java.util.Arrays;
import java.util.HashSet;
import javax.swing.JOptionPane;
// Program Counts Words In A Sentence. Duplicates Are Not Counted.
public class WordCount
{
public static void main(String[]args)
{
// Initialize Variables
String sentence = "";
int wordCount = 1, startingPoint = 0;
// Prompt User For Sentence
sentence = JOptionPane.showInputDialog(null, "Please input a sentence.", "Input Information Below", 2);
// Remove All Punctuations. To Check For More Punctuations Just Add Another Replace Statement.
sentence = sentence.replace(",", "").replace(".", "").replace("?", "");
// Convert All Characters To Lowercase - Must Be Done To Compare Upper And Lower Case Words.
sentence = sentence.toLowerCase();
// Count The Number Of Words
for (int i = 0; i < sentence.length(); i++)
if (sentence.charAt(i) == ' ')
wordCount++;
// Initialize Array And A Count That Will Be Used As An Index
String[] words = new String[wordCount];
int count = 0;
// Put Each Word In An Array
for (int i = 0; i < sentence.length(); i++)
{
if (sentence.charAt(i) == ' ')
{
words[count] = sentence.substring(startingPoint,i);
startingPoint = i + 1;
count++;
}
}
// Put Last Word In Sentence In Array
words[wordCount - 1] = sentence.substring(startingPoint, sentence.length());
// Put Array Elements Into A Set. This Will Remove Duplicates
HashSet<String> wordsInSet = new HashSet<String>(Arrays.asList(words));
// Format Words In Hash Set To Remove Brackets, And Commas, And Convert To String
String wordsString = wordsInSet.toString().replace(",", "").replace("[", "").replace("]", "");
// Print Out None Duplicate Words In Set And Word Count
JOptionPane.showMessageDialog(null, "Words In Sentence:\n" + wordsString + " \n\n" +
"Word Count: " + wordsInSet.size(), "Sentence Information", 2);
}
}

If you know the marks you want to ignore (;, ?, !), you could do a simple String.replace to remove those characters from the word. You may want to use String.startsWith and String.endsWith to help.
Convert your values to lower case for easier matching (String.toLowerCase).
The use of a 'Set' is an excellent idea. If you want to know how many times a particular word appears, you could also take advantage of a Map of some kind.

You'll need to strip out the punctuation; here's one approach: Translating strings character by character
The above can also be used to normalize the case, although there are probably other utilities for doing so.
Now all of the variations you describe will be converted to the same string, and thus be recognized as such. As pretty much everyone else has suggested, a set would be a good tool for counting the number of distinct words.

Your real problem is that you want a distinct word count, so you should either keep track of which words you have already encountered, or delete them from the text entirely.
Let's say that you choose the first option and store the words you have already encountered in a List; then you can check against that list whether you have already seen a word.
List<String> encounteredWords = new ArrayList<String>();
// continue here once you have found out what the word is
if (!encounteredWords.contains(word.toLowerCase())) {
    encounteredWords.add(word.toLowerCase());
    wordCount++;
}
But Antimony made an interesting remark as well: he uses the property of a Set to get the distinct word count. By definition a set can never contain duplicates, so if you just add more of the same word, the set won't grow in size.
Set<String> wordSet = new HashSet<String>();
// continue here once you have found out what the word is
wordSet.add(word.toLowerCase());
// continue here once you have scanned through all words
return wordSet.size();

remove all punctuations
convert all strings to lowercase OR uppercase
put those strings in a set
get the size of the set

As you parse your input string, store it word by word in a map data structure. Just ensure that "word", "word?" "word!" all are stored with the key "word" in the map, and increment the word's count whenever you have to add to the map.
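The map idea above might look something like this. It is a sketch only: WordFrequency and count are hypothetical names, and stripping every non-letter from each token is an assumption about what counts as punctuation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordFrequency {

    // Maps each normalized word to the number of times it occurs.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            // Strip everything except letters so "word?" and "word!" share the key "word".
            String key = token.replaceAll("[^\\p{L}]", "").toLowerCase(Locale.ROOT);
            if (!key.isEmpty()) {
                counts.merge(key, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("word word? Word! other");
        System.out.println(counts);
        System.out.println("Distinct words: " + counts.size());
    }
}
```

The number of distinct words is simply the size of the map, so this covers the original question as well as per-word counts.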

Checking if a character is an integer or letter

I am modifying a file using Java. Here's what I want to accomplish:
if an & symbol, along with an integer, is detected while being read, I want to drop the & symbol and translate the integer to binary.
if an & symbol, along with a (random) word, is detected while being read, I want to drop the & symbol and replace the word with the integer 16, and if a different string of characters is being used along with the & symbol, I want to set the number 1 higher than integer 16.
Here's an example of what I mean. If a file is inputted containing these strings:
&myword
&4
&anotherword
&9
&yetanotherword
&10
&myword
The output should be:
&0000000000010000 (which is 16 in decimal)
&0000000000000100 (or the number '4' in decimal)
&0000000000010001 (which is 17 in decimal, since 16 is already used, so 16+1=17)
&0000000000000101 (or the number '9' in decimal)
&0000000000010001 (which is 18 in decimal, or 17+1=18)
&0000000000000110 (or the number '10' in decimal)
&0000000000010000 (which is 16 because value of myword = 16)
Here's what I tried so far, but haven't succeeded yet:
for (i=0; i<anyLines.length; i++) {
char[] charray = anyLines[i].toCharArray();
for (int j=0; j<charray.length; j++)
if (Character.isDigit(charray[j])) {
anyLines[i] = anyLines[i].replace("&","");
anyLines[i] = Integer.toBinaryString(Integer.parseInt(anyLines[i]);
}
else {
continue;
}
if (Character.isLetter(charray[j])) {
anyLines[i] = anyLines[i].replace("&","");
for (int k=16; j<charray.length; k++) {
anyLines[i] = Integer.toBinaryString(Integer.parseInt(k);
}
}
}
}
I hope that I am articulate enough. Any suggestions on how to accomplish this task?

Character.isLetter() // tests whether the character is a letter
Character.isDigit() // tests whether the character is a digit

It looks like something you could match against a regex. I don't know Java but you should have at least one regex engine at your disposal. Then the regex would be:
regex1: &(\d+)
and
regex2: &(\w+)
or
regex3: &(\d+|\w+)
In the first case, if regex1 matches, you know you ran into a number, and that number is in the first capturing group (e.g. match.group(1)). If regex2 matches, you know you have a word; try regex1 first, because \w also matches digits. You can then look up that word in a dictionary and see what its associated number is, or, if it is not present, add it to the dictionary and associate it with the next free number (16 + the dictionary size before the word is added).
regex3, on the other hand, will match both numbers and words, so it's up to you to see what's in the capturing group (it's just a different approach).
If neither regex matches, then you have an invalid sequence, or you need some other action. Note that \w in a regex only matches word characters (i.e. letters, digits and _, plus possibly a few other characters depending on the engine), so &çSomeWord or &*SomeWord won't match at all, while the captured group in &Hello.World would be just "Hello".
Regex libs usually provide a length for the matched text, so you can move i forward by that much in order to skip already matched text.
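In Java, the regex3 variant could be sketched with java.util.regex as below. The names SymbolTranslator, translate and toBinary16 are invented for this example, and padding to 16 digits is taken from the expected output in the question. Text that does not match the pattern is copied through unchanged, which is an assumption, since the question does not say what should happen to it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolTranslator {

    private static final Pattern TOKEN = Pattern.compile("&(\\d+|\\w+)");

    // Replaces every &number and &word in the text with & plus a 16-digit binary value.
    static String translate(String text) {
        Map<String, Integer> dictionary = new HashMap<>();
        StringBuilder out = new StringBuilder();
        Matcher matcher = TOKEN.matcher(text);
        int last = 0;
        while (matcher.find()) {
            String token = matcher.group(1);
            int value;
            if (token.matches("\\d+")) {
                value = Integer.parseInt(token);
            } else {
                Integer known = dictionary.get(token);
                if (known == null) {
                    known = 16 + dictionary.size(); // next free number, starting at 16
                    dictionary.put(token, known);
                }
                value = known;
            }
            out.append(text, last, matcher.start()); // copy the unmatched text as-is
            out.append('&').append(toBinary16(value));
            last = matcher.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }

    // Left-pads the binary representation with zeros to 16 digits.
    static String toBinary16(int value) {
        String bits = Integer.toBinaryString(value);
        StringBuilder padded = new StringBuilder();
        for (int i = bits.length(); i < 16; i++) {
            padded.append('0');
        }
        return padded.append(bits).toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("&myword\n&4\n&anotherword\n&9\n&yetanotherword\n&10\n&myword"));
    }
}
```

Using matcher.start() and matcher.end() is the Java equivalent of moving i forward by the match length. Numbers too large for an int would make Integer.parseInt throw, which this sketch does not handle.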

You have to somehow tokenize your input. It seems you are splitting it into lines and then analyzing each line individually. If this is what you want, okay. If not, you could simply search for & (indexOf('&')) and then somehow determine what the next token is (either a number or a "word", however you want to define word).
What do you want to do with input which does not match your pattern? Neither the description of the task nor the example really covers this.
You need to have a dictionary of already read strings. Use a Map<String, Integer>.

I would post this as a comment, but don't have the ability yet. What is the issue you are running into? Error? Incorrect Results? 16's not being correctly incremented? Also, the examples use a '%' but in your description you say it should start with a '&'.
Edit2: Was thinking it was line by line, but re-reading indicates you could be trying to find say "I went to the &store" and want it to say "I went to the &000010000". So you would want to split by whitespace and then iterate through and pass the strings into your 'replace' method, which is similar to below.
Edit1: If I understand what you are trying to do, code like this should work.
Map<String, Integer> usedWords = new HashMap<String, Integer>();
List<String> output = new ArrayList<String>();
int wordIncrementer = 16;
String[] arr = test.split("\n");
for(String s : arr)
{
if(s.startsWith("&"))
{
String line = s.substring(1).trim(); //Removes &
try
{
Integer lineInt = Integer.parseInt(line);
output.add("&" + Integer.toBinaryString(lineInt));
}
catch(Exception e)
{
System.out.println("Line was not an integer. Parsing as a String.");
String outputString = "&";
if(usedWords.containsKey(line))
{
outputString += Integer.toBinaryString(usedWords.get(line));
}
else
{
outputString += Integer.toBinaryString(wordIncrementer);
usedWords.put(line, wordIncrementer++);
}
output.add(outputString);
}
}
else
{
continue; //Nothing indicating that we should parse the line.
}
}

How about this?
String input = "&myword\n&4\n&anotherword\n&9\n&yetanotherword\n&10\n&myword";
String[] lines = input.split("\n");
int wordValue = 16;
// to keep track of words that are already used
Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
for (String line : lines) {
    // if line doesn't begin with &, then ignore it
    if (!line.startsWith("&")) {
        continue;
    }
    // remove &
    line = line.substring(1);
    Integer binaryValue = null;
    if (line.matches("\\d+")) {
        binaryValue = Integer.parseInt(line);
    }
    else if (line.matches("\\w+")) {
        binaryValue = wordValueMap.get(line);
        // if the map doesn't contain the word value, then assign and store it
        if (binaryValue == null) {
            binaryValue = wordValue;
            wordValueMap.put(line, binaryValue);
            wordValue++;
        }
    }
    // skip lines that are neither a number nor a word (avoids a NullPointerException below)
    if (binaryValue == null) {
        continue;
    }
    // I'm using Commons Lang's StringUtils.leftPad(..) to create the zero padded string
    String out = "&" + StringUtils.leftPad(Integer.toBinaryString(binaryValue), 16, "0");
    System.out.println(out);
}
Here's the printout:-
&0000000000010000
&0000000000000100
&0000000000010001
&0000000000001001
&0000000000010010
&0000000000001010
&0000000000010000
Just FYI, the binary value for 10 is "1010", not "110" as stated in your original post.
