Java adding modified tokens into string


I currently have a program that converts each token of a string into its Pig Latin counterpart. However, the program needs to insert the converted words back into the string they were taken from, with ALL of the original characters (whitespace and punctuation) preserved.
Hasta la vista baby. - the Terminator.
Hasta
astaHay
la
alay
vista
istavay
baby
abybay
the
ethay
Terminator
erminatorTay
These are all of the words and their conversions. I tried placing them directly back in, but accounting for the missing characters and the different word lengths made that hard. I also tried inserting characters based on the running total of the token lengths, but that ran into complications when there was more than one whitespace character in a row. How would I insert these words back into the string so it looks like this:
Astahay alay istavay abybay. - ethay Erminatortay
PigOrig = key.readLine();
String[] PigSplit = PigOrig.split("\\W+");
for(int i = 0; i < PigSplit.length; i++)
{
if(PigSplit[i] != null)
{
FinalStr += Piggy.vowelOut(PigSplit[i]); // VowelOut returns the converted word only, no trailing whitespace or punctuation
lengthtot += PigSplit[i].length();
FinalStr += PigOrig.charAt(lengthtot); // attempt at adding up the words and inserting the original punctuation that was in the string PigOrig
lengthtot ++;
}
}

If I understand your question, it is 'how do I replace each word with its translation in a string?' The simplest way is to use String.replace.
So if you have created a translate method then you could do something like:
String line = key.readLine();
for (String word: line.split("\\W+"))
line = line.replace(word, translate(word));
The advantage of this approach is that you are replacing the words in the original string not putting the words back together again.
Also note that it might be easier to translate just using pattern matching. For example:
private String translate(String word) {
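// The first group is reluctant (\\w*?) so that the split happens at the first vowel, not the last
Matcher match = Pattern.compile("(\\w*?)([aeiou]\\w*)").matcher(word);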
if (match.matches())
return match.group(2) + match.group(1) + "ay";
else
return word;
}
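One caveat with the replace loop above: line.replace(word, ...) substitutes every occurrence of the text, including occurrences inside other words (for example "la" inside "flag"). A minimal sketch of an alternative is below. It walks the words with a Matcher and copies everything between them verbatim, so repeated whitespace and punctuation survive. The class name, the simplified vowel rule, and the capitalization handling are assumptions made for illustration, not the asker's Piggy.vowelOut.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PigLatinInPlace {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Simplified rule (an assumption): move everything before the first
    // vowel to the end of the word, then append "ay".
    static String translate(String word) {
        for (int i = 0; i < word.length(); i++) {
            if ("aeiouAEIOU".indexOf(word.charAt(i)) >= 0) {
                String moved = (word.substring(i) + word.substring(0, i) + "ay").toLowerCase();
                // Carry a leading capital over to the new first letter.
                if (Character.isUpperCase(word.charAt(0))) {
                    moved = Character.toUpperCase(moved.charAt(0)) + moved.substring(1);
                }
                return moved;
            }
        }
        return word; // no vowel found: leave the token unchanged
    }

    static String translateSentence(String line) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(line);
        int last = 0;
        while (m.find()) {
            out.append(line, last, m.start()); // separators copied verbatim
            out.append(translate(m.group()));
            last = m.end();
        }
        out.append(line.substring(last)); // whatever follows the final word
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Astahay alay istavay abybay. - ethay Erminatortay.
        System.out.println(translateSentence("Hasta la vista baby. - the Terminator."));
    }
}
```

Because each word is replaced at its own position, a short word can never be substituted inside a longer one, and the separators never need to be counted.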

If I understand correctly that you want to translate all the words in the input, my taste would be for building the new string from scratch:
String pigOrig = key.readLine();
String[] pigSplit = pigOrig.split("\\W+");
StringBuilder buf = new StringBuilder(pigOrig.length());
buf.append(translateWord(pigSplit[0]));
for(int i = 1; i < pigSplit.length; i++) {
buf.append(' ');
buf.append(translateWord(pigSplit[i]));
}
String result = buf.toString();

Related

Separating String on invisible character (tab, carriage return, group separator, etc)?

I am trying to split a String in an Android app on certain characters. The characters are pound sign, comma, semicolon, tab, carriage return, group separator, unit separator, and record separator.
Here's how I'm doing the splitting:
private ArrayList<String> splitdata(String data, String delimiter){
ArrayList<String> fields = new ArrayList<>();
int i = 0; int previous = 0; int index = 0;
boolean first = true;
while (i != -1) {
i = data.indexOf(delimiter,i);
if(i != -1){
if (first) {
fields.add(data.substring(0, i));
first = false;
} else {
fields.add(data.substring(previous + delimiter.length(), i));
}
Log.d(SCANNED_INTENT_TAG,"Newly found field: " + fields.get(index));
index++;
previous = i;
i += delimiter.length();
}
}
if (previous < (data.length()-1) && !first) {
fields.add(data.substring(previous+1));
Log.d(SCANNED_INTENT_TAG,"Newly found field: " + fields.get(index));
}
return fields;
}
This works for visible characters that I can enter from the keyboard, such as the pound sign, comma, and semicolon. However, I cannot get it to detect the special characters tab, carriage return, group separator, unit separator, or record separator. I'm passing them in like this:
some_arraylist = splitdata(some_str,"\t");
some_arraylist = splitdata(some_str,"\r");
some_arraylist = splitdata(some_str,Character.toString((char) 31));
some_arraylist = splitdata(some_str,Character.toString((char) 29));
some_arraylist = splitdata(some_str,Character.toString((char) 28));
What should I be passing them in as?

I'm not sure if you are aware of it, but the String class already has a split method that accepts a regex and returns the pieces as a String array. Looking at your code, you are not doing anything that split would not do. Furthermore, split handles regular expressions, which are powerful and complex to implement yourself. Use the tried-and-true method already included in the JDK.
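As a rough sketch of that suggestion, and assuming the control characters really are present in the scanned data, a single character class can cover all of the delimiters at once. The class and method names are made up for illustration; the range \x1C-\x1F covers the file, group, record, and unit separators (codes 28 to 31).

```java
import java.util.Arrays;

class SplitOnControlChars {
    // '#', ',', ';', tab, carriage return, and control characters 0x1C-0x1F.
    static final String DELIMITERS = "[#,;\\t\\r\\x1C-\\x1F]";

    static String[] fields(String data) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return data.split(DELIMITERS, -1);
    }

    public static void main(String[] args) {
        String scanned = "a\tb" + (char) 29 + "c#d";
        System.out.println(Arrays.toString(fields(scanned)));
    }
}
```

If splitting still finds nothing, it may be worth logging the numeric value of each character in the input to confirm which separators the scanner actually sends.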

How to preserve the punctuation when converting words to Pig Latin?

I've been working on a Java program to convert English words to Pig Latin. I've done all the basic rules such as appending -ay, -way, etc., and special cases like question -> estionquay, rhyme -> ymerhay, and I also dealt with capitalization (Thomas -> Omasthay). However, I have one problem that I can't seem to solve: I need to preserve before-and-after punctuation. For example, What? -> Atwhay? Oh!->Ohway! "hello" -> "ellohay" and "Hello!" -> "Ellohay!" This is not a duplicate by the way, I've checked tons of pig latin questions and I cannot seem to figure out how to do it.
Here is my code so far (I've removed all the punctuation but can't figure out how to put it back in):
public static String scrub(String s)
{
String punct = ".,?!:;\"(){}[]<>";
String temp = "";
String pigged = "";
int index, index1, index2, index3 = 0;
for(int i = 0; i < s.length(); i++)
{
if(punct.indexOf(s.charAt(i)) == -1) //if s has no punctuation
{
temp+= s.charAt(i);
}
} //temp equals word without punctuation
pigged = pig(temp); //pig is the piglatin-translator method that I have already written,
//didn't want to put it here because it's almost 200 lines
for(int x = 0; x < s.length(); x++)
{
if(s.indexOf(punct)!= -1)//punctuation exists
{
index = x;
}
}
}
I get that in theory you could search the string for punctuation, which should be near the beginning or the end, store its index, and put it back after the word has been converted to Pig Latin, but I keep getting confused about the for-loop part. If you do index = x inside the for-loop, you're just overwriting index every time the loop runs.
Help would be appreciated greatly! Also, please keep in mind I can't use shortcuts, I can use String methods and such but not things like Collections or ArrayLists (not that you'd need them here), I have to reinvent the wheel, basically. By the way, in case it wasn't clear, I already have the translating-to-piglatin thing down. I only need to preserve the punctuation before and after translating.

If you are allowed to use regular expressions, you can use the following code.
String pigSentence(String sentence) {
Matcher m = Pattern.compile("\\p{L}+").matcher(sentence);
StringBuffer sb = new StringBuffer();
while (m.find()) {
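// appendReplacement needs the target buffer; quoteReplacement guards against '$' or '\' in the result
m.appendReplacement(sb, Matcher.quoteReplacement(pig(m.group())));
}
m.appendTail(sb);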
return sb.toString();
}
In plain English, the above code is:
for each word in the sentence:
replace it with pig(word)
But if regular expressions are forbidden, you can try this:
String pigSentence(String sentence) {
char[] chars = sentence.toCharArray();
int i = 0, len = chars.length;
StringBuilder sb = new StringBuilder();
while (i < len) {
while (i < len && !Character.isLetter(chars[i]))
sb.append(chars[i++]);
int wordStart = i;
while (i < len && Character.isLetter(chars[i]))
i++;
int wordEnd = i;
if (wordStart != wordEnd) {
String word = sentence.substring(wordStart, wordEnd); // substring takes (beginIndex, endIndex), not a length
sb.append(pig(word));
}
}
return sb.toString();
}

What you need to do is: remove punctuation if it exists, convert to pig latin, add punctuation back.
Assuming punctuation is always at the end of the string, you can collect it with the following:
String punctuation = "";
for (int i = str.length() - 1; i >= 0; i--) { // i >= 0 so that the first character is checked too
if (!Character.isLetter(str.charAt(i))) {
punctuation = str.charAt(i) + punctuation;
} else {
break; // Found all punctuation
}
}
str = str.substring(0, str.length() - punctuation.length()); // Remove punctuation
// Convert str to pig latin
// Append punctuation to str
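The same idea can be extended to punctuation on both sides using only String methods, which seems to fit the asker's restrictions. The sketch below is only an illustration: pig here is a hypothetical, much simplified stand-in for the asker's translator, and the class and method names are invented.

```java
class PunctuationWrapper {
    // Hypothetical stand-in for the asker's pig() method (simplified rules).
    static String pig(String word) {
        for (int i = 0; i < word.length(); i++) {
            if ("aeiouAEIOU".indexOf(word.charAt(i)) >= 0) {
                return i == 0
                        ? word + "way"
                        : word.substring(i) + word.substring(0, i) + "ay";
            }
        }
        return word + "ay";
    }

    static String pigKeepingPunctuation(String s) {
        // Skip non-letters at the front.
        int start = 0;
        while (start < s.length() && !Character.isLetter(s.charAt(start))) {
            start++;
        }
        // Skip non-letters at the back.
        int end = s.length();
        while (end > start && !Character.isLetter(s.charAt(end - 1))) {
            end--;
        }
        if (start == end) {
            return s; // no letters at all, nothing to translate
        }
        // prefix + translated core + suffix
        return s.substring(0, start) + pig(s.substring(start, end)) + s.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(pigKeepingPunctuation("\"hello\""));
        System.out.println(pigKeepingPunctuation("Oh!"));
    }
}
```

Punctuation in the middle of the word (an apostrophe, say) stays inside the core and is passed to pig unchanged, so the translator still has to decide what to do with it.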

I'd find it troublesome to handle punctuation separate from the translation. For punctuation at the very beginning or very end, you can save them and tag them back on after translating.
But if you remove the punctuation from the middle of the word, it will be rather difficult to put it back in the correct location. The indices change from the original word to the pigged word, and by a variable amount. (For a random example, consider "Hel'lo" and "Quest'ion". The apostrophe shifts left by either 1 or 2, and you won't know which.)
How does your translation method handle punctuation? Do you really need to remove all punctuation before passing it to the translator? I'd suggest having your pigging method handle at least the punctuation in the middle of the word.

How to prevent CR/LF?

I am reading from a PDF using PDFBox and apparently, at least on a Windows-based framework, the line break comes through as the Unicode character &#10; (line feed).
How can I prevent this line-break character from being concatenated to the string in the code below?
tokenizer =new StringTokenizer(Text,"\\.");
while(tokenizer.hasMoreTokens())
{
String x= tokenizer.nextToken();
flag=0;
for(final String s :x.split(" ")) {
if(flag==1)
break;
if(Keyword.toLowerCase().equals(s.toLowerCase()) && !"".equals(s)) {
sum+=x+"."; // need to check for "&#10;" first, before concatenating the String "x" to String "sum"
flag=1;
}
}
}

You should discard the line separators when you split; e.g.
for (final String s : x.split("\\s+")) {
That is making the word separator one or more whitespace characters.
(Using trim() won't work in all cases. Suppose that x contains "word\r\nword". You won't split between the two words, and s will be "word\r\nword" at some point. Then s.trim() won't remove the line break characters because they are not at the ends of the string.)
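A small sketch of the difference, using an invented helper name: trim() only touches the ends of the string, while splitting on \s+ treats the embedded line break as a separator.

```java
class TrimVersusSplit {
    // Trim first so leading whitespace does not yield an empty first token,
    // then split on any run of whitespace (spaces, tabs, \r, \n).
    static String[] words(String x) {
        return x.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String x = "word\r\nword";
        System.out.println(x.trim().contains("\r\n")); // the line break is still inside
        System.out.println(words(x).length);
    }
}
```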
UPDATE
I just spotted that you are actually appending x not s. So you also need to do something like this:
sum += x.replaceAll("\\s+", " ") + ".";
That does a bit more than you asked for. It replaces each whitespace sequence with a single space.
By the way, your code would be simpler and more efficient if you used a break to get out of the loop rather than messing around with a flag. (And Java has a boolean type ... for heaven's sake!)
if (Keyword.toLowerCase().equals(s.toLowerCase()) && !"".equals(s)) {
sum += ....
break;
}

Are you sure you want to be adding x here?
if(Keyword.toLowerCase().equals(s.toLowerCase()) && !"".equals(s)) {
sum+=x+"."; // need to check for "&#10;" first, before concatenating the String "x" to String "sum"
flag=1;
}
Don't you want s?
sum += s + ".";
UPDATE
Oh, I see. So what you really want is something more like:
tokenizer = new StringTokenizer(Text,"\\.");
Pattern KEYWORD = Pattern.compile("\\b"+Keyword+"\\b", Pattern.CASE_INSENSITIVE);
StringBuilder sb = new StringBuilder(sum);
while(tokenizer.hasMoreTokens())
{
String x = tokenizer.nextToken();
if (KEYWORD.matcher(x).find()) {
sb.append(x.replaceAll("\\s+", " ")).append('.');
}
}
sum = sb.toString();
(Assuming Keyword starts and ends with letters, and doesn't itself contain any RegEx codes)
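The second assumption can be relaxed with Pattern.quote, which makes the keyword match literally even if it contains regex metacharacters. The following self-contained sketch uses invented names (KeywordSentences, collect) and splits with String.split rather than StringTokenizer; the \b anchors still assume the keyword starts and ends with a word character.

```java
import java.util.regex.Pattern;

class KeywordSentences {
    // Returns every sentence of text that contains keyword as a whole word,
    // with each run of whitespace (including line breaks) collapsed to one space.
    static String collect(String text, String keyword) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(keyword) + "\\b", Pattern.CASE_INSENSITIVE);
        StringBuilder sb = new StringBuilder();
        for (String sentence : text.split("\\.")) {
            if (p.matcher(sentence).find()) {
                sb.append(sentence.replaceAll("\\s+", " ")).append('.');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("The cat\r\nsat. A dog ran. Cat food.", "cat"));
    }
}
```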

Simplify & condense multiple editorial operations on an array. Java

I have some raw output that I want to clean up and make presentable, but right now I go about it in a very ugly and cumbersome way. Does anyone know a clean and elegant way to perform the same operation?
int size = charOutput.size();
for (int i = size - 1; i >= 1; i--)
{
if(charOutput.get(i).compareTo(charOutput.get(i - 1)) == 0)
{
charOutput.remove(i);
}
}
for(int x = 0; x < charOutput.size(); x++)
{
if(charOutput.get(x) == '?')
{
charOutput.remove(x);
}
}
String firstOne = Arrays.toString(charOutput.toArray());
String secondOne = firstOne.replaceAll(",","");
String thirdOne = secondOne.substring(1, secondOne.length() - 1);
String output = thirdOne.replaceAll(" ","");
return output;

ZouZou has the right code for fixing the final few calls in your code. I have some suggestions for the for loops. I hope I got them right...
These work after you get the String represented by charOutput, using a method such as the one suggested by ZouZou.
Your first block appears to remove all repeated letters. You can use a regular expression for that:
Pattern removeRepeats = Pattern.compile("(.)\\1{1,}");
// "(.)" matches any single character and captures it as group 1
// "\\1" gets converted to "\1", which is a back-reference to group 1, i.e. the character that "(.)" matched
// "{1,}" means "one or more"
// So the overall effect is "a character followed by one or more repeats of itself"
To use:
removeRepeats.matcher(s).replaceAll("$1");
// This creates a Matcher that applies removeRepeats to the contents of s and replaces each matching run with "$1", a reference to the first captured group (i.e. "(.)", the first character of the run)
To remove the question mark, just do
Pattern removeQuestionMarks = Pattern.compile("\\?");
// Because "?" is a special symbol in regex, you have to escape it with a backslash
// But since backslashes are also a special symbol, you have to escape the backslash too.
And then to use, do the same thing as was done above except with replaceAll("");
And you're done!
If you really wanted to, you can combine a lot of regex into two super-regex expressions (and one normal regex expression):
Pattern p0 = Pattern.compile("(\\[|\\]|\\,| )"); // removes brackets, commas, and spaces
Pattern p1 = Pattern.compile("(.)\\1{1,}"); // Removes duplicate characters
Pattern p2 = Pattern.compile("\\?");
String removeArrayCharacters = p0.matcher(charOutput.toString()).replaceAll("");
String removeDuplicates = p1.matcher(removeArrayCharacters).replaceAll("$1");
return p2.matcher(removeDuplicates).replaceAll("");
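Put together end to end, the regex route might look like the sketch below. It assumes charOutput is a List<Character> and builds the string directly, so the bracket and comma cleanup is not needed; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class RegexCleanup {
    static String clean(List<Character> chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(c);
        }
        return sb.toString()
                .replaceAll("(.)\\1+", "$1") // collapse each run of a repeated character
                .replace("?", "");           // literal replace, so no escaping is needed
    }

    public static void main(String[] args) {
        List<Character> charOutput = Arrays.asList('a', 'a', 'b', '?', '?', 'c', 'c');
        System.out.println(clean(charOutput));
    }
}
```

Note that "." does not match line terminators by default, so repeated newline characters would be left alone by this pattern.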

Use a StringBuilder and append each character you want, at the end just return myBuilder.toString();
Instead of this:
String firstOne = Arrays.toString(charOutput.toArray());
String secondOne = firstOne.replaceAll(",","");
String thirdOne = secondOne.substring(1, secondOne.length() - 1);
String output = thirdOne.replaceAll(" ","");
return output;
Simply do:
StringBuilder sb = new StringBuilder();
for(Character c : charOutput){
sb.append(c);
}
return sb.toString();
Note that you are doing a lot of unnecessary work (by iterating through the list and removing some elements). What you can actually do is iterate just once and, if the current character meets your requirements (it differs from the adjacent character and is not a question mark), append it to the StringBuilder directly.
This task could also be a job for a regular expression.
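The single pass described above could be sketched as follows, assuming charOutput is a List<Character>. The names are invented for illustration. Repeats are detected against the previous element of the original list, which matches the order of the two loops in the question (collapse repeats first, then drop question marks).

```java
import java.util.Arrays;
import java.util.List;

class SinglePassCleanup {
    static String clean(List<Character> chars) {
        StringBuilder sb = new StringBuilder();
        Character previous = null;
        for (Character c : chars) {
            boolean repeat = c.equals(previous); // same as the element just before it?
            previous = c;
            if (!repeat && c != '?') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Character> charOutput = Arrays.asList('a', 'a', '?', 'b', 'b', '?', '?', 'c');
        System.out.println(clean(charOutput));
    }
}
```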

If you don't want to use Regex try this version to remove consecutive characters and '?':
int size = charOutput.size();
if (size == 1) return Character.toString((Character)charOutput.get(0));
else if (size == 0) return null;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < size - 1; i++) {
Character temp = (Character)charOutput.get(i);
if (!temp.equals(charOutput.get(i+1)) && !temp.equals('?'))
sb.append(temp);
}
// The last element always ends its run, so it only needs the '?' check
if (!charOutput.get(size-1).equals('?'))
sb.append(charOutput.get(size-1));
return sb.toString();

Java String replace with string in same position

I'm having problems with strings and I need a solution. I'm trying to replace each character in one list with the character found at the same position in another list. For example:
private String wordNormalize(String enteredWord,String dictionary){
String normalizedWord = null;
// remove empty spaces at beginning and at the end of the word and change to lower case
normalizedWord = enteredWord.trim().toLowerCase();
//normalize term, removing all punctuation marks
normalizedWord = normalizedWord.replaceAll("["+punctuationMarks2+"]", "[b,v]");
//normalize word removing to character if dictionary has english lang
normalizedWord = normalizedWord.replaceFirst("to ", " ");
//normalizeWord if dictionary has german
if(normalizedWord.length() > 0){
normalizedWord.replace("a,b,c","t,u,v");
/*for(int i = 0;i<normalizedWord.length();i++){
char currentChar = normalizedWord.charAt(i); // currently typed character
String s1= Character.toString(currentChar);
for(int j = 0;j<specialCharacters.length;j++){
s1.replaceAll("[ "+specialCharacters[i]+" ]",""+replaceCharactersDe[i]+"");
}
= str.replace("a,b,c","t,u,v");
}*/
}
//normalize term removing special characters and replacing them
/*for(int i = 0; i > specialCharacters.length;i++){
if(normalizedWord.equals(specialCharacters[i])){
normalizedWord = replaceCharactersDe[i];
}
}*/
return normalizedWord;
}
So if a user enters a, it is replaced with t; if a user enters b, it is replaced with u; and if the user enters c, it is replaced with v, and only in that order. Is this possible, and if so, what is the right way to do it?

It is not clear to me what you are trying to achieve with
normalizedWord = normalizedWord.replaceAll("["+punctuationMarks2+"]", "[b,v]");
It does not seem right, but I don't know how to fix it because I don't know what it's trying to do. I guess what you are looking for is
normalizedWord = normalizedWord.replaceAll("\\p{Punct}", "");
In the other part your call has no effect, because Strings are immutable and you discard the result of replace. You would have to write something like
normalizedWord = normalizedWord.replace("a,b,c","t,u,v");
but that would replace all occurrences of the substring "a,b,c" with the string "t,u,v".
What you want is:
normalizedWord = normalizedWord.replace('a', 't');
normalizedWord = normalizedWord.replace('b', 'u');
normalizedWord = normalizedWord.replace('c', 'v');
We could work on a more general solution, but you have to show us how the dictionary, which is a String, is formatted.
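Without knowing the dictionary format, one possible shape for a more general solution is a position-based mapping between two strings. This is only a sketch under that assumption: from and to are hypothetical parameters in which the character at index i of from is replaced by the character at index i of to.

```java
class CharMapper {
    static String mapChars(String word, String from, String to) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            int pos = from.indexOf(c);
            // Replace only when a counterpart exists at the same position.
            sb.append(pos >= 0 && pos < to.length() ? to.charAt(pos) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mapChars("cab", "abc", "tuv"));
    }
}
```

Each character is looked up exactly once, so overlapping mappings (such as a to b together with b to a) do not interfere with each other the way a chain of replace calls would.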
