How to preserve the punctuation when converting words to Pig Latin?

I've been working on a Java program to convert English words to Pig Latin. I've done all the basic rules such as appending -ay, -way, etc., and special cases like question -> estionquay, rhyme -> ymerhay, and I also dealt with capitalization (Thomas -> Omasthay). However, I have one problem that I can't seem to solve: I need to preserve punctuation before and after the word. For example: What? -> Atwhay?, Oh! -> Ohway!, "hello" -> "ellohay", and "Hello!" -> "Ellohay!". This is not a duplicate, by the way; I've checked tons of Pig Latin questions and still cannot figure out how to do it.
Here is my code so far (I've removed all the punctuation but can't figure out how to put it back in):
public static String scrub(String s)
{
String punct = ".,?!:;\"(){}[]<>";
String temp = "";
String pigged = "";
int index, index1, index2, index3 = 0;
for(int i = 0; i < s.length(); i++)
{
if(punct.indexOf(s.charAt(i)) == -1) //if s has no punctuation
{
temp+= s.charAt(i);
}
} //temp equals word without punctuation
pigged = pig(temp); //pig is the piglatin-translator method that I have already written,
//didn't want to put it here because it's almost 200 lines
for(int x = 0; x < s.length(); x++)
{
if(s.indexOf(punct)!= -1)//punctuation exists
{
index = x;
}
}
}
I get that in theory you could search the string for punctuation, which should be near the beginning or end, store its index, and put it back after the word is "piglatinized", but I keep getting confused about the for-loop part. If you do index = x inside the for loop, you're just overwriting index every time the loop runs.
Help would be appreciated greatly! Also, please keep in mind I can't use shortcuts, I can use String methods and such but not things like Collections or ArrayLists (not that you'd need them here), I have to reinvent the wheel, basically. By the way, in case it wasn't clear, I already have the translating-to-piglatin thing down. I only need to preserve the punctuation before and after translating.

If you are allowed to use regular expressions, you can use the following code.
String pigSentence(String sentence) {
Matcher m = Pattern.compile("\\p{L}+").matcher(sentence);
StringBuffer sb = new StringBuffer();
while (m.find()) {
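m.appendReplacement(sb, Matcher.quoteReplacement(pig(m.group()))); // appendReplacement needs the target buffer
}
m.appendTail(sb); // copy whatever follows the last word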
return sb.toString();
}
In plain English, the above code is:
for each word in the sentence:
replace it with pig(word)
But if regular expressions are forbidden, you can try this:
String pigSentence(String sentence) {
char[] chars = sentence.toCharArray();
int i = 0, len = chars.length;
StringBuilder sb = new StringBuilder();
while (i < len) {
while (i < len && !Character.isLetter(chars[i]))
sb.append(chars[i++]);
int wordStart = i;
while (i < len && Character.isLetter(chars[i]))
i++;
int wordEnd = i;
if (wordStart != wordEnd) {
String word = sentence.substring(wordStart, wordEnd); // substring takes (beginIndex, endIndex), not a length
sb.append(pig(word));
}
}
return sb.toString();
}

What you need to do is: remove punctuation if it exists, convert to pig latin, add punctuation back.
Assuming punctuation is always at the end of the string, you can check for punctuation with the following:
String punctuation = "";
for (int i = str.length() - 1; i >= 0; i--) { // i >= 0 so the first character is checked too
if (!Character.isLetter(str.charAt(i))) {
punctuation = str.charAt(i) + punctuation;
} else {
break; // Found all punctuation
}
}
str = str.substring(0, str.length() - punctuation.length()); // Remove punctuation
// Convert str to pig latin
// Append punctuation to str

I'd find it troublesome to handle punctuation separately from the translation. For punctuation at the very beginning or very end, you can save it and tag it back on after translating.
But if you remove punctuation from the middle of the word, it will be rather difficult to put it back in the correct location. The indices change from the original word to the pigged word, and by a variable amount. (As a random example, consider "Hel'lo" and "Quest'ion". The apostrophe shifts left by either 1 or 2, and you won't know which.)
How does your translation method handle punctuation? Do you really need to remove all punctuation before passing it to the translator? I'd suggest having your pigging method handle at least the punctuation in the middle of the word.
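A minimal sketch of the "save the leading and trailing punctuation and tag it back on" idea, using only String and Character methods. The pig method here is a hypothetical, deliberately simplified stand-in for the asker's translator (it ignores capitalization and the special cases), and the class name is made up for illustration.

```java
final class PigPunctuation {
    // Hypothetical stand-in for the asker's pig(): moves leading consonants
    // to the end and adds "ay", or adds "way" if the word starts with a vowel.
    static String pig(String word) {
        if (word.isEmpty()) return word;
        String vowels = "aeiouAEIOU";
        int i = 0;
        while (i < word.length() && vowels.indexOf(word.charAt(i)) == -1) i++;
        if (i == 0) return word + "way";
        return word.substring(i) + word.substring(0, i) + "ay";
    }

    // Splits s into leading punctuation, core word, and trailing punctuation,
    // translates only the core, then glues the three parts back together.
    static String scrub(String s) {
        int start = 0;
        while (start < s.length() && !Character.isLetter(s.charAt(start))) start++;
        int end = s.length();
        while (end > start && !Character.isLetter(s.charAt(end - 1))) end--;
        String leading = s.substring(0, start);
        String core = s.substring(start, end);
        String trailing = s.substring(end);
        return leading + pig(core) + trailing;
    }
}
```

With this stand-in, scrub("\"hello\"") gives "ellohay" wrapped in the original quotes and scrub("Oh!") gives Ohway!. Punctuation in the middle of the word stays inside core, so the translator still has to deal with it.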

Related

Java adding modified tokens into string

I currently have a program that individually converts tokens of a string into their piglatin counterparts. However, the program needs to insert them back into the string they were taken with, with ALL of the original characters in it.
Hasta la vista baby. - the Terminator.
Hasta
astaHay
la
alay
vista
istavay
baby
abybay
the
ethay
Terminator
erminatorTay
These are all of the words and their conversions. I tried a method directly placing them back in, however accounting for missing characters and different length made it hard for me to do that. I tried to insert characters based on the length of each token added up, but that ran into complications when there were more than 1 whitespace character. How would I insert these words back into the string so it looks like this:
Astahay alay istavay abybay. - ethay Erminatortay
PigOrig = key.readLine();
String[] PigSplit = PigOrig.split("\\W+");
for(int i = 0; i < PigSplit.length; i++)
{
if(PigSplit[i] != null)
{
FinalStr += Piggy.vowelOut(PigSplit[i]); // VowelOut returns the converted word only, no trailing whitespace or punctuation
lengthtot += PigSplit[i].length();
FinalStr += PigOrig.charAt(lengthtot); // attempt at adding up the words and inserting the original punctuation that was in the string PigOrig
lengthtot ++;
}
}

If I understand your question, it is 'how do I replace each word with its translation in a string?' The simplest way is to use String.replace.
So if you have created a translate method then you could do something like:
String line = key.readLine();
for (String word: line.split("\\W+"))
line = line.replace(word, translate(word));
The advantage of this approach is that you are replacing the words in the original string not putting the words back together again.
Also note that it might be easier to translate just using pattern matching. For example:
private String translate(String word) {
Matcher match = Pattern.compile("(\\w*?)([aeiou]\\w*)").matcher(word); // reluctant *? so group 1 stops before the first vowel
if (match.matches())
return match.group(2) + match.group(1) + "ay";
else
return word;
}
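One thing to watch with line.replace(word, ...) is that it replaces every occurrence of that substring, including inside other words ("the" inside "theme"). A possible alternative, sketched here under the assumption that regex is allowed, is to let a Matcher walk the words and copy everything between them unchanged. The class and method names are invented for this example; translate is the pattern-based method from above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InPlaceTranslator {
    private static final Pattern WORD = Pattern.compile("\\w+");
    private static final Pattern SPLIT = Pattern.compile("(\\w*?)([aeiou]\\w*)");

    // Moves everything before the first lowercase vowel to the end and adds "ay".
    static String translate(String word) {
        Matcher m = SPLIT.matcher(word);
        return m.matches() ? m.group(2) + m.group(1) + "ay" : word;
    }

    // Replaces each word at its own position; spaces and punctuation
    // between words are copied through untouched.
    static String translateLine(String line) {
        Matcher m = WORD.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(translate(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For the sentence in the question, translateLine keeps the ". - " and the final period exactly where they were, and runs of several spaces survive as well.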

If I understand correctly that you want to translate all the words in the input, my taste would be for building the new string from scratch:
String pigOrig = key.readLine();
String[] pigSplit = pigOrig.split("\\W+");
StringBuilder buf = new StringBuilder(pigOrig.length());
buf.append(translateWord(pigSplit[0]));
for(int i = 1; i < pigSplit.length; i++) {
buf.append(' ');
buf.append(translateWord(pigSplit[i]));
}
String result = buf.toString();

String manipulation of function names

For this Kata, I am given random function names in the PEP8 format and I am to convert them to camelCase.
(input)get_speed == (output)getSpeed ....
(input)set_distance == (output)setDistance
I have an understanding of one way of doing this, written in pseudo-code:
loop through the word,
if the letter is an underscore
then delete the underscore
then get the next letter and change to a uppercase
endIf
endLoop
return the resultant word
But I'm unsure of the best way of doing this. Would it be more efficient to create a char array and loop through the elements, and on finding an underscore delete that element, then change the next one to uppercase?
Or would it be better to use recursion:
function camelCase takes a string
if the length of the string is 0,
then return the string
endIf
if the character is a underscore
then change to nothing,
then find next character and change to uppercase
return the string taking away the character
endIf
finally return the function taking the first character away
Any thoughts, please? I'm looking for a good, efficient way of handling this problem. Thanks :)

I would go with this:
divide given String by underscore to array
from second word until end take first letter and convert it to uppercase
join to one word
This will work in O(n) (it goes through the name about three times). To split, use:
str.split("_");
To uppercase the first letter of a word, use:
String newName = str.substring(0, 1).toUpperCase() + str.substring(1);
But make sure you check size of the string first...
Edited - added implementation
It would look like this:
public String camelCase(String str) {
if (str == null ||str.trim().length() == 0) return str;
String[] split = str.split("_");
String newStr = split[0];
for (int i = 1; i < split.length; i++) {
newStr += split[i].substring(0, 1).toUpperCase() + split[i].substring(1);
}
return newStr;
}
for inputs:
"test"
"test_me"
"test_me_twice"
it returns:
"test"
"testMe"
"testMeTwice"

It would be simpler to iterate over the string instead of recursing.
String pep8 = "do_it_again";
StringBuilder camelCase = new StringBuilder();
for(int i = 0, l = pep8.length(); i < l; ++i) {
if(pep8.charAt(i) == '_' && (i + 1) < l) {
camelCase.append(Character.toUpperCase(pep8.charAt(++i)));
} else {
camelCase.append(pep8.charAt(i));
}
}
System.out.println(camelCase.toString()); // prints doItAgain

The question you pose is whether to use an iterative or a recursive approach. For this case I'd go for the iterative approach because it's straightforward, easy to understand, and doesn't require many resources (only one array, no new stack frames, etc.), though that doesn't really matter for this example.
Recursion is good for divide-and-conquer problems, but I don't see that fitting the case well, although it's possible.
An iterative implementation of the algorithm you described could look like the following:
StringBuilder buf = new StringBuilder(input);
for(int i = 0; i < buf.length(); i++){
if(buf.charAt(i) == '_'){
buf.deleteCharAt(i);
if(i != buf.length()){ // check for EOL (end of string)
buf.setCharAt(i, Character.toUpperCase(buf.charAt(i)));
}
}
}
return buf.toString();
The EOL (end-of-string) check is not part of the given algorithm and could be omitted if the input string never ends with '_'.
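For comparison, the recursive pseudo-code from the question could be sketched roughly like this. It assumes single underscores between words (a doubled underscore would leave one behind), and the class and method names are made up for the example. Each call allocates a new substring, so it does more work than the loop.

```java
final class CamelCaseRecursive {
    // Recursive variant: handle the first character, then recurse on the rest.
    static String toCamelCase(String s) {
        if (s.isEmpty()) {
            return s;                       // base case: nothing left
        }
        if (s.charAt(0) == '_') {
            if (s.length() == 1) {
                return "";                  // trailing underscore: just drop it
            }
            // Drop the underscore and uppercase the character after it.
            return Character.toUpperCase(s.charAt(1)) + toCamelCase(s.substring(2));
        }
        // Ordinary character: keep it and continue with the rest.
        return s.charAt(0) + toCamelCase(s.substring(1));
    }
}
```

Calling toCamelCase("get_speed") returns getSpeed, the same result as the iterative version.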

Simplify & condense multiple editorial operations on an array. Java

I have some raw output that I want to clean up and make presentable but right now I go about it in a very ugly and cumbersome way, I wonder if anyone might know a clean and elegant way in which to perform the same operation.
int size = charOutput.size();
for (int i = size - 1; i >= 1; i--)
{
if(charOutput.get(i).compareTo(charOutput.get(i - 1)) == 0)
{
charOutput.remove(i);
}
}
for(int x = 0; x < charOutput.size(); x++)
{
if(charOutput.get(x) == '?')
{
charOutput.remove(x);
}
}
String firstOne = Arrays.toString(charOutput.toArray());
String secondOne = firstOne.replaceAll(",","");
String thirdOne = secondOne.substring(1, secondOne.length() - 1);
String output = thirdOne.replaceAll(" ","");
return output;

ZouZou has the right code for fixing the final few calls in your code. I have some suggestions for the for loops. I hope I got them right...
These work after you get the String represented by charOutput, using a method such as the one suggested by ZouZou.
Your first block appears to remove all repeated letters. You can use a regular expression for that:
Pattern removeRepeats = Pattern.compile("(.)\\1{1,}");
// "(.)" matches any single character and captures it as group 1
// "\\1" gets converted to "\1", which is a back-reference to group 1, i.e. the character that "(.)" matched
// "{1,}" means "one or more"
// So the overall effect is "a character followed by one or more copies of itself"
To use:
removeRepeats.matcher(s).replaceAll("$1");
// This creates a Matcher that applies removeRepeats to the contents of s and replaces each matching run with "$1", a reference to the first captured group (i.e. "(.)", the first character of the run).
To remove the question mark, just do
Pattern removeQuestionMarks = Pattern.compile("\\?");
// Because "?" is a special symbol in regex, you have to escape it with a backslash
// But since backslashes are also a special symbol, you have to escape the backslash too.
And then to use, do the same thing as was done above except with replaceAll("");
And you're done!
If you really wanted to, you can combine a lot of regex into two super-regex expressions (and one normal regex expression):
Pattern p0 = Pattern.compile("(\\[|\\]|\\,| )"); // removes brackets, commas, and spaces
Pattern p1 = Pattern.compile("(.)\\1{1,}"); // Removes duplicate characters
Pattern p2 = Pattern.compile("\\?");
String removeArrayCharacters = p0.matcher(charOutput.toString()).replaceAll("");
String removeDuplicates = p1.matcher(removeArrayCharacters).replaceAll("$1");
return p2.matcher(removeDuplicates).replaceAll("");

Use a StringBuilder and append each character you want, at the end just return myBuilder.toString();
Instead of this:
String firstOne = Arrays.toString(charOutput.toArray());
String secondOne = firstOne.replaceAll(",","");
String thirdOne = secondOne.substring(1, secondOne.length() - 1);
String output = thirdOne.replaceAll(" ","");
return output;
Simply do:
StringBuilder sb = new StringBuilder();
for(Character c : charOutput){
sb.append(c);
}
return sb.toString();
Note that you are doing a lot of unnecessary work (by iterating through the list and removing some elements). What you can actually do is iterate just once and, if the character fulfills your requirements (it differs from the adjacent character and is not a question mark), append it to the StringBuilder directly.
This task could also be a job for a regular expression.
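The single-pass idea could look roughly like the sketch below. It assumes charOutput is a List<Character> and that repeats should be detected before question marks are dropped, as in the original two loops; the class and method names are invented for the example.

```java
import java.util.List;

final class CleanOutput {
    // One pass: skip a character if it repeats its predecessor or is '?'.
    static String clean(List<Character> chars) {
        StringBuilder sb = new StringBuilder(chars.size());
        Character previous = null;          // no predecessor for the first element
        for (Character c : chars) {
            boolean repeat = c.equals(previous);
            if (!repeat && c != '?') {
                sb.append(c);
            }
            previous = c;                   // compare against the original neighbour
        }
        return sb.toString();
    }
}
```

Because nothing is removed from the list while iterating, no index bookkeeping is needed and the input list is left unchanged.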

If you don't want to use Regex try this version to remove consecutive characters and '?':
int size = charOutput.size();
if (size == 1) return Character.toString((Character)charOutput.get(0));
else if (size == 0) return null;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < size - 1; i++) {
Character temp = (Character)charOutput.get(i);
if (!temp.equals(charOutput.get(i+1)) && !temp.equals('?'))
sb.append(temp);
}
//for the last element
if (!charOutput.get(size-1).equals(charOutput.get(size-2))
&& !charOutput.get(size-1).equals('?'))
sb.append(charOutput.get(size-1));
return sb.toString();

Is there a way to remove characters from a string? Java

I am having trouble removing letters from a string. String ALPHABET = "abcdefghjklmnopqrstuvwxyz"; The user enters a string, e.g. "klmn". How would I remove klmn from the alphabet? Is there a way other than putting it into an array?
This is what I started with. It only removes the last letter in the string. What's my problem here?
for(int i = 0; i < message.length(); i++){
for(int j = 0; j < ALPHABET.length(); j++){
letter = message.charAt(i);
if(ALPHABET.charAt(j) == message.charAt(i)){
newALPHABET = ALPHABET.replace(letter, ' ');
}
}
}

Don't know what you want to do but you can use String#replace
String alphabet = "abcdefghjklmnopqrstuvwxyz";
alphabet = alphabet.replace("klmn","");

Write a method to delete it. The logic is to overwrite the character you want to delete with the next character, put the third character in place of the second, and so on.
If you want to delete a longer run of characters, use String.replace instead.

You can do that with regular expressions. Try the following:
static String ALPHABET = "abcdefghjklmnopqrstuvwxyz";
public static void main(String[] args) {
String input = JOptionPane.showInputDialog("Letters: ");
Pattern p = Pattern.compile("[" + Pattern.quote(input) +"]");
Matcher m = p.matcher(ALPHABET);
String result = m.replaceAll("");
System.out.println(result);
}

If you simply want to replace a character or a simple substring, then String.replace is the solution.
If you want to replace matches of a regex, then String.replaceAll is the solution.
Your code is not working because there are a couple of bugs in it:
You appear to be under the impression that String.replace(char, char) replaces a single character instance. In fact, it replaces all instances of the first character in the String.
Each loop iteration creates a new String and assigns it to newALPHABET. But then you start again from the unchanged ALPHABET on the next iteration, so only the last replacement survives.
If the aim is to produce an "alphabet" that excludes the letters in message, then the correct solution is something like this:
for (int i = 0; i < message.length(); i++) {
ALPHABET = ALPHABET.replace(message.charAt(i), ' ');
}
... except that you should NOT use ALPHABET as the name of a variable. It should be alphabet!!!
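Note that replace(c, ' ') leaves a space where each letter was. If the letters should disappear entirely, one possible sketch without regex or arrays is to copy across only the characters that do not occur in the message. The class and method names here are made up for illustration.

```java
final class LetterRemover {
    // Returns alphabet with every character that occurs in message left out.
    static String removeLetters(String alphabet, String message) {
        StringBuilder kept = new StringBuilder(alphabet.length());
        for (int i = 0; i < alphabet.length(); i++) {
            char c = alphabet.charAt(i);
            if (message.indexOf(c) == -1) { // c is not one of the letters to remove
                kept.append(c);
            }
        }
        return kept.toString();
    }
}
```

This also works when the letters in the message are not adjacent in the alphabet, which alphabet.replace("klmn", "") would not handle.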

Removing duplicate same characters in a row

I am trying to create a method which will either remove all duplicates from a string or only keep the same 2 characters in a row based on a parameter.
For example:
helllllllo -> helo
or
helllllllo -> hello - This keeps double letters
Currently I remove duplicates by doing:
private String removeDuplicates(String word) {
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < word.length(); i++) {
char letter = word.charAt(i);
if (buffer.length() == 0 || letter != buffer.charAt(buffer.length() - 1)) {
buffer.append(letter);
}
}
return buffer.toString();
}
If I want to keep double letters I was thinking of having a method like private String removeDuplicates(String word, boolean doubleLetter)
When doubleLetter is true it will return hello not helo
I'm not sure of the most efficient way to do this without duplicating a lot of code.

why not just use a regex?
public class RemoveDuplicates {
public static void main(String[] args) {
System.out.println(new RemoveDuplicates().result("hellllo", false)); //helo
System.out.println(new RemoveDuplicates().result("hellllo", true)); //hello
}
public String result(String input, boolean doubleLetter){
String pattern = null;
if(doubleLetter) pattern = "(.)(?=\\1{2})";
else pattern = "(.)(?=\\1)";
return input.replaceAll(pattern, "");
}
}
(.) --> matches any character and puts in group 1.
?= --> this is called a positive lookahead.
?=\\1 --> positive lookahead for the first group
So overall, this regex looks for any character that is followed (positive lookahead) by itself. For example aa or bb, etc. It is important to note that only the first character is part of the match actually, so in the word 'hello', only the first l is matched (the part (?=\1) is NOT PART of the match). So the first l is replaced by an empty String and we are left with helo, which does not match the regex
The second pattern is the same thing, but this time we look ahead for TWO occurrences of the first group, for example helllo. On the other hand 'hello' will not be matched.
Look here for a lot more: Regex
P.S. Feel free to accept the answer if it helped.

try
String s = "helllllllo";
System.out.println(s.replaceAll("(\\w)\\1+", "$1"));
output
helo

Taking this previous SO example as a starting point, I came up with this:
String str1= "Heelllllllllllooooooooooo";
String removedRepeated = str1.replaceAll("(\\w)\\1+", "$1");
System.out.println(removedRepeated);
String keepDouble = str1.replaceAll("(\\w)\\1{2,}", "$1");
System.out.println(keepDouble);
It yields:
Helo
Heelo
What it does:
(\\w)\\1+ matches any word character (letter, digit, or underscore) and captures it in group 1. The \\1+ then matches one or more repetitions of that same character.
(\\w)\\1{2,} is the same as above, except that it only matches characters that appear three or more times in a row. This leaves double characters untouched.
EDIT:
Re-read the question and it seems that you want to replace multiple characters by doubles. To do that, simply use this line:
String keepDouble = str1.replaceAll("(\\w)\\1+", "$1$1");
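If regex is not wanted, both behaviours from the question can be sketched with one loop and a run counter, which avoids duplicating the method. This is only an illustration; the class and method names are assumptions, and the boolean plays the role of the doubleLetter parameter from the question.

```java
final class RunCollapser {
    // Keeps at most one (or, if keepDouble is true, two) of each run of
    // identical adjacent characters.
    static String collapseRuns(String word, boolean keepDouble) {
        int maxRun = keepDouble ? 2 : 1;
        StringBuilder out = new StringBuilder(word.length());
        int run = 0;                        // length of the current run so far
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            run = (i > 0 && c == word.charAt(i - 1)) ? run + 1 : 1;
            if (run <= maxRun) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Here collapseRuns("helllllllo", false) returns helo and collapseRuns("helllllllo", true) returns hello.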

Try this, this will be most efficient way[Edited after comment]:
public static String removeDuplicates(String str) {
int checker = 0;
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a';
if ((checker & (1 << val)) == 0)
buffer.append(str.charAt(i));
checker |= (1 << val);
}
return buffer.toString();
}
I am using bits to identify uniqueness.
EDIT:
The whole logic is that once a character has been seen, its corresponding bit is set; the next time that character comes up, it is not added to the StringBuffer because the bit is already set. Note that this assumes lowercase a-z input and drops every later repeat of a letter, not only adjacent ones.
