Optimal solution of removing duplicates from an unsorted string

I am working on an interview problem: removing duplicate characters from a string.
The naive solution, two nested for-loops that compare each index against the current index, is actually more difficult to implement.
I tried this problem a couple of times. My first attempt ran in O(n) but only worked on sorted strings, e.g. aabbcceedfg.
I then realized I could use a HashSet. This solution's time complexity is O(n) as well, but it uses two Java library classes, StringBuffer and HashSet, so its space complexity is not that great.
public static String duplicate(String s) {
HashSet<Character> dup = new HashSet<Character>();
StringBuffer string = new StringBuffer();
for(int i = 0; i < s.length() - 1; i++) {
if(!dup.contains(s.charAt(i))){
dup.add(s.charAt(i));
string.append(s.charAt(i));
}
}
return string.toString();
}
I was wondering - is this solution optimal and valid for a technical interview? If it's not the most optimal, what is the better method?
I did Google a lot for the most optimal solution to this problem, however, most solutions used too many Java-specific libraries that are totally not valid in an interview context.

You can't improve on the complexity but you can optimize the code while keeping the same complexity.
Use a BitSet instead of a HashSet (or even just a boolean[]) - there are only 65536 different char values, so a BitSet covering all of them fits in 8 KB. Each bit records whether you have seen that character before.
Set the StringBuffer to a specified size - a very minor improvement
Bugfix: your for-loop ended at i < s.length() - 1 but it should end at i < s.length(), else it will ignore the last character of the string.
public static String duplicate(String s) {
BitSet bits = new BitSet();
StringBuffer string = new StringBuffer(s.length());
for (int i = 0; i < s.length(); i++) {
if (!bits.get(s.charAt(i))) {
bits.set(s.charAt(i));
string.append(s.charAt(i));
}
}
return string.toString();
}

When using sets/maps, don't forget that almost all methods return values. For example, Set.add returns whether it was actually added. Set.remove returns whether it was actually removed. Map.put and Map.remove return the previous value. Using this you don't need to query the set twice, just change to if(dup.add(s.charAt(i))) ....
The second improvement from the performance point of view could be to dump the String into char[] array and process it manually without any StringBuffer/StringBuilder:
public static String duplicate(String s) {
HashSet<Character> dup = new HashSet<Character>();
char[] chars = s.toCharArray();
int i=0;
for(char ch : chars) {
if(dup.add(ch))
chars[i++] = ch;
}
return new String(chars, 0, i);
}
Note that we are writing the result into the same array that we are iterating over. This works because the write position never exceeds the read position.
Of course, using a BitSet as suggested by @ErwinBolwidt would be even more performant in this case:
public static String duplicate(String s) {
BitSet dup = new BitSet();
char[] chars = s.toCharArray();
int i=0;
for(char ch : chars) {
if(!dup.get(ch)) {
dup.set(ch, true);
chars[i++] = ch;
}
}
return new String(chars, 0, i);
}
Finally, just for completeness, there's a Java 8 Stream API solution, which is slower but probably more expressive:
public static String duplicateStream(String s) {
return s.codePoints().distinct()
.collect(StringBuilder::new, StringBuilder::appendCodePoint,
StringBuilder::append).toString();
}
Note that processing code points is better than processing chars as your method will work fine even for Unicode surrogate pairs.
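To illustrate the surrogate-pair point, here is a minimal sketch comparing a char-based pass with the code-point-based stream on two characters outside the Basic Multilingual Plane. The class and method names (CodePointDedupDemo, dedupChars, dedupCodePoints) are made up for this example.

```java
import java.util.BitSet;

class CodePointDedupDemo {
    // char-based pass: every UTF-16 code unit is treated as its own character
    static String dedupChars(String s) {
        BitSet seen = new BitSet();
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (!seen.get(ch)) {
                seen.set(ch);
                out.append(ch);
            }
        }
        return out.toString();
    }

    // code-point pass: a surrogate pair is handled as one unit
    static String dedupCodePoints(String s) {
        return s.codePoints().distinct()
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // U+1F600 and U+1F601 are different characters that share the high surrogate 0xD83D
        String s = "\uD83D\uDE00\uD83D\uDE01";
        // the char-based pass drops the second high surrogate and leaves a lone low surrogate
        System.out.println(dedupChars(s).length());
        // the code-point pass keeps both characters intact
        System.out.println(dedupCodePoints(s).equals(s));
    }
}
```

The char-based version is still fine when the input is known to contain only BMP characters.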

If it's a really long string, your algorithm will spend most of its time just throwing away characters.
Another approach that could be faster with long strings (like book-long) is to simply go through the alphabet, looking for the first occurrence of each character and storing the index at which it is found. Once all characters have been looked up, create the new string from the characters, ordered by where they were found.
package se.wederbrand.stackoverflow.alphabet;
import java.util.Map;
import java.util.TreeMap;
public class Finder {
    public static void main(String[] args) {
        String target = "some really long string"; // like millions of characters
        // A TreeMap keeps its entries sorted by index, so the result preserves
        // first-occurrence order (a HashMap gives no ordering guarantee).
        Map<Integer, Character> found = new TreeMap<>();
        for (char c = 'a'; c <= 'z'; c++) {
            int foundAt = target.indexOf(c);
            if (foundAt != -1) {
                found.put(foundAt, c);
            }
        }
        StringBuilder result = new StringBuilder();
        for (Character c : found.values()) {
            result.append(c);
        }
        System.out.println(result);
    }
}
Note that on strings where at least one character is missing this will be slow.

Related

How to delete characters at x?

How do I delete the characters at positions x and keep the rest? The output should be "12345678", i.e. every '9' sitting at a position x is deleted. x is i*(i+1)/2, so each position is the previous one plus the next natural number: 0, 1, 3, 6, 10, 15, 21, 28, etc.
public class removeMysteryI {
public static String removeMysteryI(String str) {
String newString = "";
int x=0;
for(int i=0;i<str.length();i++){
int y = (i*(i+1)/2)+1;
if(y<=str.length()){
x=i*(i+1)/2;
newString=str.substring(0, x) + str.substring(x + 1);
}
}
return newString;
}
public static void main(String[] args) {
String str = "9919239456978";
System.out.println(removeMysteryI(str));
}
}

OK, so there are a couple of mistakes in your code. One is easy to fix. The others not so easy.
The easy one first:
newString=str.substring(0, x) + str.substring(x + 1);
OK so that is creating a string with the character at position x removed. The problem is what it is operating on. The str variable is the input parameter. So at the end of the day newString will still only be str with one character removed.
The above actually needs to be operating on the string from the previous loop iterations ... if you are going to remove more than one character.
The next problem arises when you try to solve the first one. When you remove a character from a string, all characters after the removal point are renumbered; e.g. after removing the character at 5, the character at 6 becomes the character at 5, the character at 7 becomes the character at 6, and so on.
So if you are going to remove characters by "snipping" the string, you need to make sure that the indexes for the positions for the "snips" are adjusted for the number of characters you have already removed.
That can be done ... but you need to think about it.
The final problem is efficiency. Each time your current code removes a single character (as above), it is actually copying all remaining characters to a new string. For small strings, that's OK. For really large strings, the repeated copying could have a serious performance impact [1].
The solution to this is to use a different approach to removing the characters. Instead of snipping out the characters you want to discard, copy the characters that you want to keep. The StringBuilder class is one way of doing this [2]. If you are not permitted to use that, then you could do it with an array of char, and an index variable to keep track of your "append" position in the array. Finally, there is a String constructor that can create a String from the relevant part of the char[].
I'll leave it to you to work out the details.
[1] - Efficiency could be viewed as beyond the scope of this exercise.
[2] - @Horse's answer uses a StringBuilder but in a different way to what I am suggesting. This will also suffer from the repeated copying problem because each deleteCharAt call will copy all characters after the deletion point.
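For readers who want to see one way the "copy what you keep" idea could look, here is a minimal sketch using a char[] and an append index. It is only one possible reading of the suggestion; the class, method and variable names are invented for this example.

```java
class TriangularIndexRemover {
    // Copies every character whose index is NOT one of 0, 1, 3, 6, 10, ... (i*(i+1)/2).
    static String removeAtTriangularIndexes(String str) {
        char[] kept = new char[str.length()];
        int appendPos = 0;   // next free slot in kept
        int nextToSkip = 0;  // next index that must be dropped
        int step = 1;        // distance from nextToSkip to the index dropped after it
        for (int i = 0; i < str.length(); i++) {
            if (i == nextToSkip) {
                // drop this character and work out the next index to drop
                nextToSkip += step;
                step++;
            } else {
                kept[appendPos++] = str.charAt(i);
            }
        }
        // build the result from the filled part of the array only
        return new String(kept, 0, appendPos);
    }

    public static void main(String[] args) {
        System.out.println(removeAtTriangularIndexes("9919239456978")); // prints 12345678
    }
}
```

Each character is read once and written at most once, so there is no repeated copying and no index adjustment to think about.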

Follow the steps below:
Initialize builderIndexToDelete = 0
Initialize counter = 1
Repeat the following while builderIndexToDelete is a valid index:
delete the character at builderIndexToDelete
advance builderIndexToDelete by counter - 1 (the -1 is there because a character is deleted in every iteration)
increment counter
public static String deleteNaturalSumIndexes(String str) {
StringBuilder builder = new StringBuilder(str);
int counter = 1;
int builderIndexToDelete = 0;
while (builderIndexToDelete < builder.length()) {
builder.deleteCharAt(builderIndexToDelete);
builderIndexToDelete += (counter - 1);
counter++;
}
return builder.toString();
}
public static void main(String[] args) {
String str = "9919239456978";
System.out.println(deleteNaturalSumIndexes(str));
}
Thank you @dreamcrash and @StephenC
Using @StephenC's suggestion to improve performance:
public static String deleteNaturalSumIndexes(String str) {
StringBuilder builder = new StringBuilder();
int nextNum = 1;
int indexToDelete = 0;
while (indexToDelete < str.length()) {
// check whether this is a valid range to continue
// handles 0,1 specifically
if (indexToDelete + 1 < indexToDelete + nextNum) {
// min is used to limit the index of last iteration
builder.append(str, indexToDelete + 1, Math.min(indexToDelete + nextNum, str.length()));
}
indexToDelete += nextNum;
nextNum++;
}
return builder.toString();
}
public static void main(String[] args) {
System.out.println(deleteNaturalSumIndexes(""));
System.out.println(deleteNaturalSumIndexes("a"));
System.out.println(deleteNaturalSumIndexes("ab"));
System.out.println(deleteNaturalSumIndexes("abc"));
System.out.println(deleteNaturalSumIndexes("99192394569"));
System.out.println(deleteNaturalSumIndexes("9919239456978"));
}

Removing special character without using Java Matcher and Pattern API

I am trying to write a Java program. It takes a string from the user as input, removes the special characters in it, and displays each remaining string on a new line.
Let's say I have the string Abc#xyz,2016!horrible_just?kidding. After reading this string, my program should remove the special characters and display the output like this:
Abc
xyz
2016
horrible
just
kidding
Now I know there are already APIs available in Java to do this, like Matcher and Pattern. But I don't want to use them since I am a beginner in Java, so I am just trying to crack the code bit by bit.
This is what I have tried so far. I take the string from the user, store the special characters in an array, and compare each character against them until I hit a special character. I also store the new characters in a StringBuilder.
Here is my code:
import java.util.*;
class StringTokens{
public void display(String string){
StringBuilder stringToken = new StringBuilder();
stringToken.setLength(0);
char[] str = {' ','!',',','?','.','_','#'};
for(int i=0;i<string.length();i++){
for(int j =0;j<str.length;j++){
if((int)string.charAt(i)!=(int)str[j]){
stringToken.append(str[j]);
}
else {
System.out.println(stringToken.toString());
stringToken.setLength(0);
}
}
}
}
public static void main(String[] args){
if(args.length!=1)
System.out.println("Enter only one line string");
else{
StringTokens st = new StringTokens();
st.display(args[0]);
}
}
}
When I run this code I am only getting the special characters, I am not getting the each strings in new line.

One easy way - use a set to hold all invalid characters:
Set<Character> invalidChars = new HashSet<>(Arrays.asList('$', ...));
Then your check boils down to:
if (invalidChars.contains(string.charAt(i))) {
    // ... invalid char
} else {
    // valid char
}
But of course, that still means: you are re-inventing the wheel. And one does only re-invent the wheel, if one has very good reasons to. One valid reason would be: your assignment is to implement your own solution.
But otherwise: just read about replaceAll. That method does what your current code, and my solution, are trying to do, but in a straightforward way that every good Java programmer will be able to understand!
So, to match your question: yes, you can implement this yourself. But the next step is to figure the "canonical" solution to the problem. When you learn Java, then you also have to focus on learning how to do things "like everybody else", with least amount of code to solve the problem. That is one of the core beauties of Java: for 99% of all problems, there is already a straight-forward, high-performance, everybody-will-understand solution out there; most often directly in the Java standard libraries themselves! And knowing Java means to know and understand those solutions.
Every C coder can put down 150 lines of low-level array iterating code in Java, too. The true virtue is to know the ways of doing the same thing with 5 or 10 lines!
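As an illustration of the "few lines" point, here is a minimal sketch built on String.split, a sibling of the replaceAll method mentioned above that also takes a regular expression. The class and method names are made up, and the separator set is simply copied from the question's array.

```java
class SplitTokensDemo {
    // Splits on any run of the separator characters used in the question.
    // Inside a character class, '.', '?' and '#' are literal characters.
    static String[] tokens(String input) {
        return input.split("[ !,?._#]+");
    }

    public static void main(String[] args) {
        for (String token : tokens("Abc#xyz,2016!horrible_just?kidding")) {
            System.out.println(token);
        }
    }
}
```

One caveat of this sketch: if the input starts with a separator, the first array element is an empty string, so a real program may want to skip empty tokens.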

I can't comment because I don't have the reputation required. Currently you are appending str[j] which represents special character. Instead you should be appending string.charAt(i). Hope that helps.
stringToken.append(str[j]);
should be
stringToken.append(string.charAt(i));

Here is corrected version of your code, but there are better solutions for this problem.
public class StringTokens {
static String specialChars = new String(new char[]{' ', '!', ',', '?', '.', '_', '#'});
public static void main(String[] args) {
if (args.length != 1) {
System.out.println("Enter only one line string");
} else {
display(args[0]);
}
}
public static void display(String string) {
StringBuilder stringToken = new StringBuilder();
stringToken.setLength(0);
for(char c : string.toCharArray()) {
if(!specialChars.contains(String.valueOf(c))) {
stringToken.append(c);
} else {
stringToken.append('\n');
}
}
System.out.println(stringToken);
}
}

public static void main(String[] args) {
String a=",!?#_."; //Add other special characters too
String test="Abc#xyz,2016!horrible_just?kidding"; //Make this as user input
for(char c : test.toCharArray()){
if(a.contains(c+""))
{
System.out.println(); //to avoid printing the special character and to print newline
}
else{
System.out.print(c);
}
}
}

You can run a simple loop and check the ASCII value of each character. If it is anything other than A-Z, a-z or 0-9 (digits are needed for the "2016" token in the expected output), print a newline, skip the character and move on. Time complexity is O(n), and no extra classes are used.
String str = "Abc#xyz,2016!horrible_just?kidding";
char[] charArray = str.toCharArray();
boolean flag = true; // true while the previous character belonged to a token
for (int i = 0; i < charArray.length; i++) {
    char c = charArray[i];
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
        System.out.print(c);
        flag = true;
    } else if (flag) {
        // print only one newline for a run of special characters
        System.out.println();
        flag = false;
    }
}

Improving this code for reversing the string and removing duplicate characters [duplicate]

I recently attended an interview where I was asked to write a program.
The problem was:
Take a string. "Hammer", for example.
Reverse it and any character should not be repeated.
So, the output will be - "remaH".
This is the solution I gave:
public class Reverse {
public static void main(String[] args) {
String str = "Hammer";
String revStr = "";
for(int i=0; i<=str.length()-1;i++){
if(revStr.indexOf(str.charAt(i))==-1){
revStr = str.charAt(i)+revStr;
}
}
System.out.println(revStr);
}
}
How I can improve the above?

The problem is that String is an immutable object, and when you use operator+ to concatenate a char with the current result, you actually create a new string.
This results in creating strings of length 1+2+...+n, which gives you a total performance of O(n^2) (unless the compiler optimizes this for you).
Using a StringBuilder instead of concatenating strings will give you O(n) performance, and with much better constants as well.
Note that a StringBuilder offers an efficient append() implementation, so you need to append elements to it, and NOT add them at the head of your StringBuilder.
You should also reconsider the usage of indexOf(). If a character cannot appear twice at all, consider using a Set<Character> to maintain the set of 'used' characters. If it can appear twice, but not one after the other (for example "mam" is valid), there is really no need for indexOf() in the first place: just check the last character read.
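A minimal sketch of the first variant (no character may appear twice), combining a Set<Character> with append-then-reverse on a StringBuilder. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class ReverseUnique {
    // Keeps the first occurrence of each character, then reverses the result.
    static String reverseWithoutRepeats(String str) {
        Set<Character> seen = new HashSet<>();
        StringBuilder kept = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            // Set.add returns false if the character was already present
            if (seen.add(c)) {
                kept.append(c);
            }
        }
        // a single reverse at the end avoids inserting at the head on every step
        return kept.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseWithoutRepeats("Hammer")); // prints remaH
    }
}
```

Appending and reversing once keeps the whole pass linear, whereas inserting at index 0 would shift the buffer on every character.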

Here is a solution without using any stringbuilder or intermediary String objects, just treating Strings as arrays of chars; this should be more efficient.
import java.util.Arrays;
public class Reverse {
public static void main(String[] args) {
String str = "Hammer";
String revStr = null;
char [] chars = str.toCharArray();
char [] reversedChars = new char[chars.length];
// copy first char
reversedChars[reversedChars.length - 1] = chars[0];
// process rest
int r = reversedChars.length - 2;
for(int i = 1 ; i < chars.length ; i++ ){
if(chars[i] != chars[i-1]){
reversedChars[r] = chars[i];
r--;
}
}
revStr = new String(Arrays.copyOfRange(reversedChars, r+1, reversedChars.length));
System.out.println(revStr);
}
}

package com.in.main;
public class Reverse {
public static void main(String[] args) {
String str = "Hammer";
StringBuilder revStr= new StringBuilder("");
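// start at the last valid index; str.length() itself is out of bounds
for(int i=str.length()-1; i>=0;i--){
// StringBuilder.indexOf takes a String, not a char
if(revStr.indexOf(String.valueOf(str.charAt(i)))==-1){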
revStr.append(str.charAt(i));
}
}
System.out.println(revStr);
}
}

Faster way to split a string in java then add to an ArrayList

For a school project I was asked to write a simple math parser in Java. The program works fine. So fine that I used NetBeans profiler tool to check the performance of the program. For that I made a loop of 1000 calls to the math parser of the following expression: "1-((x+1)+1)*2", where x was replaced by the current loop count. It took 262ms. The thing is, it took 50% of the time in the method splitFormula, which I shall present below:
private static void splitFormula(String formula){
partialFormula=new ArrayList<>();
for(String temp: formula.split("\\+|\\-|\\*|\\/"))
partialFormula.add(temp);
}
, where partialFormula is an ArrayList of Strings. To numerically evaluate an expression I need to call the splitFormula method several times, so I really need to clear the contents of the partialFormula ArrayList (the first line).
My question is: is there a faster way to split a string and then add the partial strings to an ArrayList? Or is there some other method that can be used to split a string and then use the substrings?

Regular expressions can slow things down (String#split uses regex). In general, if you want to write easy code, regex is good, but if you want fast code, see if there is another way. Try doing this without regex:
Edit: This should be a better method (keep track of the indices instead of append to a StringBuilder):
private static void splitFormula(String formula){
partialFormula.clear(); // since there is a method for this, why not use it?
int lastIndex = 0;
for (int index = 0; index < formula.length(); index++) {
char c = formula.charAt(index);
if (c == '-' || c == '+' || c == '*' || c == '/') {
partialFormula.add(formula.substring(lastIndex, index));
lastIndex = index + 1; //because if it were index, it would include the operator
}
}
partialFormula.add(formula.substring(lastIndex));
}
StringBuilder approach:
private static void splitFormula(String formula){
partialFormula.clear();
StringBuilder newStr = new StringBuilder();
for (int index = 0; index < formula.length(); index++) {
char c = formula.charAt(index);
if (c == '-' || c == '+' || c == '*' || c == '/') {
partialFormula.add(newStr.toString());
newStr.setLength(0);
} else {
newStr.append(c);
}
}
partialFormula.add(newStr.toString());
}
If we look at the source code for String#split, it becomes apparent why that is slower (from GrepCode):
public String[] split(String regex, int limit) {
return Pattern.compile(regex).split(this, limit);
}
It compiles a regex every time! Thus, we can see that another way of speeding up the code is to compile our regex first, then use the Pattern#split to split:
//Compile once, as a static field (splitFormula is static, so it cannot use an instance field).
//This regex is a better form of yours.
private static final Pattern operatorPattern = Pattern.compile("[-*+/]");
...
private static void splitFormula(String formula){
partialFormula.clear();
for(String temp: operatorPattern.split(formula)) {
partialFormula.add(temp);
}
}

You don't need a for loop. split returns an array, and you can create an ArrayList out of the array:
partialFormula = new ArrayList<>(Arrays.asList(formula.split("\\+|\\-|\\*|\\/")));
Whether this is significantly faster or not, I don't know.

Try pre-allocating the ArrayList beforehand so we do not have to pay for reallocation when the list grows. The number 20 below is just a placeholder. Pick a number that is a little bigger than the largest expression you expect.
partialFormula=new ArrayList<String>(20);
See this question for a discussion of what this might gain you.

This will create an arrayList of strings.
String a= "1234+af/d53";
char [] blah=a.toCharArray();
ArrayList<String> list=new ArrayList<String>();
for (int i = 0; i < blah.length; i++) {
list.add(Character.toString(blah[i]));
}

Removing duplicate chars from a string passed as a parameter

I am a little confused how to approach this problem. The userKeyword is passed as a parameter from a previous section of the code. My task is to remove any duplicate chars from the inputted keyword(whatever it is). We have just finished while loops in class so some hints regarding these would be appreciated.
private String removeDuplicates(String userKeyword){
String first = userKeyword;
int i = 0;
while(i < first.length())
{
if (second.indexOf(first.charAt(i)) > -1){
}
i++;
return "";
Here's an update of what I have tried so far - sorry about that.

This is the perfect place to use java.util.Set, a construct which is designed to hold unique elements. By trying to add each word to a set, you can check if you've seen it before, like so:
static String removeDuplicates(final String str)
{
final Set<String> uniqueWords = new HashSet<>();
final String[] words = str.split(" ");
final StringBuilder newSentence = new StringBuilder();
for(int i = 0; i < words.length; i++)
{
if(uniqueWords.add(words[i]))
{
//Word is unique
newSentence.append(words[i]);
if((i + 1) < words.length)
{
//Add the space back in
newSentence.append(" ");
}
}
}
return newSentence.toString();
}
public static void main(String[] args)
{
final String str = "Words words words I love words words WORDS!";
System.out.println(removeDuplicates(str)); //Words words I love WORDS!
}

Have a look at this answer.
You might not understand this, but it does the job (it cleverly uses a HashSet that doesn't allow duplicate values).
I think your teacher might be looking for a solution using loops however - take a look at William Morisson's answer for this.
Good luck!
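Since the question asks for hints involving while loops, here is a minimal loop-based sketch that avoids sets altogether. It is an assumption about what the exercise expects rather than anything stated in the question, and the class name is made up.

```java
class KeywordDedup {
    // Appends a character only if it is not already in the output built so far.
    static String removeDuplicates(String userKeyword) {
        StringBuilder output = new StringBuilder();
        int i = 0;
        while (i < userKeyword.length()) {
            char c = userKeyword.charAt(i);
            // StringBuilder.indexOf takes a String and returns -1 when nothing matches
            if (output.indexOf(String.valueOf(c)) == -1) {
                output.append(c);
            }
            i++;
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("aaaabcdefghijklmnopqrssssssstuvwxyyyyxyyyz"));
    }
}
```

The indexOf lookup rescans the output on every step, which is fine for a short keyword; for long inputs a Set or boolean[] lookup would scale better.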

For future reference, StackOverflow normally requires you to post what you have, and ask for suggestions for improvement.
As it's not an active day and I am bored, I've done this for you. This code is pretty efficient and makes use of no advanced data structures. I did this so you could more easily understand it.
Please do try to understand what I'm doing. Learning is what StackOverflow is for.
I've added comments in the code to assist you in learning.
private String removeDuplicates(String keyword){
//stores whether a character has been encountered before
//a hashset would likely use less memory.
//one slot per possible char value (0..Character.MAX_VALUE inclusive)
boolean[] usedValues = new boolean[Character.MAX_VALUE + 1];
//Look into using a StringBuilder. Using += operator with strings
//is potentially wasteful.
String output = "";
//looping over every character in the keyword...
for(int i=0; i<keyword.length(); i++){
char charAt = keyword.charAt(i);
//characters are just numbers. if the value in usedValues array
//is true for this char's number, we've seen this char.
boolean shouldRemove = usedValues[charAt];
if(!shouldRemove){
output += charAt;
//now this character has been used in output. Mark that in
//usedValues array
usedValues[charAt] = true;
}
}
return output;
}
Example:
//output will be the alphabet.
System.out.println(removeDuplicates(
"aaaabcdefghijklmnopqrssssssstuvwxyyyyxyyyz"));
