Split string which contains escaped delimiters - java

The delimiter is |
The escaping character is \
and the string is, for example, "A|B\|C\\|D\\\|E|\\\\F"
I want to get the array:
{"A", "B|C\", "D\|E", "\\F"}
So the delimiter can be escaped, but the escaping character can also be escaped. Does anybody know how to parse this in Java?
Thanks.
Edit:
I created this terrible-looking solution. At least it works perfectly, and it is easy to define the escaping character, the delimiter, and whether empty strings should be removed.
SOLUTION (Eggyal posted a better one, see below):
private List<String> parseString(String string, String delimiter, boolean removeEmpty) {
String escapingChar = "\\";
String escapingCharInRegexp = "\\\\";
boolean begined = false;
List<String> parsed = new ArrayList<String>();
List<Integer> begins = new ArrayList<Integer>();
List<Integer> ends = new ArrayList<Integer>();
List<Integer> delimitersPositions = new ArrayList<Integer>();
List<String> explodedParts = new ArrayList<String>();
int i;
for(i = 0; i < string.length(); i++) {
if( ( string.substring(i, i+1).equals(escapingChar) || string.substring(i, i+1).equals(delimiter) ) && !begined ) {
begins.add(i);
begined = true;
if( i + 1 == string.length() ) {
begined = false;
ends.add(i+1);
}
} else if( ( !string.substring(i, i+1).equals(escapingChar) && !string.substring(i, i+1).equals(delimiter) && begined ) ) {
begined = false;
ends.add(i);
} else if( begined && string.substring(begins.get(begins.size()-1), i).indexOf(delimiter) != -1 ) {
begined = false;
ends.add(i);
begined = true;
begins.add(i);
}
if( ( i + 1 == string.length() && begined ) ) {
begined = false;
ends.add(i+1);
}
}
List<Integer> toRemove = new ArrayList<Integer>();
for( i = 0; i < begins.size(); i++ ) {
if( string.substring(begins.get(i), ends.get(i)).indexOf(delimiter) == -1 ) {
toRemove.add(i);
}
}
for( i = 0; i < toRemove.size(); i++ ) {
begins.remove(toRemove.get(i)-i);
ends.remove(toRemove.get(i)-i);
}
for( i = 0; i < begins.size(); i++ ) {
if( ( ends.get(i) - begins.get(i) ) % 2 != 0 ) {
delimitersPositions.add(ends.get(i)-1);
}
}
for( i = 0; i <= delimitersPositions.size(); i++ ) {
int start = (i == 0) ? 0 : delimitersPositions.get(i-1)+1;
int end = ( i != delimitersPositions.size()) ? delimitersPositions.get(i) : string.length();
if( removeEmpty ) {
if( !string.substring(start, end).equals("") ) {
explodedParts.add(string.substring(start, end));
}
} else {
explodedParts.add(string.substring(start, end));
}
}
for (i = 0; i < explodedParts.size(); i++)
parsed.add(explodedParts.get(i).replaceAll(escapingCharInRegexp+"(.)", "$1"));
return parsed;
}

static final char ESCAPING_CHAR = '\\';
private List<String> parseString(final String str,
final char delimiter,
final boolean removeEmpty)
throws IOException
{
final Reader input = new StringReader(str);
final StringBuilder part = new StringBuilder();
final List<String> result = new ArrayList<String>();
int c;
do {
c = input.read(); // get the next character
if (c != delimiter) { // so long as it isn't a delimiter...
if (c == ESCAPING_CHAR) // if it's an escape
c = input.read(); // use the following character instead
if (c >= 0) { // only if NOT at end of string...
part.append((char) c); // append to current part
continue; // move on to next character
}
}
/* we're at either a real delimiter, or end of string => part complete */
if (part.length() > 0 || !removeEmpty) { // keep this part?
result.add(part.toString()); // add current part to result
part.setLength(0); // reset for next part
}
} while (c >= 0); // repeat until end of string found
return result;
}

Because you are both splitting and unescaping, you need a separate step for each process:
String[] terms = input.split("(?<=[^\\\\]|[^\\\\]\\\\\\\\)\\|");
for (int i = 0; i < terms.length; i++)
terms[i] = terms[i].replaceAll("\\\\(.)", "$1");
Here's some test code:
public static void main(String[] args) {
String input = "A|B\\|C\\\\|D\\\\\\|E|\\\\\\\\F";
String[] terms = input.split("(?<=[^\\\\]|[^\\\\]\\\\\\\\)\\|");
for (int i = 0; i < terms.length; i++)
terms[i] = terms[i].replaceAll("\\\\(.)", "$1");
System.out.println(input);
System.out.println(Arrays.toString(terms));
}
Output:
A|B\|C\\|D\\\|E|\\\\F
[A, B|C\, D\|E, \\F]
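One caveat with the lookbehind above: it inspects at most two backslashes before the delimiter, so an input with four backslashes directly before a | would not be split at that position. A minimal single-pass sketch that handles any number of escapes follows. The class name EscapedSplit and the method split are hypothetical names chosen for illustration, not part of any answer here, and keeping a lone trailing escape character as a literal is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on an unescaped delimiter and unescapes in one pass.
class EscapedSplit {
    static List<String> split(String s, char delimiter, char escape) {
        List<String> parts = new ArrayList<>();
        StringBuilder part = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == escape && i + 1 < s.length()) {
                // An escape followed by any character: keep that character literally.
                part.append(s.charAt(++i));
            } else if (c == delimiter) {
                // An unescaped delimiter closes the current part.
                parts.add(part.toString());
                part.setLength(0);
            } else {
                // Ordinary character (or, by assumption, a lone trailing escape).
                part.append(c);
            }
        }
        parts.add(part.toString()); // the last part has no closing delimiter
        return parts;
    }
}
```

Calling EscapedSplit.split(input, '|', '\\') on the test input above should give the same four terms as the regex version, and unlike the regex it keeps empty parts at the end of the string.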

There is no escape sequence in Java like the "\|" you mentioned.
Writing it in a string literal causes a compile-time error; the backslash itself has to be written as "\\".

Related

Java - Compressed String

Given a string, I want to compress the string based on each character's number of consecutive occurrences next to it. For example, let's say we have a string like "abaasass". 'a' occurs one time, 'b' occurs one time, 'a' occurs two times consecutively, 's' occurs one time, 'a' occurs one time, and 's' occurs two times consecutively. The method should then return a string like "aba2sas2".
This is what I have so far:
public static String compressedString(String message) {
StringBuilder compressedString = new StringBuilder();
int total = 0;
for (int i = 0; i < message.length() - 1; i++){
if (message.charAt(i) == message.charAt(i+1)){
total += 2;
compressedString.append(message.charAt(i)).append(total);
}
else {
compressedString.append(message.charAt(i));
}
total = 0;
}
return compressedString.toString();
}
It instead returns "aba2asas2", which is somewhat close. Does anyone see the issue?
public static String compressedString(String message) {
StringBuilder compressedString = new StringBuilder();
int total = 1;
for (int i = 0; i < message.length() - 1; i++){
if (message.charAt(i) == message.charAt(i+1)){
total++;
}
else if(total==1){
compressedString.append(message.charAt(i));
}
else
{
compressedString.append(message.charAt(i)).append(total);
total = 1;
}
}
if(message.charAt(message.length()-2) != message.charAt(message.length()-1))
compressedString.append(message.charAt(message.length()-1));
else
compressedString.append(message.charAt(message.length()-1)).append(total);
return compressedString.toString();
}
public static String compressedString(String message)
{
String result = "" ;
for ( int i = 0, t = message.length() ; i < t ; ) // run to the end so a trailing single character is not dropped
{
String letter = String.valueOf( message.charAt(i) ) ;
int currentChain = consec( i, message ) ;
result += ( currentChain > 1 ? ( letter + currentChain ) : letter ) ;
i += currentChain ;
}
return result ;
}
private static int consec( int startIndex, String text )
{
int chain = 1 ;
for( int i = startIndex ; i < text.length() - 1 ; ++i )
{
if( text.charAt(i) == text.charAt(i+1) )
chain++ ;
else
break ;
}
return chain ;
}
Here is a solution to your question:
static void compressedString(String str) {
int n = str.length();
for (int i = 0; i < n; i++) {
// Count occurrences of current character
int count = 1;
while (i < n - 1 && str.charAt(i) == str.charAt(i + 1)) {
count++;
i++;
}
if (count == 1) {
System.out.print(str.charAt(i));
} else {
System.out.print(str.charAt(i));
System.out.print(count);
}
}
}
public static void main(String[] args) {
String str = "abaasass";
compressedString(str);
}
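The answers above either print the result or depend on special handling of the last character. As a hedged alternative, here is a minimal sketch that returns the compressed string and treats every run the same way. The class name RunLengthSketch and the method compress are hypothetical names, not taken from the question.

```java
// Hypothetical helper: run-length style compression that omits the count for runs of one.
class RunLengthSketch {
    static String compress(String message) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < message.length()) {
            char c = message.charAt(i);
            int run = 1;
            // Measure how many times c repeats consecutively from position i.
            while (i + run < message.length() && message.charAt(i + run) == c) {
                run++;
            }
            out.append(c);
            if (run > 1) {
                out.append(run); // only runs longer than one get a count
            }
            i += run; // jump to the first character after this run
        }
        return out.toString();
    }
}
```

With the input from the question, RunLengthSketch.compress("abaasass") returns "aba2sas2". The question's version goes wrong because it appends inside the comparison of each adjacent pair, so the second character of a run is emitted again on the next iteration.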

Replace each “#” with an “X” or an “O” iteratively

I've been asked to generate all possible combinations of a row where the hidden # squares can be either X or O. I did it recursively but now I have to do an iterative version.
I tried replacing UnHide(strChar, i+1) with strChar = strChar.substring(0, i+1), but that doesn't work.
public static void main(String[] args) {
String str = new String("XOXX#OO#XO");
UnHide(str, 0);
}
public static void UnHide(String str, int i) {
char[] charArr = str.toCharArray();
String strChar = new String(charArr);
if (i == charArr.length) {
System.out.println(charArr);
return;
}
//Replace masked "#" at each specified index by O or X
if (charArr[i] == '#') {
for (int j = 0; j < 2; j++) {
//Replace masked "#" by O
if (j == 0) {
charArr[i] = 'O';
strChar = String.copyValueOf(charArr);
UnHide(strChar, i + 1); //Call UnHide with an incremented index
strChar = strChar.substring(0, i + 1);
charArr[i] = '#';
}
//Replace masked "#" by X
else {
charArr[i] = 'X';
strChar = String.copyValueOf(charArr);
UnHide(strChar, i + 1);
charArr[i] = '#';
}
}
return;
}
UnHide(strChar, i + 1);
}
I am not sure where your code goes wrong, but you can try the following:
private static final char toReplace = '#';
private static final Set<Character> replacements = new HashSet<>(Arrays.asList('X', 'O'));
private static Set<String> UnHide(String s) {
Set<String> result = new HashSet<>();
result.add("");
for (char c : s.toCharArray()){
Set<String> updatedResult = new HashSet<>();
for (String temp : result) {
if (toReplace == c) {
for (Character replacement : replacements) {
updatedResult.add(temp + replacement);
}
} else {
updatedResult.add(temp + c);
}
}
result = updatedResult;
}
return result;
}
Then calling:
String str = "XOXX#OO#XO";
System.out.println(UnHide(str));
outputs:
[XOXXOOOXXO, XOXXXOOOXO, XOXXOOOOXO, XOXXXOOXXO]
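Another iterative option is to treat the hidden squares as the bits of a counter. The sketch below is an illustration under assumptions: the names UnhideSketch and unhide are hypothetical, 'O' is arbitrarily mapped to bit 0 and 'X' to bit 1, and the row is assumed to contain fewer than 31 hidden squares so that the counter fits in an int.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates all fillings of '#' iteratively with a bit counter.
class UnhideSketch {
    static List<String> unhide(String row) {
        // Remember where the hidden squares are.
        List<Integer> hidden = new ArrayList<>();
        for (int i = 0; i < row.length(); i++) {
            if (row.charAt(i) == '#') {
                hidden.add(i);
            }
        }
        List<String> result = new ArrayList<>();
        int combos = 1 << hidden.size(); // 2^k fillings; assumes k < 31
        for (int mask = 0; mask < combos; mask++) {
            char[] chars = row.toCharArray();
            for (int bit = 0; bit < hidden.size(); bit++) {
                // Bit value 0 means 'O', bit value 1 means 'X' (an arbitrary choice).
                chars[hidden.get(bit)] = ((mask >> bit) & 1) == 0 ? 'O' : 'X';
            }
            result.add(new String(chars));
        }
        return result;
    }
}
```

For "XOXX#OO#XO" this yields the same four rows as the set-based answer, in a fixed order determined by the counter rather than by HashSet iteration.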

remove unwanted consecutive char set if in argument(String) and return the filtered argument, else return original argument string

The code below throws the error "'i' cannot be resolved to a variable". Any explanations, please?
static String abc(String str) {
String[] sarr = str.split("");
String[] newsarr = new String[sarr.length-3];
String s = "";
for (int i = 1; i < sarr.length; i++); {
if ((sarr[i] == "a") && (sarr.length >= i+3)) {
if ((sarr[i+1] == "b") && (sarr[i+2] == "c")) {
newsarr[0] = sarr[0];
for (int x = 1; x < i; x++) {
newsarr[i] = sarr[i];
}
for (int y = i+3; y < newsarr.length; y++) {
newsarr[y-3] = sarr[y];
}
} else {}
} else {}
}
for (String o : newsarr) {
s += o;
}
return s;
}
Removing the semicolon on the first for loop will work wonders.
The semicolon ends the for statement and the scope of i.
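Even with the semicolon removed, the comparisons such as sarr[i] == "a" compare object references rather than contents, so they would normally need equals instead. A much shorter sketch of what the method appears to be aiming for is shown here, under two assumptions: the unwanted sequence is literally "abc", and only its first occurrence should be removed. The class name RemoveAbcSketch is hypothetical.

```java
// Hypothetical helper: removes the first "abc" if present, else returns the input unchanged.
class RemoveAbcSketch {
    static String abc(String str) {
        int at = str.indexOf("abc"); // -1 when the sequence does not occur
        if (at < 0) {
            return str; // nothing to remove: return the original argument
        }
        // Keep everything before the match and everything after its three characters.
        return str.substring(0, at) + str.substring(at + 3);
    }
}
```

For example, RemoveAbcSketch.abc("xabcy") returns "xy", and RemoveAbcSketch.abc("hello") returns "hello".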

Creating all variations based on the differences of two Strings

I have a function that takes two Strings. I would like it to return a list of words containing all of the possible variations that can be created from the differences between them.
getAllVersions("cso","cső"); //--> [cso, cső]
getAllVersions("eges","igis"); //--> [eges, igis, egis, iges]
So far I have created the function which counts the differences, and saves their locations. Do you have any idea how to continue?
public ArrayList<String> getAllVersions(String q, String qW) {
int differences = 0;
ArrayList<Integer> locations = new ArrayList<>();
ArrayList<String> toReturn = new ArrayList<>();
for (int i = 0; i < q.length(); i++) {
if (q.charAt(i) != qW.charAt(i)) { // compare q against qW, not against itself
differences++;
locations.add(i);
}
}
toReturn.add(q);
toReturn.add(qW);
for (int i = 0; i < q.length(); i++) {
for (int j = 0; j < q.length(); j++) {
}
}
return toReturn;
}
}
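Here is a recursive solution.
Time complexity: at least proportional to the size of the output, which contains 2^d strings for d differing positions, so it is exponential in d rather than O(n).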
public List<String> allVariants(String x, String y) {
if ((x == null || x.isEmpty()) && (y == null || y.isEmpty())) {
return new ArrayList<String>();
}
List<String> l = new ArrayList<String>();
if (x == null || x.isEmpty()) {
l.add(y);
return l;
}
if (y == null || y.isEmpty()) {
l.add(x);
return l;
}
char xc = x.charAt(0);
char yc = y.charAt(0);
List<String> next = allVariants(x.substring(1), y.substring(1));
if (next.isEmpty()) {
l.add(xc + "");
if (xc != yc) {
l.add(yc + "");
}
} else {
for (String e : next) {
l.add(xc + e);
if (xc != yc) {
l.add(yc + e);
}
}
}
return l;
}
Test Code:
public static void main(String[] args) {
List<String> l = new Test().allVariants("igis", "eges");
for (String x : l) {
System.out.println(x);
}
}
Output:
igis
egis
iges
eges
for (int i = 0; i < q.length(); i++) //as before, but a little simplified...
if (q.charAt(i) != qW.charAt(i))
locations.add(i);
//Now we're going to create those variations.
toReturn.add(q); //Start with the first string
for each location we found
Additions = a new empty list of Strings
for each element in toReturn
create a new String which is a copy of that element
alter its location-th character to match the corresponding char in qW
append it to Additions
append Additions to toReturn
When this is done, toReturn should start with q and end with qW, and have all variations between.
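The pseudocode above can be sketched in Java roughly as follows. This is an illustration rather than the answerer's own code: the class name VariationsSketch is hypothetical, and the two strings are assumed to have the same length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds all mixes of q and qW at the positions where they differ.
class VariationsSketch {
    static List<String> getAllVersions(String q, String qW) {
        // Assumption: q and qW have equal length.
        List<Integer> locations = new ArrayList<>();
        for (int i = 0; i < q.length(); i++) {
            if (q.charAt(i) != qW.charAt(i)) {
                locations.add(i);
            }
        }
        List<String> toReturn = new ArrayList<>();
        toReturn.add(q); // start with the first string
        for (int location : locations) {
            List<String> additions = new ArrayList<>();
            for (String element : toReturn) {
                // Copy the element and take qW's character at this location.
                char[] copy = element.toCharArray();
                copy[location] = qW.charAt(location);
                additions.add(new String(copy));
            }
            toReturn.addAll(additions); // the list doubles for each differing position
        }
        return toReturn;
    }
}
```

For getAllVersions("eges", "igis") this produces [eges, iges, egis, igis], which starts with q and ends with qW as described.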

finding a supersequence of DNA Java

I am struggling with a "find supersequence" algorithm.
The input is a set of strings:
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
The result would be a properly aligned set of strings (and the next step should be a merge):
String E = "ca ag cca cc ta cat c a";
String F = "c gag ccat ccgtaaa g tt g";
String G = " aga acc tgc taaatgc t a ga";
Thank you for any advice (I have been stuck on this task for more than a day).
After the merge, the superstring would be
cagagaccatgccgtaaatgcattacga
The definition of a supersequence in this case would be something like:
A string R is contained in a supersequence S if and only if all characters of R are present in S in the order in which they occur in R.
The "solution" I tried (which I know is the wrong way of doing it) is:
public class Solution4
{
static boolean[][] map = null;
static int size = 0;
public static void main(String[] args)
{
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
Stack data = new Stack();
data.push(A);
data.push(B);
data.push(C);
Stack clone1 = data.clone();
Stack clone2 = data.clone();
int length = 26;
size = max_size(data);
System.out.println(size+" "+length);
map = new boolean[26][size];
char[] result = new char[size];
HashSet<String> chunks = new HashSet<String>();
while(!clone1.isEmpty())
{
String a = clone1.pop();
char[] residue = make_residue(a);
System.out.println("---");
System.out.println("OLD : "+a);
System.out.println("RESIDUE : "+String.valueOf(residue));
String[] r = String.valueOf(residue).split(" ");
for(int i=0; i<r.length; i++)
{
if(r[i].equals(" ")) continue;
//chunks.add(spaces.substring(0,i)+r[i]);
chunks.add(r[i]);
}
}
for(String chunk : chunks)
{
System.out.println("CHUNK : "+chunk);
}
}
static char[] make_residue(String candidate)
{
char[] result = new char[size];
for(int i=0; i<candidate.length(); i++)
{
int pos = find_position_for(candidate.charAt(i),i);
for(int j=i; j<pos; j++) result[j]=' ';
if(pos==-1) result[candidate.length()-1] = candidate.charAt(i);
else result[pos] = candidate.charAt(i);
}
return result;
}
static int find_position_for(char character, int offset)
{
character-=((int)'a');
for(int i=offset; i<size; i++)
{
// System.out.println("checking "+String.valueOf((char)(character+((int)'a')))+" at "+i);
if(!map[character][i])
{
map[character][i]=true;
return i;
}
}
return -1;
}
static String move_right(String a, int from)
{
return a.substring(0, from)+" "+a.substring(from);
}
static boolean taken(int character, int position)
{ return map[character][position]; }
static void take(char character, int position)
{
//System.out.println("taking "+String.valueOf(character)+" at "+position+" (char_index-"+(character-((int)'a'))+")");
map[character-((int)'a')][position]=true;
}
static int max_size(Stack stack)
{
int max=0;
while(!stack.isEmpty())
{
String s = stack.pop();
if(s.length()>max) max=s.length();
}
return max;
}
}
Finding any common supersequence is not a difficult task.
For your example, a possible solution would be something like:
public class SuperSequenceTest {
public static void main(String[] args) {
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
int iA = 0;
int iB = 0;
int iC = 0;
char[] a = A.toCharArray();
char[] b = B.toCharArray();
char[] c = C.toCharArray();
StringBuilder sb = new StringBuilder();
while (iA < a.length || iB < b.length || iC < c.length) {
if (iA < a.length && iB < b.length && iC < c.length && (a[iA] == b[iB]) && (a[iA] == c[iC])) {
sb.append(a[iA]);
iA++;
iB++;
iC++;
}
else if (iA < a.length && iB < b.length && a[iA] == b[iB]) {
sb.append(a[iA]);
iA++;
iB++;
}
else if (iA < a.length && iC < c.length && a[iA] == c[iC]) {
sb.append(a[iA]);
iA++;
iC++;
}
else if (iB < b.length && iC < c.length && b[iB] == c[iC]) {
sb.append(b[iB]);
iB++;
iC++;
} else {
if (iC < c.length) {
sb.append(c[iC]);
iC++;
}
else if (iB < b.length) {
sb.append(b[iB]);
iB++;
} else if (iA < a.length) {
sb.append(a[iA]);
iA++;
}
}
}
System.out.println("SUPERSEQUENCE " + sb.toString());
}
}
However, the real problem to solve is the known Shortest Common Supersequence problem (http://en.wikipedia.org/wiki/Shortest_common_supersequence),
which is not that easy.
There is a lot of research on the topic.
See for instance:
http://www.csd.uwo.ca/~lila/pdfs/Towards%20a%20DNA%20solution%20to%20the%20Shortest%20Common%20Superstring%20Problem.pdf
http://www.ncbi.nlm.nih.gov/pubmed/14534185
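For just two strings, a shortest common supersequence can be found with the standard dynamic-programming table that is closely related to the longest common subsequence; the difficulty lies in doing this for an arbitrary number of strings. The sketch below covers only the two-string case and is an illustration, not a solution to the three-string question. The class name ScsTwoStrings is hypothetical.

```java
// Hypothetical helper: shortest common supersequence of exactly two strings.
class ScsTwoStrings {
    static String shortestCommonSupersequence(String a, String b) {
        int m = a.length(), n = b.length();
        // len[i][j] = length of the shortest common supersequence of a[i..] and b[j..]
        int[][] len = new int[m + 1][n + 1];
        for (int i = m; i >= 0; i--) {
            for (int j = n; j >= 0; j--) {
                if (i == m) {
                    len[i][j] = n - j; // a is exhausted: the rest of b is needed
                } else if (j == n) {
                    len[i][j] = m - i; // b is exhausted: the rest of a is needed
                } else if (a.charAt(i) == b.charAt(j)) {
                    len[i][j] = 1 + len[i + 1][j + 1]; // one character serves both
                } else {
                    len[i][j] = 1 + Math.min(len[i + 1][j], len[i][j + 1]);
                }
            }
        }
        // Walk the table from the start, always taking a step that keeps the total minimal.
        StringBuilder sb = new StringBuilder();
        int i = 0, j = 0;
        while (i < m && j < n) {
            if (a.charAt(i) == b.charAt(j)) {
                sb.append(a.charAt(i));
                i++;
                j++;
            } else if (len[i + 1][j] <= len[i][j + 1]) {
                sb.append(a.charAt(i));
                i++;
            } else {
                sb.append(b.charAt(j));
                j++;
            }
        }
        sb.append(a, i, m).append(b, j, n); // append whatever remains of either string
        return sb.toString();
    }
}
```

The table uses O(m·n) time and memory. Applying it pairwise to three strings gives a common supersequence, but not necessarily a shortest one.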
You can try finding the shortest combination like this:
static final char[] CHARS = "acgt".toCharArray();
public static void main(String[] ignored) {
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
String expected = "cagagaccatgccgtaaatgcattacga";
List<String> ABC = new Combination(A, B, C).findShortest();
System.out.println("expected: " + expected.length());
System.out.println("Merged: " + ABC.get(0).length() + " " + ABC);
}
static class Combination {
int shortest = Integer.MAX_VALUE;
List<String> shortestStr = new ArrayList<>();
char[][] chars;
int[] pos;
int count = 0;
Combination(String... strs) {
chars = new char[strs.length][];
pos = new int[strs.length];
for (int i = 0; i < strs.length; i++) {
chars[i] = strs[i].toCharArray();
}
}
public List<String> findShortest() {
findShortest0(new StringBuilder(), pos);
return shortestStr;
}
private void findShortest0(StringBuilder sb, int[] pos) {
if (allDone(pos)) {
if (sb.length() < shortest) {
shortestStr.clear();
shortest = sb.length();
}
if (sb.length() <= shortest)
shortestStr.add(sb.toString());
count++;
if (++count % 100 == 1)
System.out.println("Searched " + count + " shortest " + shortest);
return;
}
if (sb.length() + maxLeft(pos) > shortest)
return;
int[] pos2 = new int[pos.length];
int i = sb.length();
sb.append(' ');
for (char c : CHARS) {
if (!tryChar(pos, pos2, c)) continue;
sb.setCharAt(i, c);
findShortest0(sb, pos2);
}
sb.setLength(i);
}
private int maxLeft(int[] pos) {
int maxLeft = 0;
for (int i = 0; i < pos.length; i++) {
int left = chars[i].length - pos[i];
if (left > maxLeft)
maxLeft = left;
}
return maxLeft;
}
private boolean allDone(int[] pos) {
for (int i = 0; i < chars.length; i++)
if (pos[i] < chars[i].length)
return false;
return true;
}
private boolean tryChar(int[] pos, int[] pos2, char c) {
boolean matched = false;
for (int i = 0; i < chars.length; i++) {
pos2[i] = pos[i];
if (pos[i] >= chars[i].length) continue;
if (chars[i][pos[i]] == c) {
pos2[i]++;
matched = true;
}
}
return matched;
}
}
This prints many solutions which are shorter than the one suggested:
expected: 28
Merged: 27 [acgaagccatccgctaaatgctatcga, acgaagccatccgctaaatgctatgca, acgaagccatccgctaacagtgctaga, acgaagccatccgctaacatgctatga, acgaagccatccgctaacatgcttaga, acgaagccatccgctaacatgtctaga, acgaagccatccgctacaagtgctaga, acgaagccatccgctacaatgctatga, acgaagccatccgctacaatgcttaga, acgaagccatccgctacaatgtctaga, acgaagccatcgcgtaaatgctatcga, acgaagccatcgcgtaaatgctatgca, acgaagccatcgcgtaacagtgctaga, acgaagccatcgcgtaacatgctatga, acgaagccatcgcgtaacatgcttaga, acgaagccatcgcgtaacatgtctaga, acgaagccatcgcgtacaagtgctaga, acgaagccatcgcgtacaatgctatga, acgaagccatcgcgtacaatgcttaga, acgaagccatcgcgtacaatgtctaga, acgaagccatgccgtaaatgctatcga, acgaagccatgccgtaaatgctatgca, acgaagccatgccgtaacagtgctaga, acgaagccatgccgtaacatgctatga, acgaagccatgccgtaacatgcttaga, acgaagccatgccgtaacatgtctaga, acgaagccatgccgtacaagtgctaga, acgaagccatgccgtacaatgctatga, acgaagccatgccgtacaatgcttaga, acgaagccatgccgtacaatgtctaga, cagaagccatccgctaaatgctatcga, cagaagccatccgctaaatgctatgca, cagaagccatccgctaacagtgctaga, cagaagccatccgctaacatgctatga, cagaagccatccgctaacatgcttaga, cagaagccatccgctaacatgtctaga, cagaagccatccgctacaagtgctaga, cagaagccatccgctacaatgctatga, cagaagccatccgctacaatgcttaga, cagaagccatccgctacaatgtctaga, cagaagccatcgcgtaaatgctatcga, cagaagccatcgcgtaaatgctatgca, cagaagccatcgcgtaacagtgctaga, cagaagccatcgcgtaacatgctatga, cagaagccatcgcgtaacatgcttaga, cagaagccatcgcgtaacatgtctaga, cagaagccatcgcgtacaagtgctaga, cagaagccatcgcgtacaatgctatga, cagaagccatcgcgtacaatgcttaga, cagaagccatcgcgtacaatgtctaga, cagaagccatgccgtaaatgctatcga, cagaagccatgccgtaaatgctatgca, cagaagccatgccgtaacagtgctaga, cagaagccatgccgtaacatgctatga, cagaagccatgccgtaacatgcttaga, cagaagccatgccgtaacatgtctaga, cagaagccatgccgtacaagtgctaga, cagaagccatgccgtacaatgctatga, cagaagccatgccgtacaatgcttaga, cagaagccatgccgtacaatgtctaga, cagagaccatccgctaaatgctatcga, cagagaccatccgctaaatgctatgca, cagagaccatccgctaacagtgctaga, cagagaccatccgctaacatgctatga, cagagaccatccgctaacatgcttaga, cagagaccatccgctaacatgtctaga, cagagaccatccgctacaagtgctaga, cagagaccatccgctacaatgctatga, 
cagagaccatccgctacaatgcttaga, cagagaccatccgctacaatgtctaga, cagagaccatcgcgtaaatgctatcga, cagagaccatcgcgtaaatgctatgca, cagagaccatcgcgtaacagtgctaga, cagagaccatcgcgtaacatgctatga, cagagaccatcgcgtaacatgcttaga, cagagaccatcgcgtaacatgtctaga, cagagaccatcgcgtacaagtgctaga, cagagaccatcgcgtacaatgctatga, cagagaccatcgcgtacaatgcttaga, cagagaccatcgcgtacaatgtctaga, cagagaccatgccgtaaatgctatcga, cagagaccatgccgtaaatgctatgca, cagagaccatgccgtaacagtgctaga, cagagaccatgccgtaacatgctatga, cagagaccatgccgtaacatgcttaga, cagagaccatgccgtaacatgtctaga, cagagaccatgccgtacaagtgctaga, cagagaccatgccgtacaatgctatga, cagagaccatgccgtacaatgcttaga, cagagaccatgccgtacaatgtctaga, cagagccatcctagctaaagtgctaga, cagagccatcctagctaaatgctatga, cagagccatcctagctaaatgcttaga, cagagccatcctagctaaatgtctaga, cagagccatcctgactaaagtgctaga, cagagccatcctgactaaatgctatga, cagagccatcctgactaaatgcttaga, cagagccatcctgactaaatgtctaga, cagagccatcctgctaaatgctatcga, cagagccatcctgctaaatgctatgca, cagagccatcctgctaacagtgctaga, cagagccatcctgctaacatgctatga, cagagccatcctgctaacatgcttaga, cagagccatcctgctaacatgtctaga, cagagccatcctgctacaagtgctaga, cagagccatcctgctacaatgctatga, cagagccatcctgctacaatgcttaga, cagagccatcctgctacaatgtctaga]
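Any candidate from a list like the one above can be checked against the definition given in the question: every input string must appear in the candidate as a subsequence. A minimal sketch of such a check follows. The names SupersequenceCheck, isSubsequence and isCommonSupersequence are hypothetical and not part of any answer.

```java
// Hypothetical helper: verifies the supersequence definition from the question.
class SupersequenceCheck {
    // True if all characters of r occur in s in the same order (not necessarily adjacent).
    static boolean isSubsequence(String r, String s) {
        int matched = 0; // number of characters of r found so far
        for (int i = 0; i < s.length() && matched < r.length(); i++) {
            if (s.charAt(i) == r.charAt(matched)) {
                matched++;
            }
        }
        return matched == r.length();
    }

    // True if s is a supersequence of every input string.
    static boolean isCommonSupersequence(String s, String... inputs) {
        for (String r : inputs) {
            if (!isSubsequence(r, s)) {
                return false;
            }
        }
        return true;
    }
}
```

A call such as SupersequenceCheck.isCommonSupersequence(candidate, A, B, C) could be used to validate each printed result before comparing lengths.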
