Split string with alternative comma (,) - java

I know how to tokenize the String, but the problem is that I want to tokenize it as shown below.
String st = "'test1, test2','test3, test4'";
What I've tried is as below:
st.split(",");
This is giving me output as:
'test1
test2'
'test3
test4'
But I want output as:
'test1, test2'
'test3, test4'
How do I do this?

Since single quotes are not mandatory, split will not work, because Java's regex engine does not allow variable-length lookbehind expressions. Here is a simple solution that uses regex to match the content, not the delimiters:
String st = "'test1, test2','test3, test4',test5,'test6, test7',test8";
Pattern p = Pattern.compile("('[^']*'|[^,]*)(?:,?)");
Matcher m = p.matcher(st);
while (m.find()) {
    System.out.println(m.group(1));
}
You can add syntax for escaping single quotes by altering the "content" portion of the quoted substring (currently, it's [^']*, meaning "anything except a single quote, repeated zero or more times").
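As a rough sketch of that idea, assuming a backslash is the escape character (the question does not specify one), the quoted alternative can accept either an ordinary character or a backslash followed by any character. The class and method names are illustrative only. Note that this variant uses [^,]+ for unquoted tokens, so empty fields are skipped rather than returned as empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedTokens {
    // Quoted token: ' ( any char except ' and \  |  \ followed by any char )* '
    // Unquoted token: one or more characters that are not commas.
    private static final Pattern TOKEN =
            Pattern.compile("'(?:[^'\\\\]|\\\\.)*'|[^,]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumption: \' inside quotes stands for a literal single quote.
        String st = "'it\\'s, ok','test3, test4',test5";
        for (String token : tokenize(st)) {
            System.out.println(token);
        }
    }
}
```

The tokens keep their surrounding quotes and escape characters, as in the original pattern; stripping them would be a separate step.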

The easiest and most reliable solution would be to use a CSV parser such as OpenCSV or Commons CSV.
It handles escaping according to CSV rules, so even '' could be used within a value without breaking it.
Sample code using OpenCSV would look like this:
ByteArrayInputStream baos = new ByteArrayInputStream("'test1, test2','test3, test4'".getBytes());
CSVReader reader = new CSVReader(new InputStreamReader(baos), ',', '\'');
String[] read = reader.readNext();
System.out.println("0: " + read[0]);
System.out.println("1: " + read[1]);
reader.close();
This would print:
0: test1, test2
1: test3, test4
If you use Maven, you can just import the dependency:
<dependency>
<groupId>net.sf.opencsv</groupId>
<artifactId>opencsv</artifactId>
<version>2.0</version>
</dependency>
And start using it.

Related

Replace single quote with double quote with Regex

I have an app that received a malformed JSON string like this:
{'username' : 'xirby'}
I need to replace the single quotes ' with double quotes "
With these rules (I think):
A single quote comes after a { with one or more spaces
Comes before one or more spaces and :
Comes after a : with one or more spaces
Comes before one or more spaces and }
So this String {'username' : 'xirby'} or
{ 'username' : 'xirby' }
Would be transformed to:
{"username" : "xirby"}
Update:
Also a possible malformed JSON String:
{ 'message' : 'there's not much to say' }
In this example the single quote inside the message value should not be replaced.
Try this regex:
\s*\'\s*
and replacing each match with " will do the job.
Instead of doing this, you're better off using a JSON parser which can read such malformed JSON and "normalize" it for you. Jackson can do that:
final ObjectReader reader = new ObjectMapper()
.configure(Feature.ALLOW_SINGLE_QUOTES, true)
.reader();
final JsonNode node = reader.readTree(yourMalformedJson);
// node.toString() does the right thing
This regex will capture all appropriate single quotes and associated white spaces while ignoring single quotes inside a message. One can replace the captured characters with double quotes, while preserving the JSON format. It also generalizes to JSON strings with multiple messages (delimited by commas ,).
((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))
I know you tagged your question as Java, but I'm more familiar with Python. Here's an example of how you can replace the single quotes with double quotes in Python:
import re
regex = re.compile('((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))')
s = "{ 'first_name' : 'Shaquille' , 'lastname' : 'O'Neal' }"
regex.sub('"', s)
> '{"first_name":"Shaquille","lastname":"O\'Neal"}'
This method looks for single quotes next to the symbols {},: using look-ahead and look-behind operations.
String test = "{'username' : 'xirby'}";
String replaced = test.replaceAll("'", "\"");
Since your question is tagged Java, I answered in Java.
First, import the classes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Then:
Pattern p = Pattern.compile("((?<=(\\{|\\[|\\,|:))\\s*')|('\\s*(?=(\\}|(\\])|(\\,|:))))");
String s = "{ 'firstName' : 'Malus' , 'lastName' : ' Ms'Malus' , marks:[ ' A+ ', 'B+']}";
String replace = "\"";
String o;
Matcher m = p.matcher(s);
o = m.replaceAll(replace);
System.out.println(o);
Output:
{"firstName":"Malus","lastName":" Ms'Malus", marks:[" A+ ","B+"]}
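As a check against the example from the question's update, the same lookaround idea can be condensed with character classes. This is only a sketch: the class and method names are made up for illustration, and the approach remains a heuristic, since an apostrophe that happens to sit directly before a comma, colon, or closing bracket inside a value would still be replaced.

```java
import java.util.regex.Pattern;

final class QuoteNormalizer {
    // Opening quote: preceded by { [ , or :  (optional whitespace in between).
    // Closing quote: followed by } ] , or :  (optional whitespace in between).
    private static final Pattern STRUCTURAL_QUOTE =
            Pattern.compile("(?<=[{\\[,:])\\s*'|'\\s*(?=[}\\],:])");

    static String normalize(String json) {
        // Each structural quote (plus adjacent whitespace) becomes one double quote.
        return STRUCTURAL_QUOTE.matcher(json).replaceAll("\"");
    }

    public static void main(String[] args) {
        // prints {"message":"there's not much to say"}
        System.out.println(normalize("{ 'message' : 'there's not much to say' }"));
    }
}
```

The apostrophe in "there's" survives because it is neither preceded by nor followed by one of the structural characters.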
If you're looking to exactly satisfy all of those conditions, try this:
'{(\s)?\'(.*)\'(\s)?:(\s)?\'(.*)\'(\s)?}'
as your regex. It uses (\s)? to match one or zero whitespace characters.
I recommend using a JSON parser instead of regex.
String strJson = "{ 'username' : 'xirby' }";
strJson = new JSONObject(strJson).toString();
System.out.println(strJson);

regex or string parsing

I am trying to parse a string which has a specific pattern. An example valid string is as follows:
<STX><DATA><ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX></DATA><ETX>
<STX>A?234<ETX>
<STX><DATA><ETX>
<STX>name!abc<ETX>
<STX>age!24y<ETX>
<STX></DATA><ETX>
<STX>A?345<ETX>
<STX><DATA><ETX>
<STX>name!bac<ETX>
<STX>age!22y<ETX>
<STX></DATA><ETX>
<STX>OK<ETX>
<STX></DATA><ETX>
This data is sent by a device. All I need is to parse this string into records like id:123, name:xyz, age:27y.
I am trying to use this regex:
final Pattern regex = Pattern.compile("(.*?)", Pattern.DOTALL);
This does output the required data:
<ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX>
How can I loop over the string repeatedly to copy all matches into a list of strings?
I am trying to loop over it and delete each extracted pattern, but it doesn't get deleted.
final Pattern regex = Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL);// Q?(.*?)
final StringBuffer buff = new StringBuffer(frame);
final Matcher matcher = regex.matcher(buff);
while (matcher.find())
{
    final String dataElements = matcher.group();
    System.out.println("Data:" + dataElements);
}
}
Are there any better ways to do this?
This is the output I am currently getting:
Data:<DATA><ETX><STX>A?123<ETX><STX><DATA><ETX><STX>name!xyz<ETX><STX>age!27y<ETX><STX> </DATA>
Data:<DATA><ETX><STX>name!abc<ETX><STX>age!24y<ETX><STX></DATA>
Data:<DATA><ETX><STX>name!bac<ETX><STX>age!22y<ETX><STX></DATA>
I am missing the A?234 and A?345 in the next two matches.
I really don't know exactly what you want to achieve by this, but if you want to remove the occurrences of that pattern, this line:
buff.toString().replace(dataElements, "")
doesn't look right. You are only editing the String returned by buff.toString(), not buff itself. String.replace returns a new String, so you have to write the edited version back into buff.
Using this regex solves my issue:
<STX>(A*)(.*?)<DATA>(.*?)</DATA>
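To go from matched blocks to the id/name/age records the question asks for, one option is a single pattern with three capturing groups. The sketch below assumes that <STX> and <ETX> appear as literal text in the frame (a real device may send control characters instead) and that every record lists name before age; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FrameParser {
    // Group 1: id after "A?", group 2: name, group 3: age.
    // \s* tolerates line breaks between consecutive frames.
    private static final Pattern RECORD = Pattern.compile(
            "<STX>A\\?(\\d+)<ETX>\\s*<STX><DATA><ETX>\\s*"
            + "<STX>name!(.*?)<ETX>\\s*<STX>age!(.*?)<ETX>");

    static List<String> parse(String frame) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(frame);
        while (m.find()) {
            records.add("id:" + m.group(1)
                    + " name:" + m.group(2)
                    + " age:" + m.group(3));
        }
        return records;
    }

    public static void main(String[] args) {
        String frame = "<STX><DATA><ETX>\n"
                + "<STX>A?123<ETX>\n<STX><DATA><ETX>\n"
                + "<STX>name!xyz<ETX>\n<STX>age!27y<ETX>\n<STX></DATA><ETX>\n"
                + "<STX>OK<ETX>\n<STX></DATA><ETX>";
        parse(frame).forEach(System.out::println);
    }
}
```

Because each find() call resumes after the previous match, no text has to be deleted from the buffer between iterations.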

Java CSV import regex help needed

I have used many different regex strings, all of them do the same thing.
One line of my .csv looks like this:
"999","Location","Alt. fare key","Table ID","Address","Line 2","City","State",19111,,,H,,, etc......(there are 139 columns.
As you can see, some of the entries are enclosed in quotation marks while others are not.
Quoted or not, every entry is separated by a comma.
Here are two examples of regex strings that I've used:
String regex = "(?:(?<=\")([^\"]*)(?=\"))|(?<=,|^)([^,]*)(?=,|$)";
Object[] tokens = strLine.split(regex);
model.addRow(tokens);
jTable1.setModel(model);
and
String regex = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";
Object[] tokens = strLine.split(regex);
model.addRow(tokens);
jTable1.setModel(model);
Both of these do the same thing.
Pretending the |(s) below are the lines of my jTable:
"999"|"Location"|"Alt. fare key"|"Table ID"|"Address"|"Line 2"|"City"|"State"|19111| | |H|
I want it to come out like this:
999|Location|Alt. fare key|Table ID|Address|Line 2|City|State|19111| | |H| etc.....
What else does the regex need to remove the unwanted quotation marks?
Thanks in advance for help.
JB
But does it handle embedded commas? The OpenCSV library will, and you just do this (copied from the OpenCSV docs):
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
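If adding a library is not an option, the lookahead split from the question can be kept and the surrounding quotation marks stripped in a second pass, since split only decides where to cut and never edits the tokens. This is a minimal sketch with hypothetical names; it assumes quotes are balanced and does not handle doubled quotes ("") inside a field.

```java
import java.util.regex.Pattern;

final class CsvLineSplit {
    // A comma counts as a separator only if an even number of quotes follows it.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] split(String line) {
        // Limit -1 keeps trailing empty columns.
        String[] tokens = COMMA_OUTSIDE_QUOTES.split(line, -1);
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
                tokens[i] = t.substring(1, t.length() - 1); // drop enclosing quotes
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\"999\",\"Location\",\"Alt. fare key\",19111,,,H";
        System.out.println(String.join("|", split(line)));
    }
}
```

The resulting String[] can be passed to model.addRow in the same way as before.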

Escape special characters in java

I have a text file that uses | (pipe) as the separator. If a column value itself contains |, splitting the line creates an extra column.
Example :
name|date|age
zzz|20-03-22|23
"xx|zz"|23-23-33|32
How can I escape the character within the double quotes ""
how to escape the regular expression used in the split, so that it works for user-specified delimiters
I have tried:
String[] cols = line.split("\|");
System.out.println("lets see column only=="+cols[1]);
How can I escape the character within the double quotes ""
Here's one approach:
String str = "\"xx|zz\"|23-23-33|32";
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find())
    m.appendReplacement(sb, m.group().replace("|", "\\\\|"));
m.appendTail(sb);
System.out.println(sb); // prints "xx\|zz"|23-23-33|32
In order to get the columns back you'd do something like this:
String str = "\"xx\\|zz\"|23-23-33|32";
String[] cols = str.split("(?<!\\\\)\\|");
for (String col : cols)
    System.out.println(col.replace("\\|", "|"));
Regarding your edit:
how to escape the regular expression used in the split, so that it works for user-specified delimiters
You should use Pattern.quote on the string you want to split on:
String[] cols = line.split(Pattern.quote(delimiter));
This will ensure that the split works as intended even if delimiter contains special regex-symbols such as . or |.
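A small sketch of the difference, with a hypothetical helper name: without quoting, "|" and "." are interpreted as regex operators, while Pattern.quote makes them literal.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class DelimiterSplit {
    // Splits on the delimiter taken literally, whatever characters it contains.
    static String[] split(String line, String delimiter) {
        // Limit -1 keeps trailing empty columns.
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("zzz|20-03-22|23", "|")));
        System.out.println(Arrays.toString(split("a.b.c", ".")));
    }
}
```

This only addresses the delimiter itself; a delimiter inside double quotes still needs separate handling.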
You can use a CSV parser like OpenCSV or Commons CSV:
http://opencsv.sourceforge.net
http://commons.apache.org/sandbox/csv
You can replace it with its Unicode escape sequence (prior to splitting on the pipe).
But what you should do is adjust your parser to take that into account, rather than changing the files.
Here is one way to parse it
String str = "zzz|20-03-22|23 \"xx|zz\"|23-23-33|32";
String regex = "(?<=^|\\|)(([^\"]*?)|([^\"]+\"[^\"]+\".*?))(?=\\||$)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
while(m.find()) {
    System.out.println(m.group());
}
Output:
zzz
20-03-22
23 "xx|zz"
23-23-33
32
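An alternative to regex for this kind of input is a small character scan that tracks whether the current position is inside double quotes. This is a sketch under stated assumptions, not a full CSV parser: quotes are assumed to be balanced, the quote characters themselves are dropped from the output, and the names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplit {
    static List<String> split(String line, char delimiter) {
        List<String> cols = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle state, drop the quote
            } else if (c == delimiter && !inQuotes) {
                cols.add(current.toString());    // delimiter outside quotes ends a column
                current.setLength(0);
            } else {
                current.append(c);               // ordinary char, or delimiter inside quotes
            }
        }
        cols.add(current.toString());            // last column
        return cols;
    }

    public static void main(String[] args) {
        for (String col : split("\"xx|zz\"|23-23-33|32", '|')) {
            System.out.println(col);
        }
    }
}
```

Because the delimiter is compared as a plain char, no regex escaping is needed for user-specified delimiters.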

Adding html tags with Java based on regex, keeping data in matches

Using Java, I am writing a script to add anchor links to an HTML bibliography. That is, going
from: [1,2]
to: [1, 2]
I think I have found the right regular expression: \[.*?\]
What I am having trouble with is writing the code that will retain the values inside the expression while surrounding them with the link tags.
This is the most I can come up with:
while(myScanner.hasNext())
{
    line = myScanner.nextLine();
    myMatcher = myPattern.matcher(line);
    ...
    outputBufferedWritter.write(line+"\n");
}
The files aren't very large and there are almost always fewer than 100 matches, so I don't care about performance.
First of all, I think a better pattern to match the [tag] content is [^\[\]]* instead of .*? (i.e. anything but opening and closing brackets).
For the replacement, if the URL varies depending on the [tag] content, then you need an explicit Matcher.find() loop combined with appendReplacement/Tail.
Here's an example that sets up a Map<String,String> of the URLs and a Matcher.find() loop for the replacement:
Map<String,String> hrefs = new HashMap<String,String>();
hrefs.put("[1,2]", "one-two");
hrefs.put("[3,4]", "three-four");
hrefs.put("[5,6]", "five-six");
String text = "p [1,2] \nq [3,4] \nr [5,6] \ns";
Matcher m = Pattern.compile("\\[[^\\[\\]]*\\]").matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    String section = m.group(0);
    String url = String.format("<a href='%s'>%s</a>",
        hrefs.get(section),
        section
    );
    m.appendReplacement(sb, url);
}
m.appendTail(sb);
System.out.println(sb.toString());
This prints:
p <a href='one-two'>[1,2]</a>
q <a href='three-four'>[3,4]</a>
r <a href='five-six'>[5,6]</a>
s
Note that before Java 9, appendReplacement/appendTail had no StringBuilder overloads, so StringBuffer had to be used; Java 9 added StringBuilder versions of both methods.
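On Java 9 or later, the same replacement can be sketched without an explicit buffer by using Matcher.replaceAll with a function. The class name, method name, and the choice to leave unknown tags untouched are assumptions made for illustration. Matcher.quoteReplacement guards against $ or \ in the generated text being interpreted as group references.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkifyCitations {
    // A bracketed tag with no nested brackets, e.g. [1,2]
    private static final Pattern TAG = Pattern.compile("\\[[^\\[\\]]*\\]");

    static String linkify(String text, Map<String, String> hrefs) {
        return TAG.matcher(text).replaceAll(match -> {
            String section = match.group();
            String href = hrefs.get(section);
            if (href == null) {
                // Assumption: tags without a known URL are left as they are.
                return Matcher.quoteReplacement(section);
            }
            return Matcher.quoteReplacement(
                    "<a href='" + href + "'>" + section + "</a>");
        });
    }

    public static void main(String[] args) {
        Map<String, String> hrefs = Map.of("[1,2]", "one-two", "[3,4]", "three-four");
        System.out.println(linkify("p [1,2] \nq [3,4] \nr [5,6] \ns", hrefs));
    }
}
```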
References
java.util.regex.Matcher
Related questions
Difference between .*? and .* for regex
StringBuilder and StringBuffer in Java
