I have a string that looks like this
"He said, ""What?"""
In the entire file, there are actually more lines like that, separated by commas. The output for that line should look something like this:
He said, "What?!!"
I'm trying to do that by using this method:
Pattern pattern = Pattern.compile("\\s*(\"[^\"]*\"|[^,]*)\\s*");
Matcher matcher = pattern.matcher(line);
while (matcher.find())
{
System.out.println(matcher.group(1));
lines.add(matcher.group(1)); //adds each line to an arraylist
}
However, the output I'm getting is this:
He said,
What?
I'm pretty sure the cause is my regular expression, since all it does is remove all the double quotes.
Why not just use String#replaceAll?
line.replaceAll("\"", "");
It's because your regular expression matches
"He said, "
then
"What?"
then
""
It seems like what you actually want is to remove one level of double quotes. To do that, you can use lookaround assertions:
Pattern pattern = Pattern.compile("\\s*\"(?!\")[^\"]*(?<!\")\"\\s*");
The process of forming a quoted string is:
Escape (double) the double quotes in the string
Surround the resulting string with double quotes
The code below reverses this process: it first removes the outer double quotes, then un-escapes the inner double quotes, and then splits:
public static void main(String[] args) {
String input = "\"He said, \"\"What?\"\"\"";
String[] out = input.replaceAll("^(\")|(\")$", "").replace("\"\"", "\"").split(", ");
for (String o : out) {
System.out.println(o);
}
}
Output:
He said
"What?"
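The same un-escaping idea can be applied to a whole CSV line, where commas inside a quoted field must not split it. The following is a minimal sketch, not the answer's code, assuming RFC 4180-style quoting (a literal quote inside a quoted field is written as "") and plain comma separators; the class and method names are hypothetical. It walks the line character by character instead of using a regex:

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineParser {
    // Splits one CSV line on commas, keeping commas inside double quotes as data
    // and turning each doubled quote ("") inside a quoted field into a single quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote: keep one, skip the other
                        i++;
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of the field
            } else if (c == ',') {
                fields.add(current.toString()); // separator outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"He said, \"\"What?\"\"\",second,\"a,b\"";
        for (String field : parseLine(line)) {
            System.out.println(field);
        }
    }
}
```

This prints He said, "What?", then second, then a,b on separate lines. It does not trim whitespace or handle line breaks inside quoted fields.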
I have the following string with delimiter *:
String temp = "\"Test1*Test2\"*Test3*Test4";
I need it split like this:
"Test1*Test2"
Test3
Test4
split("\\*") is not working; it gives this result:
"Test1
Test2"
Test3
Test4
Can you please suggest which kind of delimiter I should use to split the string as required?
The split() method is great when it’s easy to write a regular expression to match the delimiters.
For example, you can easily split a string along commas: String.split(",").
But the method is terrible when the delimiters can occur in the split content.
A common job is to split a string along commas, except when those commas appear in double quotes.
Such a string might be a line in a CSV file.
In such cases, it is much easier to write a regex that matches the content you want to keep in the array,
and use Matcher.find() instead of String.split().
public static void main(String[] args) {
String regex = "\"[^\"]*\"|[^\\*]+";
String temp = "\"Test1*Test2\"*Test3*Test4";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(temp);
while(m.find()){
System.out.println(m.group());
}
}
The regex matches a pair of double quotes with anything except double quotes between them, or a series of characters that don’t include an asterisk (*).
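If you would rather keep String.split, a commonly used alternative (not part of the answer above) is a lookahead that lets a * act as a delimiter only when an even number of double quotes follows it, i.e. when it is outside any quoted section. This sketch assumes the quotes are always balanced and never escaped; the class and method names are hypothetical:

```java
import java.util.Arrays;

class QuoteAwareSplit {
    // Splits on '*' only when the rest of the string contains an even number of
    // double quotes, which means the '*' is not inside a quoted section.
    static String[] splitOutsideQuotes(String s) {
        return s.split("\\*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
    }

    public static void main(String[] args) {
        String temp = "\"Test1*Test2\"*Test3*Test4";
        System.out.println(Arrays.toString(splitOutsideQuotes(temp)));
        // prints ["Test1*Test2", Test3, Test4]
    }
}
```

Because the lookahead rescans the remainder of the string at every *, the Matcher.find() approach scales better on long lines.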
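Alternatively, split first and then rejoin the pieces that belong to a quoted field:
// ArrayUtils comes from Apache Commons Lang
String[] csvRawData = line.split(delimiter);
for (int i = 0; i + 1 < csvRawData.length; i++) {
    // Rejoin a piece that opens a quote with the following piece that closes it
    if (csvRawData[i].startsWith("\"") && csvRawData[i + 1].endsWith("\"")) {
        csvRawData[i] = csvRawData[i] + "*" + csvRawData[i + 1];
        csvRawData = (String[]) ArrayUtils.remove(csvRawData, i + 1);
    }
}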
I have these two variations of this string
name='Anything can go here'
name="Anything can go here"
where name= can have spaces, like so:
name=(text)
name =(text)
name = (text)
I need to extract the text between the quotes. I'm not sure what the best approach is: should I just cut the string off at the quotes (do you have an example that avoids a lot of case handling?), or should I use a regex?
I'm not sure I understand the question exactly but I'll give it my best shot:
If you just want to assign the string inside the quotation marks to a variable name2, you can easily do:
String name = "'Anything can go here'";
String name2= name.replace("'","");
name2 = name2.replace("\"","");
You want to get Anything can go here whether it's between single quotes or double quotes. A regex can do this regardless of the spaces before or after the "=" by using the following pattern:
"[\"'](.+)[\"']"
Breakdown:
[\"'] - Character class consisting of a double or single quote
(.+) - One or more of any character (without the DOTALL flag, . does not match line terminators), stored in capture group 1
[\"'] - Character class consisting of a double or single quote
In short, we are trying to capture anything between single or double quotes.
Example:
public static void main(String[] args) {
List<String> data = new ArrayList<>(Arrays.asList(
"name='Anything can go here'",
"name = \"Really! Anything can go here\""
));
for (String d : data) {
Matcher matcher = Pattern.compile("[\"'](.+)[\"']").matcher(d);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
}
Results:
Anything can go here
Really! Anything can go here
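Two details of the pattern above can bite: [\"'] does not require the closing quote to be the same kind as the opening one, and the greedy .+ runs to the last quote on the line. A variation (a sketch, not the answer's code; the class and method names are hypothetical) captures the opening quote and uses the backreference \1 to require the same quote to close it, with a reluctant .*? so it stops at the first closing quote. It also assumes the value always follows an =:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValue {
    // Group 1 captures the opening quote; \1 requires the same quote type to close it.
    // Group 2 is the text in between (reluctant, so it stops at the first closing quote).
    private static final Pattern VALUE = Pattern.compile("=\\s*([\"'])(.*?)\\1");

    static String extract(String line) {
        Matcher m = VALUE.matcher(line);
        return m.find() ? m.group(2) : null; // null when no quoted value follows '='
    }

    public static void main(String[] args) {
        System.out.println(extract("name='Anything can go here'"));
        System.out.println(extract("name = \"Really! Anything can go here\""));
        System.out.println(extract("name=\"first\" title='second'")); // prints first
    }
}
```

With the original [\"'](.+)[\"'] pattern, the last input would instead yield first" title='second.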
I am trying to preserve all the sentences between double quotes and put them in the array results[]
For example, I can have the following code:
public static void main (String[] args){
int i = 0 ;
System.out.println ( "words to be printed" );
}
In this example array results should have one string "words to be printed"
The technique I am using is splitting on the newline (\n), checking whether each String contains double quotes, and if so putting it in results.
I used "your string here".split("\"")[1] for extracting the text in between the quotations
The problem is that some Strings have quotations and some don't.
I tried:
if("your \"string\" here".split("\"")[1]) -> but this throws an ArrayIndexOutOfBoundsException if there is no quotation mark in the string
How can I check whether the String contains quotation marks?
This is an appropriate time to use a regular expression to match everything between the double quotes ("). So a line like this:
"myWord" and somewhere else "myOther words"
should output:
myWord
myOther words
Example code for quote matching:
Pattern pattern = Pattern.compile("\"(.*?)\"");
for (String line: myLines){
Matcher matcher = pattern.matcher(line);
while (matcher.find()){
System.out.println("found match '"+matcher.group(1)+"'");
}
}
If you only want to match a single line, drop the for loop and match against that one input.
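Since the question wants the matches collected into a results array, the loop above can be wrapped in a small method. This is a sketch with hypothetical names; it runs the same pattern over the whole source text at once instead of splitting into lines first. Without the DOTALL flag, . does not cross line breaks, so a match never pairs quotes from different lines. It does not understand escaped quotes (\") inside Java string literals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStrings {
    // Reluctant .*? stops at the next quote; '.' does not match line terminators here.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    static String[] collect(String source) {
        List<String> results = new ArrayList<>();
        Matcher m = QUOTED.matcher(source);
        while (m.find()) {
            results.add(m.group(1)); // text between the quotes, without the quotes
        }
        return results.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String source = "public static void main (String[] args){\n"
                + "int i = 0 ;\n"
                + "System.out.println ( \"words to be printed\" );\n"
                + "}";
        for (String s : collect(source)) {
            System.out.println(s); // prints: words to be printed
        }
    }
}
```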
Use myString.contains("\"") to check for the presence of double quotes.
If it exists, use split as you described.
If it doesn't, prepend a quote (yourString = "\"" + yourString;) and use split after that.
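If your string has two double quotes, then split("\"", -1) splits it into three pieces. So you can make a check like this (assuming there is at most one pair of double quotes). The limit of -1 keeps trailing empty strings; with plain split("\"") a quote at the very end of the line leaves only two pieces:
String[] s = input.split("\"", -1);
if (s.length > 2)
    System.out.println(s[1]);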
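Another way to avoid the exception altogether is to locate the quotes with indexOf instead of splitting. This is a sketch with a hypothetical method name that returns an empty Optional when the line has fewer than two double quotes:

```java
import java.util.Optional;

class FirstQuoted {
    // Returns the text between the first pair of double quotes, or empty if the line
    // has fewer than two quotes. Unlike split("\"")[1], it cannot throw.
    static Optional<String> firstQuoted(String line) {
        int open = line.indexOf('"');
        if (open < 0) {
            return Optional.empty();
        }
        int close = line.indexOf('"', open + 1);
        if (close < 0) {
            return Optional.empty();
        }
        return Optional.of(line.substring(open + 1, close));
    }

    public static void main(String[] args) {
        firstQuoted("System.out.println ( \"words to be printed\" );")
                .ifPresent(System.out::println);                    // prints: words to be printed
        System.out.println(firstQuoted("int i = 0 ;").isPresent()); // prints false
    }
}
```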
This is how you can check whether a string contains a quotation mark:
if (yourText.contains("\"")){
//do something
}
Instead of splitting by \n, use the platform's line separator (on Java 7 or later, System.lineSeparator() returns the same value):
String newLine = System.getProperty("line.separator");
then use "your string here".split(newLine). Hope this helps.
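The platform separator only helps if the file was written on the same platform; a file with Windows line endings read on Linux would still contain \r. On Java 8 or later, the regex \R matches any line break (\r\n, \n, \r and a few Unicode separators), so one split works for both. A small sketch with a hypothetical method name:

```java
import java.util.Arrays;

class LineSplit {
    // "\\R" (Java 8+) matches any line break sequence; \r\n is consumed as a single
    // break, so no empty strings appear between Windows-style lines.
    static String[] lines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lines("first\r\nsecond\nthird")));
        // prints [first, second, third]
    }
}
```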
I have a problem building the following regex:
[1,2,3,4]
I found a workaround, but I think it's ugly:
String stringIds = "[1,2,3,4]";
stringIds = stringIds.replaceAll("\\[", "");
stringIds = stringIds.replaceAll("\\]", "");
String[] ids = stringIds.split("\\,");
Can someone please help me build a single regex that I can use in the split function?
Thanks for the help.
Edit:
I want to get from the string "[1,2,3,4]" to an array with 4 entries. The entries are the 4 numbers in the string, so I need to eliminate "[", "]" and ",". The "," isn't the problem.
The first and last numbers contain [ or ], so I needed the fix with replaceAll. But I think if I can pass a regex for "," to split, I can also pass a regex that eliminates "[" and "]" too. I just can't figure out how this regex should look.
This is almost what you're looking for:
String q = "[1,2,3,4]";
String[] x = q.split("\\[|\\]|,");
The problem is that it produces an extra element at the beginning of the array due to the leading open bracket. You may not be able to do what you want with a single regex sans shenanigans. If you know the string always begins with an open bracket, you can remove it first.
The regex itself means "(split on) any open bracket, OR any closed bracket, OR any comma."
Punctuation characters frequently have additional meanings in regular expressions. The double leading backslashes... ugh, the first backslash tells the Java String parser that the next backslash is not a special character (example: \n is a newline...) so \\ means "I want an honest to God backslash". The next backslash tells the regexp engine that the next character ([ for example) is not a special regexp character. That makes me lol.
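Following the suggestion to remove the open bracket first, a sketch (hypothetical method name, assuming the string always starts with [) could look like this. Once the leading [ is gone, there is no delimiter at index 0, so no leading empty element; the trailing ] becomes a delimiter at the very end, and split() discards trailing empty strings:

```java
import java.util.Arrays;

class BracketSplit {
    // Remove the leading '[' so split() produces no empty first element;
    // the empty string after the final ']' is dropped by split() automatically.
    static String[] ids(String q) {
        return q.replaceFirst("^\\[", "").split("\\]|,");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(ids("[1,2,3,4]"))); // prints [1, 2, 3, 4]
    }
}
```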
Maybe use substring to strip [ and ] from the beginning and end, then split the rest by ",":
String stringIds = "[1,2,3,4]";
String[] ids = stringIds.substring(1,stringIds.length()-1).split(",");
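If the goal is numeric ids rather than strings, the substring approach extends naturally to an int[]. This is a sketch with a hypothetical method name, assuming the input always starts with [ and ends with ]; it also tolerates spaces after the commas and returns an empty array for "[]" (where a plain split would yield one empty string that Integer.parseInt rejects):

```java
import java.util.Arrays;

class IdParser {
    // Strip the surrounding brackets, split on commas (with optional spaces),
    // then convert each piece to an int.
    static int[] parseIds(String stringIds) {
        String inner = stringIds.substring(1, stringIds.length() - 1).trim();
        if (inner.isEmpty()) {
            return new int[0]; // "[]" contains no ids
        }
        return Arrays.stream(inner.split("\\s*,\\s*"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseIds("[1,2,3,4]"))); // prints [1, 2, 3, 4]
    }
}
```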
Looks to me like you're trying to make an array (not sure where you got 'regex' from; that means something different). In this case, you want:
String[] ids = {"1","2","3","4"};
If it's specifically an array of integer numbers you want, then instead use:
int[] ids = {1,2,3,4};
Your problem is not amenable to splitting by delimiter. It is much safer and more general to split by matching the integers themselves:
static String[] nums(String in) {
final Matcher m = Pattern.compile("\\d+").matcher(in);
final List<String> l = new ArrayList<String>();
while (m.find()) l.add(m.group());
return l.toArray(new String[l.size()]);
}
public static void main(String args[]) {
System.out.println(Arrays.toString(nums("[1, 2, 3, 4]")));
}
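On Java 9 or later, Matcher.results() returns a stream of all matches, which replaces the explicit find() loop in nums. A sketch with a hypothetical class name; like the original, \d+ ignores minus signs, so negative numbers would lose their sign:

```java
import java.util.Arrays;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

class NumsStream {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Java 9+: Matcher.results() streams every match in order.
    static String[] nums(String in) {
        return DIGITS.matcher(in).results()
                .map(MatchResult::group)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nums("[1, 2, 3, 4]"))); // prints [1, 2, 3, 4]
    }
}
```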
If the first line of your code is the following:
String stringIds = "[1,2,3,4]";
and you're trying to iterate over all the numbers, then the following code fragment could work:
try {
Pattern regex = Pattern.compile("\\b(\\d+)\\b", Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(stringIds);
while (regexMatcher.find()) {
for (int i = 1; i <= regexMatcher.groupCount(); i++) {
// matched text: regexMatcher.group(i)
// match start: regexMatcher.start(i)
// match end: regexMatcher.end(i)
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:
String [] parts = input.split(",");
It works great for input like:
a,b,c
Or
type=simple, output=Hello, repeat=true
Just to give a couple of examples.
How can I escape the comma, so it doesn't match intermediate commas?
For instance, if I want to include a comma in one of the parts:
type=simple, output=Hello, world, repeate=true
I was thinking of something like:
type=simple, output=Hello\, world, repeate=true
But I don't know how to create the split to avoid matching the comma.
I've tried:
String [] parts = input.split("[^\,],");
But, well, it's not working.
You can solve it using a negative look behind.
String[] parts = str.split("(?<!\\\\), ");
Basically it says: split on each ", " that is not preceded by a backslash.
String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
System.out.println(s);
Output:
type=simple
output=Hello\, world
repeate=true
If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:
String[] parts = str.split(", (?=\\w+=)");
This splits on each ", " that is followed by some word characters and an =.
I'm afraid there's no perfect solution with String.split. Using a matcher for the three parts would work. If the number of parts is not constant, I'd recommend a loop with matcher.find(), something like this:
final String s = "type=simple, output=Hello\\, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,|$)");
final Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group(1));
You'll probably want to skip the spaces after the comma as well:
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");
It's not really complicated, just note that you need four backslashes in order to match one.
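The regex-based approaches keep the escaping backslash in each part (for example output=Hello\, world). If you also want it removed, a plain character loop is a simple alternative. This is a sketch, not any answer's code, assuming a backslash escapes whichever character follows it (so \\ yields a literal backslash); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class EscapedCommaSplit {
    // Splits on commas except those escaped as "\,". Whitespace directly after a
    // separating comma is skipped; escape backslashes are removed from the output.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                current.append(input.charAt(++i)); // keep the escaped char, drop the backslash
            } else if (c == ',') {
                parts.add(current.toString());
                current.setLength(0);
                while (i + 1 < input.length() && Character.isWhitespace(input.charAt(i + 1))) {
                    i++; // skip spaces after the separator
                }
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String s = "type=simple, output=Hello\\, world, repeate=true";
        for (String part : split(s)) {
            System.out.println("'" + part + "'");
        }
    }
}
```

This prints 'type=simple', 'output=Hello, world' and 'repeate=true' on separate lines.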
Escaping works with a negative lookbehind, the opposite of aioobe's original answer (update: aioobe now uses the same construct, but I didn't know that when I wrote this):
final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
System.out.println("'" + item.replace("\\,", ",") + "'");
}
Output:
'type=simple'
'output=Hello, world'
'repeate=true'
Reference:
Pattern: Special Constructs
I think
input.split("[^\\\\],");
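should work. It will split at all commas that are not preceded by a backslash. Note, however, that the character class consumes the character before each comma, so that character is dropped from every part (for example, "a,b" yields "" and "b"); a negative lookbehind such as (?<!\\\\), avoids this.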
BTW if you are working with Eclipse, I can recommend the QuickRex Plugin to test and debug Regexes.