Is there a way using Java String.split(regexp) to split on strings inside of quotes, and not get the quotes?
The strings I have to deal with look like the following. I don't control the format, and the number of strings varies:
"strA" : "strB" : "strC" : "strD",
"strE" : "strF" : "strG",
Note: The spaces are included, and each line is handled separately.
So what I would like to get is an array with all strings.
I could use replaceAll to strip the quotes, spaces and commas, then split on the colon:
line = line.replaceAll("[\"\\s,]", "");
usrArray = line.split(":");
But I'd like to do this with one regexp.
This should do the trick.
usrArray = line.split("(\" : \")|(\",?)");
This looks first for " : ". If it doesn't find that, it looks for the edge cases, " and ",. Because each line starts with a quote, the first element of the resulting array is an empty string. If you need it to also consume newlines, use this regex.
usrArray = line.split("(\" : \")|(\",?\n?)");
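A minimal sketch of how the split above behaves on one of the sample lines, next to an alternative that is not from the answer: extracting the quoted parts with a Matcher instead of splitting. The class and method names (QuotedFields, splitFields, extractFields) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedFields {
    // The split from the answer; the leading quote is itself a delimiter,
    // so the array starts with an empty string.
    static String[] splitFields(String line) {
        return line.split("(\" : \")|(\",?)");
    }

    // Alternative: collect whatever sits between each pair of quotes.
    static List<String> extractFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = Pattern.compile("\"([^\"]*)\"").matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"strA\" : \"strB\" : \"strC\" : \"strD\",";
        System.out.println(Arrays.toString(splitFields(line))); // [, strA, strB, strC, strD]
        System.out.println(extractFields(line));                // [strA, strB, strC, strD]
    }
}
```

If you stay with split, skip index 0 when reading the result.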
I have a CSV file in which the values are like this:
"12342","red","world"
In my Java code, I want to remove the double quotes from each value and assign the result to a variable, like this:
String number = num.replaceAll("^\"|\"$","");
Note: quotes will always be present at the start and at the end of the value.
But the output of number is "12342" instead of 12342. What should I write to replace those double quotes?
Thanks in advance!
String number = num.replaceAll("[^\\p{IsDigit}\\p{IsAlphabetic}.,]", "");
This works for any string, whether it holds digits or text. The regex removes every character that is not a digit, a letter, '.' or ',', so it strips the quotes from the CSV fields. Note that it also removes spaces and any other punctuation.
An alternative that removes just the quotes:
String number = num.replace("\"", "");
You'd need the backslash \ to escape the double-quotes.
The below regular expression can also be used.
String number= num.replaceAll("^\"|\"$", "");
I am wondering if I am going about splitting a string on a . the right way? My code is:
String[] fn = filename.split(".");
return fn[0];
I only need the first part of the string, that's why I return the first item. I ask because I noticed in the API that . means any character, so now I'm stuck.
split() accepts a regular expression, so you need to escape . to not consider it as a regex meta character. Here's an example :
String[] fn = filename.split("\\.");
return fn[0];
I see only solutions here but no full explanation of the problem, so I decided to post this answer.
Problem
You need to know a few things about text.split(delim). The split method:
accepts as its argument a regular expression (regex) that describes the delimiter on which we want to split,
if delim occurs at the end of text, as in a,b,c,, (where the delimiter is ,), split first creates an array like ["a", "b", "c", "", ""], but since in most cases we don't need these trailing empty strings, it removes them automatically. So it creates another array without the trailing empty strings and returns it.
You also need to know that the dot . is a special character in regex. It represents any character (except line separators, but this can be changed with the Pattern.DOTALL flag).
So for a string like "abc", if we split on ".", the split method will
create an array like ["", "", "", ""],
but since this array contains only empty strings and they are all trailing, they will be removed (as described in the second point above),
which means we get an empty array [] as the result (with no elements, not even an empty string), so we can't use fn[0] because there is no index 0.
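The behaviour described above can be checked with a short sketch. The class name is hypothetical; the two-argument split(regex, limit) overload with a negative limit is the standard way to keep trailing empty strings.

```java
import java.util.Arrays;

class SplitTrailingDemo {
    public static void main(String[] args) {
        // Trailing empty strings are dropped by default...
        System.out.println(Arrays.toString("a,b,c,,".split(","))); // [a, b, c]
        // ...but kept when a negative limit is passed.
        System.out.println("a,b,c,,".split(",", -1).length);       // 5
        // An unescaped dot matches every character, so nothing survives.
        System.out.println("abc".split(".").length);               // 0
        // An escaped dot finds no match in "abc", so the whole string comes back.
        System.out.println("abc".split("\\.").length);             // 1
    }
}
```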
Solution
To solve this problem you simply need a regex that represents a literal dot. To do so we need to escape the . character. There are a few ways to do it, but the simplest is probably using \ (which in a String literal needs to be written as "\\", because \ is also special there and requires another \ to escape it).
So the solution to your problem may look like
String[] fn = filename.split("\\.");
Bonus
You can also escape the dot in other ways, such as
using a character class: split("[.]")
wrapping it in a quote: split("\\Q.\\E")
using a proper Pattern instance with the Pattern.LITERAL flag
or simply using split(Pattern.quote(".")) and letting the regex library do the escaping for you.
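As a rough sketch, the escaping options listed above can be run side by side to confirm they split the same way. The class name, method name and sample file name are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DotEscapes {
    // Splits the same input with each escaping style from the list above.
    static String[][] splitAllWays(String name) {
        return new String[][] {
            name.split("\\."),                                 // backslash escape
            name.split("[.]"),                                 // character class
            name.split("\\Q.\\E"),                             // \Q...\E quoting
            Pattern.compile(".", Pattern.LITERAL).split(name), // LITERAL flag
            name.split(Pattern.quote("."))                     // Pattern.quote
        };
    }

    public static void main(String[] args) {
        for (String[] parts : splitAllWays("archive.tar.gz")) {
            System.out.println(Arrays.toString(parts)); // [archive, tar, gz] each time
        }
    }
}
```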
Split uses regular expressions, where '.' is a special character meaning anything. You need to escape it if you actually want it to match the '.' character:
String[] fn = filename.split("\\.");
(one '\' to escape the '.' in the regular expression, and the other to escape the first one in the Java string)
Also, I wouldn't suggest returning fn[0]: if you have a file named something.blabla.txt, which is a valid name, you won't be returning the actual file name. Instead I think it's better to use:
int idx = filename.lastIndexOf('.');
return filename.substring(0, idx);
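One caveat with the lastIndexOf approach: it returns -1 when the name has no dot, and substring(0, -1) throws. A hedged sketch with a guard follows; the helper name stripExtension is hypothetical.

```java
class BaseName {
    // Returns the name without its last extension, or the name unchanged
    // when it contains no dot at all.
    static String stripExtension(String filename) {
        int idx = filename.lastIndexOf('.');
        return idx < 0 ? filename : filename.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(stripExtension("something.blabla.txt")); // something.blabla
        System.out.println(stripExtension("README"));               // README
    }
}
```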
The String#split(String) method uses regular expressions.
In regular expressions, the "." character means "any character".
You can avoid this behavior by either escaping the "."
filename.split("\\.");
or telling the split method to split at a character class:
filename.split("[.]");
Character classes are collections of characters. You could write
filename.split("[-.;ld7]");
and filename would be split at every "-", ".", ";", "l", "d" or "7". Inside character classes, the "." is not a special character ("metacharacter").
Since the dot ( . ) is a special character and String's split method expects a regular expression, you need to do this:
String[] fn = filename.split("\\.");
return fn[0];
In Java regex, special characters need to be escaped with a "\", but since "\" is also a special character in Java string literals, you need to escape it again with another "\"!
String str="1.2.3";
String[] cats = str.split(Pattern.quote("."));
Wouldn't it be more efficient to use
filename.substring(0, filename.indexOf("."))
if you only want what's up to the first dot?
Usually it's NOT a good idea to escape it by hand. There is a method in the Pattern class (package java.util.regex) for this task:
static String quote(String s)
split takes a regex as its argument. Simply change "." to "\\."
The solution that worked for me is the following
String[] fn = filename.split("[.]");
Note: Further care should be taken with this snippet, even after the dot is escaped!
If filename is just the string ".", then fn will still have length 0 and fn[0] will still throw an exception!
This is because, once the pattern has matched at least once, split discards all trailing empty strings (including the one before the dot!) from the array, leaving an empty array to be returned.
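A small sketch of that edge case with a length check added. The helper name firstPart and the choice to return an empty string for "." are assumptions, not part of the snippet above.

```java
class FirstPart {
    // Returns the text before the first dot, or "" when split yields nothing.
    static String firstPart(String filename) {
        String[] fn = filename.split("[.]");
        return fn.length > 0 ? fn[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(".".split("[.]").length);  // 0
        System.out.println(firstPart("notes.txt"));   // notes
        System.out.println(firstPart(".").isEmpty()); // true
    }
}
```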
With Apache Commons IO it is simplest:
File file = ...
FilenameUtils.getBaseName(file.getName());
Note, it also extracts a filename from full path.
split takes a regex as its argument, so you should pass "\\." instead of ".", because "." is a metacharacter in regex.
I have a csv file that looks like this:
12,2014-10-09 06:00:00,2014-10-09 06:15:00,"","","","123,456","","9,999","",""
I was able to remove the comma in between the digits and all double quotes using:
String test = rowData.replaceAll("([0-9]),([0-9])","$1$2").replaceAll("\"","");
I'm not sure if this is the best approach (opinions are appreciated). My problem is that I also need to remove the first value before the comma, so my output needs to look like this:
Orig: 12,2014-10-09 06:00:00,2014-10-09 06:15:00,"","","","123,456","","9,999","",""
Need: 2014-10-09 06:00:00,2014-10-09 06:15:00,,,,123456,,9999,,
I'm not sure whether another regex is needed for this, or whether I should use something like indexOf or lastIndexOf to remove the first value before the comma. Thank you.
EDIT: I just noticed I can't use ([0-9]),([0-9]) because it also removes the commas next to the datetimes. The proper question is how to transform the CSV row to remove the:
1. first value
2. quotes
3. commas between the digits inside quotes
Try this:
String test = rowData.replaceAll("^[^,]+,|,(?!(([^\"]*\"){2})*[^\"]*$)|\"(?=,|$)|(?<=,)\"", "");
There are three kinds of match that are replaced with blank (i.e. removed):
everything up to and including the first comma
all commas within quotes (those not followed by an even number of quotes)
all quotes adjacent to (immediately after or before) commas, plus the closing quote at the end of the line
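A sketch that runs a pattern of this shape against the sample row from the question. It assumes three details about the pattern: the negative lookahead is closed with ")", the first alternative consumes the comma after the first value, and a quote at the end of the line is matched as well. The class and method names are hypothetical.

```java
class CsvRowCleaner {
    // Alternatives: leading value with its comma | comma inside quotes |
    // quote before a comma or at end of line | quote right after a comma.
    static final String PATTERN =
        "^[^,]+,|,(?!(([^\"]*\"){2})*[^\"]*$)|\"(?=,|$)|(?<=,)\"";

    static String clean(String rowData) {
        return rowData.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        String row = "12,2014-10-09 06:00:00,2014-10-09 06:15:00,"
                + "\"\",\"\",\"\",\"123,456\",\"\",\"9,999\",\"\",\"\"";
        // 2014-10-09 06:00:00,2014-10-09 06:15:00,,,,123456,,9999,,
        System.out.println(clean(row));
    }
}
```

The quote-counting lookahead only works when quotes are balanced on each line, so treat this as a fit for the sample format rather than a general CSV parser.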
To match your expected output you can do something like
String str = "12,2014-10-09 06:00:00,2014-10-09 "
+ "06:15:00,\"\",\"\",\"\",\"123,456\",\"\",\"9,999\",\"\",\"\"";
str = str.substring(str.indexOf(',') + 1);
str = str.replaceAll("\"(\\d+),(\\d+)\"", "$1$2").replace("\"", "");
String expected = "2014-10-09 06:00:00,2014-10-09 06:15:00,,,,123456,,9999,,";
System.out.println(str.equals(expected));
Output is
true
Try this
test = test.substring(test.indexOf(",") + 1, test.length());
Reasons this is better than the other answer: less overhead, and no regex is needed for this!
I have a set of Strings like "04/21 01:55 P ", "1", "10/21". I wrote a regex as follows
^([[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2} P|A ]+)
It should accept only strings in the format of "04/21 01:55 P ". But it is also accepting strings like "1" and "10/21".
Could anyone let me know where I went wrong?
Replace the surrounding [] by ().
You'll need to change the P|A part too, either to (P|A) or [PA].
You've put everything in one big character class, which is why single digits are being matched as well. You can try something like
^(\d{2}/\d{2} \d{2}:\d{2} (?:P|A) )+
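A quick sketch that checks the suggested pattern against the three strings from the question. The class and method names are hypothetical, and matches() is used here so that the whole string has to fit the format.

```java
import java.util.regex.Pattern;

class TimeStampCheck {
    // Same pattern as above, with backslashes doubled for a Java string literal.
    private static final Pattern STAMP =
        Pattern.compile("^(\\d{2}/\\d{2} \\d{2}:\\d{2} (?:P|A) )+");

    static boolean isStamp(String s) {
        return STAMP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStamp("04/21 01:55 P ")); // true
        System.out.println(isStamp("1"));              // false
        System.out.println(isStamp("10/21"));          // false
    }
}
```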
I can't seem to split on a simple regex.
If I have a string [data, data2] and I attempt to split it like so (I tried to escape the brackets):
String regex = "\\[,\\]";
String[] notifySplit = notifyWho.split(regex);
The output of looping through notifySplit shows that this regex is not working:
notify: [Everyone, Teachers only]
Any help on what the proper regex is? I am expecting an array like so:
data, data2
where I could possibly ignore these two characters [ ,
First, you don't want to split on the brackets. You just want to exclude them from your end result. So first thing you'll probably want to do is strip those out:
notifyWho = notifyWho.replace("[", "").replace("]", "");
Then you can do a basic split on the comma:
String[] notifySplit = notifyWho.split(",");
I would do it in one line, first removing the square brackets, then splitting:
String[] notifySplit = notifyWho.replaceAll("[\\[\\]]", "").split(",");
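A hedged sketch of the strip-then-split idea. It escapes both brackets inside the character class, because in Java an unescaped [ inside a class opens a nested class, and it splits on the comma plus surrounding whitespace so that the second element has no leading space. The class and method names are hypothetical.

```java
import java.util.Arrays;

class BracketList {
    // Removes the square brackets, then splits on commas and any spaces around them.
    static String[] parse(String raw) {
        String inner = raw.replaceAll("[\\[\\]]", "").trim();
        return inner.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        // [Everyone, Teachers only]
        System.out.println(Arrays.toString(parse("[Everyone, Teachers only]")));
        System.out.println(parse("[Everyone, Teachers only]")[1]); // Teachers only
    }
}
```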