Extract String(s) from big String selected by special characters - java

Let's say I have a String like this:
Hey man, my name is Jason and I like Pizza. #Pizza #Name #Cliche
My question is: how do I extract all the strings that start with # and put them together into another string?

Check out this tutorial on regex
Try
Matcher matcher = Pattern.compile("(\\s|^)#(\\S*)").matcher(string);
while(matcher.find()){
System.out.println(matcher.group(2));
}
EDIT:
As you wanted the other strings as well, you may try
Matcher matcher = Pattern.compile("(\\s|^)#(\\S*)|(\\S+)").matcher(string);
StringJoiner hashfull = new StringJoiner(" ");
StringJoiner hashless = new StringJoiner(" ");
while(matcher.find())
if(matcher.group(2) != null)
hashfull.add(matcher.group(2));
else if(matcher.group(3) != null)
hashless.add(matcher.group(3));
System.out.println(hashfull);
System.out.println(hashless);
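For reference, the regex approach can be wrapped into a small self-contained class, sketched below. The class name HashtagExtractor and the method joinTags are made up for illustration, and the pattern uses \S+ instead of \S* so that a lone # is not reported as a tag; that change is an assumption about what is wanted, not part of the answer above.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every "#word" token of a text into one string.
final class HashtagExtractor {
    // '#' at the start of the text or right after whitespace,
    // followed by one or more non-whitespace characters (captured as group 1).
    private static final Pattern TAG = Pattern.compile("(?:^|\\s)#(\\S+)");

    static String joinTags(String text) {
        StringJoiner tags = new StringJoiner(" ");
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add("#" + m.group(1)); // put the '#' back in front of the captured word
        }
        return tags.toString();
    }

    public static void main(String[] args) {
        String text = "Hey man, my name is Jason and I like Pizza. #Pizza #Name #Cliche";
        System.out.println(joinTags(text));
    }
}
```

Returning the joined string instead of printing it inside the loop makes the result easy to reuse, for example to store the tags separately from the remaining text.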

I found this code working very well for me because I also wanted the rest of the string. Thanks to @Pshemo and @Mormod for helping me with this. Here is the code:
String string = "Hello my name is Jason and I like pizza. #Me #Pizza";
String[] splitedString = string.split(" "); //splits string at spaces
StringBuilder newString = new StringBuilder();
StringBuilder newString2 = new StringBuilder();
for(int i = 0; i<splitedString.length; i++){
if(splitedString[i].startsWith("#")){
newString.append(splitedString[i]);
newString.append(" "); }
else{
newString2.append(splitedString[i]);
newString2.append(" ");
}
}
System.out.println(newString2);
System.out.println(newString);

Maybe you search for something like this:
String string = "ab cd ef";
String[] splitedString = string.split(" "); //splits string at spaces
String newString = "";
for(int i = 0; i<splitedString.length; i++){
if(splitedString[i].startsWith("#")){
newString += splitedString[i];
}
}
mormod

Related

Read and Split(Parse) data in java

I am trying to split some simple data from a .txt file. I have found some useful structures on the internet but it was not enough to split the data the way I wanted. I get a string like this:
{X:0.8940594 Y:0.6853521 Z:1.470214}
And I want to transform it to like this;
0.8940594
0.6853521
1.470214
And then put them in a matrix in order X=[], Y=[], Z=[]; (the data is the coordinate of an object)
Here is my code:
BufferedReader in = null; {
try {
in = new BufferedReader(new FileReader("file.txt"));
String read = null;
while ((read = in.readLine()) != null) {
String[] splited = read.split("\\s+");
for (String part : splited) {
System.out.println(part);
}
}
} catch (IOException e) {
System.out.println("There was a problem: " + e);
e.printStackTrace();
} finally {
try {
in.close();
} catch (Exception e) {
}
}
}
What do I need to add to my code to get the data the way I want?
Right now with this code I receive data like this:
{X:0.8940594
Y:0.6853521
Z:1.470214}
You can try using a regex similar to the following to match and capture the three numbers contained in each tuple:
{\s*X:(.*?)\s+Y:(.*?)\s+Z:(.*?)\s*}
Each quantity contained in parentheses is a capture group, and is available after a match has taken place.
int size = 100; // replace with actual size of your vectors/matrix
double[] A = new double[size];
double[] B = new double[size];
double[] C = new double[size];
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
String regex = "\\{\\s*X:(.*?)\\s+Y:(.*?)\\s+Z:(.*?)\\s*\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
int counter = 0;
while (m.find()) {
A[counter] = Double.parseDouble(m.group(1));
B[counter] = Double.parseDouble(m.group(2));
C[counter] = Double.parseDouble(m.group(3));
++counter;
}
You can use this regex -?\d+\.\d+ for example :
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
Pattern pattern = Pattern.compile("-?\\d+\\.\\d+");
Matcher matcher = pattern.matcher(input);
List<String> result = new ArrayList<>();
while (matcher.find()) {
result.add(matcher.group());
}
System.out.println(result);
In your case you want to match real numbers: the pattern allows an optional minus sign, then digits, a decimal point, and more digits.
This code will solve your problem.
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214} ";
String[] parts = input.split("(?<= )");
List<String> output = new ArrayList<>();
for (int i = 0; i < parts.length; i++) {
//System.out.println("*" + i);
//System.out.println(parts[i]);
String[] part = parts[i].split("(?<=:)");
String[] temp = part[1].split("}");
output.add(temp[0]);
}
System.out.println("This List contains numbers:" + output);
Output->This List contains numbers:[0.8940594 , 0.6853521 , 1.470214]
How about this?
public class Test {
public static void main(String[] args) {
String s = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
String x = s.substring(s.indexOf("X:")+2, s.indexOf("Y:")-1);
String y = s.substring(s.indexOf("Y:")+2, s.indexOf("Z:")-1);
String z = s.substring(s.indexOf("Z:")+2, s.lastIndexOf("}"));
System.out.println(x);
System.out.println(y);
System.out.println(z);
}
}
Your regex splits on whitespace, but does not remove the curly braces.
So instead of splitting on whitespace, you split on a class of characters: whitespace and curly braces.
The line with the regex then becomes:
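// one or more whitespace characters or curly braces; the first element is empty when the line starts with {
String[] splited = read.split("[\\s{}]+");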
Here is an ideone link with the full snippet.
After this, you'll want to split the resulting three lines on the :, and parse the righthand side. You can use Double.parseDouble for this purpose.
Personally, I would try to avoid long regex expressions; they are hard to debug.
It may be best to remove the curly braces first, then split the result on whitespace and colons. This is more lines of code, but it's more robust and easier to debug.
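The brace-stripping approach described above could look roughly like the sketch below. The class name CoordinateParser and the fixed X, Y, Z order of the returned array are assumptions made for illustration; the input format is the one shown in the question.

```java
// Hypothetical helper: turns "{X:0.89 Y:0.68 Z:1.47}" into {0.89, 0.68, 1.47}.
final class CoordinateParser {
    static double[] parse(String line) {
        // Step 1: remove the curly braces and surrounding whitespace.
        String body = line.replace("{", "").replace("}", "").trim();

        double[] xyz = new double[3];
        // Step 2: split on whitespace to get "X:...", "Y:...", "Z:...".
        for (String pair : body.split("\\s+")) {
            // Step 3: split each pair on the colon and parse the right-hand side.
            String[] keyValue = pair.split(":");
            double value = Double.parseDouble(keyValue[1]);
            switch (keyValue[0]) {
                case "X": xyz[0] = value; break;
                case "Y": xyz[1] = value; break;
                case "Z": xyz[2] = value; break;
                default: throw new IllegalArgumentException("Unexpected key: " + keyValue[0]);
            }
        }
        return xyz;
    }

    public static void main(String[] args) {
        double[] xyz = parse("{X:0.8940594 Y:0.6853521 Z:1.470214}");
        System.out.println(xyz[0]);
        System.out.println(xyz[1]);
        System.out.println(xyz[2]);
    }
}
```

Calling parse once per line read from the file and appending the three values to separate X, Y and Z lists would give the per-axis arrays the question asks for.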

Convert ArrayList of Integers to String Java

I want to convert an ArrayList of Integers to a single string.
For example:
List<Integer> listInt = new ArrayList<>();
String str = "";
listInt.add(1);
listInt.add(2);
listInt.add(3);
// I want the output to be: str = "123";
String numberString = listInt.stream().map(String::valueOf)
.collect(Collectors.joining(""));
//You can pass a delimiter if you want to the joining() function, like a comma. ","
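As a small illustration of the delimiter remark, here is a self-contained sketch. The class name JoinIntegers and the method join are hypothetical; Collectors.joining is the same standard collector used above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: joins integers into one string with a chosen delimiter.
final class JoinIntegers {
    static String join(List<Integer> numbers, String delimiter) {
        return numbers.stream()
                .map(String::valueOf)                    // Integer -> String
                .collect(Collectors.joining(delimiter)); // concatenate with the delimiter in between
    }

    public static void main(String[] args) {
        List<Integer> listInt = Arrays.asList(1, 2, 3);
        System.out.println(join(listInt, ""));  // empty delimiter: digits run together
        System.out.println(join(listInt, ",")); // comma between the numbers
    }
}
```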
Try this, the simplest way:
List<Integer> listInt = Arrays.asList(1,2,3);
String str = "";
StringBuilder builder = new StringBuilder();
for(Integer item : listInt)
builder.append(item);
str = builder.toString();
System.out.println(str);

Replace word in Java

I have a line, for example "1 qqq 4 aaa 2", and a list {aaa, qqq}. I must replace every word (a token consisting only of letters) with the words from the list, in order. The expected result for this example is "1 aaa 4 qqq 2". I tried:
StringTokenizer tokenizer = new StringTokenizer(str, " ");
while (tokenizer.hasMoreTokens()){
tmp = tokenizer.nextToken();
if(tmp.matches("^[a-z]+$"))
newStr = newStr.replaceFirst(tmp, words.get(l++));
}
But it's not working. As a result I get the same line.
All my code:
String space = " ", tmp, newStr;
Scanner stdin = new Scanner(System.in);
while (stdin.hasNextLine()) {
int k = 0, j = 0, l = 0;
String str = stdin.nextLine();
newStr = str;
List<String> words = new ArrayList<>(Arrays.asList(str.split(" ")));
words.removeIf(new Predicate<String>() {
@Override
public boolean test(String s) {
return !s.matches("^[a-z]+$");
}
});
Collections.sort(words);
StringTokenizer tokenizer = new StringTokenizer(str, " ");
while (tokenizer.hasMoreTokens()){
tmp = tokenizer.nextToken();
if(tmp.matches("^[a-z]+$"))
newStr = newStr.replaceFirst(tmp, words.get(l++));
}
System.out.printf(newStr);
}
I think the problem might be that replaceFirst() expects a regular expression as first parameter and you are giving it a String.
Maybe try
newStr = newStr.replaceFirst("^[a-z]+$", words.get(l++));
instead?
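A short trace may help show why the loop from the question prints the input unchanged: replaceFirst() searches the whole string on every call, so a later replacement can hit the word that an earlier one just wrote. The sketch below is a reduced version of that loop; the class and method names are made up for illustration.

```java
// Hypothetical reduction of the loop from the question, to show the failure mode.
final class ReplaceFirstTrace {
    static String naive(String line, String[] sortedWords) {
        String result = line;
        int next = 0;
        for (String token : line.split(" ")) {
            if (token.matches("[a-z]+")) {
                // Replaces the FIRST occurrence anywhere in result,
                // not the occurrence at the current token's position.
                result = result.replaceFirst(token, sortedWords[next++]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Step 1: "qqq" -> "aaa" gives "1 aaa 4 aaa 2".
        // Step 2: "aaa" -> "qqq" hits the first "aaa", which step 1 just wrote.
        System.out.println(naive("1 qqq 4 aaa 2", new String[] {"aaa", "qqq"}));
    }
}
```

For the sample line the second call undoes the first, so the output equals the input. Building the result token by token avoids this, because each token is written exactly once.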
Update:
Would that be a possibility for you:
StringBuilder _b = new StringBuilder();
while (_tokenizer.hasMoreTokens()){
String _tmp = _tokenizer.nextToken();
if(_tmp.matches("^[a-z]+$")){
_b.append(words.get(l++));
}
else{
_b.append(_tmp);
}
_b.append(" ");
}
String newStr = _b.toString().trim();
Update 2:
Change the StringTokenizer like this:
StringTokenizer tokenizer = new StringTokenizer(str, " ", true);
That will also return the delimiters (all the spaces).
And then concatenate the String like this:
StringBuilder _b = new StringBuilder();
while (_tokenizer.hasMoreTokens()){
String _tmp = _tokenizer.nextToken();
if(_tmp.matches("^[a-z]+$")){
_b.append(words.get(l++));
}
else{
_b.append(_tmp);
}
}
String newStr = _b.toString().trim();
That should work.
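Put together as a self-contained class, the tokenizer-with-delimiters idea could be sketched like this. The class name SortedWordReplacer is hypothetical, and the word list is built by sorting the lowercase words of the line itself, as in the question's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper: rewrites a line so its lowercase words appear in sorted order,
// leaving every other token and every space where it was.
final class SortedWordReplacer {
    static String replace(String line) {
        // Collect the words (tokens made only of a-z) and sort them.
        List<String> words = new ArrayList<>();
        for (String token : line.split(" ")) {
            if (token.matches("[a-z]+")) {
                words.add(token);
            }
        }
        Collections.sort(words);

        // returnDelims = true: each space comes back as its own token.
        StringTokenizer tokenizer = new StringTokenizer(line, " ", true);
        StringBuilder out = new StringBuilder();
        int next = 0;
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            out.append(token.matches("[a-z]+") ? words.get(next++) : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("1 qqq 4 aaa 2"));
    }
}
```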
Update 3:
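As @DavidConrad mentioned, StringTokenizer should not be used anymore. Here is another solution with String.split():
// split before and after each whitespace character, so the spaces are kept as their own elements
final String[] _elements = str.split("(?<=\\s)|(?=\\s)");
StringBuilder _b = new StringBuilder();
int l = 0;
for (int i = 0; i < _elements.length; i++){
if(_elements[i].matches("^[a-z]+$")){
_b.append(words.get(l++));
}
else{
_b.append(_elements[i]);
}
}
String newStr = _b.toString();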
Just out of curiosity, another solution (the others really don't answer the question), which takes the input line and sorts the words alphabetically in the result, as you commented in your question.
public class Replacer {
public static void main(String[] args) {
Replacer r = new Replacer();
Scanner in = new Scanner(System.in);
while (in.hasNextLine()) {
System.out.println(r.replace(in.nextLine()));
}
}
public String replace(String input) {
Matcher m = Pattern.compile("([a-z]+)").matcher(input);
StringBuffer sb = new StringBuffer();
List<String> replacements = new ArrayList<>();
while (m.find()) {
replacements.add(m.group());
}
Collections.sort(replacements);
m.reset();
for (int i = 0; m.find(); i++) {
m.appendReplacement(sb, replacements.get(i));
}
m.appendTail(sb);
return sb.toString();
}
}

Parsing a file and replacing White spaces found within double quotes using Java

I am reading a file and trying to modify it in the following order:
if line is empty trim()
if line ends with \ strip that char and add next line to it.
The complete line contains double quotes and there are white spaces between the quotes, replace the white space with ~.
For example: "This is text within double quotes"
change to : "This~is~text~within~double~quotes"
This code is working but buggy.
The issue appears when a line that ends with \ is followed by lines that don't.
for example:
line 1 and \
line 2
line 3
so Instead of having
line 1 and line 2
line 3
I have this:
line 1 and line 2 line 3
Code updated:
public List<String> OpenFile() throws IOException {
try (BufferedReader br = new BufferedReader(new FileReader(path))) {
String line;
//StringBuilder concatenatedLine = new StringBuilder();
List<String> formattedStrings = new ArrayList<>();
//Pattern matcher = Pattern.compile("\"([^\"]+)\"");
while ((line = br.readLine()) != null) {
boolean addToPreviousLine;
if (line.isEmpty()) {
line.trim();
}
if (line.contains("\"")) {
Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(line);
while (matcher.find()) {
String match = matcher.group();
line = line.replace(match, match.replaceAll("\\s+", "~"));
}
}
if (line.endsWith("\\")) {
addToPreviousLine = false;
line = line.substring(0, line.length() - 1);
formattedStrings.add(line);
} else {
addToPreviousLine = true;
}
if (addToPreviousLine) {
int previousLineIndex = formattedStrings.size() - 1;
if (previousLineIndex > -1) {
// Combine the previous line and current line
String previousLine = formattedStrings.remove(previousLineIndex);
line = previousLine + " " + line;
formattedStrings.add(line);
}
}
testScan(formattedStrings);
//concatenatedLine.setLength(0);
}
return formattedStrings;
}
Update
I'm giving you what you need, without trying to write all the code for you. You just need to figure out where to place these snippets.
If line is empty trim()
if (line.matches("\\s+")) {
line = "";
// I don't think you want to add an empty line to your return result. If you do, just omit the continue;
continue;
}
If line contains double quotes and white spaces in them, replace the white space with ~. For example: "This is text within double quotes" change to : "This~is~text~within~double~quotes"
Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(line);
while (matcher.find()) {
String match = matcher.group();
line = line.replace(match, match.replaceAll("\\s+", "~"));
}
If line ends with \ strip that char and add the next line. You need to have flag to track when to do this.
if (line.endsWith("\\")) {
addToPreviousLine = true;
line = line.substring(0, line.length() - 1);
} else {
addToPreviousLine = false;
}
Now, to add the next line to the previous line you'll need something like (Figure out where to place this snippet):
if (addToPreviousLine) {
int previousLineIndex = formattedStrings.size() - 1;
if (previousLineIndex > -1) {
// Combine the previous line and current line
String previousLine = formattedStrings.remove(previousLineIndex);
line = previousLine + " " + line;
}
}
You still do not need the StringBuffer or StringBuilder. Just modify the current line and add the current line to your formattedStrings List.
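One possible way to arrange these snippets is sketched below. It is not the asker's final code: the class name LineFormatter is made up, the input is a list of lines rather than a file so the example stays self-contained, and a continued line is concatenated directly with the next one (no extra space), on the assumption that the text before the \ already ends with a space as in the example from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper combining the three steps: skip blank lines,
// join lines ending in '\', and replace whitespace inside double quotes with '~'.
final class LineFormatter {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]+\"");

    static List<String> format(List<String> rawLines) {
        List<String> formatted = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // holds a line still being continued
        for (String raw : rawLines) {
            if (raw.trim().isEmpty()) {
                continue; // blank lines are dropped
            }
            if (raw.endsWith("\\")) {
                // Strip the trailing backslash and wait for the next line.
                pending.append(raw, 0, raw.length() - 1);
                continue;
            }
            pending.append(raw);
            formatted.add(tildeInsideQuotes(pending.toString()));
            pending.setLength(0);
        }
        if (pending.length() > 0) {
            // Input ended on a continued line: keep what was collected.
            formatted.add(tildeInsideQuotes(pending.toString()));
        }
        return formatted;
    }

    static String tildeInsideQuotes(String line) {
        Matcher m = QUOTED.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replaced = m.group().replaceAll("\\s+", "~");
            // quoteReplacement guards against '$' or '\' in the quoted text.
            m.appendReplacement(sb, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

The quote replacement runs on the complete joined line, so a quoted string that spans a continuation is still handled as one unit.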
I'm not very good with regex, so here's a programmatic method to do it:
String string = "He said, \"Hello Mr Nice Guy\"";
// split it along the quotes
String splitString[] = string.split("\"");
// loop through, each odd indexed item is inside quotations
for(int i = 0; i < splitString.length; i++) {
if(i % 2 > 0) {
splitString[i] = splitString[i].replaceAll(" ", "~");
}
}
String finalString = "";
// re-build the string w/ quotes added back in
for(int i = 0; i < splitString.length; i++) {
if(i % 2 > 0) {
finalString += "\"" + splitString[i] + "\"";
} else {
finalString += splitString[i];
}
}
System.out.println(finalString);
Output: He said, "Hello~Mr~Nice~Guy"
Step 3:
String text;
text = text.replaceAll("\\s", "~");
If you want to replace the spaces that occur within double quotes with ~:
String line = "\"This is a line with spaces\"";
String result = line;
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(line);
while (m.find()) {
// replace spaces only inside the quoted segment, keeping the rest of the line
result = result.replace(m.group(), m.group().replace(" ", "~"));
}
instead of
if (line.contains("\"")) {
StringBuffer sb = new StringBuffer();
Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(line);
while (matcher.find()) {
matcher.appendReplacement(sb, matcher.group().replaceAll("\\s+", ""));
}
I would do this
if (line.matches("\"([^\"]+)\"")) {
line = line.replaceAll("\\s+", "");
}
How can I add this to what I have above ?
concatenatedLine.append(line);
String fullLine = concatenatedLine.toString();
if (fullLine.contains("\"")) {
StringBuffer sb = new StringBuffer();
Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(fullLine);
while (matcher.find()) {
matcher.appendReplacement(sb, matcher.group().replaceAll("\\s+", ""));
}
matcher.appendTail(sb); // keep the text after the last quoted segment
formattedStrings.add(sb.toString());
} else {
formattedStrings.add(fullLine);
}

Words inside square brackets - RegExp

String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String text = "[build]/directory/[something]/[build]/";
RegExp reg = RegExp.compile(linkPattern,"g");
MatchResult matchResult = reg.exec(text);
for (int i = 0; i < matchResult.getGroupCount(); i++) {
System.out.println("group" + i + "=" + matchResult.getGroup(i));
}
I am trying to get all blocks which are enclosed in square brackets from a path string,
and I only get group0="[build]". What I want is:
1:"[build]" 2:"[something]" 3:"[build]"
EDIT:
just to be clear words inside the brackets are generated with random text
public static String genText()
{
final int LENGTH = (int)(Math.random()*12)+4;
StringBuffer sb = new StringBuffer();
for (int x = 0; x < LENGTH; x++)
{
sb.append((char)((int)(Math.random() * 26) + 97));
}
String str = sb.toString();
str = str.substring(0,1).toUpperCase() + str.substring(1);
return str;
}
EDIT 2:
JDK works fine, GWT RegExp gives this problem
SOLVED:
Answer from Didier L
String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String result = "";
String text = "[build]/directory/[something]/[build]/";
RegExp reg = RegExp.compile(linkPattern,"g");
MatchResult matchResult = null;
while((matchResult=reg.exec(text)) != null){
if(matchResult.getGroupCount()==1)
System.out.println( matchResult.getGroup(0));
}
I don't know which regex library you are using but using the one from the JDK it would go along the lines of
String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String text = "[build]/directory/[something]/[build]/";
Pattern pat = Pattern.compile(linkPattern);
Matcher mat = pat.matcher(text);
while (mat.find()) {
System.out.println(mat.group());
}
Output:
[build]
[something]
[build]
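If only the words without their brackets are needed, a capturing group around the character class gives them directly. The sketch below uses the JDK regex classes, not GWT's RegExp; the class name BracketWords and the method extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns the words found between square brackets, in order.
final class BracketWords {
    // Group 1 captures the word; the brackets themselves stay outside the group.
    private static final Pattern LINK = Pattern.compile("\\[([A-Za-z_0-9]+)\\]");

    static List<String> extract(String path) {
        List<String> words = new ArrayList<>();
        Matcher m = LINK.matcher(path);
        while (m.find()) {          // one iteration per bracketed block
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extract("[build]/directory/[something]/[build]/"));
    }
}
```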
Try:
String linkPattern = "(\\[[A-Za-z_0-9]+\\])*";
EDIT:
Second try:
String linkPattern = "\\[(\\w+)\\]+";
Third try, see http://rubular.com/r/eyAQ3Vg68N
