Read and Split (Parse) data in Java

I am trying to split some simple data from a .txt file. I have found some useful structures on the internet, but they were not enough to split the data the way I wanted. I get a string like this:
{X:0.8940594 Y:0.6853521 Z:1.470214}
And I want to transform it into this:
0.8940594
0.6853521
1.470214
Then I want to put them in a matrix, in order X=[], Y=[], Z=[] (the data are the coordinates of an object).
Here is my code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
// try-with-resources closes the reader automatically, even if readLine() throws
try (BufferedReader in = new BufferedReader(new FileReader("file.txt"))) {
    String read;
    while ((read = in.readLine()) != null) {
        // split the line on runs of whitespace
        String[] splited = read.split("\\s+");
        for (String part : splited) {
            System.out.println(part);
        }
    }
} catch (IOException e) {
    System.out.println("There was a problem: " + e);
    e.printStackTrace();
}
What do I need to add to my code to get the data the way I want?
Right now with this code I receive data like this:
{X:0.8940594
Y:0.6853521
Z:1.470214}

You can try using a regex similar to the following to match and capture the three numbers contained in each tuple:
{\s*X:(.*?)\s+Y:(.*?)\s+Z:(.*?)\s*}
Each quantity contained in parentheses is a capture group, and is available after a match has taken place.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
int size = 100; // replace with actual size of your vectors/matrix
double[] A = new double[size]; // X values
double[] B = new double[size]; // Y values
double[] C = new double[size]; // Z values
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
String regex = "\\{\\s*X:(.*?)\\s+Y:(.*?)\\s+Z:(.*?)\\s*\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
int counter = 0;
// each find() consumes one {X:.. Y:.. Z:..} tuple; groups 1-3 hold the three numbers
while (m.find()) {
    A[counter] = Double.parseDouble(m.group(1));
    B[counter] = Double.parseDouble(m.group(2));
    C[counter] = Double.parseDouble(m.group(3));
    ++counter;
}
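The snippet above handles one string and needs the array size up front. A minimal sketch that applies the same regex to every line and collects the numbers into growable lists, one per axis, could look like this. The class name CoordinateReader and the sample lines are illustrative; the in-memory list stands in for something like Files.readAllLines(Paths.get("file.txt")).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoordinateReader {
    // Same capture-group regex as above: one group per coordinate.
    private static final Pattern TUPLE =
        Pattern.compile("\\{\\s*X:(.*?)\\s+Y:(.*?)\\s+Z:(.*?)\\s*\\}");

    // Returns [xs, ys, zs]; the lists grow as needed, so no size has to be guessed.
    public static List<List<Double>> parse(List<String> lines) {
        List<Double> xs = new ArrayList<>();
        List<Double> ys = new ArrayList<>();
        List<Double> zs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TUPLE.matcher(line);
            while (m.find()) { // a line may hold more than one tuple
                xs.add(Double.parseDouble(m.group(1)));
                ys.add(Double.parseDouble(m.group(2)));
                zs.add(Double.parseDouble(m.group(3)));
            }
        }
        return List.of(xs, ys, zs);
    }

    public static void main(String[] args) {
        // Hypothetical sample data; in the real program these lines come from the file.
        List<String> lines = List.of(
            "{X:0.8940594 Y:0.6853521 Z:1.470214}",
            "{X:1.5 Y:-2.25 Z:3.0}");
        List<List<Double>> xyz = parse(lines);
        System.out.println("X=" + xyz.get(0));
        System.out.println("Y=" + xyz.get(1));
        System.out.println("Z=" + xyz.get(2));
    }
}
```

List.of needs Java 9 or later; on Java 8, Arrays.asList would do the same job here.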

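You can use this regex, -?\d+\.\d+, for example:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
Pattern pattern = Pattern.compile("-?\\d+\\.\\d+");
Matcher matcher = pattern.matcher(input);
List<String> result = new ArrayList<>();
// each find() moves on to the next number in the string
while (matcher.find()) {
    result.add(matcher.group());
}
System.out.println(result);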
In your case you want to match real numbers, which is what this regex does. Note that it requires a decimal point, so an integer such as 5 would not match.
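The matcher above yields the numbers as strings, in the order they appear. Assuming every line lists X, Y and Z in that order (nothing in the regex enforces this), the position of each match can decide which list it goes into. A rough sketch, with the hypothetical class name NumberColumns:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberColumns {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+\\.\\d+");

    // Returns [xs, ys, zs]; the n-th number on a line goes to column n % 3.
    public static List<List<Double>> toColumns(List<String> lines) {
        List<List<Double>> columns = new ArrayList<>();
        for (int c = 0; c < 3; c++) {
            columns.add(new ArrayList<>());
        }
        for (String line : lines) {
            Matcher m = NUMBER.matcher(line);
            // assumes X, Y, Z always appear in this order on every line
            for (int i = 0; m.find(); i++) {
                columns.get(i % 3).add(Double.parseDouble(m.group()));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<List<Double>> xyz = toColumns(List.of("{X:0.8940594 Y:0.6853521 Z:1.470214}"));
        System.out.println("X=" + xyz.get(0) + " Y=" + xyz.get(1) + " Z=" + xyz.get(2));
    }
}
```

Because the labels are ignored, this is only safe for input whose order is fixed; the capture-group regex in the previous answer checks the X:, Y:, Z: labels explicitly.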

This code will solve your problem.
import java.util.ArrayList;
import java.util.List;
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214} ";
// split after every space, e.g. "{X:0.8940594 "
String[] parts = input.split("(?<= )");
List<String> output = new ArrayList<>();
for (int i = 0; i < parts.length; i++) {
    // split after the colon: "{X:" and "0.8940594 "
    String[] part = parts[i].split("(?<=:)");
    // cut off a closing brace, if any, then trim the trailing space
    String[] temp = part[1].split("}");
    output.add(temp[0].trim());
}
System.out.println("This List contains numbers:" + output);
Output: This List contains numbers:[0.8940594, 0.6853521, 1.470214]

How about this?
public class Test {
public static void main(String[] args) {
String s = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
String x = s.substring(s.indexOf("X:")+2, s.indexOf("Y:")-1);
String y = s.substring(s.indexOf("Y:")+2, s.indexOf("Z:")-1);
String z = s.substring(s.indexOf("Z:")+2, s.lastIndexOf("}"));
System.out.println(x);
System.out.println(y);
System.out.println(z);
}
}

Your regex splits on whitespace, but does not remove the curly braces.
So instead of splitting on whitespace, you split on a class of characters: whitespace and curly braces.
The line with the regex then becomes:
String[] splited = read.split("[\\s{}]+");
After this, skip the empty first element (it comes from the leading {), then split each of the remaining three entries on the : and parse the right-hand side. You can use Double.parseDouble for this purpose.
Personally, I would try to avoid long regular expressions; they are hard to debug.
It may be best to remove the curly braces first, then split the result on whitespace and colons. This is more lines of code, but it's more robust and easier to debug.
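The remove-the-braces-first approach from the last paragraph could be sketched as follows. This is an illustration, not the answerer's code: BraceSplitter is a made-up name, and returning a LinkedHashMap is an assumption so that each value can be looked up by its key while the X, Y, Z order is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BraceSplitter {
    // Parses "{X:0.89 Y:0.68 Z:1.47}" into an ordered map X -> 0.89, Y -> 0.68, Z -> 1.47.
    public static Map<String, Double> parse(String line) {
        // 1) drop the curly braces
        String body = line.replace("{", "").replace("}", "").trim();
        Map<String, Double> values = new LinkedHashMap<>();
        // 2) split into "KEY:value" tokens on whitespace
        for (String token : body.split("\\s+")) {
            // 3) split each token on its colon and parse the right-hand side
            String[] kv = token.split(":", 2);
            if (kv.length == 2) { // skips empty tokens, e.g. from a blank line
                values.put(kv[0], Double.parseDouble(kv[1]));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Double> v = parse("{X:0.8940594 Y:0.6853521 Z:1.470214}");
        System.out.println(v.get("X"));
        System.out.println(v.get("Y"));
        System.out.println(v.get("Z"));
    }
}
```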


Java - cut string in a specific way

I have a string (taken from a file):
Computer: intel, graphic card: Nvidia,
Mouse: razer, color: white
etc.
I need to take words between ":" and ",".
When I do it this way:
Scanner sc = new Scanner(new File(path));
String str = sc.nextLine();
ArrayList<String> list = new ArrayList<String>();
while (sc.hasNextLine()) {
for (int i = 0; i < str.length(); i++) {
list.add(str.substring(str.indexOf(":"), str.indexOf(",")));
}
System.out.println("test");
sc.nextLine();
}
I only get ": intel".
I don't know how to get the other words from the same line, and then the words from the next line.
Assuming the content of the file test.txt is as follows:
Computer: intel, graphic card: Nvidia
Mouse: razer, color: white
The following program will collect the value after each colon:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
class Coddersclub {
public static void main(String[] args) throws FileNotFoundException {
Scanner sc = new Scanner(new File("test.txt"));
ArrayList<String> list = new ArrayList<String>();
String str = "";
while (sc.hasNextLine()) {
str = sc.nextLine();
String[] specs = str.split(",");
for (String item : specs) {
list.add(item.substring(item.indexOf(":") + 1).trim());
}
}
System.out.println(list);
}
}
output:
[intel, Nvidia, razer, white]
Note: if you want the list to be [: intel, : Nvidia, : razer, : white], replace list.add(item.substring(item.indexOf(":") + 1).trim()); with list.add(item.substring(item.indexOf(":")).trim());.
Feel free to comment if you are looking for something else.
You are facing this problem because indexOf() returns the first occurrence of the character in the string. Hence you are getting the substring between the first occurrence of ':' and the first occurrence of ','. To solve your problem, use indexOf() together with lastIndexOf() in str.substring instead of indexOf() alone (Java has no FirstIndexOf() method; indexOf() already finds the first occurrence). This will return the substring between the first occurrence of ':' and the last occurrence of ','. I hope you find this answer helpful.
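Another way around the first-occurrence problem, not taken from this answer, is the two-argument indexOf(int ch, int fromIndex), which starts searching at a given position. Calling it in a loop walks through every ':' on the line. A minimal sketch, assuming a value ends at the next ',' or at the end of the line (IndexOfWalker and valuesOf are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfWalker {
    // Collects the trimmed text between each ':' and the next ',' (or the end of the line).
    public static List<String> valuesOf(String line) {
        List<String> values = new ArrayList<>();
        int colon = line.indexOf(':');
        while (colon >= 0) {
            int comma = line.indexOf(',', colon + 1);      // search only after this colon
            int end = (comma >= 0) ? comma : line.length(); // last value may have no comma
            values.add(line.substring(colon + 1, end).trim());
            colon = line.indexOf(':', end);                 // resume after this value
        }
        return values;
    }

    public static void main(String[] args) {
        // stand-in for the lines read with Scanner.nextLine()
        List<String> lines = List.of(
            "Computer: intel, graphic card: Nvidia,",
            "Mouse: razer, color: white");
        List<String> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(valuesOf(line));
        }
        System.out.println(all);
    }
}
```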
A stream-based solution:
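import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
String string = "Computer: intel, graphic card: Nvidia,";
// split into "key: value" pairs on commas, then split each pair on its first colon
Map<String,String> map = Pattern.compile("\\s*,\\s*")
    .splitAsStream(string.trim())
    .map(s -> s.split(":", 2))
    .collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
System.out.println(map.values());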
Output:
[ Nvidia, intel]
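The order in that output comes from the HashMap that Collectors.toMap creates by default, which does not guarantee iteration order, and the values keep their leading space. A variant sketch (hypothetical class name OrderedSpecs) uses the four-argument Collectors.toMap to supply a LinkedHashMap and a merge function, and trims keys and values:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class OrderedSpecs {
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // "Computer: intel, graphic card: Nvidia," -> {Computer=intel, graphic card=Nvidia}
    public static Map<String, String> parse(String line) {
        return COMMA.splitAsStream(line.trim())      // trailing empty piece is discarded
            .map(s -> s.split(":", 2))               // key and value, split on the first colon
            .collect(Collectors.toMap(
                a -> a[0].trim(),
                a -> a.length > 1 ? a[1].trim() : "",
                (first, second) -> second,           // duplicate key: keep the later value
                LinkedHashMap::new));                // keeps the order of the line
    }

    public static void main(String[] args) {
        System.out.println(parse("Computer: intel, graphic card: Nvidia,").values());
    }
}
```

The merge function matters on its own too: without one, toMap throws an IllegalStateException when the same key appears twice on a line.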
You can use regex for that.
To extract the text between a : and the last , in the line, use something like:
(?<=\:)(.*?)(?=\,\n)
You can then perform an operation like this:
String mytext = "Computer: intel, graphic card: Nvidia,\n" +
"Mouse: razer, color: white etc.";
Pattern pattern = Pattern.compile("(?<=:)(.*?)(?=,\\n)");
Matcher matcher = pattern.matcher(mytext);
if (matcher.find())
{
System.out.println(matcher.group(1).trim());
}
The output will be:
intel, graphic card: Nvidia
Inspired by these two other threads.
Modify your code as follows to solve the issue:
Scanner sc = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (sc.hasNextLine()) {
    String str = sc.nextLine();
    String[] sp = str.split(",");
    for (int i = 0; i < sp.length; i++) {
        String[] kv = sp[i].split(":");
        if (kv.length > 1) { // skip fragments without a colon
            list.add(kv[1].trim());
        }
    }
}
sc.close();
// you will get the list as intel, Nvidia, razer, white, ...
To get all the values between ':' and ',', I think it is best to split each line on ',', then split each element on ':' and take the right-hand value.
Please try this code:
Scanner sc;
try {
sc = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (sc.hasNextLine()) {
String informations = sc.nextLine();
String[] parts = informations.split(",");
for( String part : parts) {
list.add(part.substring(part.indexOf(':')+1));
}
}
}
catch (FileNotFoundException e) {
e.printStackTrace();
}

How to match the text file against multiple regex patterns and count the number of occurrences of these patterns?

I want to find and count all the occurrences of the words unit, device, method, module in every line of the text file separately. This is what I've done so far, but I don't know how to use multiple patterns or how to count the occurrences of each word in a line separately. Right now it only counts the occurrences of all words together for each line. Thank you in advance!
private void countPaterns() throws IOException {
Pattern nom = Pattern.compile("unit|device|method|module|material|process|system");
String str = null;
BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt"));
while ((str = r.readLine()) != null) {
Matcher matcher = nom.matcher(str);
int countnomen = 0;
while (matcher.find()) {
countnomen++;
}
//intList.add(countnomen);
System.out.println(countnomen + " of them are the word System");
}
r.close();
//return intList;
}
It is better to use word boundaries (\b), so that words such as "units" or "subsystem" are not counted, and a map to keep a count for each matched keyword.
Pattern nom = Pattern.compile("\\b(unit|device|method|module|material|process|system)\\b");
String str = null;
BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt"));
Map<String, Integer> counts = new HashMap<>();
while ((str = r.readLine()) != null) {
Matcher matcher = nom.matcher(str);
while (matcher.find()) {
String key = matcher.group(1);
int c = 0;
if (counts.containsKey(key))
c = counts.get(key);
counts.put(key, c + 1);
}
}
r.close();
System.out.println(counts);
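The map above sums the counts over the whole file, whereas the question asks for counts per line. A sketch that builds a fresh map for every line could look like this; KeywordCounter and the sample lines are illustrative, Map.merge does the increment, and the TreeMap is an arbitrary choice that just keeps the printout sorted:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordCounter {
    // Word boundaries stop "units" or "subsystem" from being counted.
    private static final Pattern KEYWORDS = Pattern.compile(
        "\\b(unit|device|method|module|material|process|system)\\b");

    // Counts each keyword within a single line.
    public static Map<String, Integer> countLine(String line) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by keyword
        Matcher m = KEYWORDS.matcher(line);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum); // start at 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // stand-in for the lines read from D:/test/test1.txt
        List<String> lines = List.of(
            "the unit and the device form a system",
            "a method for a unit, a unit again");
        for (String line : lines) {
            System.out.println(countLine(line));
        }
    }
}
```

For the two sample lines this prints {device=1, system=1, unit=1} and then {method=1, unit=2}.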
Here's a Java 9 (and above) solution:
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PatternCount {
    public static void main(String[] args) {
        List<String> expressions = List.of("(good)", "(bad)");
        String phrase = " good bad bad good good bad bad bad";
        for (String regex : expressions) {
            Pattern gPattern = Pattern.compile(regex);
            Matcher matcher = gPattern.matcher(phrase);
            // Matcher.results() (Java 9+) streams every match; count() tallies them
            long count = matcher.results().count();
            System.out.println("Pattern \"" + regex + "\" appears " + count + (count == 1 ? " time" : " times"));
        }
    }
}
Output:
Pattern "(good)" appears 3 times
Pattern "(bad)" appears 5 times

Parsing a file and replacing whitespace found within double quotes using Java

I am reading a file and trying to modify it in the following order:
If the line is empty, trim() it.
If the line ends with \, strip that character and append the next line to it.
If the line contains double quotes and there is white space between the quotes, replace the white space with ~.
For example: "This is text within double quotes"
changes to: "This~is~text~within~double~quotes"
This code works, but it is buggy.
The problem appears when there is a line that ends with \ followed by lines that don't.
For example:
line 1 and \
line 2
line 3
So instead of having
line 1 and line 2
line 3
I have this:
line 1 and line 2 line 3
Updated code:
public List<String> OpenFile() throws IOException {
try (BufferedReader br = new BufferedReader(new FileReader(path))) {
String line;
//StringBuilder concatenatedLine = new StringBuilder();
List<String> formattedStrings = new ArrayList<>();
//Pattern matcher = Pattern.compile("\"([^\"]+)\"");
while ((line = br.readLine()) != null) {
boolean addToPreviousLine;
if (line.isEmpty()) {
line.trim();
}
if (line.contains("\"")) {
Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(line);
while (matcher.find()) {
String match = matcher.group();
line = line.replace(match, match.replaceAll("\\s+", "~"));
}
}
if (line.endsWith("\\")) {
addToPreviousLine = false;
line = line.substring(0, line.length() - 1);
formattedStrings.add(line);
} else {
addToPreviousLine = true;
}
if (addToPreviousLine) {
int previousLineIndex = formattedStrings.size() - 1;
if (previousLineIndex > -1) {
// Combine the previous line and current line
String previousLine = formattedStrings.remove(previousLineIndex);
line = previousLine + " " + line;
formattedStrings.add(line);
}
}
testScan(formattedStrings);
//concatenatedLine.setLength(0);
}
return formattedStrings;
}
Update
I'm giving you what you need, without trying to write all the code for you. You just need to figure out where to place these snippets.
If the line is empty (or contains only whitespace):
if (line.trim().isEmpty()) {
line = "";
// I don't think you want to add an empty line to your return result. If you do, just omit the continue;
continue;
}
If the line contains double quotes with white space inside them, replace the white space with ~. For example: "This is text within double quotes" changes to: "This~is~text~within~double~quotes"
Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(line);
while (matcher.find()) {
String match = matcher.group();
line = line.replace(match, match.replaceAll("\\s+", "~"));
}
If the line ends with \, strip that character and append the next line. You need a flag, declared outside the loop so that it carries over to the next iteration, to track when to do this.
if (line.endsWith("\\")) {
addToPreviousLine = true;
line = line.substring(0, line.length() - 1);
} else {
addToPreviousLine = false;
}
Now, to add the next line to the previous line you'll need something like (Figure out where to place this snippet):
if (addToPreviousLine) {
int previousLineIndex = formattedStrings.size() - 1;
if (previousLineIndex > -1) {
// Combine the previous line and current line
String previousLine = formattedStrings.remove(previousLineIndex);
line = previousLine + " " + line;
}
}
You still do not need the StringBuffer or StringBuilder. Just modify the current line and add the current line to your formattedStrings List.
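Putting the three steps together, one possible sketch keeps the pending continuation in a StringBuilder that lives outside the loop, which plays the role of the flag described above. It works on a list of lines (in the real program, the lines read from the file) and makes a few assumptions that are not in the question: every line is trimmed, blank lines are skipped entirely, and the ~ replacement runs after joining so that a quoted span broken by \ is handled too. LineJoiner, process and tildeQuoted are hypothetical names; the StringBuilder overload of appendReplacement and List.of need Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineJoiner {
    // A double-quoted span with no quotes inside it.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Step 3: replace each whitespace run inside a "..." span with '~'.
    static String tildeQuoted(String line) {
        Matcher m = QUOTED.matcher(line);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String replaced = m.group().replaceAll("\\s+", "~");
            // quoteReplacement stops '$' or '\' in the text from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Steps 1 and 2: drop blank lines and join lines that end with '\'.
    static List<String> process(List<String> rawLines) {
        List<String> result = new ArrayList<>();
        StringBuilder pending = null; // non-null while a '\' continuation is open
        for (String raw : rawLines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines are skipped, even inside a continuation
            }
            boolean continues = line.endsWith("\\");
            if (continues) {
                line = line.substring(0, line.length() - 1).trim();
            }
            if (pending == null) {
                pending = new StringBuilder(line);
            } else {
                pending.append(' ').append(line);
            }
            if (!continues) {
                result.add(tildeQuoted(pending.toString()));
                pending = null;
            }
        }
        if (pending != null) { // the last line ended with a dangling '\'
            result.add(tildeQuoted(pending.toString()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "line 1 and \\",
            "line 2",
            "",
            "line 3 \"This is text within double quotes\"");
        process(lines).forEach(System.out::println);
    }
}
```

The key design point is that "line 3" starts a new entry because the previous line did not end with \, which is exactly the case the question's code gets wrong.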
I'm not very good with regex, so here's a programmatic method to do it:
String string = "He said, \"Hello Mr Nice Guy\"";
// split it along the quotes
String splitString[] = string.split("\"");
// loop through; each odd-indexed item is inside quotation marks
for(int i = 0; i < splitString.length; i++) {
if(i % 2 > 0) {
splitString[i] = splitString[i].replaceAll(" ", "~");
}
}
String finalString = "";
// re-build the string w/ quotes added back in
for(int i = 0; i < splitString.length; i++) {
if(i % 2 > 0) {
finalString += "\"" + splitString[i] + "\"";
} else {
finalString += splitString[i];
}
}
System.out.println(finalString);
Output: He said, "Hello~Mr~Nice~Guy"
Step 3:
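String text = "\"This is text within double quotes\"";
// replaces every whitespace character in text, so apply it only to the quoted part
text = text.replaceAll("\\s", "~");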
If you want to replace only the spaces that occur within double quotes with ~s:
String line = "\"This is a line with spaces\"";
String result = line;
if (line.contains("\"")) {
    Pattern p = Pattern.compile("\"([^\"]*)\"");
    Matcher m = p.matcher(line);
    while (m.find()) {
        // swap the spaces inside this quoted span, keeping the quotes and the rest of the line
        result = result.replace(m.group(), m.group().replace(" ", "~"));
    }
}
instead of
if (line.contains("\"")) {
    StringBuffer sb = new StringBuffer();
    Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(line);
    while (matcher.find()) {
        matcher.appendReplacement(sb, matcher.group().replaceAll("\\s+", "~"));
    }
    // copy the text after the last quoted span
    matcher.appendTail(sb);
    line = sb.toString();
}
I would do this
if (line.matches("\"([^\"]+)\"")) { // true only when the whole line is one quoted string
    line = line.replaceAll("\\s+", "~");
}
How can I add this to what I have above?
concatenatedLine.append(line);
String fullLine = concatenatedLine.toString();
if (fullLine.contains("\"")) {
    StringBuffer sb = new StringBuffer();
    Matcher matcher = Pattern.compile("\"([^\"]+)\"").matcher(fullLine);
    while (matcher.find()) {
        matcher.appendReplacement(sb, matcher.group().replaceAll("\\s+", "~"));
    }
    matcher.appendTail(sb);
    formattedStrings.add(sb.toString());
} else {
    formattedStrings.add(fullLine);
}

Extract String(s) from big String selected by special characters

Let's say I have a String like this:
Hey man, my name is Jason and I like Pizza. #Pizza #Name #Cliche
My question is: how can I extract all the strings that start with # and put them together into another string?
Check out this tutorial on regex
Try
Matcher matcher = Pattern.compile("(\\s|^)#(\\S*)").matcher(string);
while(matcher.find()){
System.out.println(matcher.group(2));
}
EDIT:
As you wanted the other strings as well, you may try
Matcher matcher = Pattern.compile("(\\s|^)#(\\S*)|(\\S+)").matcher(string);
StringJoiner hashfull = new StringJoiner(" ");
StringJoiner hashless = new StringJoiner(" ");
while(matcher.find())
if(matcher.group(2) != null)
hashfull.add(matcher.group(2));
else if(matcher.group(3) != null)
hashless.add(matcher.group(3));
System.out.println(hashfull);
System.out.println(hashless);
I found that this code works very well for me, because I also wanted the rest of the string. Thanks to #Pshemo and #Mormod for helping me with this. Here is the code:
String string = "Hello my name is Jason and I like pizza. #Me #Pizza";
String[] splitedString = string.split(" "); //splits string at spaces
StringBuilder newString = new StringBuilder();
StringBuilder newString2 = new StringBuilder();
for(int i = 0; i<splitedString.length; i++){
if(splitedString[i].startsWith("#")){
newString.append(splitedString[i]);
newString.append(" "); }
else{
newString2.append(splitedString[i]);
newString2.append(" ");
}
}
System.out.println(newString2);
System.out.println(newString);
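Since the goal is to split the words into two groups, hashtags and everything else, the same loop can also be written with Collectors.partitioningBy. A short sketch under the same assumption that words are separated by whitespace (HashtagSplitter is a hypothetical name):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HashtagSplitter {
    // true -> words starting with '#', false -> everything else, both in original order.
    public static Map<Boolean, List<String>> split(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
            .collect(Collectors.partitioningBy(w -> w.startsWith("#")));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> parts =
            split("Hello my name is Jason and I like pizza. #Me #Pizza");
        System.out.println(String.join(" ", parts.get(false))); // the rest of the text
        System.out.println(String.join(" ", parts.get(true)));  // the hashtags
    }
}
```

Joining with String.join also avoids the trailing space that the StringBuilder loop leaves at the end of each string.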
Maybe you are looking for something like this:
String string = "ab cd ef";
String[] splitedString = string.split(" "); //splits string at spaces
String newString = "";
for(int i = 0; i<splitedString.length; i++){
if(splitedString[i].startsWith("#")){
newString += splitedString[i];
}
}

Words inside square brackets - RegExp

String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String text = "[build]/directory/[something]/[build]/";
RegExp reg = RegExp.compile(linkPattern,"g");
MatchResult matchResult = reg.exec(text);
for (int i = 0; i < matchResult.getGroupCount(); i++) {
System.out.println("group" + i + "=" + matchResult.getGroup(i));
}
I am trying to get all blocks that are enclosed in square brackets from a path string,
but I only get group0="[build]". What I want is:
1:"[build]" 2:"[something]" 3:"[build]"
EDIT:
Just to be clear, the words inside the brackets are generated randomly:
public static String genText()
{
final int LENGTH = (int)(Math.random()*12)+4;
StringBuffer sb = new StringBuffer();
for (int x = 0; x < LENGTH; x++)
{
sb.append((char)((int)(Math.random() * 26) + 97));
}
String str = sb.toString();
str = str.substring(0,1).toUpperCase() + str.substring(1);
return str;
}
EDIT 2:
The JDK regex works fine; GWT's RegExp gives this problem.
SOLVED:
Answer from Didier L
String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String result = "";
String text = "[build]/directory/[something]/[build]/";
RegExp reg = RegExp.compile(linkPattern,"g");
MatchResult matchResult = null;
while((matchResult=reg.exec(text)) != null){
if(matchResult.getGroupCount()==1)
System.out.println( matchResult.getGroup(0));
}
I don't know which regex library you are using, but with the one from the JDK it would go along the lines of:
String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String text = "[build]/directory/[something]/[build]/";
Pattern pat = Pattern.compile(linkPattern);
Matcher mat = pat.matcher(text);
while (mat.find()) {
System.out.println(mat.group());
}
Output:
[build]
[something]
[build]
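If only the word inside the brackets is wanted, a capture group around it can be read with group(1). A sketch using the JDK's Matcher.results() (Java 9+), so it does not apply to GWT's RegExp; BracketWords is a hypothetical name:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BracketWords {
    // The capture group holds the word without its brackets.
    private static final Pattern BRACKETED = Pattern.compile("\\[([A-Za-z_0-9]+)\\]");

    // Returns every bracketed word in order of appearance.
    public static List<String> words(String text) {
        return BRACKETED.matcher(text)
            .results()               // one MatchResult per find()
            .map(r -> r.group(1))    // use r.group() to keep the brackets
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("[build]/directory/[something]/[build]/"));
    }
}
```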
Try:
String linkPattern = "(\\[[A-Za-z_0-9]+\\])*";
EDIT:
Second try:
String linkPattern = "\\[(\\w+)\\]+";
Third try, see http://rubular.com/r/eyAQ3Vg68N
