Replace a million different regex of a string - java

I need to apply about a million different regex replacements to a string, so I saved all the regexes and their replacements in a text file. I tried reading the file line by line and applying each replacement, but it is not working.
replace_regex_file.txt
aaa zzz
^cc eee
ww$ sss
...
...
...
...
a million data
Coding
String user_input = "assume 100,000 words"; // input from user
String regex_file = "replace_regex_file.txt";
String result="";
String line;
try (BufferedReader reader = new BufferedReader(new FileReader(regex_file))) {
while ((line = reader.readLine()) != null) { // while line not equal null
String[] parts = line.split("\\s+", 2); //split process
if (parts.length >=2) {
String regex = parts[0]; // String regex stored in first array
String replace = parts[1]; // String replacement stored in second array
result = user_input.replaceAll(regex, replace); // replace processing
}
}
}
System.out.println(result); // show the result
But it does not replace anything. How can I fix this?

Your current code keeps only the result of the last rule in the file, because every iteration runs the replacement on the original user_input and overwrites result. If that last rule does not match, result is just the unchanged input:
result = user_input.replaceAll(regex, replace);
Instead, try:
String result = user_input;
outside the loop and
result = result.replaceAll(regex, replace);
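Putting the two changes together, a minimal sketch of the loop could look like this. The class and method names (RegexFileReplacer, applyRules) are made up for illustration, and the rules are passed in as a list of lines so the example runs without a file on disk.

```java
import java.util.Arrays;
import java.util.List;

public class RegexFileReplacer {

    // Applies every "regex replacement" line in order, feeding the
    // output of one rule into the next one.
    static String applyRules(String input, List<String> ruleLines) {
        String result = input;                       // start from the user input
        for (String line : ruleLines) {
            String[] parts = line.split("\\s+", 2);  // regex first, replacement second
            if (parts.length >= 2) {
                // Chain on result, not on the original input
                result = result.replaceAll(parts[0], parts[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the content of replace_regex_file.txt (sample rules only)
        List<String> rules = Arrays.asList("aaa zzz", "^cc eee", "ww$ sss");
        System.out.println(applyRules("cc aaa ww", rules)); // prints "eee zzz sss"
    }
}
```

In the real program the lines could come from Files.readAllLines(Paths.get("replace_regex_file.txt")) or from the BufferedReader loop in the question.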

Related

Splitting data into Arrays

I am trying to read data from a text file using a Buffered Reader. I'm trying to split the data into two Arrays, one of them is a double and the other one is a string. Below is the text file content:
55.6
Scholtz
85.6
Brown
74.9
Alawi
45.2
Weis
68.0
Baird
55
Baynard
68.5
Mills
65.1
Gibb
80.7
Grovner
87.6
Weaver
74.8
Kennedy
83.5
Landry.
Basically I'm trying to take all the numbers and put it into the double array, and take all the names and put it into the string array. Any ideas?
You could read the entire file content into one string and then use regular expressions to pull out the two kinds of data. A regex like \d+(\.\d+)? matches the numbers (including whole numbers such as 55), and [A-Za-z]+ matches the names. Collect the matches of each pattern with Matcher.find() and add them to the corresponding array.
Try this:
String file = "path to file";
double dArr[] = new double[100];
String sArr[] = new String[100];
int i = 0, j = 0;
try {
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
String line;
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("([0-9]*)\\.[0-9]*"); // should start with any number of 0-9 then "." and then any number of 0-9
Matcher m = p.matcher(line);
if (m.matches()) {
dArr[i] = Double.parseDouble(line);
i++;
} else {
sArr[j] = line;
j++;
}
}
} catch (IOException e) {
e.printStackTrace();
}
Suggestion: use a List instead of an array if the number of elements is not known in advance.
Note that 55 is treated as a String by this pattern, because it has no decimal point.
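As a sketch of the List suggestion, the variant below also accepts whole numbers such as 55 by making the decimal part optional. ScoreFileParser and its field names are hypothetical, and the input is a list of lines (for example from Files.readAllLines) rather than a reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScoreFileParser {
    final List<Double> numbers = new ArrayList<>();
    final List<String> names = new ArrayList<>();

    // Sorts each line into numbers or names. A line counts as a number
    // when it consists of digits with an optional decimal part.
    void parse(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                       // skip blank lines
            }
            if (line.matches("[0-9]+(\\.[0-9]+)?")) {
                numbers.add(Double.parseDouble(line));
            } else {
                names.add(line);
            }
        }
    }

    public static void main(String[] args) {
        ScoreFileParser parser = new ScoreFileParser();
        parser.parse(Arrays.asList("55.6", "Scholtz", "55", "Baynard"));
        System.out.println(parser.numbers); // [55.6, 55.0]
        System.out.println(parser.names);   // [Scholtz, Baynard]
    }
}
```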

Java sequentially parse information from file

Let's say I have a file with a structure like this:
Line 0:
354858 Some String That Is Important AA OTHER STUFF SOMESTUFF
THAT SHOULD BE IGNORED
Line 1:
543788 Another String That Is Important AA OTHER STUFF
SOMESTUFF THAT SHOULD BE IGNORED
and so on...
Now I would like to get the number and the important string at the start of each entry (everything before AA). The sequence AA is always present (and could be used as a marker to stop and skip to the next line), while the information string varies in length.
What is the best way to parse this information? A BufferedReader with if/else logic, or is there some kind of parser that you can tell to read a number of length XYZ, then read everything into a String until it finds AA, then skip the rest of the line?
It is not possible to say which approach is best for your problem without more information.
One solution might be
String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2);
System.out.println("split = " + Arrays.toString(split));
output
split = [354858, Some String That Is Important]
You can read the file line by line and exclude the part which contains the AA charSequence:
final String charSequence = "AA";
String line;
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename")));
try {
while ((line = r.readLine()) != null) {
int pos = line.indexOf(charSequence);
if (pos > 0) {
String myImportantStuff = line.substring(0, pos);
//do something with your useful string
}
}
} finally {
r.close();
}
I would read the file line by line and match each line against a regular expression. I hope my comments in the code below will be detailed enough.
// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");
// Read file line by line
BufferedReader br = new BufferedReader(new FileReader(myFile));
String line;
while((line = br.readLine()) != null) {
// Match line against our pattern
Matcher m = p.matcher(line);
if(m.find()) {
// Line is valid, process it however you want
// m.group(1) contains the number
// m.group(2) contains the text between number and AA
} else {
// Line has invalid format (pattern does not match)
}
}
Explanation of the regular expression (Pattern) I used:
^([0-9]+)\s+(([^A]|A[^A])+)AA
^ matches the start of the line
([0-9]+) matches any integral number
\s+ matches one or more whitespace characters
(([^A]|A[^A])+) matches one or more characters that are either not an A, or an A followed by a character that is not an A (so the match stops before AA)
AA matches the terminating AA
Update as a reply to comment:
If every line has a preceding | character, the expression looks like this:
^\|([0-9]+)\s+(([^A]|A[^A])+)AA
In Java, you need to escape the backslashes in the string literal like this:
"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA"
The character | has a special meaning in regular expressions and has to be escaped.
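To try the first expression on the sample line from the question, a small self-contained check might look like this. ImportantLineParser and extract are hypothetical names. Note that group 2 includes the space in front of AA, so the sketch trims it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportantLineParser {
    // Same expression as above, compiled once and reused for every line
    private static final Pattern LINE =
            Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Returns {number, text} or null when the line does not match
    static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) ends with the whitespace before AA, hence trim()
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = extract(
                "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF");
        System.out.println(parts[0]); // 354858
        System.out.println(parts[1]); // Some String That Is Important
    }
}
```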
Here is a solution for you:
public static void main(String[] args) {
InputStream source; //select a text source (should be a FileInputStream)
{
String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" +
"543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
}
try(BufferedReader stream = new BufferedReader(new InputStreamReader(source))) {
Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
while(true) {
String line = stream.readLine();
if(line == null) {
break;
}
Matcher matcher = pattern.matcher(line);
if(matcher.matches()) {
String someNumber = matcher.group(1);
String someText = matcher.group(2);
//do something with someNumber and someText
} else {
throw new ParseException(line, 0);
}
}
} catch (IOException | ParseException e) {
e.printStackTrace(); // TODO ...
}
}
You could use a regular expression, but if you know every line contains AA and you want the content up to AA, you can simply use substring(int,int) to get the part of the line up to AA
public List<String> read(Path path) throws IOException {
return Files.lines(path)
.map(this::parseLine)
.collect(Collectors.toList());
}
public String parseLine(String line){
int index = line.indexOf("AA");
return line.substring(0,index);
}
Here's the non-Java8 version of read
public List<String> read(Path path) throws IOException {
List<String> content = new ArrayList<>();
try(BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))){
String line;
while((line = reader.readLine()) != null){
content.add(parseLine(line));
}
}
return content;
}
Use a regex with a lookahead: .+?(?=AA). It lazily matches everything up to, but not including, the first AA.
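A minimal sketch of how that lookahead expression could be applied in Java follows. LookaheadDemo and beforeMarker are made-up names, and the trim() is an addition to drop the space that sits in front of AA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Lazily matches everything before the first "AA" without consuming it
    private static final Pattern BEFORE_AA = Pattern.compile(".+?(?=AA)");

    // Returns the text before AA, or null when the line contains no AA
    static String beforeMarker(String line) {
        Matcher m = BEFORE_AA.matcher(line);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(beforeMarker(
                "543788 Another String That Is Important AA OTHER STUFF"));
    }
}
```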

How to replace a substring without using replace() methods

I am trying to convert a text document to shorthand, without using any of the replace() methods in java. One of the strings I am converting is "the" to "&". The problem is, that I do not know the substring of each word that contains the "the" string. So how do I replace that part of a string without using the replace() method?
Ex: "their" would become "&ir", "together" would become "toge&r"
This is what I have started with,
String the = "the";
Scanner wordScanner = new Scanner(word);
if (wordScanner.contains(the)) {
the = "&";
}
I am just not sure how to go about the replacement.
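You could try this (StringUtils comes from Apache Commons Lang; the limit of -1 keeps trailing empty strings, so a "the" at the very end of the input is replaced too):
String word = "your string with the";
word = StringUtils.join(word.split("the", -1), "&");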
Scanner wordScanner = new Scanner(word);
I do not get your usage of Scanner for this, but you can read each character into a buffer (StringBuilder) until you read "the" into the buffer. Once you've done that, you can delete the word and then append the word you want to replace with.
public static void main(String[] args) throws Exception {
String data = "their together the them forever";
String wordToReplace = "the";
String wordToReplaceWith = "&";
Scanner wordScanner = new Scanner(data);
// Using this delimiter to get one character at a time from the scanner
wordScanner.useDelimiter("");
StringBuilder buffer = new StringBuilder();
while (wordScanner.hasNext()) {
buffer.append(wordScanner.next());
// Check if the word you want to replace is in the buffer
int wordToReplaceIndex = buffer.indexOf(wordToReplace);
if (wordToReplaceIndex > -1) {
// Delete the word you don't want in the buffer
buffer.delete(wordToReplaceIndex, wordToReplaceIndex + wordToReplace.length());
// Append the word to replace the deleted word with
buffer.append(wordToReplaceWith);
}
}
// Output results
System.out.println(buffer);
}
Results:
&ir toge&r & &m forever
This can be done without a Scanner using just a while loop and StringBuilder
public static void main(String[] args) throws Exception {
String data = "their together the them forever";
StringBuilder buffer = new StringBuilder(data);
String wordToReplace = "the";
String wordToReplaceWith = "&";
int wordToReplaceIndex = -1;
while ((wordToReplaceIndex = buffer.indexOf(wordToReplace)) > -1) {
buffer.delete(wordToReplaceIndex, wordToReplaceIndex + wordToReplace.length());
buffer.insert(wordToReplaceIndex, wordToReplaceWith);
}
System.out.println(buffer);
}
Results:
&ir toge&r & &m forever
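One caveat with the loop above: it searches from the start of the buffer on every pass, so it would never finish if the replacement itself contained the search word. A sketch that scans left to right and never revisits text it has already copied avoids that. ManualReplace and replaceAllManually are hypothetical names.

```java
public class ManualReplace {

    // Replaces every occurrence of target without using any replace() method.
    static String replaceAllManually(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text;                       // nothing sensible to search for
        }
        StringBuilder out = new StringBuilder();
        int from = 0;                          // start of the part not yet copied
        int hit;
        while ((hit = text.indexOf(target, from)) >= 0) {
            out.append(text, from, hit);       // copy the text before the match
            out.append(replacement);           // write the replacement instead
            from = hit + target.length();      // continue after the match
        }
        out.append(text, from, text.length()); // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                replaceAllManually("their together the them forever", "the", "&"));
        // prints "&ir toge&r & &m forever"
    }
}
```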
You can use Pattern and Matcher Regex:
Pattern pattern = Pattern.compile("the"); // no trailing space, so "their" is matched too
Matcher matcher = pattern.matcher("the cat and their owners");
StringBuffer sb = new StringBuffer();
while(matcher.find()){
matcher.appendReplacement(sb, "&");
}
matcher.appendTail(sb);
System.out.println(sb.toString());

Java parse a String with letters and numbers for an Integer

I'm working with data that is a String followed by spaces and then a numeric value.
ncols 10812
nrows 10812
xllcorner -107.0005555556
yllcorner 36.99944444444
cellsize 9.2592592593e-05
I'm trying to read in just the numeric value. I know that to go from String to Integer or Double I can use the standard type conversions.
Integer.valueOf(stringOfInteger);
Double.valueOf(stringOfDouble);
In order to get just the numeric value I tried this as a test:
BufferedReader br = new BufferedReader(new FileReader(path));
String line = br.readLine();
line.replace("[a-z]","");
line.replace(" ","");
System.out.println(line);
and it output ncols 10812
I'm also worried about reading the cellsize value as it has an exponential.
You can do this for each line:
...
String[] fields = line.split("\\s+");
String name = fields[0];
float value = Float.parseFloat(fields[1]);
...
This code splits each line into fields using whitespace as the separator. The first field is a String, so you can use it directly (or ignore it). The second one is a numeric value, so you have to convert it before using it. Float.parseFloat also accepts exponent notation such as 9.2592592593e-05; use Double.parseDouble instead if you need more precision than a float provides.
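As a sketch of the same idea applied to the whole header, the lines can be collected into a map from name to value. GridHeaderParser is a hypothetical name, and Double is used here so that values like -107.0005555556 keep their precision.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GridHeaderParser {

    // Maps each header name to its numeric value, keeping file order.
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> header = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 2) {
                // parseDouble handles plain integers, decimals and exponents
                header.put(fields[0], Double.parseDouble(fields[1]));
            }
        }
        return header;
    }

    public static void main(String[] args) {
        Map<String, Double> header = parse(Arrays.asList(
                "ncols         10812",
                "xllcorner     -107.0005555556",
                "cellsize      9.2592592593e-05"));
        System.out.println(header);
    }
}
```

Whole-number entries such as ncols can be converted afterwards with header.get("ncols").intValue().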
Try this one
BufferedReader br = new BufferedReader(new FileReader(path));
String line = null;
while ((line = br.readLine()) != null) {
// split each line based on spaces
String[] words = line.split("\\s+");
//first word is name
String name = words[0];
// next word is actual number
Double value = Double.valueOf(words[1]);
System.out.println(name + ":" + value);
}
// don't forget to close the stream
br.close();
Output:
ncols:10812.0
nrows:10812.0
xllcorner:-107.0005555556
yllcorner:36.99944444444
cellsize:9.2592592593E-5
If all you want is the numeric values, split each line on whitespace and the second item will contain your numeric value. Then you can do any conversions as needed and not have to worry about the exponent.
String[] data = line.split("\\s+");
//remove any remaining whitespace from the second element
data[1] = data[1].replaceAll("\\s", "");
//parse to whatever you need data[1] to be
You could use the split function in Java as follows:
String[] dataArr = data.split("\\s+"); // assumes data holds the whole file content as one string
The dataArr, then, will look like this:
dataArr[0] = "ncols"
dataArr[1] = "10812"
dataArr[2] = "nrows"
dataArr[3] = "10812"
.
.
.
dataArr[n - 1] = "9.2592592593e-05" // n is the size of the array
You could then use Integer.parseInt(String str) for the whole-number values (ncols, nrows) and Double.parseDouble(String str) for the others.

Split string with three words

What is the best way to split a string containing three words?
My code looks like this right now (see below for updated code):
BufferedReader infile = new BufferedReader(new FileReader("file.txt"));
String line;
int i = 0;
while ((line = infile.readLine()) != null) {
String first, second, last;
//Split line into first, second and last (word)
//Do something with words (no help needed)
i++;
}
Here is the full file.txt:
Allegrettho Albert 0111-27543
Brio Britta 0113-45771
Cresendo Crister 0111-27440
Dacapo Dan 0111-90519
Dolce Dolly 0116-31418
Espressivo Eskil 0116-19042
Fortissimo Folke 0118-37547
Galanto Gunnel 0112-61805
Glissando Gloria 0112-43918
Grazioso Grace 0112-43509
Hysterico Hilding 0119-71296
Interludio Inga 0116-22709
Jubilato Johan 0111-47678
Kverulando Kajsa 0119-34995
Legato Lasse 0116-26995
Majestoso Maja 0116-80308
Marcato Maria 0113-25788
Molto Maja 0117-91490
Nontroppo Maistro 0119-12663
Obligato Osvald 0112-75541
Parlando Palle 0112-84460
Piano Pia 0111-10729
Portato Putte 0112-61412
Presto Pelle 0113-54895
Ritardando Rita 0117-20295
Staccato Stina 0112-12107
Subito Sune 0111-37574
Tempo Kalle 0114-95968
Unisono Uno 0113-16714
Virtuoso Vilhelm 0114-10931
Xelerando Axel 0113-89124
New code as @Pshemo suggested:
public String load() {
try {
Scanner scanner = new Scanner(new File("reg.txt"));
while (scanner.hasNextLine()) {
String firstname = scanner.next();
String lastname = scanner.next();
String number = scanner.next();
list.add(new Entry(firstname, lastname, number));
}
msg = "The file reg.txt has been opened";
return msg;
} catch (NumberFormatException ne) {
msg = ("Can't find reg.txt");
return msg;
} catch (IOException ie) {
msg = ("Can't find reg.txt");
return msg;
}
}
I receive multiple errors, what's wrong?
Assuming that each line always contains exactly three words, instead of split you can simply call Scanner's next() method three times for each line.
Scanner scanner = new Scanner(new File("file.txt"));
int i = 0;
while (scanner.hasNext()) { // hasNextLine() would also be true for a trailing line break, and next() would then fail
String first = scanner.next();
String second = scanner.next();
String last = scanner.next();
//System.out.println(first+": "+second+": "+last);
i++;
}
line.split("\\s+"); // don't use " ". use "\\s+" for more than one whitespace
Assuming the line has 3+ words, use the split(delimiter) method:
String line = ...;
String[] parts = line.split("\\s+"); // Assuming words are separated by whitespaces, use another if required
then you can access to the first, second and last respectively:
String first = parts[0];
String second = parts[1];
String last = parts[parts.length - 1];
Remember that indexes start at 0.
String []parts=line.split("\\s+");
System.out.println(parts[0]);
System.out.println(parts[1]);
System.out.println(parts[parts.length-1]);
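If the lines are expected to hold exactly three words, a small helper can also reject anything else instead of silently picking the wrong element. This is only a sketch; ThreeWordSplit and splitThree are made-up names.

```java
import java.util.Arrays;

public class ThreeWordSplit {

    // Splits a line into exactly three whitespace-separated words.
    static String[] splitThree(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 words: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = splitThree("Allegrettho Albert 0111-27543");
        System.out.println(Arrays.toString(parts));
        // prints [Allegrettho, Albert, 0111-27543]
    }
}
```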
