Replace a specific character in a regex match - java

How can I replace a specific character in a regex match when it occurs multiple times?
I found this post about it, which helped me a lot, but it replaces only one character.
I need it to replace all of them.
For example, from this:
href="https://Link.com/test/Release+This+Has+Some+Pluses
Link.com/test/Release+This+Has+Some+Pluses
To:
href="https://Link.com/test/Release%20This%20Has%20Some%20Pluses
Link.com/test/Release+This+Has+Some+Pluses
Here is what I have so far:
line.replaceAll("(https://Link.com/test/)(\\w+)\\+", "$1$2%20")
But, as I already mentioned, this only replaces one character and not all of them, like this:
href="https://Link.com/display/release5/Release%20This+Has+Some+Pluses
Link.com/test/Release+This+Has+Some+Pluses
How would I replace all of them?
EDIT: Here is the Java code:
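import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ExchangeLink {
    public static void main(String[] args) {
        Path path = Paths.get("C:\\Users\\Mati\\Desktop\\test.txt");
        try {
            List<String> replaced;
            // try-with-resources closes the file handle opened by Files.lines
            try (Stream<String> lines = Files.lines(path)) {
                // the regex needs href=... right before the word, so after the
                // first '+' it cannot match again: only one '+' per link is replaced
                replaced = lines
                        .map(line -> line.replaceAll("(href=\"https://link.com/test/)(\\w+)\\+", "$1$2%20"))
                        .collect(Collectors.toList());
            }
            Files.write(path, replaced);
            System.out.println("Find and Replace done!!!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}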

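Just do it in two steps: first find each whole link, then replace the pluses inside it.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Step 1: find each link together with all of its "+"-joined words
Pattern links = Pattern.compile("href=\"https://link.com/test/((\\w+)\\+?)+");
Matcher matcher = links.matcher(line);
while (matcher.find()) {
    // Step 2: replace every "+" inside the matched link only
    // (the matcher keeps scanning the original string, so reassigning line is safe)
    line = line.replace(matcher.group(), matcher.group().replace("+", "%20"));
}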
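The same two-step idea can also be done in a single pass with Matcher.appendReplacement, so each match is rewritten in place rather than searched for again with String.replace. This is a minimal sketch, not the answer's own code: the class and method names are hypothetical, it assumes the link text after test/ consists only of word characters and '+', and it matches the host case-insensitively because the question writes both Link.com and link.com.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlusInLinks {
    // Link prefix followed by word characters and '+' ('+' is literal inside [...]).
    // Assumption: the link text contains only those characters.
    private static final Pattern LINK = Pattern.compile(
            "href=\"https://Link\\.com/test/[\\w+]+", Pattern.CASE_INSENSITIVE);

    static String encodePlusesInLinks(String line) {
        Matcher m = LINK.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace '+' only inside this match; quoteReplacement keeps any
            // '$' or '\' in the link from being read as a group reference
            String fixed = m.group().replace("+", "%20");
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out); // copy the text after the last match unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodePlusesInLinks(
                "href=\"https://Link.com/test/Release+This+Has+Some+Pluses"));
        System.out.println(encodePlusesInLinks(
                "Link.com/test/Release+This+Has+Some+Pluses"));
    }
}
```

The first line comes back with every '+' turned into %20, while the plain Link.com/test/ text (no href prefix) is left untouched, which is the behaviour the question asks for.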

Related

Finding the character count between two special symbols

I am trying to find the number of characters between = and the \n newline character using the Java code below, but \n is not being recognized in my case.
I am using the org.apache.commons.lang3.StringUtils package.
Here is my Java code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.commons.lang3.StringUtils;

public class CharCountInLine {
    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader("C:\\wordcount\\sample.txt"))) {
            String currentLine = reader.readLine();
            while (currentLine != null) {
                String res = StringUtils.substringBetween(currentLine, "=", "\n"); // \n is not working.
                if (res != null) {
                    System.out.println("line -->" + res.length());
                }
                currentLine = reader.readLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Here is my sample text file, sample.txt:
Karthikeyan=123456
sathis= 23546
Arun = 23564
Well, you're reading the string using readLine(), which according to the Javadoc (emphasis mine):
Returns:
A String containing the contents of the line, not including
any line-termination characters, or null if the end of the stream has
been reached
So your code doesn't work because the string does not contain a newline character.
You can address this in a number of ways:
Use StringUtils.substringAfter() instead of StringUtils.substringBetween().
If it meets the requirements, treat your file as a Java properties file so you don't need to parse it yourself.
Use String.split().
Use String.lastIndexOf().
Use simple regex matching and grouping.
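Three of the options above can be sketched without Apache Commons: lastIndexOf, split, and loading the text as a properties file. This is an illustrative sketch only; the class and method names are hypothetical, and it works on an in-memory copy of sample.txt instead of reading the file. StringUtils.substringAfter(line, "=") would behave like the split variant here but is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class CharsAfterEquals {
    // lastIndexOf: number of characters after the last '=' (-1 if there is none)
    static int viaLastIndexOf(String line) {
        int idx = line.lastIndexOf('=');
        return idx < 0 ? -1 : line.length() - idx - 1;
    }

    // split with limit 2: number of characters after the first '=' (-1 if there is none)
    static int viaSplit(String line) {
        String[] parts = line.split("=", 2);
        return parts.length < 2 ? -1 : parts[1].length();
    }

    // Properties: parses key=value lines; whitespace around '=' is not part of the value
    static Properties viaProperties(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String text = "Karthikeyan=123456\nsathis= 23546\nArun = 23564\n";
        for (String line : text.split("\n")) {
            System.out.println(line + " -> " + viaLastIndexOf(line) + ", " + viaSplit(line));
        }
        Properties props = viaProperties(text);
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " -> " + props.getProperty(key).length());
        }
    }
}
```

Note the difference in results: the string-based variants count the space in " 23546" (6 characters), while Properties strips it and reports 5.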
You don't need to change how you read the lines; simply change your logic to extract the text after =.
Pattern p = Pattern.compile("(?:.+)=(.+)$");
Matcher m = p.matcher("Karthikeyan=123456");
if (m.find()) {
System.out.println(m.group(1).length());
}
There's no need for Apache StringUtils either; a simple Java regex will do. If you don't want to count whitespace, trim the string before calling length().
Alternatively, you can also split the line around = as discussed here.
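Simpler code for reading the file:
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
Path p = Paths.get("C:\\wordcount\\sample.txt");
// Files.lines reads lazily; try-with-resources closes the file afterwards
try (Stream<String> lines = Files.lines(p)) {
    lines.forEach(line -> {
        // Put the above code here
    });
}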
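Putting the regex answer and the trimming advice together, a runnable sketch might look like the following. It reuses the answer's pattern (?:.+)=(.+)$; the class name, the valueLength helper and its -1 return value are hypothetical, and an in-memory list stands in for the lines of sample.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValueLength {
    // The answer's pattern: group(1) is everything after the last '='
    private static final Pattern P = Pattern.compile("(?:.+)=(.+)$");

    // Length of the value, optionally ignoring surrounding whitespace;
    // -1 when the line does not have a key=value shape
    static int valueLength(String line, boolean trim) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return -1;
        }
        String value = trim ? m.group(1).trim() : m.group(1);
        return value.length();
    }

    public static void main(String[] args) {
        // Stand-in for Files.lines(...), so the sketch runs without sample.txt
        List<String> lines = Arrays.asList("Karthikeyan=123456", "sathis= 23546", "Arun = 23564");
        for (String line : lines) {
            System.out.println(line + " -> " + valueLength(line, false)
                    + " (trimmed: " + valueLength(line, true) + ")");
        }
    }
}
```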

Regex for replacing Exact String match [duplicate]

My input:
1. end
2. end of the day or end of the week
3. endline
4. something
5. "something" end
Based on the discussions above, if I replace a single string using this snippet, it removes the appropriate words from each line successfully:
import java.io.*;

public class DeleteTest {
    public static void main(String[] args) {
        try {
            File file = new File("C:/Java samples/myfile.txt");
            File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
            String delete = "end";
            // try-with-resources closes both streams even if an exception occurs
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
                 PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)))) {
                for (String line; (line = reader.readLine()) != null;) {
                    // \b word boundaries: "end" is removed, "endline" is left alone
                    line = line.replaceAll("\\b" + delete + "\\b", "");
                    writer.println(line);
                }
            }
        }
        catch (Exception e) {
            System.out.println("Something went Wrong");
        }
    }
}
My output with the above snippet (which is also my expected output):
1.
2. of the day or of the week
3. endline
4. something
5. "something"
But when I want to delete more words and store them in a Set, I use the code snippet below:
public static void main(String[] args) {
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
Set<String> toDelete = new HashSet<>();
toDelete.add("end");
toDelete.add("something");
for (String line; (line = reader.readLine()) != null;) {
// toDelete.toString() is inserted here, giving a pattern like \b[end, something]\b
line = line.replaceAll("\\b"+toDelete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
I get my output as: (It just removes the space)
1. end
2. endofthedayorendoftheweek
3. endline
4. something
5. "something" end
Can you help me with this?
You need to create an alternation group out of the set with
String.join("|", toDelete)
and use it as
line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");
The pattern will look like
\b(?:end|something)\b
See the regex demo. Here, (?:...) is a non-capturing group that is used to group several alternatives without creating a memory buffer for the capture (you do not need it since you remove the matches).
Or, better, compile the regex before entering the loop:
Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
line = pat.matcher(line).replaceAll("");
UPDATE:
To allow matching whole "words" that may contain special chars, you need to Pattern.quote those words to escape those special chars, and then you need to use unambiguous word boundaries, (?<!\w) instead of the initial \b to make sure there is no word char before and (?!\w) negative lookahead instead of the final \b to make sure there is no word char after the match.
In Java 8, you may use this code:
Set<String> nToDel = toDelete.stream()
        .map(Pattern::quote)
        .collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";
For words such as +end and something-, the regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
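Combining the two steps of this answer (quote each word, join them into one alternation, compile once before the loop), a self-contained sketch could look like this. The class and method names are hypothetical, the five sample lines from the question are used in place of myfile.txt, and a LinkedHashSet is used only to keep the word order predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DeleteWords {
    // Builds (?<!\w)(?:\Qword1\E|\Qword2\E...)(?!\w) from the words to delete
    static Pattern wordPattern(Set<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)              // escape any special characters
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("end", "something"));
        Pattern pat = wordPattern(toDelete); // compiled once, outside the loop
        String[] input = {
            "1. end",
            "2. end of the day or end of the week",
            "3. endline",
            "4. something",
            "5. \"something\" end"
        };
        for (String line : input) {
            System.out.println(pat.matcher(line).replaceAll(""));
        }
    }
}
```

With both words in the set, "something" is also removed from lines 4 and 5, while "endline" survives because the lookarounds require no word character on either side of the match.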
The problem is that you're not creating the correct regex for replacing the words in the set.
"\\b"+toDelete+"\\b" will produce this String \b[end, something]\b which is not what you need.
To fix that you can do something like this:
for(String del : toDelete){
line = line.replaceAll("\\b"+del+"\\b", "");
}
This loops through the set, builds a regex from each word, and removes that word from the line string.
Another approach would be to build a single regex from all the words in the set.
Eg:
String regex = "";
for(String word : toDelete){
regex+=(regex.isEmpty() ? "" : "|") + "(\\b"+word+"\\b)";
}
....
// replaceAll, not replace: String.replace treats its argument as literal text, not a regex
line = line.replaceAll(regex, "");
This should produce a regex that looks something like this: (\bend\b)|(\bsomething\b)

How to separate strings in a text file into different arrays (Java)

I have a text file that consists of strings. I want to separate the strings containing "[ham]" and the strings containing "[spam]" into different arrays. How can I do that? I am thinking of using a regex to recognize the patterns (ham and spam), but I have no idea where to start. Please help me.
Strings in the text file:
good [ham]
very good [ham]
bad [spam]
very bad [spam]
very bad, very bad [spam]
And I want the output to be like this:
Ham array:
good
very good
Spam array:
bad
very bad
very bad, very bad
Help me please.
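Instead of arrays, I think you should use ArrayList:
import java.util.ArrayList;
import java.util.List;

List<String> ham = new ArrayList<String>();
List<String> spam = new ArrayList<String>();
// inside the loop that reads each line:
if (line.contains("[ham]"))
    ham.add(line.substring(0, line.indexOf("[ham]")).trim()); // trim() drops the space before the label
if (line.contains("[spam]"))
    spam.add(line.substring(0, line.indexOf("[spam]")).trim());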
If you really need to do it that way (with a regex and arrays as output), you can write code like this:
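import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringResolve {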
public static void main(String[] args) {
try {
// read data from some source
URL exampleTxt = StringResolve.class.getClassLoader().getResource("me/markoutte/sandbox/_25989334/example.txt");
Path path = Paths.get(exampleTxt.toURI());
List<String> strings = Files.readAllLines(path, Charset.forName("UTF8"));
// init all my patterns & arrays
Pattern ham = getPatternFor("ham");
List<String> hams = new LinkedList<>();
Pattern spam = getPatternFor("spam");
List<String> spams = new LinkedList<>();
// check all of them
for (String string : strings) {
Matcher hamMatcher = ham.matcher(string);
if (hamMatcher.matches()) {
// we choose only text without label here
hams.add(hamMatcher.group(1));
}
Matcher spamMatcher = spam.matcher(string);
if (spamMatcher.matches()) {
// we choose only text without label here
spams.add(spamMatcher.group(1));
}
}
// output data through arrays
String[] hamArray = hams.toArray(new String[hams.size()]);
System.out.println("Ham array");
for (String s : hamArray) {
System.out.println(s);
}
System.out.println();
String[] spamArray = spams.toArray(new String[spams.size()]);
System.out.println("Spam array");
for (String s : spamArray) {
System.out.println(s);
}
} catch (URISyntaxException | IOException e) {
e.printStackTrace();
}
}
private static Pattern getPatternFor(String label) {
// Regex pattern for string with same kind: some text [label]
return Pattern.compile(String.format("(.+?)\\s(\\[%s\\])", label));
}
}
You can use Paths.get("some/path/to/file") if you need to read the file from somewhere on your drive.
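As a variation on the second answer, a single pattern with a capture group for the label can classify every line in one pass, instead of compiling one pattern per label. This is a sketch under assumptions: the class and split method are hypothetical, each line is assumed to end with exactly one [ham] or [spam] label, and an in-memory list replaces Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HamSpamSplit {
    // group(1) = text before the label, group(2) = "ham" or "spam"
    private static final Pattern LABELED = Pattern.compile("(.*?)\\s*\\[(ham|spam)\\]\\s*");

    // Returns {hamArray, spamArray}; lines without a label are skipped
    static String[][] split(List<String> lines) {
        List<String> ham = new ArrayList<>();
        List<String> spam = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LABELED.matcher(line);
            if (m.matches()) {
                // route the unlabeled text to the list named by group(2)
                if (m.group(2).equals("ham")) {
                    ham.add(m.group(1));
                } else {
                    spam.add(m.group(1));
                }
            }
        }
        return new String[][] {ham.toArray(new String[0]), spam.toArray(new String[0])};
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllLines(path)
        List<String> lines = Arrays.asList(
                "good [ham]", "very good [ham]", "bad [spam]",
                "very bad [spam]", "very bad, very bad [spam]");
        String[][] result = split(lines);
        System.out.println("Ham array: " + Arrays.toString(result[0]));
        System.out.println("Spam array: " + Arrays.toString(result[1]));
    }
}
```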

Can't match SRT subtitles using regex in Java

I am trying to parse an SRT subtitle file with this code:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchArray {
    public static void main(String args[]) {
        File file = new File(
                "C:/Users/Thiago/workspace/SubRegex/src/Dirty Harry VOST - Clint Eastwood.srt");
        try {
            Scanner in = new Scanner(file);
            try {
                String contents = in.nextLine();
                while (in.hasNextLine()) {
                    contents = contents + "\n" + in.nextLine();
                }
                String pattern = "([\\d]+)\r([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})\r(([^|\r]+(\r|$))+)";
                Pattern r = Pattern.compile(pattern);
                // Now create matcher object.
                Matcher m = r.matcher(contents);
                ArrayList<String> start = new ArrayList<String>();
                while (m.find()) {
                    start.add(m.group(1));
                    start.add(m.group(2));
                    start.add(m.group(3));
                    start.add(m.group(4));
                    start.add(m.group(5));
                    start.add(m.group(6));
                    start.add(m.group(7));
                    System.out.println(start);
                }
            } finally {
                in.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
But when I execute it, it doesn't capture any group. When I try to capture only the time with this pattern:
([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})
it works. So how do I make it capture the entire subtitle?
I can't quite understand your need, but I thought this might help.
Please try this regex:
(\\d+?)\\s*(\\d+?:\\d+?:\\d+?,\\d+?)\\s+-->\\s+(\\d+?:\\d+?:\\d+?,\\d+?)\\s+(.+)
I tried it on http://www.myregextester.com/index.php and it worked.
I hope this helps.
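One reason the full pattern in the question never matches: Scanner.nextLine() strips line terminators and the loop rejoins the lines with "\n", while the pattern expects "\r" between lines. The time-only pattern contains no "\r", which is why it works. The \s in the answer's regex matches either kind of line break, but its final (.+) captures only the first line of subtitle text, since . does not match line breaks. The sketch below is one possible variant, not the answer's code: it uses \R (any line break, Java 8+) and captures text lines up to the next blank line. The class name, parse method and sample subtitles are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrtParse {
    // \R matches \n, \r\n, \r, ... so Windows and Unix line endings both work
    private static final Pattern ENTRY = Pattern.compile(
            "(\\d+)\\R"                                     // 1: sequence number
            + "(\\d{2}:\\d{2}:\\d{2}),(\\d{3})\\s*-->\\s*"  // 2, 3: start time and ms
            + "(\\d{2}:\\d{2}:\\d{2}),(\\d{3})\\R"          // 4, 5: end time and ms
            + "((?:.+\\R)*.+)");                            // 6: text lines up to a blank line

    // Each entry holds groups 1..6 of one subtitle
    static List<String[]> parse(String contents) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(contents);
        while (m.find()) {
            entries.add(new String[] {
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), m.group(6)
            });
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample; a real file could be read with its line breaks kept, e.g.
        // new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
        String srt = "1\r\n00:00:01,000 --> 00:00:03,500\r\nFirst line\r\nSecond line\r\n\r\n"
                + "2\r\n00:00:04,000 --> 00:00:06,000\r\nAnother subtitle\r\n";
        for (String[] e : parse(srt)) {
            System.out.println(String.join(" | ", e));
        }
    }
}
```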

Parse a string line by opening a file using Regex

Below is the text file (log.txt) I am opening; I need to match each line using regular expressions.
Jerty|gas|petrol|2.42
Tree|planet|cigar|19.00
Karie|entertainment|grocery|9.20
So I wrote this regular expression, but it does not match the fields the way I expect.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// (the field and method below go inside a class)
public static String pattern = "(.*?)|(.*?)|(.*?)|(.*?)";
public static void main(String[] args) {
    File file = new File("C:\\log.txt");
    Pattern regex = Pattern.compile(pattern);
    // try-with-resources closes the Scanner when done
    try (Scanner scanner = new Scanner(file)) {
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            Matcher m = regex.matcher(line);
            if (m.matches()) {
                System.out.println(m.group(1));
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
Any suggestions will be appreciated.
The | is a special regex symbol that means 'or', so you have to escape it:
public static String pattern = "(.*?)\\|(.*?)\\|(.*?)\\|(.*?)";
You can avoid the regex entirely here. Since the data appears to be pipe-separated, you should just split on the pipe character. You'll end up with an array of fields, which you can further parse as needed:
String[] fields = line.split("\\|");
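Both answers can be checked side by side on the sample log lines. This is a minimal sketch with hypothetical names; it swaps the first answer's (.*?) groups for ([^|]*) so each group visibly stops at the next pipe, and it assumes (the question does not say) that the fourth field is a decimal amount, parsed with BigDecimal to avoid floating-point rounding.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeFields {
    // Each \\| is a literal '|'; [^|]* matches everything up to the next pipe
    private static final Pattern FIELDS =
            Pattern.compile("([^|]*)\\|([^|]*)\\|([^|]*)\\|([^|]*)");

    // Returns the four fields, or null if the line does not have exactly four
    static String[] byRegex(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    // split takes a regex too, so the pipe must be escaped here as well
    static String[] bySplit(String line) {
        return line.split("\\|");
    }

    public static void main(String[] args) {
        String[] log = {
            "Jerty|gas|petrol|2.42",
            "Tree|planet|cigar|19.00",
            "Karie|entertainment|grocery|9.20"
        };
        for (String line : log) {
            String[] f = bySplit(line);
            // Assumption: the last field is a decimal amount
            BigDecimal amount = new BigDecimal(f[3]);
            System.out.println(f[0] + " -> " + amount);
        }
    }
}
```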
