I am trying to convert a text document to shorthand without using any of the replace() methods in Java. One of the conversions is "the" to "&". The problem is that I do not know where in each word the "the" substring occurs. So how do I replace that part of a string without using the replace() method?
Ex: "their" would become "&ir", "together" would become "toge&r"
This is what I have started with,
String the = "the";
Scanner wordScanner = new Scanner(word);
if (wordScanner.contains(the)) {
the = "&";
}
I am just not sure how to go about the replacement.
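You could try this:
String word = "your string with the";
// StringUtils is assumed to be the Apache Commons Lang class.
// The -1 limit keeps trailing empty strings, so a "the" at the very end is replaced too.
word = StringUtils.join(word.split("the", -1), "&");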
Scanner wordScanner = new Scanner(word);
I do not get your usage of Scanner for this, but you can read each character into a buffer (StringBuilder) until you read "the" into the buffer. Once you've done that, you can delete the word and then append the word you want to replace with.
public static void main(String[] args) throws Exception {
String data = "their together the them forever";
String wordToReplace = "the";
String wordToReplaceWith = "&";
Scanner wordScanner = new Scanner(data);
// Using this delimiter to get one character at a time from the scanner
wordScanner.useDelimiter("");
StringBuilder buffer = new StringBuilder();
while (wordScanner.hasNext()) {
buffer.append(wordScanner.next());
// Check if the word you want to replace is in the buffer
int wordToReplaceIndex = buffer.indexOf(wordToReplace);
if (wordToReplaceIndex > -1) {
// Delete the word you don't want in the buffer
buffer.delete(wordToReplaceIndex, wordToReplaceIndex + wordToReplace.length());
// Append the word to replace the deleted word with
buffer.append(wordToReplaceWith);
}
}
// Output results
System.out.println(buffer);
}
Results:
&ir toge&r & &m forever
This can also be done without a Scanner, using just a while loop and a StringBuilder:
public static void main(String[] args) throws Exception {
String data = "their together the them forever";
StringBuilder buffer = new StringBuilder(data);
String wordToReplace = "the";
String wordToReplaceWith = "&";
int wordToReplaceIndex = -1;
while ((wordToReplaceIndex = buffer.indexOf(wordToReplace)) > -1) {
buffer.delete(wordToReplaceIndex, wordToReplaceIndex + wordToReplace.length());
buffer.insert(wordToReplaceIndex, wordToReplaceWith);
}
System.out.println(buffer);
}
Results:
&ir toge&r & &m forever
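The same idea can be written as a reusable method that scans forward with indexOf and copies the pieces into a StringBuilder. This is only a sketch: the class name ShorthandSketch and the method substitute are hypothetical names chosen for illustration, and no replace() method is called anywhere.

```java
class ShorthandSketch {
    // Hypothetical helper: swaps every occurrence of target for replacement
    // using only indexOf and StringBuilder.append.
    static String substitute(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text; // an empty search string would never advance the scan
        }
        StringBuilder out = new StringBuilder();
        int from = 0; // start of the part of text not yet copied
        int hit;
        while ((hit = text.indexOf(target, from)) >= 0) {
            out.append(text, from, hit);  // copy everything before the match
            out.append(replacement);      // write the replacement instead of the match
            from = hit + target.length(); // continue after the match
        }
        out.append(text, from, text.length()); // copy what is left after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("their together the them forever", "the", "&"));
    }
}
```

Because the scan only moves forward through the original text, a replacement that itself contains the search word (for example "the" to "the!") cannot cause an endless loop, which the delete/insert loop on a single buffer would do.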
You can use the regex classes Pattern and Matcher:
// No trailing space in the pattern, so "the" inside words such as "their" matches as well
Pattern pattern = Pattern.compile("the");
Matcher matcher = pattern.matcher("the cat and their owners");
StringBuffer sb = new StringBuffer();
while(matcher.find()){
matcher.appendReplacement(sb, "&");
}
matcher.appendTail(sb);
System.out.println(sb.toString());
Related
I am trying to write a method that accepts a string to search for and a string to replace every instance of it with, and that returns the number of replacements made. I am trying to use Pattern and Matcher from Java regex. I have a text file called "text.txt" which contains "this is a test this is a test this is a test". When I search for "test" and replace it with "mess", the method returns 1 each time and none of the occurrences of "test" are replaced.
public int findAndRepV2(String word, String replace) throws FileNotFoundException, IOException
{
int cnt = 0;
BufferedReader input = new BufferedReader( new FileReader(this.filename));
Writer fw = new FileWriter("test.txt");
String line = input.readLine();
while (line != null)
{
Pattern pattern = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {matcher.replaceAll(replace); cnt++;}
line = input.readLine();
}
fw.close();
return cnt;
}
First, you need to ensure that the text you are searching for is not interpreted as a regex. You should do:
Pattern pattern = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
Second, replaceAll does something like this:
public String replaceAll(String replacement) {
reset();
boolean result = find();
if (result) {
StringBuffer sb = new StringBuffer();
do {
appendReplacement(sb, replacement);
result = find();
} while (result);
appendTail(sb);
return sb.toString();
}
return text.toString();
}
Note how it calls find until it can't find anything. This means that your loop will only be run once, since after the first call to replaceAll, the matcher has already found everything.
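A small sketch that makes this visible by counting how often the loop body runs. The class and method names (ReplaceAllLoopDemo, loopIterations) are made up for this illustration; only Pattern and Matcher calls from java.util.regex are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllLoopDemo {
    // Hypothetical helper: repeats the loop shape from the question and
    // reports how many times the loop body was entered.
    static int loopIterations(String input, String word, String replacement) {
        Matcher matcher = Pattern.compile(Pattern.quote(word)).matcher(input);
        int iterations = 0;
        while (matcher.find()) {
            // replaceAll resets the matcher, consumes every match itself
            // and returns a new String, which is discarded here
            matcher.replaceAll(replacement);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        String line = "this is a test this is a test this is a test";
        System.out.println(loopIterations(line, "test", "mess"));
    }
}
```

The count stays at one per line even though the line contains three matches, and the replaced text is lost because the return value of replaceAll is never stored.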
You should use appendReplacement instead:
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
cnt++;
}
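// appendTail copies the text after the last match; calling matcher.end() here
// would throw IllegalStateException because the final find() did not match
matcher.appendTail(buffer);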
// "buffer" contains the string after the replacement
I noticed that in your method, you didn't actually do anything with the string after the replacement. If that's the case, just count how many times find returns true:
while (matcher.find()) {
cnt++;
}
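Putting the pieces together for a single line might look like the sketch below. It assumes the search word and the replacement are both meant literally, so it adds Matcher.quoteReplacement (which the answer above does not mention) to keep "$" and "\" in the replacement from being treated specially. CountingReplacer and replaceAndCount are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountingReplacer {
    // Hypothetical helper: appends the rewritten line to out and
    // returns how many replacements were made.
    static int replaceAndCount(String line, String word, String replacement, StringBuffer out) {
        Pattern pattern = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(line);
        String literal = Matcher.quoteReplacement(replacement); // treat the replacement as plain text
        int count = 0;
        while (matcher.find()) {
            matcher.appendReplacement(out, literal); // text before the match plus the replacement
            count++;
        }
        matcher.appendTail(out); // remaining text after the last match
        return count;
    }

    public static void main(String[] args) {
        StringBuffer out = new StringBuffer();
        int count = replaceAndCount("this is a test this is a test this is a test", "test", "mess", out);
        System.out.println(count + ": " + out);
    }
}
```

In the file-based method from the question, each rewritten line would still have to be written to the FileWriter; that part is left out here.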
I want to find a special charsequence in a file and I want to read the whole line where the occurrences are.
The following code only checks the first line and fetches that (first) line.
How can I fix it?
Scanner scanner = new Scanner(file);
String output = "";
output = output + scanner.findInLine(pattern) + scanner.next();
pattern and file are parameters
UPDATED ANSWER according to the comments on this very answer
What is actually used is Scanner#findWithHorizon, which calls the Pattern#compile method with a set of flags (Pattern#compile(String, int)).
The result seems to be that the pattern is applied repeatedly to the input, line by line; this assumes, of course, that a pattern cannot match across multiple lines at once.
Therefore:
public static final String findInFile(final Path file, final String pattern,
final int flags)
throws IOException
{
final StringBuilder sb = new StringBuilder();
final Pattern p = Pattern.compile(pattern, flags);
String line;
Matcher m;
try (
final BufferedReader br = Files.newBufferedReader(file);
) {
while ((line = br.readLine()) != null) {
m = p.matcher(line);
while (m.find())
sb.append(m.group());
}
}
return sb.toString();
}
For completeness I should add that I have developed some time ago a package which allows a text file of arbitrary length to be read as a CharSequence and which can be used to great effect here: https://github.com/fge/largetext. It would work beautifully here since a Matcher matches against a CharSequence, not a String. But this package needs some love.
One example returning a List of matching strings in a file can be:
private static List<String> findLines(final Path path, final String pattern)
throws IOException
{
final Predicate<String> predicate = Pattern.compile(pattern).asPredicate();
try (
final Stream<String> stream = Files.lines(path);
) {
return stream.filter(predicate).collect(Collectors.toList());
}
}
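For comparison, a version that stays close to the Scanner from the question could read whole lines and keep the ones in which the pattern is found. This is a sketch only; LineFinder and linesContaining are hypothetical names, and the Scanner may wrap a File just as well as the String used in main.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class LineFinder {
    // Hypothetical helper: returns every complete line in which the regex is found.
    static List<String> linesContaining(Scanner scanner, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();  // always consume the whole line
            if (pattern.matcher(line).find()) { // find() looks anywhere inside the line
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner("alpha\nbeta key\ngamma\nkey delta");
        System.out.println(linesContaining(scanner, "key"));
    }
}
```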
I'm doing a million different regex replacements on a string, so I decided to save all the regexes and their replacements in a file. I tried reading the file line by line and applying each replacement, but it is not working.
replace_regex_file.txt
aaa zzz
^cc eee
ww$ sss
...
...
...
...
a million data
Coding
String user_input = "assume 100,000 words"; // input from user
String regex_file = "replace_regex_file.txt";
String result="";
String line;
try (BufferedReader reader = new BufferedReader(new FileReader(regex_file))) {
while ((line = reader.readLine()) != null) { // while line not equal null
String[] parts = line.split("\\s+", 2); //split process
if (parts.length >=2) {
String regex = parts[0]; // String regex stored in first array
String replace = parts[1]; // String replacement stored in second array
result = user_input.replaceAll(regex, replace); // replace processing
}
}
}
System.out.println(result); // show the result
But it does not replace anything. How can I fix this?
Your current code will only apply the last matching regex, because every iteration starts again from user_input instead of from the result of the previous replacement:
result = user_input.replaceAll(regex, replace);
Instead, initialize result before the loop with
String result = user_input;
and inside the loop use
result = result.replaceAll(regex, replace);
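A compact sketch of the corrected loop, with the rule lines passed in as a list so that it can be tried without a file. RuleApplier and applyRules are hypothetical names; the "regex, whitespace, replacement" line format is taken from the question.

```java
import java.util.Arrays;
import java.util.List;

class RuleApplier {
    // Hypothetical helper: applies each "regex replacement" line in order,
    // always continuing from the result of the previous rule.
    static String applyRules(List<String> ruleLines, String input) {
        String result = input; // start from the user input, not from ""
        for (String line : ruleLines) {
            String[] parts = line.split("\\s+", 2); // regex, then replacement
            if (parts.length >= 2) {
                result = result.replaceAll(parts[0], parts[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("aaa zzz", "^cc eee", "ww$ sss");
        System.out.println(applyRules(rules, "ccaaaww"));
    }
}
```

With the three sample rules from the question, "ccaaaww" becomes "cczzzww", then "eeezzzww", then "eeezzzsss", which shows that every rule contributes to the final string.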
I was trying to tokenize an input file, splitting sentences into tokens (words).
For example,
"This is a test file." into five words "this" "is" "a" "test" "file", omitting punctuation and white space, and storing them in an ArrayList.
I tried to write some code like this:
public static ArrayList<String> tokenizeFile(File in) throws IOException {
String strLine;
String[] tokens;
//create a new ArrayList to store tokens
ArrayList<String> tokenList = new ArrayList<String>();
if (null == in) {
return tokenList;
} else {
FileInputStream fStream = new FileInputStream(in);
DataInputStream dataIn = new DataInputStream(fStream);
BufferedReader br = new BufferedReader(new InputStreamReader(dataIn));
while (null != (strLine = br.readLine())) {
if (strLine.trim().length() != 0) {
//make sure strings are independent of capitalization and then tokenize them
strLine = strLine.toLowerCase();
//create regular expression pattern to split
//first letter to be alphabetic and the remaining characters to be alphanumeric or '
String pattern = "^[A-Za-z][A-Za-z0-9'-]*$";
tokens = strLine.split(pattern);
int tokenLen = tokens.length;
for (int i = 1; i <= tokenLen; i++) {
tokenList.add(tokens[i - 1]);
}
}
}
br.close();
dataIn.close();
}
return tokenList;
}
This code works, except that instead of splitting the whole file into words (tokens), it turns each whole line into a single token. "area area" becomes one token instead of "area" appearing twice. I don't see the error in my code. Maybe something is wrong with my trim()?
Any advice is appreciated. Thank you so much.
Maybe I should use Scanner instead? I'm confused.
I think Scanner is more appropriate for this task. As for this code, you should fix the regex; try "\\s+".
Try String pattern = "[^\\w]+"; in the same code (the + keeps a run of punctuation and spaces from producing empty tokens).
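The pattern in the question is anchored with ^ and $, so it can only match an entire line; used as a split delimiter it never matches a line that contains a space, and split returns the whole line as one token. An alternative to splitting is to match the tokens themselves. The sketch below reuses the question's token definition (a letter followed by letters, digits, ' or -) without the anchors; Tokenizer and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    // The token shape from the question, without the ^ and $ anchors
    private static final Pattern TOKEN = Pattern.compile("[a-z][a-z0-9'-]*");

    // Hypothetical helper: lower-cases the line and collects every token found in it.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(line.toLowerCase());
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a test file."));
        System.out.println(tokenize("area area"));
    }
}
```

Matching tokens instead of delimiters also avoids the empty strings that split can return when a line starts with punctuation.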
I'm trying to take a file that stores data of this form:
Name=”Biscuit”
LatinName=”Retrieverus Aurum”
ImageFilename=”Biscuit.png”
DNA=”ITAYATYITITIAAYI”
and read it with a regex to locate the useful information; namely, the fields and their contents.
I have created the regex already, but I can only seem to get one match at any given time, and would like instead to put each of the matches from each line in the file in their own index of a string.
Here's what I have so far:
Scanner scanFile = new Scanner(file);
while (scanFile.hasNextLine()){
System.out.println(scanFile.findInLine(".*"));
scanFile.nextLine();
}
MatchResult result = null;
scanFile.findInLine(Constants.ANIMAL_INFO_REGEX);
result = scanFile.match();
for (int i=1; i<=result.groupCount(); i++){
System.out.println(result.group(i));
System.out.println(result.groupCount());
}
scanFile.close();
MySpecies species = new MySpecies(null, null, null, null);
return species;
Thanks so much for your help!
I hope I understand your question correctly... Here is an example from the Oracle website:
/*
* This code writes "one dog, two dogs in the yard"
* to the standard-output stream:
*/
import java.util.regex.*;
public class Replacement {
public static void main(String[] args)
throws Exception {
// Create a pattern to match cat
Pattern p = Pattern.compile("cat");
// Create a matcher with an input string
Matcher m = p.matcher("one cat," +
" two cats in the yard");
StringBuffer sb = new StringBuffer();
boolean result = m.find();
// Loop through and create a new String
// with the replacements
while(result) {
m.appendReplacement(sb, "dog");
result = m.find();
}
// Add the last segment of input to
// the new String
m.appendTail(sb);
System.out.println(sb.toString());
}
}
Hope this helps...
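For the field file in the question, one match per line with two capture groups may be enough. The sketch below is an assumption-heavy illustration: Constants.ANIMAL_INFO_REGEX is not shown in the question, so the regex here is a guess at the format (a word, "=", then a value wrapped in straight or curly double quotes), and FieldParser and parseFields are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldParser {
    // Assumed format: Name="value", where the quotes may be straight (")
    // or curly (\u201C, \u201D) as in the sample data
    private static final Pattern FIELD =
            Pattern.compile("(\\w+)=[\"\u201C\u201D](.*?)[\"\u201C\u201D]");

    // Hypothetical helper: maps each field name to its content, in file order.
    static Map<String, String> parseFields(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher matcher = FIELD.matcher(line);
            if (matcher.find()) {
                fields.put(matcher.group(1), matcher.group(2)); // group 1: name, group 2: content
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Name=\u201DBiscuit\u201D",
                "LatinName=\u201DRetrieverus Aurum\u201D");
        System.out.println(parseFields(lines));
    }
}
```

The lines could come from scanFile.nextLine() inside the existing while loop; the resulting map then holds one entry per field instead of a single match.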