In Java, I need to find all occurrences of a String inside another String.
E.g.:
String myString;
myString = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> "
+ "More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/> More_XML_Stuff"; // etc.
So I need to be able to grab the contents of all the road names. In the first example, I need to be able to set a String to "Example Road".
Pseudo-Code:
String streets = "";
while(more occurrences of street names exist)
{
streets = streets + "," + (street.occurrence of street name);
}
In the above example, the string would have the contents "Example Road, Another name".
You could use a String[] parseValue(String) function like this:
public static String[] parseValue(String in) {
String openTag = "<tag k=\"name\" v=\"";
int p1 = in.indexOf(openTag);
java.util.List<String> al = new java.util.ArrayList<String>();
while (p1 > -1) {
int p2 = in.indexOf("\"/>", p1 + 1);
if (p2 > -1) {
al.add(in.substring(p1 + openTag.length(), p2));
} else {
break;
}
p1 = in.indexOf(openTag, p2 + 1);
}
String[] out = new String[al.size()];
return al.toArray(out);
}
public static void main(String[] args) {
String myString = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> "
+ "More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/>";
System.out.println(java.util.Arrays
.toString(parseValue(myString)));
}
which outputs
[Example Road, Another name]
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Welcome {
    /**
     * @param args command-line arguments (unused)
     */
    public static void main(String[] args) {
        String text = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/> More_XML_Stuff";
        String roadString = getRoadString(text);
        System.out.println(roadString);
    }
    static String getRoadString(String text)
    {
        Pattern pattern = Pattern.compile("<tag\\s+k=\"name\"\\s+v=\"(.*?)\"/>");
        Matcher matcher = pattern.matcher(text);
        // using Matcher find() and group() to collect every captured road name
        List<String> roads = new ArrayList<String>();
        while (matcher.find()) {
            roads.add(matcher.group(1));
        }
        // joining avoids the trailing comma, and substring() failing when nothing matched
        return String.join(", ", roads);
    }
}
Since you have a fixed XML format, you can use a regular expression match to find the road names mentioned.
Pattern roadNamePattern = Pattern.compile("v=\"(.*?)\"/>");
Matcher matcher = roadNamePattern.matcher(xmlString); // xmlString is your input
while (matcher.find()) {
String roadName = matcher.group(1); // e.g. "Example Road"
// ... collect or process roadName here
}
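To turn the loop above into the comma-separated result the question asks for, one option is to collect the matches into a list and join them. A minimal sketch, assuming the same v="..."/> layout as the sample; RoadNameExtractor and roadNames are illustrative names, not part of any library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoadNameExtractor {
    // Assumes every road name appears as v="..." directly before "/>", as in the sample input.
    private static final Pattern ROAD_NAME = Pattern.compile("v=\"(.*?)\"/>");

    // Collects every captured road name, in order of appearance.
    static List<String> roadNames(String xmlString) {
        List<String> names = new ArrayList<>();
        Matcher matcher = ROAD_NAME.matcher(xmlString);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> "
                + "More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/> More_XML_Stuff";
        // String.join puts the separator only between elements
        System.out.println(String.join(", ", roadNames(xml))); // prints: Example Road, Another name
    }
}
```

String.join (Java 8+) places ", " only between elements, so there is no stray leading or trailing comma to trim afterwards.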
Answering the following question using Python:
Inside a string, find each run of consecutive identical characters together with its length, e.g.:
Input: 'aababbdf'
Output: [('a', 2), ('b', 1), ('a', 1), ('b', 2), ('d', 1), ('f', 1)]
s = 'aababbdf'  # the input string
i = j = total = 0
data = []
for i in range(0, len(s)):
    count = 0
    i = j  # start where the previous run ended
    if total != len(s):
        # count how many times s[i] repeats from position i
        for j in range(i, len(s)):
            if s[i] == s[j]:
                count = count + 1
                total = total + 1
                char = s[i]
            else:
                break
        data.append((char, count))
print("\n data", data)
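For comparison with the rest of this page, here is roughly the same run counting in Java, as a single pass that advances an index past each run instead of reassigning the loop variable. A sketch only; CharRuns is a hypothetical name, and using Map.Entry to hold a (character, count) pair is just one choice of representation:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CharRuns {
    // Returns each run of identical consecutive characters with its length, in order.
    static List<Map.Entry<Character, Integer>> runs(String s) {
        List<Map.Entry<Character, Integer>> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i;
            // advance j past every consecutive copy of c
            while (j < s.length() && s.charAt(j) == c) {
                j++;
            }
            result.add(new AbstractMap.SimpleEntry<>(c, j - i));
            i = j; // the next run starts where this one ended
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runs("aababbdf")); // prints: [a=2, b=1, a=1, b=2, d=1, f=1]
    }
}
```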
I am writing a spell checker that takes a text file as input and outputs the file with spelling corrected.
The program should preserve formatting and punctuation.
I want to split the input text into a list of string tokens such that each token is a run of one or more word, punctuation, whitespace, or digit characters.
For example:
Input:
words.txt:
asdf don't ]'.'..;'' as12....asdf.
asdf
Input as list:
["asdf" , " " , "don't" , " " , "]'.'..;''" , " " , "as" , "12" ,
"...." , "asdf" , "." , "\n" , "asdf"]
Words like won't and i'll should be treated as a single token.
Having the data in this format would allow me to process the tokens like so:
String output = "";
for(String token : tokens) {
if(isWord(token)) {
if(!inDictionary(token)) {
token = correctSpelling(token);
}
}
output += token;
}
So my main question is: how can I split a string of text into a list of substrings as described above? Thank you.
The main difficulty here is finding a regex that matches what you consider to be a "word". For my example, I consider ' to be part of a word if it is preceded by a letter or if the following character is a letter:
public static void main(String[] args) {
String in = "asdf don't ]'.'..;'' as12....asdf.\nasdf";
//The pattern:
Pattern p = Pattern.compile("[\\p{Alpha}][\\p{Alpha}']*|'[\\p{Alpha}]+");
Matcher m = p.matcher(in);
//If you want to collect the words
List<String> words = new ArrayList<String>();
StringBuilder result = new StringBuilder();
//Now find something from the start
int pos = 0;
while(m.find(pos)) {
//Add everything from starting position to beginning of word
result.append(in.substring(pos, m.start()));
//Handle dictionary logic
String token = m.group();
words.add(token); //not used actually
if(!inDictionary(token)) {
token = correctSpelling(token);
}
//Add to result
result.append(token);
//Repeat from end position
pos = m.end();
}
//Append remainder of input
result.append(in.substring(pos));
System.out.println("Result: " + result.toString());
}
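If you want the full token list from the question (words, whitespace, digits and punctuation as separate tokens) rather than only the words, one option is a single alternation in which every character falls into exactly one branch, so consecutive find() calls cover the whole input. A sketch using a slightly stricter word rule than above (letters, with an apostrophe allowed only between letters); RegexTokenizer is a hypothetical name, and \p{Alpha} only covers ASCII letters unless you switch to \p{L}:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // One branch per token class; together they match every possible character:
    //  1. a word: letters, optionally with inner apostrophes (don't, i'll)
    //  2. a run of digits
    //  3. a run of whitespace
    //  4. a run of anything else (punctuation and symbols)
    private static final Pattern TOKEN = Pattern.compile(
            "\\p{Alpha}+(?:'\\p{Alpha}+)*|\\d+|\\s+|[^\\p{Alpha}\\d\\s]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("asdf don't ]'.'..;'' as12....asdf.\nasdf")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Because the branches cover every character, joining the tokens with "" reproduces the input exactly, which is what the spell-checking loop in the question relies on.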
Because I like solving puzzles, I tried the following and I think it works fine:
public class MyTokenizer {
private final String str;
private int pos = 0;
public MyTokenizer(String str) {
this.str = str;
}
public boolean hasNext() {
return pos < str.length();
}
public String next() {
int type = getType(str.charAt(pos));
StringBuilder sb = new StringBuilder();
while(hasNext() && (str.charAt(pos) == '\'' || type == getType(str.charAt(pos)))) {
sb.append(str.charAt(pos));
pos++;
}
return sb.toString();
}
private int getType(char c) {
String sc = Character.toString(c);
if (sc.matches("\\d")) {
return 0;
}
else if (sc.matches("\\w")) {
return 1;
}
else if (sc.matches("\\s")) {
return 2;
}
else if (sc.matches("\\p{Punct}")) {
return 3;
}
else {
return 4;
}
}
public static void main(String... args) {
MyTokenizer mt = new MyTokenizer("asdf don't ]'.'..;'' as12....asdf.\nasdf");
while(mt.hasNext()) {
System.out.println(mt.next());
}
}
}
I have a couple of similar strings. I want to extract the numbers from them, add the numbers together, and convert the result back to the same string format.
And the logic should be generic, i.e., it should work for any given strings.
Example:
String s1 = "1/9"; String s2 = "12/4"; The total of the above two Strings should be "13/13" (a String again).
I know how to extract numbers from any given String. I referred to: How to extract numbers from a string and get an array of ints?
But I don't know how to put them back together in the same String format.
Can anyone please help me with this?
Note: the string format can be anything, I have just taken an example for explanation.
Take a look at this:
public class StringTest {
public static void main(String[] args) {
String divider = "/";
String s1 = "1/9";
String s2 = "12/4";
String[] fragments1 = s1.split(divider);
String[] fragments2 = s2.split(divider);
int first = Integer.parseInt(fragments1[0]);
first += Integer.parseInt(fragments2[0]);
int second = Integer.parseInt(fragments1[1]);
second += Integer.parseInt(fragments2[1]);
String output = first + divider + second;
System.out.println(output);
}
}
The code prints:
13/13
Using a regex (and Markus' code)
public class StringTest {
public static void main(String[] args) {
String s1 = "1/9";
String s2 = "12&4";
// the inputs may use different separators, so pick the one used in the output
String divider = "/";
// split on any run of non-digit characters
String[] fragments1 = s1.split("\\D+");
String[] fragments2 = s2.split("\\D+");
int first = Integer.parseInt(fragments1[0]);
first += Integer.parseInt(fragments2[0]);
int second = Integer.parseInt(fragments1[1]);
second += Integer.parseInt(fragments2[1]);
String output = first + divider + second;
System.out.println(output);
}
}
From here you should be able to join the results back together from an array. If you need something fancier, you'll have to use regular expression capture groups and store the captured delimiters somewhere so you can put them back.
First, split your strings into matches and non-matches:
public static class Token {
public final String text;
public final boolean isMatch;
public Token(String text, boolean isMatch) {
this.text = text;
this.isMatch = isMatch;
}
@Override
public String toString() {
return text + ":" + isMatch;
}
}
public static List<Token> tokenize(String src, Pattern pattern) {
List<Token> tokens = new ArrayList<>();
Matcher matcher = pattern.matcher(src);
int last = 0;
while (matcher.find()) {
if (matcher.start() != last) {
tokens.add(new Token(src.substring(last, matcher.start()), false));
}
tokens.add(new Token(src.substring(matcher.start(), matcher.end()), true));
last = matcher.end();
}
if (last < src.length()) {
tokens.add(new Token(src.substring(last), false));
}
return tokens;
}
Once this is done, you can create lists you can iterate over and process.
For example, this code:
Pattern digits = Pattern.compile("\\d+");
System.out.println(tokenize("1/2", digits));
...outputs:
[1:true, /:false, 2:true]
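Applying tokenize to the original problem: tokenize both strings with \d+, add the numeric tokens pairwise, and copy the non-numeric tokens through unchanged. A sketch that assumes both strings have exactly the same layout and throws otherwise; SumByTemplate and add are hypothetical names, and Token/tokenize are repeated here (slightly trimmed) so the block compiles on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SumByTemplate {
    static final class Token {
        final String text;
        final boolean isMatch;
        Token(String text, boolean isMatch) { this.text = text; this.isMatch = isMatch; }
    }

    // Same idea as tokenize() above: alternating runs of non-matching text and matches.
    static List<Token> tokenize(String src, Pattern pattern) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(src);
        int last = 0;
        while (matcher.find()) {
            if (matcher.start() != last) {
                tokens.add(new Token(src.substring(last, matcher.start()), false));
            }
            tokens.add(new Token(matcher.group(), true));
            last = matcher.end();
        }
        if (last < src.length()) {
            tokens.add(new Token(src.substring(last), false));
        }
        return tokens;
    }

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Adds the numbers of two strings position by position, keeping the text in between.
    // Assumes both strings share the same layout (same numbers count, same text between them).
    static String add(String s1, String s2) {
        List<Token> t1 = tokenize(s1, DIGITS);
        List<Token> t2 = tokenize(s2, DIGITS);
        if (t1.size() != t2.size()) {
            throw new IllegalArgumentException("Strings do not share the same format");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < t1.size(); i++) {
            Token a = t1.get(i);
            Token b = t2.get(i);
            if (a.isMatch && b.isMatch) {
                out.append(Integer.parseInt(a.text) + Integer.parseInt(b.text));
            } else if (!a.isMatch && !b.isMatch && a.text.equals(b.text)) {
                out.append(a.text);
            } else {
                throw new IllegalArgumentException("Format mismatch at token " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(add("1/9", "12/4")); // prints: 13/13
        System.out.println(add("abc 11:4 xyz 10:9", "abc 9:2 xyz 100:11")); // prints: abc 20:6 xyz 110:20
    }
}
```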
Quick and dirty, and it doesn't rely on knowing which separator is used. You do have to make sure that m1.group(2) and m2.group(2), which capture the separator, are equal.
public static void main(String[] args) {
    String s1 = "1/9";
    String s2 = "12/4";
    // \\D+ (rather than .*) so the separator group cannot swallow digits of the second number
    Pattern p = Pattern.compile("(\\d+)(\\D+)(\\d+)");
    Matcher m1 = p.matcher(s1);
    Matcher m2 = p.matcher(s2);
    if (!m1.matches() || !m2.matches()) {
        throw new IllegalArgumentException("unexpected format");
    }
    int sum1 = Integer.parseInt(m1.group(1)) + Integer.parseInt(m2.group(1));
    int sum2 = Integer.parseInt(m1.group(3)) + Integer.parseInt(m2.group(3));
    System.out.printf("%d%s%d%n", sum1, m1.group(2), sum2);
}
Consider a function like this:
public String format(int first, int second, String separator){
return first + separator + second;
}
then:
System.out.println(format(6, 13, "/")); // prints "6/13"
Thanks @remus. Reading your logic, I was able to build the following code. It solves the problem for any two strings that have the same format.
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
ArrayList<Integer> numberList1 = new ArrayList<Integer>();
ArrayList<Integer> numberList2 = new ArrayList<Integer>();
ArrayList<Integer> outputList = new ArrayList<Integer>();
String str1 = "abc 11:4 xyz 10:9";
String str2 = "abc 9:2 xyz 100:11";
String output = "";
// Extracting numbers from the two similar string
Pattern p1 = Pattern.compile("-?\\d+");
Matcher m = p1.matcher(str1);
while (m.find()) {
numberList1.add(Integer.valueOf(m.group()));
}
m = p1.matcher(str2);
while (m.find()) {
numberList2.add(Integer.valueOf(m.group()));
}
// Numbers extracted. Printing them
System.out.println("List1: " + numberList1);
System.out.println("List2: " + numberList2);
// Adding the respective indexed numbers from both the lists
for (int i = 0; i < numberList1.size(); i++) {
outputList.add(numberList1.get(i) + numberList2.get(i));
}
// Printing the summed list
System.out.println("Output List: " + outputList);
// Splitting string to segregate numbers from text and getting the format
String[] template = str1.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
// building the string back using the summed list and format
int counter = 0;
for (String tmp : template) {
if (Test.isInteger(tmp)) {
output += outputList.get(counter);
counter++;
} else {
output += tmp;
}
}
// Printing the output
System.out.println(output);
}
public static boolean isInteger(String s) {
try {
Integer.parseInt(s);
} catch (NumberFormatException e) {
return false;
}
return true;
}
}
output:
List1: [11, 4, 10, 9]
List2: [9, 2, 100, 11]
Output List: [20, 6, 110, 20]
abc 20:6 xyz 110:20
I wrote my program in C#, but I want to rewrite it in Java. I want to create spintax text.
My C# code:
static string spintax(Random rnd, string str)
{
// Loop over string until all patterns exhausted.
string pattern = "{[^{}]*}";
Match m = Regex.Match(str, pattern);
while (m.Success)
{
// Get random choice and replace pattern match.
string seg = str.Substring(m.Index + 1, m.Length - 2);
string[] choices = seg.Split('|');
str = str.Substring(0, m.Index) + choices[rnd.Next(choices.Length)] + str.Substring(m.Index + m.Length);
m = Regex.Match(str, pattern);
}
// Return the modified string.
return str;
}
I've updated my code to:
static String spintax(Random rnd, String str)
{
    String pat = "\\{[^{}]*\\}";
    Pattern ma = Pattern.compile(pat);
    Matcher mat = ma.matcher(str);
    while (mat.find())
    {
        // text between the braces, split into the individual choices
        String segment = str.substring(mat.start() + 1, mat.end() - 1);
        String[] choices = segment.split("\\|", -1);
        str = str.substring(0, mat.start()) + choices[rnd.nextInt(choices.length)] + str.substring(mat.end());
        // rescan the modified string for the next group
        mat = ma.matcher(str);
    }
    return str;
}
Works like a charm :D Thanks, all, for your support.
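You need to escape the braces. In Java regex syntax an unescaped { starts a repetition quantifier such as {2,3}, so "{[^{}]*}" throws a PatternSyntaxException: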
String pat = "\\{[^{}]*\\}";
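For reference, a self-contained version of the Java port with the escaped pattern. Instead of creating a new Matcher each round, it calls Matcher.reset(CharSequence) to rescan the modified string; because [^{}]* cannot contain braces, the innermost group is always resolved first, which also handles nested spintax. A sketch; the Spintax class and spin method are illustrative names:

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Spintax {
    // Innermost {a|b|c} group: braces escaped, body contains no other braces.
    private static final Pattern GROUP = Pattern.compile("\\{[^{}]*\\}");

    static String spin(Random rnd, String str) {
        Matcher m = GROUP.matcher(str);
        while (m.find()) {
            // text between the braces, split into choices (-1 keeps empty choices)
            String[] choices = str.substring(m.start() + 1, m.end() - 1).split("\\|", -1);
            String pick = choices[rnd.nextInt(choices.length)];
            str = str.substring(0, m.start()) + pick + str.substring(m.end());
            m.reset(str); // rescan from the start, so nested groups resolve inside-out
        }
        return str;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(spin(rnd, "{Hello|Hi} {world|{there|everyone}}!"));
    }
}
```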
I have a string, e.g.: DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67
I want to split the string and create a map like this:
DIGITAL SPORTS $8.95
HD AO $9.95
UCC REC $1.28
RENTAL FEE $7.00
LOCAL FRANCHISE $4.67
I wrote a regular expression to split the string. Here is my code:
private static String ledgerString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
private static Pattern pattern1 = Pattern.compile("([[a-zA-Z ]*\\$[0-9]*.[0-9][0-9]]*)");
private static Matcher matcher = null;
public static void main(String[] args) {
matcher = pattern1.matcher(ledgerString.trim());
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Could someone please help me extract the data from the above string?
Your whole pattern in group 1 is wrapped in a character class [...], which is probably not what you were trying to do. Maybe change your pattern to
Pattern.compile("([a-zA-Z ]*)(\\$[0-9]*\\.[0-9][0-9]*)");
and use it like this
while (matcher.find()) {
System.out.println(matcher.group(1)+" "+matcher.group(2));
}
Also, since Java 7 you can use named groups (?<name>...), so this is also possible:
Pattern.compile("(?<name>[a-zA-Z ]*)(?<price>\\$[0-9]*\\.[0-9][0-9]*)");
while (matcher.find()) {
System.out.println(matcher.group("name")+" "+matcher.group("price"));
}
Output
DIGITAL SPORTS $8.95
HD AO $9.95
UCC REC $1.28
RENTAL FEE $7.00
LOCAL FRANCHISE $4.67
private static String ledgerString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
private static Pattern pattern1 = Pattern.compile("([a-zA-Z ]+)(\\$[0-9]*\\.[0-9][0-9])");
private static Matcher matcher = null;
public static void main(String[] args) {
matcher = pattern1.matcher(ledgerString.trim());
while (matcher.find()) {
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
}
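The question asks for a map rather than printed lines. One way, assuming the description should be the key and the price text (including the $) the value, is to put each match into a LinkedHashMap, which keeps the original order. LedgerParser and parse are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LedgerParser {
    // Description (letters and spaces) followed by a price like $8.95; the '.' is escaped.
    private static final Pattern ENTRY = Pattern.compile("([a-zA-Z ]+)(\\$[0-9]*\\.[0-9]{2})");

    // LinkedHashMap keeps the entries in the order they appear in the string.
    static Map<String, String> parse(String ledger) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(ledger.trim());
        while (m.find()) {
            map.put(m.group(1).trim(), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        String ledger = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
        parse(ledger).forEach((name, price) -> System.out.println(name + " " + price));
    }
}
```

If you need to do arithmetic on the prices, you could strip the $ and parse the rest, e.g. with new BigDecimal(...), instead of storing the text.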
Try this:
The Regex: (?:(.+?)(\$\d*(?:\.\d+)?))
String regex = "(?:(.+?)(\\$\\d*(?:\\.\\d+)?))";
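Maybe you could replace every "$" symbol with ",$" (comma dollar) and then split on "," (comma). Use the literal String.replace here, not replaceAll: replaceAll treats its first argument as a regex, where $ is the end-of-input anchor, and its second as a replacement pattern, where $ starts a group reference. Do something like:
ledgerString = ledgerString.replace("$", ",$");
String[] tokens = ledgerString.split(",");
Note that each token after the first still has the next description glued to its price (e.g. "$8.95HD AO"), so you need one more step to separate them.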
The regex you want is one that matches each String you are interested in. Therefore you want to use
Pattern.compile("([a-zA-Z ]+\\$[0-9]+\\.[0-9][0-9])");
as this identifies each 'line' you're interested in. You can then use split("\\$") on each line to separate the description from the price (the $ must be escaped, because split() takes a regex).
Here is yet another way of doing that:
String mainString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
String[] splittedArray = mainString.split("[0-9][A-Z]");
int currentLength = 0;
for(int i =0; i < splittedArray.length; i++) {
String splitedString;
if(i == 0) {
char endChar = mainString.charAt(splittedArray[i].length());
splitedString = splittedArray[i] + endChar;
currentLength += splittedArray[i].length();
}
else if(i == splittedArray.length -1){
char beginChar = mainString.charAt(currentLength + 1);
splitedString = beginChar + splittedArray[i];
}
else {
char beginChar = mainString.charAt(currentLength + 1);
char endChar = mainString.charAt(currentLength+splittedArray[i].length()+2);
splitedString = beginChar + splittedArray[i] + endChar;
currentLength += splittedArray[i].length()+2;
}
System.out.println(splitedString);
}
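A zero-width split avoids consuming the last digit and the first letter, so nothing has to be glued back on: the lookbehind (?<=[0-9]) and lookahead (?=[A-Z]) match only the position between a price and the next description. A sketch assuming, as the split above does, that descriptions start with an uppercase letter and prices end with a digit; LookaroundSplit is a hypothetical name:

```java
class LookaroundSplit {
    // Zero-width split: the boundary is the position between a digit and an uppercase
    // letter, so no characters are consumed by the split itself.
    static String[] entries(String ledger) {
        return ledger.split("(?<=[0-9])(?=[A-Z])");
    }

    public static void main(String[] args) {
        String mainString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
        for (String entry : entries(mainString)) {
            // "\\$" escapes the dollar sign; a bare "$" would be the end-of-input anchor
            String[] parts = entry.split("\\$", 2);
            System.out.println(parts[0] + " $" + parts[1]);
        }
    }
}
```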
Say I have a string called x that equals "Hello world". I want to flip those two words so that it displays "world Hello" instead. I am not very good with loops or arrays and am obviously a beginner. Could I accomplish this by splitting my string? If so, how? If not, how could I do this? Help would be appreciated, thanks!
1) split string into String array on space.
String[] myArray = x.split(" ");
2) Create new string with words in reverse order from array.
String newString = myArray[1] + " " + myArray[0];
Bonus points for using a StringBuilder instead of concatenation.
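Following the StringBuilder hint, a sketch that reverses any number of space-separated words rather than just two (it assumes words are separated by single spaces; FlipWords is a hypothetical name):

```java
class FlipWords {
    // Rebuilds the words of x in reverse order with a StringBuilder instead of += concatenation.
    static String reverseWords(String x) {
        String[] words = x.split(" ");
        StringBuilder sb = new StringBuilder();
        for (int i = words.length - 1; i >= 0; i--) {
            sb.append(words[i]);
            if (i > 0) {
                sb.append(' '); // one space between words, none at the end
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseWords("Hello world")); // prints: world Hello
    }
}
```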
String abc = "Hello world";
String cba = abc.replace( "Hello world", "world Hello" );
abc = "This is a longer string. Hello world. My String";
cba = abc.replace( "Hello world", "world Hello" );
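If you want, you can split your string into words as well:
String[] pieces = abc.split(" ");
for( int i=0; i<pieces.length-1; ++i )
if( pieces[i].equals("Hello") && pieces[i+1].equals("world") ) { // compare Strings with equals(), not ==
String tmp = pieces[i]; pieces[i] = pieces[i+1]; pieces[i+1] = tmp; // Java has no swap(); use a temporary
}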
There are many other ways to do it, too. Be careful with capitalization: you can call .toUpperCase() in your if statements and compare against uppercase literals (or use equalsIgnoreCase()), while leaving the results in their original capitalization.
Here's the solution:
import java.util.*;
public class ReverseWords {
public String reverseWords(String phrase) {
List<String> wordList = Arrays.asList(phrase.split("[ ]"));
Collections.reverse(wordList);
StringBuilder sbReverseString = new StringBuilder();
for(String word: wordList) {
sbReverseString.append(word + " ");
}
return sbReverseString.substring(0, sbReverseString.length() - 1);
}
}
I wrote the above solution for Google Code Jam; it is also blogged here: Reverse Words - GCJ 2010
Just call this method and pass it the string whose words you want to reverse:
static String reverseWords(String str) {
// Specifying the pattern to be searched
Pattern pattern = Pattern.compile("\\s");
// splitting String str with a pattern
// (i.e. splitting the string wherever there
// is whitespace) and storing the pieces in a temp array.
String[] temp = pattern.split(str);
String result = "";
// Iterate over the temp array and store
// the string in reverse order.
for (int i = 0; i < temp.length; i++) {
if (i == temp.length - 1) {
result = temp[i] + result;
} else {
result = " " + temp[i] + result;
}
}
return result;
}
Depending on your exact requirements, you may want to split on other forms of whitespace (tabs, multiple spaces, etc.):
static Pattern p = Pattern.compile("(\\S+)(\\s+)(\\S+)");
public String flipWords(String in)
{
Matcher m = p.matcher(in);
if (m.matches()) {
// reverse the groups we found
return m.group(3) + m.group(2) + m.group(1);
} else {
return in;
}
}
If you want to get more complex, see the docs for Pattern: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Try something like the following:
String input = "how is this";
List<String> words = Arrays.asList(input.split(" "));
Collections.reverse(words);
String result = "";
for(String word : words) {
if(!result.isEmpty()) {
result += " ";
}
result += word;
}
System.out.println(result);
Output:
this is how
Too much?
private static final Pattern WORD = Pattern.compile("^(\\p{L}+)");
private static final Pattern NUMBER = Pattern.compile("^(\\p{N}+)");
private static final Pattern SPACE = Pattern.compile("^(\\p{Z}+)");
public static String reverseWords(final String text) {
final StringBuilder sb = new StringBuilder(text.length());
final Matcher wordMatcher = WORD.matcher(text);
final Matcher numberMatcher = NUMBER.matcher(text);
final Matcher spaceMatcher = SPACE.matcher(text);
int offset = 0;
while (offset < text.length()) {
wordMatcher.region(offset, text.length());
numberMatcher.region(offset, text.length());
spaceMatcher.region(offset, text.length());
if (wordMatcher.find()) {
final String word = wordMatcher.group();
sb.insert(0, reverseCamelCase(word));
offset = wordMatcher.end();
} else if (numberMatcher.find()) {
sb.insert(0, numberMatcher.group());
offset = numberMatcher.end();
} else if (spaceMatcher.find()) {
sb.insert(0, spaceMatcher.group(0));
offset = spaceMatcher.end();
} else {
sb.insert(0, text.charAt(offset++));
}
}
return sb.toString();
}
private static final Pattern CASE_REVERSAL = Pattern
.compile("(\\p{Lu})(\\p{Ll}*)(\\p{Ll})$");
private static String reverseCamelCase(final String word) {
final StringBuilder sb = new StringBuilder(word.length());
final Matcher caseReversalMatcher = CASE_REVERSAL.matcher(word);
int wordEndOffset = word.length();
while (wordEndOffset > 0 && caseReversalMatcher.find()) {
sb.insert(0, caseReversalMatcher.group(3).toUpperCase());
sb.insert(0, caseReversalMatcher.group(2));
sb.insert(0, caseReversalMatcher.group(1).toLowerCase());
wordEndOffset = caseReversalMatcher.start();
caseReversalMatcher.region(0, wordEndOffset);
}
sb.insert(0, word.substring(0, wordEndOffset));
return sb.toString();
}