how to parse text files, where order matters - java

I'm having trouble determining a way to parse a given text file.
Here is an entry in the file:
type = "book"
callnumber = "1"
authors = "a"
title = "t"
publisher = "p"
year = "2023"
Each entry is separated from the next by a blank line.
So I have these variables (type, callnumber, authors, title, ...) and need to read this text and determine what values to set them to. For example, when I read the line callnumber = "1", I need to set that variable to 1.
This is what I have so far. I read in one line at a time, for example type = "book", and then split that line into an array of strings with " as the delimiter, so the array contains type = and book.
My problem is going further from there. I figured I could walk through each string in the array, character by character, until I hit whitespace. That gives me type, but at that point I have no data to store in it yet. The next string gives me book (ignoring the = and the whitespace), but how can I attribute book to type?
In summary, I'm looking for a way to parse a text file line by line and assign values to variables based on the words I find.
Thanks.

Ignoring the current route, why not make use of Properties.load(InputStream inStream)?
Properties properties = new Properties();
properties.load(new FileInputStream("filename"));
String type = properties.getProperty("type");
System.out.println(type);
This prints the value exactly as it appears in the file, quotes included:
"book"
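A minimal sketch of the Properties route, under a couple of assumptions: the entry is loaded from a String instead of a file so the example is self-contained, and the class name PropertiesEntryDemo and the helper stripQuotes are made up for illustration. Properties does not remove the surrounding quotes, so the helper does that. Also note that a file with several entries repeats the same keys, so each entry would probably have to be loaded separately or later values overwrite earlier ones.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEntryDemo {
    // Hypothetical helper: removes one pair of surrounding double quotes, if present.
    static String stripQuotes(String value) {
        if (value != null && value.length() >= 2
                && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Loads the text of a single entry (one "name = value" pair per line).
    static Properties loadEntry(String entryText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(entryText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        Properties entry = loadEntry("type = \"book\"\ncallnumber = \"1\"\n");
        // The raw value still carries the quotes from the file.
        System.out.println(entry.getProperty("type"));
        // After stripping, only the text between the quotes is left.
        System.out.println(stripQuotes(entry.getProperty("type")));
    }
}
```

For a real file, the StringReader would be replaced by a reader or stream over that file.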

I agree you should take the Properties route if your requirements allow you to. The next best option would be to deal with each line individually through a regular expression.
String type = "default";
int callnumber = 0;
String line = "type = \"book\"";
// String line = "callnumber = \"1\"";
Pattern linePattern = Pattern.compile("(\\w*) = \"(.*)\"");
Matcher matcher = linePattern.matcher(line);
if ( !matcher.matches() ) {
    System.err.println("Bad line");
} else {
    // group() may only be called after a successful match
    String name = matcher.group(1);
    String value = matcher.group(2);
    if ( "type".equals(name) ) {
        type = value;
    } else if ( "callnumber".equals(name) ) {
        callnumber = Integer.parseInt(value);
    } //...
}
In your case you would want to integrate this into your while loop that reads from the file, and replace line with the line you've just read from the file.
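One way the regular expression could be integrated into a read loop is sketched below. It assumes that entries are separated by blank lines, as in the question, and it collects each entry into a map instead of separate variables. The class name EntryFileParser and the choice of a List of Maps are illustrative, not something from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntryFileParser {
    // name = "value", with optional whitespace around the name and the '='
    private static final Pattern LINE =
            Pattern.compile("\\s*(\\w+)\\s*=\\s*\"(.*)\"\\s*");

    static List<Map<String, String>> parse(BufferedReader reader) {
        List<Map<String, String>> entries = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    // A blank line closes the current entry, if it has any fields.
                    if (!current.isEmpty()) {
                        entries.add(current);
                        current = new LinkedHashMap<>();
                    }
                    continue;
                }
                Matcher matcher = LINE.matcher(line);
                if (matcher.matches()) {
                    current.put(matcher.group(1), matcher.group(2));
                } else {
                    System.err.println("Bad line: " + line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The last entry may not be followed by a blank line.
        if (!current.isEmpty()) {
            entries.add(current);
        }
        return entries;
    }

    // Convenience overload for text that is already in memory.
    static List<Map<String, String>> parse(String text) {
        return parse(new BufferedReader(new StringReader(text)));
    }
}
```

Because the fields are looked up by name, the order of the lines inside an entry no longer matters; entries.get(0).get("callnumber") would return the call number of the first entry as a String.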

To add to Aaron's solution:
properties.load(new FileInputStream("<fileName>"));
loads the properties. To get a particular property, call getProperty on the same Properties instance. For example,
properties.getProperty("type")
returns the string "book".

Is the order of the variables in the text file always going to be the same?
I'm guessing you wouldn't be asking if that was the case.
Why not just make a method:
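void assignVariableByName(String name, String value) {
    // name is the raw text before the value, e.g. "type = ", hence contains() rather than equals()
    if (name.contains("type"))
        type = value;
    else if (name.contains("callnumber"))
        callnumber = value; // convert first (e.g. Integer.parseInt) if the field is not a String
}
Then, for usage, you have the array of strings you split, so you call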
assignVariableByName(parsedLine[0],parsedLine[1]);

Assigning values to variables has probably been done elsewhere more cleanly. If you want to 'tokenize' your string however, use a string tokenizer.
The new school is to use the split method of the String class.
String[] tokens = line.split("\\s+");
http://download.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)
Below is the old school way:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
String line;
while ((line = someInput.readLine()) != null) {
    StringTokenizer st = new StringTokenizer(line);
    while (st.hasMoreTokens()) {
        String token = st.nextToken();
        // branch on token command, skip token '=', and assign on values
    }
}
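For the name = "value" lines from the question, the split approach might look like the sketch below. It splits on the first = only, so an = inside the value survives, and then trims whitespace and one pair of surrounding quotes. The class and method names are invented for this example.

```java
final class SplitLineDemo {
    // Returns {name, value}, or null when the line contains no '='.
    static String[] splitNameValue(String line) {
        // Limit 2: split at the first '=' only and keep the rest intact.
        String[] parts = line.split("=", 2);
        if (parts.length != 2) {
            return null;
        }
        String name = parts[0].trim();
        String value = parts[1].trim();
        // Drop one pair of surrounding double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] {name, value};
    }

    public static void main(String[] args) {
        String[] pair = splitNameValue("type = \"book\"");
        System.out.println(pair[0] + " -> " + pair[1]);
    }
}
```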

Related

Java String to parse with different parameters

I need to parse a string in this format:
"xxx.xxx.xxx.xxx ProxyHost=prod.loacl.com ProxyUser=test ProxyPas=tes#123 ProxyPort=1809".
I need to split or parse it so that I get "prod.loacl.com", "test", "tes#123" and "1809" as separate strings, and if any parameter such as ProxyPas is not defined, its value should be null.
The IP address xxx.xxx.xxx.xxx should be ignored; it is always present at the start of the string.
Should we use split, or some kind of list, to get this done? Which is the best way to extract this information, and why?
Note: the input string can change, except for the ProxyHost parameter; the user may not supply ProxyPas, etc.
If you assume that the format of the input string will not change, you can do something like this:
string inputString = "xxx.xxx.xxx.xxx ProxyHost=prod.loacl.com ProxyUser=test ProxyPas=tes#123 ProxyPort=1809";
string[] eachPart = inputString.Split(' ');
for (int i = 1; i < eachPart.Length; i++) // Skip the IP address
{
    string[] partData = eachPart[i].Split('=');
    string dataName = partData[0];
    string dataValue = partData[1];
    // do something with dataName and dataValue
}
However, if input string can change its format you should add some additional logic to this code.
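Since the question title mentions Java, here is a rough Java sketch of the same split idea that also copes with missing parameters. It assumes tokens are separated by whitespace and that anything without an = (such as the leading IP address) can be skipped. ProxySettingsParser is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

final class ProxySettingsParser {
    // Collects every name=value token; tokens without '=' are ignored.
    static Map<String, String> parse(String input) {
        Map<String, String> settings = new HashMap<>();
        for (String token : input.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                settings.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> settings =
                parse("xxx.xxx.xxx.xxx ProxyHost=prod.loacl.com ProxyUser=test ProxyPort=1809");
        System.out.println(settings.get("ProxyHost"));
        // Map.get returns null for a parameter that was not supplied.
        System.out.println(settings.get("ProxyPas"));
    }
}
```

Looking values up by name means the order of the parameters does not matter, and an absent parameter simply comes back as null.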
Use regex with groups for this, sample:
var myString = "xxx.xxx.xxx.xxx ProxyHost=prod.loacl.com ProxyUser=test ProxyPas=tes123 ProxyPort=1809";
var regex = new Regex(@"ProxyHost=([^\s]+) ProxyUser=([^\s]+) ProxyPas=([^\s]+) ProxyPort=(\d+)");
var match = regex.Match(myString);
while(match != null && match.Success)
{
int i = 0;
foreach(var group in match.Groups)
{
Console.WriteLine($"Group {i}: Value:'{group}'");
i++;
}
match = match.NextMatch();
}
Now you can map the groups to your properties.
One possible approach is to apply this regular expression:
([^=]+?)\=((\"[^"]+?\")|([^ ]+))
to the whole string. This allows variable input like this:
variable="this has spaces but still is recognized as one"
The catch is that the value ends up in either the 3rd or the 4th group of each match, depending on whether it is quoted or a single unquoted token (according to online regex testers). There must be a more elegant way to do this, but I can't come up with one right now.
See the documentation for Match.Groups to understand more about C#'s regex groups.
You will have to handle null inputs accordingly when you put the content into your C# variables.
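In Java, the "quoted or unquoted value" idea could be sketched as follows. The pattern here is a simplified variant of the one above, not the same expression: a non-capturing group holds the alternation, so the value is in group 2 when quoted and in group 3 otherwise. KeyValueScanner is a hypothetical name, and names are assumed to consist of word characters only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueScanner {
    // name="quoted value"  or  name=unquotedValue
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    static Map<String, String> scan(String input) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            // Group 2 is set for quoted values, group 3 for unquoted ones.
            String quoted = matcher.group(2);
            values.put(matcher.group(1), quoted != null ? quoted : matcher.group(3));
        }
        return values;
    }
}
```

Calling scan on a string such as ProxyUser="John Smith" ProxyPort=1809 would give a map with John Smith and 1809 as values; parameters that are absent from the input are simply absent from the map.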

Extracting digits in the middle of a string using delimiters

String ccToken = "";
String result = "ssl_transaction_type=CCGETTOKENssl_result=0ssl_token=4366738602809990ssl_card_number=41**********9990ssl_token_response=SUCCESS";
String[] elavonResponse = result.split("=|ssl");
for (String t : elavonResponse) {
System.out.println(t);
}
ccToken = (elavonResponse[6]);
System.out.println(ccToken);
I want to be able to grab a specific part of a string and store it in a variable. The way I'm currently doing it is by splitting the string and then storing the value of one cell in my variable. Is there a way to specify that I want to store the digits after "ssl_token="?
I want my code to obtain the value of ssl_token without having to worry about changes in the string that are unrelated to the token, since I won't have control over the string. I have searched online but can't find answers for my specific problem, or I may be using the wrong search terms.
You can use replaceAll with this regex .*ssl_token=(\\d+).* :
String number = result.replaceAll(".*ssl_token=(\\d+).*", "$1");
Outputs
4366738602809990
You can do it with regex. It would probably be better to change the specifications of the input string so that each key/value pair is separated by an ampersand (&) so you could split it (similar to HTTP POST parameters).
Pattern p = Pattern.compile(".*ssl_token=([0-9]+).*");
Matcher m = p.matcher(result);
if(m.matches()) {
long token = Long.parseLong(m.group(1));
System.out.println(String.format("token: [%d]", token));
} else {
System.out.println("token not found");
}
Find the index of ssl_token= in the string and take the substring that starts right after it. Then read the digits at the beginning of that substring and convert them to a number.
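The index-and-substring idea could be sketched like this, without any regex. The method name extractDigitsAfter and the class name TokenExtractor are made up; the marker is passed in so the same code works for other keys.

```java
final class TokenExtractor {
    // Returns the run of digits that directly follows marker, or null if there is none.
    static String extractDigitsAfter(String text, String marker) {
        int start = text.indexOf(marker);
        if (start < 0) {
            return null; // marker not present
        }
        start += marker.length();
        int end = start;
        while (end < text.length() && Character.isDigit(text.charAt(end))) {
            end++;
        }
        return end > start ? text.substring(start, end) : null;
    }

    public static void main(String[] args) {
        String result = "ssl_transaction_type=CCGETTOKENssl_result=0ssl_token=4366738602809990"
                + "ssl_card_number=41**********9990ssl_token_response=SUCCESS";
        String token = extractDigitsAfter(result, "ssl_token=");
        System.out.println(token);
    }
}
```

Keeping the token as a String avoids overflow concerns; Long.parseLong can be applied afterwards if a numeric value is really needed.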

Replace parts of a string in Java

I need to replace parts of a string by looking up the System properties.
For example, consider the string It was {var1} beauty killed {var2}
I need to parse the string and replace every word enclosed in braces by looking up its value in the System properties. If System.getProperty() returns null, simply replace it with an empty string. This is pretty straightforward when I know the variables ahead of time, but the string I need to parse is not defined in advance: I don't know how many variables it contains or what their names are. Assuming a simple, well-formed string (no nested braces, every opening brace has a matching close), what is the simplest or most elegant way to parse the string and replace all the character sequences enclosed in braces?
The only solution I could come up with is to traverse the string from the first character, note the start and end positions of each pair of braces, replace the string between them, and continue until reaching the end of the string. Is there a simpler way to do this?
You can use the braces to break the initial string into substrings, and then replace every other substring.
String[] substituteValues = {"the", "str", "other", "another"};
int substituteValuesIndex = 0;
String test = "Here is {var1} string called {var2}";
// split the string up into substrings
test = test.replaceAll("\\}", "\\{");
String[] splitString = test.split("\\{");
// now sub in your values
for (int k=1; k < splitString.length; k = k+2) {
splitString[k] = substituteValues[substituteValuesIndex];
substituteValuesIndex++;
}
String result = "";
for (String s : splitString) {
result = result + s;
}
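An alternative that looks the names up instead of substituting values by position is to use Matcher.appendReplacement. The sketch below assumes placeholders have the form {name} with no nesting; the lookup is passed in as a function so it can be System::getProperty or anything else. PlaceholderReplacer is an illustrative name.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderReplacer {
    // Matches {name}, where name contains no braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]+)\\}");

    static String replace(String template, Function<String, String> lookup) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = lookup.apply(matcher.group(1));
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // Looks each name up in the System properties; unknown names become "".
    static String replaceFromSystemProperties(String template) {
        return replace(template, System::getProperty);
    }
}
```

With this, replaceFromSystemProperties("It was {var1} beauty killed {var2}") would substitute whatever var1 and var2 are set to, and drop any placeholder whose property is not defined.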

How to replace matched character String in java?

I have a question about how to replace a string when a matching character is found. In this case, I read a Java file containing variables that are marked with a leading underscore. Here is the Java file:
public int[][] initArray(int rows, int cols, int init_value)
{
int[][] _bb = (int[][])null;
if ((rows > 1) && (cols > 1)) {
_bb = new int[rows][cols];
for (int _ii = 0; _ii < rows; _ii++) {
for (int _ee = 0; _ee < cols; _ee++) {
_bb[_ii][_ee] = init_value;
}
}
} else {
warning("Array length must be greater than zero!");
}
return _bb;
}
Every variable that starts with an underscore should be replaced with a generated string. This is the code I use to read that file and replace the matched strings:
HashMap<String, String> map = new HashMap<String, String>();
if (line.contains(" _") && line.contains(";")) {
String get = varname(line);
RandomString r = new RandomString();
String[] split = get.split("\\s+");
String gvarname = split[0];
ss = "_"+gvarname;
map.put(ss, "l"+r.generateRandomString());
for(String key: map.keySet()){
if(line.contains(key)){
line = line.replace(key, map.get(key));
}
}
Then, this is the method that extracts the variable name:
String varname(String str){
int startIdx = str.indexOf("_");
int endIdx = str.indexOf(';');
String content = str.substring(startIdx + 1, endIdx);
return content;
}
The code above works and replaces some variable names, but some are not matched unless I add spaces: for example, _bb[_ii] does not work, but _bb[ _ii ] does. I don't know why. Help me!
Thanks
Use regex to recognize the entire variable, here using \b to find word boundaries.
public class Obfuscate {
private static final Pattern VAR_PATTERN = Pattern.compile("\\b_(\\w+)\\b");
private final Map<String, String> renames = new HashMap<>();
public String obfuscate(String sourceCode) {
StringBuffer buf = new StringBuffer(sourceCode.length() + 100);
Matcher m = VAR_PATTERN.matcher(sourceCode);
while (m.find()) {
String var = m.group(1);
String newVar = renames.get(var);
if (newVar == null) {
newVar = randomVar();
renames.put(var, newVar);
}
m.appendReplacement(buf, newVar);
}
m.appendTail(buf);
return buf.toString();
}
}
A map is needed to match the same old variable to the same new name.
A Set<String> of new names might be needed to check that the generated name does not repeat.
Your approach of doing a replaceAll for each variable is fine too, but it requires reading the whole file first. The method above can be called repeatedly (say, once per line), hence the map is a field.
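A self-contained variant of this idea is sketched below. Since randomVar() is not shown, this version swaps in a simple counter-based name ("v0", "v1", ...), which is purely an assumption for the example; such names could collide with identifiers already in the source, so a real tool would need a collision check. UnderscoreRenamer is a made-up class name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnderscoreRenamer {
    // An identifier that starts with '_' and is not part of a longer word.
    private static final Pattern VAR_PATTERN = Pattern.compile("\\b_(\\w+)\\b");

    // Remembers old name -> new name across calls, so every line gets the same mapping.
    private final Map<String, String> renames = new HashMap<>();

    String rename(String sourceCode) {
        Matcher m = VAR_PATTERN.matcher(sourceCode);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String oldName = m.group(1);
            String newName = renames.get(oldName);
            if (newName == null) {
                // Placeholder naming scheme, standing in for a random generator.
                newName = "v" + renames.size();
                renames.put(oldName, newName);
            }
            m.appendReplacement(buf, Matcher.quoteReplacement(newName));
        }
        m.appendTail(buf);
        return buf.toString();
    }
}
```

Because \b requires a non-word character (or the start of the input) before the underscore, the _value part of init_value is left alone, while _bb[_ii][_ee] is matched with or without spaces.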
In your first if-statement you check if the string contains " _" (an underscore with a leading space).
If in the following line of your source-java-file
_bb[_ii][_ee] = init_value;
_bb... is indented with tabulators, <tab>_bb will not match <space>_bb. There is no leading space before _ii and _ee either, so the if returns false.
If you put a space between [ and _ii, you find a match for <space>_ii and your if results in true and executes your replacement code.
If you are sure that there will be no other use of an underscore in your source text other than as a replacement indicator, you can simply remove the space from your if-condition and use line.contains("_") instead.
Btw: Are you sure that you want to require that the line contains a ; as well? What if your source text contains a line like while(_xx==true) {?
Also, because of
String[] split = get.split("\\s+");
String gvarname = split[0];
your code is not able to split a line like _bb[_ii][_ee] correctly (and even if it could, because of split[0] you would only replace the first identifier you found; subsequent ones would be ignored). Your split searches for whitespace and the source line doesn't contain any. Again, you could probably change this and split on underscores (this would return an array containing bb[, ii][ and ee]) and then scan each returned array element until you find the first character that can't be part of your variable identifier (e.g. the first non-alphanumeric character).
An _ plus the part of the array element up to that non-alphanumeric character is then the identifier that you want to replace.

How to get the specific part of a string based on condition?

I have a requirement to get a substring of a string based on a condition.
String str = "ABC::abcdefgh||XYZ::xyz";
If the input is "ABC", check whether str contains it, and if it is present, print abcdefgh.
In the same way, if the input is "XYZ", it should print xyz.
How can I achieve this with string manipulation in Java?
If I've guessed the format of your String correctly, then you could split it into tokens with something like this:
String[] tokens = str.split("\\|\\|"); // split() takes a regex, so each | must be escaped
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
if(key.equals(input))
{
// input being the user's typed in value.
return value;
}
}
But let's have a think for a minute. Why keep this in a String, when a HashMap is a much cleaner solution to your problem? Stick the String into a config file, and on load, some code can perform a similar task:
Map<String, String> inputMap = new HashMap<String, String>();
String[] tokens = str.split("\\|\\|"); // split() takes a regex, so each | must be escaped
for(String token : tokens)
{
// Cycle through each token.
String key = token.split("::")[0];
String value = token.split("::")[1];
inputMap.put(key, value);
}
Then when the user types something in, it's as easy as:
return inputMap.get(input);
The idea is to split the string on both delimiters, "::" and "||", so that whichever one is encountered is treated as a separator. I think the best way to achieve that is with a regular expression.
String str = "ABC::abcdefgh||XYZ::xyz";
String[] parts = str.split("[::]|[/||]");
Map<String, String> map = new HashMap<String, String>();
for (int i = 0; i < parts.length - 2; i += 4) {
if (!parts[i].equals("")) {
map.put(parts[i], parts[i + 2]);
}
}
Short and concise, your code is ready. The for loop seems weird, if anyone comes up with a better regex for splitting (to get rid of the empty strings), it will become cleaner. I'm not a regex expert, so any suggestions are welcome.
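One candidate for a cleaner split regex is an alternation of the two literal delimiters, "::" or "||", with the pipes escaped. That should produce no empty strings for well-formed input, so keys and values simply alternate. This is a sketch under the assumption that every key has exactly one value; PairStringParser is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PairStringParser {
    static Map<String, String> parse(String str) {
        // Split on the literal "::" or the literal "||".
        String[] parts = str.split("::|\\|\\|");
        Map<String, String> map = new LinkedHashMap<>();
        // Even positions are keys, odd positions are the matching values.
        for (int i = 0; i + 1 < parts.length; i += 2) {
            map.put(parts[i], parts[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("ABC::abcdefgh||XYZ::xyz");
        System.out.println(map.get("ABC"));
        System.out.println(map.get("XYZ"));
    }
}
```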
Use the contains method to see if it has the sub string: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#contains%28java.lang.CharSequence%29
You could do it as follows:
String[] parts = str.split("\\|\\|"); // escape | because split() takes a regex
if (parts[0].startsWith("ABC")) {
    String[] values = parts[0].split("::");
    System.out.println(values[1]);
} else {
    if (parts[1].startsWith("XYZ")) {
        String[] values = parts[1].split("::");
        System.out.println(values[1]);
    }
}
The above code will check first if ABC is there. If yes, it will print the result and then stop. If not, it will check the second section of the code to see if it starts with XYZ and then print the result. You can change it to suit your needs.
