Making a string of postorder data of a given tree - java

As part of my assignment, I am given an expression tree and need to convert it to infix notation in O(n) running time.
For example, this tree should be converted to "( ( 1 V ( 2 H ( 3 V 4 ) ) ) H ( 5 V 6 ) )".
I couldn't think of a way to convert it straight to infix, so I thought of first converting it to postfix and then to infix. (If there's a better way, please tell me.)
Now, my problem is with converting the tree to post-order.
I have tried the following:
private String treeToPost(Node node, String post) {
if (node != null) {
treeToPost(node.left, post);
treeToPost(node.right, post);
post = post + node.data.getName();
}
return post;
}
Now I have two problems with this method. The first is that it doesn't work, because it only keeps the last node it visited. The second is that I'm not sure it will run in O(n), because it has to create a new string each time.
I found a solution to this issue here, but it used StringBuilder, which I am not allowed to use. I thought of using an array of chars to store the data, but because I don't know the size of the tree, I can't know the size of the array I need.
Thank you for your time :)

Going directly to infix is probably easier: just always add parentheses.
Secondly, doing something like this will save all nodes:
private String treeToPost(Node node) {
String returnString = "";
if (node != null) {
returnString += treeToPost(node.left);
returnString += treeToPost(node.right);
returnString += node.data.getName();
}
return returnString;
}
For infix, this should work:
private String treeToInfix(Node node) {
String returnString = "";
if (node != null) {
// Every node is wrapped in parentheses, so a leaf comes out as "(1)".
returnString += "(" + treeToInfix(node.left);
returnString += node.data.getName();
returnString += treeToInfix(node.right) + ")";
}
return returnString;
}
Both of these create new String objects on every call. So I think it is technically O(n^2), because the string grows each time and every concatenation copies it, but no professor of mine would deduct points for that.
However, if you want to avoid this behaviour and can't use StringBuilder, you can use a CharArrayWriter, which is a buffer that grows dynamically. You can then write two methods: one that appends to the buffer and returns nothing, and one that returns a String. The String-returning method calls the buffer-appending one.
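The two-method idea could look roughly like the sketch below. It is only an illustration: the Node class here is a hypothetical stand-in (the question's Node exposes data.getName() instead of a name field), and leaves are written without parentheses so that the result matches the format shown in the question.

```java
import java.io.CharArrayWriter;

class InfixWriterDemo {
    // Hypothetical minimal node type, used only for this sketch.
    static final class Node {
        final String name;
        final Node left;
        final Node right;

        Node(String name, Node left, Node right) {
            this.name = name;
            this.left = left;
            this.right = right;
        }

        Node(String name) {
            this(name, null, null);
        }
    }

    // String-returning method: creates the buffer once and converts it once.
    static String treeToInfix(Node root) {
        CharArrayWriter buffer = new CharArrayWriter();
        appendInfix(root, buffer);
        return buffer.toString();
    }

    // Buffer-appending method: writes into the shared buffer and returns nothing.
    private static void appendInfix(Node node, CharArrayWriter buffer) {
        if (node == null) {
            return;
        }
        if (node.left == null && node.right == null) {
            buffer.append(node.name); // leaf: operand, no parentheses
            return;
        }
        buffer.append("( ");
        appendInfix(node.left, buffer);
        buffer.append(' ').append(node.name).append(' ');
        appendInfix(node.right, buffer);
        buffer.append(" )");
    }

    public static void main(String[] args) {
        Node tree = new Node("H",
                new Node("V", new Node("1"),
                        new Node("H", new Node("2"),
                                new Node("V", new Node("3"), new Node("4")))),
                new Node("V", new Node("5"), new Node("6")));
        System.out.println(treeToInfix(tree));
    }
}
```

Each node is visited once and each character is appended once to a growing buffer, so the total work should be linear in the size of the output (amortized, since the buffer occasionally resizes).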

Related

How to strip ' \n' in Java?

I have the following piece of code:
String name = "ishtiaq\n";
How can I strip the newline-character?
Thank you in advance.
If the "\n" always occurs at the end, use String.trim() [this won't remove the period, however, if you care about doing that]. If you want to eliminate internal newlines, you could use String.replaceAll(). You could also copy the string into a StringBuilder or array to construct a new string, skipping over the elements you wish to discard, or you could locate the relevant indices and use substring() to get a substring that excludes the elements you don't like. In short, there are many ways to do this.
Here is just one of many ways to do it (this one removing both the period symbol and the newline):
private static String nameWithoutSuffix(String nameWithSuffix) {
int periodIndex = nameWithSuffix.indexOf('.');
int newlineIndex = nameWithSuffix.indexOf('\n');
if ((periodIndex == -1) && (newlineIndex == -1)) {
return nameWithSuffix;
}
int suffixStartIndex = -1;
if (periodIndex != -1) {
suffixStartIndex = periodIndex;
}
if ((newlineIndex != -1)
&& ((suffixStartIndex == -1)
|| (newlineIndex < suffixStartIndex))) {
suffixStartIndex = newlineIndex;
}
return nameWithSuffix.substring(0, suffixStartIndex);
}
Java Strings have .replace!
With it, your problem becomes super easy:
String strippedString = name.replace("\n", "");
Remember: Strings are immutable, so replace does not modify the existing String; it returns a new one. That's why you have to store the return value in a variable. (You can reuse the same variable, of course.)
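To compare the options mentioned in the answers, here is a small sketch. The chomp helper is a hypothetical name introduced for illustration; it removes only one trailing line terminator, whereas trim() removes all leading and trailing whitespace and replace() removes every newline, including internal ones.

```java
class StripNewlineDemo {
    // Hypothetical helper: removes a single trailing "\r\n" or "\n" and nothing else.
    static String chomp(String s) {
        if (s.endsWith("\r\n")) {
            return s.substring(0, s.length() - 2);
        }
        if (s.endsWith("\n")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String name = "ishtiaq\n";
        System.out.println("[" + name.trim() + "]");            // strips whitespace at both ends
        System.out.println("[" + name.replace("\n", "") + "]"); // strips every newline
        System.out.println("[" + chomp(name) + "]");            // strips one trailing newline
    }
}
```

Which one fits depends on whether leading spaces or internal newlines in the input need to be preserved.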

Formatting Strings and String arrays with tabs in java

I have an assignment where I'm supposed to have a method that formats an array of String objects to be tabulated a certain way with a header, and put all the objects (after being formatted) nicely into a single String for the method to return. This method is inside an object class, so it ultimately will be formatting multiple objects the same way, so I need it to format the same way with various String lengths.
Here's what I need the output to look like:
Hashtags:
#firstHashtag
#secondHashtag
Each hashtag is in a String[] of hashtags,
i.e.
String[] hashtags = {"#firstHashtag", "#secondHashtag"};
So basically I need to use String.format() to create one single string containing a tabbed "Hashtags:" header, followed by each String in the "hashtags" array on a new line, double-tabbed. The size of the "hashtags" array changes, since it is in an object class.
Could someone help me use String.format()?
This is what my method looks like so far:
public String getHashtags()
{
String returnString = "Hashtags:";
String add;
int count = 0;
while(count < hashtags.length)
{
//hashtags is an array of String objects with an unknown size
returnString += "\n";
add = String.format("%-25s", hashtags[count]);
//here I'm trying to use .format, but it doesn't tabulate, and I
//don't understand how to make it tabulate!!
count++;
returnString = returnString + add;
}
if(hashtags == null)
{
returnString = null;
}
return returnString;
}
Any helpful advice on what to do here with formatting would be greatly appreciated!!!
If you are trying to use real tabs and not spaces, then just change your program to be like this one:
public String getHashtags()
{
if(hashtags == null)
{
return null;
}
String returnString = "Hashtags:";
int count = 0;
while(count < hashtags.length)
{
//hashtags is an array of String objects with an unknown size
returnString = returnString + "\n\t\t"+hashtags[count];
count++;
}
return returnString;
}
Your String.format() statement will create a String that is left-justified and padded with spaces to a width of 25 characters. For example, this line:
System.out.println("left-justified >" + String.format("%-25s", "hello") + "<");
outputs:
left-justified >hello                    <
The other thing is that you're not really using tabs (I don't see the tab character in your program). String.format() is creating Strings that are 25 characters long and left-justified. Keep that in mind as you create the return string. Also, your loop is adding a newline character each time. That's why you're getting multi-line output.
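If the assignment requires String.format() specifically, real tab characters can simply be placed in the format string. The sketch below assumes the layout described in the question (one tab before the header, two tabs before each hashtag) and takes the array as a parameter so that it is self-contained; in the question it is a field of the class.

```java
class HashtagFormatDemo {
    // Builds "\tHashtags:" followed by one "\n\t\t<tag>" line per hashtag.
    static String formatHashtags(String[] hashtags) {
        if (hashtags == null) {
            return null; // check before touching the array
        }
        String result = String.format("\t%s", "Hashtags:");
        for (String tag : hashtags) {
            result += String.format("\n\t\t%s", tag);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] hashtags = {"#firstHashtag", "#secondHashtag"};
        System.out.println(formatHashtags(hashtags));
        // For comparison: "%-25s" pads with spaces, it does not insert tabs.
        System.out.println(">" + String.format("%-25s", "hello") + "<");
    }
}
```

Because tabs do not depend on the length of each hashtag, the output lines up the same way for strings of any length, as long as the viewer renders tabs consistently.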

java replace HTML_Escapecodes

I need to develop a new method that replaces all umlauts (ä, ö, ü) in an input string with the corresponding HTML escape codes, and it has to perform well. According to statistics, only 5% of all input strings contain umlauts. Since the method is expected to be used extensively, any unnecessary instantiation should be avoided.
Could someone show me a way to do it?
These are the HTML escape codes. Additionally, HTML supports numeric character references for arbitrary characters, in the decimal form &#NNNN; and, equivalently, the hexadecimal form &#xHHHH;.
A simple string-replace is not going to be efficient with so many strings to replace. I suggest you split the string by entity matches, such as this:
String[] parts = str.split("&(#[0-9]+|#x[A-Fa-f0-9]+|[A-Za-z]+);", -1); //Limit -1 keeps a trailing empty part when the string ends with an entity.
if(parts.length <= 1) return str; //No matched entities.
Then you can re-build the string with the replaced parts inserted.
StringBuilder result = new StringBuilder(str.length());
result.append(parts[0]); //First part always exists.
int pos = parts[0].length() + 1; //Skip past the first part and the ampersand.
for(int i = 1;i < parts.length;i++) {
String entityName = str.substring(pos,str.indexOf(';',pos));
if(entityName.matches("#x[A-Fa-f0-9]+") && entityName.length() <= 6) {
result.append((char)Integer.parseInt(entityName.substring(2), 16)); //Hexadecimal reference, e.g. &#xE4;
} else if(entityName.matches("#[0-9]+")) {
result.append((char)Integer.parseInt(entityName.substring(1))); //Decimal reference, e.g. &#228;
} else {
switch(entityName) {
case "euml": result.append('ë'); break;
case "auml": result.append('ä'); break;
...
default: result.append("&" + entityName + ";"); //Unknown entity. Give the original string.
}
}
result.append(parts[i]); //Append the text after the entity.
pos += entityName.length() + parts[i].length() + 2; //Skip past the entity name, the semicolon, the following part and the next ampersand.
}
return result.toString();
Rather than copy-pasting this code, type it in your own project by hand. This gives you the opportunity to look at how the code actually works. I didn't run this code myself, so I can't guarantee it being correct. It can also be made slightly more efficient by pre-compiling the regular expressions.
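The code above converts entities back into characters. For the direction the question asks about (umlauts to entities), a possible sketch follows. It relies on the stated 5% figure: the string is scanned first without allocating anything, and a StringBuilder is created only when an umlaut is actually found. The class and method names are hypothetical, and only the three lowercase umlauts from the question are handled; uppercase umlauts or ß could be added to the switch in the same way.

```java
class UmlautEscaper {
    // Returns the same String instance when no umlaut is present (no allocation).
    static String escapeUmlauts(String s) {
        int first = -1;
        for (int i = 0; i < s.length(); i++) {
            if (entityFor(s.charAt(i)) != null) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return s; // common case according to the question: nothing to replace
        }
        StringBuilder out = new StringBuilder(s.length() + 16);
        out.append(s, 0, first); // copy the umlaut-free prefix in one step
        for (int i = first; i < s.length(); i++) {
            char c = s.charAt(i);
            String entity = entityFor(c);
            if (entity != null) {
                out.append(entity);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Unicode escapes are used so the source does not depend on file encoding.
    private static String entityFor(char c) {
        switch (c) {
            case '\u00E4': return "&auml;"; // ä
            case '\u00F6': return "&ouml;"; // ö
            case '\u00FC': return "&uuml;"; // ü
            default: return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(escapeUmlauts("M\u00FCller"));
        System.out.println(escapeUmlauts("no umlauts here"));
    }
}
```

The returned entity strings are compile-time constants, so the only objects created on the slow path are the StringBuilder and the final String.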

Why is the size of this vector 1?

When I use System.out.println to show the size of a vector after calling the following method then it shows 1 although it should show 2 because the String parameter is "7455573;photo41.png;photo42.png" .
private void getIdClientAndPhotonames(String csvClientPhotos)
{
Vector vListPhotosOfClient = new Vector();
String chainePhotos = "";
String photoName = "";
String photoDirectory = new String(csvClientPhotos.substring(0, csvClientPhotos.indexOf(';')));
chainePhotos = csvClientPhotos.substring(csvClientPhotos.indexOf(';')+1);
chainePhotos = chainePhotos.substring(0, chainePhotos.lastIndexOf(';'));
if (chainePhotos.indexOf(';') == -1)
{
vListPhotosOfClient.addElement(new String(chainePhotos));
}
else // aaa;bbb;...
{
for (int i = 0 ; i < chainePhotos.length() ; i++)
{
if (chainePhotos.charAt(i) == ';')
{
vListPhotosOfClient.addElement(new String(photoName));
photoName = "";
continue;
}
photoName = photoName.concat(String.valueOf(chainePhotos.charAt(i)));
}
}
}
So the vector should contain the two String photo41.png and photo42.png , but when I print the vector content I get only photo41.png.
So what is wrong in my code ?
The answer is not valid for this question anymore, because it has been retagged to java-me. Still true if it was Java (like in the beginning): use String#split if you need to handle csv files.
It'd be far easier to split the string:
String[] parts = csvClientPhotos.split(";");
This will give a string array:
{"7455573","photo41.png","photo42.png"}
Then you'd simply copy parts[1] and parts[2] to your vector.
You have two immediate problems.
The first is with your initial manipulation of the string. The two lines:
chainePhotos = csvClientPhotos.substring(csvClientPhotos.indexOf(';')+1);
chainePhotos = chainePhotos.substring(0, chainePhotos.lastIndexOf(';'));
when applied to 7455573;photo41.png;photo42.png will end up giving you photo41.png.
That's because the first line removes everything up to the first ; (7455573;) and the second strips off everything from the final ; onwards (;photo42.png). If your intent is to just get rid of the 7455573; bit, you don't need the second line.
Note that fixing this issue alone will not solve all your ills, you still need one more change.
Even though your input string (to the loop) is the correct photo41.png;photo42.png, you still only add an item to the vector each time you encounter a delimiting ;. There is no such delimiter at the end of that string, meaning that the final item won't be added.
You can fix this by putting the following immediately after the for loop:
if (! photoName.equals(""))
vListPhotosOfClient.addElement(new String(photoName));
which will catch the case of the final name not being terminated with the ;.
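Putting both fixes together, the parsing could be sketched as follows. This is an illustration rather than the asker's exact method: it returns the vector instead of keeping it local, uses the hypothetical name photoNames, and uses generics, which would have to be dropped (plain Vector) on Java ME.

```java
import java.util.Vector;

class PhotoListDemo {
    static Vector<String> photoNames(String csvClientPhotos) {
        Vector<String> names = new Vector<String>();
        // Fix 1: drop only the leading id; do not cut at the last ';'.
        String photos = csvClientPhotos.substring(csvClientPhotos.indexOf(';') + 1);
        String current = "";
        for (int i = 0; i < photos.length(); i++) {
            char c = photos.charAt(i);
            if (c == ';') {
                names.addElement(current);
                current = "";
            } else {
                current = current + c;
            }
        }
        // Fix 2: the final name has no terminating ';', so add it after the loop.
        if (!current.equals("")) {
            names.addElement(current);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(photoNames("7455573;photo41.png;photo42.png"));
    }
}
```

With this structure the separate single-photo branch from the question is no longer needed, since the loop plus the final check covers that case too.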
These two lines are the problem:
chainePhotos = csvClientPhotos.substring(csvClientPhotos.indexOf(';') + 1);
chainePhotos = chainePhotos.substring(0, chainePhotos.lastIndexOf(';'));
After the first one, chainePhotos contains "photo41.png;photo42.png", but the second one turns it into photo41.png, which triggers the if branch and ends the method with only one element in the vector.
EDITED: what a mess.
I ran it with correct input (as provided by the OP) and made a comment above.
I then fixed it as suggested above, while accidentally changing the input to 7455573;photo41.png;photo42.png; which worked, but is probably incorrect and doesn't match the explanation above input-wise.
I wish someone would un-answer this.
You can split the string manually. Since the fields are separated by the ; symbol, you can do it like this:
private void getIdClientAndPhotonames(String csvClientPhotos)
{
Vector vListPhotosOfClient = split(csvClientPhotos);
}
private Vector split(String original) {
Vector nodes = new Vector();
String separator = ";";
// Parse nodes into vector
int index = original.indexOf(separator);
while(index>=0) {
nodes.addElement( original.substring(0, index) );
original = original.substring(index+separator.length());
index = original.indexOf(separator);
}
// Get the last node
nodes.addElement( original );
return nodes;
}

Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about 50,000 records would take 12 hours.
Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of):
public static String lookup(Pattern tokenBefore,
List<String> tokensAfter)
{
String result = null;
while(_match(tokenBefore)) { // block until all input is read
if(id.hasNext())
{
result = id.next(); // capture the next token that matches
if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result
return result;
} else
return null; // end of file; no match
}
return null; // no matches
}
private static boolean _match(List<String> tokens)
{
return _match(tokens, true);
}
private static boolean _match(Pattern token)
{
if(token != null)
{
return (id.findWithinHorizon(token, 0) != null);
} else {
return false;
}
}
private static boolean _match(List<String> tokens, boolean block)
{
if(tokens != null && !tokens.isEmpty()) {
if(id.findWithinHorizon(tokens.get(0), 0) == null)
return false;
for(int i = 1; i <= tokens.size(); i++)
{
if (i == tokens.size()) { // matches all tokens
return true;
} else if(id.hasNext() && !id.next().matches(tokens.get(i))) {
break; // break to blocking behaviour
}
}
} else {
return true; // empty list always matches
}
if(block)
return _match(tokens); // loop until we find something or nothing
else
return false; // return after just one attempted match
}
private static boolean _matchImmediate(List<String> tokens)
{
if(tokens != null) {
for(int i = 0; i <= tokens.size(); i++)
{
if (i == tokens.size()) { // matches all tokens
return true;
} else if(!id.hasNext() || !id.next().matches(tokens.get(i))) {
return false; // doesn't match, or end of file
}
}
return false; // we have some serious problems if this ever gets called
} else {
return true; // empty list always matches
}
}
Basically, I'm wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.lang.String; I figured buffering it in memory would reduce I/O, since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, and the process still looks to be taking a LONG time.
I've also traced execution, and the slowness of my overall conversion process is definitely between the first and last line of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant.
Some notes (updated 6/6/2010):
I still need the pattern-matching behaviour for tokensBefore.
All ID numbers I need don't necessarily start at a fixed position in a line, but it's guaranteed that after the ID token is the name of the corresponding object.
I would ideally want to return a String, not the start position of the result as an int or something.
Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thank you!
Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well.
Generalised code:
String field;
for (/* each record in file A */)
{
/* construct the rest of this object from file A info */
// now to find the ID, if we can
List<String> objectName = new ArrayList<String>(1);
objectName.add(Pattern.quote(thisObject.fullName));
field = lookup(objectSearchToken, objectName); // search file B
if(field == null) // not found in file B
{
lookupReset(false); // initialise scanner to check file C
objectName.clear(); // not using the full name
String[] tokens = thisObject.fullName.split(id.delimiter().pattern());
for(String s : tokens)
objectName.add(Pattern.quote(s));
field = lookup(objectSearchToken, objectName); // search file C
lookupReset(true); // back to file B
} else {
/* found it, file B specific processing here */
}
if(field != null) // found it in B or C
thisObject.ID = field;
}
The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces (a person's name).
As per aioobe's answer, I have pre-compiled the regex for my constant search tokens, which in this case is just \r\n. The speedup noticed was about 20x in another one of the processes, where I compiled [0-9]{1,3}\\.[0-9]%|\r\n|0|[A-Z'-]+, although it was not noticed in the above code with \r\n. Working along these lines, it has me wondering:
Would it be better for me to match \r\n[^ ] if the only usable matches will be on lines beginning with a non-space character anyway? It may reduce the number of _match executions.
Another possible optimisation is this: concatenate all tokensAfter, and put a (.*) beforehand. It would reduce the number of regexes (all of which are literal anyway) that would be compiled by about 2/3, and also hopefully allow me to pull out the text from that grouping instead of keeping a "potential token" from every line with an ID on it. Is that also worth doing?
The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon.
Something to start with: Every single time you run id.next().matches(tokens.get(i)) the following code is executed:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
return m.matches();
Compiling a regular expression is non-trivial and you should consider compiling the patterns once and for all in your program:
pattern[i] = Pattern.compile(tokens.get(i));
And then simply invoke something like
pattern[i].matcher(str).matches()
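As a rough sketch of that suggestion, the token list can be compiled once up front and reused for every candidate. The class name PrecompiledTokens and the method matchesAll are hypothetical; the sketch only shows the compile-once pattern and is not wired into the Scanner logic from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledTokens {
    private final List<Pattern> patterns = new ArrayList<Pattern>();

    // Compile each regex exactly once, when the token list is created.
    PrecompiledTokens(List<String> regexes) {
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
    }

    // True if candidates.get(i) fully matches pattern i for every i.
    boolean matchesAll(List<String> candidates) {
        if (candidates.size() != patterns.size()) {
            return false;
        }
        for (int i = 0; i < patterns.size(); i++) {
            if (!patterns.get(i).matcher(candidates.get(i)).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

If the tokens are pure literals wrapped in Pattern.quote, as the question suggests, comparing with String.equals would avoid the regex engine altogether and is worth measuring as well.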
