Lexicographic sorting with a Trie

According to Wikipedia, regarding a trie:
Lexicographic sorting of a set of keys can be accomplished with a simple trie-based algorithm as follows:
Insert all keys in a trie.
Output all keys in the trie by means of pre-order traversal, which results in output that is in lexicographically increasing order.
However, this is my testing with my standard trie implementation:
Trie trie = new Trie();
trie.add("doll");
trie.add("ball");
trie.add("bat");
trie.add("dork");
trie.add("dorm");
trie.add("send");
trie.add("sense");
trie.add("sent");
Pre-order printout:
public List<String> allWords(){
List<String> words = new ArrayList<String>();
if(root == null){
return words;
}
StringBuilder prefix = new StringBuilder();
getAllWords(root.children,prefix,words);
return words;
}
// depth first search
private void getAllWords(List<TrieNode> children, StringBuilder prefix, List<String> words){
for(int i= 0; i<children.size(); i++){
TrieNode child = children.get(i);
prefix.append(child.data_);
if(child.isWord_){
words.add(prefix.toString());
}
// keep descending even below a word node, so longer words sharing this prefix are found too
getAllWords(child.children, prefix, words);
prefix.deleteCharAt(prefix.length()-1);
}
}
And the output order is: doll dork dorm ball bat send sense sent
What does 'lexicographic sorting' mean here? The output order seems to be related to the insertion order, not to lexicographic order. Am I getting something wrong?
Take this tree as one example, the pre-order printout would be "to tea ted ten a inn". Where is the lexicographic order?

A more precise statement about lexicographic sorting with tries is:
The pre-order of the nodes in a trie is the same as the lexicographical order of the strings they represent, assuming the children of a node are ordered by the edge labels.
Your trie appears to keep each node's children in insertion order, which is why the output follows insertion order. If you keep the children sorted by character in your code, the pre-order traversal should give you the strings in lexicographic order.
Here is the reference which has an example : http://www.cs.helsinki.fi/u/tpkarkka/opetus/12s/spa/lecture02.pdf
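The statement above can be sketched in Java as follows. This is a minimal illustration, not the asker's Trie: the class name SortedTrieSketch and the 26-slot child array (lowercase a-z words only) are assumptions made for the example. Because children are visited from 'a' to 'z', the pre-order traversal yields the words in lexicographic order regardless of insertion order.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a trie for lowercase a-z words whose children are indexed by letter,
// so visiting children in index order visits edge labels in sorted order.
class SortedTrieSketch {
    private static final class Node {
        final Node[] children = new Node[26]; // slot 0 = 'a', ..., slot 25 = 'z'
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node curr = root;
        for (char ch : word.toCharArray()) {
            int slot = ch - 'a'; // assumption: ch is in 'a'..'z'
            if (curr.children[slot] == null) {
                curr.children[slot] = new Node();
            }
            curr = curr.children[slot];
        }
        curr.isWord = true;
    }

    List<String> allWords() {
        List<String> words = new ArrayList<>();
        collect(root, new StringBuilder(), words);
        return words;
    }

    // Pre-order: report the node itself first, then its children from 'a' to 'z'.
    private void collect(Node node, StringBuilder prefix, List<String> words) {
        if (node.isWord) {
            words.add(prefix.toString());
        }
        for (int slot = 0; slot < 26; slot++) {
            Node child = node.children[slot];
            if (child != null) {
                prefix.append((char) ('a' + slot));
                collect(child, prefix, words);
                prefix.deleteCharAt(prefix.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        SortedTrieSketch trie = new SortedTrieSketch();
        String[] input = {"doll", "ball", "bat", "dork", "dorm", "send", "sense", "sent"};
        for (String word : input) {
            trie.add(word);
        }
        // prints [ball, bat, doll, dork, dorm, send, sense, sent]
        System.out.println(trie.allWords());
    }
}
```

With the same eight words as in the question, the output no longer depends on the order of the add() calls.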

In my view, this is better thought of as an in-order traversal. Traverse the trie so that all branches with keys less than the root's value are processed first, then the root's value is printed, and lastly all branches with greater keys are processed. This is the standard in-order traversal; the only point I want to make is that you have to write custom logic to process the branches with smaller keys first. In a regular binary tree that would just be root.left, but since a trie has no left/right, we have to fall back on the essence of left and right in a BST (whose in-order traversal also outputs lexicographically sorted values).

This can also be done with a TreeMap, which keeps each node's children sorted by character:
import java.util.Map;
import java.util.TreeMap;
class TrieNodeTreeMap{
Map<Character, TrieNodeTreeMap> children;
String key;
public TrieNodeTreeMap(){
key = null;
children = new TreeMap<>();
}
}
public class LexicographPrint {
public static TrieNodeTreeMap root;
public static void main(String[] args) {
root = new TrieNodeTreeMap();
//String s = "lexicographic sorting of a set";
String s = "lexicographic sorting of a set of keys can be accomplished with " +
"a simple trie based algorithm we insert all keys in a trie output " +
"all keys in the trie by means of preorder traversal which results " +
"in output that is in lexicographically increasing order preorder " +
"traversal is a kind of depth first traversal";
String dict[] = s.split(" ");
for (String word : dict){
insertion(root,word);
}
lexicographlySort(root);
}
private static void insertion(TrieNodeTreeMap root, String str){
TrieNodeTreeMap curr = root;
for (char ch : str.toCharArray()){
if (!curr.children.containsKey(ch)){
curr.children.put(ch, new TrieNodeTreeMap());
}
curr = curr.children.get(ch);
}
curr.key = str;
}
private static void lexicographlySort(TrieNodeTreeMap root){
TrieNodeTreeMap curr = root;
if (curr == null)
return;
for (Map.Entry<Character, TrieNodeTreeMap> entry : curr.children.entrySet()){
TrieNodeTreeMap tmp = entry.getValue();
if (tmp.key != null)
System.out.println(tmp.key);
lexicographlySort(entry.getValue());
}
}
}

Related

How to iterate through a list of nodes which might have sub-lists of nodes (unknown depth levels)

I have a list of nodes, and each node might have a list of subNodes (the number of levels are unknown):
class Node {
int score;
boolean selected;
List<Node> subNodes;
}
Here's what a hypothetical structure might look like:
NODE
+ NODE
+ NODE
+ NODE
+ NODE
+ NODE
+ NODE
+ NODE
+ NODE
+ NODE
+ NODE
+ NODE
Combinations are just countless. I need a way to sum NODE.score for all those nodes that have NODE.selected set to true, possibly using Java 8 features. Any hints would be really appreciated.

Something like:
public int recursiveTotal(final Node node) {
//node not selected, don't count the node or any of its subnodes
if (!node.selected) {
return 0;
}
//no subnodes, only node score counts
if (node.subNodes.isEmpty()) {
return node.score;
}
//node has subnodes, recursively count subnode score + parent node score
int totalScore = node.score;
for (final Node subNode : node.subNodes) {
totalScore += recursiveTotal(subNode);
}
return totalScore;
}
Coded using stackoverflow as an IDE, no guarantee against compilation errors ;)

Create a recursive method in your Node class which returns a stream of nodes concatenating a stream of the parent node and the sub nodes:
class Node {
int score;
boolean selected;
List<Node> subNodes;
public Stream<Node> streamNodes() {
return Stream.concat(Stream.of(this), subNodes.stream().flatMap(Node::streamNodes));
}
}
and use it like below to stream over your list:
List<Node> myNodes = //your list
int sum = myNodes.stream()
.flatMap(Node::streamNodes)
.filter(node -> node.selected) // the Node class above has fields, not getters
.mapToInt(node -> node.score)
.sum();

TL;DR
Judging by the structure you've provided, each Node in your List is the root of an N-ary tree data structure (I assume that there are no cycles).
To get the required data, we can use one of the classic tree-traversal algorithms. When the average depth is lower than the average width, depth-first search is more suitable because it is more space-efficient; in the opposite situation, breadth-first search is the better choice. I'll go with DFS.
A recursive implementation is the easiest to come up with, so I'll start with it. But recursion can fail with a StackOverflowError on deep trees in Java, hence we will proceed with a couple of improvements.
Streams + recursion
You can create a helper-method responsible for flattening the nodes which would be called from the stream.
List<Node> nodes = // initializing the list
long totalScore = nodes.stream()
.flatMap(node -> flatten(node).stream())
.filter(Node::isSelected)
.mapToLong(Node::getScore)
.sum();
Recursive auxiliary method:
public static List<Node> flatten(Node node) {
if (node.getSubNodes().isEmpty()) {
return List.of(node);
}
List<Node> result = new ArrayList<>();
result.add(node);
node.getSubNodes().forEach(n -> result.addAll(flatten(n)));
return result;
}
No recursion
To avoid a StackOverflowError, the method flatten() can be implemented without recursion by iteratively polling nodes from a stack (represented by an ArrayDeque) and pushing their sub-nodes onto it.
public static List<Node> flatten(Node node) {
List<Node> result = new ArrayList<>();
Deque<Node> stack = new ArrayDeque<>();
stack.add(node);
while (!stack.isEmpty()) {
Node current = stack.poll();
result.add(current);
current.getSubNodes().forEach(stack::push);
}
return result;
}
No recursion & No intermediate data allocation
Allocating an intermediate list of nodes, many of which will eventually not be used, is wasteful.
Instead, we can make the auxiliary method responsible for calculating the total score by summing the scores of the selected nodes in the tree.
For that, we need to check isSelected() while traversing the tree.
List<Node> nodes = // initializing the list
long totalScore = nodes.stream()
.mapToLong(node -> getScore(node))
.sum();
public static long getScore(Node node) {
long total = 0;
Deque<Node> stack = new ArrayDeque<>();
stack.push(node);
while (!stack.isEmpty()) {
Node current = stack.poll();
if (current.isSelected()) total += current.getScore();
current.getSubNodes().forEach(stack::push);
}
return total;
}
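The answer above goes with DFS. For comparison, here is a hedged sketch of the breadth-first variant it mentions, using a FIFO queue instead of a stack. The Node class is a hypothetical stand-in with the getters (isSelected(), getScore(), getSubNodes()) that the answer assumes; the names BfsScoreSketch, getScoreBfs and add are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class BfsScoreSketch {
    // Hypothetical stand-in for the question's Node, with the getters the answer assumes.
    static final class Node {
        private final int score;
        private final boolean selected;
        private final List<Node> subNodes = new ArrayList<>();

        Node(int score, boolean selected) {
            this.score = score;
            this.selected = selected;
        }

        Node add(Node child) { // small builder helper, not part of the original class
            subNodes.add(child);
            return this;
        }

        int getScore() { return score; }
        boolean isSelected() { return selected; }
        List<Node> getSubNodes() { return subNodes; }
    }

    // Breadth-first: nodes leave the queue in the order they entered it,
    // so the tree is visited level by level.
    static long getScoreBfs(Node root) {
        long total = 0;
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            if (current.isSelected()) {
                total += current.getScore();
            }
            queue.addAll(current.getSubNodes());
        }
        return total;
    }

    public static void main(String[] args) {
        Node root = new Node(1, true)
                .add(new Node(2, false).add(new Node(4, true)))
                .add(new Node(3, true));
        System.out.println(getScoreBfs(root)); // prints 8 (1 + 4 + 3)
    }
}
```

Note that, like the iterative DFS above, this counts a selected node even when its parent is not selected.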

Data structures get maximum value at each level of N-ary tree

Let's say I have an n-ary tree like the one below. I need to find the maximum value at each level and return it like this:
[8,7,32]
8
4 3 7
1 4 3 3 5 6 7 12 32 3 1
My Node will look something like below :
public class Node {
public int val;
public List<Node> children;
public Node() {
}
public Node(int _val,List<Node> _children) {
val=_val;
children=_children;
}
}
I tried to use recursion to get the elements at each level and find the maximum, but was unable to do so.

We can get the level-maximum by a level order traversal / Breadth-first search. The idea is that we have a list/queue of nodes on one level. For all nodes in this list the algorithm does two things:
It calculates the maximum value on this level.
It iterates over all nodes of the list/queue, gets all children of those nodes and puts them in a new list/queue, which it can then process in the next iteration.
The algorithm starts with a list/queue holding the root of the (sub)-tree and ends when the list/queue is empty.
This can be expressed nicely with Stream operations:
public static List<Integer> getMaxValuePerLevel(Node node) {
final ArrayList<Integer> maxPerLevel = new ArrayList<>();
maxPerLevel.add(node.getValue());
List<Node> children = node.getChildren();
while (!children.isEmpty()) {
maxPerLevel.add(children.stream()
.mapToInt(Node::getValue)
.max()
.getAsInt());
children = children.stream()
.map(Node::getChildren)
.flatMap(List::stream)
.collect(Collectors.toList());
}
return maxPerLevel;
}
This implementation has two nice properties:
It is iterative, not recursive, i.e. the algorithm is not subject to a StackOverflowError
It has linear time- and memory complexity
With a little bit of effort, we are even able to make the algorithm work with generic Node<T extends Comparable<T>>:
public static <T extends Comparable<T>> List<T> getMaxValuePerLevel(Node<T> node) {
final ArrayList<T> maxPerLevel = new ArrayList<>();
maxPerLevel.add(node.getValue());
List<Node<T>> children = node.getChildren();
while (!children.isEmpty()) {
final Node<T> defaultNode = children.get(0);
maxPerLevel.add(children.stream()
.map(Node::getValue)
.max(Comparator.naturalOrder())
.orElseGet(defaultNode::getValue));
children = children.stream()
.map(Node::getChildren)
.flatMap(List::stream)
.collect(Collectors.toList());
}
return maxPerLevel;
}

The root node is going to be the highest of its level. For the subsequent levels, call Collections.sort() (or any other comparison that will order your list) on the list of children nodes and take the last element (or whichever has the highest value according to the sorting method you used). Then iterate through the list of children nodes that you just sorted and for each node, apply the same treatment to its list of children.

A recursive solution is surprisingly simple. First create a list to hold the result. Then iterate through all the nodes: at each node you compare the node's value with the value in the list at the same level. If the node's value is greater, you replace the value in the list.
class Node {
public int val;
public List<Node> children;
public Node(int _val, List<Node> _children) {
val = _val;
children = _children;
}
public List<Integer> getMaxPerLevel() {
List<Integer> levels = new ArrayList<>();
getMaxPerLevel(0, levels);
return levels;
}
private void getMaxPerLevel(int level, List<Integer> levels) {
if (level >= levels.size()) {
levels.add(level, val);
} else {
levels.set(level, Math.max(val, levels.get(level)));
}
for (Node child : children) {
child.getMaxPerLevel(level + 1, levels);
}
}
}

Thanks, everyone. I solved it with the solution below:
public List<Integer> levelOrder(Node node){
List<Integer> result = new ArrayList<>();
Queue<Node> queue = new LinkedList<Node>();
queue.add(node);
while(!queue.isEmpty()) {
int size = queue.size();
List<Integer> currentLevel = new ArrayList<Integer>();
for(int i=0;i<size;i++) {
Node current = queue.remove();
currentLevel.add(current.val);
for(Integer inte:currentLevel) {
System.out.println(inte);
}
if(current.children !=null) {
for(Node node1:current.children)
queue.add(node1);
}
}
result.add(Collections.max(currentLevel));
}
return result;
}

How do I sort an ArrayList<String> that contains integers?

I made a Word Counter binary search tree that increments the count of a word when it is entered more than once. Both the word and word count are saved in the tree. I am attempting to print the highest count words first, and go down in descending count order.
I converted the BST to an ArrayList in order to do this, but now I cannot seem to figure out how to sort the list by decreasing count order. Here's what I have so far:
public ArrayList<String> toArray() {
ArrayList<String> result = new ArrayList<String>();
toArrayHelp(root, result);
Collections.sort(result);
return result;
}
private void toArrayHelp(Node<String, Integer> node, ArrayList<String> result) {
if (node == null) {
return;
}
toArrayHelp(node.left, result);
result.add("count: " + String.valueOf(node.count) + "\t word: " + node.data);
toArrayHelp(node.right, result);
}
I have tried Collections.sort(), but that isn't ordering the entries by count the way I want.

1. Traverse the tree, generating a List<Node<String, Integer>> from all elements.
2. Sort the List, ordering by the int part of the nodes.
3. Create a list retaining only the strings, in the same order.
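The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the Node class is a simplified, hypothetical stand-in for the question's Node<String, Integer> (fields data, count, left, right), and the names SortByCountSketch, collect and byDescendingCount are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortByCountSketch {
    // Hypothetical, simplified version of the question's Node<String, Integer>.
    static final class Node {
        final String data;
        final int count;
        Node left;
        Node right;

        Node(String data, int count) {
            this.data = data;
            this.count = count;
        }
    }

    // Step 1: in-order traversal that collects the nodes themselves, not formatted strings.
    static void collect(Node node, List<Node> out) {
        if (node == null) {
            return;
        }
        collect(node.left, out);
        out.add(node);
        collect(node.right, out);
    }

    static List<String> byDescendingCount(Node root) {
        List<Node> nodes = new ArrayList<>();
        collect(root, nodes);
        // Step 2: sort numerically by count, highest first.
        nodes.sort(Comparator.comparingInt((Node n) -> n.count).reversed());
        // Step 3: only now turn the nodes into display strings.
        return nodes.stream()
                .map(n -> "count: " + n.count + "\t word: " + n.data)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Node root = new Node("m", 2);
        root.left = new Node("b", 11);
        root.right = new Node("z", 1);
        // Lines appear in the order b (11), m (2), z (1).
        byDescendingCount(root).forEach(System.out::println);
    }
}
```

Because the counts are compared as ints, 11 sorts above 2, which a sort on the formatted strings would not guarantee.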

You are constructing the output string too soon: you need to sort the list first by using the count as a key, and afterwards print the results. You can make a simple wrapper that will contain the result:
public class WordCount implements Comparable<WordCount>{
private String word;
private Integer count;
//constructors, getters, setters etc..
@Override
public int compareTo(WordCount other) {
return Integer.compare(this.count, other.count);
}
}
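and construct a List<WordCount> list while you traverse the tree. After you are done, you just need to sort the list with Collections.sort(list) and print the results. Since compareTo orders by ascending count, use Collections.sort(list, Collections.reverseOrder()) to get the highest counts first.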

1. For DESC order, use Collections.sort(result, Collections.reverseOrder()); because the default sorting order is ASC.
2. Make sure that the string representations of the counts all have the same length. Otherwise, lexicographical order puts 11 before 2:
List<String> list = Arrays.asList("11", "1", "2");
Collections.sort(list, Collections.reverseOrder());
System.out.println(list); // output: [2, 11, 1]
But if the numbers have the same length, it works fine:
List<String> list = Arrays.asList("11", "01", "02");
Collections.sort(list, Collections.reverseOrder());
System.out.println(list); // output: [11, 02, 01]
You can find out how to add leading zeroes here: https://stackoverflow.com/a/275715/4671833.
It should be something like this: result.add("count: " + String.format("%02d", node.count) + "\t word: " + node.data);
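To make the padding idea concrete, here is a small sketch that builds the lines with zero-padded counts and sorts them in reverse lexicographic order. The class and method names (PaddedCountSketch, line, sortedDescending) and the width of 5 digits are assumptions for the example; the width has to cover the largest count that can occur.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PaddedCountSketch {
    // %05d pads the int with leading zeroes to 5 digits (assumed maximum width).
    static String line(int count, String word) {
        return "count: " + String.format("%05d", count) + "\t word: " + word;
    }

    // Sorts a copy in reverse lexicographic order; with equal-width counts this
    // is the same as descending numeric order.
    static List<String> sortedDescending(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add(line(2, "bat"));
        lines.add(line(11, "ball"));
        lines.add(line(1, "doll"));
        // Order after sorting: 00011 (ball), 00002 (bat), 00001 (doll).
        sortedDescending(lines).forEach(System.out::println);
    }
}
```

This keeps the asker's List<String> design, at the price of a fixed width; sorting the counts as numbers avoids that limitation.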

Two brief points: Let name selection and formatting be your friends! You'll want to make a habit of choosing simple and expressive variable names, and of keeping your code neatly formatted.
Let's start by putting this into clear steps:
(1) There is a source of word data, expressed as a tree of nodes. Avoiding too much detail, let's set the important details of the node type, and have the node tree available using a getter.
An important detail to mention is that the nodes are intended to be kept in a sorted binary tree that has distinct key values, and for which the value of any left node is strictly less than the value of the node, and the value of any right node is strictly greater than the value of the node. That has an important consequence which is that the values of the left sub-tree of a node are all strictly less than the value of the node, and the values of the right sub-tree are similarly all strictly greater than the value of the node.
public class Node<K, V> {
public K key;
public V value;
public Node<K, V> left;
public Node<K, V> right;
public Node(K key, V value) {
this.key = key;
this.value = value;
}
}
public Node<String, Integer> getRootNode() {
// Undetailed ...
}
(2) There are three main operations which are needed: An operation to collect the nodes of the tree into a list, an operation to sort this list, and an operation to display the sorted list:
public List<Node<String, Integer>> flatten(Node<String, Integer> rootNode) {
// Undetailed ...
}
public void sort(List<Node<String, Integer>> nodes) {
// Undetailed ...
}
public void print(List<Node<String, Integer>> nodes) {
// Undetailed ...
}
(3) This fits together, for example, as follows:
public void tester() {
Node<String, Integer> rootNode = getRootNode();
List<Node<String, Integer>> flatNodes = flatten(rootNode);
sort(flatNodes);
print(flatNodes);
}
(4) What remains is to detail the several methods. We begin with 'flatten'. That will be implemented as a recursive operation. And, since passing around the storage for the flat list is simpler, the method will be split into two parts, one which allocates storage, and another which does the recursive processing. This technique of passing along a storage collection is typical of this sort of processing.
'flatten' makes use of the ordering property of a node with respect to the node's left node and the node's right node: 'flatten' adds all values of the left sub-tree to the flat nodes list, followed by the node, followed by all values of the right sub-tree.
public List<Node<String, Integer>> flatten(Node<String, Integer> rootNode) {
List<Node<String, Integer>> flatNodes = new ArrayList<Node<String, Integer>>();
flatten(rootNode, flatNodes);
return flatNodes;
}
public void flatten(Node<String, Integer> node, List<Node<String, Integer>> flatNodes) {
if ( node == null ) {
return;
}
flatten(node.left, flatNodes);
flatNodes.add(node);
flatten(node.right, flatNodes);
}
(5) At a cost of clarity, this can be made somewhat more efficient by moving the null checks. For a fully balanced tree, this will avoid about half of the recursive calls (the ones made on null children), which is a pretty good reduction. This only matters if the number of nodes is high, and the JIT compiler may already optimize away much of this overhead anyway.
public List<Node<String, Integer>> flatten(Node<String, Integer> rootNode) {
List<Node<String, Integer>> flatNodes = new ArrayList<Node<String, Integer>>();
if ( rootNode != null ) {
flatten(rootNode, flatNodes);
}
return flatNodes;
}
public void flatten(Node<String, Integer> node, List<Node<String, Integer>> flatNodes) {
Node<String, Integer> leftNode = node.left;
if ( leftNode != null ) {
flatten(leftNode, flatNodes);
}
flatNodes.add(node);
Node<String, Integer> rightNode = node.right;
if ( rightNode != null ) {
flatten(rightNode, flatNodes);
}
}
(6) The next bit is sorting the flat nodes list. Two implementations are presented, a more modern one which uses lambdas, and an older style one which uses an explicit comparator. The comparisons are written to generate a list sorted from smallest to largest. To reverse the sort order, exchange the order of comparison.
public void sort(List<Node<String, Integer>> nodes) {
Collections.sort(
nodes,
((Node<String, Integer> n1, Node<String, Integer> n2) -> Integer.compare(n1.value, n2.value)) );
}
public static final Comparator<Node<String, Integer>> NODE_COMPARATOR =
new Comparator<Node<String, Integer>>() {
public int compare(Node<String, Integer> n1, Node<String, Integer> n2) {
return Integer.compare(n1.value, n2.value);
}
};
public void sort(List<Node<String, Integer>> nodes) {
Collections.sort(nodes, NODE_COMPARATOR);
}
(7) Printing of the resulting sorted list is left as an exercise.
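One possible way to fill in the printing step, together with the reversed comparison mentioned in (6), is sketched below. The names PrintSortedSketch, sortDescending and format are hypothetical, and the Node class is reduced to key and value because only the flattened list matters here.

```java
import java.util.ArrayList;
import java.util.List;

class PrintSortedSketch {
    // Same key/value shape as the Node<K, V> class in step (1); the left/right
    // links are omitted because only the flattened list is used here.
    static final class Node<K, V> {
        final K key;
        final V value;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Largest count first: the comparison operands are exchanged (n2 before n1).
    static void sortDescending(List<Node<String, Integer>> nodes) {
        nodes.sort((n1, n2) -> Integer.compare(n2.value, n1.value));
    }

    // One line per node: right-aligned count, then the word.
    static String format(List<Node<String, Integer>> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node<String, Integer> node : nodes) {
            sb.append(String.format("%5d  %s%n", node.value, node.key));
        }
        return sb.toString();
    }

    static void print(List<Node<String, Integer>> nodes) {
        System.out.print(format(nodes));
    }

    public static void main(String[] args) {
        List<Node<String, Integer>> nodes = new ArrayList<>();
        nodes.add(new Node<>("bat", 2));
        nodes.add(new Node<>("ball", 11));
        nodes.add(new Node<>("doll", 1));
        sortDescending(nodes);
        print(nodes); // ball (11) first, then bat (2), then doll (1)
    }
}
```

Building the text in format() and printing it in print() keeps the output easy to check without capturing System.out.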

Search by inorder in binary search tree

I use an in-order traversal to show the results of searching for a name stored in a binary search tree. But when I run it, for example with employees named "abc" and "ab" and the input name "abc", it shows both of them. Can anyone help me find my mistake? Thanks.
public void searchFull(String name) {
EmployeeSLLNode p = root;
n=0;
if (p != null) {
inorder(p.left);
if(p.info.getFullname().equals(name)) {
n++;
System.out.printf("%2s %-5s %-8s %-6s %-6s%n", n, p.info.getID(), p.info.getFullname(), p.info.getAge(), p.info.getGender());
}
inorder(p.right);
}
}

In-order traversal is equivalent to iterating a TreeMap's entrySet.
final Map<String, Employee> employees = new TreeMap<String, Employee>();
...
for (final Map.Entry<String, Employee> entry : employees.entrySet()) {
/* iterating in-order */
}
TreeMap simply uses a binary search tree (in particular, according to the specification, a red-black tree). Consider using it instead of rolling your own solution ;-)
That being said, if you're intent on rolling your own, maybe try something like this...
public EmployeeSLLNode search(final EmployeeSLLNode root, final String name) {
EmployeeSLLNode left;
return root == null
? null
: (left = search(root.left, name)) == null
? root.info.getFullname().equals(name)
? root
: search(root.right, name)
: left;
}

I think this is what you can do, but make sure that your tree doesn't contain duplicate names.
public void searchFull(EmployeeSLLNode p, String name) {
if (p == null)
return;
searchFull(p.left, name);
if (p.info.getFullname().equals(name)) {
// This is the node; do other stuff here
return;
}
searchFull(p.right, name);
}
Also, it would be better to do a regular BST search instead of searching via an in-order traversal, which defeats the whole purpose of a BST. Compare the input name with the node's name using the compareTo() method of the String class and, depending on whether the name is alphabetically earlier or later, move either left or right.
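A minimal sketch of such a compareTo()-driven search is shown below. It assumes the tree is ordered by full name, which the question does not state explicitly; the Node class is a simplified, hypothetical stand-in for EmployeeSLLNode, and BstSearchSketch, insert and search are names invented for the example.

```java
class BstSearchSketch {
    // Hypothetical, simplified node keyed by full name; the question's
    // EmployeeSLLNode keeps an Employee object in its `info` field instead.
    static final class Node {
        final String fullname;
        Node left;
        Node right;

        Node(String fullname) {
            this.fullname = fullname;
        }
    }

    // Standard BST insert ordered by String.compareTo; duplicates are ignored.
    static Node insert(Node root, String name) {
        if (root == null) {
            return new Node(name);
        }
        int cmp = name.compareTo(root.fullname);
        if (cmp < 0) {
            root.left = insert(root.left, name);
        } else if (cmp > 0) {
            root.right = insert(root.right, name);
        }
        return root;
    }

    // Follows a single path from the root: left for smaller names, right for larger.
    static Node search(Node root, String name) {
        Node p = root;
        while (p != null) {
            int cmp = name.compareTo(p.fullname);
            if (cmp == 0) {
                return p;
            }
            p = (cmp < 0) ? p.left : p.right;
        }
        return null; // not found
    }

    public static void main(String[] args) {
        Node root = null;
        for (String name : new String[] {"abc", "ab", "abd"}) {
            root = insert(root, name);
        }
        System.out.println(search(root, "abc").fullname); // prints abc
        System.out.println(search(root, "a"));            // prints null
    }
}
```

Because the search uses exact comparison, looking up "abc" returns only that node and never "ab".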

Most of this code should be inside the inorder() method. Undoubtedly it actually is, so you have two prints, so you get two outputs. All the searchFull() method should do is call inorder(root).

Using a generic Pair class and a Splaytree to count and store words and their frequencies in Java

I'm implementing a splaytree to hold words and their frequencies and chose to create a Pair class that would hold each word-frequency (key-value) pair. That is, each node of the splaytree holds a pair of the Pair class. The Pair class looks like this:
public class SplayEntry<K, V> implements Comparable<SplayEntry<K, V>>{
public K word;
public V frequency;
public SplayEntry(K word, V frequency) {
this.word = word;
this.frequency = frequency;
}
// getters, setters, hashCode, equals, compareTo etc...
}
The Splaytree:
public class SplayTree<AnyType extends Comparable<? super AnyType>> {
public SplayTree( )
{
nullNode = new BinaryNode<AnyType>( null );
nullNode.left = nullNode.right = nullNode;
root = nullNode;
}
It also has a BinaryNode class.
What I'm having trouble with is how to put every word-frequency pair into the tree while also checking whether the pair already exists and, if so, increasing the frequency by one. I read in a text file line by line, split each line into words, and then call a countWords() method that right now is a mess:
public void countWords(String line) {
line = line.toLowerCase();
String[] words = line.split("\\P{L}+");
SplayEntry<String, Integer> entry = new SplayEntry<String, Integer>(null, null);
for (int i = 0, n = words.length; i < n; i++) {
Integer occurances = 0;
entry.setWord(words[i]);
entry.setFrequency(occurances);
if (tree.contains(entry.equals(entry)) && entry.getFrequency() == 0) {
occurances = 1;
} else {
int value = occurances.intValue();
occurances = new Integer(value + 1);
entry.setFrequency(occurances);
}
entry = new SplayEntry<String, Integer>(words[i], occurances);
tree.insert(entry);
}
}
I know this isn't really working, and I need help figuring out how I should instantiate the SplayEntry class, and in what order. I also want the method to check, for every word in the words array, whether it already exists in a SplayEntry inside the tree (contains). If the word is new, the frequency should be 1; otherwise the frequency should be incremented by 1. Finally, I just add the new SplayEntry to the splay tree and let the tree put it in an appropriate node.
Right now I've just confused myself by working on the same piece of code for way too many hours than should be necessary, I would very much appreciate some pointers that can lead me in the right direction!
Please tell me if I've not made myself clear.

I suggest using a standard implementation of a splay tree, i.e. without the counters, and having a separate HashMap for frequencies. This does not sacrifice complexity, since operations on a splay tree are amortized O(log n), while operations on a HashMap are O(1) on average. To preserve encapsulation and invariants, you can put both within a larger class that exposes the required operations.
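A rough sketch of such a wrapper class is shown below. Since the asker's SplayTree is not fully shown, a java.util.TreeSet stands in for it here; that substitution, along with the names WordFrequencySketch, frequencyOf and sortedWords, is an assumption made for illustration. The split pattern is the one from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class WordFrequencySketch {
    // Stand-in for the splay tree: holds each distinct word once, in sorted order.
    private final TreeSet<String> words = new TreeSet<>();
    // Frequencies live in a separate map, as the answer suggests.
    private final Map<String, Integer> frequencies = new HashMap<>();

    void countWords(String line) {
        for (String word : line.toLowerCase().split("\\P{L}+")) {
            if (word.isEmpty()) {
                continue; // split yields a leading empty string if the line starts with a non-letter
            }
            words.add(word);                          // no effect if already present
            frequencies.merge(word, 1, Integer::sum); // 1 for a new word, otherwise old + 1
        }
    }

    int frequencyOf(String word) {
        return frequencies.getOrDefault(word, 0);
    }

    Iterable<String> sortedWords() {
        return words;
    }

    public static void main(String[] args) {
        WordFrequencySketch counter = new WordFrequencySketch();
        counter.countWords("The cat and the hat");
        for (String word : counter.sortedWords()) {
            System.out.println(word + " " + counter.frequencyOf(word));
        }
        // prints: and 1, cat 1, hat 1, the 2 (one pair per line)
    }
}
```

Keeping both structures behind one class means callers cannot update the tree without also updating the counts.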
