Why does saving the char '�' to a file save it as '?'?


I learned about Huffman coding and tried to apply it. I made a very basic text reader that can only open and save files, and wrote a decorator that compresses the text with Huffman coding before saving.
There was a bug that I couldn't find, and after a lot of debugging I figured out that the compressed text may contain the character �. For example, the text ',-.:BCINSabcdefghiklmnoprstuvwy gets compressed to 앐낧淧翵�ဌ䤺큕㈀.
I figured out that the bug lies in the saving function. When I save the compressed text, every occurrence of � is changed to ?. For example, when saving 앐낧淧翵�ဌ䤺큕㈀, I get 앐낧淧翵?ဌ䤺큕㈀.
When I read the saved file back to decompress it, I get a different string, so the decompression fails.
What makes it more difficult is that the saving function works fine on its own, but not when it is used in my code. The function looks like this:
public void save() throws IOException {
FileWriter fileWriter = new FileWriter(this.filename);
fileWriter.write(this.text);
fileWriter.close();
}
It's confusing that this.text at the moment of saving is 앐낧淧翵�ဌ䤺큕㈀, yet it is saved as 앐낧淧翵?ဌ䤺큕㈀.
As I said before, the function works fine on its own but not in my code. I couldn't do anything more than remove as much as possible from my code and put it here. If you put a breakpoint in FileEditor::save, you'll find that this.text at the moment of saving is 앐낧淧翵�ဌ䤺큕㈀ while the content of the file is 앐낧淧翵?ဌ䤺큕㈀.
Code:
FileEditor is right below Main.
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.PriorityQueue;
import java.util.TreeMap;
import static pack.BitsManipulator.CHAR_SIZE_IN_BITS;
public class Main {
public static void main(String[] args) throws IOException {
String text = " ',-.:BCINSabcdefghiklmnoprstuvwy";
FileEditor fileEditor2 = new FileEditor("file.txt");
HuffmanDecorator compressor = new HuffmanDecorator(fileEditor2);
compressor.setText(text);
System.out.println(compressor.getText());
compressor.save();
}
}
class FileEditor implements BasicFileEditor {
private String filename;
private String text;
public FileEditor(String filename) throws IOException {
this.filename = filename;
File file = new File(filename);
StringBuilder builder = new StringBuilder();
if (!file.createNewFile()) {
FileReader reader = new FileReader(file);
int ch;
while ((ch = reader.read()) != -1)
builder.append((char) ch);
}
this.text = builder.toString();
}
@Override
public String getText() {
return text;
}
@Override
public void setText(String text) {
this.text = text;
}
@Override
public void save() throws IOException {
FileWriter fileWriter = new FileWriter(this.filename);
fileWriter.write(this.text);
fileWriter.close();
}
}
interface BasicFileEditor {
String getText();
void setText(String text);
void save() throws IOException;
}
abstract class FileEditorDecorator implements BasicFileEditor {
FileEditor fileEditor;
public FileEditorDecorator(FileEditor fileEditor) {
this.fileEditor = fileEditor;
}
@Override
public String getText() {
return fileEditor.getText();
}
@Override
public void setText(String text) {
fileEditor.setText(text);
}
@Override
public void save() throws IOException {
String oldText = getText();
setText(getModifiedText());
fileEditor.save();
setText(oldText);
}
protected abstract String getModifiedText();
}
class HuffmanDecorator extends FileEditorDecorator {
public HuffmanDecorator(FileEditor fileEditor) {
super(fileEditor);
}
@Override
protected String getModifiedText() {
HuffmanCodingCompressor compressor = new HuffmanCodingCompressor(getText());
return compressor.getCompressedText();
}
}
class HuffmanCodingCompressor {
String text;
public HuffmanCodingCompressor(String text) {
this.text = text;
}
public String getCompressedText() {
EncodingBuilder builder = new EncodingBuilder(text);
return builder.getCompressedText();
}
}
class Node implements Comparable<Node> {
public Node left;
public Node right;
public int value;
public Character character;
public Node(Node left, Node right, int value) {
this(left, right, value, null);
}
public Node(Node left, Node right, int value, Character character) {
this.left = left;
this.right = right;
this.character = character;
this.value = value;
}
@Override
public int compareTo(Node o) {
return this.value - o.value;
}
public boolean isLeafNode() {
return left == null && right == null;
}
Node getLeft() {
if (left == null)
left = new Node(null, null, 0);
return left;
}
Node getRight() {
if (right == null)
right = new Node(null, null, 0);
return right;
}
}
class EncodingBuilder {
private String text;
private Node encodingTree;
private TreeMap<Character, String> encodingTable;
public EncodingBuilder(String text) {
this.text = text;
buildEncodingTree();
buildEncodingTableFromTree(encodingTree);
}
private void buildEncodingTableFromTree(Node encodingTree) {
encodingTable = new TreeMap<>();
buildEncodingTableFromTreeHelper(encodingTree, new StringBuilder());
}
public void buildEncodingTableFromTreeHelper(Node root, StringBuilder key) {
if (root == null)
return;
if (root.isLeafNode()) {
encodingTable.put(root.character, key.toString());
} else {
key.append('0');
buildEncodingTableFromTreeHelper(root.left, key);
key.deleteCharAt(key.length() - 1);
key.append('1');
buildEncodingTableFromTreeHelper(root.right, key);
key.deleteCharAt(key.length() - 1);
}
}
public void buildEncodingTree() {
TreeMap<Character, Integer> freqArray = new TreeMap<>();
for (int i = 0; i < text.length(); i++) {
// improve here.
char c = text.charAt(i);
if (freqArray.containsKey(c)) {
Integer freq = freqArray.get(c) + 1;
freqArray.put(c, freq);
} else {
freqArray.put(c, 1);
}
}
PriorityQueue<Node> queue = new PriorityQueue<>();
for (Character c : freqArray.keySet())
queue.add(new Node(null, null, freqArray.get(c), c));
if (queue.size() == 1)
queue.add(new Node(null, null, 0, '\0'));
while (queue.size() > 1) {
Node n1 = queue.poll();
Node n2 = queue.poll();
queue.add(new Node(n1, n2, n1.value + n2.value));
}
encodingTree = queue.poll();
}
public String getCompressedTextInBits() {
StringBuilder bits = new StringBuilder();
for (int i = 0; i < text.length(); i++)
bits.append(encodingTable.get(text.charAt(i)));
return bits.toString();
}
public String getCompressedText() {
String compressedInBits = getCompressedTextInBits();
int remainder = compressedInBits.length() % CHAR_SIZE_IN_BITS;
int paddingNeededToBeDivisibleByCharSize = CHAR_SIZE_IN_BITS - remainder;
String compressed = BitsManipulator.convertBitsToText(compressedInBits + "0".repeat(paddingNeededToBeDivisibleByCharSize));
return compressed;
}
}
class BitsManipulator {
public static final int CHAR_SIZE_IN_BITS = 16;
public static int bitsInStringToInt(String bits) {
int result = 0;
for (int i = 0; i < bits.length(); i++) {
result *= 2;
result += bits.charAt(i) - '0';
}
return result;
}
public static String convertBitsToText(String bits) {
if (bits.length() % CHAR_SIZE_IN_BITS != 0)
throw new NumberOfBitsNotDivisibleBySizeOfCharException();
StringBuilder result = new StringBuilder();
for (int i = 0; i < bits.length(); i += CHAR_SIZE_IN_BITS)
result.append(asciiInBitsToChar(bits.substring(i, i + CHAR_SIZE_IN_BITS)));
return result.toString();
}
public static char asciiInBitsToChar(String bits) {
return (char) bitsInStringToInt(bits);
}
public static class NumberOfBitsNotDivisibleBySizeOfCharException extends RuntimeException {
}
}

� is the Unicode replacement character, U+FFFD. If you encode it in a non-Unicode encoding, it gets converted to a regular question mark: non-Unicode encodings can't represent every Unicode character, so anything that can't be encoded is replaced with a question mark as a "safety" measure.
You seem to be confused about the difference between binary data and text data, which leads you to look at compressed data as if it were Korean text instead of binary data. You need to store (and inspect) the data as bytes, not as chars or Strings.
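A minimal sketch of what storing the data as bytes could look like. The class name BitPacker and its methods are hypothetical and not part of the question's code. They take the '0'/'1' string that getCompressedTextInBits() produces and pack it into a byte[], which is written with Files.write so that no character encoding is involved. The sample bit string is chosen so that its first 16 bits are 0xD800, a value that is not a valid character on its own and that a text encoder may therefore replace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BitPacker {
    // Packs a string of '0'/'1' characters into bytes, most significant bit first.
    // The last byte is padded with zero bits if the length is not a multiple of 8.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= (byte) (0x80 >> (i % 8));
            }
        }
        return out;
    }

    // Inverse of pack: returns the first bitCount bits as '0'/'1' characters.
    static String unpack(byte[] data, int bitCount) {
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            boolean set = (data[i / 8] & (0x80 >> (i % 8))) != 0;
            sb.append(set ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String bits = "1101100000000000101";
        Path file = Files.createTempFile("huffman", ".bin");
        Files.write(file, pack(bits));              // raw bytes, no charset involved
        byte[] read = Files.readAllBytes(file);
        System.out.println(unpack(read, bits.length()).equals(bits));
        Files.delete(file);
    }
}
```

To decompress, the number of meaningful bits has to be stored as well (for example in a small header), because the padding in the last byte is otherwise indistinguishable from data.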

Related

Java: Multi-line File Read Iterator

I am trying to override the next() and nextLine() methods in the LineIterator class (org.apache.commons.io). Basically I want to specify the maximum number of lines to read from the text file for each invocation (default for the base class is of course 1).
Here is the derived class that I have come up with. Unfortunately it throws a StackOverflowError exception.
import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import org.apache.commons.io.LineIterator;
public class MultiLineIterator extends LineIterator{
int maxLines = 1;
public static void main(String[] args) throws Exception {
File file = new File ("/path/to/inputfile.txt");
LineIterator iterator = new MultiLineIterator(new FileReader(file), 3);
while(iterator.hasNext()) {
System.out.println(iterator.next());
}
}
public MultiLineIterator(Reader reader, int maxLines) {
super(reader);
this.maxLines = maxLines;
}
@Override
public String next() {
String retVal = null;
if(hasNext()) {
retVal = "";
}
String nextFragment = "";
for(int i = 1; i <= maxLines; i++) {
if(hasNext()) {
nextFragment = super.next();
retVal += (nextFragment + " ");
}
else
break;
}
return retVal;
}
@Override
public String nextLine() {
return next();
}
}

To fix the StackOverflowError, remove:
@Override
public String nextLine() {
return next();
}
because it causes infinite recursion:
this:next() -> super:next() -> this:nextLine() -> this:next() -> ... and so on
I would also suggest not overriding next() in this way, because it leads to inconsistent results.
I would propose a simple counter instead: increment it in next() and check it in hasNext():
public class MultiLineIterator extends LineIterator {
private int maxLines = 1;
private int cursor = 0;
public static void main(String[] args) throws Exception {
File file = new File("/path/inputfile.txt");
LineIterator iterator = new MultiLineIterator(new FileReader(file), 3);
while (iterator.hasNext()) {
System.out.println(iterator.next());
}
}
public MultiLineIterator(Reader reader, int maxLines) {
super(reader);
this.maxLines = maxLines;
}
@Override
public boolean hasNext() {
return (cursor < maxLines) && super.hasNext();
}
@Override
public String next() {
String next = super.next();
cursor++;
return next;
}
}
This implementation does not concatenate lines; it only limits how many lines are returned in total.
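If the goal is still to get up to maxLines lines per call, one option is to wrap an iterator instead of extending LineIterator, which avoids the next()/nextLine() recursion entirely. The following is a sketch under that assumption. BatchingLineIterator is a hypothetical name, and it accepts any Iterator<String>, so a LineIterator could be passed in as long as it implements that interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class BatchingLineIterator implements Iterator<String> {
    private final Iterator<String> lines;
    private final int maxLines;

    BatchingLineIterator(Iterator<String> lines, int maxLines) {
        if (maxLines < 1) {
            throw new IllegalArgumentException("maxLines must be >= 1");
        }
        this.lines = lines;
        this.maxLines = maxLines;
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    // Returns up to maxLines lines from the wrapped iterator, joined by single spaces.
    @Override
    public String next() {
        if (!lines.hasNext()) {
            throw new NoSuchElementException();
        }
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxLines && lines.hasNext()) {
            batch.add(lines.next());
        }
        return String.join(" ", batch);
    }

    public static void main(String[] args) {
        Iterator<String> lines = Arrays.asList("one", "two", "three", "four").iterator();
        Iterator<String> batches = new BatchingLineIterator(lines, 3);
        while (batches.hasNext()) {
            System.out.println(batches.next());
        }
    }
}
```

With the four sample lines and maxLines = 3, the first call returns "one two three" and the second returns "four".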

Programming a BST for Morse code decryption

I'm working on a project in which I'm supposed to develop a set of classes to implement a BST that allows me to code a word in Morse code and translate Morse code to alphanumeric format.
I've been provided with the following:
BST generic implementation -> https://pastebin.com/mGLY2V25
This class implements the following interface:
public interface BSTInterface<E> {
boolean isEmpty();
void insert(E element);
void remove(E element);
int size();
int height();
E smallestElement();
Iterable<E> inOrder();
Iterable<E> preOrder();
Iterable<E> posOrder();
Map<Integer, List<E>> nodesByLevel();
}
And so far I came up with the following:
class MorseTree extends BST<MorseNode> {
public MorseTree() {
init();
}
private void init() {
BufferedReader in;
InputStream inputStream = getClass().getResourceAsStream("/morse_V3.txt");
in = new BufferedReader(new InputStreamReader(inputStream));
insert(new MorseNode(new String[]{"start"}));
MorseNode last = root.getElement();
in.lines().forEach(line -> {
String[] temp = line.trim().replaceAll(" +", " ").split(" ");
MorseNode w = new MorseNode(temp);
for (int i = 0; i < temp[0].length(); i++) {
}
});
}
@Override
public void insert(MorseNode element) {
root = insert(element, root);
}
private Node<MorseNode> insert(MorseNode element, Node<MorseNode> node) {
if (node == null) {
return new Node<>(element, null, null);
}
}
}
public class MorseNode implements Comparable<MorseNode> {
private String morse;
private Character letter;
private String tipo;
public MorseNode(String[] word) {
this.morse = word[0];
this.letter = word[1].charAt(0);
this.tipo = word[2];
}
@Override
public int compareTo(MorseNode o) {
if (letter == o.letter) {
return 0;
}
}
@Override
public String toString() {
return Character.toString(letter);
}
}
(Pastebin)
To build the tree I'm supposed to read a csv/txt file and insert its entries. From what I've gathered, the idea is that I should implement compareTo in the MorseNode class in such a way that the provided insert method places each node automatically.
That's my problem. For two weeks now I've been trying, and I just can't figure out how to compare two Morse strings.
Any help or suggestions are appreciated!

Suffix array & Binary Search

I have been following a tutorial I found. It is however in C++ and I'm using Java so there might have been a few things lost in translation. I've tried both googling and searching here and while there seem to be plenty of asked questions I still remain stuck. Though it feels like I'm very close.
According to the tutorial, there should be a match for the pattern 'nan', but there is no match when I run it. What am I missing? Sorry that the code lost its formatting when pasted.
package u1;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Arrays;
import java.util.Scanner;
public class SuffixSort {
public Element[] processPattern(String pattern) {
Element[] patternArray = new Element[pattern.length()];
for (int i = 0; i < pattern.length(); i++) {
patternArray[i] = new Element(i, pattern.substring(i, pattern.length()));
}
Arrays.sort(patternArray);
return patternArray;
}
public void binarySearch(String text, String pattern, Element[] array) {
int left = 0, right = text.length() - 1;
int mid = 0, result;
while (left <= right) {
mid = left + (right - left) / 2;
result = pattern.compareTo(array[mid].getSuffix());
if (result == 0) {
System.out.println("Match: " + array[mid].getIndex());
return;
} else if (result < 0) {
right = mid - 1;
} else {
left = mid + 1;
}
}
}
public static void main(String[] args) {
try {
String text = "banana";
String pattern = "nan";
SuffixSort ss = new SuffixSort();
Scanner in = new Scanner(new FileReader("src/resources/100k.txt"));
/*
* while (in.hasNextLine()) { text += in.nextLine(); }
*/
Element[] suffixArray = ss.processPattern(text);
double runtime = System.nanoTime();
ss.binarySearch(text, pattern, suffixArray);
runtime = (System.nanoTime() - runtime) / 1000000;
in.close();
System.out.println(runtime);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
Other class
package u1;
public class Element implements Comparable<Element>{
private int index;
private String suffix;
public Element(int index, String suffix){
this.index = index;
this.suffix = suffix;
}
@Override
public int compareTo(Element o) {
return this.getSuffix().compareTo(o.getSuffix());
}
public int getIndex() {
return index;
}
public String getSuffix() {
return suffix;
}
public void setSuffix(String suffix) {
this.suffix = suffix;
}
}

I'm getting a Out Of Memory Error: Java heap space Exception

I am trying to take in a text file and read each word in the file into a binary tree. The specific error I get is:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
The text file I am reading into the project was given to me by the professor for the assignment, so I know it should not be running into any memory problems. I have never dealt with this type of exception before and don't know where to start. Please help. Here is my code:
public class Tester {
public static void main(String[] args) throws FileNotFoundException {
Tester run = new Tester();
run.it();
}
public void it() throws FileNotFoundException {
BTree theTree = new BTree();
String str = this.readInFile();
String [] firstWords = this.breakIntoWords(str);
String [] finalWords = this.removeNullValues(firstWords);
for(int i = 0; i < finalWords.length; i++) {
theTree.add(finalWords[i]);
}
theTree.print();
}
public String readInFile() throws FileNotFoundException {
String myFile = "";
int numWords = 0;
Scanner myScan = new Scanner(new File("Dracula.txt"));
while(myScan.hasNext() == true) {
myFile += myScan.nextLine() + " ";
}
return myFile;
}
public String [] breakIntoWords(String myFile) {
String[] words = new String[myFile.length()];
String nextWord = "";
int position = 0;
int i = 0;
while(myFile.length() > position) {
char next = myFile.charAt(position);
next = Character.toLowerCase(next);
// First trim beginning
while (((next < 'a') || (next > 'z')) && !Character.isDigit(next)) {
position++;
next = myFile.charAt(position);
next = Character.toLowerCase(next);
}
// Now pull only letters or numbers until we hit a space
while(!Character.isWhitespace(next)) {
if (Character.isLetterOrDigit(next)) {
nextWord += myFile.charAt(position);
}
position++;
next = myFile.charAt(position);
}
words [i] = nextWord;
i++;
}
return words;
}
public String[] removeNullValues(String[] myWords) {
String[] justMyWords = new String[myWords.length];
for (int i = 0; i < myWords.length; i++) {
if (myWords[i] != null) {
justMyWords[i] = myWords[i];
}
}
return justMyWords;
}
}
Here's my B-tree class:
public class BTree {
private BTNode root;
private int nodeCount;
public boolean add(String word) {
BTNode myNode = new BTNode(word);
if(root == null) {
root = myNode;
nodeCount++;
return true;
}
if(findNode(word)) {
int tmp = myNode.getNumInstance();
tmp++;
myNode.setNumInstance(tmp);
return false;
}
BTNode temp = root;
while(temp != null) {
if(word.compareTo(temp.getMyWord()) < 0) {
if(temp.getRightChild() == null) {
temp.setLeftChild(myNode);
nodeCount++;
return true;
} else {
temp = temp.getRightChild();
}
} else {
if(temp.getLeftChild() == null) {
temp.setLeftChild(myNode);
nodeCount++;
return true;
} else {
temp = temp.getLeftChild();
}
}
}
return false;
}
public boolean findNode(String word) {
return mySearch(root, word);
}
private boolean mySearch(BTNode root, String word) {
if (root == null) {
return false;
}
if ((root.getMyWord().compareTo(word) < 0)) {
return true;
} else {
if (word.compareTo(root.getMyWord()) > 0) {
return mySearch(root.getLeftChild(), word);
} else {
return mySearch(root.getRightChild(), word);
}
}
}
public void print() {
printTree(root);
}
private void printTree(BTNode root) {
if (root == null) {
System.out.print(".");
return;
}
printTree(root.getLeftChild());
System.out.print(root.getMyWord());
printTree(root.getRightChild());
}
public int wordCount() {
return nodeCount;
}
}
And my B-tree node class:
public class BTNode {
private BTNode rightChild;
private BTNode leftChild;
private String myWord;
private int numWords;
private int numInstance;
private boolean uniqueWord;
private boolean isRoot;
private boolean isDeepest;
public BTNode(String myWord){
this.numInstance = 1;
this.myWord = myWord;
this.rightChild = null;
this.leftChild = null;
}
public String getMyWord() {
return myWord;
}
public void setMyWord(String myWord) {
this.myWord = myWord;
}
public BTNode getRightChild() {
return rightChild;
}
public void setRightChild(BTNode rightChild) {
this.rightChild = rightChild;
}
public BTNode getLeftChild() {
return leftChild;
}
public void setLeftChild(BTNode leftChild) {
this.leftChild = leftChild;
}
public int getnumWords() {
return numWords;
}
public void setnumWords(int numWords) {
this.numWords = numWords;
}
public boolean isUniqueWord() {
return uniqueWord;
}
public void setUniqueWord(boolean uniqueWord) {
this.uniqueWord = uniqueWord;
}
public boolean isRoot() {
return isRoot;
}
public void setRoot(boolean isRoot) {
this.isRoot = isRoot;
}
public boolean isDeepest() {
return isDeepest;
}
public void setDeepest(boolean isDeepest) {
this.isDeepest = isDeepest;
}
public int getNumInstance() {
return numInstance;
}
public void setNumInstance(int numInstance) {
this.numInstance = numInstance;
}
}

This small file should not be the reason for the OutOfMemoryError.
Performance
This is not an error, but if you want to read a whole file into memory, don't read it line by line and concatenate the strings. That slows down your program.
You can use:
String myFile = new String(Files.readAllBytes(Paths.get("Dracula.txt")));
myFile = myFile.replaceAll("\r\n", " ");
return myFile;
That is also not super fast, but it is faster.
Now the errors
The words array is too large
public String[] breakIntoWords(String myFile) {
String[] words = new String[myFile.length()];
You define words as an array whose length is the length of the file in characters. That is much too large if, as the name suggests, you only need an array whose length is the number of words in the file.
nextWord is never reset (the cause of the OutOfMemoryError)
// Now pull only letters or numbers until we hit a space
while (!Character.isWhitespace(next)) {
if (Character.isLetterOrDigit(next)) {
nextWord += myFile.charAt(position);
}
position++;
next = myFile.charAt(position);
}
words[i] = nextWord;
i++;
nextWord is never set back to "" after it is assigned to words[i], so it grows word by word and the array contents look like this:
words[0] = "Word1"
words[1] = "Word1Word2"
words[2] = "Word1Word2Word3"
As you can imagine, that uses a very large amount of memory.
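A sketch of how both problems could be avoided, assuming a List is acceptable in place of the fixed-size array. WordSplitter is a hypothetical name. Unlike the original, it stores the words in lower case, and it clears the buffer after every word so that each entry holds exactly one word.

```java
import java.util.ArrayList;
import java.util.List;

class WordSplitter {
    // Splits text on whitespace, keeping only letters and digits of each word.
    static List<String> breakIntoWords(String text) {
        List<String> words = new ArrayList<>();   // grows to the number of words, not characters
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toLowerCase(text.charAt(i));
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (Character.isWhitespace(c) && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);             // reset the buffer for the next word
            }
            // any other character (punctuation) is skipped
        }
        if (current.length() > 0) {
            words.add(current.toString());        // last word if the text does not end in whitespace
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(breakIntoWords("The  quick, brown fox."));
    }
}
```

Because the list only ever contains complete words, the separate removeNullValues step is no longer needed.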

When you build the tree, you insert nodes on the wrong side: in the first branch you call setLeftChild() where you should insert the element to the right.
You should change this code in the BTree class as follows:
while(temp != null) {
if(word.compareTo(temp.getMyWord()) < 0) {
if(temp.getRightChild() == null) {
temp.setRightChild(myNode); // <-- You were using setLeftChild()
nodeCount++;
return true;
} else {
temp = temp.getRightChild();
}
....
}
You are probably creating a huge tree with all the elements on the left side, which leads to the OutOfMemoryError.
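For comparison, here is a sketch of an insert that uses one comparison per step and keeps the sides consistent. WordTree and its members are hypothetical names. It follows the common convention of smaller words to the left and larger to the right, which is the mirror image of the snippet above, and it increments the counter on the node that is already in the tree when a word repeats.

```java
class WordTree {
    private static final class Node {
        final String word;
        int count = 1;
        Node left;
        Node right;

        Node(String word) {
            this.word = word;
        }
    }

    private Node root;
    private int nodeCount;

    // Returns true if a new node was created, false if an existing word was counted again.
    boolean add(String word) {
        if (root == null) {
            root = new Node(word);
            nodeCount++;
            return true;
        }
        Node current = root;
        while (true) {
            int cmp = word.compareTo(current.word);
            if (cmp == 0) {
                current.count++;                 // duplicate: update the existing node
                return false;
            } else if (cmp < 0) {
                if (current.left == null) {
                    current.left = new Node(word);
                    nodeCount++;
                    return true;
                }
                current = current.left;
            } else {
                if (current.right == null) {
                    current.right = new Node(word);
                    nodeCount++;
                    return true;
                }
                current = current.right;
            }
        }
    }

    // Number of times the word was added, or 0 if it is not in the tree.
    int countOf(String word) {
        Node current = root;
        while (current != null) {
            int cmp = word.compareTo(current.word);
            if (cmp == 0) {
                return current.count;
            }
            current = (cmp < 0) ? current.left : current.right;
        }
        return 0;
    }

    int size() {
        return nodeCount;
    }
}
```

Handling the duplicate case inside the same loop also removes the need for a separate findNode() call before every insert.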

Add VM arguments :
-Xms<size> set initial Java heap size
-Xmx<size> set maximum Java heap size
-Xss<size> set java thread stack size
or run it with, for example: java -Xmx256m YourClass
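To check which limit is actually in effect, the running program can ask the JVM. This is a small sketch with a hypothetical class name, HeapInfo; Runtime.maxMemory() reports the maximum amount of memory the JVM will attempt to use.

```java
class HeapInfo {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it once with and once without -Xmx shows whether the argument was picked up.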

It depends on various factors.
Amount of java heap you are running with (default values differ for 32 bit and 64 bit JDK)
Size of the file you feed to the java program

You are trying to load the entire contents of the file into memory. If the file is small, the code above will work within your limited memory, but as the file grows you will run into this issue.
A better approach is to read the file contents in chunks; otherwise you will face the same issue again.
Increasing the JVM memory arguments won't help for larger files either.
I suspect your professor is also testing how your project is implemented.
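One way to read in chunks is to let a Scanner hand over one token at a time, so the whole text is never held in a single String. The sketch below is an illustration under that assumption. StreamingWordCounter is a hypothetical name, and it counts words in a map only to keep the example self-contained; the same loop could call the tree's add method instead.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class StreamingWordCounter {
    // Reads whitespace-separated tokens one at a time and counts the cleaned, lower-cased words.
    static Map<String, Integer> countWords(Readable source) {
        Map<String, Integer> counts = new TreeMap<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                // Strip everything that is not a letter or a decimal digit
                String word = scanner.next().replaceAll("[^\\p{L}\\p{Nd}]", "").toLowerCase();
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new StringReader("The bat. the Bat!")));
    }
}
```

For a file on disk, a FileReader or the result of Files.newBufferedReader can be passed as the source, since both are Readable.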

automation of data format conversion to parent child format

This is an Excel sheet in which only a single column is filled in each row.
(Explanation: all CITY categories fall under V21, all Handset categories fall under CITYJ, and so on.)
V21
    CITYR
    CITYJ
        HandsetS
        HandsetHW
        HandsetHA
            LOWER_AGE<=20
            LOWER_AGE>20
                SMS_COUNT<=0
                    RECHARGE_MRP<=122
                    RECHARGE_MRP>122
                SMS_COUNT>0
I need to change this to a two-column format with parent and child categories, so the output sheet would be:
V21 CITYR
V21 CITYJ
CITYJ HandsetS
CITYJ HandsetHW
CITYJ HandsetHA
HandsetHA LOWER_AGE<=20
HandsetHA LOWER_AGE>20
LOWER_AGE>20 SMS_COUNT<=0
SMS_COUNT<=0 RECHARGE_MRP<=122
SMS_COUNT<=0 RECHARGE_MRP>122
LOWER_AGE>20 SMS_COUNT>0
The data is huge, so I can't do this manually. How can I automate it?

There are three parts to the task, so I want to know which one you are asking for help with:
Reading Excel sheet data into Java
Manipulating data
Writing data back into the Excel sheet
You have said that the data sheet is large and cannot be pulled into memory as a whole. How many top-level elements do you have, i.e. how many V21s? If it is just one, then how many CITYR/CITYJ entries do you have?
--
Here is some source code from my previous answer showing how to manipulate the data. I gave it an input file that was separated by tabs (4 spaces equal one column for you in Excel), and the following code printed the pairs out neatly. Note that the level == 1 condition is left empty. If you think your JVM holds too many objects, you could clear the entries and the stack at that point :)
package com.ekanathk;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Stack;
import java.util.logging.Logger;
import org.junit.Test;
class Entry {
private String input;
private int level;
public Entry(String input, int level) {
this.input = input;
this.level = level;
}
public String getInput() {
return input;
}
public int getLevel() {
return level;
}
@Override
public String toString() {
return "Entry [input=" + input + ", level=" + level + "]";
}
}
public class Tester {
private static final Logger logger = Logger.getLogger(Tester.class.getName());
@SuppressWarnings("unchecked")
@Test
public void testSomething() throws Exception {
InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("samplecsv.txt");
BufferedReader b = new BufferedReader(new InputStreamReader(is));
String input = null;
List entries = new ArrayList();
Stack<Entry> stack = new Stack<Entry>();
stack.push(new Entry("ROOT", -1));
while((input = b.readLine()) != null){
int level = whatIsTheLevel(input);
input = input.trim();
logger.info("input = " + input + " at level " + level);
Entry entry = new Entry(input, level);
if(level == 1) {
//periodically clear out the map and write it to another excel sheet
}
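// Pop until the top of the stack is the parent, i.e. an entry with a smaller level.
// The ROOT entry has level -1, so it is never popped.
while (stack.peek().getLevel() >= entry.getLevel()) {
stack.pop();
}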
Entry parent = stack.peek();
logger.info("parent = " + parent);
entries.add(new String[]{parent.getInput(), entry.getInput()});
stack.push(entry);
}
for(Object entry : entries) {
System.out.println(Arrays.toString((String[])entry));
}
}
private int whatIsTheLevel(String input) {
int numberOfSpaces = 0;
for(int i = 0 ; i < input.length(); i++) {
if(input.charAt(i) != ' ') {
return numberOfSpaces/4;
} else {
numberOfSpaces++;
}
}
return numberOfSpaces/4;
}
}

This assumes that your file is small enough to fit in memory. Even a 10 MB file should be fine.
It has 2 parts:
DataTransformer, which does all the required transformation of the data
TreeNode, a simple custom tree data structure
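import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
public class DataTransformer {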
public static void main(String[] args) throws IOException {
InputStream in = DataTransformer.class
.getResourceAsStream("source_data.tab");
BufferedReader br = new BufferedReader(
new InputStreamReader(in));
String line;
TreeNode root = new TreeNode("ROOT", Integer.MIN_VALUE);
TreeNode currentNode = root;
while ((line = br.readLine()) != null) {
int level = getLevel(line);
String value = line.trim();
TreeNode nextNode = new TreeNode(value, level);
relateNextNode(currentNode, nextNode);
currentNode = nextNode;
}
printAll(root);
}
public static int getLevel(String line) {
final char TAB = '\t';
int numberOfTabs = 0;
for (int i = 0; i < line.length(); i++) {
if (line.charAt(i) != TAB) {
break;
}
numberOfTabs++;
}
return numberOfTabs;
}
public static void relateNextNode(
TreeNode currentNode, TreeNode nextNode) {
if (currentNode.getLevel() < nextNode.getLevel()) {
currentNode.addChild(nextNode);
} else {
relateNextNode(currentNode.getParent(), nextNode);
}
}
public static void printAll(TreeNode node) {
if (!node.isRoot() && !node.getParent().isRoot()) {
System.out.println(node);
}
for (TreeNode childNode : node.getChildren()) {
printAll(childNode);
}
}
}
class TreeNode implements Serializable {
private static final long serialVersionUID = 1L;
private TreeNode parent;
private List<TreeNode> children = new ArrayList<TreeNode>();
private String value;
private int level;
public TreeNode(String value, int level) {
this.value = value;
this.level = level;
}
public void addChild(TreeNode child) {
child.parent = this;
this.children.add(child);
}
public void addSibbling(TreeNode sibbling) {
TreeNode parent = this.parent;
parent.addChild(sibbling);
}
public TreeNode getParent() {
return parent;
}
public List<TreeNode> getChildren() {
return children;
}
public String getValue() {
return value;
}
public int getLevel() {
return level;
}
public boolean isRoot() {
return this.parent == null;
}
public String toString() {
String str;
if (this.parent != null) {
str = this.parent.value + '\t' + this.value;
} else {
str = this.value;
}
return str;
}
}
