I have a text file containing data that was saved in CSV format, and I want to open it in Java and display it with the columns lined up perfectly. I can already read it from the text file, but now I want to split each line at the commas and display the fields perfectly aligned. The file looks like this:
Last, First, car year, car model
barry, john, 1956, chevy impala
and I want it to display like this:
last First car year car model
barry john 1956 chevy impala
I am just using the Scanner class to get the data from the text file.
Determine the max lengths of the column values (including column headers), then create a format String and use that format string to build the aligned rows:
// some easy magic first
String[][] values = getCsvValues(file);
int[] maxLengths = determineMaxLengths(values);
// create the format string, something like "%10s %5s %10s %n"
// (use "%-10s" instead of "%10s" to left-justify a column)
StringBuilder formatBuilder = new StringBuilder();
for (int maxLength : maxLengths)
    formatBuilder.append("%").append(maxLength).append("s ");
formatBuilder.append("%n"); // newline
String format = formatBuilder.toString();
// output; the cast passes each row's cells as the individual format arguments
for (String[] row : values)
    System.out.printf(format, (Object[]) row);
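The two helper methods above are left undefined, so here is a minimal sketch of the whole approach. The class name `CsvAligner` and its method names are made up for illustration, and it assumes no field contains a comma of its own. It left-justifies each column with the `-` flag and does not pad the last column.

```java
import java.util.ArrayList;
import java.util.List;

class CsvAligner {
    /** Splits each line on commas and drops the spaces around them. */
    static String[][] parse(List<String> lines) {
        String[][] rows = new String[lines.size()][];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = lines.get(i).trim().split("\\s*,\\s*");
        }
        return rows;
    }

    /** Finds the widest cell in each column, header row included. */
    static int[] maxLengths(String[][] rows) {
        int columns = 0;
        for (String[] row : rows) {
            columns = Math.max(columns, row.length);
        }
        int[] widths = new int[columns];
        for (String[] row : rows) {
            for (int c = 0; c < row.length; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }

    /** Returns the rows with every column padded to a common width. */
    static List<String> align(List<String> lines) {
        String[][] rows = parse(lines);
        int[] widths = maxLengths(rows);
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append(' ');
                }
                if (c == row.length - 1) {
                    sb.append(row[c]); // last column needs no padding
                } else {
                    // a width of 0 is not a valid format width, hence the max
                    sb.append(String.format("%-" + Math.max(1, widths[c]) + "s", row[c]));
                }
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

Calling `CsvAligner.align(lines)` on the lines read by the Scanner returns the padded rows, ready to print one per line.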
Depending on how certain you need to be that it all lines up, you might have to go through and find the longest string for each column, but I would use tabs: "\t". Two or three between columns usually works for me; for simple debugging that's what I use.
If you're really serious about it being right all the way down, try looking at Formatter and printf.
Basically, I have this Java assignment in which I need to take the following text file:
https://drive.google.com/file/d/1NqWtApSHovOfSXzVCeU_GPtsCo_m6ZtJ/view?usp=sharing
The legend for the text file is here:
E: did the election result in promotion (1) or not (0)
T: time election was closed
U: user id (and screen name) of editor that is being considered for promotion
N: user id (and screen name) of the nominator
V: vote(1:support, 0:neutral, -1:oppose), user id, time, screen_name
And create the following two methods:
1) Given a user id, output the total number of times the user has voted to support, be neutral toward, or oppose a candidate considered for promotion, counted collectively over all the people the user has voted for in all the elections.
2) Given a user id considered for promotion, output the user id and screen name of the nominator. For multiple nominations, output all nominators. If the user was never nominated, output an appropriate message.
I'm really lost on how I should go about splitting the text file into pieces to obtain the information needed for the two methods. Any insight would really be great.
Basically you need to write loops that iterate over the text file line by line and read each string. Nobody here will do this for you because it's too much work, and it's your work!
Here's a discussion about iterating-over-the-content-of-a-text
You can use the str.split() method. Pass the delimiter at which you would like to split as the argument in the brackets.
For example, if you have the following:
String str = "Hello world";
String[] arrayString = str.split(" "); // splits the string at the space
for (String a : arrayString)
    System.out.println(a);
Output:
Hello
world
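Applied to the election file, the same split-per-line idea might look like the sketch below. It is only an outline under assumptions that are not confirmed by the question: that every line starts with its one-letter tag, that the fields are separated by tabs, and that the U line comes before the N line within an election. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ElectionLog {
    /** Returns {support, neutral, oppose} counts for the votes cast by voterId. */
    static int[] countVotes(List<String> lines, String voterId) {
        int[] counts = new int[3];
        for (String line : lines) {
            String[] f = line.split("\t");
            // Assumed layout: V <vote> <user id> <time> <screen name>
            if (f.length >= 3 && f[0].equals("V") && f[2].equals(voterId)) {
                int vote = Integer.parseInt(f[1].trim());
                if (vote == 1) {
                    counts[0]++;
                } else if (vote == 0) {
                    counts[1]++;
                } else if (vote == -1) {
                    counts[2]++;
                }
            }
        }
        return counts;
    }

    /** Returns "id screenName" for everyone who nominated candidateId. */
    static List<String> nominatorsOf(List<String> lines, String candidateId) {
        List<String> result = new ArrayList<>();
        String currentCandidate = null; // candidate of the election being read
        for (String line : lines) {
            String[] f = line.split("\t");
            if (f.length < 2) {
                continue; // blank line or a tag without fields
            }
            if (f[0].equals("U")) {
                currentCandidate = f[1];
            } else if (f[0].equals("N") && candidateId.equals(currentCandidate)) {
                result.add(f.length > 2 ? f[1] + " " + f[2] : f[1]);
            }
        }
        return result;
    }
}
```

An empty list from `nominatorsOf` is the cue to print the "never nominated" message.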
In my Java game, I would like to be able to display the user's name, win score and lose score when it's game over.
For example:
Even after the game has been exited and then recompiled and run again, the info from the text file would be read into the program and added to the arrays. At the end of that game, the list will grow longer and then the text file will have the updated info for the next game.
Thanks so much in advance
Assuming the arrays are the same length and each index refers to the same player, just use one for loop and write the item at each index of the three arrays to a file, separated by commas. You can then read the file back the same way you wrote it.
PrintWriter writer = new PrintWriter("the-file-name.txt", "UTF-8");
writer.println("NAME W L");
writer.println("--------");
// a, b and c hold the names, wins and losses
for (int i = 0; i < a.length; i++) {
    writer.println(a[i] + ", " + b[i] + ", " + c[i]);
}
writer.close(); // flushes the buffered output to the file
Then before each game, read the text file back into the arrays: skip the two header lines, split each remaining line on the commas, trim the white space, and store each leftover string in the right array (parse the numbers, for example with Integer.parseInt, if you want ints). Also, you should probably use ArrayLists instead of arrays if they are going to grow and you don't know how large they will get.
There are many ways to read and write to files but this is probably the general pattern you want to follow for your needs.
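A minimal sketch of that write-then-read pattern with ArrayLists is below. The class name `ScoreFile` is invented for the example, and it assumes player names never contain a comma, since the comma is the separator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ScoreFile {
    static final int HEADER_LINES = 2; // "NAME W L" and the dashed rule

    /** Writes one "name, wins, losses" line per player after the header. */
    static void save(Path file, List<String> names, List<Integer> wins,
                     List<Integer> losses) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("NAME W L");
        lines.add("--------");
        for (int i = 0; i < names.size(); i++) {
            lines.add(names.get(i) + ", " + wins.get(i) + ", " + losses.get(i));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    /** Appends the players stored in the file to the three lists. */
    static void load(Path file, List<String> names, List<Integer> wins,
                     List<Integer> losses) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (int i = HEADER_LINES; i < lines.size(); i++) {
            String[] parts = lines.get(i).split(",");
            if (parts.length != 3) {
                continue; // skip blank or malformed lines
            }
            names.add(parts[0].trim());
            wins.add(Integer.parseInt(parts[1].trim()));
            losses.add(Integer.parseInt(parts[2].trim()));
        }
    }
}
```

Calling `load` at start-up and `save` at game over keeps the list growing from one run to the next.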
I currently have a huge CSV file which contains reddit post titles. I would like to create a feature vector for each post.
Suppose the post title is "to be or not to be" and it belongs to "some_category".
The CSV file is in the format below.
"some_category1", "some title1"
"some_category2", "some title2"
"some_category1", "some title3"
I would like to create a feature vector as below.
"some_category" : to(2) be(2) or(1) not(1).
I need to do this whole thing on Hadoop. I am stuck at the very first step: how do I convert each line into a feature vector? (I feel it's similar to word count, but how do I apply it to each line?)
My initial thought was that the key for each line (i.e. each post's title and category) is the category of the post, and the value is the feature vector of the title (i.e. the word count of the title).
Any help is appreciated regarding how to approach this problem.
To answer your first part:
Reading a CSV line by line in Hadoop has been answered in this post: StackOverflow: how-to-read-first-line-in-hadoop-hdfs-file-efficiently-using-java.
Just change the last line to:
final Scanner sc = new Scanner(input);
while (sc.hasNextLine()) {
    // doStuff with sc.nextLine()!
}
To create a feature vector, I would use the counting strategy you mentioned:
/**
 * We will use Java 8 style to do that easily:
 * 0) Split each line on whitespace (split("\\s+"))
 * 1) Create a stream: Arrays.stream(array)
 * 2) Collect the input (.collect) and group it by every identical word (Function.identity) to the corresponding amount (Collectors.counting)
 *
 * @param title the right hand side after the comma
 * @return a map mapping each word to its count
 */
private Map<String, Long> createFeatureVectorForTitle(String title) {
    return Arrays.stream(title.split("\\s+"))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}
Your idea for keying each category to the created feature vector sounds legit. Although I'm not too familiar with Hadoop, perhaps somebody can point out a better solution.
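To make the keying idea concrete, here is a plain-Java sketch of the per-line work, with no Hadoop API involved; in a real job this logic would presumably sit in the map step. The class `TitleFeatures` and its methods are invented for the example, and it assumes the category itself contains no comma.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TitleFeatures {
    /** Trims the field and removes one pair of surrounding double quotes. */
    static String unquote(String field) {
        String s = field.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    /** Counts each word of the title, keeping first-seen order. */
    static Map<String, Long> featureVector(String title) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String word : title.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Turns one CSV line into "category : word(count) word(count) ...". */
    static String describe(String csvLine) {
        // limit 2: only the first comma separates category from title
        String[] parts = csvLine.split(",", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected category,title: " + csvLine);
        }
        StringBuilder sb = new StringBuilder(unquote(parts[0])).append(" :");
        for (Map.Entry<String, Long> e : featureVector(unquote(parts[1])).entrySet()) {
            sb.append(' ').append(e.getKey())
              .append('(').append(e.getValue()).append(')');
        }
        return sb.toString();
    }
}
```

Each input line is handled on its own, so nothing here depends on the other rows of the file.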
I solved it using two MapReduce jobs and by adding an index that makes each row unique for processing.
1, "some_category1", "some title1"
2, "some_category2", "some title2"
3, "some_category1", "some title3"
The output of the first MapReduce job:
"1, some_category" to 2
"1, some_category" be 2
"1, some_category" or 1
"1, some_category" not 1
where the index and category together form the key, and the values are the words in the title with their counts.
In the second MapReduce job, the final output has this format:
"some_category" : to(2) be(2) or(1) not(1).
I'm a pretty newbie programmer and basically I'm trying to parse and manipulate a DL_POLY config file, which has the layout
CONFIG file created from Xmol file config.xmol
2 3 10000000 0.5000000000E-03
31.309729731729 0.000000000000 0.000000000000
0.000000000000 31.309729731729 0.000000000000
0.000000000000 0.000000000000 31.309729731729
Ca 1
6.421269411 -1.034199034 1.228702751
-1.06475894897 1.10274459622 1.31459311620
-6319.67959205 -10299.4183311 468.606019012
which sort of goes on for about 150 odd more entries of just the
Ca 1
6.421269411 -1.034199034 1.228702751
-1.06475894897 1.10274459622 1.31459311620
-6319.67959205 -10299.4183311 468.606019012
segment, where the second row represents x, y and z coordinates, which I need to manipulate by adding a slight displacement to, and the top row, where Ca represents the atom (in this instance, calcium) and the integer is an atom counter (this is the first atom, I have a system of about 75 CaCO3).
Now I've written some Java code which reads in the file, sticks it in an ArrayList and tokenises it, and from there I'm pretty sure how to add the displacement; only maintaining this weird formatting complicates it all. Obviously I'm aiming for as general a solution as I can get here, so I can reuse this. Whilst I'm sure I could force it into the correct format, that would mean I could only ever use it for that file.
So, my questions are: how can I manipulate values in a file in Java, keeping the format 100% intact? And within this system, how can I tell it to add the displacement on only the second row of each segment?
It's a bit complicated (or maybe not, I really don't know) but I would really appreciate some help.
So, I've got something like this:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
import java.util.ArrayList;
import java.io.FileReader;
public class testArrayReader {
static ArrayList<String> temp = new ArrayList<String>();
public static void main(String[] args) {
String[] arr = null;
String[][] twodim = null;
System.out.println("Array List initialised!");
try{
FileReader input = new FileReader(urlfortextfile);
BufferedReader reader = new BufferedReader(input);
System.out.println("Scanned!");
String line;
int onedimcounter = 0;
while((line = reader.readLine()) != null){
temp.add(onedimcounter++, line);
}
System.out.println(temp);
twodim = temp.toArray(new String[temp.size()][temp.get(0).length()]);
System.out.println("stage 2 complete");
System.out.println(twodim);
}
catch(FileNotFoundException ex){
System.out.println("No file found boss.");
}
catch(IOException ex){
System.out.println("IO error.");
}
}
}
Few more queries,
1) [1st line, 2nd line, ..., nth line] - the comma denotes that the first and second line are separate elements, right?
2) I'm getting an ArrayStoreException and I'm really not 100% sure why - the documentation mentioned something about a casting error, so I'm assuming my arraylist items are still stuck as objects. How do I fix this?
3) Current plan for modification is to list the element index in the final array, modify and reprint, but I've chunked it line by line to preserve the formatting. Need a bit of confirmation I'm on the right track here: my idea was to parse the line for doubles, do what I need to do, and then try to get the computer to count the number of whitespaces between digits and rebuild a string, which I can then just reinsert. Something like a counter with an if statement based off of some boolean looking for white space, then the counter will insert " " when I concatenate the final string.
Cheers.
First, parse the file to a table of values with associated position-in-file metadata.
Second, implement all mutations on that table in terms of atomic duplication/insertion/removal of cells/rows/columns which also update position-in-file.
Third, implement a table serialize operator which takes in the old content so that you can look up the white-space between data lines and between cells within a line, and so you can deduce the number format (number of sig digits) from the old file when serializing changed numeric values.
how do I find and parse the position in file metadata?
To associate position information, keep track of
/** Number of line breaks since start of file */
int lineNumber;
/** Number of chars since start of file */
int charInFile;
/** Number of chars since start of line (if on the zero-th line) or last line break. */
int charInLine;
Then with each token, associate the position before the first character, and the position after the last character in the token.
When you parse a complex construct like a table, table row, or table cell, store with it the position before the first token that it spans, and the position after the last token it spans.
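For the single-line case, the start and end positions that `java.util.regex.Matcher` reports for each match can play the role of that position metadata. The sketch below is one possible shape, not a full table parser: the class `ConfigShifter` is invented, and the header length (5 lines) and segment length (4 lines) passed to `shiftCoordinates` are assumptions read off the sample file. It uses BigDecimal so that each number keeps its original count of decimal places.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigShifter {
    // A plain decimal such as -1.034199034; exponent forms like
    // 0.5000000000E-03 are deliberately not matched.
    private static final Pattern NUMBER =
            Pattern.compile("(?<![\\w.+-])-?\\d+\\.\\d+(?![\\w.])");

    /** Adds delta to every plain decimal on the line, keeping each number's
     *  decimal places and, where spaces allow, the column where it ends. */
    static String shift(String line, BigDecimal delta) {
        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0; // position just after the last token handled
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            BigDecimal old = new BigDecimal(m.group());
            String text = old.add(delta)
                    .setScale(old.scale(), RoundingMode.HALF_UP)
                    .toPlainString();
            StringBuilder gap = new StringBuilder(line.substring(copiedUpTo, m.start()));
            int growth = text.length() - (m.end() - m.start());
            // Longer number: give up spaces from the gap, keeping one separator.
            while (growth > 0 && gap.length() >= 2
                    && gap.charAt(gap.length() - 1) == ' '
                    && gap.charAt(gap.length() - 2) == ' ') {
                gap.setLength(gap.length() - 1);
                growth--;
            }
            // Shorter number: widen the gap so the end column stays put.
            while (growth < 0) {
                gap.append(' ');
                growth++;
            }
            out.append(gap).append(text);
            copiedUpTo = m.end();
        }
        return out.append(line.substring(copiedUpTo)).toString();
    }

    /** Shifts only the second line of each atom segment; other lines are copied. */
    static List<String> shiftCoordinates(List<String> lines, int headerLines,
                                         int linesPerAtom, BigDecimal delta) {
        List<String> result = new ArrayList<>(lines.size());
        for (int i = 0; i < lines.size(); i++) {
            boolean coordinateRow = i >= headerLines
                    && (i - headerLines) % linesPerAtom == 1;
            result.add(coordinateRow ? shift(lines.get(i), delta) : lines.get(i));
        }
        return result;
    }
}
```

Because untouched lines are copied verbatim and only the matched spans are rewritten, everything outside the coordinate rows stays byte-for-byte identical.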
what's a table serialize operator? I know of serialization just not that
By operator, I just mean a part of a programming language that allows you to specify a relation between inputs and outputs. I use it to avoid language-specific jargon like function, method, or procedure.
how do you enter a return key in stack overflow
See "What is the reason for the top secret two space newline markdown weirdness?"
I have an app that will create 3 arrays: 2 with double values and one with strings that can contain anything: alphanumerics, commas, points, anything the user might want to type or type by accident. The double arrays are easy. The string one I find to be tricky.
It can contain stuff like cake red,blue 1kg paper-clip, you get the idea.
I will need to store those arrays somehow (I guess in a file is the easiest way), read them and get them back into the app whenever the user wants to.
Also, it would be good if they weren't human readable, so that they can only be read through my app.
What's the best way to do this? My issue is, how can I read them back into arrays. It's easy to write to a file, but then to get them back in the same array I put them in... How can I separate array elements so that one element is not split in two because it has a space or any other character?
Can I, like, make 3 rows of text, each element separated by a tab \t or something, so that when I read it each element will be split by that tab? Could this create any issues when reading?
I guess I want to know how I can separate the elements of the array so that they can never be read back wrong.
Thanks and have a nice day !
If you don't want the file to be human readable, you could use java.io.RandomAccessFile.
You would probably want to specify a maximum string size if you did this.
To save a string:
String str = "hello";
RandomAccessFile file = new RandomAccessFile(new File("filename"), "rw");
final int MAX_STRING_BYTES = 100; // size of each slot in the file, including writeUTF's 2-byte length prefix
long slotStart = file.getFilePointer();
file.writeUTF(str);
// skipBytes() cannot move past the end of the file, so seek to the next slot instead
file.seek(slotStart + MAX_STRING_BYTES);
// then write another..
To read a string:
// instantiate again
final int STRING_POSITION = 100; // or whichever place you saved it
file.seek(STRING_POSITION);
String str = file.readUTF(); // reads the length prefix, then exactly that many bytes
You would probably want to use the beginning of the file to store the size of each array. Then just store all the values one by one in the file, no need for separators.
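A sketch of that sizes-first layout is below, with invented names (`ArrayStore`, `Saved`). It relies on `writeUTF` storing each string's byte length in front of the string, so no separator is needed and commas, tabs or spaces inside an element cannot cause a wrong split. Two caveats: `writeUTF` is limited to 65535 encoded bytes per string, and the file is binary rather than encrypted, so it is unreadable only in the casual sense.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class ArrayStore {
    /** Simple holder for the three arrays read back from the file. */
    static class Saved {
        double[] first;
        double[] second;
        String[] texts;
    }

    static void save(File target, double[] first, double[] second,
                     String[] texts) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(target, "rw")) {
            file.setLength(0); // discard whatever an earlier save left behind
            // Header: the three array sizes.
            file.writeInt(first.length);
            file.writeInt(second.length);
            file.writeInt(texts.length);
            // Body: the values one after another, no separators.
            for (double v : first) {
                file.writeDouble(v);
            }
            for (double v : second) {
                file.writeDouble(v);
            }
            for (String s : texts) {
                file.writeUTF(s); // length-prefixed
            }
        }
    }

    static Saved load(File source) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(source, "r")) {
            Saved saved = new Saved();
            saved.first = new double[file.readInt()];
            saved.second = new double[file.readInt()];
            saved.texts = new String[file.readInt()];
            for (int i = 0; i < saved.first.length; i++) {
                saved.first[i] = file.readDouble();
            }
            for (int i = 0; i < saved.second.length; i++) {
                saved.second[i] = file.readDouble();
            }
            for (int i = 0; i < saved.texts.length; i++) {
                saved.texts[i] = file.readUTF();
            }
            return saved;
        }
    }
}
```

Because every string carries its own length, fixed-size slots are only needed if individual elements must be overwritten in place later.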