I'm quite new to programming in Java, especially when it comes to creating an Excel file, but maybe someone can help me with this problem.
I have created a spreadsheet in Excel via Apache POI and Eclipse. It has 3 columns and 40 rows, filled with coordinates (x and y) and their names (in my case, 1 - 40). Now that I finally have these random numbers, I want to create a distance matrix (using Euclidean distance) between those points.
For example, I want it to look like this:
   1  2  3
1  0  1  2
2  1  0  4
3  2  4  0
I'm not sure how to work with the generated random numbers, and I'm also not sure how to implement the formula for the Euclidean distance. It would be awesome if someone could help me! Thanks in advance!
Here is my code so far:
import java.io.File;
import java.io.FileOutputStream;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class poiexample {
    public static void main(String[] args) throws Exception {
        XSSFWorkbook Datei = new XSSFWorkbook();
        FileOutputStream out = new FileOutputStream(new File("Dateien.xlsx"));
        for (int i = 0; i < 101; i++) {
            XSSFSheet Blatt = Datei.createSheet("Tabelle" + i);
            XSSFRow row1 = Blatt.createRow(0);
            row1.createCell(2).setCellValue("x");
            row1.createCell(3).setCellValue("y");
            // 25 points labelled P0..P24 with random coordinates in [0, 10]
            for (int j = 0; j < 25; j++) {
                XSSFRow row = Blatt.createRow(j + 1);
                row.createCell(0).setCellValue("P" + j);
                row.createCell(1).setCellValue(j + 1);
                row.createCell(2).setCellValue(Math.round(Math.random() * 10));
                row.createCell(3).setCellValue(Math.round(Math.random() * 10));
            }
            // 15 further rows labelled "D", also with random coordinates
            for (int j = 25; j < 40; j++) {
                XSSFRow row = Blatt.createRow(j + 1);
                row.createCell(0).setCellValue("D");
                row.createCell(1).setCellValue(j + 1);
                row.createCell(2).setCellValue(Math.round(Math.random() * 10));
                row.createCell(3).setCellValue(Math.round(Math.random() * 10));
            }
        }
        try {
            Datei.write(out);
            out.close();
        } catch (Exception e) {
            System.out.println(e);
        }
        System.out.println("Excel file created");
    }
}
I see you are generating the coordinates and writing them into the Excel sheet, and then you want to create the Euclidean distance matrix.
I suggest you first keep these coordinates in a 2D array and work out the algorithm to calculate the Euclidean distance matrix. Then it is just a matter of writing all the coordinates and the distance matrix into the Excel file.
If you already have the coordinates in an Excel file, just read the file, populate the 2D array, and then call your routine to calculate and generate the Euclidean matrix.
Of course there may be better solutions; this is what I can think of right now.
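For the distance itself: the Euclidean distance between two points is d = sqrt((x1 - x2)^2 + (y1 - y2)^2). A minimal sketch of the 2D-array approach (class, method, and variable names here are mine, not from the question's code):

import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class DistanceMatrixSketch {
    // coords[i][0] is the x coordinate of point i, coords[i][1] its y coordinate.
    static double[][] distanceMatrix(double[][] coords) {
        int n = coords.length;
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double dx = coords[i][0] - coords[j][0];
                double dy = coords[i][1] - coords[j][1];
                dist[i][j] = Math.sqrt(dx * dx + dy * dy); // Euclidean distance
            }
        }
        return dist;
    }

    // Writing the matrix into a new sheet is then a pair of loops.
    static void writeMatrix(XSSFWorkbook wb, double[][] dist) {
        XSSFSheet sheet = wb.createSheet("Distances");
        for (int i = 0; i < dist.length; i++) {
            XSSFRow row = sheet.createRow(i);
            for (int j = 0; j < dist[i].length; j++) {
                row.createCell(j).setCellValue(dist[i][j]);
            }
        }
    }
}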
Related
Say there is a file too big to fit in memory. How can I get a random line from it? Thanks.
Update:
I want the probability of getting each line to be equal.
Reading the entire file if you want only one line seems a bit excessive. The following should be more efficient:
Use RandomAccessFile to seek to a random byte position in the file.
Seek left and right to the nearest line terminators. Let L be the line between them.
With probability (MIN_LINE_LENGTH / L.length) return L. Otherwise, start over at step 1.
This is a variant of rejection sampling.
Line lengths include the line terminator character(s), hence MIN_LINE_LENGTH >= 1. (All the better if you know a tighter lower bound on line length.)
It is worth noting that the runtime of this algorithm does not depend on file size, only on line length, i.e. it scales much better than reading the entire file.
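Here is one possible sketch of those steps (my code, not part of the original answer; it assumes Unix '\n' line terminators and that minLineLength, terminator included, is a known lower bound on line length):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

public class RejectionSampledLine {
    // Each attempt lands on a line with probability proportional to its length,
    // then accepts it with probability inversely proportional to its length,
    // so every line is returned with equal probability overall.
    static String randomLine(File file, int minLineLength) throws IOException {
        Random rand = new Random();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long len = raf.length();
            while (true) {
                long pos = (long) (rand.nextDouble() * len);
                long start = pos;
                // seek left to the previous line terminator (or start of file)
                while (start > 0) {
                    raf.seek(start - 1);
                    if (raf.read() == '\n') break;
                    start--;
                }
                raf.seek(start);
                String line = raf.readLine();    // reads right to the next terminator
                if (line == null) continue;
                int lineLen = line.length() + 1; // +1 for the terminator
                if (rand.nextDouble() * lineLen < minLineLength) return line;
            }
        }
    }
}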
Here's a solution. Take a look at the choose() method, which does the real work (the main() method repeatedly exercises choose() to show that the distribution is indeed quite uniform).
The idea is simple: when you read the first line it has a 100% chance of being chosen as the result. When you read the 2nd line it has a 50% chance of replacing the first line as the result. When you read the 3rd line it has a 33% chance of becoming the result, the fourth line a 25% chance, and so on....
import java.io.*;
import java.util.*;

public class B {
    public static void main(String[] args) throws FileNotFoundException {
        Map<String, Integer> map = new HashMap<String, Integer>();
        // Sample 1000 times and count how often each line is chosen.
        for (int i = 0; i < 1000; ++i) {
            String s = choose(new File("g:/temp/a.txt"));
            if (!map.containsKey(s))
                map.put(s, 0);
            map.put(s, map.get(s) + 1);
        }
        System.out.println(map);
    }

    public static String choose(File f) throws FileNotFoundException {
        String result = null;
        Random rand = new Random();
        int n = 0;
        // Reservoir sampling with k = 1: the n-th line replaces the
        // current result with probability 1/n.
        for (Scanner sc = new Scanner(f); sc.hasNext(); ) {
            ++n;
            String line = sc.nextLine();
            if (rand.nextInt(n) == 0)
                result = line;
        }
        return result;
    }
}
Either you
read the file twice - once to count the number of lines, the second time to extract a random line (see the sketch just below), or
use reservoir sampling
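For the two-pass option, a minimal sketch (my code; it assumes the file has at least one line):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Random;

public class TwoPassRandomLine {
    static String randomLine(File f) throws IOException {
        // Pass 1: count the lines.
        int count = 0;
        try (BufferedReader r = new BufferedReader(new FileReader(f))) {
            while (r.readLine() != null) count++;
        }
        // Pass 2: read up to a uniformly chosen 0-based line index.
        int target = new Random().nextInt(count); // requires count > 0
        try (BufferedReader r = new BufferedReader(new FileReader(f))) {
            String line = r.readLine();
            for (int i = 0; i < target; i++) line = r.readLine();
            return line;
        }
    }
}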
Looking over Itay's answer, it looks as though it reads the file a thousand times over, once per sampled line, whereas true reservoir sampling should only go over the 'tape' once. I've devised some code that goes over the file once with real reservoir sampling, based on this and the various descriptions on the web.
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;

public class reservoirSampling {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        Sampler mySampler = new Sampler();
        List<String> myList = mySampler.sampler(10);
        for (int index = 0; index < myList.size(); index++) {
            System.out.println(myList.get(index));
        }
    }
}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Scanner;

public class Sampler {
    public Sampler() {}

    public List<String> sampler(int reservoirSize) throws FileNotFoundException, IOException {
        String currentLine = null;
        // reservoirList is where our selected lines are stored
        List<String> reservoirList = new ArrayList<String>(reservoirSize);
        // we will use this counter to track the current line number while iterating
        int count = 0;
        Random ra = new Random();
        int randomNumber = 0;
        Scanner sc = new Scanner(new File("Open_source.html")).useDelimiter("\n");
        while (sc.hasNext()) {
            currentLine = sc.next();
            count++;
            if (count <= reservoirSize) {
                // fill the reservoir with the first reservoirSize lines
                reservoirList.add(currentLine);
            } else if ((randomNumber = ra.nextInt(count)) < reservoirSize) {
                // keep the n-th line with probability reservoirSize / n,
                // replacing a uniformly chosen reservoir entry
                reservoirList.set(randomNumber, currentLine);
            }
        }
        sc.close();
        return reservoirList;
    }
}
The basic premise is that you fill up the reservoir, and then, for the n-th line after that, replace a uniformly chosen reservoir entry with probability reservoirSize/n. I hope this provides more efficient code. Please let me know if this doesn't work for you, as I've literally knocked it up in half an hour.
Use RandomAccessFile:
Construct a RandomAccessFile, file
Get the length of that file, filelen, by calling file.length()
Generate a random number, pos, between 0 and filelen
Call file.seek(pos) to seek to the random position
Call file.readLine() to get to the end of the current line
Read the next line by calling file.readLine() again
(Note that this makes a line's chance of being picked proportional to the length of the line before it, so it is only approximately uniform unless line lengths are similar.)
Using this method, I've been sampling lines from the Brown Corpus at random, and can easily retrieve 1,000 random samples from randomly chosen files in a few seconds. If I tried to do the same by reading through each file line by line it would take me much longer.
The same principle can be used for selecting random elements from a list. Rather than reading through the list and stopping at a random place, generate a random number between 0 and the length of the list and index directly into it.
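In code, the list case is a one-liner (a sketch, assuming a non-empty list):

import java.util.List;
import java.util.Random;

public class RandomPick {
    // Direct indexing gives a uniform pick without scanning the list.
    static <T> T pick(List<T> list, Random rand) {
        return list.get(rand.nextInt(list.size()));
    }
}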
Reading a random line from a file in Java:
public String getRandomLineFromTheFile(String filePathWithFileName) throws Exception {
    File file = new File(filePathWithFileName);
    final RandomAccessFile f = new RandomAccessFile(file, "r");
    // Jump to a random byte offset, skip the rest of the current line,
    // then return the following line.
    final long randomLocation = (long) (Math.random() * f.length());
    f.seek(randomLocation);
    f.readLine(); // discard the (probably partial) current line
    String randomLine = f.readLine(); // may be null if we landed in the last line
    f.close();
    return randomLine;
}
Use a BufferedReader and read line-wise. Use a java.util.Random object to stop randomly ;)
The same question was asked here a few years ago:
how to remove all formulas from an excel sheet by java POI api?.
However, it did not receive an answer at the time that works for me.
I have a workbook with several large sheets and want to loop over all cells to replace the cell contents with strings. The problem is that many cells contain formulas which I have to get rid of first. Neither cell.setCellFormula(null) nor cell.setCellType(CellType.STRING) (nor BLANK) is satisfactory, as the underlying bookkeeping to remove array formulas takes ages and makes the entire job far too slow.
The following works, but leaves a corrupt Excel workbook that can only be opened after a repair step the first time:
Method m = XSSFCell.class.getDeclaredMethod("setBlank");
m.setAccessible(true);
m.invoke(cell);
Is there any other fast and clean way to simply set certain cells blank, regardless of any formulas?
The reason the corrupted workbook occurs is that a calculation chain is stored in /xl/calcChain.xml. The normal, slow methods for removing formulas keep this calculation chain updated. But, as you found already, they also try to support removing single formulas rather than all of them, so they must be careful when removing parts of array formulas, which is what makes them slow.
But if really all formulas shall be removed, this carefulness is not necessary, and the whole /xl/calcChain.xml can simply be removed.
Example:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.model.CalculationChain;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellFormulaType;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.POIXMLDocumentPart;
import java.lang.reflect.Method;

class ExcelRemoveFormulasAndCalcChain {

    // Removes the whole /xl/calcChain.xml part from the workbook package.
    private static void removeCalcChain(XSSFWorkbook workbook) throws Exception {
        CalculationChain calcchain = workbook.getCalculationChain();
        Method removeRelation = POIXMLDocumentPart.class.getDeclaredMethod("removeRelation", POIXMLDocumentPart.class);
        removeRelation.setAccessible(true);
        removeRelation.invoke(workbook, calcchain);
    }

    public static void main(String[] args) throws Exception {
        XSSFWorkbook workbook = (XSSFWorkbook) WorkbookFactory.create(new FileInputStream("Test.xlsx"));
        for (Sheet sheet : workbook) {
            for (Row row : sheet) {
                for (Cell cell : row) {
                    XSSFCell xssfcell = (XSSFCell) cell;
                    // Drop the formula element of every cell except data-table
                    // formulas, leaving only the cached values and styles.
                    if (xssfcell.getCTCell().isSetF() && xssfcell.getCTCell().getF().getT() != STCellFormulaType.DATA_TABLE) {
                        xssfcell.getCTCell().unsetF();
                    }
                }
            }
        }
        removeCalcChain(workbook);
        workbook.write(new FileOutputStream("Test_1.xlsx"));
        workbook.close();
    }
}
This should remove all formulas and leave all cells containing only their values and styles.
I think I was able to find out how to remove formulas in a given cell range.
I noticed that if I delete a sheet first, formulas with links to it are deleted fast.
If I swap the deleting of formulas and the deleting of sheets, it takes a lot of time.
So if we create a sheet, rewrite all formulas to use a link to it, and then delete the sheet, the formulas are removed fast (setting formulas with a link to a non-existing sheet doesn't work).
It takes seconds for 15k+ rows. Here is the experiment:
File fReport = new File(".xlsx");
XSSFWorkbook book = new XSSFWorkbook(new FileInputStream(fReport));
XSSFSheet sheet = book.getSheet("");
XSSFSheet dummy = book.createSheet("dummy");
int lastRow = sheet.getLastRowNum();
// Rewrite every formula so it only references the dummy sheet.
for (int i = 8; i <= lastRow; i++) {
    XSSFRow rowToClean = sheet.getRow(i);
    XSSFCell cell = rowToClean.getCell(2);
    System.out.println(i);
    if (cell != null) {
        cell.setCellFormula("'dummy'!A1");
    }
}
// Deleting the dummy sheet makes the subsequent formula removal fast.
book.removeSheetAt(book.getSheetIndex(dummy));
for (int i = 8; i <= lastRow; i++) {
    XSSFRow rowToClean = sheet.getRow(i);
    XSSFCell cell = rowToClean.getCell(2);
    System.out.println(i);
    if (cell != null) {
        cell.removeFormula();
    }
}
book.write(new FileOutputStream(fReport));
book.close();
My thoughts/Questions:
I'm working on a Java challenge (directions below). I have Part 1 finished (shown in the code below). I'm very close to having Part 2/3 finished.
As you'll see in my code, I have 2 for-loops. The first iterates through my array of sorted names. The second iterates through the characters in each name.
As stated in the directions, an int value is to be generated for each character, and these values are then added. So A is 1, B is 2, C is 3... and ABC is 6. This int value is then multiplied by the index number of the given String/name. So, if ABC (with a value of 6) was at index 2, its score would be 12.
After the above step is complete, I am to total all of the scores (each name's score).
The above is my understanding of the directions.
The problem is my output looks like this:
"AARON"
0
"ABBEY"
-25
"ABBIE"
-82
"ABBY"
-90
"ABDUL"
-80
"ABE"
-260
"ABEL"
-240
"ABIGAIL"
-133
"ABRAHAM"
-128
"ABRAM"
-225
"ADA"
-540
"ADAH"
-506
"ADALBERTO"
216
"ADALINE"
-182
"ADAM"
-574
"ADAN"
-600
"ADDIE"
-592
"ADELA"
-629
I've run through my logic a few times and it seems correct to me, but I don't know how I'm generating these numbers. The only thought I have is that the quotation marks (") are throwing off my calculations; they have an ASCII value of 34. I have attempted to remove them at multiple places in my code with both replace() and replaceAll(), but I have not been able to.
What am I doing wrong, how can I fix it, what do I need to do to complete this assignment, and how can I improve my code?
Challenge Directions:
Use the names.txt file, a 46K text file containing over five-thousand first names found in the resources directory.
Part 1: Begin by sorting the list into alphabetical order. Save this new file as p4aNames.txt in the answers directory.
Part 2: Using p4aNames.txt, take the alphabetical value for each name, and multiply this value by its alphabetical position in the list to obtain a name score. For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714. Save the list of all name scores as p4bNames.txt.
Part 3: What is the total of all the name scores in the file?
Pic Link Showing Output & Directory:
http://screencast.com/t/tiiBoyOpR
My Current Code:
package app;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

public class AlphabetizedList {

    public static void main(String[] args) throws IOException {
        new AlphabetizedList().sortingList();
    }

    public void sortingList() throws IOException {
        FileReader fb = new FileReader("resources/names.txt");
        BufferedReader bf = new BufferedReader(fb);
        String out = bf.readLine();
        out = out.substring(out.indexOf("\"")); // get rid of strange characters appearing before the first name
        // System.out.println(out); // output:
        // "MARY","PATRICIA","LINDA","BARBARA","ELIZABETH","JENNIFER","MARIA"...
        String[] sortedStr = out.split(",");
        Arrays.sort(sortedStr);
        PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter("answers/p4aNames.txt")));
        for (int i = 0; i < sortedStr.length; i++) {
            pw.println(sortedStr[i]);
            System.out.println(sortedStr[i]); // print to console just to see output
            int score = 0;
            // sortedStr[i].replaceAll("\"", ""); // I used this to try to remove the "s from my Strings
            for (char ch : sortedStr[i].toUpperCase().toCharArray()) {
                score += ((int) ch - 64); /* A is decimal 65 */
            }
            score = score * i; /* multiply by position in the list */
            pw.println(score);
            System.out.println(score);
        }
        bf.close();
        fb.close();
        pw.close();
    }
}
You wrote
// sortedStr[i].replaceAll("\"", ""); // I used this to try to remove the "s from my Strings
Java's String is immutable. That method returns a new string with the quotes removed; it does not modify the original. You can use
sortedStr[i] = sortedStr[i].replaceAll("\"", "");
and it should work fine.
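Two more details worth noting. Each quotation mark scores 34 - 64 = -30 with your formula, so the two quotes shift every name's letter sum down by 60; that is exactly where the negative numbers come from (e.g. ABBEY: 1+2+2+5+25 = 35, and 35 - 60 = -25, matching your output). Also, the challenge counts positions from 1 (COLIN is the 938th name), while score = score * i uses the 0-based index, which is why "AARON" scores 0. With both fixed, Part 3 is just a running total. A sketch of the corrected scoring loop (reusing sortedStr from your code; the total variable is mine):

long total = 0;
for (int i = 0; i < sortedStr.length; i++) {
    String name = sortedStr[i].replace("\"", ""); // strip the quotes and keep the result
    int score = 0;
    for (char ch : name.toUpperCase().toCharArray()) {
        score += ch - 64;                         // 'A' is 65, so A=1, B=2, ...
    }
    total += (long) score * (i + 1);              // positions are 1-based
}
System.out.println(total);                        // Part 3: the total of all name scores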
How can we apply PCA to a one-dimensional array?
double[][] data = new double[1][600];
PCA pca = new PCA(data, 20);
data = pca.getPCATransformedDataAsDoubleArray();
When I print the values in the data array, the number of features decreases from 600 to 20, but all the values are zero.
Why?
package VoiceRecognation;

import comirva.util.PCA;
import java.io.File;

public class Deneme {
    public static void main(String[] args) {
        int[] group = Groups.getGroups();
        File[] files = Files.getFiles();
        // MFCC features for all training files: one row per file
        double[][] data = FindMfccOfFiles.findMFCCValuesOfFiles(files);
        PCA pca = new PCA(data, 20);
        data = pca.getPCATransformedDataAsDoubleArray();

        File file = new File("src/main/resources/Karisik/E-Mail/(1).wav");
        double[] testdata = MFCC.getMFCC(file);
        // A single test sample wrapped as a 1-row matrix
        double[][] result = new double[1][600];
        result[0] = testdata;
        PCA p = new PCA(result, 20);
        double[][] sum = p.getPCATransformedDataAsDoubleArray();
        for (int i = 0; i < sum[0].length; i++) {
            System.out.print(sum[0][i] + " ");
        }
    }
}
Principal component analysis is used for reducing the dimensionality of your problem. The dimensions of an audio file are its channels (e.g. left speaker, right speaker), not the individual samples; a mono audio stream really has only one dimension. So you're not going to reduce the number of samples using PCA, but you could reduce the number of channels in the audio. You could also do that without PCA, just by averaging the samples across channels. So unless you're trying to convert stereo audio into mono, I think you need a different approach to your problem.
You overwrite the data array with the result of the method getPCATransformedDataAsDoubleArray. I assume this is an array with 20 entries because of the constructor argument. As for why all the values are zero: most likely it is because you pass in a single sample (a 1×600 matrix), and mean-centering a single sample yields all zeros.
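To see why a single sample must come out as zeros: PCA mean-centers each feature across samples before projecting, and with one sample the mean of each feature is the sample value itself. A schematic illustration (my code, not comirva's internals):

public class SingleSamplePca {
    public static void main(String[] args) {
        double[][] data = { { 3.0, 7.0, 42.0 } };    // one sample, three features
        for (int j = 0; j < data[0].length; j++) {
            double mean = data[0][j];                // mean over a single sample = the value itself
            data[0][j] -= mean;                      // centering always yields 0
        }
        // Any linear projection of the all-zero matrix is all zeros,
        // which matches the output observed in the question.
        System.out.println(java.util.Arrays.deepToString(data)); // [[0.0, 0.0, 0.0]]
    }
}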