Display labels of data [deeplearning4j] - Java

I would like to print the labels of the training and test data used in classification. Here is how both inputs are defined (using deeplearning4j):
InputSplit[] inputSplit = fileSplit.sample(pathFilter, splitTrainTest, 1 - splitTrainTest);
InputSplit trainData = inputSplit[0];
InputSplit testData = inputSplit[1];
These are then wrapped in a DataSetIterator like this:
ImageRecordReader recordReader = new ImageRecordReader(height, width, channels, labelMaker);
recordReader.initialize(trainData, null);
trainIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, numLabels);
Then I want to print how many examples per label were found in each iterator, using this function:
public void print(DataSetIterator iter){
HashMap<String, Integer> hash = new HashMap<String, Integer>();
while(iter.hasNext()){
DataSet example = iter.next();
for(int i = 0 ; i<numLabels ; i++){
if(example.getLabels().getDouble(i)==1.){
String label = example.getLabelName(i);
if(hash.containsKey(label))
hash.put(label, hash.get(label)+1);
else
hash.put(label, 1);
}
}
}
for (String label: hash.keySet()){
System.out.println(" label : " + label.toString() + ", " + hash.get(label) + " examples");
}
}
The issue is that it displays only one example per label, whereas there should be many more. When I don't split my dataset with fileSplit.sample(), the function displays the right number of examples.
Any suggestions?

If you use a DataSet, you can call toString() on dataset.getFeatureMatrix() and dataset.getLabels().
If you only want to print the label counts, you can use dataset.labelCounts(). I would also look through the DL4J javadoc:
http://deeplearning4j.org/doc
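A likely explanation for the low counts, offered here as an assumption rather than something the answer states: each DataSet returned by the iterator is a minibatch of batchSize examples, so getLabels() is a [batchSize x numLabels] matrix, and getDouble(i) with a single linear index only reads the first row. Each batch then contributes at most one count. The sketch below shows per-row counting in plain Java; the double[][] stands in for the label matrix and the label names are hypothetical. With ND4J you would loop over every row of getLabels() the same way, or use labelCounts() as suggested above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelCounter {

    // Adds one count per example (row) of a one-hot label matrix.
    // 'labels' stands in for DataSet.getLabels(), shape [batchSize][numLabels];
    // 'labelNames' stands in for the names returned by getLabelName(j).
    public static void countBatch(double[][] labels, List<String> labelNames,
                                  Map<String, Integer> counts) {
        for (double[] row : labels) {              // every example in the batch, not only the first
            for (int j = 0; j < row.length; j++) {
                if (row[j] == 1.0) {
                    counts.merge(labelNames.get(j), 1, Integer::sum);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> labelNames = Arrays.asList("cat", "dog"); // hypothetical labels
        Map<String, Integer> counts = new TreeMap<>();
        // two minibatches, of size 3 and 2
        countBatch(new double[][] {{1, 0}, {0, 1}, {1, 0}}, labelNames, counts);
        countBatch(new double[][] {{0, 1}, {1, 0}}, labelNames, counts);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(" label : " + e.getKey() + ", " + e.getValue() + " examples");
        }
        // prints " label : cat, 3 examples" and " label : dog, 2 examples"
    }
}
```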

Related

Storing output scraped from a website in an array and printing a specific part of it

I am trying to get the five highest percentage gains from this website's table, store them in an array, and print them.
http://www.wsj.com/mdc/public/page/2_3021-gainnyse-gainer.html
Right now my code gets all of the rows and columns and prints them. I am having trouble getting only the top 5 and storing them in my array.
Please help.
public static void main(String[] args) throws IOException {
Document doc = Jsoup.connect("http://www.wsj.com/mdc/public/page/2_3021-gainnyse-gainer.html").get();
Elements rows = doc.select("tr");
for(Element row :rows)
{
Elements columns = row.select("td");
String[][] trtd = new String[columns.size()][];
for (Element column:columns)
{
System.out.println(column.text());
}
System.out.println();
}
}
Current output is:
SEARCH
Issue(Roll over for charts and headlines)
Price
Chg
% Chg
Volume
1
PHH (PHH)
$10.71
2.19
25.65
10,865,948
2
Chico's Fas (CHS)
10.03
1.35
15.63
4,514,899
3
Veeva Systems Cl A (VEEV)
70.48
8.41
13.55
3,300,989
4
Tutor Perini (TPC)
24.70
2.85
13.04
1,723,950
5
TriNet Group (TNET)
46.93
5.35
12.87
1,089,758
6
Nelnet Cl A (NNI)
57.60
5.99
11.61
121,379
7
Federal Signal (FSS)
21.35
1.74
8.87
272,982
etc.
I use a map to store the data: the key is the stock name and the value is the percentage change. This works as long as the page layout stays like this, but I recommend asking the site's administrators whether they offer a simple API.
// Needs: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
// org.jsoup.select.Elements, java.io.IOException and java.util.Map/HashMap/TreeMap
public static void main(String[] args) throws IOException {
    Document doc = Jsoup.connect("http://www.wsj.com/mdc/public/page/2_3021-gainnyse-gainer.html").get();
    Elements rows = doc.select("tr");
    // TreeMap keeps the entries sorted by rank; a HashMap does not guarantee any order
    Map<Integer, Map<String, String>> top5 = new TreeMap<>();
    for (Element row : rows) {
        Elements columns = row.select("td");
        for (Element column : columns) {
            System.out.println(column.text());
        }
        System.out.println();
        // data rows have more than 4 cells and a numeric rank in the first one;
        // this skips the search and header rows without relying on row positions
        if (columns.size() > 4 && columns.get(0).text().matches("\\d+")) {
            int rank = Integer.parseInt(columns.get(0).text());
            if (rank <= 5) {
                Map<String, String> map = new HashMap<>();
                // key: issue name (column 1), value: % Chg (column 4)
                map.put(columns.get(1).text(), columns.get(4).text());
                top5.put(rank, map);
            }
        }
    }
    System.out.println("using keySet");
    for (Integer key : top5.keySet()) {
        System.out.println(key + "=" + top5.get(key));
    }
}
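The question asks for the top five in an array. Since the Jsoup part only has to turn each row into its cell texts (for example by collecting column.text() for every td), the selection itself can be sketched in plain Java. The helper below assumes the rows are already extracted as String[] in the column order shown in the question's output (rank, issue, price, chg, % chg, volume), keeps only rows with a numeric % Chg, sorts them by that value and copies the best five into a String[][]. It does not rely on the page already being sorted. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TopGainers {

    // Each String[] holds the td texts of one table row:
    // rank, issue, price, chg, % chg, volume (as in the question's output).
    // Returns up to n rows of {issue, % chg}, highest % chg first.
    public static String[][] topByPercent(List<String[]> rows, int n) {
        List<String[]> data = new ArrayList<>();
        for (String[] cells : rows) {
            // keep only data rows: six cells and a numeric % chg in column 4
            if (cells.length >= 6 && cells[4].matches("-?\\d+(\\.\\d+)?")) {
                data.add(cells);
            }
        }
        // sort by % chg, largest first
        data.sort(Comparator.comparingDouble((String[] c) -> Double.parseDouble(c[4])).reversed());
        String[][] top = new String[Math.min(n, data.size())][];
        for (int i = 0; i < top.length; i++) {
            top[i] = new String[] {data.get(i)[1], data.get(i)[4]};
        }
        return top;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Issue(Roll over for charts and headlines)", "Price", "Chg", "% Chg", "Volume"},
                new String[] {"1", "PHH (PHH)", "$10.71", "2.19", "25.65", "10,865,948"},
                new String[] {"2", "Chico's Fas (CHS)", "10.03", "1.35", "15.63", "4,514,899"},
                new String[] {"3", "Veeva Systems Cl A (VEEV)", "70.48", "8.41", "13.55", "3,300,989"},
                new String[] {"4", "Tutor Perini (TPC)", "24.70", "2.85", "13.04", "1,723,950"},
                new String[] {"5", "TriNet Group (TNET)", "46.93", "5.35", "12.87", "1,089,758"},
                new String[] {"6", "Nelnet Cl A (NNI)", "57.60", "5.99", "11.61", "121,379"},
                new String[] {"7", "Federal Signal (FSS)", "21.35", "1.74", "8.87", "272,982"});
        String[][] top5 = topByPercent(rows, 5);
        for (String[] stock : top5) {
            System.out.println(stock[0] + ": " + stock[1] + "%");
        }
    }
}
```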

Manipulate and sort a text file

I am working on a project where I have been given a text file, and I have to add up the points for each team and print out the top 5 teams.
The text file looks like this:
FRAMae Berenice MEITE 455.455<br>
CHNKexin ZHANG 454.584<br>
UKRNatalia POPOVA 453.443<br>
GERNathalie WEINZIERL 452.162<br>
RUSEvgeny PLYUSHCHENKO 191.399<br>
CANPatrick CHAN 189.718<br>
CHNHan YAN 185.527<br>
CHNCheng & Hao 271.018<br>
ITAStefania & Ondrej 270.317<br>
USAMarissa & Simon 264.256<br>
GERMaylin & Daniel 260.825<br>
FRAFlorent AMODIO 179.936<br>
GERPeter LIEBERS 179.615<br>
JPNYuzuru HANYU 197.9810<br>
USAJeremy ABBOTT 165.654<br>
UKRYakov GODOROZHA 160.513<br>
GBRMatthew PARR 157.402<br>
ITAPaul Bonifacio PARKINSON 153.941<br>
RUSTatiana & Maxim 283.7910<br>
CANMeagan & Eric 273.109<br>
FRAVanessa & Morgan 257.454<br>
JPNNarumi & Ryuichi 246.563<br>
JPNCathy & Chris 352.003<br>
UKRSiobhan & Dmitri 349.192<br>
CHNXintong &Xun 347.881<br>
RUSYulia LIPNITSKAYA 472.9010<br>
ITACarolina KOSTNER 470.849<br>
JPNMao ASADA 464.078<br>
UKRJulia & Yuri 246.342<br>
GBRStacey & David 244.701<br>
USAMeryl &Charlie 375.9810<br>
CANTessa & Scott 372.989<br>
RUSEkaterina & Dmitri 370.278<br>
FRANathalie & Fabian 369.157<br>
ITAAnna & Luca 364.926<br>
GERNelli & Alexander 358.045<br>
GBRPenny & Nicholas 352.934<br>
USAAshley WAGNER 463.107<br>
CANKaetlyn OSMOND 462.546<br>
GBRJenna MCCORKELL 450.091<br>
The first three letters represent the team.
The rest of the text is the competitor's name.
The last digit (the last two digits for 10, as in 197.9810) after the two decimal places is the score the competitor received.
Code so far:
import java.util.Arrays;
public class project2 {
public static void main(String[] args) {
// TODO Auto-generated method stub
String[] array = new String[41];
String[] info = new String[41];
String[] stats = new String[41];
String[] team = new String[41];
//.txt file location
FileInput fileIn = new FileInput();
fileIn.openFile("C:\\Users\\O\\Desktop\\turn in\\team.txt");
// txt file to array
int i = 0;
String line = fileIn.readLine();
array[i] = line;
i++;
while (line != null) {
line = fileIn.readLine();
array[i] = line;
i++;
}
//Splitting up Info/team/score into seprate arrays
for (int j = 0; j < 40; j++) {
team[j] = array[j].substring(0, 3).trim();
info[j] = array[j].substring(3, 30).trim();
stats[j] = array[j].substring(36).trim();
}
// Random stuff i have been trying
System.out.println(team[1]);
System.out.println(info[1]);
System.out.println(stats[1]);
MyObject ob = new MyObject();
ob.setText(info[0]);
ob.setNumber(7, 23);
ob.setNumber(3, 456);
System.out.println("Text is " + ob.getText() + " and number 3 is " + ob.getNumber(7));
}
}
I'm pretty much stuck at this point because I am not sure how to add each team's scores together.
This looks like homework. First of all, you need to examine how you are parsing the strings in the file.
You're saying that the first 3 characters are the country, which looks correct, but then you set the info to the 4th through the 30th characters, which isn't correct: you need to figure out dynamically where the name ends and the score begins. There is a space between the "info" and the "stats", so you can use String's indexOf methods to find it; because the names themselves contain spaces, lastIndexOf(' ') is the one that finds that separator. (http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(int))
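A minimal sketch of that parsing: strip the <br>, take the team from the first three characters, and split the name and the stats at the last space. The points(...) helper is an assumption based on the sample data, where the stats look like a total with two decimals followed by the points (5 in 455.455, 10 in 197.9810). Class and method names are illustrative.

```java
public class LineParser {

    // Splits e.g. "FRAMae Berenice MEITE 455.455<br>" into {"FRA", "Mae Berenice MEITE", "455.455"}.
    public static String[] parse(String line) {
        String clean = line.replace("<br>", "").trim(); // drop the HTML line break, if present
        int lastSpace = clean.lastIndexOf(' ');          // names contain spaces, so use the LAST space
        String team = clean.substring(0, 3);
        String info = clean.substring(3, lastSpace).trim();
        String stats = clean.substring(lastSpace + 1);
        return new String[] {team, info, stats};
    }

    // Assumption: the stats are a total with two decimals followed by the points,
    // so "455.455" -> 5 and "197.9810" -> 10.
    public static int points(String stats) {
        return Integer.parseInt(stats.substring(stats.indexOf('.') + 3));
    }

    public static void main(String[] args) {
        String[] parts = parse("JPNYuzuru HANYU 197.9810<br>");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2] + " | " + points(parts[2]));
        // prints: JPN | Yuzuru HANYU | 197.9810 | 10
    }
}
```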
Have a look at Maps.
A Map is a collection that lets you look up the data associated with a key very quickly.
You can create a Map whose key is the country code and whose value is the total points.
Example:
Map<String,Integer> totalScore = new HashMap<>();
if (totalScore.containsKey("COUNTRYNAME"))
    totalScore.put("COUNTRYNAME", totalScore.get("COUNTRYNAME") + playerScore);
else
    totalScore.put("COUNTRYNAME", playerScore);
This adds the player's score to the country's total if the country is already in the map; otherwise it creates a new total for the country, starting at that player's score.
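Once every player's score has been added to the map, the question's last step, printing the top 5 teams, comes down to sorting the map's entries by value. A minimal sketch; the class and method names are illustrative, and the totals in main are made-up numbers, not computed from the file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TeamRanking {

    // Returns the codes of the n teams with the highest totals, best first.
    public static List<String> topTeams(Map<String, Integer> totalScore, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totalScore.entrySet());
        // sort by total score, descending
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> totalScore = new HashMap<>();
        totalScore.put("FRA", 20);   // made-up totals
        totalScore.put("CHN", 12);
        totalScore.put("RUS", 27);
        System.out.println(topTeams(totalScore, 2)); // prints [RUS, FRA]
    }
}
```

Teams with equal totals come out in no particular order; add a secondary comparison on the key if a stable tie-break matters.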
Not tested, but should give you some ideas:
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopFive {
    public static void main(String... args) throws Exception {
        class Structure implements Comparable<Structure> {
            private final String team;
            private final String name;
            private final Double score;

            public Structure(String team, String name, Double score) {
                this.team = team;
                this.name = name;
                this.score = score;
            }

            public String getTeam() {
                return team;
            }

            public String getName() {
                return name;
            }

            public Double getScore() {
                return score;
            }

            @Override
            public int compareTo(Structure o) {
                return this.score.compareTo(o.score);
            }
        }

        File file = new File("path to your file");
        List<String> lines = Files.readAllLines(Paths.get(file.toURI()), StandardCharsets.UTF_8);
        // the score is the decimal number on each line, e.g. 455.455
        Pattern p = Pattern.compile("(\\d+(?:\\.\\d+))");
        List<Structure> structures = new ArrayList<Structure>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                String number = m.group(1);
                // everything before the space that precedes the number: team code + name
                String text = line.substring(0, line.indexOf(number) - 1);
                double d = Double.parseDouble(number);
                String team = text.substring(0, 3);
                String name = text.substring(3);
                structures.add(new Structure(team, name, d));
            }
        }
        // sort highest score first, so the first five entries are the top five
        Collections.sort(structures, Collections.reverseOrder());
        List<Structure> topFive = structures.subList(0, Math.min(5, structures.size()));
        for (Structure structure : topFive) {
            System.out.println("Team: " + structure.getTeam());
            System.out.println("Name: " + structure.getName());
            System.out.println("Score: " + structure.getScore());
        }
    }
}
Just remove <br> from your file.
Loading file into memory
Your string splitting logic looks fine.
Create a class such as PlayerData, create one instance of it for each row, and set all three fields using setters.
Add each PlayerData object to an ArrayList.
Accumulating
Loop through the ArrayList and accumulate the team scores in a HashMap that maps teamCode to totalScore.
In general, store each row's data in a custom object; parallel String[] arrays, one per column, are not a good way of holding data.
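A minimal sketch of that approach: one PlayerData object per row, collected in an ArrayList, then a loop that accumulates a teamCode to totalScore map. The field names and the player(...) helper are illustrative; the scores in main are the points from three of the question's lines, assuming the trailing digits are the points (5, 4 and 6).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PlayerScores {

    // One object per row of the file; field names are illustrative.
    public static class PlayerData {
        private String teamCode;
        private String name;
        private int score;

        public void setTeamCode(String teamCode) { this.teamCode = teamCode; }
        public void setName(String name) { this.name = name; }
        public void setScore(int score) { this.score = score; }
        public String getTeamCode() { return teamCode; }
        public String getName() { return name; }
        public int getScore() { return score; }
    }

    // Hypothetical helper: builds a PlayerData using the setters.
    public static PlayerData player(String team, String name, int score) {
        PlayerData p = new PlayerData();
        p.setTeamCode(team);
        p.setName(name);
        p.setScore(score);
        return p;
    }

    // Loops over the list and maps teamCode -> totalScore.
    public static Map<String, Integer> accumulate(List<PlayerData> players) {
        Map<String, Integer> totals = new HashMap<>();
        for (PlayerData p : players) {
            Integer current = totals.get(p.getTeamCode());
            totals.put(p.getTeamCode(), (current == null ? 0 : current) + p.getScore());
        }
        return totals;
    }

    public static void main(String[] args) {
        List<PlayerData> players = new ArrayList<>();
        players.add(player("FRA", "Mae Berenice MEITE", 5));
        players.add(player("CHN", "Kexin ZHANG", 4));
        players.add(player("FRA", "Florent AMODIO", 6));
        Map<String, Integer> totals = accumulate(players);
        System.out.println("FRA total: " + totals.get("FRA")); // prints FRA total: 11
    }
}
```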
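Take a look at FileUtils (Apache Commons IO). You can then use StringUtils (Apache Commons Lang) to remove the <br>, extract the content after the last space character, and use it as the key of a TreeMap. Then your items will be ordered.
List<String> lines = FileUtils.readLines(yourFile, StandardCharsets.UTF_8);
// Double keys sort numerically; String keys would sort lexicographically
Map<Double, String> ordered = new TreeMap<>();
for (String s : lines) {
    String line = StringUtils.remove(s, "<br>").trim();
    if (line.isEmpty()) continue;
    // names contain spaces, so split at the LAST space: name before it, rate after it
    String name = StringUtils.substringBeforeLast(line, " ").trim();
    String rate = StringUtils.substringAfterLast(line, " ");
    ordered.put(Double.valueOf(rate), name); // note: equal rates overwrite each other
}
Collection<String> names = ordered.values(); // names ordered by rate, lowest first
Of course, you will need to adjust the snippet.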

How to identify PP-tags/NP-tags/VP-tags in openNLP chunker?

I want to count the number of PP/NP/VP chunks in a text, but I don't know how to identify PP, NP and VP tags in the OpenNLP chunker. I have tried this code, but it's not working:
ChunkerModel cModel = new ChunkerModel(modelIn);
ChunkerME chunkerME = new ChunkerME(cModel);
String result[] = chunkerME.chunk(whitespaceTokenizerLine, tags);
HashMap<Integer,String> phraseLablesMap = new HashMap<Integer, String>();
Integer wordCount = 1;
Integer phLableCount = 0;
for (String phLable : result) {
if(phLable.equals("O")) phLable += "-Punctuation"; //The phLable of the last word is OP
if(phLable.split("-")[0].equals("B")) phLableCount++;
phLable = phLable.split("-")[1] + phLableCount;
System.out.println(wordCount + ":" + phLable);
phraseLablesMap.put(wordCount, phLable);
wordCount++;
}
Integer noPP=0;
Integer TotalPP=0;
for (String PPattach: result) {
if (PPattach.equals("PP")) {
for (int i=0;i<result.length;i++)
TotalPP = noPP +1;
}
}
System.out.println(TotalPP);
Output:
1:NP1
2:VP2
3:NP3
4:NP3
5:VP4
6:PP5
7:NP6
8:NP6
9:NP6
10:NP6
11:PP7
12:NP8
13:NP8
14:NP8
15:PP9
16:NP10
17:NP10
18:PP11
19:NP12
20:NP12
21:VP13
22:VP13
23:NP14
24:NP14
25:PP15
26:NP16
27:NP16
28:Punctuation16
0
The best way is to use Span objects: they have a getType() method that returns the chunk type, and ChunkerME.chunkAsSpans(tokens, tags) returns the chunks as Spans.
See this post:
grouping all Named entities in a Document
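If you already have the tag array, counting also works without Spans. The chunker returns one tag per token, and a tag such as "B-PP" marks the beginning of a PP chunk while "I-PP" continues it; that is also why PPattach.equals("PP") in the question never matches. A sketch that counts chunk beginnings per type, assuming the tags follow that B-/I-/O pattern (as the question's output suggests); the class name and sample tags are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

public class ChunkCounter {

    // Counts chunks per type from BIO-style chunk tags such as "B-NP", "I-NP" and "O".
    // A new chunk starts at a "B-" tag, or at an "I-" tag whose type differs from the previous tag.
    public static Map<String, Integer> countChunks(String[] tags) {
        Map<String, Integer> counts = new TreeMap<>();
        String previousType = null;
        for (String tag : tags) {
            if (!tag.startsWith("B-") && !tag.startsWith("I-")) {
                previousType = null;          // "O": the token is outside any chunk
                continue;
            }
            String type = tag.substring(2);   // "NP", "VP", "PP", ...
            if (tag.startsWith("B-") || !type.equals(previousType)) {
                counts.merge(type, 1, Integer::sum);
            }
            previousType = type;
        }
        return counts;
    }

    public static void main(String[] args) {
        // in the question, this array would be the result of chunkerME.chunk(...)
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "B-PP", "B-NP", "I-NP", "O"};
        Map<String, Integer> counts = countChunks(tags);
        System.out.println(counts);                          // prints {NP=3, PP=1, VP=1}
        System.out.println("PP: " + counts.getOrDefault("PP", 0));
    }
}
```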

Extracting data from a collection in Java

I have a csv dataset like this:
A, 10, USA
B,30, UK
C,4,IT
A,20,UK
B,10,USA
I want to read this csv lines and provide the following output:
A has ran 30 miles with average of 15.
B has ran 40 miles with average of 20.
C has ran 4 miles with average of 4.
I want to achieve this in Java. I have done this in C# using LINQ:
var readlines = File.ReadAllLines(filename);
var query = from lines in readlines
let data = lines.Split(',')
select new
{
Name = data[0],
Miles = data[1],
};
var values = query.GroupBy(x => new {x.Name}).Select(group => new { Person = group.Key, Events = group.Sum(g =>Convert.ToDouble(g.Miles)) ,Count = group.Count() });
I want to do this in Java, but I am not sure whether I can do it without a third-party library. Any ideas?
So far, my code looks like this in Java:
CSVReader reader = new CSVReader(new FileReader(filename));
java.util.List<String[]> content = reader.readAll();
String[] row = null;
for(Object object:content)
{
row = (String[]) object;
String Name = row[0];
String Miles = row[1];
System.out.printf("%s has ran %s miles %n",Name,Miles);
}
reader.close();
}
I am looking for a clean way to get the total mileage for each name so that I can calculate the average.
As a C# developer, it is sometimes hard not to miss LINQ's features. But, as Farlan suggested, you could do something like this:
// CSVReader comes from the opencsv library; Map, HashMap, List and ArrayList are in java.util
CSVReader reader = new CSVReader(new FileReader(filename));
java.util.List<String[]> content = reader.readAll();
Map<String, Group> groups = new HashMap<>();
for (String[] row : content)
{
    // trim() removes the spaces after the commas, e.g. in "A, 10, USA"
    String name = row[0].trim();
    String miles = row[1].trim();
    System.out.printf("%s has ran %s miles %n", name, miles);
    if (groups.containsKey(name)) {
        groups.get(name).add(Double.valueOf(miles));
    } else {
        Group g = new Group();
        g.add(Double.valueOf(miles));
        groups.put(name, g);
    }
}
reader.close();
for (String name : groups.keySet())
{
    System.out.println(name + " has ran " + groups.get(name).total() + " miles with average of " + groups.get(name).average() + ".");
}
}
class Group {
    private List<Double> miles;

    public Group()
    {
        miles = new ArrayList<>();
    }

    public Double total() {
        double sum = 0;
        for (Double mile : miles)
        {
            sum += mile;
        }
        return sum;
    }

    public Double average() {
        if (miles.size() == 0)
            return 0d;
        return total() / miles.size();
    }

    public void add(Double m) {
        miles.add(m);
    }
}
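Use Java's BufferedReader class to read the file line by line:
try (BufferedReader in = new BufferedReader(new FileReader("your.csv"))) {
    String line;
    while ((line = in.readLine()) != null) {
        String[] fields = line.split(",");
        // fields[0] = name, fields[1] = miles, fields[2] = country;
        // the totals and averages still have to be accumulated per name
        System.out.println(fields[0].trim() + " has ran " + fields[1].trim() + " miles");
    }
}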
There are quite a few ways to do this, some long-winded, some shorter. The issue is that Java can be very verbose for simple tasks, so the better approaches can be a bit uglier.
The example below shows exactly how to achieve this, bar the printing. Bear in mind, however, that it might not be the best approach, but I feel it's one of the easier ones to read and understand.
final File csvFile = new File("filename.csv");
final Scanner reader = new Scanner(csvFile);
final Map<String, Integer> info = new HashMap<>(); // Store the data
// Continue until there are no more lines
while (reader.hasNextLine()) {
    final String[] data = reader.nextLine().split(","); // data[0] = A. [1] = 10. [2] = USA
    final String alpha = data[0].trim();
    // trim() is needed because lines such as "A, 10, USA" have spaces after the commas
    final int miles = Integer.parseInt(data[1].trim());
    if (!info.containsKey(alpha)) {
        info.put(alpha, miles);
    } else {
        info.put(alpha, info.get(alpha) + miles);
    }
}
reader.close();
The steps involved are simple:
Step 1 - Read the file.
By passing a File to the Scanner, you make it parse the file instead of the console. Using the very neat hasNextLine() method, you can keep reading lines until none are left. Each line is then split at the commas and stored in a String array for reference.
Step 2 - Associating the data.
As you want to add the integers together cumulatively, you need a way to associate the letters you have already seen with their numbers. A heavyweight but clean way of doing this is to use a HashMap. Its key is going to be a String, specifically A, B or C. Because each key is unique, we can use the (average) O(1) containsKey(String) method to check whether we have already read the letter. If it is new, add it to the HashMap together with its number. If the letter has been seen before, take the old value, add the new one to it, and overwrite the entry in the HashMap.
All you need to do now is print out the data. Feel free to take a different approach, but I hope this is a clear example of how you CAN do it in Java.
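Without any third-party library, Java 8 and later streams give something close to the LINQ GroupBy from the question: Collectors.groupingBy collects the lines by name, and Collectors.summarizingInt keeps the sum, count and average of the miles for each name in an IntSummaryStatistics. A sketch that reads the lines from a List<String> rather than a file, an assumption to keep it self-contained (Files.readAllLines could supply the list); the class name is illustrative.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MilesReport {

    // Groups "name, miles, country" lines by name and summarises the miles for each name.
    public static Map<String, IntSummaryStatistics> summarise(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .collect(Collectors.groupingBy(
                        (String[] fields) -> fields[0].trim(),   // group key: the name
                        TreeMap::new,                             // names in alphabetical order
                        Collectors.summarizingInt((String[] fields) -> Integer.parseInt(fields[1].trim()))));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A, 10, USA", "B,30, UK", "C,4,IT", "A,20,UK", "B,10,USA");
        summarise(lines).forEach((name, stats) ->
                System.out.printf("%s has ran %d miles with average of %.1f.%n",
                        name, stats.getSum(), stats.getAverage()));
    }
}
```

The trim() calls matter here: Integer.parseInt does not accept the leading space in " 10".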
Maybe you could try this Java library: https://code.google.com/p/qood/
It handles data without any getters/setters, so it's more flexible than LINQ.
In your case, the file "D:/input.csv" has 3 columns:
NAME,MILES,COUNTRY
A, 10, USA
B,30, UK
C,4,IT
A,20,UK
B,10,USA
The query code would be:
final QModel raw = QNew.modelCSV("D:/input.csv")
.debug(-1);//print out what read from CSV
raw.query()
.selectAs("OUTPUT",
"CONCAT(NAME,' has ran ',SUM(MILES),' miles with average of ',MEAN(MILES),'.')")
.groupBy("NAME")
.result().debug(-1)//print out the result
.to().fileCSV("D:/output.csv", "UTF-8");//write to another CSV file

Dynamic data store for ExtJS chart

I am trying to use JSP on server-side to perform a variable number of queries and output the result of all of them as a single block of JSON data for an ExtJS line chart.
The number of queries varies because each query represents a different series (a different line) on the line chart, and the number of series depends on which line chart the user selects.
I am using Hibernate, and my persistence class returns each query's data as a List<Map<String, Object>> (each Map represents one row).
There will always be at least one series (one line on the graph, one query to execute), so the way I was thinking of setting this up is as follows:
1) Have the initial query run and get the first series
2) Run another query to check for any other series that should be on the graph
3) For each "other" series found by the second query, run a query that gets the data for that series (same number of rows), and then merge that data into the first List<Map<String, Object>> returned in #1 as another column. The query is set up to order the rows properly; the data just needs to be merged at the same index.
4) Output that List as JSON.
My problem is with #3: I am not sure how to go about merging the data.
Here's what I have so far:
GenericSelectCommand graphData = new GenericSelectCommand(graphDataQuery);
GenericSelectCommand backSeriesData = new GenericSelectCommand(backSeriesQuery);
List<Map<String, Object>> graphDataList;
List<Map<String, Object>> backSeriesList;
try
{
Persistor myPersistor = new Persistor();
// 1) GET THE INITIAL LINE CHART SERIES
myPersistor.executeTransact(graphData);
graphDataList = graphData.getRows();
// 2) LOOK FOR ANY ADDITIONAL SERIES THAT SHOULD BE ON THE LINE CHART
myPersistor.executeTransact(backSeriesData);
backSeriesList = backSeriesData.getRows();
// 3) FOR EACH ADDITIONAL SERIES FOUND, RUN A QUERY AND APPEND THE DATA TO THE INITIAL LINE CHART SERIES (graphDataList)
for (int i = 0; i < backSeriesList.size(); i++)
{
Map<String, Object> backSeriesBean = backSeriesList.get(i);
// THIS QUERY RETURNS ONE COLUMN OF INT VALUES (THE LINE CHART DATA) WITH THE EXACT SAME NUMBER OF ROWS AS THE INITIAL LINE CHART SERIES (graphDataList)
String backDataQuery = "exec runQuery 'getBackData', '" + backSeriesBean.get("series_id") + "'";
GenericSelectCommand backData = new GenericSelectCommand(backDataQuery);
myPersistor.executeTransact(backData);
List<Map<String, Object>> backDataList = backData.getRows();
// FOR EACH RECORD IN THE BACK DATA (Map<String, Object>)
for (int i = 0; i < backDataList.size(); i++)
{
Map<String, Object> backDataBean = backDataList.get(i);
// HOW DO I ADD IT TO THE RECORD AT THE SAME INDEX LEVEL IN graphDataList (List<Map<String, Object>>)
}
}
}
catch (Throwable e)
{
System.err.println("Error: ");
System.err.println(e.getCause());
}
finally
{
myPersistor.closeSession();
}
// 4) RETURN THE DATA AS JSON NOW THAT IT IS MERGED
for (int i = 0; i < graphDataList.size(); i++)
{
Map<String, Object> graphDataBean = graphDataList.get(i);
out.println(/*JSON FORMAT + graphDataBean.get('data') + JSON FORMAT*/)
}
SOLUTION:
GenericSelectCommand graphData = new GenericSelectCommand(graphDataQuery);
GenericSelectCommand backSeries = new GenericSelectCommand(backSeriesQuery);
List<Map<String, Object>> graphDataList = Collections.emptyList();
List<Map<String, Object>> backSeriesList = Collections.emptyList();
List backDataListArray = new ArrayList();
try
{
// GET THE INITIAL LINE CHART SERIES
Persistor.instance().executeTransact(graphData);
graphDataList = graphData.getRows();
// LOOK FOR ANY ADDITIONAL SERIES THAT SHOULD BE ON THE LINE CHART
Persistor.instance().executeTransact(backSeries);
backSeriesList = backSeries.getRows();
// FOR EACH ADDITIONAL SERIES FOUND, RUN THE QUERY AND ADD IT TO backDataListArray
for (int i = 0; i < backSeriesList.size(); i++)
{
Map<String, Object> backSeriesBean = backSeriesList.get(i);
String backDataQuery = "exec runQuery 'getBackData', " + backSeriesBean.get("series_id");
GenericSelectCommand backData = new GenericSelectCommand(backDataQuery);
Persistor.instance().executeTransact(backData);
List<Map<String, Object>> backDataList = backData.getRows();
backDataListArray.add(backDataList);
}
}
catch (Throwable e)
{
System.err.println("Error: ");
System.err.println(e.getCause());
}
finally
{
Persistor.instance().closeSession();
}
// FOR EACH RECORD IN THE ORIGINAL QUERY, WRITE THE JSON STRING
for (int i = 0; i < graphDataList.size(); i++)
{
StringBuilder backDataString = new StringBuilder();
// BUILD THE BACK DATA STRING (IF THERE IS ANY)
for (int j = 0; j < backDataListArray.size(); j++)
{
List<Map<String, Object>> backDataList = (List<Map<String, Object>>) backDataListArray.get(j);
Map<String, Object> backDataBean = backDataList.get(i);
Map<String, Object> backSeriesBean = backSeriesList.get(j);
backDataString.append(backSeriesBean.get("the_series") + ": " + backDataBean.get("the_count") + ", ");
}
Map<String, Object> graphDataBean = graphDataList.get(i);
out.println("{the_quota: " + graphDataBean.get("the_quota") + ", " + "count_pt_year: " + graphDataBean.get("count_pt_year") + ", " + backDataString + "date_string: '" + graphDataBean.get("date_string") + "'}" + (i + 1 == graphDataList.size() ? "" : "," ));
}
I would not merge the lists. Instead, I would put each query's result list into an outer list, then go through the outer list and return each series list. You can create the outer list as:
List outerList = new ArrayList();
I would not worry about specifying the types for the outer list as it just makes it more complicated for little benefit.
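If you do want the in-place merge from step 3 of the question, the inner loop only needs to copy each value into the row at the same index of graphDataList. A minimal sketch, assuming both lists have the same length and order (as the question states) and that the maps returned by getRows() are mutable; the target key "series_a", the sample dates and the counts are hypothetical, and the row(...) helper exists only to build test data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SeriesMerger {

    // Copies extra.get(i).get(sourceKey) into base.get(i) under targetKey, row by row.
    // Assumes both lists are ordered the same way and have the same number of rows.
    public static void mergeColumn(List<Map<String, Object>> base,
                                   List<Map<String, Object>> extra,
                                   String sourceKey, String targetKey) {
        if (base.size() != extra.size()) {
            throw new IllegalArgumentException("row counts differ: " + base.size() + " vs " + extra.size());
        }
        for (int i = 0; i < base.size(); i++) {
            base.get(i).put(targetKey, extra.get(i).get(sourceKey));
        }
    }

    // Hypothetical helper: builds a one-column row for the example.
    public static Map<String, Object> row(String key, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> graphDataList = new ArrayList<>();
        graphDataList.add(row("date_string", "2014-01"));
        graphDataList.add(row("date_string", "2014-02"));
        List<Map<String, Object>> backDataList = new ArrayList<>();
        backDataList.add(row("the_count", 3));
        backDataList.add(row("the_count", 5));
        // in the question, the target key could come from backSeriesBean.get("the_series")
        mergeColumn(graphDataList, backDataList, "the_count", "series_a");
        System.out.println(graphDataList.get(1).get("series_a")); // prints 5
    }
}
```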
