I am learning to build neural nets and came across this code on GitHub:
https://github.com/PavelJunek/back-propagation-java
It requires a training set and a validation set, but I don't know where to input the files. The README doesn't really explain how to use them. How do I test this code with the different CSV files I have?
How so? It tells you exactly what to do. The program needs to get two CSV files: a CSV file containing all the training data and a second CSV file containing all of the validation data.
If you have a look at the Program.java file (in the main method), you'll see that you need to pass both files as command-line arguments.
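To illustrate the general pattern, here is a minimal sketch of a Java main method that takes two CSV paths from the command line and loads them. This is not the repository's Program.java: the class name CsvArgsDemo, the argument order (training first, validation second), and the assumption that every cell is numeric with no header row or quoted fields are all hypothetical, so check the real main method for the order it expects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CsvArgsDemo {

    /** Reads a comma-separated file of numbers into rows (assumes no header and no quoted fields). */
    static List<double[]> readCsv(Path file) throws IOException {
        List<double[]> rows = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            if (line.isBlank()) continue; // skip empty lines
            String[] cells = line.split(",");
            double[] row = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                row[i] = Double.parseDouble(cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java CsvArgsDemo <training.csv> <validation.csv>");
            System.exit(1);
        }
        // Hypothetical order: first argument is training data, second is validation data.
        List<double[]> training = readCsv(Paths.get(args[0]));
        List<double[]> validation = readCsv(Paths.get(args[1]));
        System.out.println("Training rows: " + training.size() + ", validation rows: " + validation.size());
    }
}
```

Testing with different data is then just a matter of passing different paths, e.g. `java CsvArgsDemo my-train.csv my-validation.csv`.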
I have been given two zip files: AP88.zip, which contains a bunch of text files, and a file containing queries. I'm supposed to run a test on these two files with Apache Lucene. I think the basic idea is to use the queries to search the various files contained in the AP88 zip file.
I understand the basics of information retrieval and the theory behind it. However, I have no idea where to start in order to run a test on the given data.
Could you please help me find some existing code and explain how to use the given files to run a test with Apache Lucene?
Many thanks.
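For a pre-existing starting point, Lucene ships a demo module whose IndexFiles and SearchFiles classes index a directory of files and then run queries against that index. Those classes work on a directory, so you would either unzip AP88.zip first or read the archive directly. Below is a minimal sketch of the second option using only java.util.zip, assuming the entries are uncompressed plain text; ZipTextReader and forEachTextEntry are hypothetical names, and the Lucene indexing step itself is only marked by a comment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.function.BiConsumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipTextReader {

    /** Calls handler(entryName, text) for every file entry in the archive, one entry at a time. */
    static void forEachTextEntry(String zipPath, BiConsumer<String, String> handler) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) continue; // skip folder entries
                try (InputStream in = zip.getInputStream(entry)) {
                    // Assumes plain-text entries; change the charset if they are not UTF-8.
                    String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    handler.accept(entry.getName(), text);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the archive, e.g. AP88.zip
        forEachTextEntry(args[0], (name, text) -> {
            // This is where each text would be turned into a Lucene Document and indexed.
            System.out.println(name + ": " + text.length() + " chars");
        });
    }
}
```

Handing each entry to a callback, rather than collecting everything into one map, keeps memory use bounded when the collection is large.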
I have more than 400 JPG files, each with a JSON file containing the image's tags, description and title. I've found this command:
exiftool -json=picture.json picture.jpg
But I don't want to run this for each and every file.
How can I run this command for the folder containing the JPGs and JSONs or is there another way I can batch process these?
Each JSON file has the same name as its JPG counterpart, so it's easy to identify which files match up.
Assuming your JPGs and JSONs have the same file name but different extensions (e.g. picture001.jpg has an associated picture001.json), a shell for loop might work.
Assuming you've already cd-ed into the folder and the files aren't nested in subfolders, something like this should work:
( for jpg in *.jpg; do exiftool -json="${jpg%.jpg}.json" "$jpg"; done )
Note that this isn't tested. I recommend making a copy of your folder and testing there beforehand to make sure you don't irreversibly damage them.
I've also noticed you're using the Java tag. I had to work with EXIF data in Java a while back (on Android at the time) and I used the JHeader library. If you want to roll your own little Java command-line tool, you should be able to use Java's IO classes to traverse your directory and files, and the JHeader library to modify the EXIF data.
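As a middle ground, a minimal sketch of the directory-traversal half in plain Java: it pairs each picture.jpg with its picture.json and runs the same exiftool command from the question for each pair, rather than calling JHeader (whose API is not shown here). ExifBatch, pairJpgWithJson and exiftoolCommand are hypothetical names; the "*.jpg" glob is assumed to match lower-case extensions only, so files ending in .JPG would need a different pattern.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExifBatch {

    /** Maps each picture.jpg in dir to picture.json, skipping JPGs that have no matching JSON. */
    static Map<Path, Path> pairJpgWithJson(Path dir) throws IOException {
        Map<Path, Path> pairs = new TreeMap<>();
        try (DirectoryStream<Path> jpgs = Files.newDirectoryStream(dir, "*.jpg")) {
            for (Path jpg : jpgs) {
                String name = jpg.getFileName().toString();
                String base = name.substring(0, name.length() - ".jpg".length());
                Path json = jpg.resolveSibling(base + ".json");
                if (Files.isRegularFile(json)) {
                    pairs.put(jpg, json);
                }
            }
        }
        return pairs;
    }

    /** Builds the command from the question: exiftool -json=picture.json picture.jpg */
    static List<String> exiftoolCommand(Path jpg, Path json) {
        return List.of("exiftool", "-json=" + json, jpg.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<Path, Path> pair : pairJpgWithJson(dir).entrySet()) {
            // ProcessBuilder passes arguments directly (no shell), so spaces in names are safe.
            Process p = new ProcessBuilder(exiftoolCommand(pair.getKey(), pair.getValue()))
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                System.err.println("exiftool failed for " + pair.getKey());
            }
        }
    }
}
```

As with the shell loop, try it on a copy of the folder first.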
I have some Pig output files and want to read them on another machine (without a Hadoop installation). I just want to read a tab-separated plain-text line and parse it into a Java object. I am guessing we should be able to use pig.jar as a dependency to read it, but I could not find relevant documentation. I think this class could be used? How can we also provide the schema?
I suggest storing the data in the Avro serialization format. It is Pig-independent and lets you handle complex data structures like the ones you described (so you don't need to write your own parser). See this article for examples.
Your Pig output files are just text files, right? Then you don't need any Pig or Hadoop jars.
The last time I worked with Pig was on Amazon's EMR platform, and the output files were stored in an S3 bucket. They were just text files, and standard Java can read them.
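A minimal sketch of that with plain Java, assuming the output was written with the default PigStorage (tab-delimited fields) into part-* files, and assuming nulls show up as empty fields. The three-column Visit schema (user, url, count) is hypothetical; in this approach the "schema" is simply the Java class you parse each line into.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PigOutputReader {

    /** Hypothetical schema for one output line: user \t url \t count. */
    static final class Visit {
        final String user;
        final String url;
        final Long count;

        Visit(String user, String url, Long count) {
            this.user = user;
            this.url = url;
            this.count = count;
        }
    }

    /** Parses one tab-separated line; empty fields become null. */
    static Visit parseLine(String line) {
        String[] f = line.split("\t", -1); // -1 keeps trailing empty fields
        if (f.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new Visit(emptyToNull(f[0]), emptyToNull(f[1]), f[2].isEmpty() ? null : Long.valueOf(f[2]));
    }

    static String emptyToNull(String s) {
        return s.isEmpty() ? null : s;
    }

    /** Reads every part-* file in a Pig output directory. */
    static List<Visit> readDir(Path dir) throws IOException {
        List<Visit> visits = new ArrayList<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(dir, "part-*")) {
            for (Path part : parts) {
                try (BufferedReader in = Files.newBufferedReader(part)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (!line.isEmpty()) visits.add(parseLine(line));
                    }
                }
            }
        }
        return visits;
    }

    public static void main(String[] args) throws IOException {
        for (Visit v : readDir(Paths.get(args[0]))) {
            System.out.println(v.user + " -> " + v.url + " (" + v.count + ")");
        }
    }
}
```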
That class you referenced is for reading into Pig from some text format.
Are you asking for a library to parse the Pig data model into Java objects, i.e. the text representation of tuples, bags, etc.? If so, it's probably easier to write it yourself. It's a VERY simple data model with only three-ish data types.
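If you do need the nested values, a small recursive-descent parser goes a long way. A minimal sketch, assuming tuples are written as (a,b) and bags as {(a,b),(c,d)} with no escaped delimiters inside values; maps ([k#v]) and type conversion are not handled, and PigTextParser is a hypothetical name. Tuples and bags both come back as nested Lists, and atoms as Strings.

```java
import java.util.ArrayList;
import java.util.List;

public class PigTextParser {
    private final String s;
    private int pos = 0;

    private PigTextParser(String s) {
        this.s = s;
    }

    /** Parses "(..)" tuples and "{..}" bags into nested Lists; anything else is returned as a String atom. */
    public static Object parse(String text) {
        PigTextParser p = new PigTextParser(text);
        Object value = p.parseValue();
        if (p.pos != text.length()) {
            throw new IllegalArgumentException("Unexpected character at " + p.pos + " in: " + text);
        }
        return value;
    }

    private Object parseValue() {
        char c = peek();
        if (c == '(') return parseGroup('(', ')');
        if (c == '{') return parseGroup('{', '}');
        return parseAtom();
    }

    private List<Object> parseGroup(char open, char close) {
        expect(open);
        List<Object> items = new ArrayList<>();
        if (peek() == close) { // empty tuple or bag
            pos++;
            return items;
        }
        while (true) {
            items.add(parseValue());
            char c = peek();
            if (c == ',') {
                pos++;
            } else if (c == close) {
                pos++;
                return items;
            } else {
                throw new IllegalArgumentException("Expected ',' or '" + close + "' at " + pos + " in: " + s);
            }
        }
    }

    /** An atom runs until the next delimiter; an empty atom is how a null field would appear. */
    private String parseAtom() {
        int start = pos;
        while (pos < s.length() && ",(){}".indexOf(s.charAt(pos)) < 0) {
            pos++;
        }
        return s.substring(start, pos);
    }

    private char peek() {
        return pos < s.length() ? s.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at " + pos + " in: " + s);
        }
        pos++;
    }
}
```

For example, parsing `(1,{(a,b),(c,d)},x)` yields a three-element list whose middle element is a list of two two-element lists.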
I've used Apache Flume to pipe a large number of tweets into HDFS. I'm trying to do sentiment analysis on this data, just something simple to begin with, like comparing positive vs. negative words.
My problem is that all the guides I've found use a text file of positive and negative words and then one huge text file containing every tweet.
As I used Flume, all my data is already in Hadoop. When I access it using localhost:50070 I can see the data, in separate files according to month/day/hour, with each file containing three or four tweets. I have maybe 50 of these files for every hour. Although it doesn't say anywhere, I'm assuming they are in JSON format.
Bearing this in mind, how can I perform my analysis on them? All the examples I've seen that write a Mapper and Reducer run them on a single file, not on a large collection of small JSON files. What should my next step be?
This example should get you started:
https://github.com/cloudera/cdh-twitter-example
Basically, use a Hive external table to map your JSON data and query it using HiveQL.
To process all the files in a directory, you can specify the directory path as the input path of your Hadoop job, and it will treat every file in that directory as input.
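For example, if your small files are in the directory /user/flume/tweets/...., you can specify /user/flume/tweets/ as your job's input path. Note that, by default, FileInputFormat does not descend into subdirectories, so with Flume's month/day/hour layout you may need a glob pattern that matches the nested directories, or to enable recursive input listing.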
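However the files are fed in, the per-tweet logic inside the mapper can stay simple. A minimal, Hadoop-free sketch of the positive-vs-negative word comparison from the question: it assumes the tweet text has already been pulled out of each JSON record (not shown), and the word lists are tiny hypothetical samples. In a MapReduce job, a mapper could emit the label as the key with a count of 1, and a reducer would sum the counts.

```java
import java.util.Locale;
import java.util.Set;

public class TweetSentiment {

    /** Returns (#positive words - #negative words) for one tweet's text. */
    static int score(String text, Set<String> positive, Set<String> negative) {
        int score = 0;
        // Lower-case, then split on anything that is not a letter, digit or apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (positive.contains(token)) {
                score++;
            } else if (negative.contains(token)) {
                score--;
            }
        }
        return score;
    }

    /** Maps a score to a coarse label. */
    static String label(int score) {
        return score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        Set<String> pos = Set.of("good", "great", "love");  // hypothetical sample list
        Set<String> neg = Set.of("bad", "awful", "hate");   // hypothetical sample list
        String tweet = "I love this phone, but the battery is bad and the camera is awful";
        int s = score(tweet, pos, neg);
        System.out.println(s + " " + label(s));
    }
}
```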
If you want to run the analysis automatically every hour, you will need to write an Oozie workflow.
You can refer to the link below for sentiment analysis in Hive:
https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/
I'm trying to understand the process of parsing H.264 NAL units (to extract information about slices, macroblocks, etc.), so I'm writing a simple bitstream parser for H.264.
Are there any example (training) files that contain, for example, a single NAL unit or a single slice?
Does anybody know where I can get such training data?
Thanks
If you want training data, you can download the H.264 reference software from http://iphome.hhi.de/suehring/tml/download/. Note that this reference software is written in C. You don't need to be well versed in C, but you do need to be able to build the encoder; you can then use it as a tool to generate .264 data.
The bin directory contains .yuv files (raw, uncompressed video), and using the configuration files you can generate .264 files from them. If you want a single NAL unit as you specified, you can configure the encoder via the configuration file to encode only a single video frame using the FramesToBeEncoded parameter. If you open the generated .264 file in a hex editor, you can identify the NAL units by their start codes. By adapting the configuration files, you should be able to generate your desired test data.
Note that even if you only generate one frame, there might be more than one NAL unit inside the .264 file, since the sequence and picture parameter sets are prepended to the IDR frame. You can easily isolate and separate them, e.g. in C++, by searching for the start codes.
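Separating the units by start code is short in any language. A minimal Java sketch, assuming an Annex B byte stream such as the .264 output of the reference encoder: it splits on the three-byte start code 0x000001 (the four-byte form 0x00000001 ends in the same three bytes), drops trailing zero bytes that belong to the next start code, and reads nal_ref_idc and nal_unit_type from the one-byte NAL header. NalSplitter is a hypothetical name, and emulation-prevention bytes (the 0x03 in 0x000003) are left in the payload, so you would still strip those before parsing slice data bit by bit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NalSplitter {

    /** One NAL unit: its offset in the stream and its bytes (header byte first, start code removed). */
    static final class Nal {
        final int offset;
        final byte[] data;

        Nal(int offset, byte[] data) {
            this.offset = offset;
            this.data = data;
        }

        int type() { return data[0] & 0x1F; }           // nal_unit_type: low 5 bits
        int refIdc() { return (data[0] >> 5) & 0x03; }  // nal_ref_idc: bits 5-6
    }

    /** Splits an Annex B byte stream on 0x000001 start codes. */
    static List<Nal> split(byte[] s) {
        List<Integer> starts = new ArrayList<>(); // index of the first byte after each start code
        for (int i = 0; i + 2 < s.length; i++) {
            if (s[i] == 0 && s[i + 1] == 0 && s[i + 2] == 1) {
                starts.add(i + 3);
                i += 2;
            }
        }
        List<Nal> nals = new ArrayList<>();
        for (int k = 0; k < starts.size(); k++) {
            int begin = starts.get(k);
            int end = k + 1 < starts.size() ? starts.get(k + 1) - 3 : s.length;
            // A NAL unit never ends in 0x00, so trailing zeros belong to the next start code.
            while (end > begin && s[end - 1] == 0) end--;
            if (end > begin) nals.add(new Nal(begin, Arrays.copyOfRange(s, begin, end)));
        }
        return nals;
    }

    static String typeName(int type) {
        switch (type) {
            case 1: return "non-IDR slice";
            case 5: return "IDR slice";
            case 6: return "SEI";
            case 7: return "SPS";
            case 8: return "PPS";
            case 9: return "access unit delimiter";
            default: return "type " + type;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = Files.readAllBytes(Paths.get(args[0])); // e.g. a .264 file from the encoder
        for (Nal nal : split(stream)) {
            System.out.printf("offset %d: %s (%d bytes, nal_ref_idc=%d)%n",
                    nal.offset, typeName(nal.type()), nal.data.length, nal.refIdc());
        }
    }
}
```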