Using ExtJS and Java with Hadoop

My task is to write a Java-based web application which will produce various charts such as wafer maps, histograms, overlay charts, etc.
The front end is ExtJS and the chart generation is handled by JFreeChart.
The data for charts will be in multiple .CSV files which are stored in the file system.
My questions are:
The .CSV files will be gigabytes in size. Can I store these files in HDFS, query them at run time, and display the data in the front end?
Is the Hadoop ecosystem a feasible solution for the above requirement? Should I also consider Apache Pig or Hive for querying the CSV files?

Yes you can (Apache Hive)
It all depends, but Hive seems like what you're looking for. It was designed with a SQL-like feel and supports familiar SQL clauses. It is widely used by major companies such as Facebook, Netflix, and FINRA. In your case, supporting SQL syntax also means that you can integrate with Java via the Hive JDBC driver very easily and query data from your CSV files.
http://www.tutorialspoint.com/hive/
Setting up Hive can be a bit difficult at first if you're not familiar with the Hadoop environment. The link above is a good reference for understanding Hive better and getting you in the right direction.
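To make the JDBC route concrete, here is a minimal sketch of querying Hive from Java. It assumes a HiveServer2 instance at localhost:10000 and an external table named wafer_data already created over the CSV directory; both names are hypothetical, and the query builder is split out only so the SQL can be checked without a live server:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveCsvQuery {

    // Builds the SELECT used to pull chart data; kept separate so it can be
    // exercised without a running Hive server.
    static String buildQuery(String table, int limit) {
        return "SELECT * FROM " + table + " LIMIT " + limit;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical setup: an external table mapping the CSV files, e.g.
        //   CREATE EXTERNAL TABLE wafer_data (...) ROW FORMAT DELIMITED
        //   FIELDS TERMINATED BY ',' LOCATION '/data/wafers';
        // The Hive JDBC driver jar must be on the classpath.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(buildQuery("wafer_data", 100))) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // first column of each row
            }
        }
    }
}
```

The result set can then be fed straight into a JFreeChart dataset on the server side.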
Hope this was helpful!

Related

Java interacting directly with a DB file

Before I started learning and using Java I used VB. In VB, a program can open a Microsoft Access database file and view/edit the records; VB has a component that handles the file and the parsing. What I am looking for is a Java program that can open a database file directly. What I am not looking for is Java communicating with a SQL server, a database server, or any other kind of server. I have tried to search the net for Java and databases, but all I am able to find are Java protocols that communicate with a server or a Java server service; those items fill the search results on all of the search engines I have tried. I have found that DBs can be stored as XML, and that Java can open them, but it seems that this no longer preserves the relational aspects. I may be wrong on that; I am uncertain.
A. Can Java open a local DB file?
B. Can that preserve the relational aspects?
C. Can multiple Java programs open the same DB file?
D. If a Java program on computer A modifies the DB file will the Java program on computer B know about that change and update itself?
Assuming they are not modifying the same record. The type of DB file is not that important, but it would be nice if it could be opened by MS Access. It would also be nice if it could be opened by Java DB. Neither of those two is required.
E. what kind of components or projects could meet this need?
Pure Java has no built-in graphical database designer. Some IDEs offer a degree of database design support; Eclipse, for example, has Database Debug and Database Development perspectives, where you can view tables, design columns, and so on.
It isn't a full desktop tool for the (semi-)amateur computer user, though.
In Java you can write a fully comfortable end-user database program (OpenOffice "Base", for example, has some dependencies on Java), with data entry comparable to entry-level Access.
For a simpler database like SQLite there are several Java projects you can use, for example sqlite4java.
SQLite is pretty good these days and a good choice for small databases.
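On question D (two programs sharing one DB file): file-based engines like SQLite coordinate concurrent writers through OS-level file locks, and a second process only sees changes when it re-reads the file; there is no automatic push notification. The JDK exposes the same locking primitive directly. A minimal sketch (the file name is illustrative):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DbFileLockDemo {

    // Acquires an exclusive OS-level lock on the file, roughly as an embedded
    // DB engine does before writing. Returns whether the lock was obtained.
    static boolean withExclusiveLock(Path dbFile, Runnable writeAction) throws IOException {
        try (FileChannel channel = FileChannel.open(dbFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) {
            if (lock == null) {
                return false; // another process currently holds the lock
            }
            writeAction.run(); // safe to modify the file here
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dbFile = Path.of("shared.db"); // hypothetical shared database file
        boolean ok = withExclusiveLock(dbFile, () -> System.out.println("writing..."));
        System.out.println("lock acquired: " + ok);
    }
}
```

A program on computer B would typically poll the file (or its timestamp) to notice changes made by computer A.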

using hadoop on the current application

We have an application which is written in Java and uses Solr, Elasticsearch, Neo4j, MySQL, and a few more.
We need to increase our data size dramatically (from millions to billions).
Here are the options I have considered to make this work:
clustering the individual components, notably Solr, Elasticsearch, Neo4j, and MySQL
using what everyone talks about nowadays: Hadoop
The problem with the first is that it is hard to manage; the second option sounds too good to be true. So my questions are:
Can I actually assume that Hadoop can do that before digging in?
What other criteria do I need to consider?
Is there any alternative solution for such task?
Solr is for data searching. If you want to process big data (meeting the criteria of volume, velocity, and variety), for example ETL and reporting, you would need Hadoop.
Hadoop consists of several ecosystem components. You can refer to the link below for documentation:
https://hadoop.apache.org
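Before committing to Hadoop, it helps to be clear about what its MapReduce model actually buys you: a map step that emits key/value pairs and a reduce step that aggregates per key, partitioned across machines. The same shape can be sketched on a single machine with Java streams (the record format here is invented purely for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceSketch {

    // "Map": each record emits a key; "shuffle": records are grouped by key;
    // "reduce": values are aggregated per key. Hadoop does the same thing,
    // but distributed across a cluster and spilled to disk.
    static Map<String, Long> countByCategory(List<String> records) {
        return records.stream()
                .map(r -> r.split(",")[0])              // map: extract the key
                .collect(Collectors.groupingBy(         // shuffle: group by key
                        k -> k,
                        Collectors.counting()));        // reduce: aggregate
    }

    public static void main(String[] args) {
        List<String> records = List.of("books,12", "toys,5", "books,7");
        System.out.println(countByCategory(records)); // counts per category
    }
}
```

If your workload is mostly per-key aggregation like this over billions of records, Hadoop (or Spark) fits; if it is search, graph traversal, or transactional, clustering the specialized stores is usually the better answer.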

Creation of JSON service to read/write HDFS data

My requirement is as follows:
We are trying to implement a recommendations engine for one of our customers. To achieve this, we need to store data in HDFS from the web application (for every click on a product), compute the recommendations in the back end, and display the results (as products) in the web application.
My approach is as follows, highlighted step by step:
1. We have downloaded and configured Cloudera.
2. We have downloaded and configured Apache Spark MLlib (the recommendations engine).
3. Using Eclipse Luna, we are able to run MLlib (using the Java plugin).
4. Now we need to create a JSON service which will read the data from the web and store it in HDFS.
5. Now we need to create a JSON service which can read the data from HDFS, compute the recommendations, and display the result in JSON format dynamically.
We are stuck at steps 4 and 5. Please suggest how we can create a JSON service to read from and write to HDFS.
You asked a very general question. I suggest you get familiar with Apache Spark first. Read its quick start guide. Start by reading/writing data from HDFS into a JSON RDD as described in the tutorial. After you understand how to work with batch processing, read about Spark Streaming.
There is an old story that Ptolemy I asked if there was a shorter path to learning geometry than Euclid's Elements; Euclid replied that there is no royal road to geometry. Likewise, there is no fast way to build an MLlib engine for your clients except reading and understanding the basics of Apache Spark usage. I wish you good luck with that!
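That said, the "JSON service" part of steps 4 and 5 is ordinary Java, and the JDK's built-in HTTP server is enough to prototype it. The sketch below accepts a click event and appends it to a local file as one JSON line; in a real deployment you would swap the file write for the Hadoop FileSystem API. The port, path, and field names are all illustrative:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClickCaptureService {

    // Formats one click event as a JSON line; in production, validate and
    // escape inputs properly (or use a JSON library such as Jackson).
    static String toJsonLine(String userId, String productId) {
        return "{\"user\":\"" + userId + "\",\"product\":\"" + productId + "\"}";
    }

    public static void main(String[] args) throws IOException {
        Path log = Path.of("clicks.jsonl"); // stand-in for an HDFS path
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/click", exchange -> {
            // Expects e.g. GET/POST /click?user=u1&product=p42 (illustrative API)
            String query = exchange.getRequestURI().getQuery();
            String user = "?", product = "?";
            if (query != null) {
                for (String part : query.split("&")) {
                    String[] kv = part.split("=", 2);
                    if (kv.length == 2 && kv[0].equals("user")) user = kv[1];
                    if (kv.length == 2 && kv[0].equals("product")) product = kv[1];
                }
            }
            Files.writeString(log, toJsonLine(user, product) + "\n",
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            byte[] ok = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, ok.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(ok);
            }
        });
        server.start();
        System.out.println("Listening on http://localhost:8080/click");
    }
}
```

The read path (step 5) is the mirror image: an endpoint that loads the precomputed recommendations and writes them back as the JSON response body.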

Bulk upload data into data store for GAE Java project

I would like to populate the datastore, yet all the examples and instructions for populating it are concerned with Python projects. Is there a way to upload bulk data using the App Engine Java tools? (At the moment the data is in CSV format, but I can easily reformat it as needed.)
It would be especially useful if it could be done within the Eclipse IDE.
Thanks.
I'm having the same problem as you with this one. According to the discussion at http://groups.google.com/group/google-appengine-java/browse_thread/thread/72f58c28433cac26, there's no equivalent tool available for Java yet. However, it looks like there's nothing stopping you from using the Python tool to populate the datastore and then accessing that data as normal through your Java code, although this assumes you're comfortable with Python, which could be the problem.

How to extract the data from a website using java?

I am familiar with the Java programming language. I would like to extract data from a website and store it in my database running on my machine. Is that possible in Java? If so, which API should I use? For example, there are a number of schools listed on a website. How can I extract that data and store it in my database using Java?
What you're referring to is commonly called 'screenscraping'. There are a variety of ways to do this in Java, however, I prefer HtmlUnit. While it was designed as a way to test web functionality, you can use it to hit a remote webpage, and parse it out.
I would recommend using a good error-tolerant HTML parser like TagSoup to extract exactly what you're looking for from the HTML.
You definitely need a good parser like NekoHTML.
Here's an example of using NekoHTML, albeit using Groovy (a Java-based scripting language) rather than Java itself:
http://www.keplarllp.com/blog/2010/01/better-competitive-intelligence-through-scraping-with-groovy
You can use VietSpider XML from
http://sourceforge.net/projects/binhgiang/files/
Download VietSpider3_16_XML_Windows.zip or VietSpider3_16_XML_Linux.zip
VietSpider Web Data Extractor: the software crawls data from websites (a data scraper), formats it to standard XML (text, CDATA), then stores it in a relational database. The product supports various RDBMSs such as Oracle, MySQL, SQL Server, H2, HSQL, Apache Derby, and Postgres. The VietSpider crawler supports sessions (login, query by form input), multi-part downloading, JavaScript handling, and proxies (including multi-proxy by auto-scanning proxies from websites).
Depending on what you are really trying to do, you can use many different solutions.
If you just want to fetch the HTML code of a web page, then URL.getContent() may be your solution. Here is a little tutorial:
http://www.javacoffeebreak.com/books/extracts/javanotesv3/c10/s4.html
EDIT: I didn't understand that he was searching for a way to parse the HTML code. Some tools have been suggested above. Sorry about that.
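To make the suggestions above concrete: fetching is a couple of lines of stdlib Java, and extraction is where a tolerant parser like HtmlUnit, TagSoup, or NekoHTML earns its keep. The sketch below uses a naive regex on a hardcoded snippet purely to show the shape of the extract-then-store flow; the markup and class name are invented, and regexes are no substitute for a real HTML parser on messy real-world pages:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchoolScraper {

    // Naive extraction: pulls the text of every <li class="school"> element.
    // Real pages are rarely this clean; use HtmlUnit/TagSoup/NekoHTML there.
    static List<String> extractSchools(String html) {
        List<String> schools = new ArrayList<>();
        Matcher m = Pattern.compile("<li class=\"school\">([^<]+)</li>")
                .matcher(html);
        while (m.find()) {
            schools.add(m.group(1).trim());
        }
        return schools;
    }

    public static void main(String[] args) {
        // In a real run you would fetch this with java.net.http.HttpClient
        // and then insert each extracted row into your database via JDBC.
        String html = "<ul>"
                + "<li class=\"school\">Springfield High</li>"
                + "<li class=\"school\">Shelbyville Elementary</li>"
                + "</ul>";
        System.out.println(extractSchools(html));
    }
}
```

Each extracted string would then become one INSERT through a JDBC PreparedStatement.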

Categories

Resources