For now I have a CSV with several columns and rows. Eventually, I will have a SQL relational database structure. I was wondering if there are any libraries to easily extract this data into a list of Java objects.
Example:
title | location | date
EventA | los angeles, ca | 05-29-2014
EventB | New York, NY | 08-23-2013
This is the structure of the data in csv. I would have a java object called Event:
Event(String title, String location, String date)
I am aware of openCSV. Is that what I would need to use for CSV? If so, what is the corresponding solution for a SQL relational database?
Also, can reading a CSV only be done in the main method?
When you convert to the SQL database, you can use Apache's DbUtils for a low-level solution, or Hibernate for a high-level solution.
dbutils
You can implement a ResultSetHandler to convert a result set into an object, or, if it's a POJO, the framework can convert it for you. There are examples on the Apache site.
http://commons.apache.org/proper/commons-dbutils/
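For instance, DbUtils' BeanListHandler can map query results directly onto the Event class from the question. A minimal sketch, assuming an events table whose column names match the bean's properties and an Event class with a no-arg constructor and setters:

import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.BeanListHandler;

public class EventDao {
    private final QueryRunner runner;

    public EventDao(DataSource dataSource) {
        this.runner = new QueryRunner(dataSource);
    }

    public List<Event> findAll() throws SQLException {
        // BeanListHandler instantiates an Event per row and populates it via setters.
        return runner.query("SELECT title, location, date FROM events",
                new BeanListHandler<>(Event.class));
    }
}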
Hibernate
There are plenty of tutorials out there for working with Hibernate.
http://www.hibernate.org/
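For a taste of the high-level option, the Event object from the question could be mapped as a JPA/Hibernate entity along these lines (a sketch; the table name, column names, and generated id are assumptions):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "events")
public class Event {
    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "title")
    private String title;

    @Column(name = "location")
    private String location;

    @Column(name = "date")
    private String date;

    // getters and setters ...
}

With that mapping, session.createQuery("from Event", Event.class).list() gives you the List<Event> directly.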
Try JSefa, which allows you to annotate Java classes that can be used in a serialization and de-serialization process.
From the tutorial:
The annotations for CSV are similar to the XML ones.
@CsvDataType()
public class Person {
    @CsvField(pos = 1)
    String name;

    @CsvField(pos = 2, format = "dd.MM.yyyy")
    Date birthDate;
}
Serialization
Serializer serializer = CsvIOFactory.createFactory(Person.class).createSerializer();
This time we used the super interface Serializer, so that we can abstract from the chosen format type (XML, CSV, FLR) in the following code.
The next steps should be no surprise:
serializer.open(writer);
// call serializer.write for every object to serialize
serializer.close(true);
The result
Erwin Schmidt;23.05.1964
Thomas Stumm;12.03.1979
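Putting the tutorial pieces together, a complete round trip might look like this (a sketch assuming the Person class above; the file name is arbitrary):

import java.io.FileReader;
import java.io.FileWriter;
import java.text.SimpleDateFormat;
import org.jsefa.Deserializer;
import org.jsefa.Serializer;
import org.jsefa.csv.CsvIOFactory;

public class PersonCsvRoundTrip {
    public static void main(String[] args) throws Exception {
        // Write: one CSV line per Person object.
        Serializer serializer = CsvIOFactory.createFactory(Person.class).createSerializer();
        serializer.open(new FileWriter("persons.csv"));
        Person person = new Person();
        person.name = "Erwin Schmidt";
        person.birthDate = new SimpleDateFormat("dd.MM.yyyy").parse("23.05.1964");
        serializer.write(person);
        serializer.close(true); // true also closes the underlying writer

        // Read: stream the CSV back into Person objects.
        Deserializer deserializer = CsvIOFactory.createFactory(Person.class).createDeserializer();
        deserializer.open(new FileReader("persons.csv"));
        while (deserializer.hasNext()) {
            Person p = deserializer.next();
            System.out.println(p.name + " born " + p.birthDate);
        }
        deserializer.close(true);
    }
}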
Related
I'm not sure what the root issue is, but this is the most basic version of the problem I can present. When I run something through Kafka and my streaming job picks it up, it runs through the entire process up until it's time to save to Cassandra, at which point it hangs. Any and all help is appreciated; I've been banging my head against this for too long.
Snippets showing the basic problem below.
StreamingJob.java:
final DataStream<Pojo> stream = env.addSource(source)
        .process(new MyProcess());

CassandraSink.addSink(stream)
        .setClusterBuilder(new ClusterBuilder() {
            @Override
            protected Cluster buildCluster(Cluster.Builder builder) {
                return builder.withCredentials("", "")
                        .addContactPoint("127.0.0.1").withPort(9042).build();
            }
        })
        .setMapperOptions(() -> new Mapper.Option[]{Mapper.Option.saveNullFields(false)})
        .setDefaultKeyspace("my_keyspace").build();
env.execute(jobConfig.getName());
MyProcess.java:
@Override
public void processElement(MyInput value, Context ctx, Collector<Pojo> out) throws Exception {
    // Signature sketched for readability; the original snippet omitted it,
    // and the input type (MyInput here) is a placeholder.
    Pojo myPojo = doSomethingtoMyInput();
    out.collect(myPojo);
    // Debugging this proves it works to this point
}
MyPojo.java:
@Table(keyspace = "my_keyspace", name = "my_table")
public class MyPojo {
    @PartitionKey(0)
    @Column
    String user_id;

    @PartitionKey(1)
    @Column
    String other_id;

    @ClusteringColumn
    @Column
    java.util.Date time_id;

    // Getters and setters using standard notation
}
My Cassandra schema:
CREATE TABLE my_table (
    user_id text,
    other_id text,
    time_id timestamp,
    PRIMARY KEY ((user_id, other_id), time_id)
) WITH CLUSTERING ORDER BY (time_id DESC);
You'll need to verify the format of the time_id in the source as it might not be compatible with the CQL column.
In your POJO, you've mapped it to java.util.Date and if the field from the source does contain a date then it might be the reason it's not working.
CQL timestamp is a 64-bit signed int that represents the number of milliseconds since Unix epoch. The value of the field from the source can either be (a) an integer, or (b) a literal string that looks like yyyy-mm-dd HH:mm. The list of valid ISO 8601 formats is available here -- CQL timestamp. Cheers!
Found the answer after much fighting. The Flink-Cassandra connection is very strict and tenuous: everything must be perfectly aligned. A decimal in Cassandra requires a BigDecimal in Java and, more confusingly, the timestamp in Cassandra would only work with a long value in Java.
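To make that concrete, here is a hedged sketch of the POJO with the clustering column changed from java.util.Date to long (milliseconds since the Unix epoch), which is what the answer above says finally worked:

@Table(keyspace = "my_keyspace", name = "my_table")
public class MyPojo {
    @PartitionKey(0)
    @Column(name = "user_id")
    private String userId;

    @PartitionKey(1)
    @Column(name = "other_id")
    private String otherId;

    // CQL timestamp is a 64-bit count of milliseconds since the Unix epoch,
    // so a Java long lines up with it exactly.
    @ClusteringColumn
    @Column(name = "time_id")
    private long timeId;

    // getters and setters ...
}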
Hopefully this helps others who come across this same issue.
I have a table with a column of Date type. Let's say table column is:
REPORTING_CONVERSION_DATE NOT NULL DATE
DAO class (which implements Serializable) has the corresponding field defined as:
@Temporal(TemporalType.TIMESTAMP)
@Column(name = "REPORTING_CONVERSION_DATE")
private Date reportingConversionDate;
I extract the record from the database, use com.fasterxml.jackson.databind.ObjectMapper to get a JSON string representation of the object, compress the JSON, and store it in Cosmos DB on Azure. On the reverse leg, I get the record from Cosmos DB, decompress it, and use ObjectMapper to read the Java object back.
I found that on one machine (where the code runs on Spring Boot's embedded Tomcat, packaged as a jar) I see this:
reportingConversionDate : 1424483105000
while on the other machine (where the code runs on a managed Tomcat, packaged as a war) I see:
reportingConversionDate : 2015-02-21T01:45:05.000+0000
Why is the behaviour different?
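For what it's worth, whether Jackson writes a java.util.Date as epoch milliseconds or as an ISO-8601 string is controlled by the WRITE_DATES_AS_TIMESTAMPS serialization feature; it is enabled on a plain new ObjectMapper() but often disabled by framework-configured mappers (Spring Boot's auto-configured mapper, for instance, disables it). A small sketch showing both behaviours:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import java.util.Date;

public class DateFormatDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Default on a plain ObjectMapper: epoch milliseconds, e.g. 1424483105000
        mapper.enable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);
        System.out.println(mapper.writeValueAsString(new Date()));

        // What framework-configured mappers often do: ISO-8601 strings,
        // e.g. "2015-02-21T01:45:05.000+0000"
        mapper.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);
        System.out.println(mapper.writeValueAsString(new Date()));
    }
}

If the two machines build their ObjectMapper differently (hand-constructed in the jar deployment versus framework-provided in the war deployment), that could explain the two formats.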
I'm new to Mahout and this field of big data.
In general, data doesn't always come as (long, long, Double).
So are there alternatives to FileDataModel?
DataModel model = new FileDataModel(new File("Ratings.csv"));
Users and items are identified solely by an ID value in the framework.
Further, this ID value must be numeric; it is a Java long type through
the APIs. A Preference object or PreferenceArray object encapsulates
the relation between user and preferred items (or items and users
preferring them).
I recently faced the same issue: my user IDs were of UUID type, so I had to add an additional table mapping a numeric user ID to the original UUID. Later, checking the documentation, I found this explanation of the other DataModel implementations:
A DataModel is the interface to information about user preferences. An
implementation might draw this data from any source, but a database is
the most likely source. Be sure to wrap this with a
ReloadFromJDBCDataModel to get good performance! Mahout provides
MySQLJDBCDataModel, for example, to access preference data from a
database via JDBC and MySQL. Another exists for PostgreSQL. Mahout
also provides a FileDataModel, which is fine for small applications.
You can build a DataModel from a database.
Here is an example for PostgreSQL.
The constructor looks like this:
PostgreSQLJDBCDataModel(DataSource dataSource, String preferenceTable, String userIDColumn, String itemIDColumn, String preferenceColumn, String timestampColumn)
Initialization:
PGPoolingDataSource source = new PGPoolingDataSource();
source.setDataSourceName(properties.getProperty("DATABASE_NAME"));
source.setServerName("127.0.0.1");
source.setDatabaseName(properties.getProperty("DATABASE_NAME"));
source.setUser(properties.getProperty("DATABASE_USER"));
source.setPassword(properties.getProperty("DATABASE_PASS"));
source.setMaxConnections(50);

DataModel model = new PostgreSQLJDBCDataModel(
        source,
        "mahout_table",
        "user_id",
        "item_id",
        "preference",
        "timestamp");
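Per the documentation quoted above ("Be sure to wrap this with a ReloadFromJDBCDataModel"), you would then wrap the JDBC model so preference data is cached in memory rather than hitting the database on every request:

// Caches the JDBC-backed preferences in memory, reloading on demand.
DataModel cachedModel = new ReloadFromJDBCDataModel(model);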
I am new to UIMA ...
I want to connect to a database, extract data, process it with the UIMA regex annotator, and write it back to the database.
Example:
Table: emp
Name Department EmpId
AB-C Sale's 2134[3]
XYZ, Fina&nce 23423
PQ#R Marketing 234(47
To be transformed using UIMA regex annotator
Desired Output
Name Department EmpId
ABC Sales 21343
XYZ Finance 23423
PQR Marketing 23447
I have installed UIMA, ECLIPSE and relevant JDBC drivers to connect database.
Thanks in advance
There are a couple of ways to achieve this.
The simplest (though not the most extendable) way would be to write three classes (use uimaFIT http://uima.apache.org/uimafit.html#Documentation to make coding easier); a sketch of the wiring follows at the end of this answer:
CollectionReader:
- read in all the data as objects
- iterate over the objects and create a JCas from each one; you can store the primary key in an annotation
Analysis Engine:
- use the UIMA regex annotator to manipulate the JCas's documentText
Consumer:
- read the JCas documentText and use the primary key to update the database
A better way would be to abstract the reading and writing by creating an external resource (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources) that connects to the database and provides hasNext() and next() methods; this is very convenient for use in the CollectionReader and Consumer, and it isolates all initialisation logic in one place. When using uimaFIT, you can use configuration parameter injection (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.configurationparameters), for example to make the connection string and the search query configurable.
Use the SimplePipeline class in uimaFIT to run your pipeline: http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.pipelines
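Here is a hedged sketch of the wiring for the simple three-class approach; the reader, annotator, and consumer class names and parameters are illustrative placeholders, not existing UIMA components:

import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.collection.CollectionReaderDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.CollectionReaderFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;

public class DbCleanupPipeline {
    public static void main(String[] args) throws Exception {
        // Reads rows from the database, one JCas per row, keeping the
        // primary key in an annotation (hypothetical reader class).
        CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
                DbCollectionReader.class,
                DbCollectionReader.PARAM_CONNECTION_STRING, "jdbc:...",
                DbCollectionReader.PARAM_QUERY, "SELECT * FROM emp");

        // Applies the regex cleanup to the document text (hypothetical annotator).
        AnalysisEngineDescription cleaner =
                AnalysisEngineFactory.createEngineDescription(RegexCleanupAnnotator.class);

        // Writes the cleaned text back via the stored primary key (hypothetical consumer).
        AnalysisEngineDescription writer =
                AnalysisEngineFactory.createEngineDescription(DbConsumer.class);

        SimplePipeline.runPipeline(reader, cleaner, writer);
    }
}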
I'm using the Yahoo Finance Streaming API to get stock quotes, I wanted to save these into a DB table for historical reference.
I'm looking for something that can easily parse various strings whose format varies, like the examples below:
<script>try{parent.yfs_mktmcb({"unixtime":1310957222});}catch(e){}</script>
<script>try{parent.yfs_u1f({"ASX.AX":{c10:"-0.06"}});}catch(e){}</script>
<script>try{parent.yfs_u1f({"AWC.AX":{l10:"2.16",c10:"+0.01",p20:"+0.47"}});}catch(e){}</script>
<script>try{parent.yfs_u1f({"ALZ.AX":{l10:"2.6900",c10:"-0.1200",p20:"-4.27"}});}catch(e){}</script>
I want to parse these strings into a MySQL database, and I was thinking the easiest way would be to do the parsing in Java. These entries appear line by line in a text file; I want to extract the time, the stock code, the price, and the change values into a simple table.
The table looks like StockCode | Date | Time | Price | ChangeDol | ChangePer
Are there any tools or frameworks which would make this process easy?
Thanks!
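One lightweight approach is a pair of regular expressions over each line. A hedged sketch (the reading of l10/c10/p20 as price, dollar change, and percent change is inferred from the samples above; the yfs_mktmcb time lines can be matched the same way):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YfsLineParser {
    // Matches the payload of a yfs_u1f callback, e.g.
    // parent.yfs_u1f({"AWC.AX":{l10:"2.16",c10:"+0.01",p20:"+0.47"}});
    private static final Pattern QUOTE_LINE =
            Pattern.compile("yfs_u1f\\(\\{\"([^\"]+)\":\\{([^}]*)\\}\\}\\)");
    private static final Pattern FIELD = Pattern.compile("(\\w+):\"([^\"]*)\"");

    public static void main(String[] args) {
        String line = "<script>try{parent.yfs_u1f({\"AWC.AX\":{l10:\"2.16\",c10:\"+0.01\",p20:\"+0.47\"}});}catch(e){}</script>";
        Matcher m = QUOTE_LINE.matcher(line);
        if (m.find()) {
            System.out.println("symbol = " + m.group(1)); // AWC.AX
            Matcher f = FIELD.matcher(m.group(2));
            while (f.find()) {
                // l10 = last price, c10 = dollar change, p20 = percent change
                System.out.println(f.group(1) + " = " + f.group(2));
            }
        }
    }
}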
I don't know how you get your quotes, but if you could use YQL, any XML parser would do:
YQL
<quote symbol="YHOO">
<Ask>14.76</Ask>
<AverageDailyVolume>28463800</AverageDailyVolume>
<Bid>14.51</Bid>
<AskRealtime>14.76</AskRealtime>
<BidRealtime>14.51</BidRealtime>
<BookValue>9.826</BookValue>
<Change_PercentChange>0.00 - 0.00%</Change_PercentChange>
....
</quote>
List of XML Parsers for Java
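For instance, with the JDK's built-in DOM parser (a minimal sketch; the quote XML is inlined and trimmed for brevity):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class QuoteXmlDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<quote symbol=\"YHOO\"><Ask>14.76</Ask><Bid>14.51</Bid></quote>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element quote = doc.getDocumentElement();
        System.out.println("symbol = " + quote.getAttribute("symbol"));
        System.out.println("ask = " + quote.getElementsByTagName("Ask").item(0).getTextContent());
        System.out.println("bid = " + quote.getElementsByTagName("Bid").item(0).getTextContent());
    }
}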
You could also have a look here:
http://www.wikijava.org/wiki/Downloading_stock_market_quotes_from_Yahoo!_finance
They get financial data as CSV from Yahoo.