Best way to load data in MySQL DB using JDBC - java

Here's a sample of a much larger data file that I need to load into a MySQL DB. The problem is that the dataset is too large to manually add/append INSERT statements and put the commas in the right locations.
E 1
T 2006-11-02 22:01:34
U 6 andrevan
N 70 node_ue
V 1 62 2004-09-11 05:50:00 node
V 1 27 2004-09-11 06:13:00 slowking
V 1 11 2004-09-11 06:50:00 merovingian
V 1 34 2004-09-11 12:11:00 norm
V 1 10 2004-09-11 13:30:00 anárion
V 1 55 2004-09-11 15:20:00 thecustomoflife
V 1 28 2004-09-11 15:21:00 neutrality
V 1 8 2004-09-11 16:56:00 lst27
V 1 63 2004-09-11 18:00:00 zchangu
V 1 5 2004-09-11 19:51:00 orthogonal
V 1 26 2004-09-12 03:04:00 grunt
V -1 25 2004-09-12 03:46:00 blankfaze
V 1 56 2004-09-12 22:00:00 guanaco
V 1 64 2004-09-12 22:51:00 beau99
V 1 19 2004-09-13 00:51:00 ffirehorse
V 1 20 2004-09-13 01:27:00 michael
V 1 7 2004-09-14 19:49:00 texture
V 1 65 2004-09-16 05:01:00 friedmilk
V 1 66 2004-09-17 13:56:00 ezhiki
V 1 39 2004-09-18 07:34:00 squash
Can someone please suggest the best way to load this into a MySQL DB using JDBC?
Thanks!

The best way is to use MySQL's LOAD DATA INFILE (http://dev.mysql.com/doc/refman/5.6/en/load-data.html). With this method, you can tell it to use a tab-delimited format and to skip the first x rows.
If you need to run this from Java, you can wrap the call in Java code. For an example of how this is done, you can look at how the Pentaho project does it: https://github.com/pentaho/pentaho-kettle/blob/master/engine/src/org/pentaho/di/trans/steps/mysqlbulkloader/MySQLBulkLoader.java/ or at this simple blog entry: http://jeffrick.com/2010/03/23/bulk-insert-into-a-mysql-database/.
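A minimal sketch of wrapping LOAD DATA in JDBC could look like the following. The table name (votes), the connection URL, the delimiter and the LOCAL/allowLoadLocalInfile settings are all assumptions rather than anything given in the question; adjust them to the real schema and file layout.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        // Assumed connection URL; allowLoadLocalInfile must be enabled for LOCAL loads.
        String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement()) {
            // Assumed table and delimiter; IGNORE 4 LINES skips the E/T/U/N header lines.
            st.execute("LOAD DATA LOCAL INFILE '/tmp/votes.txt' "
                    + "INTO TABLE votes "
                    + "FIELDS TERMINATED BY ' ' "
                    + "LINES TERMINATED BY '\\n' "
                    + "IGNORE 4 LINES");
        }
    }
}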
Speaking from experience, this is definitely not a job for plain JDBC inserts, as they create too much of a performance bottleneck. However, if for whatever reason you need to use JDBC, just make sure that you use PreparedStatement and batch the rows. Spring has a very good implementation that you can leverage; for an example of this usage, see: http://www.mkyong.com/spring/spring-jdbctemplate-batchupdate-example/. Of course, you also need to make sure that you stream the file and read it line by line to avoid running out of memory.
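If you do go the JDBC route, a minimal sketch could look like the one below. The table name (votes), its columns and the connection details are assumptions; only the V lines of the sample are parsed, and the batch is flushed every 1000 rows.

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class VoteLoader {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/mydb"; // assumed
        String sql = "INSERT INTO votes (vote, user_id, voted_at, voter) VALUES (?, ?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql);
             BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            conn.setAutoCommit(false);
            int pending = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("V ")) {
                    continue; // skip the E/T/U/N header lines
                }
                // V <vote> <user id> <date> <time> <voter>
                String[] parts = line.split("\\s+", 6);
                ps.setInt(1, Integer.parseInt(parts[1]));
                ps.setInt(2, Integer.parseInt(parts[2]));
                ps.setString(3, parts[3] + " " + parts[4]); // "yyyy-MM-dd HH:mm:ss"
                ps.setString(4, parts[5]);
                ps.addBatch();
                if (++pending % 1000 == 0) {
                    ps.executeBatch(); // flush a batch of 1000 rows
                }
            }
            ps.executeBatch();
            conn.commit();
        }
    }
}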

Related

Predict function R returns 0.0 [duplicate]

I posted earlier today about an error I was getting when using the predict function. I was able to get that corrected, and thought I was on the right path.
I have a number of observations (actuals) and I have a few data points that I want to extrapolate or predict. I used lm to create a model, then I tried to use predict with the actual value that will serve as the predictor input.
This code is all repeated from my previous post, but here it is:
df <- read.table(text = '
Quarter Coupon Total
1 "Dec 06" 25027.072 132450574
2 "Dec 07" 76386.820 194154767
3 "Dec 08" 79622.147 221571135
4 "Dec 09" 74114.416 205880072
5 "Dec 10" 70993.058 188666980
6 "Jun 06" 12048.162 139137919
7 "Jun 07" 46889.369 165276325
8 "Jun 08" 84732.537 207074374
9 "Jun 09" 83240.084 221945162
10 "Jun 10" 81970.143 236954249
11 "Mar 06" 3451.248 116811392
12 "Mar 07" 34201.197 155190418
13 "Mar 08" 73232.900 212492488
14 "Mar 09" 70644.948 203663201
15 "Mar 10" 72314.945 203427892
16 "Mar 11" 88708.663 214061240
17 "Sep 06" 15027.252 121285335
18 "Sep 07" 60228.793 195428991
19 "Sep 08" 85507.062 257651399
20 "Sep 09" 77763.365 215048147
21 "Sep 10" 62259.691 168862119', header=TRUE)
str(df)
'data.frame': 21 obs. of 3 variables:
$ Quarter : Factor w/ 24 levels "Dec 06","Dec 07",..: 1 2 3 4 5 7 8 9 10 11 ...
$ Coupon: num 25027 76387 79622 74114 70993 ...
$ Total: num 132450574 194154767 221571135 205880072 188666980 ...
Code:
model <- lm(df$Total ~ df$Coupon, data=df)
> model
Call:
lm(formula = df$Total ~ df$Coupon)
Coefficients:
(Intercept) df$Coupon
107286259 1349
Predict code (based on previous help):
(These are the predictor values I want to use to get the predicted value)
Quarter = c("Jun 11", "Sep 11", "Dec 11")
Total = c(79037022, 83100656, 104299800)
Coupon = data.frame(Quarter, Total)
Coupon$estimate <- predict(model, newdate = Coupon$Total)
Now, when I run that, I get this error message:
Error in `$<-.data.frame`(`*tmp*`, "estimate", value = c(60980.3823396919, :
replacement has 21 rows, data has 3
My original data frame that I used to build the model had 21 observations in it. I am now trying to predict 3 values based on the model.
I either don't truly understand this function, or have an error in my code.
Help would be appreciated.
Thanks
First, you want to use
model <- lm(Total ~ Coupon, data=df)
not model <- lm(df$Total ~ df$Coupon, data=df).
Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.
Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, i.e. new values of Coupon, not Total.
Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:
model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)
Thanks Hong, that was exactly the problem I was running into. The error you get suggests that the number of rows is wrong, but the problem is actually that the model has been trained using a command that ends up with the wrong names for the parameters.
This is really a critical detail that is entirely non-obvious for lm and friends. Some tutorials show calls like lm(olive$Area ~ olive$Palmitic), which ends up with variable names of olive$Area and olive$Palmitic rather than Area and Palmitic, so new data created with anewdata <- data.frame(Palmitic=2) can't then be used. If you use lm(Area ~ Palmitic, data=olive), the variable names are right and prediction works.
The real problem is that the error message does not indicate the cause at all:
Warning message: 'anewdata' had 1 rows but variable(s) found to have X rows
You are using newdate instead of newdata in your predict code; check that. Then just use Coupon$estimate <- predict(model, Coupon)
and it will work.
To avoid the error, an important point about the new dataset is the name of the independent variable: it must be the same as the one used in the model. Another way is to nest the two functions without creating a new dataset:
model <- lm(Coupon ~ Total, data=df)
predict(model, data.frame(Total=c(79037022, 83100656, 104299800)))
Pay attention to how the model is specified. The next two commands look similar, but for the predict function the first works and the second doesn't:
model <- lm(Coupon ~ Total, data=df) # OK
model <- lm(df$Coupon ~ df$Total) # KO

Java - get parent process

I think Java doesn't provide much in its API for getting at processes. Is there a way to get the parent process's PID/ID in Java?
If you're running on Linux you can check procfs using /proc/self/stat.
Following on from #Imz's answer: on Linux, grab the output of /proc/self/stat (it's a one-line file, so just read it like a normal file):
43732 (java) S 43725 43725 11210 34822 43725 4202496 127791 387073
4055 0 3188 79 4597 253 20 0 53 0 16217706 39231705088 188764
18446744073709551615 4194304 4196452 140735605394256 140735605376816
274479481597 0 0 2 16800973 18446744073709551615 0 0 17 13 0 0 0 0 0
The 4th field (43725 in the example above) is your parent process id.
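A minimal Java sketch of reading that file on Linux (the robust way to skip the process name in field 2 is to cut at the last closing parenthesis, since the name may itself contain spaces):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ParentPid {
    public static void main(String[] args) throws IOException {
        // /proc/self/stat is a single line: pid (comm) state ppid ...
        String stat = new String(Files.readAllBytes(Paths.get("/proc/self/stat")));
        // Skip past the command name, which is wrapped in parentheses.
        String afterComm = stat.substring(stat.lastIndexOf(')') + 1).trim();
        String[] fields = afterComm.split("\\s+");
        // fields[0] is the state (field 3), fields[1] is the ppid (field 4).
        long parentPid = Long.parseLong(fields[1]);
        System.out.println("Parent PID: " + parentPid);
    }
}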

Java - Convert an object to table format

I'm implementing an API that reads data from a JSON response and writes the resulting objects to CSV.
Is there a way to convert an object in Java to a table format (rows and columns)?
E.g. assume I have these objects:
public class Test1 {
    private int a;
    private String b;
    private Test2 c;
    private List<String> d;
    private List<Test2> e;
    // getters-setters ...
}

public class Test2 {
    private int x;
    private String y;
    private List<String> z;
    // getters-setters ...
}
Let's say I have an instance with the following values:
Test1 c1 = new Test1();
c1.setA(11);
c1.setB("12");
c1.setC(new Test2(21, "21", Arrays.asList(new String[] {"211", "212"}) ));
c1.setD(Arrays.asList(new String[] {"111", "112"}));
c1.setE(Arrays.asList(new Test2[] {
    new Test2(31, "32"),
    new Test2(41, "42")
}));
I would like to see something like this returned as a List<Map<String, Object>> or some other object:
a b c.x c.y c.z d e.x e.y
---- ---- ------ ------- ------ ---- ------ ------
11 12 21 21 211 111 31 32
11 12 21 21 211 111 41 42
11 12 21 21 211 112 31 32
11 12 21 21 211 112 41 42
11 12 21 21 212 111 31 32
11 12 21 21 212 111 41 42
11 12 21 21 212 112 31 32
11 12 21 21 212 112 41 42
I have already implemented something to achieve this result using reflection, but my solution is too slow for larger objects.
I was thinking of using an in-memory database to convert the object into a database table and then select the result, something like MongoDB or ObjectDB, but I think that is overkill, and maybe slower than my approach. Also, these two do not support an in-memory database, and I do not want to use another disk database, since I'm already using MySQL with Hibernate. Using a ramdisk is not an option, since my server only has limited RAM. Is there an in-memory OODBMS that can do this?
I would prefer an algorithm as a solution, or even better, an existing library that can convert any object to a row-column format, something like Jackson or JAXB, which convert data to/from other formats.
Thanks for the help
Finally, after one week of banging my head against every possible thing available in my house, I managed to find a solution.
I shared the code on GitHub, so that if anyone ever encounters this problem again, they can avoid a couple of migraines :)
you can get the code from here:
https://github.com/Sebb77/Denormalizer
Note: I had to use the getType() function and the FieldType enum for my specific problem.
In the future I will try to speed up the code with some caching, or something else :)
Note 2: this is just sample code that should be used only for reference. Lots of improvements can be made.
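For reference without following the link, here is a minimal hand-written sketch of the same flattening idea for the Test1/Test2 classes from the question (it assumes the getters hinted at by the // getters-setters comments exist; the Denormalizer linked above does this generically via reflection):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {
    // Builds one row per combination of the list-valued fields c.z, d and e,
    // matching the expected table in the question.
    public static List<Map<String, Object>> flatten(Test1 t) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String cz : t.getC().getZ()) {
            for (String d : t.getD()) {
                for (Test2 e : t.getE()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    row.put("a", t.getA());
                    row.put("b", t.getB());
                    row.put("c.x", t.getC().getX());
                    row.put("c.y", t.getC().getY());
                    row.put("c.z", cz);
                    row.put("d", d);
                    row.put("e.x", e.getX());
                    row.put("e.y", e.getY());
                    rows.add(row);
                }
            }
        }
        return rows;
    }
}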
Anyone is free to use the code, just send me a thank you email :)
Any suggestions, improvements or bug reports are very welcome.

Native shared library loads too slowly in Android

I have a shared library placed in libs/armeabi folder. It is loaded using
System.loadLibrary("library_name.so");
The size of the library is around 3MB. The loading time is very long; it sometimes lasts almost 20 seconds, and it blocks my GUI. I tried to put System.loadLibrary("library_name.so"); in a different thread, but my GUI is still blocked. I know that other apps use even bigger .so files, but their loading time is not as long. What could be the problem?
EDIT
3MB was the size of the debug version. The release version is about 800KB, but the problem is the same. Some additional info:
the .so contains my two C++ libraries, which are circularly connected
running arm-linux-androideabi-nm -D -C -g library_name.so displays a lot of functions and variables
I don't use LOCAL_WHOLE_STATIC_LIBRARIES anymore
here is the section headers table obtained using the arm-linux-androideabi-readelf tool:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .dynsym DYNSYM 00000114 000114 00b400 10 A 2 1 4
[ 2] .dynstr STRTAB 0000b514 00b514 015b0c 00 A 0 0 1
[ 3] .hash HASH 00021020 021020 004d1c 04 A 1 0 4
[ 4] .rel.dyn REL 00025d3c 025d3c 006e98 08 A 1 0 4
[ 5] .rel.plt REL 0002cbd4 02cbd4 000468 08 A 1 6 4
[ 6] .plt PROGBITS 0002d03c 02d03c 0006b0 00 AX 0 0 4
[ 7] .text PROGBITS 0002d6f0 02d6f0 08e6e0 00 AX 0 0 8
[ 8] .ARM.extab PROGBITS 000bbdd0 0bbdd0 00bad0 00 A 0 0 4
[ 9] .ARM.exidx ARM_EXIDX 000c78a0 0c78a0 005b80 08 AL 7 0 4
[10] .rodata PROGBITS 000cd420 0cd420 005cc0 00 A 0 0 4
[11] .data.rel.ro.loca PROGBITS 000d46d8 0d36d8 0006e4 00 WA 0 0 4
[12] .fini_array FINI_ARRAY 000d4dbc 0d3dbc 000008 00 WA 0 0 4
[13] .init_array INIT_ARRAY 000d4dc4 0d3dc4 00009c 00 WA 0 0 4
[14] .data.rel.ro PROGBITS 000d4e60 0d3e60 00384c 00 WA 0 0 8
[15] .dynamic DYNAMIC 000d86ac 0d76ac 000100 08 WA 2 0 4
[16] .got PROGBITS 000d87ac 0d77ac 000854 00 WA 0 0 4
[17] .data PROGBITS 000d9000 0d8000 000648 00 WA 0 0 8
[18] .bss NOBITS 000d9648 0d8648 047271 00 WA 0 0 8
[19] .comment PROGBITS 00000000 0d8648 000026 01 MS 0 0 1
[20] .note.gnu.gold-ve NOTE 00000000 0d8670 00001c 00 0 0 4
[21] .ARM.attributes ARM_ATTRIBUTES 00000000 0d868c 00002d 00 0 0 1
[22] .shstrtab STRTAB 00000000 0d86b9 0000d8 00 0 0 1
Try to reduce the number of exported functions in your shared library. You can use
arm-linux-androideabi-nm -D -C -g library_name.so
and check whether that list is unnecessarily long, then remove the ones that you don't need to export (declare them static). You can look up nm's manual with man nm and read about how to use and interpret it.
If you need to export lots of functions, use RegisterNatives() to register your native methods instead of relying on name mangling and lookup, which is what happens when you give your functions names like Java_your_path_YourClass_yourFunction.
You can also try to strip (arm-linux-androideabi-strip) your library, if it has symbols.
To avoid blocking the UI, you can try to load your shared library early on a different thread and wait for it; see the sketch below.
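A minimal sketch of that idea (the library base name is an assumption; note that System.loadLibrary takes the base name, e.g. "library_name" for a file called liblibrary_name.so):

import java.util.concurrent.CountDownLatch;

public final class NativeLoader {
    private static final CountDownLatch LOADED = new CountDownLatch(1);

    // Call this as early as possible, e.g. from Application.onCreate().
    public static void loadInBackground() {
        new Thread(new Runnable() {
            @Override
            public void run() {
                System.loadLibrary("library_name"); // assumed base name
                LOADED.countDown();
            }
        }, "native-loader").start();
    }

    // Call this right before the first native call; it blocks until loading is done.
    public static void awaitLoaded() throws InterruptedException {
        LOADED.await();
    }
}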
I wouldn't use LOCAL_WHOLE_STATIC_LIBRARIES, if exposing static libraries is not what I ultimately want.
LOCAL_WHOLE_STATIC_LIBRARIES
These are the static libraries that you want to include in your module without allowing the linker to remove dead code from them.
This is mostly useful if you want to add a static library to a shared library and have the static library's content exposed from the shared library.
Try to fix that problem instead of working around some build problem.
The problem was static initialization in some constructor, which took too much time to finish.

Converting a large ASCII to CSV file in python or java or something else on Linux or Win7

Need a hint so I can convert a huge (300-400 MB) ASCII file to a CSV file.
My ASCII file is a database with a lot of products (about 600,000 pcs = 55,200,000 lines in the file).
Below is ONE product. It is like a table row in a database, with 88 columns.
If you count the lines below, there are 92 lines.
Every occurrence of '00I'+CR+LF indicates that a new row/product starts.
Each line is ended with CR+LF.
A whole product/row is ended with the following three lines:
A00
A10
A21
-as shown below.
Between the starting line '00I'+CR+LF and the three ending lines, we have lines starting with 2 digits (the column name); what comes after those digits is the data for that column.
If we take the first line below the starting line '00I'+CR+LF, we see:
'0109321609'. 01 indicates that it is the column named 01, and the rest is the data stored in that column: '09321609'.
I want to strip out the two digits indicating each column name/line number, so the first line (after the starting indicator '00I'), 0109321609, comes out as: ”09321609”.
Putting it together with the next line (02), it should give an output like:
”09321609”,”15274”, etc.
When we come to the end, we want a new row.
The first line '00I' and the three last lines 'A00', 'A10' and 'A21' should not be included in the output file.
Here is how a row looks like (every line is ended by a CR+LF):
00I
0109321609
0215274
032
0419685
05
062
072
081
09
111
121
15
161
17
1814740
1920120401
2020120401
2120120401
22
230
240
251
26BLAHBLAH 1000MG
27
281
29
30
31BLAHBLAH 1000 mg Filmtablets Hursutacinzki
32
3336
341
350
361
371
401
410
420
43
445774
45FTA
46
47AN03AX14
48BLAHBLAH00000000000000000000010
491
501
512
522
5317
542
552
561
572
581
591
60
61
62
631
641
65
66
67
681
69
721
74884
761
771
780
790
801
811
831
851474
86
871
880
891
901
911
922
930
941
951
961
97
98
990
A00
A10
A21
Does anyone have a hint on how it can be converted?
The file is too big for a web server with PHP and MySQL to handle. My thought was to put the file in a directory on my local server, read the file, strip out the line numbers, and insert the data directly into a MySQL database on the fly, but the file is too big and the server stalls.
I'm able to run under Linux (Ubuntu) and Windows 7.
Maybe some Python or Java is recommended? I'm able to run both, but my experience with them is limited. I'm a quick learner, though, so can someone give a hint? :-)
Best Regards
Bjarke :-)
If you are absolutely certain that each entry is 92 lines long:
from itertools import izip
import csv

with open('data.txt') as inf, open('data.csv','wb') as outf:
    lines = (line[2:].rstrip() for line in inf)
    rows = (data[1:89] for data in izip(*([lines]*92)))
    csv.writer(outf).writerows(rows)
It should be something like this in Python:
import csv

fo = csv.writer(open('out.csv', 'wb'))
with open('eg.txt', 'r') as f:
    for line in f:
        assert line[:3] == '00I'
        buf = []
        for i in range(88):
            line = f.next()
            buf.append(line.strip()[2:])
        line = f.next()
        assert line[:3] == 'A00'
        line = f.next()
        assert line[:3] == 'A10'
        line = f.next()
        assert line[:3] == 'A21'
        fo.writerow(buf)
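Since the question also mentions Java, here is a minimal sketch of the same approach in Java (the file names are assumptions): it streams the input, drops the 2-digit column prefix from each of the 88 data lines, skips the 00I/A00/A10/A21 marker lines, and writes one quoted CSV row per product.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class AsciiToCsv {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader("eg.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("out.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.startsWith("00I")) {
                    continue; // only a 00I line starts a product
                }
                List<String> columns = new ArrayList<>();
                for (int i = 0; i < 88; i++) {
                    String data = in.readLine();
                    columns.add("\"" + data.substring(2).trim() + "\""); // drop the 2-digit column id
                }
                // the next three lines are the A00/A10/A21 trailer; skip them
                in.readLine();
                in.readLine();
                in.readLine();
                out.println(String.join(",", columns));
            }
        }
    }
}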
