How to determine that a PDImageXObject contains a specific image - java

I have 5 images
sun.png
moon.png
sea.png
earth.png
sky.png
I put these images into a PDF via PDFBox, and
I am able to identify the instance of PDImageXObject:
PDXObject pdxObject = getResources().getXObject(objectName);
if (pdxObject instanceof PDImageXObject) {
    // I want earth.png
    PDImageXObject image = (PDImageXObject) pdxObject;
    BufferedImage bImage = image.getImage();
    String fileName = "earth.png"; // how can I get the same file name it had when it was placed in the PDF?
    ImageIO.write(bImage, "PNG", new File(fileName));
    System.out.println("earth saved.");
}
But how can I determine that this PDImageXObject is the earth.png one? And
how can I get the same file name it had at the time it was placed in the PDF?

There is no clear way to decide on the number of images in a PDF without first deciding what you mean by "images".
It is not as simple as counting visible content, nor as simple as using an extraction listing. Here there are 5 images on the page, but the page is considered to have only one image, even though I can drag and scale the smaller copies. None of them has a name that says Sun, Moon, Earth or Sea. They did as sun.png, moon.png, earth.png and sea.png, but inside a PDF they are given different names: all 5 are placed in the file as /Im0, just allocated different object numbers and different identifying dictionaries. It does not matter how many places they are linked from, nor how many copies there are; they are known only by their placements, so a writer will usually keep the names brief, as /Im0, /Im1, /Im2, etc. Thus I was slightly surprised to see that here there is only /Im0; but I suppose that is how the "page canvas media" is counted when overlaying.
Each image has its own "fixed page", but we don't count images as pages; in a PDF they are called a "form":
<</BBox[0 0 430 430]/Type/XObject/Length 39/Resources<</XObject<</Im0 16 0 R>>>>/Subtype/Form>>
stream
q
430 0 0 430 0 0 cm
/Im0 Do
Q
q
Q
q
Q
endstream
endobj
13 0 obj
<</BBox[0 0 430 430]/Type/XObject/Length 39/Resources<</XObject<</Im0 17 0 R>>>>/Subtype/Form>>
stream
q
430 0 0 430 0 0 cm
/Im0 Do
Q
q
Q
q
Q
endstream
endobj
14 0 obj
<</BBox[0 0 430 430]/Type/XObject/Length 39/Resources<</XObject<</Im0 18 0 R>>>>/Subtype/Form>>
stream
q
430 0 0 430 0 0 cm
/Im0 Do

So Mutool (the base for PyMuPDF) agrees in its "info" query:
Images (1): 1 (4 0 R): [ ASCIIHex ] 1720x860 8bpc DevRGB (10 0 R)
OK, that's expected. Let's extract:
extracting image-0010.png
extracting image-0015.png
extracting image-0016.png
extracting image-0017.png
extracting image-0018.png
extracting image-0019.png
extracting image-0020.png
extracting image-0021.png
extracting image-0022.png
So that's expected too: there is one image plus 8 other different images, each a sub-layer of the 4 visible ones.
earth is 16 and 20
moon is 17 and 21
sea is 18 and 22
sun is 15 and 19
However, those 8 different PNGs are nothing like the source PNGs, since they are simply shadows of the inputs.
The interrelationship is seen better in:
pdfimages -list clipboard4.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 1720 860 rgb 3 8 image no 10 0 96 96 8803K 203%
1 1 image 430 430 rgb 3 8 image no 15 0 249 249 1100K 203%
1 2 smask 430 430 gray 1 8 image no 15 0 249 249 367K 203%
1 3 image 430 430 rgb 3 8 image no 16 0 160 160 1100K 203%
1 4 smask 430 430 gray 1 8 image no 16 0 160 160 367K 203%
1 5 image 430 430 rgb 3 8 image no 17 0 288 288 1100K 203%
1 6 smask 430 430 gray 1 8 image no 17 0 288 288 367K 203%
1 7 image 430 430 rgb 3 8 image no 18 0 198 198 1100K 203%
1 8 smask 430 430 gray 1 8 image no 18 0 198 198 367K 203%
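Since the original file names are not stored anywhere in the PDF, the only reliable way to pick out earth.png in PDFBox is to compare image content. Here is a minimal sketch, assuming you still have the original earth.png on disk and that it was embedded losslessly; matchesOriginal is a hypothetical helper, not part of the PDFBox API:

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Returns true if the decoded XObject pixels exactly match the original file.
// Exact matching assumes lossless embedding (e.g. PNG -> Flate); for
// JPEG-recompressed images a tolerance or perceptual hash would be needed.
static boolean matchesOriginal(BufferedImage candidate, File originalFile) throws Exception {
    BufferedImage original = ImageIO.read(originalFile);
    if (original.getWidth() != candidate.getWidth()
            || original.getHeight() != candidate.getHeight()) {
        return false;
    }
    for (int y = 0; y < original.getHeight(); y++) {
        for (int x = 0; x < original.getWidth(); x++) {
            if (original.getRGB(x, y) != candidate.getRGB(x, y)) {
                return false;
            }
        }
    }
    return true;
}

Usage inside the question's loop would be something like:
if (matchesOriginal(image.getImage(), new File("earth.png"))) { /* this XObject is earth.png */ }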

Related

EM Clustering with weka with log likelihood of 0 for some clusters? Confusing output

I have clustered 43574 time series using the EM clusterer. The output is 24 clusters. I have some questions here. First, is it practically useful to deal with 24 clusters? Isn't that too many? If I pass the results to a neurosurgeon to label these clusters for the purpose of patient management, is that going to work? My most important question: as shown below, a couple of clusters have 0% likelihood?! What does that mean? And then why are they in different clusters? Any help would be greatly appreciated. This is what I got:
0 1892 ( 4%)
1 5153 ( 12%)
2 1594 ( 4%)
3 1221 ( 3%)
4 122 ( 0%)
5 2714 ( 6%)
6 7092 ( 16%)
7 141 ( 0%)
8 166 ( 0%)
9 464 ( 1%)
10 3331 ( 8%)
11 4316 ( 10%)
14 2411 ( 6%)
15 2573 ( 6%)
17 3063 ( 7%)
18 142 ( 0%)
19 4211 ( 10%)
20 925 ( 2%)
21 2038 ( 5%)
22 5 ( 0%)
These values are not likelihoods, but sizes.
from numpy import array

data = array([1892, 5153, 1594, 1221, 122, 2714, 7092, 141, 166,
              464, 3331, 4316, 2411, 2573, 3063, 142, 4211, 925, 2038, 5])
for f in data * 100. / sum(data):
    print "%.1f%%" % f,
yields the following relative cluster sizes with an additional digit of precision:
4.3% 11.8% 3.7% 2.8% 0.3% 6.2% 16.3% 0.3% 0.4% 1.1% 7.6% 9.9%
5.5% 5.9% 7.0% 0.3% 9.7% 2.1% 4.7% 0.0%
These are not likelihoods. It's cluster size / data set size.
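For reference, a sketch of the same computation in Java (the array values are copied from the output above):

int[] sizes = {1892, 5153, 1594, 1221, 122, 2714, 7092, 141, 166,
               464, 3331, 4316, 2411, 2573, 3063, 142, 4211, 925, 2038, 5};
int total = java.util.Arrays.stream(sizes).sum();  // sums to 43574, the full data set
for (int size : sizes) {
    System.out.printf("%.1f%% ", 100.0 * size / total);  // relative cluster size
}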

Java - get parent process

I think Java doesn't provide much in its API for working with processes. Is there a way to get the parent process's PID in Java?
If you're running on Linux you can check procfs using /proc/self/stat.
Following on from #Imz's answer: on Linux, grab the output of /proc/self/stat (it's a one-line file, so just read it like a normal file):
43732 (java) S 43725 43725 11210 34822 43725 4202496 127791 387073
4055 0 3188 79 4597 253 20 0 53 0 16217706 39231705088 188764
18446744073709551615 4194304 4196452 140735605394256 140735605376816
274479481597 0 0 2 16800973 18446744073709551615 0 0 17 13 0 0 0 0 0
The 4th field, 43725 in the sample above, is your parent process ID.
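A minimal Java sketch of that (Linux only; the comm field can contain spaces, so it skips past the closing parenthesis before splitting):

import java.nio.file.Files;
import java.nio.file.Paths;

public class ParentPid {
    public static void main(String[] args) throws Exception {
        String stat = new String(Files.readAllBytes(Paths.get("/proc/self/stat")));
        // stat looks like: "pid (comm) state ppid ..."; cut off "pid (comm) "
        String rest = stat.substring(stat.lastIndexOf(')') + 2);
        System.out.println("Parent PID: " + rest.split(" ")[1]); // field after the state
    }
}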

Annealing on a multi-layered neural network: XOR experiments

I'm a beginner in this concept. Here is what I have tried in order to learn it, for a feed-forward neural network (topology 2x2x1):
Bias and weight range of each neuron    Outputs for XOR test inputs
[-1,1]                                  1,1 ----> 0.9
                                        1,0 ----> 0.8
                                        0,1 ----> -0.1
                                        0,0 ----> 0.1
[-10,10]                                1,1 ----> 0.24
                                        1,0 ----> 0.67
                                        0,1 ----> -0.54
                                        0,0 ----> 0.10
[-4,4]                                  1,1 ----> -0.02
                                        1,0 ----> 0.80
                                        0,1 ----> 0.87
                                        0,0 ----> -0.09
So the range [-4,4] seems to be better than the others.
Question: Is there a way to find the proper limits of weights and biases, given the temperature limits and the temperature decrease rate?
Note: I'm trying two ways here. The first randomizes all weights and biases at once for each trial. The second randomizes only a single weight and a single bias at each trial (50 iterations before decreasing the temperature). Changing a single weight gives worse results.
(n+1) is next value, (n) is the value before
TempMax=2.0
TempMin=0.1 ----->approaching to zero, error of XOR output approaches to zero too
Temp(n+1)=Temp(n)/1.001
Weight update:
w(n+1) = w(n) + (float)(Math.random()*t*2.0f - t*1.0f); // t is temperature
(same for bias update)
Iterations per temperature=50
Using Java's Math.random() method (is its spectral property appropriate for annealing?)
Transition probability:
(1.0f/(1.0f+Math.exp(((candidate state error)-(old error))/temp)))
Neuron activation function: Math.tanh()
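Put together, the loop I'm describing looks roughly like this (a sketch; initialWeights() and error(...) stand in for my actual network code):

static double[] anneal() {
    double[] weights = initialWeights();          // placeholder: flattened weights and biases
    java.util.Random rng = new java.util.Random();
    double temp = 2.0;                            // TempMax
    while (temp > 0.1) {                          // TempMin
        for (int i = 0; i < 50; i++) {            // iterations per temperature
            double oldError = error(weights);     // placeholder: XOR output error
            double[] candidate = weights.clone();
            for (int j = 0; j < candidate.length; j++) {
                candidate[j] += rng.nextDouble() * temp * 2.0 - temp; // w(n+1) = w(n) + rand[-t, t]
            }
            // transition probability from above
            double p = 1.0 / (1.0 + Math.exp((error(candidate) - oldError) / temp));
            if (rng.nextDouble() < p) {
                weights = candidate;              // accept candidate state
            }
        }
        temp /= 1.001;                            // Temp(n+1) = Temp(n)/1.001
    }
    return weights;
}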
I have tried many times and the results are nearly the same. Is reannealing the only solution for escaping deeper local minima?
I need a suitable weight/bias range/limit as a function of the total neuron number, the layer number and the starting/ending temperature. A 3x6x5x6x1 network can count 3-bit input and give output, and can approximate a step function, but I always need to play with the ranges.
For this training data set, the output error is too big (193 data points, 2 inputs, 1 output):
193 2 1
0.499995 0.653846
1
0.544418 0.481604
1
0.620200 0.320118
1
0.595191 0.404816
0
0.404809 0.595184
1
0.171310 0.636142
0
0.014323 0.403392
0
0.617884 0.476556
0
0.391548 0.478424
1
0.455912 0.721618
0
0.615385 0.500005
0
0.268835 0.268827
0
0.812761 0.187243
0
0.076923 0.499997
1
0.769231 0.500006
0
0.650862 0.864223
0
0.799812 0.299678
1
0.328106 0.614848
0
0.591985 0.722088
0
0.692308 0.500005
1
0.899757 0.334418
0
0.484058 0.419839
1
0.200188 0.700322
0
0.863769 0.256940
0
0.384615 0.499995
1
0.457562 0.508439
0
0.515942 0.580161
0
0.844219 0.431535
1
0.456027 0.529379
0
0.235571 0.104252
0
0.260149 0.400644
1
0.500003 0.423077
1
0.544088 0.278382
1
0.597716 0.540480
0
0.562549 0.651021
1
0.574101 0.127491
1
0.545953 0.731052
0
0.649585 0.350424
1
0.607934 0.427886
0
0.499995 0.807692
1
0.437451 0.348979
0
0.382116 0.523444
1
1 0.500000
1
0.731165 0.731173
1
0.500002 0.038462
0
0.683896 0.536585
1
0.910232 0.581604
0
0.499998 0.961538
1
0.903742 0.769772
1
0.543973 0.470621
1
0.593481 0.639914
1
0.240659 0.448408
1
0.425899 0.872509
0
0 0.500000
0
0.500006 0.269231
1
0.155781 0.568465
0
0.096258 0.230228
0
0.583945 0.556095
0
0.550746 0.575954
0
0.680302 0.935290
1
0.693329 0.461550
1
0.500005 0.192308
0
0.230769 0.499994
1
0.721691 0.831791
0
0.621423 0.793156
1
0.735853 0.342415
0
0.402284 0.459520
1
0.589105 0.052045
0
0.189081 0.371208
0
0.533114 0.579952
0
0.251594 0.871762
1
0.764429 0.895748
1
0.499994 0.730769
0
0.415362 0.704317
0
0.422537 0.615923
1
0.337064 0.743842
1
0.560960 0.806496
1
0.810919 0.628792
1
0.319698 0.064710
0
0.757622 0.393295
0
0.577463 0.384077
0
0.349138 0.135777
1
0.165214 0.433402
0
0.241631 0.758362
0
0.118012 0.341772
1
0.514072 0.429271
1
0.676772 0.676781
0
0.294328 0.807801
0
0.153846 0.499995
0
0.500005 0.346154
0
0.307692 0.499995
0
0.615487 0.452168
0
0.466886 0.420048
1
0.440905 0.797064
1
0.485928 0.570729
0
0.470919 0.646174
1
0.224179 0.315696
0
0.439040 0.193504
0
0.408015 0.277912
1
0.316104 0.463415
0
0.278309 0.168209
1
0.214440 0.214435
1
0.089768 0.418396
1
0.678953 0.767832
1
0.080336 0.583473
1
0.363783 0.296127
1
0.474240 0.562183
0
0.313445 0.577267
0
0.416055 0.443905
1
0.529081 0.353826
0
0.953056 0.687662
1
0.534725 0.448035
1
0.469053 0.344394
0
0.759341 0.551592
0
0.705672 0.192199
1
0.385925 0.775385
1
0.590978 0.957385
1
0.406519 0.360086
0
0.409022 0.042615
0
0.264147 0.657585
1
0.758369 0.241638
1
0.622380 0.622388
1
0.321047 0.232168
0
0.739851 0.599356
0
0.555199 0.366750
0
0.608452 0.521576
0
0.352098 0.401168
0
0.530947 0.655606
1
0.160045 0.160044
0
0.455582 0.518396
0
0.881988 0.658228
0
0.643511 0.153547
1
0.499997 0.576923
0
0.575968 0.881942
0
0.923077 0.500003
0
0.449254 0.424046
1
0.839782 0.727039
0
0.647902 0.598832
1
0.444801 0.633250
1
0.392066 0.572114
1
0.242378 0.606705
1
0.136231 0.743060
1
0.711862 0.641568
0
0.834786 0.566598
1
0.846154 0.500005
1
0.538462 0.500002
1
0.379800 0.679882
0
0.584638 0.295683
1
0.459204 0.540793
0
0.331216 0.430082
0
0.672945 0.082478
0
0.671894 0.385152
1
0.046944 0.312338
0
0.499995 0.884615
0
0.542438 0.491561
1
0.540796 0.459207
1
0.828690 0.363858
1
0.785560 0.785565
0
0.686555 0.422733
1
0.231226 0.553456
1
0.465275 0.551965
0
0.378577 0.206844
0
0.567988 0.567994
0
0.668784 0.569918
1
0.384513 0.547832
1
0.288138 0.358432
1
0.432012 0.432006
1
0.424032 0.118058
1
0.296023 0.703969
1
0.525760 0.437817
1
0.748406 0.128238
0
0.775821 0.684304
1
0.919664 0.416527
0
0.327055 0.917522
1
0.985677 0.596608
1
0.356489 0.846453
0
0.500005 0.115385
1
0.377620 0.377612
0
0.559095 0.202936
0
0.410895 0.947955
1
0.187239 0.812757
1
0.768774 0.446544
0
0.614075 0.224615
0
0.350415 0.649576
0
0.160218 0.272961
1
0.454047 0.268948
1
0.306671 0.538450
0
0.323228 0.323219
1
0.839955 0.839956
1
0.636217 0.703873
0
0.703977 0.296031
0
0.662936 0.256158
0
0.100243 0.665582
1
I highly doubt that any strict rules exist for your problem. First of all, the limits/bounds of the weights depend strongly on your input data representation, activation functions, neuron count and output function. What you can rely on here, in the best case, are rules of thumb.
First, let's consider the initial weight values in classical algorithms. A basic idea for the weight scale is to use the range [-1,1] for small layers, and for large ones to divide it by the square root of the number of units in the larger layer. More sophisticated methods are described by Bishop (1995). With such a rule of thumb we could deduce that a reasonable range (simply an order of magnitude bigger than the initial guess) would be something of the form [-10,10]/sqrt(neurons_count_in_the_lower_layer).
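As a rough sketch of that rule of thumb in code (the exact bound is a heuristic, not a hard rule; here the scale is taken as 1/sqrt(fan-in)):

static double[] initLayerWeights(int fanIn, java.util.Random rng) {
    // uniform in [-scale, scale], with the scale shrinking as the lower layer grows
    double scale = 1.0 / Math.sqrt(fanIn);
    double[] w = new double[fanIn];
    for (int i = 0; i < fanIn; i++) {
        w[i] = (rng.nextDouble() * 2.0 - 1.0) * scale;
    }
    return w;
}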
Unfortunately, to the best of my knowledge, the temperature choice is much more complex, as it is a data-dependent factor, not just a topology-based one. Some papers suggest values for specific time series prediction problems, but nothing general. For simulated annealing "in general" (not just applied to NN training), many heuristic choices have been proposed, e.g.:
If we know the maximum distance (cost function difference) between one
neighbour and another then we can use this information to calculate a
starting temperature. Another method, suggested in (13. Rayward-Smith, V.J., Osman, I.H., Reeves, C.R., Smith, G.D. 1996. Modern Heuristic Search Methods. John Wiley & Sons.), is to start with a very high temperature and cool it rapidly
until about 60% of worst solutions are being accepted. This forms the
real starting temperature and it can now be cooled more slowly. A
similar idea, suggested in (5. Dowsland, K.A. 1995. Simulated Annealing. In Modern Heuristic Techniques for Combinatorial Problems (ed. Reeves, C.R.), McGraw-Hill, 1995), is to rapidly heat the
system until a certain proportion of worse solutions are accepted and
then slow cooling can start. This can be seen to be similar to how
physical annealing works in that the material is heated until it is
liquid and then cooling begins (i.e. once the material is a liquid it
is pointless carrying on heating it). [from notes from University of Nottingham]
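A sketch of that second heuristic (the 60% threshold comes from the quote; acceptanceRate(...) is a placeholder that would run a batch of trial moves at the given temperature and report the fraction of worse solutions accepted):

double temp = 0.01;                   // start cold
while (acceptanceRate(temp) < 0.60) { // heat rapidly until ~60% of worse moves are accepted
    temp *= 2.0;
}
double startingTemperature = temp;    // the real starting temperature; cool slowly from here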
But the choice of the best one for your application has to be based on numerous tests, as with most things in machine learning. If you are dealing with a problem where you really care about a well-trained neural network, it seems reasonable to look into Extreme Learning Machines (ELM), where the neural network training is conducted as a global optimization procedure, which guarantees the best possible solution (under the regularized cost function used). Simulated annealing, as an iterative, greedy process (as is back propagation), cannot guarantee anything; there are only heuristics and rules of thumb.

Native shared library loads too slowly in Android

I have a shared library placed in the libs/armeabi folder. It is loaded using
System.loadLibrary("library_name");
The size of the library is around 3MB. The loading time is very long; it sometimes lasts almost 20 seconds, and it blocks my GUI. I tried to put the System.loadLibrary call in a different thread, but my GUI is still blocked. I know that other apps use even bigger .so files without such long loading times. What could be the problem?
EDIT
3MB was the size of the debug version. The release version is about 800KB, but the problem is the same. Some additional info:
the .so contains my two C++ libraries, which are circularly connected
running arm-linux-androideabi-nm -D -C -g library_name.so displays a lot of functions and variables
I don't use LOCAL_WHOLE_STATIC_LIBRARIES anymore
here is the section header table obtained using the arm-linux-androideabi-readelf tool:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .dynsym DYNSYM 00000114 000114 00b400 10 A 2 1 4
[ 2] .dynstr STRTAB 0000b514 00b514 015b0c 00 A 0 0 1
[ 3] .hash HASH 00021020 021020 004d1c 04 A 1 0 4
[ 4] .rel.dyn REL 00025d3c 025d3c 006e98 08 A 1 0 4
[ 5] .rel.plt REL 0002cbd4 02cbd4 000468 08 A 1 6 4
[ 6] .plt PROGBITS 0002d03c 02d03c 0006b0 00 AX 0 0 4
[ 7] .text PROGBITS 0002d6f0 02d6f0 08e6e0 00 AX 0 0 8
[ 8] .ARM.extab PROGBITS 000bbdd0 0bbdd0 00bad0 00 A 0 0 4
[ 9] .ARM.exidx ARM_EXIDX 000c78a0 0c78a0 005b80 08 AL 7 0 4
[10] .rodata PROGBITS 000cd420 0cd420 005cc0 00 A 0 0 4
[11] .data.rel.ro.loca PROGBITS 000d46d8 0d36d8 0006e4 00 WA 0 0 4
[12] .fini_array FINI_ARRAY 000d4dbc 0d3dbc 000008 00 WA 0 0 4
[13] .init_array INIT_ARRAY 000d4dc4 0d3dc4 00009c 00 WA 0 0 4
[14] .data.rel.ro PROGBITS 000d4e60 0d3e60 00384c 00 WA 0 0 8
[15] .dynamic DYNAMIC 000d86ac 0d76ac 000100 08 WA 2 0 4
[16] .got PROGBITS 000d87ac 0d77ac 000854 00 WA 0 0 4
[17] .data PROGBITS 000d9000 0d8000 000648 00 WA 0 0 8
[18] .bss NOBITS 000d9648 0d8648 047271 00 WA 0 0 8
[19] .comment PROGBITS 00000000 0d8648 000026 01 MS 0 0 1
[20] .note.gnu.gold-ve NOTE 00000000 0d8670 00001c 00 0 0 4
[21] .ARM.attributes ARM_ATTRIBUTES 00000000 0d868c 00002d 00 0 0 1
[22] .shstrtab STRTAB 00000000 0d86b9 0000d8 00 0 0 1
Try to reduce the number of exported functions in your shared library. You can use
arm-linux-androideabi-nm -D -C -g library_name.so
and check whether that list is unnecessarily long, then remove the ones that you don't use (declare them static). You can look up nm's manual by running man nm and read about how to use and interpret it.
If you need to use lots of functions, use RegisterNatives() to register your functions instead of relying on name mangling and lookup - that's what you do when you give your functions names like Java_your_path_YourClass_yourFunction.
You can also try to strip (arm-linux-androideabi-strip) your library, if it has symbols.
To avoid blocking UI, you can try to load your shared library early in a different thread and wait for it.
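For example, a minimal sketch of that (the class and library name are placeholders):

import java.util.concurrent.CountDownLatch;

public class NativeLoader {
    private static final CountDownLatch LOADED = new CountDownLatch(1);

    public static void loadAsync() {             // call early, e.g. from Application.onCreate()
        new Thread(() -> {
            System.loadLibrary("library_name");  // no "lib" prefix, no ".so" suffix
            LOADED.countDown();
        }, "native-lib-loader").start();
    }

    public static void awaitLoaded() throws InterruptedException {
        LOADED.await();                          // block only right before the first native call
    }
}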
I wouldn't use LOCAL_WHOLE_STATIC_LIBRARIES, if exposing static libraries is not what I ultimately want.
LOCAL_WHOLE_STATIC_LIBRARIES
These are the static libraries that you want to include in your module without allowing the linker to remove dead code from them.
This is mostly useful if you want to add a static library to a shared library and have the static library's content exposed from the shared library.
Try to fix that problem instead of working around some build problem.
The problem is static initialization in a constructor, which takes too much time to finish.

Converting a large ASCII to CSV file in python or java or something else on Linux or Win7

I need a hint so I can convert a huge (300-400 MB) ASCII file to a CSV file.
My ASCII file is a database with a lot of products (about 600,000 products = 55,200,000 lines in the file).
Below is ONE product. It is like a table row in a database, with 88 columns.
If you count the lines below, there are 92 of them.
Every occurrence of '00I' + CR+LF indicates that we have a new row/product.
Each line is ended with CR+LF.
A whole product/row is ended with the following three lines:
A00
A10
A21
-as shown below.
Between the starting line '00I' + CR+LF and the three ending lines, we have lines starting with 2 digits (the column name); what comes after those digits is the data for the column.
If we take the first line below the starting line '00I' + CR+LF, we see:
'0109321609'. 01 indicates that it is the column named 01, and the rest is the data stored in that column: '09321609'.
I want to strip out the two digits indicating each column name/line number, so the first line (after the starting indication '00I'), 0109321609, comes out as: "09321609".
Putting it together with the next line (02), it should give output like:
"09321609","15274", etc.
When coming to the end, we want a new row.
The first line '00I' and the three last lines 'A00', 'A10' and 'A21' should not be included in the output file.
Here is how a row looks (every line ends with CR+LF):
00I
0109321609
0215274
032
0419685
05
062
072
081
09
111
121
15
161
17
1814740
1920120401
2020120401
2120120401
22
230
240
251
26BLAHBLAH 1000MG
27
281
29
30
31BLAHBLAH 1000 mg Filmtablets Hursutacinzki
32
3336
341
350
361
371
401
410
420
43
445774
45FTA
46
47AN03AX14
48BLAHBLAH00000000000000000000010
491
501
512
522
5317
542
552
561
572
581
591
60
61
62
631
641
65
66
67
681
69
721
74884
761
771
780
790
801
811
831
851474
86
871
880
891
901
911
922
930
941
951
961
97
98
990
A00
A10
A21
Anyone got a hint on how it can be converted?
The file is too big for a webserver with PHP and MySQL to handle. My thought was to put the file in a directory on my local server, read it, strip out the line numbers, and insert the data directly into a MySQL database on the fly, but the file is too big and the server stalls.
I'm able to run under Linux (Ubuntu) and Windows 7.
Maybe some Python or Java is recommended? I can run both, but my experience with them is low. I'm a quick learner though, so can someone give a hint? :-)
Best Regards
Bjarke :-)
If you are absolutely certain that each entry is 92 lines long:
from itertools import izip
import csv

with open('data.txt') as inf, open('data.csv', 'wb') as outf:
    # strip the 2-digit column id from every line
    lines = (line[2:].rstrip() for line in inf)
    # group the lines in blocks of 92, then drop the '00I' header and the
    # 'A00'/'A10'/'A21' trailer, keeping the 88 data columns
    rows = (data[1:89] for data in izip(*([lines] * 92)))
    csv.writer(outf).writerows(rows)
It should be something like this in Python:
import csv

fo = csv.writer(open('out.csv', 'wb'))
with open('eg.txt', 'r') as f:
    for line in f:
        assert line[:3] == '00I'          # start-of-product marker
        buf = []
        for i in range(88):               # the 88 data columns
            line = f.next()
            buf.append(line.strip()[2:])  # drop the 2-digit column id
        line = f.next()
        assert line[:3] == 'A00'
        line = f.next()
        assert line[:3] == 'A10'
        line = f.next()
        assert line[:3] == 'A21'
        fo.writerow(buf)
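Since the question also mentions Java, here is the same streaming idea sketched in Java (file names are placeholders; it never holds more than one row in memory, so the 300-400 MB size is not a problem):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class AsciiToCsv {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("data.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("data.csv"))) {
            StringBuilder row = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("00I")) {
                    row.setLength(0);                    // new product: start a fresh row
                } else if (line.startsWith("A21")) {
                    out.println(row);                    // last trailer line: emit the row
                } else if (!line.startsWith("A00") && !line.startsWith("A10")) {
                    if (row.length() > 0) row.append(',');
                    // drop the 2-digit column id and quote the rest
                    row.append('"').append(line.substring(2)).append('"');
                }
            }
        }
    }
}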
