I'm working on a project for a school fundraiser, and I need to output results to a PDF or Word document so I can automate printing sheets that share the same page content but show different results. I'm also hoping to make the page look interesting, with bright colors and images.
I've been looking around and two options caught my eye: iText, or Mail Merge with Office. Which would you suggest I use? (If you recommend one over the other, can you also add resources for me to use?)
Thank you!
Mail Merge, no question about it. iText ultimately gives you the power to rewrite the whole page based on where you are sending it (like building a report), but that is not what you are looking for. If by "different results" you mean things like the donor's name and donation amount, then go for the Mail Merge.
If you are saying you have all kinds of different bar charts and content differences per person, then I might think differently, but unless you are a super-amazing high school programmer, you aren't figuring out iText in time to get that done. Programming-wise, it is a relatively big deal to put together a PDF from scratch using iText, compared to putting something together in Microsoft Word.
I agree with Jishai about keeping it simple if time is likely to be short (as it might be on a fundraiser schedule). The JODReports or Docmosis systems might be very handy, since they have command-line calls you can use to have documents generated based on a mail-merge requirement.
Hope that helps.
I am trying to parse a TABLE in a PDF file and output it as CSV. I have attached sample data from the PDF below (only a few columns) and sample output for the same. Each column width is fixed, say Company Name (18 chars), Amount (8 chars), Type (5 chars), etc. I tried using the iText and PDFBox jars to get each page's data and parse it line by line, but that does not seem like a good solution, because the line breaks and page breaks in the PDF are not reliable. Please let me know if there is another, more appropriate solution. We want to use open-source software for this.
This is a very complex problem; there are multiple master's theses about it, even.
An easy analogy: I have 5000 puzzle pieces, all of them perfectly square, and any piece could fit anywhere. Some of them have fragments of lines on them, some have snippets of text.
However, that does not mean it can't be done. It'll just take work.
General approach:
use iText (specifically IEventListener) to get information on all rendering events for every page
select those rendering events that make sense for your application: PathRenderInfo and TextRenderInfo.
Events in a PDF do not need to appear in reading order, according to the spec. Solve this by implementing a comparator over IEventData that sorts according to reading order. This implies you might have to implement some basic language detection, since not every language reads left-to-right. (A minimal sketch of these first steps follows this list.)
Once sorted, you can start clustering items together according to any of the various heuristics you find in the literature. For instance, two characters can be grouped into a snippet of text if they follow each other in the sorted list of events (meaning they appear next to each other in reading order), if their y-positions do not differ too much (subscript and superscript might interfere here), and if their x-positions do not differ too much (kerning).
Continue clustering characters until you have formed words
Assuming you have formed words, use a similar algorithm to merge words into lines. Use PathRenderInfo to avoid merging two words when a drawn line (path) intersects the space between them.
Assuming you have managed to create lines, now look for tables. One possible approach is to apply a horizontal and a vertical projection, and then look for those sub-areas of the page that (when projected) show a grid-like structure.
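To make the first few points concrete, here is a minimal iText 7 sketch that collects text render events and sorts them into rough reading order. It assumes left-to-right text only and a hypothetical input.pdf; the real clustering described above still has to be built on top of it:

```java
import com.itextpdf.kernel.geom.Vector;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.TextRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

import java.util.*;

public class ReadingOrderDump {

    // Collects every text-rendering event on a page so we can sort the
    // chunks ourselves (PDF content streams carry no reading order).
    static class TextChunkListener implements IEventListener {
        final List<float[]> positions = new ArrayList<>(); // {x, y} of each chunk
        final List<String> texts = new ArrayList<>();

        @Override
        public void eventOccurred(IEventData data, EventType type) {
            if (type == EventType.RENDER_TEXT) {
                TextRenderInfo info = (TextRenderInfo) data;
                Vector start = info.getBaseline().getStartPoint();
                positions.add(new float[]{start.get(Vector.I1), start.get(Vector.I2)});
                texts.add(info.getText());
            }
        }

        @Override
        public Set<EventType> getSupportedEvents() {
            return Collections.singleton(EventType.RENDER_TEXT);
        }
    }

    public static void main(String[] args) throws Exception {
        try (PdfDocument pdf = new PdfDocument(new PdfReader("input.pdf"))) {
            for (int p = 1; p <= pdf.getNumberOfPages(); p++) {
                TextChunkListener listener = new TextChunkListener();
                new PdfCanvasProcessor(listener).processPageContent(pdf.getPage(p));

                // Sort top-to-bottom (PDF origin is bottom-left, so higher y
                // comes first), then left-to-right within a line.
                Integer[] order = new Integer[listener.texts.size()];
                for (int i = 0; i < order.length; i++) order[i] = i;
                Arrays.sort(order, (a, b) -> {
                    float dy = listener.positions.get(b)[1] - listener.positions.get(a)[1];
                    if (Math.abs(dy) > 2f) return dy > 0 ? 1 : -1; // 2pt same-line tolerance
                    return Float.compare(listener.positions.get(a)[0], listener.positions.get(b)[0]);
                });
                for (int i : order) System.out.print(listener.texts.get(i) + " ");
                System.out.println();
            }
        }
    }
}
```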
This high-level approach should make it painfully obvious why this is not a widely available thing. It is very hard to implement, and it requires domain knowledge of PDF, fonts, and machine learning.
If you are ok with commercial solutions, try out pdf2Data. It's an iText add-on that features this exact functionality.
http://itextpdf.com/itext7/pdf2Data
Here is a biological database: http://www.genecards.org/index.php?path=/GeneDecks
Usually, if I type in a gene name (a string, e.g. TF53) and submit it, it comes back with a result on the web page, and users can choose to save it as a tab-delimited or XML file. However, I have a list of more than a thousand gene names. How can I automate this series of steps with a Java program?
I know this question may be quite broad and there are probably various ways to do it. With only a little experience in Java programming, I would really appreciate it if someone could suggest an easier way to do it. Thanks.
One possibility is to read the gene names sequentially from your list and send a request like this for each one:
http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/<your gene name>/100/{%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,%22Compounds%22:%221%22,%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22}
(so basically mimic what the site does).
For example:
http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/TNFRSF10B/100/{%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,%22Compounds%22:%221%22,%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22}
However, they might not like people using their site that way (submitting a lot of automated requests), so you may want to check their policy on that. Another thing to check is whether they have an official API that can be used for batch retrieval of gene information.
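If their policy allows it, the loop could be sketched in Java roughly as follows. The URL template simply mirrors the request shown above (and may change on their side at any time), and genes.txt is an assumed input file with one gene name per line:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class GeneFetcher {
    // Mirrors the request URL shown above; the site may change this at any time.
    static final String BASE = "http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/";
    static final String PARAMS = "/100/{%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,"
        + "%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,"
        + "%22Compounds%22:%221%22,%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22}";

    public static void main(String[] args) throws IOException, InterruptedException {
        // genes.txt: one gene name per line (assumed input file).
        for (String gene : Files.readAllLines(Paths.get("genes.txt"), StandardCharsets.UTF_8)) {
            gene = gene.trim();
            if (gene.isEmpty()) continue;
            try (InputStream in = new URL(BASE + gene + PARAMS).openStream()) {
                // Save the raw response for later parsing, one file per gene.
                Files.copy(in, Paths.get(gene + ".html"), StandardCopyOption.REPLACE_EXISTING);
            }
            Thread.sleep(1000); // be polite: pause between requests
        }
    }
}
```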
I don't know the name of this visualization type, but I want to learn how to draw trees like the ones in this image:
I've seen this kind of visualization on many sites, but I can't find the technical term behind it.
That graph looks a lot like a force-directed layout. Painting those kinds of images is not an easy task; depending on what you are trying to accomplish, you might want to use an existing framework. If you want to use Java, you should look at Gephi; if you can use an HTML approach, you should definitely take a look at d3.js, a JavaScript library for data visualization. They have neat examples: directed-force layout, and collapsible-force layout.
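For intuition about what such a framework does under the hood: the core of a force-directed layout is just a loop that pushes all nodes apart and pulls connected nodes together. A toy Java sketch with a hard-coded example graph (not a substitute for Gephi or d3.js):

```java
import java.util.Random;

public class ForceLayout {
    // Toy force-directed layout: nodes repel each other, edges act as springs.
    public static void main(String[] args) {
        int n = 6;
        int[][] edges = {{0,1},{0,2},{1,3},{2,4},{2,5}}; // a small example tree
        double[] x = new double[n], y = new double[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) { x[i] = rnd.nextDouble(); y[i] = rnd.nextDouble(); }

        double repulsion = 0.01, springLength = 0.3, stiffness = 0.1, step = 0.05;
        for (int iter = 0; iter < 500; iter++) {
            double[] fx = new double[n], fy = new double[n];
            // Repulsive force between every pair of nodes (inverse-square).
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++) {
                    double dx = x[i] - x[j], dy = y[i] - y[j];
                    double d2 = dx * dx + dy * dy + 1e-6;
                    double f = repulsion / d2;
                    fx[i] += f * dx; fy[i] += f * dy;
                    fx[j] -= f * dx; fy[j] -= f * dy;
                }
            // Attractive spring force along each edge (Hooke's law).
            for (int[] e : edges) {
                double dx = x[e[1]] - x[e[0]], dy = y[e[1]] - y[e[0]];
                double d = Math.sqrt(dx * dx + dy * dy) + 1e-6;
                double f = stiffness * (d - springLength) / d;
                fx[e[0]] += f * dx; fy[e[0]] += f * dy;
                fx[e[1]] -= f * dx; fy[e[1]] -= f * dy;
            }
            for (int i = 0; i < n; i++) { x[i] += step * fx[i]; y[i] += step * fy[i]; }
        }
        for (int i = 0; i < n; i++)
            System.out.printf("node %d: (%.3f, %.3f)%n", i, x[i], y[i]);
    }
}
```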
This particular image is done by Stephanie Posavec. You can learn about her design process from an interview she gave the folks at the Data Stories podcast. As far as I remember it, she partially crafts her visualizations by hand, so I'm not sure if you'll ever find an algorithm that does exactly this for you. For different tree layout algorithms, you can refer to treevis.net.
So I have two possible solutions that I want to implement. First, my problem: the task I have been assigned requires me to go to a website called finra.org and run broker checks to see whether the brokers in my Excel sheet (which gives the name and company, among other things) still work at all, and, if they do, whether they still work for the company in the Excel sheet. If they do, move on to the next one; if they don't, delete them from the sheet. The issue is that I have 37k names to check. I calculated that doing it individually (which is tedious and takes all day) lets me get through at most 1400 a day, and that's on a productive day when I don't have other things to do. So I figured a better use of my time (I am an intern) would be to write a program. Here are my two suggested solutions:
1.) Automatically, with minimal keystrokes, copy the data and paste it into the search box on the page. I'd still have to click and search, but at least I would eliminate the copying, pasting, and switching between screens that takes the majority of the time.
2.) Completely automate the process. I was thinking of copying the names into a text file and then writing a program that takes each name, submits a query to the website, and shows me the result. Perhaps it could send the result text to a file, and then I could just grep the file for the data I need.
Any idea if any of this is possible?
Thanks,
Kevin
Definitely possible. I'm doing something similar with a database and an Excel spreadsheet of values, using AutoHotKey to automate queries, Chrome console commands and JavaScript bookmarklets to scrape data into the clipboard, and Ruby/Nokogiri for the more complex and/or structured parsing tasks.
Either of your methods will work. If you have little programming background, I would suggest starting with AutoHotKey, since it mimics keyboard and mouse commands, so the programming is much more straightforward and easier to understand. If you have some object-oriented programming skills, learning Ruby/Nokogiri might be your solution, depending on how FINRA's page is structured.
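If you'd rather stay in Java, option 2 can also be sketched with the open-source Jsoup library. Note that the search URL and the CSS selector below are hypothetical placeholders; you would have to inspect FINRA's actual BrokerCheck page to find the real endpoint and markup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class BrokerCheck {
    public static void main(String[] args) throws IOException, InterruptedException {
        // brokers.txt: one broker name per line (assumed input file).
        List<String> names = Files.readAllLines(Paths.get("brokers.txt"), StandardCharsets.UTF_8);
        for (String name : names) {
            // HYPOTHETICAL search URL: inspect the real BrokerCheck form to find
            // the actual endpoint and query parameter names.
            String url = "http://www.finra.org/brokercheck/search?name="
                    + URLEncoder.encode(name, "UTF-8");
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0")
                    .timeout(10_000)
                    .get();
            // HYPOTHETICAL selector: replace with whatever element actually
            // holds the current employer on the result page.
            Elements employer = doc.select(".employer-name");
            System.out.println(name + "\t"
                    + (employer.isEmpty() ? "NOT FOUND" : employer.first().text()));
            Thread.sleep(1000); // throttle requests
        }
    }
}
```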
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
The Fourier transform - I'm finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it. Though I'm not sure how to convert the output to an actual result
SIFT algorithm - seems incredibly complex
Histograms - probably the easiest out of all mentioned so far
Pixel grabbing - seems viable, but if there's a considerable amount of variation between the two images, it doesn't look like it will produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing an image with a Sobel filter first, then comparing the results. The problem is the actual comparing part.
So yeah, I'm looking to see if anyone has ideas for comparing images in Java, and hoping there are people here that have done similar projects before. I just want some input on viable comparison techniques that aren't too hard to implement in Java.
Thanks in advance
Fourier transform - This can be used to efficiently compute the cross-correlation, which will tell you how to align the two images and how similar they are when optimally aligned.
SIFT descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF.)
Histograms - The normalized cross-correlation often yields good results for comparing images on a global level. But since you are just comparing color distributions, you could end up declaring an outdoor scene with lots of snow similar to an indoor scene with lots of white wallpaper... (a minimal histogram sketch follows this answer)
Pixel grabbing - No idea what this is...
You can get a good overview from this paper. Another field you might want to look into is content-based image retrieval (CBIR).
Sorry for not being Java-specific. HTH.
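Of the methods above, histograms are the quickest to try in plain Java. Here is a minimal sketch comparing two images by normalized grayscale-histogram intersection (1.0 means identical distributions); note it inherits the snow-vs-wallpaper caveat mentioned above:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class HistogramCompare {
    // Builds a normalized 256-bin grayscale histogram of an image.
    static double[] histogram(BufferedImage img) {
        double[] h = new double[256];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                h[(r + g + b) / 3]++; // simple channel average as gray level
            }
        }
        double total = (double) img.getWidth() * img.getHeight();
        for (int i = 0; i < 256; i++) h[i] /= total;
        return h;
    }

    // Histogram intersection: sum of per-bin minima, in [0, 1].
    static double intersection(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.min(a[i], b[i]);
        return sum;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img1 = ImageIO.read(new File(args[0]));
        BufferedImage img2 = ImageIO.read(new File(args[1]));
        double similarity = intersection(histogram(img1), histogram(img2));
        System.out.printf("Histogram similarity: %.1f%%%n", similarity * 100);
    }
}
```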
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same object from the same angle, however. It's useful if you're comparing images that have been compressed with different algorithms, for example (e.g. JPEG vs JPEG2000). Also, it's a fairly simple approach that you should be able to implement reasonably quickly to see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to re-use that (through something like javacv) or just write it from scratch. The algorithm itself isn't that complicated anyway, so you should be able to implement it directly.
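For a feel of the formula, here is a deliberately simplified global (single-window) SSIM in plain Java. The real algorithm computes this statistic per local window and averages the results, so treat this only as a starting point:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class SimpleSsim {
    // Grayscale value of a pixel, 0-255.
    static double gray(BufferedImage img, int x, int y) {
        int rgb = img.getRGB(x, y);
        return (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3.0;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage a = ImageIO.read(new File(args[0]));
        BufferedImage b = ImageIO.read(new File(args[1]));
        int w = Math.min(a.getWidth(), b.getWidth());
        int h = Math.min(a.getHeight(), b.getHeight());
        double n = (double) w * h;

        // Means of both images.
        double muA = 0, muB = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) { muA += gray(a, x, y); muB += gray(b, x, y); }
        muA /= n; muB /= n;

        // Variances and covariance.
        double varA = 0, varB = 0, cov = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                double da = gray(a, x, y) - muA, db = gray(b, x, y) - muB;
                varA += da * da; varB += db * db; cov += da * db;
            }
        varA /= n - 1; varB /= n - 1; cov /= n - 1;

        // Standard SSIM stabilizing constants for 8-bit images.
        double c1 = Math.pow(0.01 * 255, 2), c2 = Math.pow(0.03 * 255, 2);
        double ssim = ((2 * muA * muB + c1) * (2 * cov + c2))
                    / ((muA * muA + muB * muB + c1) * (varA + varB + c2));
        System.out.printf("Global SSIM: %.4f%n", ssim);
    }
}
```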