Compare actual data (xls) with expected data (xlsx) in Java

I have a scenario where I need to run assertions in Java on the actual data provided in an XLS file against the expected data provided in an XLSX file, matching rows on an identifier column. Can anyone provide any advice or suggestions on this, please?
Actual data:
Field(Name) | Field(Identifier)    | Entity(Name) | ParentEntity(Name)
Lead time   | Article.DeliveryTime | Item         | None

Expected data:
Field(Name) | Field(Identifier)    | Entity(Name) | ParentEntity(Name)
Lead time   | Article.DeliveryTime | Item         | ParentQualifier
The number of rows and columns may change depending on the data provided, but the Field(Identifier) column is present in both files.

I suggest converting the Excel files to a simpler, easier-to-parse format such as *.csv. You can do that in Excel by saving the file as *.csv. There are many CSV parser libraries.
If that is not possible for some reason (you don't own the data, management, ...), you could use Apache POI to parse the *.xls / *.xlsx files and then do the testing. How to do this is shown here: Link1 or here: Link2. Then you can simply run the assertions with JUnit.
There are two potential problems, though:
Changing columns: you need to parse the Excel file without specifying exact column names, then compare only the columns that have the same name.
Data doesn't fit in memory: search for the IDs and load only the matching rows.
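A minimal sketch of the comparison step, assuming each sheet has already been read into a list of rows of cell text with the header row first (with Apache POI, WorkbookFactory.create(File) opens both .xls and .xlsx files, and DataFormatter.formatCellValue(Cell) returns a cell's displayed text). The class and method names are hypothetical. Rows are keyed by the identifier column and only columns present in both sheets are compared, which handles the changing-columns problem; everything is kept in memory, so it does not address the second problem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class ExcelRowComparator {

    // Turns a sheet (header row first) into: identifier value -> (column name -> cell text).
    static Map<String, Map<String, String>> indexById(List<List<String>> sheet, String idColumn) {
        List<String> headers = sheet.get(0);
        if (!headers.contains(idColumn)) {
            throw new IllegalArgumentException("Missing identifier column: " + idColumn);
        }
        Map<String, Map<String, String>> byId = new LinkedHashMap<>();
        for (List<String> row : sheet.subList(1, sheet.size())) {
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < headers.size(); c++) {
                // Short rows (trailing empty cells) are padded with "".
                record.put(headers.get(c), c < row.size() ? row.get(c) : "");
            }
            byId.put(record.get(idColumn), record);
        }
        return byId;
    }

    // Returns one message per difference; an empty list means the sheets match.
    static List<String> compare(List<List<String>> actual, List<List<String>> expected, String idColumn) {
        Map<String, Map<String, String>> act = indexById(actual, idColumn);
        Map<String, Map<String, String>> exp = indexById(expected, idColumn);
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : exp.entrySet()) {
            Map<String, String> a = act.get(e.getKey());
            if (a == null) {
                diffs.add(e.getKey() + ": missing in actual data");
                continue;
            }
            for (Map.Entry<String, String> col : e.getValue().entrySet()) {
                // Only columns that exist in both sheets are compared.
                if (a.containsKey(col.getKey()) && !Objects.equals(a.get(col.getKey()), col.getValue())) {
                    diffs.add(e.getKey() + " / " + col.getKey() + ": expected '" + col.getValue()
                            + "' but was '" + a.get(col.getKey()) + "'");
                }
            }
        }
        for (String id : act.keySet()) {
            if (!exp.containsKey(id)) {
                diffs.add(id + ": not present in expected data");
            }
        }
        return diffs;
    }
}
```

With the sample data above, compare(actual, expected, "Field(Identifier)") reports a single difference in ParentEntity(Name). In a JUnit test you would assert that the returned list is empty and use the joined messages as the failure text.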

Related

How to extract data from a PDF file using Tika or any other library and store it in CSV/excel format

I want to extract the data in a PDF file and present it as a CSV/Excel sheet. I learned that this can be done with the Tika library in Java. I did find how to extract the data as plain text, but I want to know how to store it in an Excel sheet.
If someone has done this kind of work before, please help me.
The first part (and the hard one) is to parse the original data and interpret it as a table. Apache Tika will give you an XHTML representation (or call your own handler with SAX events), but for a PDF file it usually won't construct a table for you, since PDF isn't a tabular format by itself.
So you'll have to take the paragraphs Tika produces, split them, and pass the resulting cells to some CSV/XLS/XLSX writer.
This might work if your PDF contains a regular table (one line per table row, clean logical separation between cells, etc.). But it will amount to parsing plain text, of course.
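A minimal sketch of the splitting step, assuming the extracted text has one line per table row and that cells are separated by a tab or by two or more spaces. That separator rule is a heuristic of this sketch and will not hold for every PDF; the class name is hypothetical. The resulting rows can be handed to whichever CSV or spreadsheet writer you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TextTableSplitter {

    // Heuristic cell separator: a run of two or more whitespace characters, or a single tab.
    private static final String CELL_SEPARATOR = "\\s{2,}|\\t";

    // Splits extracted text into rows (one per non-blank line) and cells.
    static List<List<String>> split(String text) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between paragraphs
            }
            rows.add(Arrays.asList(trimmed.split(CELL_SEPARATOR)));
        }
        return rows;
    }
}
```

Single spaces are kept inside a cell, so "Lead time" stays one value; if your tables use single spaces between columns, this rule will not work and a position-based approach (for example with PDFBox) is needed.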
If that doesn't work, you'll have to use a PDF parser (like Apache PDFBox) and try to interpret its output.
The second part (output) is simple. If CSV/SSV/TSV is suitable for you, use your preferred library to produce it (I can recommend Apache Commons CSV).
But take into account that MS Excel needs a BOM in UTF-8 and UTF-16 CSV files to recognize that the file isn't in a single-byte encoding (like CP-1252).
If you want Excel xls or xlsx format -- just use Apache POI to write it.
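To illustrate the BOM point without any library, here is a minimal sketch that builds UTF-8 CSV bytes prefixed with the byte order mark (U+FEFF, which is EF BB BF in UTF-8) and quotes cells containing commas, quotes, or line breaks. The class name is hypothetical; a library such as Apache Commons CSV would normally take care of the quoting.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class ExcelFriendlyCsv {

    // Quotes a cell if it contains a comma, a double quote, or a line break (RFC 4180 style).
    static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n") || cell.contains("\r")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    // Builds the whole file in memory: BOM first, then CRLF-terminated rows.
    static byte[] toUtf8WithBom(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("\uFEFF"); // encodes to EF BB BF in UTF-8
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(escape(row.get(i)));
            }
            sb.append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The bytes can then be written with Files.write(Paths.get("out.csv"), bytes).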

Using java to change excel date re

So I am creating a CSV report in my Java code and using Excel to open the exported CSV file. One of the columns is a date, which I format in my code as mm/dd/yyyy hh:mm:ss. This comes out as 02/10/2014 3:38:00 PM, which is exactly how I want it. However, the cells in the Excel sheet display it as 02/10/2014 3:38. When I click on a cell, Excel shows the full date in the formula bar at the top, but I want the column itself to show it so that it is easier to print. It doesn't seem to be a column width issue, since I have widened the column and the full date still doesn't appear. I can, however, achieve it by changing the cells' number format to Custom. Is this something that can be done in Java itself? Let me know if you need more information. Thanks!
Comma-separated values (CSV) files store tabular data in plain text. To tell Excel how to format a particular column, you would need to use an Excel format. To achieve that, you can use a Java library that exports data in Excel format. One example of such a library is Apache POI, the Java API for Microsoft Documents (http://poi.apache.org/).
In addition, to work better with CSV files in Excel, use the import-from-text feature. This is a wizard in which you can specify the import settings, including column formats, field widths, etc.
I hope it helps.
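On the Java side, note that in SimpleDateFormat and DateTimeFormatter patterns lowercase mm means minutes and uppercase MM means months, so a pattern that produces 02/10/2014 3:38:00 PM looks like the sketch below (the class name is hypothetical). This only controls the text written to the CSV; Excel still applies its own display format when it opens the file. With Apache POI you would instead write a real date cell and give it a cell style whose data format is an Excel custom format such as mm/dd/yyyy h:mm:ss AM/PM (via Workbook.createDataFormat().getFormat(...) and CellStyle.setDataFormat(...)).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class CsvDateFormat {

    // MM = month, dd = day, h = 12-hour clock hour, mm = minute, ss = second, a = AM/PM marker.
    // Locale.US is fixed so the AM/PM text does not depend on the machine's default locale.
    static final DateTimeFormatter US_DATE_TIME =
            DateTimeFormatter.ofPattern("MM/dd/yyyy h:mm:ss a", Locale.US);

    static String format(LocalDateTime timestamp) {
        return timestamp.format(US_DATE_TIME);
    }
}
```

For example, CsvDateFormat.format(LocalDateTime.of(2014, 2, 10, 15, 38, 0)) yields 02/10/2014 3:38:00 PM.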

Java text extraction and data structure design

I have a huge set of tables in OpenOffice 3.0 document format.
Table 1:
(x range) | (x1,y1) | (x2,y2) | (x3,y3) | (x4,y4)
(-20,90)  | (-20,0) | (-5,1)  | (5,1)   | (10,0)
...
Likewise, I have n tables. All of them are fuzzy set membership functions. In simple terms, they are computational models according to which I have to process the input data. There are many such tables, with differing numbers of rows and 3 or 4 columns. The data will not change once loaded.
Example:
When I get a value of x in the range -20 to 90, I apply the first rule (given above). Suppose that it is -1 (which is in between the values -20 and -5). Then I have to find a corresponding value between 0 and 1.
My first question is how to extract all the data from the tables in the documents so that I can use it in my Java program. I know a bit of Python, and I know Python can be useful in such cases, but then how would I use it from my Java program?
Secondly, what would be the best data structure to use in such a scenario?
Note: I'm not using any database, so I would prefer to keep the tables in XML or some other format that I can load easily into the program. I am also thinking of building a suitable data structure and then serializing it, so that I can load it whenever required instead of parsing a file and recreating the data structure. Please post your comments.
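For the data-structure question, one option is sketched below, assuming each table describes a piecewise-linear membership function: the breakpoints go into a TreeMap, whose floorEntry and ceilingEntry lookups find the segment around x, and the value is linearly interpolated. Inputs left of the first or right of the last breakpoint take that endpoint's value; this is an assumption of the sketch, so adjust it to your rules. The class name is hypothetical. TreeMap is Serializable, so the whole object can be serialized as the question suggests.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.TreeMap;

class MembershipFunction implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double xMin;
    private final double xMax;
    // Breakpoints (x -> membership degree), kept sorted by x.
    private final TreeMap<Double, Double> points = new TreeMap<>();

    MembershipFunction(double xMin, double xMax) {
        this.xMin = xMin;
        this.xMax = xMax;
    }

    MembershipFunction addPoint(double x, double y) {
        points.put(x, y);
        return this;
    }

    // True if this rule's x range covers the input.
    boolean appliesTo(double x) {
        return x >= xMin && x <= xMax;
    }

    // Linear interpolation between the breakpoints surrounding x.
    double membership(double x) {
        if (points.isEmpty()) {
            throw new IllegalStateException("no breakpoints defined");
        }
        Map.Entry<Double, Double> lo = points.floorEntry(x);
        Map.Entry<Double, Double> hi = points.ceilingEntry(x);
        if (lo == null) {
            return hi.getValue(); // left of the first breakpoint
        }
        if (hi == null) {
            return lo.getValue(); // right of the last breakpoint
        }
        if (lo.getKey().equals(hi.getKey())) {
            return lo.getValue(); // x is exactly a breakpoint
        }
        double t = (x - lo.getKey()) / (hi.getKey() - lo.getKey());
        return lo.getValue() + t * (hi.getValue() - lo.getValue());
    }
}
```

For Table 1, new MembershipFunction(-20, 90).addPoint(-20, 0).addPoint(-5, 1).addPoint(5, 1).addPoint(10, 0) gives about 0.667 for x = -10 (between -20 and -5) and 1.0 for x = -1 (between -5 and 5).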
In order to parse an OpenOffice Document in Java (to extract data), you can use a dedicated API such as ODFDOM.
I think this solution is very complicated for what you need. An easier solution would be to manually extract the OpenOffice tables and put them in a format that is friendlier to parse in Java:
CSV
Database (MySQL, etc.)
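If the tables are exported as text (for example CSV), a row such as (-20,90) |(-20,0) |(-5,1) |(5,1) |(10,0) can be turned into numbers with a regular expression. A minimal sketch, assuming every cell is a parenthesized pair of plain decimal numbers; the class name is hypothetical. Following the table header, the first pair is the x range and the remaining pairs are the (x, y) breakpoints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairRowParser {

    // Matches "(a,b)" where a and b are optionally negative decimals, allowing spaces around them.
    private static final Pattern PAIR = Pattern.compile(
            "\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s*,\\s*(-?\\d+(?:\\.\\d+)?)\\s*\\)");

    // Returns every (a, b) pair found in the line, in order of appearance.
    static List<double[]> parse(String line) {
        List<double[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            pairs.add(new double[] {Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))});
        }
        return pairs;
    }
}
```

Because the regex ignores everything between pairs, the same code works whether the export uses |, commas, or tabs as separators.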

POI and Excel Lists or Tables

There is a requirement to read multiple Excel "Lists" from one sheet in an Excel file.
I need to know if there is an inbuilt method/api to obtain the dimensions of a list or deal with list specific data. At the bare minimum, is there a way of knowing the "headers" of the list and the number of rows contained in the list data?
I want the POI wrapper that I am writing to be aware of the data elements in the Excel sheet, rather than being told through properties/constants or, worse, hardcoding the header names and list sizes.
Regards, Vinu
PS: No idea whether the question made any sense...but anyone who gets my drift...please!
I've just written something to do this. AFAIK, POI itself doesn't understand lists/tables, but you can easily write a wrapper that does. Alternatively, there is the jxls project: it allows complex mappings from POJOs to spreadsheets and has table-reader functionality.
How to: Excel Files has a number of links to projects that might be useful, including an Excel JDBC driver.
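A minimal sketch of the core of such a wrapper, assuming the sheet has already been read into a grid of cell text (for example with POI's DataFormatter), that lists are stacked vertically, that each list starts with a header row, and that lists are separated by at least one blank row. Those layout rules and the class names are assumptions of this sketch, not something POI provides.

```java
import java.util.ArrayList;
import java.util.List;

class SheetListFinder {

    // One detected list: its header cells, the 0-based header row index, and the number of data rows.
    static final class ListBlock {
        final List<String> headers;
        final int headerRow;
        final int dataRowCount;

        ListBlock(List<String> headers, int headerRow, int dataRowCount) {
            this.headers = headers;
            this.headerRow = headerRow;
            this.dataRowCount = dataRowCount;
        }
    }

    // A row counts as blank if every cell is null or whitespace.
    static boolean isBlank(List<String> row) {
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Scans top to bottom; each run of non-blank rows is one list whose first row is the header.
    static List<ListBlock> find(List<List<String>> grid) {
        List<ListBlock> blocks = new ArrayList<>();
        int r = 0;
        while (r < grid.size()) {
            if (isBlank(grid.get(r))) {
                r++;
                continue;
            }
            int start = r;
            while (r < grid.size() && !isBlank(grid.get(r))) {
                r++;
            }
            blocks.add(new ListBlock(grid.get(start), start, r - start - 1));
        }
        return blocks;
    }
}
```

This gives exactly the minimum asked for (each list's headers and row count) without hardcoding them; lists placed side by side would need a column-wise scan as well.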

POI dynamic templates

Can anyone tell me where I can find useful documentation on copying rows, cells, and columns from one Excel file to another using POI?
I need to dynamically insert two or more templates from other files into one blank Excel file.
I also need to keep all the styles applied to the group of cells that I copy. How can I do that? I found nothing in the Apache POI tutorial on this point.
I am using POI 3.0.1.
Thank you!
I assume the problem is data types and merged cells? It's easy enough to get and set styles and set values.
Depending on your use case, you might be able to take entire sheets from the original document, assemble the new document from those and tweak it to your liking. Even if you have to combine multiple source sheets into one target sheet, you might still be able to retrieve source rows and assemble the target document from those rows.
...that was me some time ago...
I never managed to copy from one Excel file to another with the exact styles, but I found a solution: I used multiple worksheets instead of multiple Excel files, because styles copy without problems from one sheet to another as long as they are in the same workbook.
I also migrated from POI 3.0.1 to POI 3.6. Much better.
