Parsing a bank statement PDF

Parsing a bank statement PDF - java

I have several bank statements from our users, and I am trying to figure out a way to parse the rows of transactions. I have used PDFBox before (PDFTextStripper and PDFTextStripperByArea), but I am not sure how to proceed with bank statements, since they have an undetermined number of rows and the rows may or may not be of fixed size.

I wrote just such a parser for our Chase PDF credit card statements, to speed up tax preparation, with the help of an open source project called Apache Tika.
You just need to include Tika and its PDF parser in your pom.xml dependencies:
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>1.17</version>
</dependency>
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
<version>1.17</version>
</dependency>
The PDF extractor is fairly straightforward too:
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.xml.sax.ContentHandler;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
public class PdfExtractor {
private static Logger logger = LoggerFactory.getLogger(PdfExtractor.class);
public static void main(String args[]) throws Exception {
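// Assumed to be org.apache.commons.lang3.time.StopWatch (not imported above); it must be started for getTime() to measure anything.
StopWatch sw = StopWatch.createStarted();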
List<String> files = new ArrayList<>();
files.add("C:/Users/m/Downloads/20170115.pdf");
files.add("C:/Users/m/Downloads/20170215.pdf");
files.add("C:/Users/m/Downloads/20170315.pdf");
files.add("C:/Users/m/Downloads/20170415.pdf");
files.add("C:/Users/m/Downloads/20170515.pdf");
files.add("C:/Users/m/Downloads/20170615.pdf");
files.add("C:/Users/m/Downloads/20170715.pdf");
files.add("C:/Users/m/Downloads/20170815.pdf");
files.add("C:/Users/m/Downloads/20170915.pdf");
files.add("C:/Users/m/Downloads/20171015.pdf");
files.add("C:/Users/m/Downloads/20171115.pdf");
files.add("C:/Users/m/Downloads/20171215.pdf");
files.add("C:/Users/m/Downloads/20180115.pdf");
InputStream is;
List<ChasePdfParser.ChaseRecord> full = new ArrayList<>();
for (String fileName : files) {
logger.info("Now processing " + fileName);
is = new FileInputStream(fileName);
ContentHandler contenthandler = new BodyContentHandler();
Metadata metadata = new Metadata();
PDFParser pdfparser = new PDFParser();
pdfparser.parse(is, contenthandler, metadata, new ParseContext());
String data = contenthandler.toString();
List<ChasePdfParser.ChaseRecord> chaseRecords = ChasePdfParser.parse(data);
full.addAll(chaseRecords);
is.close();
}
logger.info("Total processing time: " + sw.getTime() + " ms");
full.forEach(cr -> System.err.println(cr.date + "|" + cr.desc + "|" + cr.amt));
}
}
The line parser is also fairly straightforward. Since each line carries all the necessary info, it's easy to parse:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class ChasePdfParser {
private static Logger logger = LoggerFactory.getLogger(ChasePdfParser.class);
private static int FOR_TAX_YEAR = 2017;
private static String YEAR_EXTENSION = "/" + FOR_TAX_YEAR;
private static DateTimeFormatter check = DateTimeFormatter.ofPattern("MM/dd/uuuu");
private static List<String> exclusions = new ArrayList<>(Arrays.asList("Payment Thank You", "AUTOMATIC PAYMENT"));
public static List<ChaseRecord> parse(String data) {
List<ChaseRecord> l = new ArrayList<>();
for (String line : data.split("\n")) {
if (line.isEmpty()) continue;
String[] split = line.split("\\s");
if (split == null || split.length == 0) continue;
String test = split[0];
if (!isMMDD(test)) continue;
if(skip(line)) continue;
if (split.length < 4) continue;
ChaseRecord cr = new ChaseRecord();
cr.date = extractDate(test);
try {
String last = split[split.length - 1];
last = last.replaceAll(",", "");
cr.amt = Double.parseDouble(last);
} catch (NumberFormatException e) {
e.printStackTrace();
}
cr.desc = String.join(" ", Arrays.copyOfRange(split, 1, split.length - 1));
cr.desc = cr.desc.replaceAll("\\s\\s+", " ");
l.add(cr);
}
return l;
}
private static boolean skip(String s) {
if (s == null || s.isEmpty()) {
return true;
}
for (String e : exclusions) {
if (s.contains(e)) {
return true;
}
}
return false;
}
protected static LocalDate extractDate(String s) {
if (!isMMDD(s)) {
return null;
}
LocalDate localDate = LocalDate.parse(s + YEAR_EXTENSION, check);
return localDate;
}
public static boolean isMMDD(String s) {
if (s == null || s.isEmpty() || s.length() != 5) {
return false;
}
try {
s += YEAR_EXTENSION;
LocalDate.parse(s, check);
return true;
} catch (Exception e) {
return false;
}
}
public static class ChaseRecord {
public LocalDate date;
public String desc;
public Double amt;
@Override
public String toString() {
return "ChaseRecord{" +
"date=" + date +
", desc='" + desc + '\'' +
", amt=" + amt +
'}';
}
}
}
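For readers who want to try the line-splitting idea without Tika on the classpath, here is a minimal sketch using only the JDK. It assumes each transaction line looks like `MM/dd <description> <amount>`, as in the parser above; the class name `StatementLineSketch`, the `Row` type and the regular expression are illustrative assumptions, not part of the original answer.

```java
import java.math.BigDecimal;
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatementLineSketch {
    // Assumed layout: "MM/dd <description> <amount>", e.g. "01/15 COFFEE SHOP 1,234.56".
    private static final Pattern LINE =
            Pattern.compile("^(\\d{2})/(\\d{2})\\s+(.+?)\\s+(-?[\\d,]+\\.\\d{2})$");

    /** One parsed transaction row (hypothetical type for this sketch). */
    public static final class Row {
        public final LocalDate date;
        public final String description;
        public final BigDecimal amount;

        Row(LocalDate date, String description, BigDecimal amount) {
            this.date = date;
            this.description = description;
            this.amount = amount;
        }
    }

    /** Returns a Row when the line matches the assumed layout, otherwise an empty Optional. */
    public static Optional<Row> parseLine(String line, int year) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        LocalDate date;
        try {
            date = LocalDate.of(year, Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
        } catch (DateTimeException e) {
            return Optional.empty(); // e.g. "13/45" looks like a date but is not one
        }
        // Collapse runs of whitespace left over from the PDF text extraction.
        String description = m.group(3).replaceAll("\\s+", " ");
        // Drop thousands separators before converting the amount.
        BigDecimal amount = new BigDecimal(m.group(4).replace(",", ""));
        return Optional.of(new Row(date, description, amount));
    }
}
```

The sketch uses BigDecimal rather than Double for the amount because currency values generally should not go through binary floating point; that is a design choice of this sketch, not something the answer above does.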

Late to the party. You can also use pdftotext as a workaround. Every once in a great while it will miss a currency amount, particularly in the upper right of the table.
As you'd expect, you join the text on newlines, then chop the lines into lists, and from there write them to a TSV. Hope this helps; the approach looks like this:
import csv
import pdftotext
import re
from datetime import *
import os
import pandas as pd
# compile directory ref:
path='path to directory'
directory = os.fsencode(path)
# https://stackoverflow.com/questions/42202872/how-to-convert-list-to-row-dataframe-with-pandas
column_list = ['filesource','filedate','eventdate','description','bankcategory','amount']
filelist=[]
# an example of how to scrape chase statement pdf into list of lists:
def process_pdf_data(filename,filesource,filedate):
    # trying with pdftotext
    # print('starting pdf content scrape', file)
    with open(filename, "rb") as f:
        pdf = pdftotext.PDF(f)
        pdf_join="\n".join(pdf)
    pdf_array=pdf_join.split('\n')
    # print(pdf_array)
    startint=0
    line=''
    # at this point, the pdf_array is just a list of strings read serially from the pdf in succession down the page.
    # the extra length check avoids an IndexError when 'Account activity' never appears
    while line!='Account activity' and startint<=1000 and startint<len(pdf_array):
        line=pdf_array[startint]
        startint+=1
    startint-=1 # bc it still gets incremented on exit above
    # drop data before 'Account activity' as we won't need it.
    del pdf_array[:startint]
    # print(pdf_array)
    # set pattern for date detection
    # https://www.programiz.com/python-programming/regex
    # https://docs.python.org/3/library/re.html
    pattern=re.compile("^([A-Z]|[a-z]){3} [0-9]{1,2}, [0-9]{4}$") # test pattern for regex eval of date
    startint=0 # use for test exit limits
    # print('entering pdf content eval', file)
    while startint<len(pdf_array):
        # if string has certain date format:
        # if it doesn't have this conversion then it's suspect and maybe write it to log
        # print(startint,pdf_array[startint])
        # only look ahead when the next six strings actually exist
        if pattern.match(pdf_array[startint]) is not None and startint+6<len(pdf_array):
            # transform it to date
            # https://docs.python.org/3/library/datetime.html
            datestr=datetime.strptime(pdf_array[startint], '%b %d, %Y').date().isoformat()
            # print('pattern match',datestr)
            # look ahead and keep next few strings:
            description=pdf_array[startint+2]
            bankcategory=pdf_array[startint+4]
            amount=''
            if '$' in pdf_array[startint+6]:
                amount=pdf_array[startint+6] # will mess with $/string type conversion downstream, when combining sources
            # write to list of lists
            templist=[]
            templist.append(filesource)
            templist.append(filedate)
            templist.append(datestr)
            templist.append(description)
            templist.append(bankcategory)
            templist.append(amount)
            # print(templist)
            filelist.append(templist)
        startint+=1
# call it once per statement, e.g. process_pdf_data(filename, filesource, filedate)
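Since the question is tagged Java, the date-detection step of the script above could be sketched in Java roughly as follows. It assumes date lines look like `Jan 5, 2018`; the class and method names are made up for illustration. Note that this formatter is case-sensitive, so `jan 5, 2018` would be rejected, which is stricter than the Python version.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

public class StatementDateSketch {
    // Assumed statement date layout: abbreviated English month, day, four-digit year.
    private static final DateTimeFormatter STATEMENT_DATE =
            DateTimeFormatter.ofPattern("MMM d, uuuu", Locale.ENGLISH);

    /** Returns the ISO form (yyyy-MM-dd) when the whole line is a date, otherwise empty. */
    public static Optional<String> toIsoDate(String line) {
        try {
            return Optional.of(LocalDate.parse(line.trim(), STATEMENT_DATE).toString());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

Lines that return an empty Optional are simply not the start of a transaction, so the caller can skip them the same way the while loop above does.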

Related

validate ArrayList contents against specific set of data

I want to verify that every value in the ArrayList equals the value of a String variable. If any value does not match, the index should be printed with an error message such as "value at index 2 didn't match the value of the expectedName variable".
When I run the code below, it prints the error message for all three indexes instead of only index 1.
Please note that I'm reading the data from a CSV file, putting it into an ArrayList, and then validating it against the expected data in a String variable.
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
public class ValidateVideoDuration {
private static final String CSV_FILE_PATH = "C:\\Users\\videologs.csv";
public static void main(String[] args) throws IOException {
String expectedVideo1Duration = "00:00:30";
String expectedVideo2Duration = "00:00:10";
String expectedVideo3Duration = "00:00:16";
String actualVideo1Duration = "";
String actualVideo2Duration = "";
String actualVideo3Duration = "";
ArrayList<String> actualVideo1DurationList = new ArrayList<String>();
ArrayList<String> actualVideo2DurationList = new ArrayList<String>();
ArrayList<String> actualVideo3DurationList = new ArrayList<String>();
try (Reader reader = Files.newBufferedReader(Paths.get(CSV_FILE_PATH));
CSVParser csvParser = new CSVParser(reader,
CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim());) {
for (CSVRecord csvRecord : csvParser) {
// Accessing values by Header names
actualVideo1Duration = csvRecord.get("Video 1 Duration");
actualVideo1DurationList.add(actualVideo1Duration);
actualVideo2Duration = csvRecord.get("Video 2 Duration");
actualVideo2DurationList.add(actualVideo2Duration);
actualVideo3Duration = csvRecord.get("Video 3 Duration");
actualVideo3DurationList.add(actualVideo3Duration);
}
}
for (int i = 0; i < actualVideo2DurationList.size(); i++) {
if (actualVideo2DurationList.get(i) != expectedVideo2Duration) {
System.out.println("Duration of Video 2 at index number " + Integer.toString(i)
+ " didn't match the expected duration");
}
}
}
}
The data inside my CSV file look like the following:
video 1 duration, video 2 duration, video 3 duration
00:00:30, 00:00:10, 00:00:16
00:00:30, 00:00:15, 00:00:15
00:00:25, 00:00:10, 00:00:16

Don't use == or != to compare strings. == checks whether two String references point to the same object, not whether their values are equal. Use the .equals() method instead.
Change your if condition to if (!actualVideo2DurationList.get(i).equals(expectedVideo2Duration))
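A small self-contained sketch of that fix, using the Video 2 column from the question as sample data. The class name `DurationCheckSketch` and the helper `mismatchedIndexes` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DurationCheckSketch {
    /** Returns the indexes whose value differs from the expected one, compared by content. */
    public static List<Integer> mismatchedIndexes(List<String> actual, String expected) {
        List<Integer> mismatches = new ArrayList<>();
        for (int i = 0; i < actual.size(); i++) {
            // equals() compares characters; calling it on 'expected' also tolerates null entries.
            if (!expected.equals(actual.get(i))) {
                mismatches.add(i);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Sample values taken from the "video 2 duration" column in the question.
        List<String> video2 = Arrays.asList("00:00:10", "00:00:15", "00:00:10");
        for (int i : mismatchedIndexes(video2, "00:00:10")) {
            System.out.println("Duration of Video 2 at index number " + i
                    + " didn't match the expected duration");
        }
    }
}
```

With this sample data only index 1 is reported. Strings read from a file are separate objects from a literal in the source code, which is why the != comparison flagged every row.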

com.fasterxml.jackson.core.JsonParseException: Unexpected character ('\' (code 92)) [Java]

I am sorry for the messy title, but I am completely lost as to why this happens.
I am trying to parse a JSON String using Jackson. My code is simple:
import com.fasterxml.jackson.databind.ObjectMapper;
import formatter.Tweet;
import com.fasterxml.jackson.databind.DeserializationFeature;
public class FormatterTester {
static String tweet = "{\"created_at\":\"Fri May 03 11:43:17 +0000 2019\",\"id\":1124278249620566017,\"id_str\":\"1124278249620566017\",\"text\":\"RT #entkom: '\\u0e40\\u0e0b\\u0e49\\u0e19\\u0e15\\u0e4c-\\u0e28\\u0e38\\u0e20\\u0e1e\\u0e07\\u0e29\\u0e4c' \\u0e41\\u0e08\\u0e01\\u0e04\\u0e27\\u0e32\\u0e21\\u0e19\\u0e48\\u0e32\\u0e23\\u0e31\\u0e01 \\u0e21\\u0e2d\\u0e1a\\u0e04\\u0e27\\u0e32\\u0e21\\u0e2a\\u0e38\\u0e02\\u0e43\\u0e2b\\u0e49\\u0e41\\u0e1f\\u0e19\\u0e04\\u0e25\\u0e31\\u0e1a https:\\/\\/t.co\\/hBbi5hzEH8\",\"source\":\"\\u003ca href=\\\"http:\\/\\/twitter.com\\/download\\/android\\\" rel=\\\"nofollow\\\"\\u003eTwitter for Android\\u003c\\/a\\u003e\",\"truncated\":false,\"in_reply_to_status_id\":null,\"in_reply_to_status_id_str\":null,\"in_reply_to_user_id\":null,\"in_reply_to_user_id_str\":null,\"in_reply_to_screen_name\":null,\"user\":{\"id\":1062336001941504001,\"id_str\":\"1062336001941504001\",\"name\":\"\\ud83d\\udc0a\\u26bd\\ud83d\\udc2f\\ud83c\\udfb8\\ud83d\\udc99sugajin\\/\\/\\ud83d\\udc9a\\ud83d\\udc7b\\ud83d\\udc32\\ud83d\\udc0a\",\"screen_name\":\"sugajinBTS1\",\"location\":null,\"url\":null,\"description\":\"#BTS\\u597d\\u304d\\ud83d\\udc95\\u30b8\\u30f3\\u30cb\\u30e0\\u3088\\u308a\\u306e\\uff75\\uff99\\uff8d\\uff9f\\uff9d\\n#LGBTQ\\u304c\\u3082\\u3063\\u3068\\u7406\\u89e3\\u3055\\u308c\\u3066\\u6b32\\u3057\\u3044\\n#lovebychance\\u306e\\u6cbc\\u306b\\u30cf\\u30de\\u308a\\u4e2d\\n#season2\\u3068\\u3063\\u3066\\u3082\\u671f\\u5f85\\uff01\\uff01\\n#PinSon\\u2665SonPin\\n#2wish\\ud83d\\udc99\\ud83d\\udc9a\\n#Magus\\n#TeamReal\\n#LBCForever\\n\\u7121\\u8a00\\u30d5\\u30a9\\u30ed\\u30fc\\u5931\\u793c\\u3057\\u307e\\u3059\\ud83d\\ude47\",\"translator_type\":\"none\",\"protected\":false,\"verified\":false,\"followers_count\":61,\"friends_count\":224,\"listed_count\":0,\"favourites_count\":37785,\"statuses_count\":11611,\"created_at\":\"Tue Nov 13 13:26:54 +0000 
2018\",\"utc_offset\":null,\"time_zone\":null,\"geo_enabled\":false,\"lang\":\"ja\",\"contributors_enabled\":false,\"is_translator\":false,\"profile_background_color\":\"F5F8FA\",\"profile_background_image_url\":\"\",\"profile_background_image_url_https\":\"\",\"profile_background_tile\":false,\"profile_link_color\":\"1DA1F2\",\"profile_sidebar_border_color\":\"C0DEED\",\"profile_sidebar_fill_color\":\"DDEEF6\",\"profile_text_color\":\"333333\",\"profile_use_background_image\":true,\"profile_image_url\":\"http:\\/\\/pbs.twimg.com\\/profile_images\\/1062337509701513216\\/5HFkKxoi_normal.jpg\",\"profile_image_url_https\":\"https:\\/\\/pbs.twimg.com\\/profile_images\\/1062337509701513216\\/5HFkKxoi_normal.jpg\",\"profile_banner_url\":\"https:\\/\\/pbs.twimg.com\\/profile_banners\\/1062336001941504001\\/1543643861\",\"default_profile\":true,\"default_profile_image\":false,\"following\":null,\"follow_request_sent\":null,\"notifications\":null},\"geo\":null,\"coordinates\":null,\"place\":null,\"contributors\":null,\"retweeted_status\":{\"created_at\":\"Fri May 03 01:29:52 +0000 2019\",\"id\":1124123879654301696,\"id_str\":\"1124123879654301696\",\"text\":\"'\\u0e40\\u0e0b\\u0e49\\u0e19\\u0e15\\u0e4c-\\u0e28\\u0e38\\u0e20\\u0e1e\\u0e07\\u0e29\\u0e4c' \\u0e41\\u0e08\\u0e01\\u0e04\\u0e27\\u0e32\\u0e21\\u0e19\\u0e48\\u0e32\\u0e23\\u0e31\\u0e01 \\u0e21\\u0e2d\\u0e1a\\u0e04\\u0e27\\u0e32\\u0e21\\u0e2a\\u0e38\\u0e02\\u0e43\\u0e2b\\u0e49\\u0e41\\u0e1f\\u0e19\\u0e04\\u0e25\\u0e31\\u0e1a https:\\/\\/t.co\\/hBbi5hzEH8\",\"source\":\"\\u003ca href=\\\"http:\\/\\/twitter.com\\\" rel=\\\"nofollow\\\"\\u003eTwitter Web 
Client\\u003c\\/a\\u003e\",\"truncated\":false,\"in_reply_to_status_id\":null,\"in_reply_to_status_id_str\":null,\"in_reply_to_user_id\":null,\"in_reply_to_user_id_str\":null,\"in_reply_to_screen_name\":null,\"user\":{\"id\":69565234,\"id_str\":\"69565234\",\"name\":\"ent_komchadluek\",\"screen_name\":\"entkom\",\"location\":null,\"url\":null,\"description\":null,\"translator_type\":\"none\",\"protected\":false,\"verified\":false,\"followers_count\":6684,\"friends_count\":1115,\"listed_count\":86,\"favourites_count\":14,\"statuses_count\":31813,\"created_at\":\"Fri Aug 28 11:28:17 +0000 2009\",\"utc_offset\":null,\"time_zone\":null,\"geo_enabled\":false,\"lang\":\"en\",\"contributors_enabled\":false,\"is_translator\":false,\"profile_background_color\":\"FF6699\",\"profile_background_image_url\":\"http:\\/\\/abs.twimg.com\\/images\\/themes\\/theme11\\/bg.gif\",\"profile_background_image_url_https\":\"https:\\/\\/abs.twimg.com\\/images\\/themes\\/theme11\\/bg.gif\",\"profile_background_tile\":true,\"profile_link_color\":\"B40B43\",\"profile_sidebar_border_color\":\"CC3366\",\"profile_sidebar_fill_color\":\"E5507E\",\"profile_text_color\":\"362720\",\"profile_use_background_image\":true,\"profile_image_url\":\"http:\\/\\/pbs.twimg.com\\/profile_images\\/471687167\\/ent1_normal.jpg\",\"profile_image_url_https\":\"https:\\/\\/pbs.twimg.com\\/profile_images\\/471687167\\/ent1_normal.jpg\",\"default_profile\":false,\"default_profile_image\":false,\"following\":null,\"follow_request_sent\":null,\"notifications\":null},\"geo\":null,\"coordinates\":null,\"place\":null,\"contributors\":null,\"is_quote_status\":false,\"quote_count\":9,\"reply_count\":33,\"retweet_count\":584,\"favorite_count\":505,\"entities\":{\"hashtags\":[],\"urls\":[{\"url\":\"https:\\/\\/t.co\\/hBbi5hzEH8\",\"expanded_url\":\"http:\\/\\/www.komchadluek.net\\/news\\/ent\\/370511#.XMuZj_HCjrY.twitter\",\"display_url\":\"komchadluek.net\\/news\\/ent\\/37051\\u2026\",\"indices\":[52,75]}],\"user_mentions\":[],
\"symbols\":[]},\"favorited\":false,\"retweeted\":false,\"possibly_sensitive\":false,\"filter_level\":\"low\",\"lang\":\"th\"},\"is_quote_status\":false,\"quote_count\":0,\"reply_count\":0,\"retweet_count\":0,\"favorite_count\":0,\"entities\":{\"hashtags\":[],\"urls\":[{\"url\":\"https:\\/\\/t.co\\/hBbi5hzEH8\",\"expanded_url\":\"http:\\/\\/www.komchadluek.net\\/news\\/ent\\/370511#.XMuZj_HCjrY.twitter\",\"display_url\":\"komchadluek.net\\/news\\/ent\\/37051\\u2026\",\"indices\":[64,87]}],\"user_mentions\":[{\"screen_name\":\"entkom\",\"name\":\"ent_komchadluek\",\"id\":69565234,\"id_str\":\"69565234\",\"indices\":[3,10]}],\"symbols\":[]},\"favorited\":false,\"retweeted\":false,\"possibly_sensitive\":false,\"filter_level\":\"low\",\"lang\":\"th\",\"timestamp_ms\":\"1556883797446\"}";
public static void main(String[]args) {
String valor_retorno= null;
Tweet tw;
try {
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
tw = objectMapper.readValue(tweet, Tweet.class);
System.out.println("Check 3 - El formatter retorna:\n"+tw.toString());
valor_retorno = tw.toString();
} catch (Exception e) {
e.printStackTrace();
System.out.println("\nException " + e.getClass() + ": " + e.getMessage());
} finally {
System.out.println("\nReturn: Valor_retorno = "+valor_retorno);
}
}
}
If you run the code you'll see it works fine. Where is the problem then? I have to do this same operation on an Oracle NoSQL database. It's not important to know any of the parts related to the data retrieval since they work fine, I've tested them. The code is quite similar:
String data = new String(value.toByteArray(),StandardCharsets.UTF_8);
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
objectMapper.configure(Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true);
tw = objectMapper.readValue(data, Tweet.class);
My objective is to obtain exactly the same result as in the first code: a String of values separated by '|', according to the attributes of my Tweet class.
However, this code is packaged in a JAR file and run internally by the database over all the recorded Tweets. I can't see what happens or debug it, but it produces the following exception:
com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
I've tried escaping the string "data" with StringEscapeUtils.escapeJava(data);
which then produces the following exception:
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('\' (code 92)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
I've also tried escaping the string like this: data.replace('\'', ' '); without success.
After many tests, I can't understand why it runs well in the demo I put here first and not in the actual project, which has exactly the same dependencies.

For some reason, Jackson can't parse what I retrieve from the database. This is most likely due to an encoding or decoding problem in the CentOS image that my Docker container uses to hold the DB, and where the script is invoked and executed.
In the end, using Gson for the parsing was the best option, though it still produces errors if you don't trim() the String. Apparently, for some reason, the JSON came quoted twice, that is, ""JSON text"".
The code:
package formatter;
import java.io.*;
import java.lang.String;
import java.nio.charset.StandardCharsets;
import java.util.List;
import oracle.kv.*;
import com.google.gson.Gson;
import oracle.kv.KeyValueVersion;
import oracle.kv.exttab.Formatter;
public class TweetFormatter implements Formatter {
public TweetFormatter() {
super();
}
public String toOracleLoaderFormat(final KeyValueVersion kvv, final KVStore kvStore){
String valor_retorno= null;
Tweet tw; //antes sin null
BufferedWriter bf = FormatterUtils.getInstance().getWriter();
try {
final Key key = kvv.getKey();
final Value value = kvv.getValue();
Value.Format format = value.getFormat();
FormatterUtils.getInstance().writeLine(bf,"[Key: "+ key + ", Value:" +value.toByteArray()+ "]" + ". Format= "+ format.toString());
//Filtrar Clave
List<String> major = key.getMajorPath();
FormatterUtils.getInstance().writeLine(bf,"Check 1:\n Key is: "+key + "\n Key length is: "+major.size()
+ "\n Values are: "+major.toString() + "\n contains: "+major.contains("TweeterStream"));
Boolean contains = false;
for(String x : major) {
if(x.equals("TweeterStream")||x.equals("/TweeterStream")||x.equals("/TweeterStream/")) {
contains = true;
break;
}
}
//Parsear
if(contains){
String data = new String(value.toByteArray(),StandardCharsets.UTF_8);
data = data.trim();
tw = new Gson().fromJson(data,Tweet.class); //FUNCIONA
FormatterUtils.getInstance().writeLine(bf,"Check 3 - El formatter retorna:\n"+tw.toString());
valor_retorno = tw.toString();
}else{
FormatterUtils.getInstance().writeLine(bf,"\nEstoy en else");
}
FormatterUtils.getInstance().writeLine(bf,"\nestoy fuera del if-else");
} catch (Exception e) {
e.printStackTrace();
FormatterUtils.getInstance().writeLine(bf, "\nException " + e.getClass() + ": " + e.getMessage());
} finally {
FormatterUtils.getInstance().writeLine(bf,"\nReturn: Valor_retorno = "+valor_retorno);
FormatterUtils.getInstance().generateLog(bf);
}
return valor_retorno;
}
}
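One possible reason trim() made a difference: the Jackson error in the question reports a CTRL-CHAR with code 0, and Java's String.trim() removes every leading and trailing character up to U+0020, which includes NUL. The sketch below only illustrates that behaviour; the class and method names are hypothetical, and whether the stored values really were NUL-padded is an assumption.

```java
public class PayloadCleanupSketch {
    /** Strips leading/trailing whitespace and control characters (anything up to U+0020, NUL included). */
    public static String clean(String raw) {
        return raw.trim();
    }

    /** Cheap sanity check before handing the text to a JSON parser. */
    public static boolean looksLikeJsonObject(String s) {
        return s.startsWith("{") && s.endsWith("}");
    }

    public static void main(String[] args) {
        // Simulated database value padded with NUL bytes, spaces and a newline (assumed shape).
        String raw = "\0\0 {\"id\":1}\n\0";
        String cleaned = clean(raw);
        System.out.println(looksLikeJsonObject(raw));     // false: starts with a NUL character
        System.out.println(looksLikeJsonObject(cleaned)); // true
    }
}
```

If the padding sits inside the payload rather than at its edges, trim() will not help and the bytes would need to be inspected before decoding.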

How to compare two XML files with java? [duplicate]

I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end.
When it comes time to compare the actual output to the expected output, I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doesn't work very well because the example data we have isn't always formatted consistently, and different aliases are often used for the XML namespace (and sometimes namespaces aren't used at all).
I know I can parse both strings and then walk through each element and compare them myself and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.
So, boiled down, the question is:
Given two Java Strings which both contain valid XML how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

Sounds like a job for XMLUnit
http://www.xmlunit.org/
https://github.com/xmlunit
Example:
public class SomeTest extends XMLTestCase {
@Test
public void test() throws Exception {
String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences
// can also compare xml Documents, InputSources, Readers, Diffs
assertXMLEqual(xml1, xml2); // assertXMLEqual comes from XMLTestCase
}
}

The following will check if the documents are equal using standard JDK libraries.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();
Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();
Assert.assertTrue(doc1.isEqualNode(doc2));
normalizeDocument() is there to make sure there are no cycles (there technically wouldn't be any).
The above code requires the whitespace within the elements to be the same, though, because it preserves and evaluates it. The standard XML parser that comes with Java does not let you set a feature to produce a canonical version or to understand xml:space. If that is going to be a problem, you may need a replacement XML parser such as Xerces, or use JDOM.
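Since the question starts from two Strings rather than two files, here is a sketch of the same JDK-only comparison applied to in-memory XML. The class name `DomEqualitySketch` and the method `sameXml` are illustrative; the calls themselves are the standard javax.xml.parsers and org.w3c.dom APIs used above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomEqualitySketch {
    /** Parses both strings and compares the resulting DOM trees with isEqualNode. */
    public static boolean sameXml(String xml1, String xml2) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setCoalescing(true);
            dbf.setIgnoringComments(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc1 = db.parse(new InputSource(new StringReader(xml1)));
            Document doc2 = db.parse(new InputSource(new StringReader(xml2)));
            doc1.normalizeDocument();
            doc2.normalizeDocument();
            return doc1.isEqualNode(doc2);
        } catch (Exception e) {
            // Wrapped so callers do not have to declare parser exceptions in this sketch.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Attribute order and <b/> versus <b></b> do not matter to isEqualNode.
        System.out.println(sameXml("<a x=\"1\" y=\"2\"><b/></a>", "<a y=\"2\" x=\"1\"><b></b></a>"));
        // Whitespace between elements becomes text nodes, so these differ.
        System.out.println(sameXml("<a><b/></a>", "<a> <b/> </a>"));
    }
}
```

One caveat worth knowing for the namespace-alias problem in the question: isEqualNode also compares prefixes, so two documents that use different aliases for the same namespace URI are not reported as equal by this approach.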

XOM has a Canonicalizer utility which turns your DOMs into a regular form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you get regular, predictable comparisons of your documents.
This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.

The latest version of XMLUnit can help with asserting that two XML documents are equal. XMLUnit.setIgnoreWhitespace() and XMLUnit.setIgnoreAttributeOrder() may also be necessary for the case in question.
See the working code of a simple XMLUnit example below.
import java.util.List;
import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.junit.Assert;
public class TestXml {
public static void main(String[] args) throws Exception {
String result = "<abc attr=\"value1\" title=\"something\"> </abc>";
// will be ok
assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
}
public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));
List<?> allDifferences = diff.getAllDifferences();
Assert.assertEquals("Differences found: "+ diff.toString(), 0, allDifferences.size());
}
}
If using Maven, add this to your pom.xml:
<dependency>
<groupId>xmlunit</groupId>
<artifactId>xmlunit</artifactId>
<version>1.4</version>
</dependency>

Building on Tom's answer, here's an example using XMLUnit v2.
It uses these Maven dependencies:
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-core</artifactId>
<version>2.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-matchers</artifactId>
<version>2.0.0</version>
<scope>test</scope>
</dependency>
And here's the test code:
import static org.junit.Assert.assertThat;
import static org.xmlunit.matchers.CompareMatcher.isIdenticalTo;
import org.junit.Test;
import org.xmlunit.builder.Input;
import org.xmlunit.input.WhitespaceStrippedSource;
public class SomeTest {
@Test
public void test() {
String result = "<root></root>";
String expected = "<root> </root>";
// ignore whitespace differences
// https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#whitespacestrippedsource
assertThat(result, isIdenticalTo(new WhitespaceStrippedSource(Input.from(expected).build())));
assertThat(result, isIdenticalTo(Input.from(expected).build())); // will fail due to whitespace differences
}
}
The documentation that outlines this is https://github.com/xmlunit/xmlunit#comparing-two-documents

Thanks, I extended this, try this ...
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
public class XmlDiff
{
private boolean nodeTypeDiff = true;
private boolean nodeValueDiff = true;
public boolean diff( String xml1, String xml2, List<String> diffs ) throws Exception
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new ByteArrayInputStream(xml1.getBytes()));
Document doc2 = db.parse(new ByteArrayInputStream(xml2.getBytes()));
doc1.normalizeDocument();
doc2.normalizeDocument();
return diff( doc1, doc2, diffs );
}
/**
* Diff 2 nodes and put the diffs in the list
*/
public boolean diff( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( diffNodeExists( node1, node2, diffs ) )
{
return true;
}
if( nodeTypeDiff )
{
diffNodeType(node1, node2, diffs );
}
if( nodeValueDiff )
{
diffNodeValue(node1, node2, diffs );
}
System.out.println(node1.getNodeName() + "/" + node2.getNodeName());
diffAttributes( node1, node2, diffs );
diffNodes( node1, node2, diffs );
return diffs.size() > 0;
}
/**
* Diff the nodes
*/
public boolean diffNodes( Node node1, Node node2, List<String> diffs ) throws Exception
{
//Sort by Name
Map<String,Node> children1 = new LinkedHashMap<String,Node>();
for( Node child1 = node1.getFirstChild(); child1 != null; child1 = child1.getNextSibling() )
{
children1.put( child1.getNodeName(), child1 );
}
//Sort by Name
Map<String,Node> children2 = new LinkedHashMap<String,Node>();
for( Node child2 = node2.getFirstChild(); child2!= null; child2 = child2.getNextSibling() )
{
children2.put( child2.getNodeName(), child2 );
}
//Diff all the children1
for( Node child1 : children1.values() )
{
Node child2 = children2.remove( child1.getNodeName() );
diff( child1, child2, diffs );
}
//Diff all the children2 left over
for( Node child2 : children2.values() )
{
Node child1 = children1.get( child2.getNodeName() );
diff( child1, child2, diffs );
}
return diffs.size() > 0;
}
/**
* Diff the attributes
*/
public boolean diffAttributes( Node node1, Node node2, List<String> diffs ) throws Exception
{
//Sort by Name
NamedNodeMap nodeMap1 = node1.getAttributes();
Map<String,Node> attributes1 = new LinkedHashMap<String,Node>();
for( int index = 0; nodeMap1 != null && index < nodeMap1.getLength(); index++ )
{
attributes1.put( nodeMap1.item(index).getNodeName(), nodeMap1.item(index) );
}
//Sort by Name
NamedNodeMap nodeMap2 = node2.getAttributes();
Map<String,Node> attributes2 = new LinkedHashMap<String,Node>();
for( int index = 0; nodeMap2 != null && index < nodeMap2.getLength(); index++ )
{
attributes2.put( nodeMap2.item(index).getNodeName(), nodeMap2.item(index) );
}
//Diff all the attributes1
for( Node attribute1 : attributes1.values() )
{
Node attribute2 = attributes2.remove( attribute1.getNodeName() );
diff( attribute1, attribute2, diffs );
}
//Diff all the attributes2 left over
for( Node attribute2 : attributes2.values() )
{
Node attribute1 = attributes1.get( attribute2.getNodeName() );
diff( attribute1, attribute2, diffs );
}
return diffs.size() > 0;
}
/**
* Check that the nodes exist
*/
public boolean diffNodeExists( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1 == null && node2 == null )
{
diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2 + "\n" );
return true;
}
if( node1 == null && node2 != null )
{
diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2.getNodeName() );
return true;
}
if( node1 != null && node2 == null )
{
diffs.add( getPath(node1) + ":node " + node1.getNodeName() + "!=" + node2 );
return true;
}
return false;
}
/**
* Diff the Node Type
*/
public boolean diffNodeType( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1.getNodeType() != node2.getNodeType() )
{
diffs.add( getPath(node1) + ":type " + node1.getNodeType() + "!=" + node2.getNodeType() );
return true;
}
return false;
}
/**
* Diff the Node Value
*/
public boolean diffNodeValue( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1.getNodeValue() == null && node2.getNodeValue() == null )
{
return false;
}
if( node1.getNodeValue() == null && node2.getNodeValue() != null )
{
diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
return true;
}
if( node1.getNodeValue() != null && node2.getNodeValue() == null )
{
diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
return true;
}
if( !node1.getNodeValue().equals( node2.getNodeValue() ) )
{
diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
return true;
}
return false;
}
/**
* Get the node path
*/
public String getPath( Node node )
{
StringBuilder path = new StringBuilder();
do
{
path.insert(0, node.getNodeName() );
path.insert( 0, "/" );
}
while( ( node = node.getParentNode() ) != null );
return path.toString();
}
}

AssertJ 1.4+ has specific assertions to compare XML content:
String expectedXml = "<foo />";
String actualXml = "<bar />";
assertThat(actualXml).isXmlEqualTo(expectedXml);
See the AssertJ documentation for details.

The code below works for me:
String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
XMLAssert.assertXMLEqual(xml1, xml2);

skaffman seems to be giving a good answer.
Another way is to format both XML strings with a command-line utility such as xmlstarlet (http://xmlstar.sourceforge.net/) and then run any diff utility (or library) on the resulting output files. I don't know whether this is a good solution when the differences involve namespaces.
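The same "normalize, then compare as text" idea can be sketched in plain Java instead of a command-line tool. This is only an illustration using the JDK's DOM and Transformer APIs, not xmlstarlet; the class name XmlNormalizer and its methods are made up for this example, and it only neutralizes whitespace between elements, not attribute order or namespace prefixes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: brings two XML strings into one textual form before comparing them.
final class XmlNormalizer {

    // Parse the XML, drop whitespace-only text nodes, and serialize it without indentation.
    static String normalize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripBlankText(doc.getDocumentElement());
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Recursively remove text nodes that contain nothing but whitespace.
    private static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        String a = "<foo>\n  <bar>1</bar>\n</foo>";
        String b = "<foo><bar>1</bar></foo>";
        // The two normalized strings can now be handed to any text diff or compared directly.
        System.out.println(normalize(a).equals(normalize(b)));
    }
}
```

The normalized strings can be written to files and passed to a diff tool, or compared with assertEquals so that the IDE shows the textual difference.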

I'm using Altova DiffDog which has options to compare XML files structurally (ignoring string data).
This means that (if checking the 'ignore text' option):
<foo a="xxx" b="xxx">xxx</foo>
and
<foo b="yyy" a="yyy">yyy</foo>
are equal in the sense that they have structural equality. This is handy if you have example files that differ in data, but not structure!
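For readers without DiffDog, the notion of structural equality described above can be approximated with the JDK's DOM API. The sketch below is not DiffDog's algorithm; StructuralXmlCompare and sameStructure are hypothetical names, and the comparison assumes that "structure" means element names, attribute names and the order of child elements, with all text and attribute values ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: compares element and attribute names only, ignoring all values.
final class StructuralXmlCompare {

    static boolean sameStructure(String xml1, String xml2) throws Exception {
        return sameStructure(parse(xml1), parse(xml2));
    }

    private static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    static boolean sameStructure(Element e1, Element e2) {
        // Element names must match.
        if (!e1.getTagName().equals(e2.getTagName())) return false;
        // Attribute names must match as sets, so attribute order does not matter.
        if (!attributeNames(e1).equals(attributeNames(e2))) return false;
        // Child elements must match pairwise in document order; text nodes are skipped.
        List<Element> c1 = childElements(e1);
        List<Element> c2 = childElements(e2);
        if (c1.size() != c2.size()) return false;
        for (int i = 0; i < c1.size(); i++) {
            if (!sameStructure(c1.get(i), c2.get(i))) return false;
        }
        return true;
    }

    private static TreeSet<String> attributeNames(Element e) {
        TreeSet<String> names = new TreeSet<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeName());
        }
        return names;
    }

    private static List<Element> childElements(Element e) {
        List<Element> result = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) result.add((Element) n);
        }
        return result;
    }
}
```

With this sketch, the two foo elements from the example compare as structurally equal, while an element with a different attribute name does not.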

I needed the same functionality as requested in the main question. As I was not allowed to use any third-party libraries, I created my own solution based on @Archimedes Trajano's solution.
My solution follows.
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.junit.Assert;
import org.w3c.dom.Document;
/**
* Asserts for asserting XML strings.
*/
public final class AssertXml {
private AssertXml() {
}
private static Pattern NAMESPACE_PATTERN = Pattern.compile("xmlns:(ns\\d+)=\"(.*?)\"");
/**
* Asserts that two XML are of identical content (namespace aliases are ignored).
*
* @param expectedXml expected XML
* @param actualXml actual XML
* @throws Exception thrown if XML parsing fails
*/
public static void assertEqualXmls(String expectedXml, String actualXml) throws Exception {
// Find all namespace mappings
Map<String, String> fullnamespace2newAlias = new HashMap<String, String>();
generateNewAliasesForNamespacesFromXml(expectedXml, fullnamespace2newAlias);
generateNewAliasesForNamespacesFromXml(actualXml, fullnamespace2newAlias);
for (Entry<String, String> entry : fullnamespace2newAlias.entrySet()) {
String newAlias = entry.getValue();
String namespace = entry.getKey();
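// Quote the namespace URI so that characters such as '.' and '?' are matched literally
Pattern nsReplacePattern = Pattern.compile("xmlns:(ns\\d+)=\"" + Pattern.quote(namespace) + "\"");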
expectedXml = transletaNamespaceAliasesToNewAlias(expectedXml, newAlias, nsReplacePattern);
actualXml = transletaNamespaceAliasesToNewAlias(actualXml, newAlias, nsReplacePattern);
}
// normalize namespaces according to the given mapping
DocumentBuilder db = initDocumentParserFactory();
Document expectedDocuemnt = db.parse(new ByteArrayInputStream(expectedXml.getBytes(Charset.forName("UTF-8"))));
expectedDocuemnt.normalizeDocument();
Document actualDocument = db.parse(new ByteArrayInputStream(actualXml.getBytes(Charset.forName("UTF-8"))));
actualDocument.normalizeDocument();
if (!expectedDocuemnt.isEqualNode(actualDocument)) {
Assert.assertEquals(expectedXml, actualXml); // just to better visualize the differences, e.g. in Eclipse
}
}
private static DocumentBuilder initDocumentParserFactory() throws ParserConfigurationException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(false);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
return db;
}
private static String transletaNamespaceAliasesToNewAlias(String xml, String newAlias, Pattern namespacePattern) {
Matcher nsMatcherExp = namespacePattern.matcher(xml);
if (nsMatcherExp.find()) {
xml = xml.replaceAll(nsMatcherExp.group(1) + "[:]", newAlias + ":");
xml = xml.replaceAll(nsMatcherExp.group(1) + "=", newAlias + "=");
}
return xml;
}
private static void generateNewAliasesForNamespacesFromXml(String xml, Map<String, String> fullnamespace2newAlias) {
Matcher nsMatcher = NAMESPACE_PATTERN.matcher(xml);
while (nsMatcher.find()) {
if (!fullnamespace2newAlias.containsKey(nsMatcher.group(2))) {
fullnamespace2newAlias.put(nsMatcher.group(2), "nsTr" + (fullnamespace2newAlias.size() + 1));
}
}
}
}
It compares two XML strings and handles mismatched namespace aliases by translating them to the same unique aliases in both input strings. Note that only aliases of the form ns1, ns2, ... are recognised by the pattern.
It could be fine-tuned, e.g. in how namespaces are translated, but for my requirements it does the job.

This compares the full XML strings (reformatting them on the way). It works well with an IDE (IntelliJ, Eclipse), because you can click the failed assertion and see the differences between the XML files visually.
import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;
import static org.apache.xml.security.Init.init;
import static org.junit.Assert.assertEquals;
public class XmlUtils {
static {
init();
}
public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
return new String(canonXmlBytes);
}
public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
InputSource src = new InputSource(new StringReader(input));
Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
Boolean keepDeclaration = input.startsWith("<?xml");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
return writer.writeToString(document);
}
public static void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
String canonicalExpected = prettyFormat(toCanonicalXml(expected));
String canonicalActual = prettyFormat(toCanonicalXml(actual));
assertEquals(canonicalExpected, canonicalActual);
}
}
I prefer this to XmlUnit because the client code (test code) is cleaner.

Using XMLUnit 2.x
In the pom.xml
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-assertj3</artifactId>
<version>2.9.0</version>
</dependency>
Test implementation (using JUnit 5):
import org.junit.jupiter.api.Test;
import org.xmlunit.assertj3.XmlAssert;
public class FooTest {
@Test
public void compareXml() {
//
String xmlContentA = "<foo></foo>";
String xmlContentB = "<foo></foo>";
//
XmlAssert.assertThat(xmlContentA).and(xmlContentB).areSimilar();
}
}
Other methods: areIdentical(), areNotIdentical(), areNotSimilar().
More details (configuration of assertThat(~).and(~) and examples) are in the XMLUnit documentation.
XMLUnit also has (among other features) a DifferenceEvaluator to do more precise comparisons.
XMLUnit website

Using JExamXML in a Java application:
import com.a7soft.examxml.ExamXML;
import com.a7soft.examxml.Options;
.................
// Reads two XML files into two strings
String s1 = readFile("orders1.xml");
String s2 = readFile("orders.xml");
// Loads options saved in a property file
Options.loadOptions("options");
// Compares two Strings representing XML entities
System.out.println( ExamXML.compareXMLString( s1, s2 ) );

Since you say "semantically equivalent", I assume you want to do more than literally verify that the XML outputs are equal as strings, and that you'd want something like
<foo> some stuff here</foo>
and
<foo>some stuff here</foo>
to read as equivalent. Ultimately it matters how you define "semantically equivalent" on whatever object you're reconstituting the message from. Simply build that object from the messages and use a custom equals() to define what you're looking for.
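As a minimal sketch of that idea, assume the message reduces to the trimmed text of its root element. FooMessage and fromXml are hypothetical names for this example; a real message class would carry whatever fields define equivalence in your domain.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical domain object rebuilt from the XML message.
final class FooMessage {
    private final String text;

    private FooMessage(String text) {
        this.text = text;
    }

    // Assumption: leading and trailing whitespace in the element text is not significant.
    static FooMessage fromXml(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        return new FooMessage(root.getTextContent().trim());
    }

    // equals() encodes what "semantically equivalent" means for this message type.
    @Override
    public boolean equals(Object o) {
        return o instanceof FooMessage && text.equals(((FooMessage) o).text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(text);
    }
}
```

A test can then assert that FooMessage.fromXml(expected) equals FooMessage.fromXml(actual) without caring how either string was formatted.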

Retrieving list of Tags from file properties

I would like to retrieve the list of Tags attached to a file in Windows 7 programmatically. I am trying to create a mapping of file->tags that I can move across different platforms.
Is anyone aware of a library, or a way to get the 'Tags' values from command line? So far I have only been able to find ways to get basic file attributes such as Author, Date Created, etc.
I am unable to load PowerShell scripts on the computer unfortunately so am not able to make use of those features.
I tried using 'UserDefinedFileAttributeView' but that did not return any values, like so:
private LinkedList<String> windowsGetAllFileTags(File file) {
UserDefinedFileAttributeView fileAttributeView = Files.getFileAttributeView(file.toPath().toAbsolutePath(), UserDefinedFileAttributeView.class);
List<String> allAttributes = null;
try {
allAttributes = fileAttributeView.list();
} catch (IOException e) {
e.printStackTrace();
}
for(String attribute : allAttributes) {
System.out.println("Attribute = " + attribute);
}
return null;
}
An image of the Windows 7 Properties View

There is a Java library called PE/COFF 4J on GitHub.
import java.io.IOException;
import org.boris.pecoff4j.PE;
import org.boris.pecoff4j.ResourceDirectory;
import org.boris.pecoff4j.ResourceEntry;
import org.boris.pecoff4j.constant.ResourceType;
import org.boris.pecoff4j.io.PEParser;
import org.boris.pecoff4j.io.ResourceParser;
import org.boris.pecoff4j.resources.StringFileInfo;
import org.boris.pecoff4j.resources.StringTable;
import org.boris.pecoff4j.resources.VersionInfo;
import org.boris.pecoff4j.util.ResourceHelper;
public class Main {
public static void main(String[] args) throws IOException {
PE pe = PEParser.parse("C:/windows/system32/notepad.exe");
ResourceDirectory rd = pe.getImageData().getResourceTable();
ResourceEntry[] entries = ResourceHelper.findResources(rd, ResourceType.VERSION_INFO);
for (int i = 0; i < entries.length; i++) {
byte[] data = entries[i].getData();
VersionInfo version = ResourceParser.readVersionInfo(data);
StringFileInfo strings = version.getStringFileInfo();
StringTable table = strings.getTable(0);
for (int j = 0; j < table.getCount(); j++) {
String key = table.getString(j).getKey();
String value = table.getString(j).getValue();
System.out.println(key + " = " + value);
}
}
}
}
Will print:
CompanyName = Microsoft Corporation
FileDescription = Notepad
FileVersion = 6.1.7600.16385 (win7_rtm.090713-1255)
InternalName = Notepad
LegalCopyright = © Microsoft Corporation. All rights reserved.
OriginalFilename = NOTEPAD.EXE
ProductName = Microsoft® Windows® Operating System
ProductVersion = 6.1.7600.16385
If you mean obtaining the tags of images or videos, @Drew Noakes has written a Java library called metadata-extractor for that.
Metadata metadata = ImageMetadataReader.readMetadata(imagePath);
To iterate all values in the file:
for (Directory directory : metadata.getDirectories()) {
for (Tag tag : directory.getTags()) {
System.out.println(tag);
}
}
You can also read specific values from specific directories:
// obtain the Exif SubIFD directory
ExifSubIFDDirectory directory
= metadata.getFirstDirectoryOfType(ExifSubIFDDirectory.class);
// query the datetime tag's value
Date date = directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
The library is available for Maven users too.

In Windows PowerShell, you could grab it with a bit of help from PresentationCore.dll:
function Get-ImageTags {
param(
[string]$Path
)
Add-Type -AssemblyName PresentationCore
try {
$FileStream = (Get-Item $Path).Open('Open','Read')
$BitmapFrame = [System.Windows.Media.Imaging.BitmapFrame]::Create($FileStream)
$Tags = @($BitmapFrame.Metadata.Keywords |%{ $_ })
}
catch {
throw
return
}
finally {
if($FileStream){
$FileStream.Dispose()
}
}
return $Tags
}
Then use like:
$Tags = Get-ImageTags -Path path\to\file.jpeg
The $Tags variable will now contain an array of tags

What about Files.getAttribute?
I haven't tried it, but this might work:
Files.getAttribute(Paths.get("/some/dir","file.txt"), "description:tags")
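Whether a view named "description" exists is an assumption of the snippet above. Files.getAttribute throws UnsupportedOperationException for a view the file system does not provide, so it may be worth checking the supported views first. The following is a sketch with hypothetical names (AttributeViewCheck, readIfSupported); on the providers I am aware of, the views are names such as basic, dos, acl, owner, user, posix and unix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

// Hypothetical helper: reads an attribute only if the file system offers the requested view.
final class AttributeViewCheck {

    // Returns the attribute value, or null when the view is not supported for this file.
    static Object readIfSupported(Path file, String view, String attribute) throws IOException {
        Set<String> views = file.getFileSystem().supportedFileAttributeViews();
        if (!views.contains(view)) {
            return null;
        }
        return Files.getAttribute(file, view + ":" + attribute);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tags", ".txt");
        try {
            // Shows which views are actually available on this platform.
            System.out.println(tmp.getFileSystem().supportedFileAttributeViews());
            // "description" is the view name assumed in the answer above.
            System.out.println(readIfSupported(tmp, "description", "tags"));
            // "basic:size" is a standard attribute and is available everywhere.
            System.out.println(readIfSupported(tmp, "basic", "size"));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If the printed set of views does not contain the one you need, the tags are not reachable through java.nio.file attributes on that system.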

Need to pass test case in QC through Java

Could anyone help me with the issue below?
I want to pass test cases in QC through Java. I used com4j and got as far as the test sets, but I am unable to fetch the test cases under the respective test set.
Could anyone please help me with how to pass test cases in QC through com4j?
import com.qc.ClassFactory;
import com.qc.ITDConnection;
import com.qc.ITestLabFolder;
import com.qc.ITestSetFactory;
import com.qc.ITestSetTreeManager;
import com.qc.ITestSetFolder;
import com.qc.IList;
import com.qc.ITSTest;
import com.qc.ITestSet;
import com.qc.ITestFactory;
import com4j.*;
import com4j.stdole.*;
import com4j.tlbimp.*;
import com4j.tlbimp.def.*;
import com4j.tlbimp.driver.*;
import com4j.util.*;
import com4j.COM4J;
import java.util.*;
import com.qc.IRun;
import com.qc.IRunFactory;
public class Qc_Connect {
public static void main(String[] args) {
// TODO Auto-generated method stub
String url="http://abc/qcbin/";
String domain="abc";
String project="xyz";
String username="132222";
String password="Xyz";
String strTestLabPath = "Root\\Test\\";
String strTestSetName = "TestQC";
try{
ITDConnection itd=ClassFactory.createTDConnection();
itd.initConnectionEx(url);
System.out.println("COnnected To QC:"+ itd.connected());
itd.connectProjectEx(domain,project,username,password);
System.out.println("Logged into QC");
//System.out.println("Project_Connected:"+ itd.connected());
ITestSetFactory objTestSetFactory = (itd.testSetFactory()).queryInterface(ITestSetFactory.class);
ITestSetTreeManager objTestSetTreeManager = (itd.testSetTreeManager()).queryInterface(ITestSetTreeManager.class);
ITestSetFolder objTestSetFolder =(objTestSetTreeManager.nodeByPath(strTestLabPath)).queryInterface(ITestSetFolder.class);
IList its1 = objTestSetFolder.findTestSets(strTestSetName, true, null);
//IList ls= objTestSetFolder.findTestSets(strTestSetName, true, null);
System.out.println("No. of Test Set:" + its1.count());
ITestSet tst= (ITestSet) objTestSetFolder.findTestSets(strTestSetName, true, null).queryInterface(ITSTest.class);
System.out.println(tst.name());
//System.out.println( its1.queryInterface(ITestSet.class).name());
/* foreach (ITestSet testSet : its1.queryInterface(ITestSet.class)){
ITestSetFolder tsFolder = (ITestSetFolder)testSet.TestSetFolder;
ITSTestFactory tsTestFactory = (ITSTestFactory)testSet.TSTestFactory;
List tsTestList = tsTestFactory.NewList("");
}*/
/* Com4jObject comObj = (Com4jObject) its1.item(0);
ITestSet tst = comObj.queryInterface(ITestSet.class);
System.out.println("Test Set Name : " + tst.name());
System.out.println("Test Set ID : " + tst.id());
System.out.println("Test Set ID : " + tst.status());
System.out.println("Test Set ID : " );*/
System.out.println(its1.count());
System.out.println("TestSet Present");
Iterator itr = its1.iterator();
System.out.println(itr.hasNext());
while (itr.hasNext())
{
Com4jObject comObj = (Com4jObject) itr.next();
ITestSet sTestSet = comObj.queryInterface(ITestSet.class);
System.out.println(sTestSet.name());
Com4jObject comObj2 = sTestSet.tsTestFactory();
ITestSetFactory test = comObj2.queryInterface(ITestSetFactory.class);
}
// ITSTest tsTest=null;
// tsTest.
//its1.
/* comObj = (Com4jObject) its1.item(1);
ITSTest tst2=comObj.queryInterface(ITSTest.class);*/
// System.out.println( tst2.name());
/* foreach (ITSTest tsTest : tst2)
{
IRun lastRun = (IRun)tsTest.lastRun();
if (lastRun == null)
{
IRunFactory runFactory = (IRunFactory)tsTest.runFactory;
String date = "20160203";
IRun run = (IRun)runFactory.addItem( date);
run.status("Pass");
run.autoPost();
}
}*/
}
catch(Exception e){
e.printStackTrace();
}
}
}

I know the post is quite old. I struggled a lot with OTA in Java and couldn't find a complete post that solved the issue.
I now have running code after a lot of research,
so I thought I'd share it in case someone is looking for help.
Here is the complete solution.
`
ITestFactory sTestFactory = (connection.testFactory())
.queryInterface(ITestFactory.class);
ITest iTest1 = (sTestFactory.item(12081)).queryInterface(ITest.class);
System.out.println(iTest1.execDate());
System.out.println(iTest1.name());
ITestSetFactory sTestSetFactory = (connection.testSetFactory())
.queryInterface(ITestSetFactory.class);
ITestSet sTestSet = (sTestSetFactory.item(1402))
.queryInterface(ITestSet.class);
System.out.println(sTestSet.name() + "\n Test Set ID" + sTestSet.id());
IBaseFactory testFactory1 = sTestSet.tsTestFactory().queryInterface(
IBaseFactory.class);
testFactory1.addItem(iTest1);
System.out.println("Test case has been Added");
System.out.println(testFactory1.newList("").count());
IList tsTestlist = testFactory1.newList("");
ITSTest tsTest;
for (int tsTestIndex = 1; tsTestIndex <= tsTestlist.count(); tsTestIndex++) {
Com4jObject comObj = (Com4jObject) tsTestlist.item(tsTestIndex);
tsTest = comObj.queryInterface(ITSTest.class);
if (tsTest.name().equalsIgnoreCase("[3]TC_OTA_API_Test")) {
System.out.println("Hostname" + tsTest.hostName() + "\n"
+ tsTest.name() + "\n" + tsTest.status());
IRun lastRun = (IRun) tsTest.lastRun();
// IRun lastRun = comObjRun.queryInterface(IRun.class);
// don't update test if it may have been modified by someone
// else
if (lastRun == null) {
System.out.println("I am here last Run = Null");
IRunFactory runFactory = tsTest.runFactory().queryInterface(
IRunFactory.class);
System.out.println(runFactory.newList("").count());
String runName = "TestRun_Automated";
Com4jObject comObjRunForThisTS = runFactory
.addItem(runName);
IRun runObjectForThisTS = comObjRunForThisTS
.queryInterface(IRun.class);
runObjectForThisTS.status("Passed");
runObjectForThisTS.post();
runObjectForThisTS.refresh();
}
}
}
`

Why not build a client to access the REST API instead of passing through the OTA interface?
Once you build a basic client, you can post runs and update their status quite easily.

If you used C#/VB.NET, this would be easy. Since you are working in Java, I would suggest providing an interface on top of the DLLs to handle the operations. This will be much easier than using com4j.
The following thread covers a similar query and may help you. I would suggest dropping the idea of using com4j and using the solution provided in that thread, which is proven, fail-safe and auto-recoverable:
QC API JAR to connect using java
It has always been difficult to use com4j, especially for HPQC/ALM, as the DLLs for QC are faulty and have memory leak/allocation problems that frequently crash DLL execution on certain platforms.
