How to properly handle comma inside a quoted string using opencsv?

I'm trying to read a CSV file that contains both quoted and unquoted strings.
If a string is quoted, its quote characters should be kept.
Besides that, if a string contains a comma, it should not be split.
I've tried multiple approaches, but nothing works so far.
Current test data:
"field1 (with use of , we lose the other part)",some description
field2,"Dear %s, some text"
Printing the first field of each mapped bean:
Expected result:
"field1 (with use of , we lose the other part)"
field2
Current result:
"field1 (with use of
field2
Here is the code:
public class CsvToBeanReaderTest {
    @Test
    void shouldIncludeDoubleQuotes() {
        String testData =
                "\"field1 (with use of , we lose the other part)\",some description\n"
                        + "field2,\"Dear %s, some text\"";
        RFC4180ParserBuilder rfc4180ParserBuilder = new RFC4180ParserBuilder();
        rfc4180ParserBuilder.withQuoteChar(ICSVWriter.NO_QUOTE_CHARACTER);
        ICSVParser rfc4180Parser = rfc4180ParserBuilder.build();
        CSVReaderBuilder builder = new CSVReaderBuilder(new StringReader(testData));
        CSVReader reader = builder
                .withCSVParser(rfc4180Parser)
                .build();
        List<TestClass> result = new CsvToBeanBuilder<TestClass>(reader)
                .withType(TestClass.class)
                .withEscapeChar('\"')
                .build()
                .parse();
        result.forEach(testClass -> System.out.println(testClass.getField1()));
    }
    private List<TestClass> readTestData(String testData) {
        return new CsvToBeanBuilder<TestClass>(new StringReader(testData))
                .withType(TestClass.class)
                .withSeparator(',')
                .withSkipLines(0)
                .withIgnoreEmptyLine(true)
                .build()
                .parse();
    }
    public static final class TestClass {
        @CsvBindByPosition(position = 0)
        private String field1;
        @CsvBindByPosition(position = 1)
        private String description;
        public String toCsvFormat() {
            return String.join(",",
                    field1,
                    description);
        }
        public String getField1() {
            return field1;
        }
    }
}
I've found that if I comment out or remove the rfc4180ParserBuilder.withQuoteChar(ICSVWriter.NO_QUOTE_CHARACTER) call, the string is parsed correctly, but I lose the quote characters, which should be kept. Are there any suggestions for what can be done? (I would prefer not to switch to another CSV library.)
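The question itself shows the trade-off: with the quote character enabled, opencsv handles the comma but strips the enclosing quotes. If the quotes really must survive, one fallback is to split each line by hand. The sketch below is plain Java, not an opencsv API: it splits on commas that are outside double quotes and keeps the quote characters in the field. It assumes one record per line (no line breaks inside quoted fields), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotePreservingSplitter {

    // Splits one CSV line on commas that are outside double quotes.
    // Quote characters are copied into the returned fields instead of being stripped.
    public static List<String> splitKeepingQuotes(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Toggle the state; an escaped quote ("") toggles twice and leaves it unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                // A separator outside quotes ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingQuotes("\"field1 (with use of , we lose the other part)\",some description"));
        System.out.println(splitKeepingQuotes("field2,\"Dear %s, some text\""));
    }
}
```

Each returned list could then be mapped onto TestClass by position (index 0 for field1, index 1 for description) instead of going through CsvToBeanBuilder.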

Related

Camel Exchange body record size

Scenario:
A CSV file is sent to my endpoint, the data is unmarshalled into POJOs, and the message is sent to one of my routes, let's say "direct:consume". A processor then processes the file, manipulating the message and creating a new output.
Issue:
If the file contains only one line, the code breaks.
If the file contains multiple lines, the code works.
Tried:
Tried to find a way to determine the number of records coming in exchange.getIn().getBody()
Searched Stack Overflow
Read the Camel documentation about Exchange
Looked for Java code that converts an Object (or Objects) to a List without knowing the number of records
Code:
public void process(Exchange exchange) throws Exception {
    List<Cars> output = new ArrayList<Cars>();
    List<Wehicle> rows = (List<Wehicle>) exchange.getIn().getBody(); // <-- fails
    for (Wehicle row : rows) {
        output.add(new Cars(row));
    }
    exchange.getIn().setBody(output);
    exchange.getIn().setHeader("CamelOverruleFileName", "CarEntries.csv");
}
Wehicle
...
@CsvRecord(separator = ",", skipFirstLine = true, crlf = "UNIX")
public class Wehicle {
    @DataField(pos = 1)
    public String CouponCode;
    @DataField(pos = 2)
    public String Price;
}
...
Cars
@CsvRecord(separator = ",", crlf = "UNIX", generateHeaderColumns = true)
public class Cars {
    @DataField(pos = 1, columnName = "CouponCode")
    private String CouponCode;
    @DataField(pos = 2, columnName = "Price")
    private String Price;
    public Cars(Wehicle origin) {
        this.CouponCode = Utilities.addQuotesToString(origin.CouponCode);
        this.Price = origin.Price;
    }
}
Input:
"CouponCode","Price"
"ASD/785", 1900000
"BWM/758", 2000000
Question:
How can I dynamically create a List regardless of whether I get one object or multiple objects?
(exchange.getIn().getBody() returns an Object.)
How can I check the number of records in a Camel exchange message?
(The object returned by exchange.getIn().getBody() has no size/length method.)
Is there any other way of doing this?
I haven't used Java for a long time, and I'm quite new to Camel.

After rechecking the official documentation, it seems the following change solves the issue:
Code:
public void process(Exchange exchange) throws Exception {
    List<Cars> output = new ArrayList<Cars>();
    // ask Camel to convert the body to a List instead of casting the raw Object
    List<?> records = exchange.getIn().getBody(List.class);
    @SuppressWarnings("unchecked")
    List<Wehicle> rows = (List<Wehicle>) records;
    for (Wehicle row : rows) {
        output.add(new Cars(row));
    }
    exchange.getIn().setBody(output);
    exchange.getIn().setHeader("CamelOverruleFileName", "CarEntries.csv");
}
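If you would rather not depend on Camel's type conversion, the one-versus-many case can also be handled by hand. The plain-Java sketch below assumes the body is either a single record object or a java.util.List of them (presumably Bindy returns the bare object when the file holds a single record, which would explain why the cast fails only for one-line files). The class and method names (BodyToList, toList) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BodyToList {

    // Normalizes a message body that may be a single record or a List of records.
    public static <T> List<T> toList(Object body, Class<T> type) {
        if (body == null) {
            return Collections.emptyList();
        }
        if (body instanceof List<?>) {
            List<T> result = new ArrayList<>();
            for (Object item : (List<?>) body) {
                result.add(type.cast(item)); // throws ClassCastException on a wrong element type
            }
            return result;
        }
        // A single record: wrap it in a one-element list.
        return Collections.singletonList(type.cast(body));
    }

    public static void main(String[] args) {
        System.out.println(toList("one", String.class).size());
        System.out.println(toList(List.of("a", "b"), String.class).size());
    }
}
```

In the processor, rows could then be obtained with List<Wehicle> rows = BodyToList.toList(exchange.getIn().getBody(), Wehicle.class); and rows.size() gives the number of records.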

Write java bean to Csv Table format

Is there any way to write Java beans to a CSV table format using opencsv?
What other libraries are available to achieve this?

uniVocity-parsers' support for conversions to and from Java beans is unmatched. Here's a simple example of a class:
public class TestBean {
    // if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
    @NullString(nulls = {"?", "-"})
    // if a value resolves to null, it will be converted to the String "0".
    @Parsed(defaultNullRead = "0")
    private Integer quantity;
    @Trim
    @LowerCase
    @Parsed(index = 4)
    private String comments;
    // you can also explicitly give the name of a column in the file.
    @Parsed(field = "amount")
    private BigDecimal value;
    @Trim
    @LowerCase
    // values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
    @BooleanString(falseStrings = {"no", "n", "null"}, trueStrings = {"yes", "y"})
    @Parsed
    private Boolean pending;
}
Now, to write instances to a file, do this:
Collection<TestBean> beansToWrite = someMethodThatProducesTheObjectYouWant();
File output = new File("/path/to/output.csv");
new CsvRoutines().writeAll(beansToWrite, TestBean.class, output, Charset.forName("UTF-8"));
The library offers many configuration options and ways to achieve what you want. If you find yourself using the same annotations over and over again, just define a meta-annotation. For example, instead of declaring a replacement conversion that removes the ` character on every single field, like this:
@Parsed
@Replace(expression = "`", replacement = "")
public String fieldA;
@Parsed(field = "BB")
@Replace(expression = "`", replacement = "")
public String fieldB;
@Parsed(index = 4)
@Replace(expression = "`", replacement = "")
public String fieldC;
You can create a meta-annotation like this:
@Retention(RetentionPolicy.RUNTIME)
@Inherited
@Target({ElementType.FIELD, ElementType.METHOD, ElementType.ANNOTATION_TYPE})
@Replace(expression = "`", replacement = "")
@Parsed
public @interface MyReplacement {
    @Copy(to = Parsed.class)
    String field() default "";
    @Copy(to = Parsed.class, property = "index")
    int myIndex() default -1;
}
And use it in your class like this:
@MyReplacement
public String fieldA;
@MyReplacement(field = "BB")
public String fieldB;
@MyReplacement(myIndex = 4)
public String fieldC;
I hope it helps.
Disclaimer: I'm the author of this library, it's open-source and free (Apache V2.0 license)
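Whichever library you pick, the core of bean-to-CSV writing is the same: emit a header row, then one row per bean, quoting any value that contains a separator, a quote or a line break and doubling embedded quotes (the RFC 4180 convention). The dependency-free sketch below shows only that part; the Order bean and its two fields are hypothetical, and a real library adds annotation- or reflection-based field mapping on top of it.

```java
import java.util.List;
import java.util.StringJoiner;

public class BeanCsvSketch {

    // Hypothetical bean used only for this illustration.
    public static class Order {
        final String code;
        final String comment;
        public Order(String code, String comment) {
            this.code = code;
            this.comment = comment;
        }
    }

    // RFC 4180-style escaping: quote the value if it contains a comma, quote or line break,
    // and double any embedded quotes. Null is written as an empty field.
    public static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Header row first, then one row per bean.
    public static String toCsv(List<Order> orders) {
        StringBuilder sb = new StringBuilder("code,comment\n");
        for (Order o : orders) {
            StringJoiner row = new StringJoiner(",");
            row.add(escape(o.code)).add(escape(o.comment));
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv(List.of(new Order("A1", "plain"), new Order("B2", "has, comma"))));
    }
}
```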

Java: How to write a generalized function for this?

How do I write a single generalized function for these? I mean, the function should take parameters and return the desired string.
String fullName = driver.findElement(By.className("full-name")).getText();
String title = driver.findElement(By.className("title")).getText();
String locality = driver.findElement(By.className("locality")).getText();
String industry = driver.findElement(By.className("industry")).getText();
String connections = driver.findElement(By.xpath("//div[@class='member-connections']/strong")).getText();
String profileLink = driver.findElement(By.className("view-public-profile")).getText();
The function should be something like this:
String getInfo(String className, String byType) {
return driver.findElement(By.byType(className)).getText();
}
EDIT:
I have written this function, but I am not sure how to append byType with By.
static String getInfo(WebDriver driver, String byType, String byParam) {
return driver.findElement(By. + byType + (byParam)).getText();
}
Thanks!

This seems way easier than the other answers make it, so I'm going to put my neck on the line and ask: what's wrong with this?
public String get(WebDriver driver, By by) {
return driver.findElement(by).getText();
}
...and using it like this:
String a = get(urDriver, By.className(someName));
String b = get(urDriver, By.xpath(somePath));

You may try this:
public String byXpath(String xpath) {
return driver.findElement(By.xpath(xpath)).getText();
}
public String byClass(String $class) {
return driver.findElement(By.className($class)).getText();
}
Edited:
public String by(By by) {
return driver.findElement(by).getText();
}
String x = by(By.className(name));
String y = by(By.xpath(path));

CSV file structure for a java object

I want a CSV format for Order objects. My Order object will have order details, order line details and item details. Please find the Java object below:
Order {
OrderNo, OrderName, Price,
OrderLine {
OrderLineNo, OrderLinePrice,
Item{
ItemNo, ItemName, Item Description
}
}
}
Can anyone please guide me in creating a CSV format for this?

Create a POJO class for the object you want to write to the CSV file, and use java.io.FileWriter to write/append values to that file. This Java Code Geeks link will help you with this.
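One common CSV layout for nested objects is one row per object, with a type tag in the first column and each child row written right after its parent. The sketch below writes the Order/OrderLine/Item structure that way through any java.io.Writer (a FileWriter in practice, a StringWriter in the example). The classes, the ORDER/LINE/ITEM tags and the assumption that an order holds a list of lines and a line holds a list of items are all hypothetical; values are not escaped, so they are assumed to contain no commas or quotes.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

public class NestedOrderCsv {

    // Hypothetical classes mirroring the Order / OrderLine / Item structure in the question.
    public static class Item {
        final String itemNo, itemName, itemDescription;
        public Item(String itemNo, String itemName, String itemDescription) {
            this.itemNo = itemNo;
            this.itemName = itemName;
            this.itemDescription = itemDescription;
        }
    }

    public static class OrderLine {
        final String orderLineNo, orderLinePrice;
        final List<Item> items;
        public OrderLine(String orderLineNo, String orderLinePrice, List<Item> items) {
            this.orderLineNo = orderLineNo;
            this.orderLinePrice = orderLinePrice;
            this.items = items;
        }
    }

    public static class Order {
        final String orderNo, orderName, price;
        final List<OrderLine> lines;
        public Order(String orderNo, String orderName, String price, List<OrderLine> lines) {
            this.orderNo = orderNo;
            this.orderName = orderName;
            this.price = price;
            this.lines = lines;
        }
    }

    // One row per object; the first column tells a reader which type the row holds.
    public static void write(List<Order> orders, Writer out) throws IOException {
        for (Order order : orders) {
            out.write(String.join(",", "ORDER", order.orderNo, order.orderName, order.price) + "\n");
            for (OrderLine line : order.lines) {
                out.write(String.join(",", "LINE", line.orderLineNo, line.orderLinePrice) + "\n");
                for (Item item : line.items) {
                    out.write(String.join(",", "ITEM", item.itemNo, item.itemName, item.itemDescription) + "\n");
                }
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Order order = new Order("1", "First order", "25.00",
                List.of(new OrderLine("1", "25.00",
                        List.of(new Item("I-7", "Pen", "Blue ink pen")))));
        StringWriter out = new StringWriter();
        write(List.of(order), out);
        System.out.print(out);
    }
}
```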

If you are feeling adventurous, I'm building support for nested elements in CSV in uniVocity-parsers.
The 2.0.0-SNAPSHOT version supports parsing nested beans with annotations. We are planning to release the final version in a couple of weeks. Writing support has not been implemented yet, so you'll have to do that part manually (it should be fairly easy with the current API).
Parsing this sort of structure is more complex, but the parser seems to work fine for most cases. Have a look at this test case:
Input CSV:
1,Foo
Account,23234,HSBC,123433-000,HSBCAUS
Account,11234,HSBC,222343-130,HSBCCAD
2,BAR
Account,1234,CITI,213343-130,CITICAD
Note that the first column of each row identifies which bean will be read. As "Client" in the CSV matches the class name, you don't need to annotate
Pojos
enum ClientType {
    PERSONAL(2),
    BUSINESS(1);
    int typeCode;
    ClientType(int typeCode) {
        this.typeCode = typeCode;
    }
}
public static class Client {
    @EnumOptions(customElement = "typeCode", selectors = { EnumSelector.CUSTOM_FIELD })
    @Parsed(index = 0)
    private ClientType type;
    @Parsed(index = 1)
    private String name;
    @Nested(identityValue = "Account", identityIndex = 0, instanceOf = ArrayList.class, componentType = ClientAccount.class)
    private List<ClientAccount> accounts;
}
public static class ClientAccount {
    @Parsed(index = 1)
    private BigDecimal balance;
    @Parsed(index = 2)
    private String bank;
    @Parsed(index = 3)
    private String number;
    @Parsed(index = 4)
    private String swift;
}
Code to parse the input
public void parseCsvToBeanWithList() {
    final BeanListProcessor<Client> clientProcessor = new BeanListProcessor<Client>(Client.class);
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setLineSeparator("\n");
    settings.setRowProcessor(clientProcessor);
    CsvParser parser = new CsvParser(settings);
    parser.parse(new StringReader(CSV_INPUT));
    List<Client> rows = clientProcessor.getBeans();
}
If you find any issues using the parser, please post an update on this issue.

Lucene wrong match

I have a CSV file:
id|name
1|PC
2|Activation
3|USB
public class TESTResult
{
    private Long id;
    private String name;
    private Float score;
    // constructor TESTResult(Long, String, Float), setters & getters omitted
}
public class TEST
{
    private Long id;
    private String name;
    // constructor TEST(Long, String), setters & getters omitted
}
public class JobTESTTagger {
    private static Version VERSION;
    private static CharArraySet STOPWORDS;
    private static RewriteMethod REWRITEMETHOD;
    private static Float MINSCORE = 0.0001F;
    static {
        BooleanQuery.setMaxClauseCount(100000);
        VERSION = Version.LUCENE_44;
        STOPWORDS = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
        REWRITEMETHOD = MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE;
    }
    public static ArrayList<TESTResult> searchText(String text, String keyId,
            List<TEST> tests) {
        ArrayList<TESTResult> results = new ArrayList<TESTResult>();
        MemoryIndex index = new MemoryIndex();
        EnglishAnalyzer englishAnalyzer = new EnglishAnalyzer(VERSION, STOPWORDS);
        QueryParser parser = new QueryParser(VERSION, "text", englishAnalyzer);
        parser.setMultiTermRewriteMethod(REWRITEMETHOD);
        index.addField("text", text, englishAnalyzer);
        for (TEST test : tests) {
            String name = test.getName();
            // check the raw name: once wrapped in quotes it can never be empty
            if (name == null || name.trim().isEmpty())
                continue;
            // phrase query: wrap the name in double quotes and flatten line breaks
            String criteria = "\"" + name.trim() + "\"";
            criteria = criteria.replaceAll("\r", " ");
            criteria = criteria.replaceAll("\n", " ");
            try {
                Query query = parser.parse(criteria);
                Float score = index.search(query);
                if (score > MINSCORE) {
                    TESTResult result = new TESTResult(test.getId(), test.getName(), score);
                    results.add(result);
                }
            } catch (ParseException e) {
                System.out.println("Could not parse article.");
            }
        }
        return results;
    }
    public static void main(String[] args) throws IOException {
        // the backslash in the Windows path must be escaped in a Java string literal
        CsvReader reader = new CsvReader("C:\\a.csv");
        reader.setDelimiter('|');
        reader.readHeaders();
        List<TEST> tests = new ArrayList<TEST>();
        while (reader.readRecord()) {
            Long id = Long.valueOf(reader.get("id").trim());
            String name = reader.get("name").trim();
            tests.add(new TEST(id, name));
        }
        reader.close();
        String text = "These activities are good. I have a good PC in my house.";
        // keyId is not used by searchText, so null is passed here
        ArrayList<TESTResult> testresults = searchText(text, null, tests);
        for (TESTResult r : testresults) {
            System.out.println(r.getName() + " " + r.getScore());
        }
    }
}
The word 'activities' in my text is being matched to 'Activation'. How is that possible? Can anybody tell me how Lucene matches the words?
Thanks
R

EnglishAnalyzer, along with most language-specific analyzers, uses a stemmer. This means that it reduces terms to a stem (or root) of the term, in order to attempt to match more loosely. Mostly this works well, removing suffixes and matching up derived words to a common root. So when I search for "fish", I also find "fished", "fishing" and "fishes".
In this case, though, "activities" and "activation" both reduce to the root "activ", resulting in the match you are seeing. Another example: "organ", "organic" and "organize" all have the common stem "organ".
You can stem or not; neither approach is perfect. If you don't stem, you'll miss relevant results. If you do, you'll hit some odd irrelevant results.
To deal with specific problematic cases, you can define a stemmer exclusion set in EnglishAnalyzer to prevent stemming just on those specific problematic terms. In this case, I would think of "activation" as the probable term to prevent stemming on, though you could go either way. So I could do something like:
CharArraySet stemExclusionSet = new CharArraySet(VERSION, 1, true);
stemExclusionSet.add("activation");
EnglishAnalyzer englishAnalyzer = new EnglishAnalyzer(VERSION, STOPWORDS, stemExclusionSet);
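To see why a shared stem produces this match, and what an exclusion set changes, here is a deliberately simplified illustration in plain Java. It is a toy suffix stripper, NOT the Porter stemming that EnglishAnalyzer actually applies: its suffix list is made up so that "activities" and "activation" both collapse to "activ" as described above, and its exclusion set only imitates the role of stemExclusionSet.

```java
import java.util.List;
import java.util.Set;

public class StemExclusionSketch {

    // Toy suffix list (NOT Porter stemming), ordered so longer suffixes are tried first.
    private static final List<String> SUFFIXES = List.of("ities", "ation", "ing", "es", "ed", "s");

    // Lowercases the word and strips the first matching suffix, keeping at least 3 letters.
    // Words in the exclusion set are returned unstemmed.
    public static String stem(String word, Set<String> exclusions) {
        String w = word.toLowerCase();
        if (exclusions.contains(w)) {
            return w;
        }
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Two words "match" when they reduce to the same stem.
    public static boolean matches(String a, String b, Set<String> exclusions) {
        return stem(a, exclusions).equals(stem(b, exclusions));
    }

    public static void main(String[] args) {
        Set<String> none = Set.of();
        Set<String> excludeActivation = Set.of("activation");
        System.out.println(matches("activities", "Activation", none));              // true
        System.out.println(matches("activities", "Activation", excludeActivation)); // false
    }
}
```

The same trade-off described above shows up here: excluding "activation" removes the odd match, but "activation" will then no longer match "activations" either.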

Develop Reference
