Scenario:
A CSV file is sent to my endpoint, a POJO maps the data to Java objects, and the message is sent to one of my routes, say "direct:consume". A processor then processes the message, manipulating it and creating a new output.
Issue:
When the file contains only one line, the code breaks.
When the file contains multiple lines, the code works.
Tried:
Tried to find a way to determine the number of records coming in via exchange.getIn().getBody()
Read Stack Overflow
Read the Camel documentation about Exchange
Checked Java code for Object/Objects to List conversion without knowing the record count
Code:
public void process(Exchange exchange) throws Exception {
List<Cars> output = new ArrayList<Cars>();
List<Wehicle> rows = (List<Wehicle>) exchange.getIn().getBody(); // <-- Fails
for (Wehicle row: rows) {
output.add(new Cars(row));
}
exchange.getIn().setBody(output);
exchange.getIn().setHeader("CamelOverruleFileName", "CarEntries.csv");
}
Wehicle
...
@CsvRecord(separator = ",", skipFirstLine = true, crlf = "UNIX")
public class Wehicle {
@DataField(pos = 1)
public String CouponCode;
@DataField(pos = 2)
public String Price;
}
...
Cars
@CsvRecord(separator = ",", crlf = "UNIX", generateHeaderColumns = true)
public class Cars {
@DataField(pos = 1, columnName = "CouponCode")
private String CouponCode;
@DataField(pos = 2, columnName = "Price")
private String Price;
public Cars(Wehicle origin) {
this.CouponCode = Utilities.addQuotesToString(origin.CouponCode);
this.Price = origin.Price;
}
}
Input:
"CouponCode","Price"
"ASD/785", 1900000
"BWM/758", 2000000
Question:
How can I dynamically create a List regardless of whether I get one object or multiple objects?
-- exchange.getIn().getBody() returns Object
How can I check the number of records in the Camel exchange message?
-- exchange.getIn().getBody() has no size/length method
Is there any other way of doing this?
I haven't used Java for a long time, and I'm quite new to Camel.
After rechecking the official documentation, it seems the following changes solve the issue.
Code:
public void process(Exchange exchange) throws Exception {
List<Cars> output = new ArrayList<Cars>();
List records = exchange.getIn().getBody(List.class);
List<Wehicle> rows = (List<Wehicle>) records;
for (Wehicle row: rows) {
output.add(new Cars(row));
}
exchange.getIn().setBody(output);
exchange.getIn().setHeader("CamelOverruleFileName", "CarEntries.csv");
}
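The idea behind that fix can also be sketched in plain Java, assuming the unmarshalled body is either a single record or a List of records (which appears to be the difference between one-line and multi-line files). The helper `BodyLists.asList` is a hypothetical name and not part of Camel; it only illustrates normalizing the body before looping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of Camel: normalizes a message body to a list. */
final class BodyLists {
    private BodyLists() {}

    /**
     * Returns the body as a list of the given type, whether the body is
     * null, a single record, or a list of records.
     */
    static <T> List<T> asList(Object body, Class<T> type) {
        if (body == null) {
            return Collections.emptyList();
        }
        List<T> result = new ArrayList<>();
        if (body instanceof List) {
            for (Object element : (List<?>) body) {
                // cast() throws ClassCastException on an unexpected element type
                result.add(type.cast(element));
            }
        } else {
            // Single record: wrap it so callers can always iterate
            result.add(type.cast(body));
        }
        return result;
    }
}
```

In the processor this would be called as `BodyLists.asList(exchange.getIn().getBody(), Wehicle.class)`, after which `size()` on the result gives the record count.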
Related
I am trying to read records from a CSV file and filter the records based on the date. I have implemented this in the following way. But is this a correct way?
The steps are:
Creating pipeline
Read the data from a file
Perform necessary filtering
Create a MapElement Object and convert the OrderRequest to String
Mapping the OrderRequest Entity to String
Write the output to a file
Code:
// Creating pipeline
Pipeline pipeline = Pipeline.create();
// For transformations Reading from a file
PCollection<String> orderRequest = pipeline
.apply(TextIO.read().from("src/main/resources/ST/STCheck/OrderRequest.csv"));
PCollection<OrderRequest> pCollectionTransformation = orderRequest
.apply(ParDo.of(new DoFn<String, OrderRequest>() {
private static final long serialVersionUID = 1L;
@ProcessElement
public void processElement(ProcessContext c) {
String rowString = c.element();
if (!rowString.contains("order_id")) {
String[] strArr = rowString.split(",");
OrderRequest orderRequest = new OrderRequest();
orderRequest.setOrder_id(strArr[0]);
// Condition to check if the
String source1 = strArr[1];
// In Joda-Time patterns "M" is month and "mm" is minute, so use "M/d/yyyy"
DateTimeFormatter fmt1 = DateTimeFormat.forPattern("M/d/yyyy");
DateTime d1 = fmt1.parseDateTime(source1);
System.out.println(d1);
String source2 = "4/24/2017";
DateTimeFormatter fmt2 = DateTimeFormat.forPattern("M/d/yyyy");
DateTime d2 = fmt2.parseDateTime(source2);
System.out.println(d2);
orderRequest.setOrder_date(strArr[1]);
System.out.println(strArr[1]);
orderRequest.setAmount(Double.valueOf(strArr[2]));
orderRequest.setCounter_id(strArr[3]);
if (DateTimeComparator.getInstance().compare(d1, d2) > -1) {
c.output(orderRequest);
}
}
}
}));
// Create a MapElement Object and convert the OrderRequest to String
MapElements<OrderRequest, String> mapElements = MapElements.into(TypeDescriptors.strings())
.via((OrderRequest orderRequestType) -> orderRequestType.getOrder_id() + " "
+ orderRequestType.getOrder_date() + " " + orderRequestType.getAmount() + " "
+ orderRequestType.getCounter_id());
// Mapping the OrderRequest Entity to String
PCollection<String> pStringList = pCollectionTransformation.apply(mapElements);
// Now Writing the elements to a file
pStringList.apply(TextIO.write().to("src/main/resources/ST/STCheck/OrderRequestOut.csv").withNumShards(1)
.withSuffix(".csv"));
// To run pipeline
pipeline.run();
System.out.println("We are done!!");
Pojo Class:
public class OrderRequest implements Serializable{
String order_id;
String order_date;
double amount;
String counter_id;
}
Though I am getting the correct result, is this the correct way? My two main questions are:
1) How do I access individual columns, so that I can specify conditions based on a column's value?
2) Can we specify headers when reading the data?
Yes, you can process CSV files like this using TextIO.read(), provided they do not contain fields with embedded newlines and you can recognize and skip the header lines. Your pipeline looks good, though as a minor style point I would probably have the first ParDo do only the parsing, followed by a Filter that looks at the date to filter things out.
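The parse-then-filter split can be sketched in plain Java, without Beam, just to show the shape of the two steps. The `Order` class, the `M/d/yyyy` date pattern, and the `run` helper are assumptions for illustration; in a Beam pipeline the first step would be the parsing ParDo and the second a Filter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

final class ParseThenFilter {
    // Assumed date layout, e.g. 4/24/2017 (month/day/year)
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    /** Illustrative stand-in for the OrderRequest POJO. */
    static final class Order {
        final String orderId;
        final LocalDate orderDate;
        final double amount;
        final String counterId;

        Order(String orderId, LocalDate orderDate, double amount, String counterId) {
            this.orderId = orderId;
            this.orderDate = orderDate;
            this.amount = amount;
            this.counterId = counterId;
        }
    }

    /** Step 1: parsing only, no business conditions. */
    static Order parse(String line) {
        String[] cols = line.split(",");
        return new Order(cols[0], LocalDate.parse(cols[1], FMT),
                Double.parseDouble(cols[2]), cols[3]);
    }

    /** Step 2: filtering only; keeps orders dated on or after the cutoff. */
    static List<Order> onOrAfter(List<Order> orders, LocalDate cutoff) {
        return orders.stream()
                .filter(o -> !o.orderDate.isBefore(cutoff))
                .collect(Collectors.toList());
    }

    /** Skips the header line, parses every row, then filters by date. */
    static List<Order> run(List<String> lines, LocalDate cutoff) {
        List<Order> parsed = lines.stream()
                .filter(l -> !l.startsWith("order_id"))
                .map(ParseThenFilter::parse)
                .collect(Collectors.toList());
        return onOrAfter(parsed, cutoff);
    }
}
```

Keeping the date condition out of `parse` means the parsed records can be reused with a different condition without touching the parsing code.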
If you want to infer the header automatically, you could open and read the first line in your main program (using standard Java libraries, or Beam's FileSystems class), extract the column names manually, and pass them into your parsing DoFn.
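One way to use the header once it has been read: build a name-to-index map and hand it to the parsing step, so columns are looked up by name rather than by position. This is a plain-Java sketch; `HeaderIndex` and its `get` method are hypothetical names, and it assumes a simple comma-separated header with no quoted fields. It implements Serializable because an object passed into a DoFn generally needs to be serializable.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps column names from a header line to positions. */
final class HeaderIndex implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Map<String, Integer> positions = new LinkedHashMap<>();

    HeaderIndex(String headerLine) {
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i].trim(), i);
        }
    }

    /** Returns the value of the named column in the given data row. */
    String get(String row, String column) {
        Integer index = positions.get(column);
        if (index == null) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        // limit -1 keeps trailing empty fields so positions stay aligned
        return row.split(",", -1)[index].trim();
    }
}
```

A condition on a single column then reads as `header.get(row, "order_date")` instead of `strArr[1]`.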
I agree a more columnar approach would be more natural. We have this in Python as our DataFrames API, which is now available for general use. You would write something like
import apache_beam as beam
from apache_beam.dataframe.io import read_csv

# limit is the cutoff date, defined elsewhere
with beam.Pipeline() as p:
    df = p | read_csv("src/main/resources/ST/STCheck/OrderRequest.csv")
    filtered = df[df.order_date > limit]
    filtered.to_csv("src/main/resources/ST/STCheck/OrderRequestOut.csv")
I'm trying to read a CSV file that contains strings, some quoted and some not.
If a string is quoted, its quote characters should be preserved.
Besides that, if a string contains a comma, it should not be split.
I've tried multiple approaches, but nothing has worked so far.
Current test data:
"field1 (with use of , we lose the other part)",some description
field2,"Dear %s, some text"
Getting 1st field of mapped bean
Expected result:
"field1 (with use of , we lose the other part)"
field2
Current result:
"field1 (with use of
field2
Here is the code:
public class CsvToBeanReaderTest {
@Test
void shouldIncludeDoubleQuotes() {
String testData =
"\"field1 (with use of , we lose the other part)\",some description\n"
+
"field2,\"Dear %s, some text\"";
RFC4180ParserBuilder rfc4180ParserBuilder = new RFC4180ParserBuilder();
rfc4180ParserBuilder.withQuoteChar(ICSVWriter.NO_QUOTE_CHARACTER);
ICSVParser rfc4180Parser = rfc4180ParserBuilder.build();
CSVReaderBuilder builder = new CSVReaderBuilder(new StringReader(testData));
CSVReader reader = builder
.withCSVParser(rfc4180Parser)
.build();
List<TestClass> result = new CsvToBeanBuilder<TestClass>(reader)
.withType(TestClass.class)
.withEscapeChar('\"')
.build()
.parse();
result.forEach(testClass -> System.out.println(testClass.getField1()));
}
private List<TestClass> readTestData(String testData) {
return new CsvToBeanBuilder<TestClass>(new StringReader(testData))
.withType(TestClass.class)
.withSeparator(',')
.withSkipLines(0)
.withIgnoreEmptyLine(true)
.build()
.parse();
}
public static final class TestClass {
@CsvBindByPosition(position = 0)
private String field1;
@CsvBindByPosition(position = 1)
private String description;
public String toCsvFormat() {
return String.join(",",
field1,
description);
}
public String getField1() {
return field1;
}
}
}
I've found that if I comment out or remove rfc4180ParserBuilder.withQuoteChar(ICSVWriter.NO_QUOTE_CHARACTER); the string is parsed correctly, but I lose the quote characters, which must be kept. Are there any suggestions for what can be done? (I would prefer not to switch to another CSV library.)
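For reference, the behaviour being asked for (split on commas outside quotes, keep the quote characters) can be written down in a few lines of plain Java. This is only a sketch of the expected result, not an OpenCSV configuration: `QuoteKeepingSplitter` is a hypothetical helper, and it assumes quotes are balanced within a line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a CSV line but leaves quote characters in place. */
final class QuoteKeepingSplitter {
    private QuoteKeepingSplitter() {}

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // track whether we are inside a quoted field
                current.append(c);    // but keep the quote character itself
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // separator outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing separator
        return fields;
    }
}
```

Applied to the two test lines from the question, the first field of each row comes back with its quotes intact and without being cut at the embedded comma.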
I'm doing an exercise where I'm required to count the number of crimes per year based on a file that has more than 13 million lines (in case that's important). For that, I did this and it's working fine:
JavaRDD<String> anoRDD = arquivo.map(s ->
{String[] campos = s.split(";") ;
return campos[2];
});
System.out.println(anoRDD.countByValue());
But the next question to be answered is "How many "NARCOTIC" crimes happen per YEAR?". I managed to get the total count, but not the count per year. I did the following:
JavaRDD<String> NarcoticsRDD = arquivo.map(s ->
{String[] campos = s.split(";") ;
return campos[4];
});
JavaRDD<String> JustNarcotics = NarcoticsRDD.filter(s -> s.equals("NARCOTICS"));
System.out.println(JustNarcotics.countByValue());
How can I do this type of filter in Spark using Java?
Thanks!
So the first thing you would want to do is map your data to a bean class.
Step 1: Let's create a bean class to represent your data. It should implement Serializable and must have public getters and setters.
public class CrimeInfo implements Serializable {
private Integer year;
private String crimeType;
public CrimeInfo(Integer year, String crimeType) {
this.year = year;
this.crimeType = crimeType;
}
public Integer getYear() {
return year;
}
public void setYear(Integer year) {
this.year = year;
}
public String getCrimeType() {
return crimeType;
}
public void setCrimeType(String crimeType) {
this.crimeType = crimeType;
}
}
Step 2: Let's create your data. I just created a dummy dataset here, but you can read from your data source.
List<String> crimes = new ArrayList<>();
crimes.add("1998; Robbery");
crimes.add("1998; Robbery");
crimes.add("1998; Narcotics");
JavaRDD<String> crimesRdd = javaSparkContext().parallelize(crimes);
Step 3: Let's now map it to the bean class.
JavaRDD<CrimeInfo> crimeInfoRdd = crimesRdd.map(entry -> {
String[] crimeInfo = entry.split(";");
return new CrimeInfo(Integer.parseInt(crimeInfo[0]), crimeInfo[1]);
});
Step 4: Let's use dataframes to simplify the construct.
Dataset<Row> crimeInfoDataset =
sparkSession.createDataFrame(crimeInfoRdd, CrimeInfo.class);
Step 5: Let's group by the entities to see the result.
crimeInfoDataset.groupBy("year", "crimeType").count().show(false);
+----+----------+-----+
|year|crimeType |count|
+----+----------+-----+
|1998| Robbery |2 |
|1998| Narcotics|1 |
+----+----------+-----+
If you want to see activity for only a few crime types, just apply a filter to the above dataset.
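The filter-then-group logic is shown here with plain Java streams rather than Spark, so it can be run standalone. The column positions (year at index 2, crime type at index 4) follow the question's code; the class name `CrimesPerYear` is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative only: counts rows of one crime type per year. */
final class CrimesPerYear {
    private CrimesPerYear() {}

    static Map<String, Long> countPerYear(List<String> lines, String crimeType) {
        return lines.stream()
                .map(line -> line.split(";"))
                // keep only rows of the requested crime type (column 4)
                .filter(campos -> campos.length > 4 && campos[4].equals(crimeType))
                // then count the remaining rows per year (column 2), sorted by year
                .collect(Collectors.groupingBy(
                        campos -> campos[2], TreeMap::new, Collectors.counting()));
    }
}
```

The key point is the order of operations: filter on the crime type while the whole row is still available, and only then reduce each row to its year. On the RDD from the question, the same shape should be a filter on campos[4] followed by a map to campos[2] and countByValue().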
I'm on RavenDB 3.5.35183. I have a type:
import com.mysema.query.annotations.QueryEntity;
@QueryEntity
public class CountryLayerCount
{
public String countryName;
public int layerCount;
}
and the following query:
private int getCountryLayerCount(String countryName, IDocumentSession currentSession)
{
QCountryLayerCount countryLayerCountSurrogate = QCountryLayerCount.countryLayerCount;
IRavenQueryable<CountryLayerCount> levelDepthQuery = currentSession.query(CountryLayerCount.class, "CountryLayerCount/ByName").where(countryLayerCountSurrogate.countryName.eq(countryName));
CountryLayerCount countryLayerCount = new CountryLayerCount();
try (CloseableIterator<StreamResult<CountryLayerCount>> results = currentSession.advanced().stream(levelDepthQuery))
{
while(results.hasNext())
{
StreamResult<CountryLayerCount> srclc = results.next();
System.out.println(srclc.getKey());
CountryLayerCount clc = srclc.getDocument();
countryLayerCount = clc;
break;
}
}
catch(Exception e)
{
}
return countryLayerCount.layerCount;
}
The query executes successfully, and shows the correct ID for the document I'm retrieving (e.g. "CountryLayerCount/123"), but its data members are both null. The where clause also works fine, the country name is used to retrieve individual countries. This is so simple, but I can't see where I've gone wrong. The StreamResult contains the correct key, but getDocument() doesn't work - or, rather, it doesn't contain an object. The collection has string IDs.
In the db logger, I can see the request coming in:
Receive Request # 29: GET - geodata - http://localhost:8888/databases/geodata/streams/query/CountryLayerCount/ByName?&query=CountryName:Germany
Request # 29: GET - 22 ms - geodata - 200 - http://localhost:8888/databases/geodata/streams/query/CountryLayerCount/ByName?&query=CountryName:Germany
which, when plugged into the browser, correctly gives me:
{"Results":[{"countryName":"Germany","layerCount":5,"@metadata":{"Raven-Entity-Name":"CountryLayerCounts","Raven-Clr-Type":"DbUtilityFunctions.CountryLayerCount, DbUtilityFunctions","@id":"CountryLayerCounts/212","Temp-Index-Score":0.0,"Last-Modified":"2018-02-03T09:41:36.3165473Z","Raven-Last-Modified":"2018-02-03T09:41:36.3165473","@etag":"01000000-0000-008B-0000-0000000000D7","SerializedSizeOnDisk":164}}
]}
The index definition:
from country in docs.CountryLayerCounts
select new {
CountryName = country.countryName
}
AFAIK, one doesn't have to index all the fields of the object to retrieve it in its entirety, right? In other words, I just need to index the field(s) used to find the object, not all the fields I want to retrieve; at least that was my understanding...
Thanks!
The problem is related to incorrect casing.
For example:
try (IDocumentSession sesion = store.openSession()) {
CountryLayerCount c1 = new CountryLayerCount();
c1.layerCount = 5;
c1.countryName = "Germany";
sesion.store(c1);
sesion.saveChanges();
}
Is saved as:
{
"LayerCount": 5,
"CountryName": "Germany"
}
Please note that property names in the JSON start with upper-case letters (this only applies to 3.x versions).
So, to make it work, please update the JSON property names and edit your index:
from country in docs.CountryLayerCounts
select new {
CountryName = country.CountryName
}
By the way, if you have a per-country aggregation, you can simply query using:
QCountryLayerCount countryLayerCountSurrogate =
QCountryLayerCount.countryLayerCount;
CountryLayerCount levelDepthQuery = currentSession
.query(CountryLayerCount.class, "CountryLayerCount/ByName")
.where(countryLayerCountSurrogate.countryName.eq(countryName))
.single();
I want a CSV format for Order objects. My Order object will have order details, order line details and item details. Please find the Java object below:
Order {
OrderNo, OrderName, Price,
OrderLine {
OrderLineNo, OrderLinePrice,
Item{
ItemNo, ItemName, Item Description
}
}
}
Can anyone please guide me on creating a CSV format for this?
Create a POJO class for the object you want to write to CSV, and use java.io.FileWriter to write or append the values to the CSV file. This Java Code Geeks link will help you with this.
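A minimal sketch of that approach, assuming one CSV row per order line with the order and item columns repeated on each row (one common way to flatten a nested structure; the question does not say which layout is wanted). Class and field names mirror the outline in the question, and an order is assumed to hold a list of order lines. Values are written as-is, so fields containing commas or quotes would still need escaping.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

final class OrderCsv {
    static final class Item {
        final String itemNo, itemName, itemDescription;
        Item(String itemNo, String itemName, String itemDescription) {
            this.itemNo = itemNo;
            this.itemName = itemName;
            this.itemDescription = itemDescription;
        }
    }

    static final class OrderLine {
        final String orderLineNo;
        final double orderLinePrice;
        final Item item;
        OrderLine(String orderLineNo, double orderLinePrice, Item item) {
            this.orderLineNo = orderLineNo;
            this.orderLinePrice = orderLinePrice;
            this.item = item;
        }
    }

    static final class Order {
        final String orderNo, orderName;
        final double price;
        final List<OrderLine> lines;
        Order(String orderNo, String orderName, double price, List<OrderLine> lines) {
            this.orderNo = orderNo;
            this.orderName = orderName;
            this.price = price;
            this.lines = lines;
        }
    }

    static final String HEADER =
            "OrderNo,OrderName,Price,OrderLineNo,OrderLinePrice,ItemNo,ItemName,ItemDescription";

    /** Builds the CSV text: a header, then one row per order line. */
    static String toCsv(List<Order> orders) {
        StringBuilder sb = new StringBuilder(HEADER).append('\n');
        for (Order order : orders) {
            for (OrderLine line : order.lines) {
                sb.append(String.join(",",
                        order.orderNo, order.orderName, String.valueOf(order.price),
                        line.orderLineNo, String.valueOf(line.orderLinePrice),
                        line.item.itemNo, line.item.itemName, line.item.itemDescription));
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    /** Writes the CSV to any Writer, for example a java.io.FileWriter. */
    static void write(List<Order> orders, Writer out) throws IOException {
        out.write(toCsv(orders));
    }
}
```

Calling `OrderCsv.write(orders, new FileWriter("orders.csv"))` inside a try-with-resources block would produce the file; the file name here is just a placeholder.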
If you are feeling adventurous, I'm building support for nested elements in CSV in uniVocity-parsers.
The 2.0.0-SNAPSHOT version supports parsing nested beans with annotations. We are planning to release the final version in a couple of weeks. Writing support has not been implemented yet, so that part you'll have to do manually (should be fairly easy with the current API).
Parsing this sort of structure is more complex, but the parser seems to be working fine for most cases. Have a look at that test case:
Input CSV:
1,Foo
Account,23234,HSBC,123433-000,HSBCAUS
Account,11234,HSBC,222343-130,HSBCCAD
2,BAR
Account,1234,CITI,213343-130,CITICAD
Note that the first column of each row identifies which bean will be read. As "Client" in the CSV matches the class name, you don't need to annotate
Pojos
enum ClientType {
PERSONAL(2),
BUSINESS(1);
int typeCode;
ClientType(int typeCode) {
this.typeCode = typeCode;
}
}
public static class Client {
@EnumOptions(customElement = "typeCode", selectors = { EnumSelector.CUSTOM_FIELD })
@Parsed(index = 0)
private ClientType type;
@Parsed(index = 1)
private String name;
@Nested(identityValue = "Account", identityIndex = 0, instanceOf = ArrayList.class, componentType = ClientAccount.class)
private List<ClientAccount> accounts;
}
public static class ClientAccount {
@Parsed(index = 1)
private BigDecimal balance;
@Parsed(index = 2)
private String bank;
@Parsed(index = 3)
private String number;
@Parsed(index = 4)
private String swift;
}
Code to parse the input
public void parseCsvToBeanWithList() {
final BeanListProcessor<Client> clientProcessor = new BeanListProcessor<Client>(Client.class);
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.setRowProcessor(clientProcessor);
CsvParser parser = new CsvParser(settings);
parser.parse(new StringReader(CSV_INPUT));
List<Client> rows = clientProcessor.getBeans();
}
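To make the row-identifier idea concrete, here is a library-free sketch of the same grouping: a row whose first column is `Account` is attached to the most recently read client, and any other row starts a new client. This is not how uniVocity-parsers is implemented; `NestedRows`, `Client`, and `Account` are simplified stand-ins for the annotated beans, and the input is assumed to have no quoted fields.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

final class NestedRows {
    /** Simplified stand-in for ClientAccount. */
    static final class Account {
        final BigDecimal balance;
        final String bank, number, swift;
        Account(String[] cols) {
            // column 0 is the "Account" identifier, data starts at column 1
            balance = new BigDecimal(cols[1]);
            bank = cols[2];
            number = cols[3];
            swift = cols[4];
        }
    }

    /** Simplified stand-in for Client. */
    static final class Client {
        final int typeCode;
        final String name;
        final List<Account> accounts = new ArrayList<>();
        Client(String[] cols) {
            typeCode = Integer.parseInt(cols[0]);
            name = cols[1];
        }
    }

    static List<Client> parse(String csv) {
        List<Client> clients = new ArrayList<>();
        for (String rawLine : csv.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] cols = line.split(",");
            if (cols[0].equals("Account")) {
                if (clients.isEmpty()) {
                    throw new IllegalStateException("Account row before any client row");
                }
                // nested row: belongs to the last client seen
                clients.get(clients.size() - 1).accounts.add(new Account(cols));
            } else {
                // any other row starts a new parent bean
                clients.add(new Client(cols));
            }
        }
        return clients;
    }
}
```

The order of rows carries the parent-child relationship, which is why the nested rows must directly follow the client they belong to.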
If you find any issue using the parser, please update this issue.