Finding unique elements between two TSV files

Hi guys, I am stuck on one of my assignment problems. I have tried different approaches, but I still can't get it to work.

I think this will do the trick: process the old file first, then let entries from the new file overwrite existing ones with the same name.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// ...
public class ContactsProcessor {
    // readAllLines and write throw IOException, so main has to declare it
    public static void main(String[] args) throws IOException {
        List<String> contactsNew = Files.readAllLines(Paths.get("contactsNew.tsv"));
        List<String> contactsOld = Files.readAllLines(Paths.get("contactsOld.tsv"));
        List<String> contactsGmail = new ArrayList<String>();
        Map<String, String> gmailMap = new HashMap<String, String>();
        // process old contacts first: add them to a Map keyed by the first column
        for (String info : contactsOld) {
            String[] parts = info.split("\\t");
            if (info.endsWith("@gmail.com")) {
                gmailMap.put(parts[0], info);
            }
        }
        // process new contacts second: same Map, overwriting old contacts with the same name
        for (String info : contactsNew) {
            String[] parts = info.split("\\t");
            if (info.endsWith("@gmail.com")) {
                gmailMap.put(parts[0], info);
            }
        }
        contactsGmail.addAll(gmailMap.values());
        Collections.sort(contactsGmail);
        Files.write(Paths.get("contactsGmail.tsv"), contactsGmail);
    }
}
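To check the overwrite behaviour without touching any files, the same idea can be tried on in-memory lists. This is only a sketch: it assumes each row is name<TAB>email and that the suffix test is meant to be @gmail.com, and the class name ContactsMergeDemo, the method mergeGmail and the sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContactsMergeDemo {

    /** Keeps Gmail rows only; a row from newer replaces a row from older with the same first column. */
    public static List<String> mergeGmail(List<String> older, List<String> newer) {
        Map<String, String> byName = new HashMap<>();
        // Order matters: older rows go in first so that newer rows overwrite them.
        for (List<String> source : Arrays.asList(older, newer)) {
            for (String row : source) {
                if (row.endsWith("@gmail.com")) {
                    byName.put(row.split("\t")[0], row);
                }
            }
        }
        List<String> merged = new ArrayList<>(byName.values());
        Collections.sort(merged);
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical rows: name <TAB> email
        List<String> older = Arrays.asList("alice\talice.old@gmail.com", "bob\tbob@example.org");
        List<String> newer = Arrays.asList("alice\talice.new@gmail.com", "carol\tcarol@gmail.com");
        System.out.println(mergeGmail(older, newer));
    }
}
```

With these rows, alice's old address is replaced by the new one, bob is dropped because his address is not a Gmail address, and carol is added.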

Related

Converting Shape File to RDF document, in Java

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.management.AttributeChangeNotification;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.RDFReaderI;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.StreamRDFWriter;
import org.apache.jena.vocabulary.VCARD;
import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.DataUtilities;
import org.geotools.data.FeatureSource;
import org.geotools.data.FileDataStore;
import org.geotools.data.FileDataStoreFinder;
import org.geotools.data.Query;
import org.geotools.data.ServiceInfo;
import org.geotools.data.shapefile.ShapefileDataStore;
import org.geotools.data.simple.SimpleFeatureCollection;
import org.geotools.data.simple.SimpleFeatureIterator;
import org.geotools.data.simple.SimpleFeatureSource;
import org.geotools.feature.FeatureCollection;
import org.geotools.feature.FeatureIterator;
import org.geotools.swing.data.JFileDataStoreChooser;
import org.opengis.feature.ComplexAttribute;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;
import org.opengis.feature.type.FeatureType;
import org.opengis.filter.Filter;
public class ShpToRdf {
public static void main(String[] args) throws IOException {
ArrayList<String> names = new ArrayList<String>();
ArrayList<String> values = new ArrayList<String>();
File file = JFileDataStoreChooser.showOpenFile("shp", null);
if (file == null) {
return;
}
FileDataStore myData = FileDataStoreFinder.getDataStore(file);
SimpleFeatureSource source = myData.getFeatureSource();
SimpleFeatureType schema = source.getSchema();
Query query = new Query(schema.getTypeName());
query.setMaxFeatures(100);
Model model = ModelFactory.createDefaultModel();
String shpURI = "http://www.shp.fake/";
Resource shapeFile = model.createResource(shpURI);
FeatureCollection<SimpleFeatureType, SimpleFeature> collection = source.getFeatures(query);
try (FeatureIterator<SimpleFeature> features = collection.features()) {
while (features.hasNext()) {
SimpleFeature feature = features.next();
model.setNsPrefix("shp", shpURI);
for (org.opengis.feature.Property attribute : feature.getProperties()) {
names.add(attribute.getName().toString());
values.add(attribute.getValue().toString());
}
}
}
ArrayList<Integer> ids = new ArrayList<Integer>();
for(int i=0; i<names.size();i++) {
if (names.get(i).equals("Id")) {
ids.add(i);
}
}
Property features = model.createProperty(shpURI,"features");
for(int i = 0; i<ids.size();i++) {
Property id = model.createProperty(shpURI,names.get(ids.get(i)));
shapeFile = model.createResource(shpURI)
.addProperty(features, model.createResource()
.addProperty(id,model.createResource()
.addProperty(id, values.get(ids.get(i)))
.addProperty(features, "feature1")
.addProperty(features, "feature2")
.addProperty(features, "feature3")));
}
RDFDataMgr.write(System.out, model, Lang.RDFXML);
}
}
I am trying to create an application that converts a shapefile (.shp) to RDF.
The problem is that I get two ArrayLists from the shapefile: one holds the attribute names (Id, name, geometry, etc.) and the other holds the values.
To create the RDF, I have to match each Id with its values (e.g. Id = 1 has name = road 1, geometry = line, etc.).
Could you help me with this?
Thank you!

I think you should be able to do this by tweaking the following bit of logic:
for (org.opengis.feature.Property attribute : feature.getProperties()) {
names.add(attribute.getName().toString());
values.add(attribute.getValue().toString());
}
Instead of putting them in two lists, you can put them in a list of pairs. This way, when you iterate over the list, you know which value belongs to which attribute name.
It should look something like this:
List<Pair<String, String>> contentList = new ArrayList<Pair<String, String>>();
for (org.opengis.feature.Property attribute : feature.getProperties()) {
    // both the name and the value are stored as Strings, so the pair type is Pair<String, String>
    Pair<String, String> nameValuePair = new Pair<String, String>(attribute.getName().toString(), attribute.getValue().toString());
    contentList.add(nameValuePair);
}
I'm not sure what the ids ArrayList is for, but you could move that logic into the for loop above to make sure you're only getting identifiers.
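One way to take the pairing idea a step further is to keep one map per feature and index those maps by their Id value, so that each Id leads directly to its name, geometry and so on. The sketch below uses only the standard library: Map.Entry stands in for Pair, the nested lists stand in for what feature.getProperties() would yield per feature, and the class name, method names and sample values are all hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeatureAttributeIndex {

    /** Turns one feature's (name, value) pairs into a map, preserving attribute order. */
    public static Map<String, String> toAttributeMap(List<Map.Entry<String, String>> pairs) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            attributes.put(pair.getKey(), pair.getValue());
        }
        return attributes;
    }

    /** Indexes features by the value of their idKey attribute; features without it are skipped. */
    public static Map<String, Map<String, String>> indexById(
            List<List<Map.Entry<String, String>>> features, String idKey) {
        Map<String, Map<String, String>> byId = new LinkedHashMap<>();
        for (List<Map.Entry<String, String>> pairs : features) {
            Map<String, String> attributes = toAttributeMap(pairs);
            String id = attributes.get(idKey);
            if (id != null) {
                byId.put(id, attributes);
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for two shapefile features.
        List<Map.Entry<String, String>> first = new ArrayList<>();
        first.add(new SimpleEntry<>("geometry", "line"));
        first.add(new SimpleEntry<>("Id", "1"));
        first.add(new SimpleEntry<>("name", "road 1"));

        List<Map.Entry<String, String>> second = new ArrayList<>();
        second.add(new SimpleEntry<>("geometry", "line"));
        second.add(new SimpleEntry<>("Id", "2"));
        second.add(new SimpleEntry<>("name", "road 2"));

        List<List<Map.Entry<String, String>>> features = new ArrayList<>();
        features.add(first);
        features.add(second);

        Map<String, Map<String, String>> byId = indexById(features, "Id");
        System.out.println(byId.get("1").get("name"));
    }
}
```

Because the grouping happens per feature, it does not depend on where the Id attribute sits among the other attributes. Each inner map could then be turned into the properties of one RDF resource.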

How to run particular Test step of soapUi in java

I want to run a particular test step of my SoapUI test case from Java code. My problem is that running at the test step level requires a TestCaseRunner argument, which is an anonymous inner type, and a TestCaseRunContext, which is an interface. Do I have to implement both to run the step? If so, could someone share a sample of how to do that?
Here's my code:
package com.testauto.soaprunner.soap.impl;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.eviware.soapui.SoapUI;
import com.eviware.soapui.StandaloneSoapUICore;
import com.eviware.soapui.impl.wsdl.WsdlProject;
import com.eviware.soapui.impl.wsdl.WsdlTestSuite;
import com.eviware.soapui.impl.wsdl.testcase.WsdlTestCase;
import com.eviware.soapui.impl.wsdl.testcase.WsdlTestCaseRunner;
import com.eviware.soapui.impl.wsdl.teststeps.WsdlTestStep;
import com.eviware.soapui.model.TestPropertyHolder;
import com.eviware.soapui.model.iface.MessageExchange;
import com.eviware.soapui.model.propertyexpansion.PropertyExpansionUtils;
import com.eviware.soapui.model.testsuite.TestCase;
import com.eviware.soapui.model.testsuite.TestCaseRunContext;
import com.eviware.soapui.model.testsuite.TestProperty;
import com.eviware.soapui.model.testsuite.TestStepResult;
import com.eviware.soapui.model.testsuite.TestSuite;
import com.eviware.soapui.support.types.StringToObjectMap;
import com.eviware.soapui.support.types.StringToStringsMap;
import com.testauto.soaprunner.data.InputData;
import com.testauto.soaprunner.data.ReportData;
public class RunTestImpl{
static Logger logger = LoggerFactory.getLogger(RunTestImpl.class);
List<ReportData> reportDatList=new ArrayList<ReportData>();
public List<ReportData> process(Map<String, String> readDataMap, InputData input, Map<List<String>, String> configurationMap, List<String> configuration, WsdlTestSuite testSuite)
{
List<ReportData> report = new ArrayList<ReportData>();
logger.info("Into the Class for running test cases");
try{
report= getTestSuite(readDataMap,input,configurationMap,configuration,testSuite);
}
catch(Exception e)
{
logger.info(e.getMessage());
}
return report;
}
private List<ReportData> getTestSuite(Map<String, String> readDataMap, InputData input, Map<List<String>, String> configurationMap, List<String> configuration, WsdlTestSuite testSuite) throws Exception {
ReportData report=new ReportData();
logger.info("Into the Class for running test cases");
String suiteName = "";
String reportStr = "";
List<String> testCaseNameList= setPropertyValues(readDataMap,input);
WsdlTestCaseRunner runner = null;
List<TestSuite> suiteList = new ArrayList<TestSuite>();
List<TestCase> caseList = new ArrayList<TestCase>();
SoapUI.setSoapUICore(new StandaloneSoapUICore(true));
System.out.println("testcase name "+ configurationMap.get(configuration));
// WsdlTestCase testCase= testSuite.getTestCaseByName(input.getApiName()+"_"+testCaseName+"_TestCase");
WsdlTestCase testCase= testSuite.getTestCaseByName("my_TESTCASE");
WsdlTestStep tesStep=testCase.getTestStepByName(configurationMap.get(testCaseNameList));
System.out.println("test case name:"+testCase.getName());
report.setTestCase(testCase.getName());
suiteList.add(testSuite);
runner= tesStep.run(?,?);
return reportDatList;
}
private List<String> setPropertyValues(Map<String, String> readDataMap, InputData input) {
String testCaseName="";
TestPropertyHolder holder = PropertyExpansionUtils.getGlobalProperties();
List<String> dataConfigurationList=new ArrayList<String>();
Iterator entries = readDataMap.entrySet().iterator();
while (entries.hasNext()) {
Entry thisEntry = (Entry) entries.next();
String key = (String) thisEntry.getKey();
String value = (String) thisEntry.getValue();
testCaseName+=key;
holder.setPropertyValue(key, holder.getPropertyValue(key));
dataConfigurationList.add(key);
}
System.out.println("testCaseName"+testCaseName);
return dataConfigurationList;
}
}

After trying different things, I ended up with something like this:
TestCaseRunContext context = new MockTestRunContext(new MockTestRunner(testStep.getTestCase()), testStep);
MockTestRunner runner = new MockTestRunner(testStep.getTestCase());
TestStepResult testStepResult= testStep.run(runner, context);
I don't know exactly why it works, but this trick worked for me. If someone knows the reason behind it, please share.

Spark - Extract line after String match and save it in ArrayList

I am new to Spark and am trying to extract the lines that contain "Subject:" and save them in an ArrayList. I don't get any error, but the ArrayList is empty. Can you please tell me where I am going wrong, or suggest the best way to do this?
import java.util.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
public final class extractSubject {
public static void main(String[] args) {
SparkConf sparkConf = new SparkConf().setMaster("local[1]").setAppName("JavaBookExample");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
JavaRDD<String> sample = sc.textFile("/Users/Desktop/sample.txt");
final ArrayList<String> list = new ArrayList<>();
sample.foreach(new VoidFunction<String>(){
public void call(String line) {
if (line.contains("Subject:")) {
System.out.println(line);
list.add(line);
}
}}
);
System.out.println(list);
sc.stop();
}
}

Please keep in mind that Spark applications run distributed and in parallel. Therefore you cannot rely on modifying variables defined outside the functions that Spark executes: each task works on its own serialized copy, so the list in your main method stays empty.
Instead, you need to return a result from these functions. In your case you need flatMap (instead of foreach, which has no result), which concatenates the collections returned by your function.
If a line matches, a list containing that line is returned; otherwise an empty list is returned.
To print the data in the main function, you first have to gather the possibly distributed data in the driver program by calling collect().
Here is an example (written against the Spark 1.x Java API, where FlatMapFunction.call returns an Iterable; from Spark 2.0 on it returns an Iterator):
import java.util.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
public final class extractSubject {
public static void main(String[] args) {
SparkConf sparkConf = new SparkConf().setMaster("local[1]").setAppName("JavaBookExample");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
//JavaRDD<String> sample = sc.textFile("/Users/Desktop/sample.txt");
JavaRDD<String> sample = sc.parallelize(Arrays.asList("Subject: first",
"nothing here",
"Subject: second",
"dummy"));
JavaRDD<String> subjectLinesRdd = sample.flatMap(new FlatMapFunction<String, String>() {
public Iterable<String> call(String line) {
if (line.contains("Subject:")) {
return Collections.singletonList(line); // line matches → return list with the line as its only element
} else {
return Collections.emptyList(); // ignore line → return empty list
}
}
});
List<String> subjectLines = subjectLinesRdd.collect(); // collect values from Spark workers
System.out.println(subjectLines); // → "[Subject: first, Subject: second]"
sc.stop();
}
}

How to display current accumulator value updated in DStream?

I am running a Java jar. The accumulator adds up the stream values. The problem is that I want to display the value in my UI every time it increments, or at a specific periodic interval.
But since the accumulator's value can only be read from the driver program, I am not able to access this value until the process finishes its execution. Any idea how I can access this value periodically?
My code is given below:
package com.spark;
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;
public class KafkaSpark {
/**
* @param args
*/
public static void main(String[] args) {
SparkConf conf = new SparkConf().setAppName("Simple Application");
conf.setMaster("local");
JavaStreamingContext jssc = new JavaStreamingContext(conf,
new Duration(5000));
final Accumulator<Integer> accum = jssc.sparkContext().accumulator(0);
Map<String, Integer> topicMap = new HashMap<String, Integer>();
topicMap.put("test", 1);
JavaPairDStream<String, String> lines = KafkaUtils.createStream(jssc,
"localhost:2181", "group1", topicMap);
JavaDStream<Integer> map = lines
.map(new Function<Tuple2<String, String>, Integer>() {
public Integer call(Tuple2<String, String> v1)
throws Exception {
if (v1._2.contains("the")) {
accum.add(1);
return 1;
}
return 0;
}
});
map.print();
jssc.start();
jssc.awaitTermination();
System.out.println("*************" + accum.value());
System.out.println("done");
}
}
I am streaming data using Kafka.

In Spark, the actual code only starts to execute when jssc.start() is called. From then on Spark is in control and runs the processing loop itself, so your System.out.println calls run only once and are not executed on every batch.
For output operations, check the documentation.
You can use any of:
print()
foreachRDD()
saveAsObjectFiles(), saveAsTextFiles() or saveAsHadoopFiles()
Hope this helps.

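jssc.start();
// Poll the accumulator from the driver once per second.
// Thread.sleep throws InterruptedException, so main must declare or handle it.
while (true) {
    System.out.println("current:" + accum.value());
    Thread.sleep(1000);
}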

Error in write ArrayList String to File

I am trying to write an ArrayList of strings to a file. The strings are the text of tweets parsed from Twitter JSON, and I want to write each tweet's text into the file.
However, I keep getting this error:
Exception in thread "main" java.lang.NullPointerException
at java.io.Writer.write(Unknown Source)
at kr.ac.uos.datamining.test.main(test.java:32)
The code for the whole class is below:
package kr.ac.uos.datamining;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import kr.ac.uos.datamining.JSONParser;
import kr.ac.uos.datamining.Tweet;
import kr.ac.uos.datamining.User;
public class test {
public static List <String> list = new ArrayList<String>();
public static void main(String[] args) throws IOException, FileNotFoundException, InterruptedException, SQLException {
JSONParser j = new JSONParser(new File("D:/curl-7.32.0/samsunggalaxy-01-23-2014.txt"));
ArrayList<Tweet> tweets = j.getTweets();
for(Tweet tweet : tweets){
list.add(tweet.getText());
}
FileWriter writer = new FileWriter("D:/samsunggalaxy.txt");
for (String tweet: list) {
writer.write(tweet); // line 32 in the stack trace
}
writer.close();
}
}
Since the stack trace says Unknown Source, is the problem with the String tweet: list line?
I tried changing it to String str: list, but that did not work either.

The only way to get a NullPointerException there is that the text you pass to writer.write is null, so validate what you want to write before writing it.
for (String tweet: list) {
    // use && here: with || a null tweet would still reach tweet.isEmpty() and throw
    if (tweet != null && !tweet.isEmpty()) {
        writer.write(tweet);
    }
}

Seems one of your tweet objects is null. This is the problem here.

It seems the String tweet is null, so I'd check your Tweet.getText() method to ensure it never returns null.
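If Tweet.getText() cannot be changed, the null and empty texts can be filtered out before writing. The following is a minimal sketch rather than a drop-in fix: the class name TweetTextFilter, the method writableTexts and the sample strings are hypothetical, and a StringWriter is used in place of the FileWriter so the demo does not touch the disk.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TweetTextFilter {

    /** Returns the texts that are safe to pass to Writer.write: neither null nor empty. */
    public static List<String> writableTexts(List<String> texts) {
        List<String> result = new ArrayList<>();
        for (String text : texts) {
            if (text != null && !text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data; the null stands in for a tweet whose getText() returned null.
        List<String> texts = Arrays.asList("first tweet", null, "", "second tweet");
        // A FileWriter would be used the same way; try-with-resources closes it even on errors.
        try (Writer writer = new StringWriter()) {
            for (String text : writableTexts(texts)) {
                writer.write(text);
                // One tweet per line is an assumption; the original code writes no separator.
                writer.write(System.lineSeparator());
            }
            System.out.print(writer);
        }
    }
}
```

Writer.write(String) throws a NullPointerException when it is handed null, which matches the first frame of the stack trace, so filtering before the loop avoids the error without hiding other problems.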
