Before anything: the title doesn't convey what I really want to ask.
What I want to know is how I can make a map that, for several users, collects their data and groups it all together. I'm currently using two lists, one for the users' names and another for their works. I tried using map.put, but it kept overwriting the previous entry. What I'd like to obtain is as follows:
Desired output:
{user1=[work1, work2, work3], user2=[work1, work2], ..., userN=[workN]}
Current output:
{[user1, user2, user3, user4]=[work1, work2, work3, work4, work5 (user1), work1 (user2), work1, work2, work3 (user3)]}
This is the code that I'm currently using to achieve the above.
private static Map<List<String>, List<String>> repositoriesUserData = new HashMap<>();
private static Set<String> collaboratorNames = new HashSet<>();
public static void main(String[] args) throws Exception {
login();
getCollabs(GITHUB_REPO_NAME);
repositoriesUnderUser();
}
public GitManager(String AUTH, String USERNAME, String REPO_NAME) throws IOException {
this.GITHUB_LOGIN = USERNAME;
this.GITHUB_OAUTH = AUTH;
this.GITHUB_REPO_NAME = REPO_NAME;
this.githubLogin = new GitHubBuilder().withOAuthToken(this.GITHUB_OAUTH, this.GITHUB_LOGIN).build();
this.userOfLogin = this.githubLogin.getUser(GITHUB_LOGIN);
}
public static void login() throws IOException {
new GitManager(GIT_TOKEN, GIT_LOGIN, GITHUB_REPO_NAME);
connect();
}
public static void connect() throws IOException {
if (githubLogin.isCredentialValid()) {
valid = true;
githubLogin.connect(GITHUB_LOGIN, GITHUB_OAUTH);
userOfLogin = githubLogin.getUser(GITHUB_LOGIN);
}
}
public static String getCollabs(String repositoryName) throws IOException {
GHRepository collaboratorsRepository = userOfLogin.getRepository(repositoryName);
collaboratorNames = collaboratorsRepository.getCollaboratorNames();
String collaborators = collaboratorNames.toString();
System.out.println("Collaborators for the following Repository: " + repositoryName + "\nAre: " + collaborators);
String out = "Collaborators for the following Repository: " + repositoryName + "\nAre: " + collaborators;
return out;
}
public static List<String> fillList() {
List<String> collaborators = new ArrayList<>();
collaboratorNames.forEach(s -> {
collaborators.add(s);
});
return collaborators;
}
public static String repositoriesUnderUser() throws IOException {
GHUser user;
List<String> names = new ArrayList<>();
List<String> repoNames = new ArrayList<>();
for (int i = 0; i < fillList().size(); i++) {
user = githubLogin.getUser(fillList().get(i));
Map<String, GHRepository> temp = user.getRepositories();
names.add(user.getLogin());
temp.forEach((c, b) -> {
repoNames.add(b.getName());
});
}
repositoriesUserData.put(names,repoNames);
System.out.println(repositoriesUserData);
return "temporaryReturn";
}
All help is appreciated!
I'll give it a try (code in question still not working for me):
If I understood correctly, you want a Map that contains the repositories for each user.
Therefore I think repositoriesUserData should be a Map<String, List<String>>.
With that in mind, let's fill the map on each loop iteration with the user from the list as the key and the list of repository names as the value.
The method would look like this (the temporary return removed and replaced with void):
public static void repositoriesUnderUser() throws IOException {
for (int i = 0; i < fillList().size(); i++) {
GHUser user = githubLogin.getUser(fillList().get(i));
Map<String, GHRepository> temp = user.getRepositories();
repositoriesUserData.put(user.getLogin(), temp.values().stream().map(GHRepository::getName).collect(Collectors.toList()));
}
}
Edit: (short explanation of what is happening in your code)
You are collecting all usernames into the local list names and also adding all repository names to the local list repoNames.
Only at the end of the method do you put a single new entry into your map repositoriesUserData, which means the whole run adds exactly one entry where
key = all of the users
value = all of the repositories of all users (because it's a list, if two users have the same repository it is added to this list twice)
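If you would rather build the map incrementally (adding one repository at a time instead of putting a whole list per user), computeIfAbsent avoids the overwriting that plain put caused in your first attempt. A minimal sketch, assuming the same static githubLogin and fillList() as above and repositoriesUserData declared as Map<String, List<String>> (the method name here is only illustrative):
public static void collectRepositoriesPerUser() throws IOException {
    for (String login : fillList()) {
        GHUser user = githubLogin.getUser(login);
        for (GHRepository repo : user.getRepositories().values()) {
            // creates the user's list the first time the user is seen, then keeps appending to it
            repositoriesUserData.computeIfAbsent(user.getLogin(), k -> new ArrayList<>())
                                .add(repo.getName());
        }
    }
    System.out.println(repositoriesUserData);
}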
I have two methods and I am using Checkstyle on my code. I have extracted buildResultWithLabel() from buildResult().
Problematic Code:
final int issueNo = commitMessage.getIssueNumber();
if (processedIssueNumbers.contains(issueNo)) {
return;
}
processedIssueNumbers.add(issueNo);
Before extracting the method, this code was inside the loop in buildResult(). Now that I have extracted it, it no longer runs in a loop, so what used to be a continue; became a return;.
ERROR: (coding) ReturnCount: Return count is 1 (max allowed for void
methods/constructors/lambdas is 0).
My code looks like this:
public static Result buildResult(String localRepoPath, String authToken, String remoteRepoPath,
String startRef, String endRef) throws IOException,
GitAPIException {
final Result result = new Result();
final GHRepository remoteRepo = createRemoteRepo(authToken, remoteRepoPath);
final Set<RevCommit> commitsForRelease =
getCommitsBetweenReferences(localRepoPath, startRef, endRef);
commitsForRelease.removeAll(getIgnoredCommits(commitsForRelease));
final Set<Integer> processedIssueNumbers = new HashSet<>();
for (RevCommit commit : commitsForRelease) {
CommitMessage commitMessage = new CommitMessage(commit.getFullMessage());
if (commitMessage.isRevert()) {
System.out.println(commitMessage.getMessage());
commitMessage = new CommitMessage(commitMessage.getRevertedCommitMessage());
}
buildResultWithLabel(remoteRepoPath, result, remoteRepo, commitsForRelease,
processedIssueNumbers,
commit,
commitMessage);
}
return result;
}
private static void buildResultWithLabel(String remoteRepoPath, Result result,
GHRepository remoteRepo,
Set<RevCommit> commitsForRelease,
Set<Integer> processedIssueNumbers, RevCommit commit,
CommitMessage commitMessage) throws IOException {
if (commitMessage.isIssueOrPull()) {
final int issueNo = commitMessage.getIssueNumber();
if (processedIssueNumbers.contains(issueNo)) {
return;
}
processedIssueNumbers.add(issueNo);
final GHIssue issue = remoteRepo.getIssue(issueNo);
if (issue.getState() != GHIssueState.CLOSED) {
result.addWarning(String.format(MESSAGE_NOT_CLOSED, issueNo, issue.getTitle(),
remoteRepoPath, issueNo));
}
final String issueLabel = getIssueLabelFrom(issue);
if (issueLabel.isEmpty()) {
final String error = String.format(MESSAGE_NO_LABEL,
issueNo,
Arrays.stream(Constants.ISSUE_LABELS)
.collect(Collectors.joining(SEPARATOR)),
remoteRepoPath, issueNo);
result.addError(error);
}
final List<GHLabel> releaseLabels = getAllIssueLabels(issue);
if (releaseLabels.size() > 1) {
final String error = String.format(MESSAGE_MORE_THAN_ONE_RELEASE_LABEL,
issueNo,
Arrays.stream(Constants.ISSUE_LABELS)
.collect(Collectors.joining(SEPARATOR)),
remoteRepoPath, issueNo);
result.addError(error);
}
final Set<RevCommit> issueCommits = getCommitsForIssue(commitsForRelease, issueNo);
final String authors = getAuthorsOf(issueCommits);
final ReleaseNotesMessage releaseNotesMessage =
new ReleaseNotesMessage(issue, authors);
result.putReleaseNotesMessage(issueLabel, releaseNotesMessage);
}
else {
// Commits that have messages which do not contain issue or pull number
final String commitShortMessage = commit.getShortMessage();
final String author = commit.getAuthorIdent().getName();
final ReleaseNotesMessage releaseNotesMessage =
new ReleaseNotesMessage(commitShortMessage, author);
result.putReleaseNotesMessage(Constants.MISCELLANEOUS_LABEL, releaseNotesMessage);
}
}
"Single return path permitted" is an opinion about code readability, and you've discovered that strictly applying it can make your code much less readable by preventing early-exit when you calculate that you don't need to do a bunch of work.
The best option is to change your Checkstyle policy and turn off this rule. If you absolutely can't (a non-programmer manager has decided on "best policies"), then you can extract the remainder of your method into another private method and if (!contains) { longProcess(); }.
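A minimal sketch of that workaround, using hypothetical names rather than your actual methods: invert the guard and push the long body into its own private helper, so the void method keeps a single exit path.
// Hypothetical illustration of the single-exit workaround (all names are made up).
import java.util.HashSet;
import java.util.Set;

class SingleExitExample {
    private final Set<Integer> processedIssueNumbers = new HashSet<>();

    // Before: "if (processedIssueNumbers.contains(issueNo)) { return; }" tripped ReturnCount.
    // After: invert the condition and delegate the long body to a helper, so there is no early return.
    void handleIssue(int issueNo) {
        if (!processedIssueNumbers.contains(issueNo)) {
            processedIssueNumbers.add(issueNo);
            processIssue(issueNo); // everything that used to follow the early return
        }
    }

    private void processIssue(int issueNo) {
        // the long body of buildResultWithLabel would go here
        System.out.println("processing issue " + issueNo);
    }
}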
I'm using ZK, and I have this code that works for me with static data:
<zscript>
<![CDATA[
List tipo_servicios = new ArrayList();
List tipo_servicios_enc = new ArrayList();
DTO.Tiposervicio tipo_servicios_select;
DTO.Tiposervicio tiposervicio = new DTO.Tiposervicio();
tiposervicio.setId(1);
tiposervicio.setName("Mustang");
tiposervicio.setDescripcion("New Mustang 2018");
tiposervicio.setEstatus('A');
tipo_servicios.add(tiposervicio);
void buscarTipoServicios()
{
if (keywordBox.getValue() != null && !keywordBox.getValue().trim().equals(""))
{
tipo_servicios_enc.clear();
for (DTO.Tiposervicio tipo_serv : tipo_servicios)
{
if (tipo_serv.getName().toLowerCase().contains(keywordBox.getValue().trim().toLowerCase()) || tipo_serv.getName().toLowerCase().contains(keywordBox.getValue().trim().toLowerCase()))
{
tipo_servicios_enc.add(tipo_serv);
}
}
binder.loadAll();
}
}
]]>
</zscript>
It's a search function:
void buscarTipoServicios()
And in my service package I have the following code, which is used to load my list from the database:
public class ConsultarTipoServicio extends SelectorComposer
{
private List<Tiposervicio> listaTipoServicio;
private TiposervicioJpaController tipoServicioJpaController;
public ConsultarTipoServicio() throws Exception
{
EntityManagerFactory emf =Persistence.createEntityManagerFactory("ProyectoLabIIPU");
tipoServicioJpaController=new TiposervicioJpaController(emf);
listaTipoServicio= tipoServicioJpaController.findTiposervicioEntities();
}
public List<Tiposervicio> getlistaTipoServicio()
{
return listaTipoServicio;
}
}
I want to somehow assign to my
List tipo_servicios = new ArrayList();
the list already loaded from
getlistaTipoServicio()
I'm trying something like this, but it gives me an error:
List tipo_servicios = Servicios.ConsultarTipoServicios.getlistaTipoServicio();
I solved it this way (getlistaTipoServicio() is an instance method, so I first need to create a ConsultarTipoServicio instance instead of calling it statically):
consultar = new Servicios.ConsultarTipoServicio();
List tipo_servicios = consultar.getlistaTipoServicio();
List tipo_servicios_enc = new ArrayList();
DTO.Tiposervicio tipo_servicios_select;
Given a directory, my application traverses and loads .mdb MS Access dbs using the Jackcess API. Inside of each database, there is a table named GCMT_CMT_PROPERTIES with a column named cmt_data containing some text. I also have a Mapper object (which essentially resembles a Map<String,String> but allows duplicate keys) which I use as a dictionary when replacing a certain word from a string.
So for example if mapper contains fox -> dog then the sentence: "The fox jumps" becomes "The dog jumps".
The design I'm going with for this program is as follows:
1. Given a directory, traverse all subdirectories and load all .mdb files into a File[].
2. For each db file in File[], create a Task<Void> called "TaskMdbUpdater" and pass it the db file.
3. Dispatch and run each task as it is created (see 2. above).
TaskMdbUpdater is responsible for locating the appropriate table and column in the db file it was given and iteratively running a "find & replace" routine on each row of the table to detect words from the dictionary and replace them (as shown in example above) and finally updating that row before closing the db. Each instance of TaskMdbUpdater is a background thread with a Jackcess API DatabaseBuilder assigned to it, so it is able to manipulate the db.
In the current state, the code runs without throwing any exceptions whatsoever; however, when I manually open the db through Access and inspect a given row, it appears not to have changed. I've tried to pin down the source of the issue without any luck and would appreciate any support. If you need to see more code, let me know and I'll update my question accordingly.
public class TaskDatabaseTaskDispatcher extends Task<Void> {
private String parentDir;
private String dbFileFormat;
private Mapper mapper;
public TaskDatabaseTaskDispatcher(String parent, String dbFileFormat, Mapper mapper) {
this.parentDir = parent;
this.dbFileFormat = dbFileFormat;
this.mapper = mapper;
}
@Override
protected Void call() throws Exception {
File[] childDirs = getOnlyDirectories(getDirectoryChildFiles(new File(this.parentDir)));
DatabaseBuilder[] dbs = loadDatabasesInParent(childDirs);
Controller.dprint("TaskDatabaseTaskDispatcher", dbs.length + " databases were found in parent directory");
TaskMdbUpdater[] tasks = new TaskMdbUpdater[dbs.length];
Thread[] workers = new Thread[dbs.length];
for(int i=0; i<dbs.length; i++) {
// for each db, dispatch Task so a worker can update that db.
tasks[i] = new TaskMdbUpdater(dbs[i], mapper);
workers[i] = new Thread(tasks[i]);
workers[i].setDaemon(true);
workers[i].start();
}
return null;
}
private DatabaseBuilder[] loadDatabasesInParent(File[] childDirs) throws IOException {
DatabaseBuilder[] dbs = new DatabaseBuilder[childDirs.length];
// Traverse children and load dbs[]
for(int i=0; i<childDirs.length; i++) {
File dbFile = FileUtils.getFileInDirectory(
childDirs[i].getCanonicalFile(),
childDirs[i].getName() + this.dbFileFormat);
dbs[i] = new DatabaseBuilder(dbFile);
}
return dbs;
}
}
// StringUtils class, utility methods
public class StringUtils {
public static String findAndReplace(String str, Mapper mapper) {
String updatedStr = str;
for(int i=0; i<mapper.getMappings().size(); i++) {
updatedStr = updatedStr.replaceAll(mapper.getMappings().get(i).getKey(), mapper.getMappings().get(i).getValue());
}
return updatedStr;
}
}
// FileUtils class, utility methods:
public class FileUtils {
/**
* Returns only directories in given File[].
* @param list
* @return
*/
public static File[] getOnlyDirectories(File[] list) throws IOException, NullPointerException {
List<File> filteredList = new ArrayList<>();
for(int i=0; i<list.length; i++) {
if(list[i].isDirectory()) {
filteredList.add(list[i]);
}
}
File[] correctSizeFilteredList = new File[filteredList.size()];
for(int i=0; i<filteredList.size(); i++) {
correctSizeFilteredList[i] = filteredList.get(i);
}
return correctSizeFilteredList;
}
/**
* Returns a File[] containing all children under specified parent file.
* @param parent
* @return
*/
public static File[] getDirectoryChildFiles(File parent) {
return parent.listFiles();
}
}
public class Mapper {
private List<aMap> mappings;
public Mapper(List<aMap> mappings) {
this.mappings = mappings;
}
/**
* Returns mapping dictionary, typically used for extracting individual mappings.
* @return List of type aMap
*/
public List<aMap> getMappings() {
return mappings;
}
public void setMappings(List<aMap> mappings) {
this.mappings = mappings;
}
}
/**
* Represents a single String based K -> V mapping.
*/
public class aMap {
private String[] mapping; // [0] - key, [1] - value
public aMap(String[] mapping) {
this.mapping = mapping;
}
public String getKey() {
return mapping[0];
}
public String getValue() {
return mapping[1];
}
public String[] getMapping() {
return mapping;
}
public void setMapping(String[] mapping) {
this.mapping = mapping;
}
}
Update 1:
To verify my custom StringUtils.findAndReplace logic, I've performed the following unit test (in JUnit) which is passing:
@Test
public void simpleReplacementTest() {
// Construct a test mapper/dictionary
List<aMap> aMaps = new ArrayList<aMap>();
aMaps.add(new aMap(new String[] {"fox", "dog"})); // {K, V} = K -> V
Mapper mapper = new Mapper(aMaps);
// Perform replacement
String corpus = "The fox jumps";
String updatedCorpus = StringUtils.findAndReplace(corpus, mapper);
assertEquals("The dog jumps", updatedCorpus);
}
I'm including my TaskMdbUpdater class here separately with some logging code included, as I suspect the point of failure lies somewhere in call():
/**
* Updates a given .mdb database according to specifications defined internally.
* @since 2.2
*/
public class TaskMdbUpdater extends Task<Void> {
private final String TABLE_NAME = "GCMT_CMT_PROPERTIES";
private final String COLUMN_NAME = "cmt_data";
private DatabaseBuilder dbPackage;
private Mapper mapper;
public TaskMdbUpdater(DatabaseBuilder dbPack, Mapper mapper) {
super();
this.dbPackage = dbPack;
this.mapper = mapper;
}
@Override
protected Void call() {
try {
// Controller.dprint("TaskMdbUpdater", "Worker: " + Thread.currentThread().getName() + " running");
// Open db and extract Table
Database db = this.dbPackage
.open();
Logger.debug("Opened database: {}", db.getFile().getName());
Table table = db.getTable(TABLE_NAME);
Logger.debug("Opening table: {}", table.getName());
Iterator<Row> tableRows = table.iterator();
// Controller.dprint("TaskMdbUpdater", "Updating database: " + db.getFile().getName());
int i=0;
try {
while( tableRows.hasNext() ) {
// Row is basically a Map<Column_Name, Value>
Row cRow = tableRows.next();
Logger.trace("Current row: {}", cRow);
// Controller.dprint(Thread.currentThread().getName(), "Database name: " + db.getFile().getName());
// Controller.dprint("TaskMdbUpdater", "existing row: " + cRow.toString());
String str = cRow.getString(COLUMN_NAME);
Logger.trace("Row {} column field contents (before find/replace): {}", i, str);
String newStr = performFindAndReplaceOnString(str);
Logger.trace("Row {} column field contents (after find/replace): {}", i, newStr);
cRow.put(COLUMN_NAME, newStr);
Logger.debug("Updating field in row {}", i);
Row newRow = table.updateRow(cRow); // updateRow returns the new, updated row. Ignoring this.
Logger.debug("Calling updateRow on table with modified row");
// Controller.dprint("TaskMdbUpdater", "new row: " + newRow.toString());
i++;
Logger.trace("i = {}", i);
}
} catch(NoSuchElementException e) {
// e.printStackTrace();
Logger.error("Thread has iterated past number of rows in table", e);
}
Logger.info("Iterated through {} rows in table {}", i, table.getName());
db.close();
Logger.debug("Closing database: {}", db.getFile().getName());
} catch (Exception e) {
// e.printStackTrace();
Logger.error("An error occurred while attempting to update row value", e);
}
return null;
}
/**
* @see javafx.concurrent.Task#failed()
*/
@Override
protected void failed() {
super.failed();
Logger.error("Task failed");
}
@Override
protected void succeeded() {
Logger.debug("Task succeeded");
}
private String performFindAndReplaceOnString(String str) {
// Logger.trace("OLD: [" + str + "]");
String updatedStr = null;
for(int i=0; i<mapper.getMappings().size(); i++) {
// loop through all parameter names in mapper to search for in str.
updatedStr = findAndReplace(str, this.mapper);
}
// Logger.trace("NEW: [" + updatedStr + "]");
return updatedStr;
}
}
Here's a small excerpt from my log. As you can see, it doesn't seem to do anything after opening the table, which has left me a bit perplexed:
INFO (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Located the following directories under specified MOIS parent which contains an .mdb file:
[01_Parent_All_Safe_Test[ RV_DMS_0041RV_DMS_0001RV_DMS_0003RV_DMS_0005RV_DMS_0007RV_DMS_0012RV_DMS_0013RV_DMS_0014RV_DMS_0016RV_DMS_0017RV_DMS_0018RV_DMS_0020RV_DMS_0023RV_DMS_0025RV_DMS_0028RV_DMS_0029RV_DMS_0031RV_DMS_0033RV_DMS_0034RV_DMS_0035RV_DMS_0036RV_DMS_0038RV_DMS_0039RV_DMS_0040 ]]
...
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Created new task: NAMEMAP.logic.TaskMdbUpdater#4cfe46fe
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Created new worker: Thread[Thread-22,5,main]
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Set worker Thread[Thread-22,5,main] as daemon
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Dispatching worker: Thread[Thread-22,5,main]
...
DEBUG (16-02-2017 17:28:00) [Thread-22] NAMEMAP.logic.TaskMdbUpdater.call(): Opened database: RV_DMS_0023.mdb
DEBUG (16-02-2017 17:28:00) [Thread-22] NAMEMAP.logic.TaskMdbUpdater.call(): Opening table: GCMT_CMT_PROPERTIES
After this point there aren't any more entries in the log, and the processor spikes at 100% load, remaining that way until I force-kill the application. This could mean the program gets stuck in an infinite while loop; but if that were the case, shouldn't there be more log entries in the file?
Update 2
Okay, I've further narrowed down the problem by printing TRACE-level log output to stdout. It seems that my performFindAndReplaceOnString is extremely inefficient, and it never gets past the first row of these dbs because it's just grinding away at the long string. Any suggestions on how I can efficiently perform a string replacement for this use case?
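One redundancy stands out in the posted code and is worth fixing regardless of the final approach: performFindAndReplaceOnString loops over every mapping, but inside that loop it calls findAndReplace, which already iterates over every mapping itself, so each row is scanned on the order of mappings-squared times. A minimal sketch of that change (it only removes the redundant outer loop, nothing else):
private String performFindAndReplaceOnString(String str) {
    // findAndReplace already applies every mapping internally,
    // so one call per string is enough; the old outer loop repeated all of that work once per mapping.
    return StringUtils.findAndReplace(str, this.mapper);
}
Separately, if the mapping keys are plain words rather than regular expressions, switching from replaceAll (regex) to replace (literal) inside findAndReplace would also avoid compiling a regex for every mapping on every row; treat that as an assumed, optional change.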
I have a simple web crawler that is built using the building blocks of crawler4j. I am trying to build a dictionary as my crawler crawls and then pass it to my main (controller) as it builds and parses text. How can I do this, since my MyCrawler object isn't created in my main class (it uses MyCrawler.class as the first parameter)? Also, I am unable to change the controller.start method. I want to be able to use the dictionary created in the crawler after the crawler has finished.
The best way I can think of to do it is to have controller.start take a predefined, already-created MyCrawler object, but there is no way to do this that I can see.
Below is my code. Thank you very much for your help!
Crawler:
public class MyCrawler extends WebCrawler
{
private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|gif|jpg|png|mp3|mp3|zip|gz))$");
public ArrayList<String> dictionary = new ArrayList<String>();
@Override public boolean shouldVisit(Page referringPage, WebURL url)
{
String href = url.getURL().toLowerCase();
return !FILTERS.matcher(href).matches()
&& href.startsWith("http://lyle.smu.edu/~fmoore"));
}
@Override public void visit(Page page)
{
String url = page.getWebURL().getURL();
System.out.println("URL: " + url);
if(page.getParseData() instanceof HtmlParseData)
{
HtmlParseData h = (HtmlParseData)page.getParseData();
String text = h.getText();
String[] words = text.split(" ");
for(int i = 0;i < words.length;i++)
{
if(!words[i].equals("") || !words[i].equals(null) || !words[i].equals("\n"))
dictionary.add(words[i]);
}
String html = h.getHtml();
Set<WebURL> links = h.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
System.out.println(text);
}
}
}
Controller:
public class Controller
{
public ArrayList<String> dictionary = new ArrayList<String>();
public static void main(String[] args) throws Exception
{
int numberOfCrawlers = 1;
String crawlStorageFolder = "/data/crawl/root";
CrawlConfig c = new CrawlConfig();
c.setCrawlStorageFolder(crawlStorageFolder);
c.setMaxDepthOfCrawling(-1); //Unlimited Depth
c.setMaxPagesToFetch(-1); //Unlimited Pages
c.setPolitenessDelay(200); //Politeness Delay
PageFetcher pf = new PageFetcher(c);
RobotstxtConfig robots = new RobotstxtConfig();
RobotstxtServer rs = new RobotstxtServer(robots, pf);
CrawlController controller = new CrawlController(c, pf, rs);
controller.addSeed("http://lyle.smu.edu/~fmoore");
controller.start(MyCrawler.class, numberOfCrawlers);
controller.shutdown();
controller.waitUntilFinish();
}
}
Let a WebCrawlerFactory create your MyCrawler objects. This should do the trick (at least since version 4.2). However, your dictionary should support concurrent access (a simple ArrayList does not!).
// use a factory, instead of supplying the crawler type to pass the dictionary
controller.start(new WebCrawlerFactory<MyCrawler>() {
@Override
public MyCrawler newInstance() throws Exception {
return new MyCrawler(dictionary);
}
}, numberOfCrawlers);
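For the factory to compile, MyCrawler also needs a constructor that accepts the shared dictionary, and the shared list itself should be thread-safe. A minimal sketch of those assumed additions (neither is in the posted code):
// Assumed additions to MyCrawler, not part of the original post:
public class MyCrawler extends WebCrawler {
    private final List<String> dictionary;

    public MyCrawler(List<String> dictionary) {
        this.dictionary = dictionary;
    }

    // visit(Page page) then adds words to this.dictionary exactly as before
}

// In the controller, create the shared, thread-safe list before starting the crawl:
final List<String> dictionary = Collections.synchronizedList(new ArrayList<String>());
Since controller.start(...) blocks until the crawl finishes, the controller can read the collected words from dictionary right after that call.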
I'm creating an RDD in the first part of the application, then converting it to a list using rdd.collect().
But for some reason the list size comes out as 0 in the second part of the application, while the RDD from which I'm creating the list is not empty. Even rdd.toArray() gives an empty list.
Below is my program.
public class Query5kPids implements Serializable{
List<String> ListFromS3 = new ArrayList<String>();
public static void main(String[] args) throws JSONException, IOException, InterruptedException, URISyntaxException {
SparkConf conf = new SparkConf();
conf.setAppName("Spark-Cassandra Integration");
conf.set("spark.cassandra.connection.host", "12.16.193.19");
conf.setMaster("yarn-cluster");
SparkConf conf1 = new SparkConf().setAppName("SparkAutomation").setMaster("yarn-cluster");
Query5kPids app1 = new Query5kPids(conf1);
app1.run1(file);
Query5kPids app = new Query5kPids(conf);
System.out.println("Both RDD has been generated");
app.run();
}
private void run() throws JSONException, IOException, InterruptedException {
JavaSparkContext sc = new JavaSparkContext(conf);
query(sc);
sc.stop();
}
private void run1(File file) throws JSONException, IOException, InterruptedException {
JavaSparkContext sc = new JavaSparkContext(conf);
getData(sc,file);
sc.stop();
}
private void getData(JavaSparkContext sc, File file) {
JavaRDD<String> Data = sc.textFile(file.toString());
System.out.println("RDD Count is " + Data.count());
// here it prints some count value
ListFromS3 = Data.collect();
// ListFromS3 = Data.toArray();
}
private void query(JavaSparkContext sc) {
System.out.println("RDD Count is " + ListFromS3.size());
// Prints 0
// So cant convert the list to RDD
JavaRDD<String> rddFromGz = sc.parallelize(ListFromS3);
}
}
NOTE -> In the actual program, the RDD and the List are of these types:
List<UserSetGet> ListFromS3 = new ArrayList<UserSetGet>();
JavaRDD<UserSetGet> Data = new ....
where UserSetGet is a POJO with setter and getter methods, and it is Serializable.
app1.run1 puts the RDD contents into app1.ListFromS3. Then you look at app.ListFromS3, which is empty. app1.ListFromS3 and app.ListFromS3 are fields on two different objects. Setting one does not set the other.
I think you meant ListFromS3 to be static, meaning it belongs to the Query5kPids class, not to a particular instance. Like this:
static List<String> ListFromS3 = new ArrayList<String>();