I parse some XML files using the XStream library. The result is a map for every file. When I debug, the result map is what I was looking for, but when I step to the next line the value of the map changes out of the blue! Although execution has not yet moved on to the next iteration of the "for" loop, the map already contains the information of the next file. What can cause this?
public class Debug {
public String path = "E:\\Projects\\svn\\FANRPProduction\\WMS\\src\\main\\resources\\wms\\bpmn\\bp";
public XStream xStream;
public Map<String, List<CallActivity>> bpTpMap;
public void findBPandTP() throws IOException {
File root = new File(path);
File[] xmlFiles = FindAllXMLFiles.recursive(root);
bpTpMap=new HashMap<String, List<CallActivity>>();
for (File xml : xmlFiles) {
if (xml != null) {
xStream = new XStream(new StaxDriver());
xStream.alias("definitions", CallActivity.class);
xStream.registerConverter(new CallActivityConverter());
bpTpMap = (Map) xStream.fromXML(xml);//Here I get correct information. For example "WMS_RBP_OutgoingWeighing"
List<String> bpList = new ArrayList<String>(bpTpMap.keySet()); //Here I see the name of the next file in the path in bpTpMap, which is "WMS_BP_WeighingConfiguration"
}
}
}
}
I would suggest debugging with the following:
bpTpMap = (Map) xStream.fromXML(xml);//Here I get correct information. For example "WMS_RBP_OutgoingWeighing"
System.out.println(bpTpMap.size());
Set<String> setOfKeys = bpTpMap.keySet();
System.out.println("Initial value of keys:" + setOfKeys);//Here you would see any extra keys if the map has them
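One thing to keep in mind while printing keys like this: keySet() is a live view of the map, whereas new ArrayList<String>(map.keySet()) is a copy taken at that moment. The sketch below only illustrates that difference with a plain HashMap. It does not diagnose the XStream problem itself, and the class and method names (KeySnapshotDemo, snapshotKeys) are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KeySnapshotDemo {
    /** Returns a copy of the keys that later changes to the map cannot affect. */
    static List<String> snapshotKeys(Map<String, ?> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("WMS_RBP_OutgoingWeighing", 1);

        Set<String> liveView = map.keySet();        // backed by the map itself
        List<String> snapshot = snapshotKeys(map);  // independent copy

        // Simulate some other code mutating the same map instance.
        map.clear();
        map.put("WMS_BP_WeighingConfiguration", 2);

        System.out.println("live view: " + liveView);
        System.out.println("snapshot:  " + snapshot);
    }
}
```

If the snapshot and the live view disagree when you print them later, something has modified the same map instance in between.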
I'm trying to write a function that reads a specific directory and builds a JSON file from the names of the files in it.
It reads the file names correctly, but when I print the result, the value is overwritten again and again.
I need the same key name with different values.
Is there any way to put a number on the key name, or to use the same key more than once?
bullet01.png
{"file":"bullet01.png"}
bullet011.png
{"file":"bullet011.png"}
bullet012.png
{"file":"bullet012.png"}
bullet013.png
{"file":"bullet013.png"}
bullet02.png
{"file":"bullet02.png"}
This is the full code:
public void downloadFile(ViewMeta view) throws IOException {
DataSet input = view.getInputDataSet();
HttpServletRequest request = view.getHttpServletRequest();
String filePath = request.getServletContext().getRealPath("/curriculum1.4/filedir");
DataSet output = new DataSet();
File dir = new File(filePath);
String files[] = dir.list();
JSONObject data= new JSONObject();
for(String fn : files) {
System.out.println(fn);
data.put("file", fn);
System.out.println(data);
}
view.setAttribute("file", data);
view.printJSON();
}
This is the structure of setAttribute:
public void setAttribute(String key, Object val) {
if (this.keyList == null) {
this.keyList = new ArrayList();
}
this.keyList.add(key);
this.request.setAttribute(key, val);
this.request.setAttribute("coreframe.object.keyList", this.keyList);
}
If you want the same key with different values, you can use the JSONArray format:
[{"file" : "bullet01.png"}, {"file" : "bullet02.png"}, {"file" : "bullet03.png"}]
Your code may need to change like this:
.....
JSONArray array = new JSONArray();
for(String fn : files) {
//create json object for each file
JSONObject data= new JSONObject();
System.out.println(fn);
data.put("file", fn);
System.out.println(data);
//put json object into json array
array.put(data);
}
view.setAttribute("file", array);
view.printJSON();
No, the keys in a JSON object should be unique. You can try appending numbers to the end of the key "file".
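A minimal sketch of the numbered-key idea, using only a LinkedHashMap so it runs without any JSON library. The names NumberedKeys and numberKeys are made up for the example, and the resulting map would still have to be copied into whatever JSON object type you are using.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NumberedKeys {
    /** Builds prefix1, prefix2, ... keys so that every file name keeps its own entry. */
    static Map<String, String> numberKeys(String prefix, String[] fileNames) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps insertion order
        for (int i = 0; i < fileNames.length; i++) {
            result.put(prefix + (i + 1), fileNames[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] files = {"bullet01.png", "bullet011.png", "bullet02.png"};
        // prints {file1=bullet01.png, file2=bullet011.png, file3=bullet02.png}
        System.out.println(numberKeys("file", files));
    }
}
```

Because each key is different, a later put no longer replaces the earlier value, which is what happened with the single "file" key.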
I'm currently having the following issue:
I wish to loop through a set of JSON files. I want to filter out certain JSON files that match a filter. This filter is another JSON object.
MongoDB is able to do this; you give a JSON object as parameter and it will list documents containing the given JSON elements.
I need a flat-file version of this, but I have not been able to get it working. I'm using GSON as my JSON library.
Consumes an array of file paths that each contain a JSON string, and a JsonObject that represents the filter rules. Returns a list of file paths that match the filter rules.
public List<String> filter(String[] filePaths, JsonObject rules) throws FileNotFoundException {
final List<String> filtered = new ArrayList<String>();
final Set<Map.Entry<String, JsonElement>> rulesEntries = rules.entrySet();
for (String path : filePaths) {
final Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File(path))));
final JsonObject file = jsonParser.parse(reader).getAsJsonObject();
final Set<Map.Entry<String, JsonElement>> fileEntries = file.entrySet();
if (fileEntries.containsAll(rulesEntries)) filtered.add(path);
}
return filtered;
}
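For comparison, here is a rough sketch of the kind of matching I am after, written against plain java.util.Map values instead of GSON types so that it is self-contained. My assumption is that containsAll on the entry sets only succeeds when each top-level value is equal as a whole, so a rule that names only part of a nested object would not match. The names SubsetFilter and matches are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class SubsetFilter {
    /**
     * Returns true when every key in rules exists in doc with a matching value.
     * Nested maps are compared recursively, so a rule may name only part of a nested object.
     */
    @SuppressWarnings("unchecked")
    static boolean matches(Map<String, ?> doc, Map<String, ?> rules) {
        for (Map.Entry<String, ?> rule : rules.entrySet()) {
            if (!doc.containsKey(rule.getKey())) {
                return false;
            }
            Object expected = rule.getValue();
            Object actual = doc.get(rule.getKey());
            if (expected instanceof Map && actual instanceof Map) {
                // Recurse: the rule only has to be a subset of the nested object.
                if (!matches((Map<String, ?>) actual, (Map<String, ?>) expected)) {
                    return false;
                }
            } else if (!Objects.equals(expected, actual)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Utrecht");
        address.put("zip", "3500");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "example");
        doc.put("address", address);

        Map<String, Object> nestedRule = new LinkedHashMap<>();
        nestedRule.put("city", "Utrecht");
        Map<String, Object> rules = new LinkedHashMap<>();
        rules.put("address", nestedRule);

        System.out.println(matches(doc, rules)); // prints true
    }
}
```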
I am new to HDInsight and the Hadoop MapReduce concept. I am trying to merge multiple XML files into a single XML file using a MapReduce program. My intention is to merge each XML file into a destination XML file, wrapping it in a start and end tag named after the file.
For example, the XML files below should be merged into the single XML file shown after them.
Input XML Files
<xml><a></a></xml>
<xml><b></b></xml>
<xml><c></c></xml>
Output XML File
<xml>
<File1Name><xml><a></a></xml></File1Name>
<File2Name><xml><b></b></xml></File2Name>
<File3Name><xml><c></c></xml></File3Name>
</xml>
Question 1: Is it possible to give each mapper one XML file and emit a key-value pair, with the file name as the key and the XML content wrapped in file-name start and end tags as the value, and then have the reducer merge all the XML into a single context and output the XML shown above?
Question 2: How can I get the file name as the key in the mapper code?
Answer 1:
I don't suggest sending just a single XML file to a mapper unless the files are over 1 GB apiece. You can send a list of XML locations to your mapper, and then in your mapper code open each location and extract the data into your output.
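The wrapping step itself is small. Below is a minimal sketch in plain Java with no Hadoop types, assuming the file contents have already been read into strings and that each file name is a valid XML element name. XmlMergeSketch, wrap and merge are names made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class XmlMergeSketch {
    /** Wraps one file's XML in a start and end tag named after the file. */
    static String wrap(String fileName, String xml) {
        return "<" + fileName + ">" + xml + "</" + fileName + ">";
    }

    /** Joins all wrapped files under a single root element, one file per line. */
    static String merge(Map<String, String> xmlByFileName) {
        StringBuilder out = new StringBuilder("<xml>\n");
        for (Map.Entry<String, String> entry : xmlByFileName.entrySet()) {
            out.append(wrap(entry.getKey(), entry.getValue())).append('\n');
        }
        return out.append("</xml>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>(); // keeps input order
        files.put("File1Name", "<xml><a></a></xml>");
        files.put("File2Name", "<xml><b></b></xml>");
        System.out.println(merge(files));
    }
}
```

In a real job the loop body would live in the mapper or reducer, with the file name coming from the key.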
Answer 2:
If you are using Azure Blob Storage, you could list all the blobs in a container and assign them to the input split.
How to create your list of InputSplits:
ArrayList<InputSplit> ret = new ArrayList<InputSplit>();
/*Do this for each path we receive. Creates a directory of splits in this order s = input path (S1,1),(s2,1)…(sN,1),(s1,2),(sN,2),(sN,3) etc..
*/
for (int i = numMinNameHashSplits; i <= Math.min(numMaxNameHashSplits,numNameHashSplits-1); i++) {
for (Path inputPath : inputPaths) {
ret.add(new ParseDirectoryInputSplit(inputPath.toString(), i));
System.out.println(i + " "+inputPath.toString());
}
}
return ret;
}
}
Once the List<InputSplit> is assembled, each InputSplit is handed to a RecordReader class, where each key-value pair is read and then passed to the map task. The initialization of the RecordReader class uses the InputSplit, a string representing the location of a "folder" of invoices in blob storage, to return a list of all blobs within the folder (the blobs variable below). The Java code below demonstrates the creation of the record reader for each hash slot and the resulting list of blobs in that location.
public class ParseDirectoryFileNameRecordReader
extends RecordReader<IntWritable, Text> {
private int nameHashSlot;
private int numNameHashSlots;
private Path myDir;
private Path currentPath;
private Iterator<ListBlobItem> blobs;
private int currentLocation;
public void initialize(InputSplit split, TaskAttemptContext context)
throws IOException, InterruptedException {
myDir = ((ParseDirectoryInputSplit)split).getDirectoryPath();
//getNameHashSlot tells us which slot this record reader is responsible for
nameHashSlot = ((ParseDirectoryInputSplit)split).getNameHashSlot();
//gets the total number of hashslots
numNameHashSlots = getNumNameHashSplits(context.getConfiguration());
//gets the input credentials to the storage account assigned to this record reader.
String inputCreds = getInputCreds(context.getConfiguration());
//break the directory path to get account name
String[] authComponents = myDir.toUri().getAuthority().split("@");
String accountName = authComponents[1].split("\\.")[0];
String containerName = authComponents[0];
String accountKey = Utils.returnInputkey(inputCreds, accountName);
System.out.println("This mapper is assigned the following account:"+accountName);
StorageCredentials creds = new StorageCredentialsAccountAndKey(accountName,accountKey);
CloudStorageAccount account = new CloudStorageAccount(creds);
CloudBlobClient client = account.createCloudBlobClient();
CloudBlobContainer container = client.getContainerReference(containerName);
blobs = container.listBlobs(myDir.toUri().getPath().substring(1) + "/", true,EnumSet.noneOf(BlobListingDetails.class), null,null).iterator();
currentLocation = -1;
return;
}
Once initialized, the record reader is used to pass the next key to the map task. This is controlled by the nextKeyValue method, which is called every time the map task starts. The Java code below demonstrates this.
//This checks if the next key value is assigned to this task or is assigned to another mapper. If it is assigned to this task the location is passed to the mapper, otherwise return false
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
while (blobs.hasNext()) {
ListBlobItem currentBlob = blobs.next();
//Returns a number between 1 and number of hashslots. If it matches the number assigned to this Mapper and its length is greater than 0, return the path to the map function
if (doesBlobMatchNameHash(currentBlob) && getBlobLength(currentBlob) > 0) {
String[] pathComponents = currentBlob.getUri().getPath().split("/");
String pathWithoutContainer =
currentBlob.getUri().getPath().substring(pathComponents[1].length() + 1);
currentPath = new Path(myDir.toUri().getScheme(), myDir.toUri().getAuthority(),pathWithoutContainer);
currentLocation++;
return true;
}
}
return false;
}
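The body of doesBlobMatchNameHash is not shown above. As a rough illustration of the general idea only, a name-based slot check could look like the sketch below. This is an assumption, not the actual implementation: it numbers slots from 0 and uses String.hashCode, and NameHashSlots, slotFor and matchesSlot are made-up names.

```java
final class NameHashSlots {
    /** Maps a blob name to a slot in the range 0 .. numSlots - 1. */
    static int slotFor(String blobName, int numSlots) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(blobName.hashCode(), numSlots);
    }

    /** True when the blob belongs to the slot handled by this record reader. */
    static boolean matchesSlot(String blobName, int mySlot, int numSlots) {
        return slotFor(blobName, numSlots) == mySlot;
    }

    public static void main(String[] args) {
        String[] names = {"invoices/a.xml", "invoices/b.xml", "invoices/c.xml"};
        int numSlots = 4;
        for (String name : names) {
            System.out.println(name + " -> slot " + slotFor(name, numSlots));
        }
    }
}
```

Every record reader lists the same blobs but keeps only those whose slot equals its own, so each blob ends up with exactly one mapper.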
The logic in the map function is then simply as follows, with inputStream containing the entire XML string:
Path inputFile = new Path(value.toString());
FileSystem fs = inputFile.getFileSystem(context.getConfiguration());
//Input stream contains all data from the blob in the location provided by Text
FSDataInputStream inputStream = fs.open(inputFile);
Resources:
http://www.andrewsmoll.com/3-hacks-for-hadoop-and-hdinsight-clusters/ "Hack 3"
http://blogs.msdn.com/b/mostlytrue/archive/2014/04/10/merging-small-files-on-hdinsight.aspx
I've got a YAML file that looks like this:
---
name:
storage:
documentfiles:
username: rafa
password: hello
And I'm trying to get the last two values, username and password. My current code is below. I'm using a Map to store the YAML values, but since the values are nested more than one level deep, calling map.get() with anything past name gives me null. If I do map.get("name") I get {storage={documentfiles={username=rafa, password=hello}}}. Does anyone know how I can correctly get the username and password?
public Map grabYaml(){
Yaml reader = new Yaml();
InputStream inputStream = getClass().getClassLoader().getResourceAsStream(yamlFileName);
Map map = (Map) reader.load(inputStream);
return map;
}
Something like this:
public class Test {
public Map grabYaml() throws IOException {
Yaml reader = new Yaml();
InputStream inputStream = new FileInputStream(new File(yamlFileName));
Map map = (Map) reader.load(inputStream);
return map;
}
public static void main(String[] args) throws IOException {
// Each get() returns the map one level further down the YAML tree.
Map name = (Map) new Test().grabYaml().get("name");
Map storage = (Map) name.get("storage");
Map documentfiles = (Map) storage.get("documentfiles");
System.out.println(documentfiles.get("username"));
System.out.println(documentfiles.get("password"));
}
}
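If you need to do this in several places, the chain of casts can be folded into a small helper. This is only a sketch: it builds the nested maps by hand instead of loading the YAML file so that it runs without SnakeYAML on the classpath, and NestedLookup and getPath are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NestedLookup {
    /** Walks nested maps key by key; returns null if any level is missing or is not a map. */
    static Object getPath(Map<?, ?> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Same shape as the YAML in the question, built by hand for the example.
        Map<String, Object> documentfiles = new LinkedHashMap<>();
        documentfiles.put("username", "rafa");
        documentfiles.put("password", "hello");
        Map<String, Object> storage = new LinkedHashMap<>();
        storage.put("documentfiles", documentfiles);
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("storage", storage);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("name", name);

        System.out.println(getPath(root, "name", "storage", "documentfiles", "username")); // prints rafa
    }
}
```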
I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<BatchOrders xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<BatchHeader>
<ServiceProvider>123456789</ServiceProvider>
</BatchHeader>
<OrderDetails>
<MessageType>HelloWorld</MessageType>
<IssueDateTime>22/01/2012 00:00:00</IssueDateTime>
<receivedDateTime>22/01/2012 00:00:00</receivedDateTime>
<Status>TestStatus</Status>
</OrderDetails>
</BatchOrders>
I want to read in the contents and set them on fields I have created. So I have the following code below (note that some is omitted; I have just included what I think I need to show). The code below is in a test class which I have created. I also have a writer as part of this class that writes an XML file to disk as I expect. The problem I am facing is reading the file above and displaying the contents on the console, just for now.
File myFileRead = null;
FileReader myFileReader = null;
try {
myFileRead = new File("C:/Path/myfile.xml");
myRecord = new myRecord();
myFileReader = new FileReader(myFileRead);
myXPathReader reader = new myXPathReader(myFileReader);
while (reader.hasNext())
{
record = reader.next();
//then prints out to the console
}
So from the above, I have the myRecord class, where I have the getters/setters for e.g. ServiceProvider. I also have a class, myXPathReader, which does the following:
private Document document;
private List batchorders;
private Iterator iterator;
public myXPathReader (Reader myFileReader)
throws Exception
{
SAXBuilder builder = new SAXBuilder();
document = builder.build(myFileReader);
batchorders = new JDOMXPath("//BatchOrders").selectNodes(document);
iterator = batchorders.iterator();
}
public int getSize() { return batchorders.size(); }
public boolean hasNext() { return iterator.hasNext(); }
public myRecord next()
throws Exception {
Element element = (Element) iterator.next();
myRecord record = new myRecord();
record.setServiceProvider((new JDOMXPath("./ServiceProvider").stringValueOf(element)));
//Some more sets and close class etc...
Now if I debug the code, after the iterator.next() call on the element I can see the file contents have been read in correctly. But on my console the ServiceProvider value, and in fact all the values, are set to the empty string "". Am I doing something incorrect with the JDOMXPath in order to pull the value from the XML?
In your example XML, ServiceProvider is not a child of BatchOrders; there's another level (BatchHeader) in between. So your second XPath expression should probably be
BatchHeader/ServiceProvider
instead of ./ServiceProvider
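To see the difference between the two expressions, here is a small self-contained check. Note that it uses the XPath support built into the JDK (javax.xml.xpath) rather than JDOM and Jaxen, purely so that it runs without extra libraries, and it uses a trimmed copy of the XML from the question. RelativeXPathDemo and evaluateFromBatchOrders are made-up names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RelativeXPathDemo {
    static final String XML =
            "<BatchOrders><BatchHeader><ServiceProvider>123456789</ServiceProvider>"
            + "</BatchHeader></BatchOrders>";

    /** Evaluates an XPath expression relative to the BatchOrders element. */
    static String evaluateFromBatchOrders(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node batchOrders = (Node) xpath.evaluate("//BatchOrders", doc, XPathConstants.NODE);
            // The string value of an empty node set is the empty string.
            return xpath.evaluate(expression, batchOrders);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + evaluateFromBatchOrders("./ServiceProvider") + "]");           // prints []
        System.out.println("[" + evaluateFromBatchOrders("BatchHeader/ServiceProvider") + "]"); // prints [123456789]
    }
}
```

The first expression looks only at direct children of BatchOrders, finds nothing, and so yields the empty string, which matches what you are seeing.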