PIG Custom loader's getNext() is being called again and again - java

I have started working with Apache Pig for one of our projects. I have to create a custom input format to load our data files. For this, I followed this example: Hadoop: Custom Input Format. I also created my custom RecordReader implementation to read the data (we get our data in binary format from another application) and parse it into proper JSON format.
The problem occurs when I use my custom loader in a Pig script. As soon as my loader's getNext() method is invoked, it calls my custom RecordReader's nextKeyValue() method, which works fine. It reads the data properly and passes it back to my loader, which parses the data and returns a Tuple. So far so good.
The problem arises when my loader's getNext() method is called again and again. Each call works fine and returns the proper output (I debugged it up to the return statement), but instead of the execution moving on, my loader gets called again. I tried to count the number of times my loader is called, and I saw the number go up to 20K!
Can somebody please help me understand the problem in my code?
Loader
public class SimpleTextLoaderCustomFormat extends LoadFunc {
    protected RecordReader in = null;
    private byte fieldDel = '\t';
    private ArrayList<Object> mProtoTuple = null;
    private TupleFactory mTupleFactory = TupleFactory.getInstance();

    @Override
    public Tuple getNext() throws IOException {
        Tuple t = null;
        try {
            boolean notDone = in.nextKeyValue();
            if (!notDone) {
                return null;
            }
            String value = (String) in.getCurrentValue();
            byte[] buf = value.getBytes();
            int len = value.length();
            int start = 0;
            for (int i = 0; i < len; i++) {
                if (buf[i] == fieldDel) {
                    readField(buf, start, i);
                    start = i + 1;
                }
            }
            // pick up the last field
            readField(buf, start, len);
            t = mTupleFactory.newTupleNoCopy(mProtoTuple);
            mProtoTuple = null;
        } catch (InterruptedException e) {
            int errCode = 6018;
            String errMsg = "Error while reading input";
            e.printStackTrace();
            throw new ExecException(errMsg, errCode,
                    PigException.REMOTE_ENVIRONMENT, e);
        }
        return t;
    }

    private void readField(byte[] buf, int start, int end) {
        if (mProtoTuple == null) {
            mProtoTuple = new ArrayList<Object>();
        }
        if (start == end) {
            // NULL value
            mProtoTuple.add(null);
        } else {
            mProtoTuple.add(new DataByteArray(buf, start, end));
        }
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        //return new TextInputFormat();
        return new CustomStringInputFormat();
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split)
            throws IOException {
        in = reader;
    }
}
Custom InputFormat
public class CustomStringInputFormat extends FileInputFormat<String, String> {

    @Override
    public RecordReader<String, String> createRecordReader(InputSplit arg0,
            TaskAttemptContext arg1) throws IOException, InterruptedException {
        return new CustomStringInputRecordReader();
    }
}
Custom RecordReader
public class CustomStringInputRecordReader extends RecordReader<String, String> {
    private String fileName = null;
    private String data = null;
    private Path file = null;
    private Configuration jc = null;
    private static int count = 0;

    @Override
    public void close() throws IOException {
        // jc = null;
        // file = null;
    }

    @Override
    public String getCurrentKey() throws IOException, InterruptedException {
        return fileName;
    }

    @Override
    public String getCurrentValue() throws IOException, InterruptedException {
        return data;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return 0;
    }

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException, InterruptedException {
        FileSplit split = (FileSplit) genericSplit;
        file = split.getPath();
        jc = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        InputStream is = FileSystem.get(jc).open(file);
        StringWriter writer = new StringWriter();
        IOUtils.copy(is, writer, "UTF-8");
        data = writer.toString();
        fileName = file.getName();
        writer.close();
        is.close();
        System.out.println("Count : " + ++count);
        return true;
    }
}

Try this in the Loader:
//....
boolean notDone = ((CustomStringInputRecordReader) in).nextKeyValue();
//...
Text value = new Text(((CustomStringInputRecordReader) in).getCurrentValue().toString());
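Also worth checking: the record reader above never signals end-of-input. nextKeyValue() unconditionally returns true, so the loader's getNext() can never return null and Pig keeps asking for more tuples, which would explain the 20K calls. A minimal sketch of a fix, assuming each split is a single file that should be read exactly once (the processed flag is new, not in the original code):

private boolean processed = false;   // new field: has this split been consumed?

@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
    if (processed) {
        return false;                // end of split: getNext() then returns null
    }
    InputStream is = FileSystem.get(jc).open(file);
    StringWriter writer = new StringWriter();
    IOUtils.copy(is, writer, "UTF-8");
    data = writer.toString();
    fileName = file.getName();
    writer.close();
    is.close();
    processed = true;                // the next call reports that the split is done
    return true;
}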

Related

How can we read the request body in a filter without affecting the original request in Java?

(Java ver. 8)
I need to process the request body in a filter. Using the below code, I read the body.
private static String convertInputStreamToString(InputStream is) throws IOException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024 * 50];
    int length;
    while ((length = is.read(buffer)) != -1) {
        result.write(buffer, 0, length);
    }
    return result.toString("UTF-8");
}
The issue is that if parameters are posted in the request body with the content type "application/x-www-form-urlencoded", then the parameters are no longer available after reading the body. They are available via request.getParameter() only if I don't read the body.
Moreover, I tried using the code below to wrap the request and provide the body, so it would be available to the rest of the solution (e.g. servlets), but the problem of losing the parameters still happens. The code is copied/adapted from this post:
public class RequestWrapper extends HttpServletRequestWrapper {
    private final String body;

    public RequestWrapper(HttpServletRequest request) throws IOException {
        super(request);
        body = convertInputStreamToString(request.getInputStream());
    }

    private static String convertInputStreamToString(InputStream is) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024 * 50];
        int length;
        while ((length = is.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }
        return result.toString("UTF-8");
    }

    @Override
    public ServletInputStream getInputStream() throws IOException {
        final byte[] myBytes = body.getBytes("UTF-8");
        ServletInputStream servletInputStream = new ServletInputStream() {
            private int lastIndexRetrieved = -1;
            private ReadListener readListener = null;

            @Override
            public boolean isFinished() {
                return (lastIndexRetrieved == myBytes.length - 1);
            }

            @Override
            public boolean isReady() {
                return isFinished();
            }

            @Override
            public void setReadListener(ReadListener readListener) {
                this.readListener = readListener;
                if (!isFinished()) {
                    try {
                        readListener.onDataAvailable();
                    } catch (IOException e) {
                        readListener.onError(e);
                    }
                } else {
                    try {
                        readListener.onAllDataRead();
                    } catch (IOException e) {
                        readListener.onError(e);
                    }
                }
            }

            @Override
            public int read() throws IOException {
                int i;
                if (!isFinished()) {
                    // mask to keep bytes >= 0x80 from coming back negative
                    i = myBytes[lastIndexRetrieved + 1] & 0xFF;
                    lastIndexRetrieved++;
                    if (isFinished() && (readListener != null)) {
                        try {
                            readListener.onAllDataRead();
                        } catch (IOException ex) {
                            readListener.onError(ex);
                            throw ex;
                        }
                    }
                    return i;
                } else {
                    return -1;
                }
            }
        };
        return servletInputStream;
    }

    @Override
    public BufferedReader getReader() throws IOException {
        return new BufferedReader(new InputStreamReader(this.getInputStream()));
    }
}
I tried to run the code you mentioned you're using, and I think the accepted answer may not solve your issue, as it's quite old. It seems you also need to override the getParameter, getParameterMap and getParameterValues methods. I tried that based on this answer from the same post, and it seems to work. Here is the code:
public class MultiReadHttpServletRequest extends HttpServletRequestWrapper {
    private ByteArrayOutputStream cachedBytes;
    private String body;
    private Map<String, String[]> parameterMap;

    public MultiReadHttpServletRequest(HttpServletRequest request) throws IOException {
        super(request);
        parameterMap = super.getParameterMap();
        cacheBodyAsString();
        System.out.println("The Body read into a String is: " + body);
    }

    @Override
    public ServletInputStream getInputStream() throws IOException {
        if (cachedBytes == null)
            cacheInputStream();
        return new CachedServletInputStream(cachedBytes.toByteArray());
    }

    @Override
    public BufferedReader getReader() throws IOException {
        return new BufferedReader(new InputStreamReader(getInputStream()));
    }

    @Override
    public String getParameter(String key) {
        Map<String, String[]> parameterMap = getParameterMap();
        String[] values = parameterMap.get(key);
        return values != null && values.length > 0 ? values[0] : null;
    }

    @Override
    public String[] getParameterValues(String key) {
        Map<String, String[]> parameterMap = getParameterMap();
        return parameterMap.get(key);
    }

    @Override
    public Map<String, String[]> getParameterMap() {
        return parameterMap;
    }

    private void cacheInputStream() throws IOException {
        // Cache the inputstream in order to read it multiple times
        cachedBytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024 * 50];
        int length;
        InputStream is = super.getInputStream();
        while ((length = is.read(buffer)) != -1) {
            cachedBytes.write(buffer, 0, length);
        }
    }

    private void cacheBodyAsString() throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024 * 50];
        int length;
        InputStream is = getInputStream();
        while ((length = is.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }
        body = result.toString("UTF-8");
    }
}

public class CachedServletInputStream extends ServletInputStream {
    private final ByteArrayInputStream buffer;

    public CachedServletInputStream(byte[] contents) {
        this.buffer = new ByteArrayInputStream(contents);
    }

    @Override
    public int read() {
        return buffer.read();
    }

    @Override
    public boolean isFinished() {
        return buffer.available() == 0;
    }

    @Override
    public boolean isReady() {
        return true;
    }

    @Override
    public void setReadListener(ReadListener listener) {
        throw new RuntimeException("Not implemented");
    }
}
This is just a sample implementation. I highly recommend following the steps specified in the answer mentioned above, as it is newer and also ensures that the parameters are read from both the body and the query string. My code is just a sample sketch to see if it works as expected.
Thank you @zaerymoghaddam for helping with this.
I was concerned that I might be affecting the request object implicitly, leaving the rest of the solution lacking something.
Moreover, I found that parameterMap = super.getParameterMap(); does not include the parameters from the body (in the case of a POST with content type "application/x-www-form-urlencoded").
With a little change to your code, I came up with the solution below:
public class MyRequestWrapper extends HttpServletRequestWrapper {
    private ByteArrayOutputStream cachedBytes;
    private String body;
    private Map<String, String[]> parameterMap;
    private static int bufferLength = 1024 * 50;

    public MyRequestWrapper(final HttpServletRequest request) throws IOException {
        super(request);
        cacheBodyAsString();
        parameterMap = new HashMap<>(super.getParameterMap());
        addParametersFromBody();
    }

    @Override
    public ServletInputStream getInputStream() throws IOException {
        return new CachedServletInputStream(cachedBytes.toByteArray());
    }

    @Override
    public BufferedReader getReader() throws IOException {
        return new BufferedReader(new InputStreamReader(this.getInputStream()));
    }

    public String GetRequestBodyAsString() {
        return this.body;
    }

    @Override
    public String getParameter(String key) {
        Map<String, String[]> parameterMap = getParameterMap();
        String[] values = parameterMap.get(key);
        return values != null && values.length > 0 ? values[0] : null;
    }

    @Override
    public String[] getParameterValues(String key) {
        Map<String, String[]> parameterMap = getParameterMap();
        return parameterMap.get(key);
    }

    @Override
    public Map<String, String[]> getParameterMap() {
        return parameterMap;
    }

    private void cacheInputStream() throws IOException {
        cachedBytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferLength];
        int length;
        InputStream is = super.getInputStream();
        while ((length = is.read(buffer)) != -1) {
            cachedBytes.write(buffer, 0, length);
        }
    }

    private void cacheBodyAsString() throws IOException {
        if (cachedBytes == null)
            cacheInputStream();
        this.body = cachedBytes.toString("UTF-8");
    }

    private void addParametersFromBody() {
        if (this.body == null || this.body.isEmpty())
            return;
        for (String param : this.body.split("&")) {
            // split on the first '=' only, and guard against parameters without a value;
            // note the values are still URL-encoded (URLDecoder.decode could be applied here)
            String[] pair = param.split("=", 2);
            String[] value = { pair.length > 1 ? pair[1] : "" };   // fresh array per parameter
            parameterMap.putIfAbsent(pair[0], value);
        }
    }

    class CachedServletInputStream extends ServletInputStream {
        private final ByteArrayInputStream buffer;

        public CachedServletInputStream(byte[] contents) {
            this.buffer = new ByteArrayInputStream(contents);
        }

        @Override
        public int read() {
            return buffer.read();
        }

        @Override
        public boolean isFinished() {
            return buffer.available() == 0;
        }

        @Override
        public boolean isReady() {
            return true;
        }

        @Override
        public void setReadListener(ReadListener listener) {
            throw new RuntimeException("Not implemented");
        }
    }
}
Strangely, HttpServletRequest content may only be read once: it comes as a stream, so once you read the stream it is gone. So you need some wrapper that allows multiple reads. Spring actually provides such a wrapper; the class is ContentCachingRequestWrapper. Here is its Javadoc. And here is the answer that explains how to use it if you work with Spring Boot: How to get request body params in spring filter?
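For completeness, a minimal filter sketch using that Spring class (my example, assuming spring-web on the classpath and a javax.servlet container). One caveat: ContentCachingRequestWrapper only caches whatever the downstream chain actually reads, so the body is inspected after chain.doFilter returns:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.filter.OncePerRequestFilter;
import org.springframework.web.util.ContentCachingRequestWrapper;

public class BodyLoggingFilter extends OncePerRequestFilter {
    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
            FilterChain chain) throws ServletException, IOException {
        ContentCachingRequestWrapper wrapped = new ContentCachingRequestWrapper(request);
        // let the rest of the chain consume (and thereby cache) the stream first
        chain.doFilter(wrapped, response);
        // the cache now holds whatever was actually read downstream (assuming UTF-8 here)
        String body = new String(wrapped.getContentAsByteArray(), StandardCharsets.UTF_8);
        System.out.println("Request body: " + body);
    }
}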

invoke a get/set static operation from two different classes

I have 3 classes: Regulate, Luminosity, and Test.
From the Regulate class, I wish to set an attribute in the Luminosity class by invoking the method setAttribute.
Then in the Test class, I call the method getAttribute.
The problem is that when I call getAttribute, I find a different value from the one I set.
This is the Luminosity class:
public class Luminosity {
    public static int attribute;

    public static int getAttribute() {
        return attribute;
    }

    public static void setAttribute(int v) {
        attribute = v;
        try {
            File fichier = new File("../../WorkspaceSCA/Lamp/value.txt");
            PrintWriter pw = new PrintWriter(new FileWriter(fichier));
            String ch = Integer.toString(attribute);
            pw.append(ch);
            pw.println();
            pw.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The Regulate code:
public class Regulate {
    public static void main(String[] args) throws InterruptedException {
        Luminosity.setAttribute(50);
        System.out.println("Value of Luminosity = " + Luminosity.getAttribute());
    }
}
this shows me: Value of Luminosity = 50
Now, I want to recover this value from a different class (Test), like this:
public class Test {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Value = " + Luminosity.getAttribute());
    }
}
this shows me: Value = 0
I want to recover the same value.
Thanks in advance.
You are starting two different classes in two different JVM processes.
Of course Luminosity doesn't have the previous value; it was set in a different JVM.
If you want to set an attribute and transfer it between two processes, you can place it in a text file.
public class Luminosity {
    private static final String FILE_NAME = "attribute.txt";
    private int attribute;

    public void writeAttribute(int val) throws IOException {
        try (FileWriter fileWriter = new FileWriter(FILE_NAME)) {
            fileWriter.append("" + val);
            fileWriter.flush();
        }
        attribute = val;
    }

    public int readAttribute() throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileReader fileReader = new FileReader(FILE_NAME)) {
            int r;
            while (true) {
                char[] buffer = new char[100];
                r = fileReader.read(buffer);
                if (r == -1) break;
                sb.append(new String(Arrays.copyOf(buffer, r)));
            }
        } catch (FileNotFoundException e) {
            return 0;
        }
        if (sb.length() == 0) return 0;
        return Integer.parseInt(sb.toString());
    }

    public static void main(String[] args) throws IOException {
        Luminosity luminosity = new Luminosity();
        System.out.println("attribute after start: " + luminosity.readAttribute());
        luminosity.writeAttribute(50);
        System.out.println("new attribute: " + luminosity.readAttribute());
    }
}
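A minimal sketch of how the two entry points could then share the value (my example, assuming both programs run from the same working directory so they see the same attribute.txt):

public class Regulate {
    public static void main(String[] args) throws IOException {
        new Luminosity().writeAttribute(50);   // persists the value to attribute.txt
    }
}

public class Test {
    public static void main(String[] args) throws IOException {
        // runs in its own JVM, but reads the value Regulate persisted
        System.out.println("Value = " + new Luminosity().readAttribute());
    }
}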

Get progress information during JAXB de-/serialization

Is there a way to register some progress monitor on JAXB Marshaller and Unmarshaller?
I would like to show some progress information in my GUI while data is de-/serialized.
I see that you can set an Unmarshaller.Listener and a Marshaller.Listener, which have a "before" and an "after" method. Nevertheless, I do not see any straightforward way to get the total number of elements to serialize.
I would obviously need that to calculate some "percentage done" info.
Is it ok to parse before unmarshalling?
If so, assuming you have a list of objects, you could do something like...
final String tagName = *** name of tag you are counting ***;
InputStream in = *** stream of your xml ***;
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
final AtomicInteger counter = new AtomicInteger();
saxParser.parse(in, new DefaultHandler() {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // the default (non-namespace-aware) parser reports the element name in qName
        if (qName.equals(tagName))
            counter.incrementAndGet();
    }
});
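The count can then drive a percentage during the actual unmarshalling via an Unmarshaller.Listener. A rough sketch (MyRoot and MyElement are hypothetical JAXB classes standing in for your own; afterUnmarshal fires once per unmarshalled object):

final int total = counter.get();                  // from the SAX pre-count above
final AtomicInteger done = new AtomicInteger();
Unmarshaller um = JAXBContext.newInstance(MyRoot.class).createUnmarshaller();
um.setListener(new Unmarshaller.Listener() {
    @Override
    public void afterUnmarshal(Object target, Object parent) {
        if (target instanceof MyElement) {        // the type mapped to tagName
            System.out.printf("%.0f%%%n", 100d * done.incrementAndGet() / total);
        }
    }
});
MyRoot root = (MyRoot) um.unmarshal(*** stream of your xml ***);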
Would a more low-level approach, leveraging the InputStream, be an acceptable solution?
E.g.:
import java.io.IOException;
import java.io.InputStream;
import java.util.function.DoubleConsumer;

public class InputStreamWithProgressDecorator extends InputStream {
    /** Input stream to be decorated */ private final InputStream inputStream;
    /** Amount of bytes read */ private long position = 0L;
    /** File size */ private final long length;
    /** Position at the last mark() call */ private long mark = 0;
    /** Consumer of the progress */ private final DoubleConsumer callBack;

    public InputStreamWithProgressDecorator(final InputStream is, final long l, final DoubleConsumer cb) {
        inputStream = is;
        length = l;
        callBack = cb;
    }

    private void setPosition(final long fp) {
        position = fp;
        callBack.accept(getProgress());
    }

    public double getProgress() {
        return length == 0L ? 100d : ((double) position) * 100d / ((double) length);
    }

    public long getPosition() {
        return position;
    }

    @Override
    public int read(byte[] b) throws IOException {
        final int rc = inputStream.read(b);
        if (rc > 0) setPosition(position + rc);   // rc is -1 at end of stream
        return rc;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        final int rc = inputStream.read(b, off, len);
        if (rc > 0) setPosition(position + rc);
        return rc;
    }

    @Override
    public byte[] readAllBytes() throws IOException {
        final byte[] result = inputStream.readAllBytes();
        setPosition(position + result.length);
        return result;
    }

    @Override
    public byte[] readNBytes(int len) throws IOException {
        final byte[] result = inputStream.readNBytes(len);
        setPosition(position + result.length);
        return result;
    }

    @Override
    public int readNBytes(byte[] b, int off, int len) throws IOException {
        final int rc = inputStream.readNBytes(b, off, len);
        setPosition(position + rc);
        return rc;
    }

    @Override
    public long skip(long n) throws IOException {
        final long rc = inputStream.skip(n);
        setPosition(position + rc);
        return rc;
    }

    @Override
    public int available() throws IOException {
        return inputStream.available();
    }

    @Override
    public void close() throws IOException {
        inputStream.close();
    }

    @Override
    public synchronized void mark(int readlimit) {
        inputStream.mark(readlimit);
        mark = position;    // remember where the mark was set, not the read limit
    }

    @Override
    public synchronized void reset() throws IOException {
        inputStream.reset();
        setPosition(mark);
    }

    @Override
    public boolean markSupported() {
        return inputStream.markSupported();
    }

    @Override
    public int read() throws IOException {
        final int c = inputStream.read();
        if (c != -1) setPosition(position + 1);
        return c;
    }
}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.function.DoubleConsumer;

public class Demo1 {
    public static void main(String[] args) throws IOException {
        final File file = new File(args[0]);
        final DoubleConsumer callBack = p -> System.out.printf("%.0f%%\n", p);
        try (final FileInputStream fis = new FileInputStream(file);
             final InputStreamWithProgressDecorator is = new InputStreamWithProgressDecorator(fis, file.length(), callBack)) {
            // Simulating JAXB unmarshaller reads
            byte[] buffer = is.readNBytes(1024);
            while (buffer.length != 0) buffer = is.readNBytes(1024);
        }
    }
}
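In real use, instead of the simulated reads you would hand the decorated stream to JAXB, for instance (MyRoot being a hypothetical root class):

Unmarshaller um = JAXBContext.newInstance(MyRoot.class).createUnmarshaller();
MyRoot root = (MyRoot) um.unmarshal(is);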
Or, if you have a FileInputStream, here is a separate-thread approach:
import java.io.FileInputStream;
import java.io.IOException;
import java.lang.Thread.UncaughtExceptionHandler;
import java.nio.channels.FileChannel;
import java.util.function.DoubleConsumer;

public class FileInputStreamReadProgressThread extends Thread implements UncaughtExceptionHandler {
    /** Input stream */ private final FileInputStream fileInputStream;
    /** File size */ private final long length;
    /** Read progress in percents */ private double progress = 0d;
    /** Exception from thread */ private Throwable exception = null;
    /** Consumer of the progress */ private final DoubleConsumer callBack;

    public FileInputStreamReadProgressThread(final FileInputStream fis, final long l, final DoubleConsumer cb) {
        fileInputStream = fis;
        length = l;
        callBack = cb;
        setUncaughtExceptionHandler(this);
        setName(getClass().getSimpleName());
    }

    public double getProgress() { return progress; }
    public Throwable getException() { return exception; }

    @Override public void uncaughtException(final Thread t, final Throwable e) { exception = e; }

    @Override
    public void run() {
        try {
            long position = -1L;
            final FileChannel channel = fileInputStream.getChannel();
            while (!isInterrupted() && channel.isOpen() && position < length) {
                position = channel.position();
                progress = length == 0L ? 100d : ((double) position) * 100d / ((double) length);
                callBack.accept(progress);
                sleep(100L);
            }
        } catch (final IOException e) {
            exception = e;
        } catch (final InterruptedException e) {
            // Do nothing
        }
    }
}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.util.function.DoubleConsumer;

public class Demo2 {
    public static void main(String[] args) throws IOException {
        final File file = new File(args[0]);
        final DoubleConsumer callBack = p -> System.out.printf("%.0f%%\n", p);
        try (final FileInputStream fis = new FileInputStream(file);
             final InputStream is = Channels.newInputStream(fis.getChannel())) {
            final FileInputStreamReadProgressThread readProgressThread = new FileInputStreamReadProgressThread(fis, file.length(), callBack);
            readProgressThread.start();
            // Simulating JAXB unmarshaller reads
            is.readAllBytes();
        }
    }
}

Hadoop mapper is never called, custom input format might be the issue

So I am doing a little test program just to get the hang of Hadoop's InputFormat classes. I had a word search already built which took in lines as values and searched for the word line by line. I wanted to see if I could get Hadoop to take in values word by word, but Hadoop doesn't seem to like that and keeps giving me results using the default mapper. My mapper's initialize function is never even called.
I do know my record reader is called, and that it is doing more or less what it is supposed to. I'm pretty sure the output of the record reader is what my mapper is searching for, so why does Hadoop decide not to call it?
Here is the relevant code
Input Format Class
public class WordReader extends FileInputFormat<Text, Text> {

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) {
        return new MyWholeFileReader();
    }
}
Record Reader
public class MyWholeFileReader extends RecordReader<Text, Text> {
    private long start;
    private LineReader in;
    private Text key = null;
    private Text value = null;
    private ArrayList<String> outputvalues;

    public void initialize(InputSplit genericSplit,
            TaskAttemptContext context) throws IOException {
        outputvalues = new ArrayList<String>();
        FileSplit split = (FileSplit) genericSplit;
        Configuration job = context.getConfiguration();
        start = split.getStart();
        final Path file = split.getPath();
        // open the file and seek to the start of the split
        FileSystem fs = file.getFileSystem(job);
        FSDataInputStream fileIn = fs.open(split.getPath());
        in = new LineReader(fileIn, job);
        if (key == null) {
            key = new Text();
        }
        key.set(split.getPath().getName());
        if (value == null) {
            value = new Text();
        }
    }

    public boolean nextKeyValue() throws IOException {
        if (outputvalues.size() == 0) {
            Text buffer = new Text();
            int i = in.readLine(buffer);
            String str = buffer.toString();
            for (String vals : str.split(" ")) {
                outputvalues.add(vals);
            }
            if (i == 0 || outputvalues.size() == 0) {
                key = null;
                value = null;
                return false;
            }
        }
        value.set(outputvalues.remove(0));
        System.out.println(value.toString());
        return true;
    }

    @Override
    public Text getCurrentKey() {
        return key;
    }

    @Override
    public Text getCurrentValue() {
        return value;
    }

    /**
     * Get the progress within the split
     */
    public float getProgress() {
        return 0.0f;
    }

    public synchronized void close() throws IOException {
        if (in != null) {
            in.close();
        }
    }
}
Mapper
public class WordSearchMapper extends Mapper<Text, Text, OutputCollector<Text, IntWritable>, Reporter> {
    static String keyword;
    BloomFilter<String> b;

    public void configure(JobContext jobConf) {
        keyword = jobConf.getConfiguration().get("keyword");
        System.out.println("keyword>> " + keyword);
        b = new BloomFilter<String>(.01, 10000);
        b.add(keyword);
        System.out.println(b.getExpectedBitsPerElement());
    }

    public void map(Text key, Text value, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        int wordPos;
        System.out.println("value.toString()>> " + value.toString());
        System.out.println(((FileSplit) reporter.getInputSplit()).getPath()
                .getName());
        String[] tokens = value.toString().split("[\\p{P} \\t\\n\\r]");
        for (String st : tokens) {
            if (b.contains(st)) {
                if (value.toString().contains(keyword)) {
                    System.out.println("Found one");
                    wordPos = ((Text) value).find(keyword);
                    output.collect(value, new IntWritable(wordPos));
                }
            }
        }
    }
}
Driver:
public class WordSearch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordSearch");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(WordSearchMapper.class);
        job.setInputFormatClass(WordReader.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        conf.set("keyword", "the");
        FileInputFormat.setInputPaths(job, new Path("search.txt"));
        FileOutputFormat.setOutputPath(job, new Path("outputs" + System.currentTimeMillis()));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
And I figured it out... this is why Hadoop needs to stop supporting multiple versions of itself, or why I should stop jamming multiple tutorials together. It turns out my mapper needs to be declared like this for the way my mapper and record reader are set up to interact:
public class WordSearchMapper extends Mapper<Text, Text, Text, IntWritable> {
    static String keyword;
I only realized this after looking at my imports and seeing that Reporter was from package org.apache.hadoop.mapred as opposed to org.apache.hadoop.mapreduce.
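For reference, a sketch of the mapper rewritten against the new API (org.apache.hadoop.mapreduce), which is what the RecordReader and Job above already use: configure becomes setup, and the OutputCollector/Reporter pair becomes Context. This is my reconstruction, keeping only the keyword check from the original:

public class WordSearchMapper extends Mapper<Text, Text, Text, IntWritable> {
    static String keyword;

    @Override
    protected void setup(Context context) {
        // replaces the old-API configure(JobConf)
        keyword = context.getConfiguration().get("keyword");
    }

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.toString().contains(keyword)) {
            // replaces output.collect(...) from the old API
            context.write(value, new IntWritable(value.find(keyword)));
        }
    }
}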

Dcm4Che - getting images from pacs

I've got the following problem. I have to write a small application that connects to a PACS and gets images. I decided to use the dcm4che toolkit. I've written the following code:
public class Dcm4 {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        DcmQR dcmqr = new MyDcmQR("server");
        dcmqr.setCalledAET("server", true);
        dcmqr.setRemoteHost("213.165.94.158");
        dcmqr.setRemotePort(104);
        dcmqr.getKeys();
        dcmqr.setDateTimeMatching(true);
        dcmqr.setCFind(true);
        dcmqr.setCGet(true);
        dcmqr.setQueryLevel(MyDcmQR.QueryRetrieveLevel.IMAGE);
        dcmqr.addMatchingKey(Tag.toTagPath("PatientID"), "2011");
        dcmqr.addMatchingKey(Tag.toTagPath("StudyInstanceUID"), "1.2.276.0.7230010.3.1.2.669896852.2528.1325171276.917");
        dcmqr.addMatchingKey(Tag.toTagPath("SeriesInstanceUID"), "1.2.276.0.7230010.3.1.3.669896852.2528.1325171276.916");
        dcmqr.configureTransferCapability(true);
        List<DicomObject> result = null;
        byte[] imgTab = null;
        BufferedImage bImage = null;
        try {
            dcmqr.start();
            System.out.println("started");
            dcmqr.open();
            System.out.println("opened");
            result = dcmqr.query();
            System.out.println("queried");
            dcmqr.get(result);
            System.out.println("List Size = " + result.size());
            for (DicomObject dco : result) {
                System.out.println(dco);
                dcmTools.toByteArray(dco);
                System.out.println("end parsing");
            }
        } catch (Exception e) {
            System.out.println("error " + e);
        }
        try {
            dcmqr.stop();
            dcmqr.close();
        } catch (Exception e) {
        }
        System.out.println("done");
    }
}
Everything seems to be fine until I call dcmTools.toByteArray(dco).
The output until calling toByteArray() looks like this:
List Size = 1
(0008,0052) CS #6 [IMAGE] Query/Retrieve Level
(0008,0054) AE #6 [server] Retrieve AE Title
(0020,000E) UI #54 [1.2.276.0.7230010.3.1.3.669896852.2528.1325171276.916] Series Instance UID
Source of toByteArray():
public static byte[] toByteArray(DicomObject obj) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    BufferedOutputStream bos = new BufferedOutputStream(baos);
    DicomOutputStream dos = new DicomOutputStream(bos);
    dos.writeDicomFile(obj);
    dos.close();
    byte[] data = baos.toByteArray();
    return data;
}
After calling toByteArray() I got the output:
error java.lang.IllegalArgumentException: Missing (0002,0010) Transfer Syntax UID
I've found some information on other forums, and it seems the DcmQR.get() method doesn't send the image data. Is it possible to force DcmQR to do it? I've read that the problem is in or around the DcmQR.createStorageService() method, but I haven't found the solution. Please help me!
Hello cneller!
I've made the changes you suggested: I added setMoveDest and setStoreDestination, and the DicomObjects are stored in the destination I added - it looks great. Then I tried to write a response handler based on the FutureDimseRSP that is used in the Association.cget method:
public class MyDimseRSP extends DimseRSPHandler implements DimseRSP {
    private MyEntry entry = new MyEntry(null, null);
    private boolean finished;
    private int autoCancel;
    private IOException ex;

    @Override
    public synchronized void onDimseRSP(Association as, DicomObject cmd,
            DicomObject data) {
        super.onDimseRSP(as, cmd, data);
        MyEntry last = entry;
        while (last.next != null)
            last = last.next;
        last.next = new MyEntry(cmd, data);
        if (CommandUtils.isPending(cmd)) {
            if (autoCancel > 0 && --autoCancel == 0)
                try {
                    super.cancel(as);
                } catch (IOException e) {
                    ex = e;
                }
        } else {
            finished = true;
        }
        notifyAll();
    }

    @Override
    public synchronized void onClosed(Association as) {
        if (!finished) {
            // ex = as.getException();
            ex = null;
            if (ex == null) {
                ex = new IOException("Association to " + as.getRemoteAET()
                        + " closed before receive of outstanding DIMSE RSP");
            }
            notifyAll();
        }
    }

    public final void setAutoCancel(int autoCancel) {
        this.autoCancel = autoCancel;
    }

    @Override
    public void cancel(Association a) throws IOException {
        if (ex != null)
            throw ex;
        if (!finished)
            super.cancel(a);
    }

    // note: command and dataset appear swapped in these two getters
    public DicomObject getDataset() {
        return entry.command;
    }

    public DicomObject getCommand() {
        return entry.dataset;
    }

    public MyEntry getEntry() {
        return entry;
    }

    public synchronized boolean next() throws IOException, InterruptedException {
        if (entry.next == null) {
            if (finished)
                return false;
            while (entry.next == null && ex == null)
                wait();
            if (ex != null)
                throw ex;
        }
        entry = entry.next;
        return true;
    }
}
Here is MyEntry code:
public class MyEntry {
    final DicomObject command;
    final DicomObject dataset;
    MyEntry next;

    public MyEntry(DicomObject command, DicomObject dataset) {
        this.command = command;
        this.dataset = dataset;
    }

    public DicomObject getCommand() {
        return command;
    }

    public DicomObject getDataset() {
        return dataset;
    }

    public MyEntry getNext() {
        return next;
    }

    public void setNext(MyEntry next) {
        this.next = next;
    }
}
Then I retyped the get method from DcmQR as follows:
public void getObject(DicomObject obj, DimseRSPHandler rspHandler) throws IOException, InterruptedException {
    TransferCapability tc = selectTransferCapability(qrlevel.getGetClassUids());
    MyDimseRSP myRsp = new MyDimseRSP();
    if (tc == null)
        throw new NoPresentationContextException(UIDDictionary
                .getDictionary().prompt(qrlevel.getGetClassUids()[0])
                + " not supported by " + remoteAE.getAETitle());
    String cuid = tc.getSopClass();
    String tsuid = selectTransferSyntax(tc);
    DicomObject key = obj.subSet(MOVE_KEYS);
    assoc.cget(cuid, priority, key, tsuid, rspHandler);
    assoc.waitForDimseRSP();
}
As the second argument of this method I used an instance of my response handler (MyDimseRSP). When I run my code I get null values for both the command and the dataset of my response handler. In the "next" variable only "command" is not null, and of course it's not the DicomObject I need. What am I doing wrong?!
You're going to have to step through the code a bit (including the DCM4CHE toolkit code). I suspect you are using the default response handler, which just counts the number of completed operations, and doesn't actually store the image data from the get command.
Clearly, your for loop, below, is looping over the results of the find operation, not the get (which needs to be handled in the response handler).
for(DicomObject dco:result)
I expect you will have to override the response handler to write your DICOM files appropriately. See also the DcmRcv class for writing DICOM files from the DicomObject you'll receive.
Edit: From your edits above, I assume you are just trying to get the raw DICOM instance data (not the command that stored it). What about a response handler roughly like:
List<DicomObject> dataList = new ArrayList<DicomObject>();

@Override
public void onDimseRSP(Association as, DicomObject cmd, DicomObject data) {
    if (shouldAdd(as, cmd)) {
        dataList.add(data);
    }
}
Watch out for large lists, but it should get you the data in memory.
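As for the original "Missing (0002,0010) Transfer Syntax UID" error: it suggests the dataset had no file meta information when toByteArray() tried to write it as a DICOM file. A rough sketch of a possible fix, under the assumption that your dcm4che2 version exposes DicomObject.initFileMetaInformation and the UID constants (please verify against its Javadoc):

public static byte[] toByteArray(DicomObject obj) throws IOException {
    // attach file meta information, incl. (0002,0010) Transfer Syntax UID,
    // before writing the object as a DICOM file (assumed API - verify)
    obj.initFileMetaInformation(UID.ImplicitVRLittleEndian);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DicomOutputStream dos = new DicomOutputStream(new BufferedOutputStream(baos));
    dos.writeDicomFile(obj);
    dos.close();
    return baos.toByteArray();
}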
