Groovy's XmlParser ignores CDATA CR/LF - java

I want to parse a log4j generated xml log. Within the xml is a node with a throwable (if any). This (multiline, tabbed) text is encapsulated in a CDATA tag.
This is an excerpt of the whole file:
<log4j:event logger="org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver" timestamp="1330083921521" level="ERROR" thread="http-8080-1">
<log4j:message><![CDATA[Exception occurred when processing request: [GET] /test/log/show
Stacktrace follows:]]></log4j:message>
<log4j:throwable><![CDATA[org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at test.LogController$_closure2.doCall(LogController.groovy:21)
at test.LogController$_closure2.doCall(LogController.groovy)
at java.lang.Thread.run(Thread.java:662)
]]></log4j:throwable>
</log4j:event>
I parse it with Groovy's XmlParser:
def parser = new XmlParser(false, false).parse(new File("stack.log"))
return parser.'log4j:event'.collect { l ->
    LogEntry entry = new LogEntry()
    entry.with {
        level     = l.'@level'
        message   = l.'log4j:message'.text()
        thread    = l.'@thread'
        logger    = l.'@logger'
        timestamp = new Date(l.'@timestamp' as long)
        throwable = l.'log4j:throwable'?.text() ?: ''
    }
    entry
}
The 'throwable' field contains all the text but without CR/LF.
Does anybody know how to cope with that?
Thanks in advance...

Hate to just throw code at you, but it seems to work as expected and returns the CRLFs
def xml = '''<log>
| <log4j:event logger="org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver" timestamp="1330083921521" level="ERROR" thread="http-8080-1">
| <log4j:message><![CDATA[Exception occurred when processing request: [GET] /test/log/show
|Stacktrace follows:]]></log4j:message>
| <log4j:throwable><![CDATA[org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
| at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
| at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
| at test.LogController$_closure2.doCall(LogController.groovy:21)
| at test.LogController$_closure2.doCall(LogController.groovy)
| at java.lang.Thread.run(Thread.java:662)
|]]></log4j:throwable>
| </log4j:event>
|</log>'''.stripMargin()
class LogEntry {
    def level
    def message
    def thread
    def logger
    def timestamp
    def throwable

    String toString() {
        """EVENT:
        | level : $level
        | message : $message
        | thread : $thread
        | logger : $logger
        | ts : $timestamp
        | thrown : $throwable""".stripMargin()
    }
}
def parser = new XmlParser(false, false).parseText( xml )
def entries = parser.'log4j:event'.collect { event ->
    new LogEntry().with {
        level     = event.'@level'
        message   = event.'log4j:message'.text()
        thread    = event.'@thread'
        logger    = event.'@logger'
        timestamp = new Date( event.'@timestamp' as long )
        throwable = event.'log4j:throwable'?.text() ?: ''
        it
    }
}
entries.each {
    println it
}
That prints:
EVENT:
level : ERROR
message : Exception occurred when processing request: [GET] /test/log/show
Stacktrace follows:
thread : http-8080-1
logger : org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver
ts : Fri Feb 24 11:45:21 GMT 2012
thrown : org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at test.LogController$_closure2.doCall(LogController.groovy:21)
at test.LogController$_closure2.doCall(LogController.groovy)
at java.lang.Thread.run(Thread.java:662)
Which has CRLF chars in it where they are supposed to be...
This is with Groovy 1.8.6 btw... What version are you using? Can you upgrade and try again?

The XML standard calls for whitespace to be normalized during the parse.
I'm not sure, but the parser may have a setting to override this behavior. Otherwise, you could pre-process the file, replacing line endings inside CDATA sections with their XML entity equivalents, and then parse it.
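If you do go the pre-processing route, here is a rough Groovy sketch of that idea. It protects the line endings with a placeholder token rather than literal entity references (references inside a CDATA section would not be expanded by the parser anyway); the stack.log file name and the ##NL## token are assumptions carried over from the question, so treat this as a sketch, not a tested fix.
// Protect line endings inside every CDATA section before parsing,
// then restore them on the parsed text afterwards.
def raw = new File('stack.log').text
def protectedXml = raw.replaceAll(/(?s)<!\[CDATA\[(.*?)\]\]>/) { full, body ->
    '<![CDATA[' + body.replaceAll(/\r\n|\r|\n/, '##NL##') + ']]>'
}

def root = new XmlParser(false, false).parseText(protectedXml)
root.'log4j:event'.each { event ->
    // Put the original line endings back after the parse
    def throwable = (event.'log4j:throwable'?.text() ?: '').replace('##NL##', '\n')
    println throwable
}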

Related

My Grails project throws the error "TransactionRequiredException: no transaction" when using domain.save(flush: true); it can save but not update

package com.fhjony.ocbt

import grails.web.servlet.mvc.GrailsParameterMap

class MemberService {

    def save(GrailsParameterMap params) {
        Member member = new Member(params)
        def response = AppUtil.saveResponse(false, member)
        if (member.validate()) {
            member.save(true)
            if (!member.hasErrors()) {
                response.isSuccess = true
            }
        }
        return response
    }

    def update(Member member, GrailsParameterMap params) {
        member.properties = params
        def response = AppUtil.saveResponse(false, member)
        if (member.validate()) {
            member.save(flush: true)
            if (!member.hasErrors()) {
                response.isSuccess = true
            }
        }
        return response
    }

    def getById(Serializable id) {
        return Member.get(id)
    }

    def list(GrailsParameterMap params) {
        params.max = params.max ?: GlobalConfig.itemPerPage()
        List<Member> memberList = Member.createCriteria().list(params) {
            if (params?.colName && params?.colValue) {
                like(params.colName, "%" + params.colValue + "%")
            }
            if (!params.sort) {
                order("id", "desc")
            }
        }
        return [list: memberList, count: memberList.totalCount]
    }

    def delete(Member member) {
        try {
            member.delete(flush: true, failOnError: true)
        } catch (Exception e) {
            println(e.getMessage())
            return false
        }
        return true
    }
}
Error message:
URI: /member/update
Class: javax.persistence.TransactionRequiredException
Message: null
Caused by: no transaction is in progress

Line | Method
->> 211 | invoke in org.grails.core.DefaultGrailsControllerClass$ReflectionInvoker
    188 | invoke in org.grails.core.DefaultGrailsControllerClass
     90 | handle in org.grails.web.mapping.mvc.UrlMappingsInfoHandlerAdapter
   1039 | doDispatch in org.springframework.web.servlet.DispatcherServlet
    942 | doService in ''
   1005 | processRequest in org.springframework.web.servlet.FrameworkServlet
    908 | doPost in ''
    882 | service in ''
     77 | doFilterInternal in
You want your database interactions to happen in a transactional context. One simple way to get that is to mark your service class with @grails.gorm.transactions.Transactional.
Separate from that, and not really related to your question: passing a GrailsParameterMap around as a method argument is an unusual thing to do. The right approach depends on some factors in your app, and you may want to pass individual values into your service rather than the whole map, but if you really want the whole map in the service, one way to get there is by way of WebAttributes.
import grails.gorm.transactions.Transactional
import grails.web.api.WebAttributes

@Transactional
class MemberService implements WebAttributes {

    def serviceMethod() {
        // you can access params here because
        // WebAttributes provides access to it
        Member member = new Member(params)
        // ...
    }
}
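And if you decide not to hand the whole map to the service, here is a minimal sketch of the other option, passing plain values into a transactional service; the name and email properties are made up for illustration and are not from your domain class.
import grails.gorm.transactions.Transactional

@Transactional
class MemberService {

    // Illustrative only: accept the specific values the service needs
    // (name and email are hypothetical properties, not from the question)
    Member save(String name, String email) {
        Member member = new Member(name: name, email: email)
        member.save(flush: true)
        return member
    }
}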

Java Spark : Spark Bug Workaround for Datasets Joining with unknown Join Column Names

I am using Spark 2.3.1 with Java.
I have encountered what (I think) is this known bug of Spark.
Here is my code:
public Dataset<Row> compute(Dataset<Row> df1, Dataset<Row> df2, List<String> columns) {
    Seq<String> columns_seq = JavaConverters.asScalaIteratorConverter(columns.iterator()).asScala().toSeq();

    final Dataset<Row> join = df1.join(df2, columns_seq);
    join.show();
    join.withColumn("newColumn", abs(col("value1").minus(col("value2")))).show();
    return join;
}
I call my code like this:
Dataset<Row> myNewDF = compute(MyDataset1, MyDataset2, Arrays.asList("field1","field2","field3","field4"));
Note: MyDataset1 and MyDataset2 are two datasets that come from the same Dataset MyDataset0 with multiple different transformations.
On the join.show() line, I get the following error:
2018-08-03 18:48:43 - ERROR main Logging$class - - - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 235, Column 21: Expression "project_isNull_2" is not an rvalue
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 235, Column 21: Expression "project_isNull_2" is not an rvalue
at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
at org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7170)
at org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:5332)
at org.codehaus.janino.UnitCompiler.access$9400(UnitCompiler.java:212)
at org.codehaus.janino.UnitCompiler$13$1.visitAmbiguousName(UnitCompiler.java:5287)
at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4053)
...
2018-08-03 18:48:47 - WARN main Logging$class - - - Whole-stage codegen disabled for plan (id=7):
But it does not stop the execution and still displays the content of the dataset.
Then, on the line join.withColumn("newColumn", abs(col("value1").minus(col("value2")))).show();
I get the error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s) 'value2,'value1 missing from field6#16,field7#3,field8#108,field5#0,field9#4,field10#28,field11#323,value1#298,field12#131,day#52,field3#119,value2#22,field2#35,field1#43,field4#144 in operator 'Project [field1#43, field2#35, field3#119, field4#144, field5#0, field6#16, value2#22, field7#3, field9#4, field10#28, day#52, field8#108, field12#131, value1#298, field11#323, abs(('value1 - 'value2)) AS newColumn#2579]. Attribute(s) with the same name appear in the operation: value2,value1. Please check if the right attribute(s) are used.;;
'Project [field1#43, field2#35, field3#119, field4#144, field5#0, field6#16, value2#22, field7#3, field9#4, field10#28, day#52, field8#108, field12#131, value1#298, field11#323, abs(('value1 - 'value2)) AS newColumn#2579]
+- AnalysisBarrier
...
This error ends the program.
The workaround proposed by Mijung Kim on the Jira issue is to create a Dataset clone via toDF(columns). But in my case, where the column names used for the join are not known in advance (I only have a List), I can't use this workaround.
Is there another way to get around this very annoying bug?
Try to call this method:
private static Dataset<Row> cloneDataset(Dataset<Row> ds) {
    List<Column> filterColumns = new ArrayList<>();
    List<String> filterColumnsNames = new ArrayList<>();
    scala.collection.Iterator<StructField> it = ds.exprEnc().schema().toIterator();
    while (it.hasNext()) {
        String columnName = it.next().name();
        filterColumns.add(ds.col(columnName));
        filterColumnsNames.add(columnName);
    }
    ds = ds.select(JavaConversions.asScalaBuffer(filterColumns).seq())
           .toDF(scala.collection.JavaConverters.asScalaIteratorConverter(filterColumnsNames.iterator()).asScala().toSeq());
    return ds;
}
on both datasets just before the join, like this:
df1 = cloneDataset(df1);
df2 = cloneDataset(df2);
final Dataset<Row> join = df1.join(df2, columns_seq);
// or (based on Nakeuh's comment)
final Dataset<Row> join = cloneDataset(df1.join(df2, columns_seq));

How do I create a CSV file with dynamic rows

I just can't figure out how to create a CSV file with dynamic rows.
I used commons-csv to write 3 rows to a file, but I need some more.
I'm reading values out of a database line by line, which look like this:
[ID:1,TSP:'2018-01-01 00:10:00', VALUE: 856830, VAL1:'36,704'],
[ID:4,TSP:'2018-01-01 00:12:00', VALUE: 736830, VAL1:'1,14'],
[ID:5,TSP:'2018-01-01 00:10:00', VALUE: 656830, VAL1:'12,504'],
[ID:5,TSP:'2018-01-01 00:50:00', VALUE: 936830, VAL1:'5,18'],
[ID:3,TSP:'2018-01-01 00:10:00', VALUE: 736860, VAL1:'3,4'],
[ID:4,TSP:'2018-01-01 00:50:00', VALUE: 726830, VAL1:'9,14']
I want to create a .csv file with formatting like this:
TSP(2018-01-01 00:10:00) | VALUE_ID1 | VAL_ID1 | VALUE_ID3 | VAL_ID3 | VALUE_ID5 | VAL_ID5
TSP(2018-01-01 00:12:00) | VALUE_ID4 | VAL_ID4
TSP(2018-01-01 00:50:00) | VALUE_ID4 | VAL_ID4 | VALUE_ID5 | VAL_ID5
I hope someone can help me out with this one, because it's killing me...
Thanks in advance!
EDIT:
I used groupBy on the TSP so I have the following pattern now:
TSP(2018-01-01 00:10:00):[[ID:1,TSP(2018-01-01 00:10:00),VALUE_ID1,VAL_ID1], [ID:3,TSP(2018-01-01 00:10:00),VALUE_ID3,VAL_ID3], [ID:5,TSP(2018-01-01 00:10:00),VALUE_ID5,VAL_ID5]]
TSP(2018-01-01 00:12:00):[[ID:4,TSP(2018-01-01 00:10:00),VALUE_ID4,VAL_ID4]]
etc. That's where I got stuck.
You can use the following Java APIs for CSV generation:
OpenCSV
Apache Commons CSV
Both APIs have read/write support for working with CSV.
Examples of the APIs:
OpenCSV
Apache Commons CSV
I would prefer to use OpenCSV.
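Since you mentioned you already use commons-csv and have grouped the rows by TSP, here is a rough Groovy sketch of writing one variable-length record per timestamp with Apache Commons CSV. The sample data mirrors the rows in the question; the out.csv file name and the cell labels are assumptions, so adapt them to your real query result.
@Grab('org.apache.commons:commons-csv:1.5')
import org.apache.commons.csv.CSVFormat
import org.apache.commons.csv.CSVPrinter

// Rows in the shape shown in the question; replace with your database result
def rows = [[ID: 1, TSP: '2018-01-01 00:10:00', VALUE: 856830, VAL1: '36,704'],
            [ID: 4, TSP: '2018-01-01 00:12:00', VALUE: 736830, VAL1: '1,14'],
            [ID: 5, TSP: '2018-01-01 00:10:00', VALUE: 656830, VAL1: '12,504']]

// Group by timestamp, as in the edit above
def grouped = rows.groupBy { it.TSP }

new File('out.csv').withWriter { w ->
    def printer = new CSVPrinter(w, CSVFormat.DEFAULT)
    grouped.each { tsp, group ->
        // One dynamically sized record per timestamp
        def record = ["TSP(${tsp})"]
        group.sort { it.ID }.each { row ->
            record << "VALUE_ID${row.ID}=${row.VALUE}"
            record << "VAL_ID${row.ID}=${row.VAL1}"
        }
        printer.printRecord(record)
    }
    printer.flush()
}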
I believe univocity-parsers can help you greatly here. Check this section of its tutorial. You can write your rows like this:
File outputFile = new File("/path/to/your.csv");
CsvWriter writer = new CsvWriter(outputFile, new CsvWriterSettings());

String currentTime = null;
for (String[] row : getDataFromYourDatabase()) {
    if (row[1].equals(currentTime)) { // check the time signature
        writer.addValue(row[0]); // write the ID on a column
        writer.addValue(row[2]); // write the value associated with the ID on the next column
    } else {
        if (currentTime != null) {
            writer.writeValuesToRow(); // generates a row
        }
        currentTime = row[1];
        writer.addValue(row[1]); // writes the time to the first column of the next row
    }
}
Assuming method getDataFromYourDatabase() returns all rows ordered by the time (i.e. the values that look like this: TSP(2018-01-01 00:10:00)) this should work great.
Hope it helps.
Disclosure: I'm the author of this library. It's open-source and free (Apache 2.0 license)
This is the best I can come up with using Fuzzy-Csv (https://github.com/kayr/fuzzy-csv/)
package fuzzycsv

import static fuzzycsv.FuzzyStaticApi.reduce

def data = [[ID: 1, TSP: '2018-01-01 00:10:00', VALUE: 856830, VAL1: '36,704'],
            [ID: 4, TSP: '2018-01-01 00:12:00', VALUE: 736830, VAL1: '1,14'],
            [ID: 5, TSP: '2018-01-01 00:10:00', VALUE: 656830, VAL1: '12,504'],
            [ID: 5, TSP: '2018-01-01 00:50:00', VALUE: 936830, VAL1: '5,18'],
            [ID: 3, TSP: '2018-01-01 00:10:00', VALUE: 736860, VAL1: '3,4'],
            [ID: 4, TSP: '2018-01-01 00:50:00', VALUE: 726830, VAL1: '9,14']]

def list = FuzzyCSVTable
        .fromMapList(data)
        .transform('ID') { "VAL_ID${it.ID}" }
        .transform('TSP') { "TSP_(${it.TSP})" }
        .summarize('TSP', reduce { it['ID'] }.az('IDS'))
        .with { tbl(csv.collect { it.flatten() }[1..-1]) }

println(list.toCsvString())
output
"TSP_(2018-01-01 00:10:00)","VAL_ID1","VAL_ID5","VAL_ID3"
"TSP_(2018-01-01 00:12:00)","VAL_ID4"
"TSP_(2018-01-01 00:50:00)","VAL_ID5","VAL_ID4"

Sphinx Lexical Error parsing grammar file

I have been making a speech recognition program in Java just to see how it works. It was all working completely fine up until a day ago. I have no idea what happened as I didn't change anything in my code.
As it is coming up with a lexical error I think I only need to paste my grammar file.
controls.gram:
#JSGF V1.0;
grammar controls;
public <intro> = "Computer";
public <statement> = <open> | <ignore>;
public <open> = "open" ("chrome" | "word");
public <navigate> = "navigate" "to" (("face" book") | ("stack" "overflow"));
public <ignore> = ("don't" "worry") | ("never" "mind");
public <thanks> = "thanks";
Error log:
Exception in thread "main" edu.cmu.sphinx.jsgf.parser.TokenMgrError: Lexical error at line 11, column 77. Encountered: "\r" (13), after : "\"));"
at edu.cmu.sphinx.jsgf.parser.JSGFParserTokenManager.getNextToken(JSGFParserTokenManager.java:1197)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.jj_ntk(JSGFParser.java:1012)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.item(JSGFParser.java:643)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.sequence(JSGFParser.java:568)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.alternatives(JSGFParser.java:484)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.item(JSGFParser.java:676)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.sequence(JSGFParser.java:568)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.alternatives(JSGFParser.java:484)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.item(JSGFParser.java:676)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.sequence(JSGFParser.java:568)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.alternatives(JSGFParser.java:484)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.RuleDeclaration(JSGFParser.java:450)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.GrammarUnit(JSGFParser.java:313)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.newGrammarFromJSGF(JSGFParser.java:128)
at edu.cmu.sphinx.jsgf.parser.JSGFParser.newGrammarFromJSGF(JSGFParser.java:241)
at edu.cmu.sphinx.jsgf.JSGFGrammar.loadNamedGrammar(JSGFGrammar.java:697)
at edu.cmu.sphinx.jsgf.JSGFGrammar.commitChanges(JSGFGrammar.java:613)
at edu.cmu.sphinx.jsgf.JSGFGrammar.createGrammar(JSGFGrammar.java:300)
at edu.cmu.sphinx.linguist.language.grammar.Grammar.allocate(Grammar.java:116)
at edu.cmu.sphinx.linguist.flat.FlatLinguist.allocate(FlatLinguist.java:300)
at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.allocate(SimpleBreadthFirstSearchManager.java:646)
at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:87)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:168)
at [MY PACKAGE].Utils.<init>(Utils.java:23)
at [MY PACKAGE].Launcher.<clinit>(Launcher.java:8)
I have tried removing all whitespace characters and encoding the file in ANSI and UTF (with and without BOM), all of which have been to no avail...
Can anyone help me here?
You have a missing quote:
public = "navigate" "to" (("face" "book") | ("stack" "overflow"));
That should fix it :-)

Creating nested nodes using Groovy XML building

I have the below XML (updated: added "symptoms"):
<EBF>
<EBFINFO>
<EBFNUM>EBF262323</EBFNUM>
<RELEASEDATETIME>May 06, 2011</RELEASEDATETIME>
<SYMPTOMS>
<br> INFA252994 - 910 : While running concurrent session Workflow manager hangs and workflow monitor does not respond</br>
<br> INFA262323 - 910 : pmcmd, pmdtm and all LM clients on Windows fail to connect to IS when IPv6 is installed but all IPv6 interfaces are disabled</br>
</SYMPTOMS>
<FILES>
<FILE>
<PATH>H:\EBF262323\EBF262323_Client_Installer_win32_x86\EBFs\clients\PmClient\client\bin\ACE.dll_bak</PATH>
<CHECKSUM>303966974</CHECKSUM>
<AFFECTEDFILES>
<CHECKSUM>3461283269</CHECKSUM>
<PATH>C:\clients\PmClient\CommandLineUtilities\PC\server\bin\ACE.dll</PATH>
<PATH>C:\clients\PmClient\client\bin\ACE.dll</PATH>
</AFFECTEDFILES>
</FILE>
</FILES>
<NOTES>
</NOTES>
</EBFINFO>
</EBF>
Note: In the above XML, ebf\ebfinfo\files\file\affectedfiles\path and ebf\ebfinfo\files\file can occur one or more times.
I am parsing it and generating another XML out of it:
def records = new XmlParser().parseText(rs)

csm.ebfHistory() {
    records.EBFINFO.each {
        ebfHistory_info(num: it.EBFNUM.text(),
                        release_date_time: it.RELEASEDATETIME.text()) {
            it.FILES.FILE.each { // throws: Exception in thread "main" java.lang.NullPointerException: Cannot get property 'FILES' on null object
                ebfHistory_fileinfo(file_path: it.PATH.text(),
                                    file_checksum: it.CHECKSUM.text()) {
                    ebfHistory_fileinfo_affectedfiles(
                        afile_checksum: it.CHECKSUM.text(),
                        afile_path: it.PATH.text()
                    )
                }
            }
        }
    }
}
something like below
<ebfHistory>
<ebfHistory_info num="EBF262323" release_date_time="May 06, 2011">
<ebfHistory_fileinfo file_checksum="303966974">
<ebfHistory_fileinfo_affectedfiles afile_checksum="3461283269">
<path>C:\clients\PmClient\CommandLineUtilities\PC\server\bin\ACE.dll</path>
<path>C:\clients\PmClient\client\bin\ACE.dll</path>
</ebfHistory_fileinfo_affectedfiles>
</ebfHistory_fileinfo>
</ebfHistory_info>
</ebfHistory>
but instead I get Exception in thread "main" java.lang.NullPointerException: Cannot get property 'FILES' on null object. Where am I going wrong? Please help, somebody. Thanks
Updated code (working)
def records = new XmlParser().parseText(rs)

csm.ebfHistory() {
    records.EBFINFO.each { ebfinfo ->
        ebfHistory_info(num: ebfinfo.EBFNUM.text(),
                        release_date_time: ebfinfo.RELEASEDATETIME.text()) {
            ebfinfo.SYMPTOMS.br.each {
                ebfHistory_symptom(name: it.text())
            }
        }
    }
    ebfHistory_dump(rs) {
        "${rs}"
    }
}
The it no longer refers to each EBFINFO, because you are in another closure (the ebfHistory_info closure).
Instead, explicitly name the EBFINFO object:
records.EBFINFO.each { ebfinfo ->   // <-- Give it a name
    ebfHistory_info(num: ebfinfo.EBFNUM.text(),
                    release_date_time: ebfinfo.RELEASEDATETIME.text()) {
        ebfinfo.FILES.FILE.each {   // <-- Use the name here
Same thing in the ebfHistory_fileinfo_affectedfiles parameters.
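Putting it together, here is a fuller sketch with every closure parameter named, assuming csm is a groovy.xml.MarkupBuilder as in the question; the inner loop over AFFECTEDFILES and PATH reflects my reading of the desired output rather than code from the original answer.
csm.ebfHistory {
    records.EBFINFO.each { ebfinfo ->                        // named, so it stays unambiguous
        ebfHistory_info(num: ebfinfo.EBFNUM.text(),
                        release_date_time: ebfinfo.RELEASEDATETIME.text()) {
            ebfinfo.FILES.FILE.each { file ->
                ebfHistory_fileinfo(file_path: file.PATH.text(),
                                    file_checksum: file.CHECKSUM.text()) {
                    file.AFFECTEDFILES.each { affected ->
                        ebfHistory_fileinfo_affectedfiles(afile_checksum: affected.CHECKSUM.text()) {
                            affected.PATH.each { p ->
                                path(p.text())               // one <path> element per affected file
                            }
                        }
                    }
                }
            }
        }
    }
}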
