How to create a separate changelist when using the API? - java

I am trying to create a Groovy script that takes a list of changelists from our trunk and merges them, one at a time, into a release branch. I would like to have all the changelists locally because I want to run a test build before submitting upstream. However, whenever I run the script and look in P4V, I find that all the merges have been placed in the default changelist. How can I keep them separate?
My code (in Groovy but using the Java API) is as follows:
final changeListNumbers = [ 579807, 579916, 579936 ]
final targetBranch = "1.0.7"

changeListNumbers.each { changeListNumber ->
    final existingCl = server.getChangelist( changeListNumber )
    final cl = new Changelist(
        IChangelist.UNKNOWN,
        client.getName(),
        server.userName,
        ChangelistStatus.NEW,
        new Date(),
        "${existingCl.id} - ${existingCl.description}",
        false,
        server
    );
    cl.fileSpecs = mergeChangeListToBranch( client, cl, changeListNumber, targetBranch )
}
def List<IFileSpec> mergeChangeListToBranch( final IClient client, final IChangelist changeList, final srcChangeListNumber, final String branchVersion ){
    final projBase = '//Clients/Initech'
    final trunkBasePath = "$projBase/trunk"
    final branchBasePath = "$projBase/release"
    final revisionedTrunkPath = "$trunkBasePath/...#$srcChangeListNumber,$srcChangeListNumber"
    final branchPath = "$branchBasePath/$branchVersion/..."
    println "trunk path: $revisionedTrunkPath\nbranch path is: $branchPath"
    mergeFromTo( client, changeList, revisionedTrunkPath, branchPath )
}
def List<IFileSpec> mergeFromTo( final IClient client, final IChangelist changeList, final String sourceFile, final String destFile ){
    mergeFromTo(
        client,
        changeList,
        new FileSpec( new FilePath( FilePath.PathType.DEPOT, sourceFile ) ),
        new FileSpec( new FilePath( FilePath.PathType.DEPOT, destFile ) )
    )
}

def List<IFileSpec> mergeFromTo( final IClient client, final IChangelist changeList, final FileSpec sourceFile, final FileSpec destFile ){
    final resolveOptions = new ResolveFilesAutoOptions()
    resolveOptions.safeMerge = true
    client.resolveFilesAuto(
        client.integrateFiles( sourceFile, destFile, null, null ),
        // client.integrateFiles( changeList.id, false, null, null, sourceFile, destFile ),
        resolveOptions
    )
}
If I try to call IChangelist.update() I get the following error:
Caught: com.perforce.p4java.exception.RequestException: Error in change specification.
Error detected at line 7.
Invalid status 'new'.
If instead of IChangelist.UNKNOWN I use existingCl.id + 10000 (which is larger than any existing changelist number currently in use), then I get:
Caught: com.perforce.p4java.exception.RequestException: Tried to update new or default changelist

To create the changelist on the server, call IClient.createChangelist():
final existingCl = server.getChangelist( changeListNumber )
cl = new Changelist(
    IChangelist.UNKNOWN,
    ... snip ...
);
cl = client.createChangelist(cl);
cl.fileSpecs = mergeChangeListToBranch( client, cl, ...
Then to integrate into this particular change:
IntegrateFilesOptions intOpts = new IntegrateFilesOptions()
intOpts.setChangelistId( cl.getId())
client.integrateFiles( sourceFile, destFile, null, intOpts)
That integrateFiles() returns the integrated file(s), so check that the returned IFileSpec.getOpStatus() is FileSpecOpStatus.VALID.
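For example, a minimal check over the returned specs might look like this (a sketch against the p4java API; client, sourceFile, destFile, and intOpts are the variables from the snippet above):
import com.perforce.p4java.core.file.FileSpecOpStatus;
import com.perforce.p4java.core.file.IFileSpec;
import java.util.List;

// Sketch: integrate into the numbered changelist, then report any
// specs the server rejected instead of assuming the merge succeeded.
List<IFileSpec> integrated = client.integrateFiles( sourceFile, destFile, null, intOpts );
for ( IFileSpec spec : integrated ) {
    if ( spec.getOpStatus() != FileSpecOpStatus.VALID ) {
        // Non-VALID specs carry the server's error or info message.
        System.err.println( spec.getStatusMessage() );
    }
}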

Related

Can not extract text via Apache Tika using Lucee

I would like to extract text from PDF, DOCX, etc. via Lucee 5+ (5.2.9), but unfortunately I get an empty result. I have used several Apache Tika versions (runnable jar with Java 1.8.0) that might fit my specific Lucee and Java requirements, but the result always remains empty.
exract.cfc
component {

    public any function init() {
        _setTikaJarPath( GetDirectoryFromPath( GetCurrentTemplatePath() ) & "tika-app-1.19.1.jar" );
        return this;
    }

    private struct function doParse( required any fileContent, boolean includeMeta=true, boolean includeText=true ) {
        var result = {};
        var is = "";
        var jarPath = _getTikaJarPath();
        if ( IsBinary( arguments.fileContent ) ) {
            is = CreateObject( "java", "java.io.ByteArrayInputStream" ).init( arguments.fileContent );
        } else {
            // TODO: support plain string input (i.e. html)
            return {};
        }
        try {
            var parser = CreateObject( "java", "org.apache.tika.parser.AutoDetectParser", jarPath );
            var ch = CreateObject( "java", "org.apache.tika.sax.BodyContentHandler", jarPath ).init( -1 );
            var md = CreateObject( "java", "org.apache.tika.metadata.Metadata", jarPath ).init();
            parser.parse( is, ch, md );
            if ( arguments.includeMeta ) {
                result.metadata = {};
                for ( var key in md.names() ) {
                    var mdval = md.get( key );
                    if ( !isNull( mdval ) ) {
                        result.metadata[ key ] = _removeNonUnicodeChars( mdval );
                    }
                }
            }
            if ( arguments.includeText ) {
                result.text = _removeNonUnicodeChars( ch.toString() );
            }
        } catch ( any e ) {
            result = { error = e };
        }
        return result;
    }

    public function read( required string filename ) {
        var result = {};
        if ( !fileExists( filename ) ) {
            result.error = "#filename# does not exist.";
            return result;
        }
        var f = createObject( "java", "java.io.File" ).init( filename );
        var fis = createObject( "java", "java.io.FileInputStream" ).init( f );
        try {
            result = doParse( fis );
        } catch ( any e ) {
            result.error = e;
        }
        fis.close();
        return result;
    }

    private string function _removeNonUnicodeChars( required string potentiallyDirtyString ) {
        return ReReplace( arguments.potentiallyDirtyString, "[^\x20-\x7E]", "", "all" );
    }

    // GETTERS AND SETTERS
    private string function _getTikaJarPath() {
        return _tikaJarPath;
    }

    private void function _setTikaJarPath( required string tikaJarPath ) {
        _tikaJarPath = arguments.tikaJarPath;
    }

}
and the code that I use to run it:
<cfset takis = new exract()>
<cfset files = directoryList(expandPath("./sources"))>
<cfloop index="f" array="#files#">
    <cfif not findNoCase(".DS_Store",f)>
        <cfdump var="#takis.read(f)#" label="#f#">
    </cfif>
</cfloop>
I think the problem is a class clash: the Lucee core engine already loads its own version of Tika, meaning the one you point to is ignored. But the loaded version doesn't behave as expected, returning empty strings as you've seen.
I've solved this by using OSGi to load the desired Tika version. This involves editing the manifest of the tika-app jar to include basic OSGi metadata and then loading it via my osgiLoader.
There is a pre-built Tika bundle available but I haven't been able to get it to work with Lucee.
Here's how to convert the latest tika-app jar to OSGi:
open the "tika-app-1.28.2.jar" with 7-zip
open META-INF then select MANIFEST.MF and press F4 to open it in a text editor
add the following to the end of the file:
Bundle-Name: Apache Tika App Bundle
Bundle-SymbolicName: apache-tika-app-bundle
Bundle-Description: Apache Tika App jar converted to an OSGi bundle
Bundle-ManifestVersion: 2
Bundle-Version: 1.28.2
Bundle-ClassPath: .,tika-app-1.28.2.jar
Save, choosing to update when prompted.
You can then call the jar using osgiLoader as follows:
extractor.cfc
component {

    property name="loader" type="object";
    property name="tikaBundle" type="struct";

    public extractor function init( required object loader, required struct tikaBundle ){
        variables.loader = arguments.loader
        variables.tikaBundle = arguments.tikaBundle
        return this
    }

    public string function parseToString( required string filePath ){
        try{
            var fileStream = CreateObject( "java", "java.io.FileInputStream" ).init( JavaCast( "string", arguments.filePath ) )
            var tikaObject = loader.loadClass( "org.apache.tika.Tika", tikaBundle.path, tikaBundle.name, tikaBundle.version )
            var result = tikaObject.parseToString( fileStream )
        }
        finally{
            fileStream.close()
        }
        return result
    }

}
(The following script assumes extractor.cfc, the modified Tika jar, the osgiLoader.cfc and the document to be processed are in the same directory.)
index.cfm
<cfscript>
    docPath = ExpandPath( "test.pdf" )
    loader = New osgiLoader()
    tikaBundle = {
        version: "1.28.2"
        ,name: "apache-tika-app-bundle"
        ,path: ExpandPath( "tika-app-1.28.2.jar" )
    }
    extractor = New extractor( loader, tikaBundle )
    result = extractor.parseToString( docPath )
    dump( result )
</cfscript>
Another way to get the right version loaded is to use JavaLoader. For some reason I couldn't get it to work with the latest tika-app jar (1.28.2), but 1.19.1 does seem to work.
Hacking the existing extension
I would advise you to raise an issue with Preside to change their extension to avoid the clash, but as a temporary hack you could try amending it yourself as follows:
First, add your modified Tika bundle and the osgiLoader.cfc to the /preside-ext-tika/services/ directory.
Next, change line 14 of DocumentMetadataService.cfc so the name of the Tika jar path matches your modified bundle.
_setTikaJarPath( GetDirectoryFromPath( GetCurrentTemplatePath( ) ) & "tika-app-1.28.2.jar" );
Then, modify lines 33-35 of the same cfc to replace:
var parser = CreateObject( "java", "org.apache.tika.parser.AutoDetectParser", jarPath );
var ch = CreateObject( "java", "org.apache.tika.sax.BodyContentHandler" , jarPath ).init(-1);
var md = CreateObject( "java", "org.apache.tika.metadata.Metadata" , jarPath ).init();
with the following:
var loader = New osgiLoader();
var tikaBundle = { version: "1.28.2", name: "apache-tika-app-bundle" };
var parser = loader.loadClass( "org.apache.tika.parser.AutoDetectParser", jarPath, tikaBundle.name, tikaBundle.version )
var ch = loader.loadClass( "org.apache.tika.sax.BodyContentHandler" , jarPath, tikaBundle.name, tikaBundle.version ).init(-1)
var md = loader.loadClass( "org.apache.tika.metadata.Metadata" , jarPath, tikaBundle.name, tikaBundle.version ).init()
NB: I don't have Preside so can't test it in context.

UserDefinedFileAttributeView view returns null, unable to set attributes

I am trying to tag files on an Android device, to display later in a list based on user-defined tags (e.g. Category, Description, Owner).
I have used the widely recommended Java NIO code for UserDefinedFileAttributes.
The file path is found and is reported as valid; however,
final UserDefinedFileAttributeView view = getFileAttributeView( path,
        UserDefinedFileAttributeView.class );
always returns 'view' = null.
What am I missing?
Perhaps UserDefinedFileAttributeView is not supported on all platforms / devices?
I have tried this code on an Android Studio emulator for a Nexus 6 (API 28):
String MEDIA_PATH = Environment.getExternalStorageDirectory().getPath()
        + File.separator + "myEMR" + File.separator + "Imported_Documents";
String uNewFileName = "MyFile";

public void setLocalFileAttributes() throws Exception {
    Path path = Paths.get(MEDIA_PATH, uNewFileName);
    final UserDefinedFileAttributeView view = getFileAttributeView(path,
            UserDefinedFileAttributeView.class);
    // The file attribute
    final String category = "Category";
    final String value = "UnCategorised";
    final byte[] bytes = value.getBytes("UTF-8");
    final ByteBuffer writeBuffer = ByteBuffer.allocate(bytes.length);
    writeBuffer.put(bytes);
    writeBuffer.flip();
    view.write(category, writeBuffer);
}
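One way to confirm that suspicion in code is to ask the FileStore whether the view is supported before requesting it: Files.getFileAttributeView() returns null exactly when the store lacks support, and user-defined attributes require extended-attribute (xattr) support, which Android's emulated external storage typically does not provide. A minimal plain-Java sketch (the command-line path argument is an assumption for illustration):
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.UserDefinedFileAttributeView;

public class XattrSupportCheck {
    public static void main( String[] args ) throws Exception {
        Path path = Paths.get( args[0] );
        FileStore store = Files.getFileStore( path );
        // false on file systems without extended-attribute support,
        // in which case getFileAttributeView(...) returns null
        boolean supported = store.supportsFileAttributeView( UserDefinedFileAttributeView.class );
        System.out.println( "user-defined attributes supported: " + supported );
    }
}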

Reuse normalizer in ND4J/DL4J

I wonder what's the proper way to reuse a normalizer in ND4J/DL4J. Currently, I save it as follows:
final DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit( trainingData );
normalizer.transform( trainingData );
normalizer.transform( testData );
try {
    final NormalizerSerializer normalizerSerializer = new NormalizerSerializer();
    normalizerSerializer.addStrategy( new StandardizeSerializerStrategy() );
    normalizerSerializer.write( normalizer, path );
} catch ( final IOException e ) {
    // ...
}
And load it via:
try {
    final NormalizerSerializer normalizerSerializer = new NormalizerSerializer();
    normalizerSerializer.addStrategy( new StandardizeSerializerStrategy() );
    final DataNormalization normalizer = normalizerSerializer.restore( path );
} catch ( final Exception e ) { // Throws Exception instead of IOException.
    // ...
}
Is that OK? Unfortunately, I wasn't able to find more information in the docs.
This is what I do...
DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainingData);
normalizer.transform(trainingData);
save it
NormalizerSerializer saver = NormalizerSerializer.getDefaults();
File normalsFile = new File("fileName");
saver.write(normalizer,normalsFile);
restore it
NormalizerSerializer loader = NormalizerSerializer.getDefaults();
DataNormalization restoredNormalizer = loader.restore(normalsFile);
restoredNormalizer.transform(testData);
The ND4J Java docs say that .getDefaults() gets a serializer configured with strategies for the built-in normalizer implementations. As you are using NormalizerStandardize, getDefaults() offers a shorthand way of achieving the same end without explicitly adding the strategy.
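In other words, for a NormalizerStandardize the two construction styles should be interchangeable; a minimal sketch of the equivalence:
import org.nd4j.linalg.dataset.api.preprocessor.serializer.NormalizerSerializer;
import org.nd4j.linalg.dataset.api.preprocessor.serializer.StandardizeSerializerStrategy;

// Registering the strategy by hand...
NormalizerSerializer manual = new NormalizerSerializer();
manual.addStrategy( new StandardizeSerializerStrategy() );

// ...versus taking the defaults, which already include strategies
// for all built-in normalizers.
NormalizerSerializer defaults = NormalizerSerializer.getDefaults();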

How to read Nutch content from Java/Scala?

I'm using Nutch to crawl some websites (as a process that runs separate of everything else), while I want to use a Java (Scala) program to analyse the HTML data of websites using Jsoup.
I got Nutch to work by following the tutorial (the crawl script didn't work for me, but executing the individual commands did), and I think it's saving the websites' HTML in the crawl/segments/<time>/content/part-00000 directory.
The problem is that I cannot figure out how to actually read the website data (URLs and HTML) in a Java/Scala program. I read this document, but find it a bit overwhelming since I've never used Hadoop.
I tried to adapt the example code to my environment, and this is what I arrived at (mostly by guesswork):
val reader = new MapFile.Reader(FileSystem.getLocal(new Configuration()), ".../apache-nutch-1.8/crawl/segments/20140711115438/content/part-00000", new Configuration())
var key = null
var value = null
reader.next(key, value) // test for a single value
println(key)
println(value)
However, I am getting this exception when I run it:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1873)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:517)
I am not sure how to work with a MapFile.Reader, specifically, what constructor parameters I am supposed to pass to it. What Configuration objects am I supposed to pass in? Is that the correct FileSystem? And is that the data file I'm interested in?
Scala:
import org.apache.hadoop.fs.{ FileSystem, Path }
import org.apache.hadoop.io.{ SequenceFile, Text }
import org.apache.nutch.protocol.Content
import org.apache.nutch.util.NutchConfiguration

val conf = NutchConfiguration.create()
val fs = FileSystem.get(conf)
val file = new Path(".../part-00000/data")
val reader = new SequenceFile.Reader(fs, file, conf)

// Each element of the stream reads the next (key, content) pair from the sequence file.
val webdata = Stream.continually {
  val key = new Text()
  val content = new Content()
  reader.next(key, content)
  (key, content)
}
println(webdata.head)
Java:
import java.io.IOException;

import org.apache.commons.cli.Options;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.util.NutchConfiguration;

public class ContentReader {
    public static void main(String[] args) throws IOException {
        Configuration conf = NutchConfiguration.create();
        Options opts = new Options();
        GenericOptionsParser parser = new GenericOptionsParser(conf, opts, args);
        String[] remainingArgs = parser.getRemainingArgs();
        FileSystem fs = FileSystem.get(conf);
        String segment = remainingArgs[0];
        Path file = new Path(segment, Content.DIR_NAME + "/part-00000/data");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
        Text key = new Text();
        Content content = new Content();
        // Loop through the sequence file, dumping each page's raw bytes to stdout.
        while (reader.next(key, content)) {
            try {
                System.out.write(content.getContent(), 0, content.getContent().length);
            } catch (Exception e) {
                // Skip records whose content cannot be written.
            }
        }
    }
}
Alternatively, you can use org.apache.nutch.segment.SegmentReader.

How to convert a string into a piece of code (Factory Method Pattern?)

Let's say we have a String like this:
String string2code = "variable = 'hello';";
How could we convert that String to a piece of code like this?:
variable = "hello";
GroovyShell is the answer:
String string2code = "variable = 'hello'; return variable.toUpperCase()";
def result = new GroovyShell().evaluate(string2code)
assert result == "HELLO"
If you're into more complex stuff later, you can compile whole classes using GroovyClassLoader.
private static Class loadGroovyClass( File file ) throws MigrationException {
    try {
        GroovyClassLoader gcl = new GroovyClassLoader( ExternalMigratorsLoader.class.getClassLoader() );
        GroovyCodeSource src = new GroovyCodeSource( file );
        Class clazz = gcl.parseClass( src );
        return clazz;
    }
    catch( CompilationFailedException | IOException ex ){
        ...
    }
}
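Once parseClass() has returned, the loaded class can be instantiated reflectively like any other Java class; a hypothetical usage (the file name and the no-arg constructor are assumptions for illustration):
// "Migrator.groovy" is a placeholder for whatever Groovy source you load.
Class clazz = loadGroovyClass( new File( "Migrator.groovy" ) );
Object migrator = clazz.getDeclaredConstructor().newInstance();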
Maybe you can take a look at Janino.
Janino is a small Java compiler that can compile not only source files but also expressions like the one you have.
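For instance, a minimal sketch using Janino's ExpressionEvaluator (assuming the janino jar is on the classpath):
import org.codehaus.janino.ExpressionEvaluator;

public class JaninoDemo {
    public static void main( String[] args ) throws Exception {
        ExpressionEvaluator ee = new ExpressionEvaluator();
        // Compile the string into an evaluable expression.
        ee.cook( "\"hello\".toUpperCase()" );
        // No parameters were declared, so pass an empty argument array.
        Object result = ee.evaluate( new Object[0] );
        System.out.println( result ); // HELLO
    }
}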
