Pig UDF Maxmind GeoIP Database Data File Loading Issue - java

The following code works when I execute the Pig script locally while specifying a local GeoIPASNum.dat file. However, it does not work when run in MapReduce distributed mode. What am I missing?
Pig job
DEFINE AsnResolver AsnResolver('/hdfs/location/of/GeoIPASNum.dat');
loaded = LOAD 'log_file' Using PigStorage() AS (ip:chararray);
columned = FOREACH loaded GENERATE AsnResolver(ip);
STORE columned INTO 'output/' USING PigStorage();
AsnResolver.java
public class AsnResolver extends EvalFunc<String> {
String ipAsnFile = null;
@Override
public String exec(Tuple input) throws IOException {
try {
LookupService lus = new LookupService(ipAsnFile,
LookupService.GEOIP_MEMORY_CACHE);
return lus.getOrg((String) input.get(0));
} catch (IOException e) {
}
return null;
}
public AsnResolver(String file) {
ipAsnFile = file;
}
...
}

The problem is that you are using a string reference to an HDFS path and the LookupService constructor can't resolve the file. It probably works when you run it locally since the LookupService has no problem with a file in your local FS.
Override the getCacheFiles method:
@Override
public List<String> getCacheFiles() {
List<String> list = new ArrayList<String>(1);
list.add(ipAsnFile + "#GeoIPASNum.dat");
return list;
}
Then change your LookupService constructor to use the Distributed Cache reference to "GeoIPASNum.dat" :
LookupService lus = new LookupService("GeoIPASNum.dat", LookupService.GEOIP_MEMORY_CACHE);

Search for "Distributed Cache" in this page of the Pig docs: http://pig.apache.org/docs/r0.11.0/udf.html
The example it shows using the getCacheFiles() method should ensure that the file is accessible to all the nodes in the cluster.
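On the "path#name" form passed to getCacheFiles(): as far as I understand the Hadoop distributed cache, the part before the # is the HDFS location and the part after it is the name of the symlink created in each task's working directory, which is why the UDF can then open plain "GeoIPASNum.dat". A minimal sketch of how such an entry decomposes, using only java.net.URI (CacheEntrySketch and split are hypothetical names for illustration, not Pig or Hadoop APIs):

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Pig or Hadoop.
class CacheEntrySketch {

    // Splits "hdfsPath#symlinkName" into { hdfsPath, symlinkName } via URI parsing.
    static String[] split(String cacheEntry) {
        URI uri = URI.create(cacheEntry);
        return new String[] { uri.getPath(), uri.getFragment() };
    }

    public static void main(String[] args) {
        String[] parts = split("/hdfs/location/of/GeoIPASNum.dat#GeoIPASNum.dat");
        System.out.println("HDFS path:    " + parts[0]);
        System.out.println("Symlink name: " + parts[1]);
    }
}
```

The symlink name is what the UDF should pass to LookupService; the HDFS path is only needed when registering the cache file.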

Related

Update hashset from .txt while app is running

The goal is to block access to the page from the list of IP addresses. This list is in the file list.txt.
I wrote a service that checks the request IP against a HashSet of "unwanted" addresses, but a subgoal is to pick up changes to list.txt on the fly. That is, if I add an IP to this file, it should be blocked without restarting the application. I have no idea how to solve this, because my app refreshes the list only after a restart. My code is below:
@Service
public class BlackListService {
public Set<String> loadBlackList() {
java.util.Set<java.lang.String> blackList = new HashSet<>();
InputStream resource = null;
try {
resource = new ClassPathResource(
"blacklist.txt").getInputStream();
} catch (IOException e) {
e.printStackTrace();
}
try (BufferedReader reader = new BufferedReader(new InputStreamReader(resource))) {
blackList = reader.lines().collect(Collectors.toSet());
for (java.lang.String address:
blackList) {
System.out.println(address);
}
} catch (IOException e) {
e.printStackTrace();
}
return blackList;
}
public boolean isNowAllowedIP(Set<String> blackList, String requestIP) {
return blackList.contains(requestIP);
}
}
And controller:
@Controller
public class MainController {
private final BlackListService blackListService;
public MainController(BlackListService blackListService) {
this.blackListService = blackListService;
}
@GetMapping("/")
public String mainPage(HttpServletRequest request, Model model) {
Set<String> blackList = blackListService.loadBlackList();
if (blackListService.isNowAllowedIP(blackList, request.getRemoteAddr())) {
Logger logger = Logger.getLogger("Access logs");
logger.warning("Access disallowed");
model.addAttribute("message", request.getRemoteAddr() + ": Access disallowed");
return "index";
}
model.addAttribute("message", "Access allowed");
return "index";
}
}
Can someone help with this "subgoal"?
In loadBlackList() you are reading a resource from the classpath. Could this be picking up a copy of the file built into your jar or build directory, rather than the file you are editing? I would try changing loadBlackList() to use a FileReader with a path on the file system instead of an InputStreamReader over a classpath resource.
What you need is a recurring background job that reloads your blacklist after you change it. This blog post discusses a "modern" approach to doing that with Spring.
Save the file's last-modified time when your program starts and first loads it. See this for checking the file's modified time.
Schedule the background job to run every minute (or every 5 minutes, or whatever is frequent enough for your needs).
When the job runs, check the current last-modified time of the file; if it differs from the saved one, it's time to reload your list.
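The steps above can be sketched with plain JDK classes. This is a minimal illustration, not a drop-in replacement for the service in the question: ReloadingBlacklist and its method names are hypothetical, and it assumes the list lives at a path on the file system (not inside the jar), one address per line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: keeps an in-memory set of blocked IPs in sync with a file.
class ReloadingBlacklist implements AutoCloseable {
    private final Path file;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "blacklist-reloader");
                t.setDaemon(true); // the reloader must not keep the JVM alive
                return t;
            });
    private volatile Set<String> blocked = Collections.emptySet();
    private FileTime lastLoaded; // last-modified time seen at the previous load

    ReloadingBlacklist(Path file, long periodSeconds) {
        this.file = file;
        reloadIfChanged(); // initial load at startup
        scheduler.scheduleWithFixedDelay(
                this::reloadIfChanged, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // Re-reads the file only when its last-modified time differs from the saved one.
    final synchronized void reloadIfChanged() {
        try {
            FileTime modified = Files.getLastModifiedTime(file);
            if (modified.equals(lastLoaded)) {
                return; // unchanged since the last load
            }
            try (Stream<String> lines = Files.lines(file)) {
                blocked = lines.map(String::trim)
                        .filter(s -> !s.isEmpty())
                        .collect(Collectors.toSet());
            }
            lastLoaded = modified;
        } catch (IOException | UncheckedIOException e) {
            // Keep serving the previous set if the file is temporarily unreadable.
            System.err.println("Could not reload blacklist: " + e);
        }
    }

    boolean isBlocked(String ip) {
        return blocked.contains(ip);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The set is swapped in one assignment to a volatile field, so request threads see either the old list or the new one, never a half-built set. In a Spring application the same check could instead be driven by a method annotated with @Scheduled.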

Adding logs to wso2 to track logs implemented in custom java code

Below is a code snippet for a custom API Manager mediator that I'm supposed to modify for our use. I'm having trouble getting the logs out of the code when I run it in our WSO2 environment. What is the process for seeing the output of these logs? This is going to be a jar file that I add to the repository/components/lib/ directory of the APIM. The jar file name is com.domain.wso2.apim.extensions. I need to be able to see what's being passed and which parts of the code are being hit, for testing.
public class IdentifiersLookup extends AbstractMediator implements ManagedLifecycle {
private static Log log = LogFactory.getLog(IdentifiersLookup.class);
private String propertyPrefix = "";
private String netIdPropertyToUse = "";
private DataSource ds = null;
private String DsName = null;
public void init(SynapseEnvironment synapseEnvironment) {
if (log.isInfoEnabled()) {
log.info("Initializing IdentifiersLookup Mediator");
}
if (log.isDebugEnabled())
log.debug("IdentifiersLookup: looking up datasource" + DsName);
try {
this.ds = (DataSource) new InitialContext().lookup(DsName);
} catch (NamingException e) {
e.printStackTrace();
}
if (log.isDebugEnabled())
log.debug("IdentifiersLookup: acquired datasource");
}
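Add the line below to the log4j.properties file, which resides in the wso2am-2.0.0/repository/conf/ folder, and restart the server.
log4j.logger.com.domain.wso2.apim.extensions=INFO
Set the level to DEBUG instead of INFO if you also need the log.debug(...) output.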

PigServer script execution

I have Java code that generates a Pig script, and I am wondering whether there is an option to execute that script directly from within the Java code. I found that there is an option to embed Pig script execution inside Java code using the PigServer class.
The problem is that I'm using AvroStorage to store the results, and the class contains a store() method that apparently uses file storage.
Is there any way to execute my Pig script with AvroStorage from Java using the PigServer class?
This is generic code from their docs. They use pigServer.store("D", "myoutput"); but instead of the file I need to call AvroStorage.
public class WordCount {
public static void main(String[] args) {
PigServer pigServer = new PigServer();
try {
pigServer.registerJar("/mylocation/tokenize.jar");
runMyQuery(pigServer, "myinput.txt");
}
catch (IOException e) {
e.printStackTrace();
}
}
public static void runMyQuery(PigServer pigServer, String inputFile) throws IOException {
pigServer.registerQuery("A = load '" + inputFile + "' using TextLoader();");
pigServer.registerQuery("B = foreach A generate flatten(tokenize($0));");
pigServer.registerQuery("C = group B by $1;");
pigServer.registerQuery("D = foreach C generate flatten(group), COUNT(B.$0);");
pigServer.store("D", "myoutput");
}
}

Should YamlConfiguration objects be closed?

I've been working on a plugin that requires a fair amount of data being stored.
I have it being stored in a custom config file I found online that works basically the same as the default config.
The problem I'm having is that I am not sure how to actually close the file or if I even need to, as I know little about yaml configurations.
The code for the template I used is below.
I'm also curious as to advice on how I should store larger amounts of data in the future.
public class CustomConfig {
//store name of file to load/edit
private final String fileName;
//store plugin, to get file directory
private final JavaPlugin plugin;
//store actual hard disk file location
private File configFile;
//store ram file copy location
private FileConfiguration fileConfiguration;
//constructor taking a plugin and filename
public CustomConfig(JavaPlugin plugin, String fileName) {
//ensure plugin exists to get folder path
if (plugin == null)
throw new IllegalArgumentException("plugin cannot be null");
//set this classes plugin variable to the one passed to this method
this.plugin = plugin;
//get name of file to load/edit
this.fileName = fileName;
//get directory/folder of file to load/edit
File dataFolder = plugin.getDataFolder();
if (dataFolder == null)
throw new IllegalStateException();
//load config file from hard disk
this.configFile = new File(plugin.getDataFolder(), fileName);
reloadConfig();
}
public void reloadConfig() {
//load memory file from the hard copy
fileConfiguration = YamlConfiguration.loadConfiguration(configFile);
// Look for defaults in the jar
File configFile = new File(plugin.getDataFolder(), fileName);
if (configFile != null) {
YamlConfiguration defConfig = YamlConfiguration.loadConfiguration(configFile);
fileConfiguration.setDefaults(defConfig);
}
}
public FileConfiguration getConfig() {
if (fileConfiguration == null) {
this.reloadConfig();
}
return fileConfiguration;
}
public void saveConfig() {
if (fileConfiguration == null || configFile == null) {
return;
} else {
try {
getConfig().save(configFile);
} catch (IOException ex) {
plugin.getLogger().log(Level.SEVERE, "Could not save config to " + configFile, ex);
}
}
}
public void saveDefaultConfig() {
if (!configFile.exists()) {
this.plugin.saveResource(fileName, false);
}
}
}
No. You do not have to close YamlConfiguration objects.
While the default config (JavaPlugin.getConfig()) is bound to the lifecycle of the plugin, custom ones are disposed of like any other Java object, i.e. when the garbage collector determines that there are no more references pointing to them in the code.
You don't need to close the config. It's not a BufferedWriter. The config keeps all of its data in memory until the server shuts down. This means that if you change something in the config file while your plugin is enabled, you will need to call your reloadConfig() method. The only cleanup you need to do after using FileConfiguration#set(String, Object) is to call your saveConfig() method, which uses FileConfiguration#save(File) to tell Bukkit to copy the current state of your config into your config file.

Updating Dropwizard config at runtime

Is it possible to have my app update the config settings at runtime? I can easily expose the settings I want in my UI, but is there a way to allow the user to update settings and make them permanent, i.e. save them to the config.yaml file? The only way I can see is to update the file by hand and then restart the server, which seems a bit limiting.
Yes. It is possible to reload the service classes at runtime.
Dropwizard by itself does not have a way to reload the app, but Jersey does.
Jersey uses a container object internally to maintain the running application. Dropwizard uses the ServletContainer class of Jersey to run the application.
How to reload the app without restarting it -
Get a handle to the container used internally by jersey
You can do this by registering an AbstractContainerLifecycleListener in the Dropwizard Environment before starting the app, and implementing its onStartup method as below -
In your main method where you start the app -
//getting the container instance
environment.jersey().register(new AbstractContainerLifecycleListener() {
@Override
public void onStartup(Container container) {
//initializing container - which will be used to reload the app
_container = container;
}
});
Add a method to your app to reload the app. It takes a list of strings, which are the names of the service classes you want to reload. This method calls the reload method of the container with a new DropwizardResourceConfig instance.
In your Application class
public static synchronized void reloadApp(List<String> reloadClasses) {
DropwizardResourceConfig dropwizardResourceConfig = new DropwizardResourceConfig();
for (String className : reloadClasses) {
try {
Class<?> serviceClass = Class.forName(className);
dropwizardResourceConfig.registerClasses(serviceClass);
System.out.printf(" + loaded class %s.\n", className);
} catch (ClassNotFoundException ex) {
System.out.printf(" ! class %s not found.\n", className);
}
}
_container.reload(dropwizardResourceConfig);
}
For more details see the example documentation of jersey - jersey example for reload
Consider going through the code and documentation of following files in Dropwizard/Jersey for a better understanding -
Container.java
ContainerLifecycleListener.java
ServletContainer.java
AbstractContainerLifecycleListener.java
DropwizardResourceConfig.java
ResourceConfig.java
No.
The YAML file is parsed at startup and given to the application as a Configuration object once and for all. I believe you can change the file after that, but it wouldn't affect your application until you restart it.
Possible follow up question: Can one restart the service programmatically?
AFAIK, no. I've researched and read the code somewhat for that but couldn't find a way to do that yet. If there is, I'd love to hear that :).
I made a task that reloads the main YAML file (useful if something in the file changes). However, it does not reload the environment. From what I found while researching this, Dropwizard uses a lot of final variables, and it's quite hard to reload these on the go without restarting the app.
class ReloadYAMLTask extends Task {
private String yamlFileName;
ReloadYAMLTask(String yamlFileName) {
super("reloadYaml");
this.yamlFileName = yamlFileName;
}
@Override
public void execute(ImmutableMultimap<String, String> parameters, PrintWriter output) throws Exception {
if (yamlFileName != null) {
ConfigurationFactoryFactory configurationFactoryFactory = new DefaultConfigurationFactoryFactory<ReportingServiceConfiguration>();
ValidatorFactory validatorFactory = Validation.buildDefaultValidatorFactory();
Validator validator = validatorFactory.getValidator();
ObjectMapper objectMapper = Jackson.newObjectMapper();
final ConfigurationFactory<ServiceConfiguration> configurationFactory = configurationFactoryFactory.create(ServiceConfiguration.class, validator, objectMapper, "dw");
File confFile = new File(yamlFileName);
configurationFactory.build(new File(confFile.toURI()));
}
}
}
You can change the configuration in the YAML and read it while your application is running. This will not however restart the server or change any server configurations. You will be able to read any changed custom configurations and use them. For example, you can change the logging level at runtime or reload other custom settings.
My solution -
Define a custom server command. You should use this command to start your application instead of the "server" command.
ArgsServerCommand.java
public class ArgsServerCommand<WC extends WebConfiguration> extends EnvironmentCommand<WC> {
private static final Logger LOGGER = LoggerFactory.getLogger(ArgsServerCommand.class);
private final Class<WC> configurationClass;
private Namespace _namespace;
public static String COMMAND_NAME = "args-server";
public ArgsServerCommand(Application<WC> application) {
super(application, "args-server", "Runs the Dropwizard application as an HTTP server specific to my settings");
this.configurationClass = application.getConfigurationClass();
}
/*
* Since we don't subclass ServerCommand, we need a concrete reference to the configuration
* class.
*/
@Override
protected Class<WC> getConfigurationClass() {
return configurationClass;
}
public Namespace getNamespace() {
return _namespace;
}
@Override
protected void run(Environment environment, Namespace namespace, WC configuration) throws Exception {
_namespace = namespace;
final Server server = configuration.getServerFactory().build(environment);
try {
server.addLifeCycleListener(new LifeCycleListener());
cleanupAsynchronously();
server.start();
} catch (Exception e) {
LOGGER.error("Unable to start server, shutting down", e);
server.stop();
cleanup();
throw e;
}
}
private class LifeCycleListener extends AbstractLifeCycle.AbstractLifeCycleListener {
@Override
public void lifeCycleStopped(LifeCycle event) {
cleanup();
}
}
}
Method to reload in your Application -
private static String _ymlFilePath = null; // class variable
public static boolean reloadConfiguration() throws IOException, ConfigurationException {
boolean reloaded = false;
if (_ymlFilePath == null) {
List<Command> commands = _configurationBootstrap.getCommands();
for (Command command : commands) {
String commandName = command.getName();
if (commandName.equals(ArgsServerCommand.COMMAND_NAME)) {
Namespace namespace = ((ArgsServerCommand) command).getNamespace();
if (namespace != null) {
_ymlFilePath = namespace.getString("file");
}
}
}
}
ConfigurationFactoryFactory configurationFactoryFactory = _configurationBootstrap.getConfigurationFactoryFactory();
ValidatorFactory validatorFactory = _configurationBootstrap.getValidatorFactory();
Validator validator = validatorFactory.getValidator();
ObjectMapper objectMapper = _configurationBootstrap.getObjectMapper();
ConfigurationSourceProvider provider = _configurationBootstrap.getConfigurationSourceProvider();
final ConfigurationFactory<CustomWebConfiguration> configurationFactory = configurationFactoryFactory.create(CustomWebConfiguration.class, validator, objectMapper, "dw");
if (_ymlFilePath != null) {
// Refresh logging level.
CustomWebConfiguration webConfiguration = configurationFactory.build(provider, _ymlFilePath);
LoggingFactory loggingFactory = webConfiguration.getLoggingFactory();
loggingFactory.configure(_configurationBootstrap.getMetricRegistry(), _configurationBootstrap.getApplication().getName());
// Get my defined custom settings
CustomSettings customSettings = webConfiguration.getCustomSettings();
reloaded = true;
}
return reloaded;
}
Although this feature isn't supported out of the box by Dropwizard, you can accomplish it fairly easily with the tools it gives you.
Before I get started, note that this isn't a complete solution for the question asked as it doesn't persist the updated config values to the config.yml. However, this would be easy enough to implement yourself simply by writing to the config file from the application. If anyone would like to write this implementation feel free to open a PR on the example project I've linked below.
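For the persistence step mentioned above, one rough way to write a value back to the file is sketched here with plain JDK classes. ConfigFileUpdater and setTopLevelValue are hypothetical names, not Dropwizard APIs, and the sketch assumes the setting is a top-level, single-line string key such as myConfigValue; nested keys or multi-line values would need a real YAML library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Dropwizard.
class ConfigFileUpdater {

    // Replaces (or appends) a top-level `key: "value"` line, leaving other lines untouched.
    static void setTopLevelValue(Path yaml, String key, String value) throws IOException {
        String prefix = key + ":";
        // Escape backslashes and double quotes for a YAML double-quoted scalar.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        String replacement = key + ": \"" + escaped + "\"";

        List<String> out = new ArrayList<>();
        boolean replaced = false;
        for (String line : Files.readAllLines(yaml, StandardCharsets.UTF_8)) {
            if (!replaced && line.startsWith(prefix)) {
                out.add(replacement);
                replaced = true;
            } else {
                out.add(line);
            }
        }
        if (!replaced) {
            out.add(replacement); // key was not present yet
        }

        // Write to a temp file in the same directory, then swap it into place,
        // so a crash mid-write does not leave a truncated config file behind.
        Path tmp = Files.createTempFile(yaml.toAbsolutePath().getParent(), "config", ".tmp");
        Files.write(tmp, out, StandardCharsets.UTF_8);
        Files.move(tmp, yaml, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A task that changes a value in memory could call something like this afterwards, so the new value survives a restart.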
Code
Start off with a minimal config:
config.yml
myConfigValue: "hello"
And its corresponding configuration class:
ExampleConfiguration.java
public class ExampleConfiguration extends Configuration {
private String myConfigValue;
public String getMyConfigValue() {
return myConfigValue;
}
public void setMyConfigValue(String value) {
myConfigValue = value;
}
}
Then create a task which updates the config:
UpdateConfigTask.java
public class UpdateConfigTask extends Task {
ExampleConfiguration config;
public UpdateConfigTask(ExampleConfiguration config) {
super("updateconfig");
this.config = config;
}
@Override
public void execute(Map<String, List<String>> parameters, PrintWriter output) {
config.setMyConfigValue("goodbye");
}
}
Also for demonstration purposes, create a resource which allows you to get the config value:
ConfigResource.java
@Path("/config")
public class ConfigResource {
private final ExampleConfiguration config;
public ConfigResource(ExampleConfiguration config) {
this.config = config;
}
@GET
public Response handleGet() {
return Response.ok().entity(config.getMyConfigValue()).build();
}
}
Finally wire everything up in your application:
ExampleApplication.java (excerpt)
environment.jersey().register(new ConfigResource(configuration));
environment.admin().addTask(new UpdateConfigTask(configuration));
Usage
Start up the application then run:
$ curl 'http://localhost:8080/config'
hello
$ curl -X POST 'http://localhost:8081/tasks/updateconfig'
$ curl 'http://localhost:8080/config'
goodbye
How it works
This works simply by passing the same reference to the constructor of ConfigResource.java and UpdateConfigTask.java. If you aren't familiar with the concept see here:
Is Java "pass-by-reference" or "pass-by-value"?
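The shared-reference idea can be shown without Dropwizard at all. In the sketch below, SharedConfig, ConfigReader and ConfigUpdater are hypothetical stand-ins for ExampleConfiguration, ConfigResource and UpdateConfigTask. The volatile modifier is an addition of mine, on the assumption that the admin task and the resource run on different threads.

```java
// Hypothetical stand-in for ExampleConfiguration.
class SharedConfig {
    // volatile (an assumption, not in the original) so a write from one
    // thread is visible to reads from another.
    private volatile String value;

    SharedConfig(String value) { this.value = value; }
    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
}

// Hypothetical stand-in for ConfigResource: only reads the config.
class ConfigReader {
    private final SharedConfig config;
    ConfigReader(SharedConfig config) { this.config = config; }
    String read() { return config.getValue(); }
}

// Hypothetical stand-in for UpdateConfigTask: only writes the config.
class ConfigUpdater {
    private final SharedConfig config;
    ConfigUpdater(SharedConfig config) { this.config = config; }
    void update(String newValue) { config.setValue(newValue); }
}

class SharedReferenceDemo {
    public static void main(String[] args) {
        SharedConfig config = new SharedConfig("hello");
        // Both objects receive a copy of the same reference, so they see one object.
        ConfigReader reader = new ConfigReader(config);
        ConfigUpdater updater = new ConfigUpdater(config);

        System.out.println(reader.read()); // prints "hello"
        updater.update("goodbye");
        System.out.println(reader.read()); // prints "goodbye"
    }
}
```

Java passes the reference by value, so reassigning the constructor parameter inside either class would not affect the other; mutating the shared object through a setter does.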
The linked classes above are to a project I've created which demonstrates this as a complete solution. Here's a link to the project:
scottg489/dropwizard-runtime-config-example
Footnote: I haven't verified this works with the built in configuration. However, the dropwizard Configuration class which you need to extend for your own configuration does have various "setters" for internal configuration, but it may not be safe to update those outside of run().
Disclaimer: The project I've linked here was created by me.
