I want to transfer a file from an FTP server to HDFS. I tried this method: FTP to HDFS, with demo code as follows:
Configuration conf = new Configuration();
FTPFileSystem ftpfs = new FTPFileSystem();
ftpfs.setConf(conf);
ftpfs.initialize(new URI(ftpConnectStr), conf);
Path homeDirectory = ftpfs.getHomeDirectory();
System.out.println(homeDirectory.toString());
FileStatus[] fileStatuses = ftpfs.listStatus(new Path("/"));
for(FileStatus fileStatus : fileStatuses){
System.out.println(fileStatuses.length);
System.out.println(fileStatus.toString());
}
boolean test = ftpfs.mkdirs(new Path("test"));
System.out.println(test);
The ftpfs.listStatus(new Path("/")) call doesn't work; it shows nothing, even though the FTP server has two directories. ftpfs.mkdirs(new Path("test")) works fine. The program's output is as follows:
and the FTP server directory is as follows:
I searched Google but found little information, and I don't know why this happens. If you can help me, I will be very grateful. Thanks.
As you have found out, the problem is that Hadoop (or rather the underlying Apache Commons Net FTPClient) defaults to FTP active mode, which hardly works nowadays due to ubiquitous NATs and firewalls.
Since Hadoop 2.9, you can enable FTP passive mode by setting the fs.ftp.data.connection.mode configuration option:
fs.ftp.data.connection.mode=PASSIVE_LOCAL_DATA_CONNECTION_MODE
See https://issues.apache.org/jira/browse/HADOOP-13953
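If you are building the Configuration in code, as in the question, the same option can be set programmatically; a minimal sketch, reusing the ftpConnectStr from the question and assuming Hadoop 2.9 or later:
Configuration conf = new Configuration();
// Hadoop 2.9+: make FTPFileSystem open data connections in passive mode
conf.set("fs.ftp.data.connection.mode", "PASSIVE_LOCAL_DATA_CONNECTION_MODE");

FTPFileSystem ftpfs = new FTPFileSystem();
ftpfs.initialize(new URI(ftpConnectStr), conf);
FileStatus[] fileStatuses = ftpfs.listStatus(new Path("/"));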
Finally, I found where the problem is: on the FTP server, the data transfer mode is set to passive.
I then debugged the source code of FTPFileSystem and found that it doesn't set FTP passive mode,
so I modified the related code of FTPFileSystem as follows:
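(The screenshot of the modified code is not reproduced here. The change essentially makes the underlying Commons Net client enter passive mode once it is connected and logged in, inside FTPFileSystem's connect() helper; a rough sketch of the idea, not the exact patch:)
// inside FTPFileSystem's private connect() helper, after the client has logged in
client.setFileType(FTP.BINARY_FILE_TYPE);
client.enterLocalPassiveMode(); // added: request passive data connections so listings and transfers work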
Rerun the program:
and it works fine:
Currently I'm facing a problem where I don't know what to do to resolve it. I'm developing a simple application that transfers files through different FTP and SFTP servers to be processed. At the beginning those servers weren't ready, so I used Apache Mina and RebexTinySftpServer to set up an FTP server and an SFTP server on my computer to test my development.
With those applications I completed and tested my application locally, so it was time to test it using the servers that will be used in production, but something is wrong with the FTP server and Spring Integration is not detecting the files that are put in the folder to be transferred to the SFTP server.
I have two folders on each server: one for Input files and the another one for Output files. So when Spring Integration detects that there's a new file in the Input folder on the SFTP Server, it transfers the file to the Input folder on FTP Server to be processed for another service. That part works fine.
After that service processes the file, it generates an output file and stores it in the Output folder on the FTP server to be transferred to the Output folder on the SFTP server, but for some reason Spring Integration is not detecting those files and doesn't transfer any of them. This part is what I don't know how to solve, because no exception is being thrown, so I don't know what part of my code I have to modify.
Here is my code where I define the DefaultFtpSessionFactory:
public DefaultFtpSessionFactory ftpSessionFactory() {
DefaultFtpSessionFactory session = new DefaultFtpSessionFactory();
session.setUsername(username);
session.setPassword(password);
session.setPort(port);
session.setHost(host);
session.setFileType(FTP.ASCII_FILE_TYPE);
session.setClientMode(FTPClient.PASSIVE_LOCAL_DATA_CONNECTION_MODE);
return session;
}
And here is the code where I define the FTP Inbound Adapter:
@Bean
FtpInboundChannelAdapterSpec salidaAS400InboundAdapter() {
return Ftp.inboundAdapter(as400Session.ftpSessionFactory())
.preserveTimestamp(true)
.remoteDirectory(as400Session.getPathSalida())
.deleteRemoteFiles(true)
.localDirectory(new File("tmp/as400/salida"))
.temporaryFileSuffix(".tmp");
}
I should mention that the FTP Server is running on an AS/400 system, so maybe that has something to do with the situation I'm facing.
I found the solution to my problem. I'm posting this in case something similar happens to someone else.
My project is using spring-integration-ftp 5.5.16, which has commons-net 3.8.0 as a dependency. For some reason, that version of commons-net wasn't retrieving the files inside the directory on the AS/400, so I excluded that dependency from spring-integration-ftp and added commons-net 3.9.0 to my project. Now everything works fine.
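One quick way to confirm which commons-net actually ends up on the classpath after the exclusion (a small diagnostic sketch, not part of the original fix) is to ask the JVM where FTPClient was loaded from:
// Prints the jar FTPClient was loaded from, e.g. .../commons-net-3.9.0.jar
System.out.println(org.apache.commons.net.ftp.FTPClient.class
        .getProtectionDomain().getCodeSource().getLocation());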
So, I haven't worked with docker for very long, but this is the first time I've had a requirement to ssh OUT of a docker container. It should be straightforward, because I know I can connect to databases and pull files from repositories. But for some reason I cannot seem to connect to a remote SFTP server. Interestingly, it runs fine on my local machine (no docker), but when building on Jenkins the tests cannot connect, even to a MOCK server that I set up and put a test file on before the tests run. Running on Jenkins also makes it difficult to debug what the issue is.
I'm using JCraft's JSch to get the connection below:
public static Session getSession(String host, String user) throws JSchException {
    JSch jsch = new JSch();
    int port = 22;
    if (JunitHelperContext.getInstance().isJunit()) {
        port = JunitHelperContext.getInstance().getSftpPort();
    }
    Session session = jsch.getSession(user, host, port);
    java.util.Properties config = new java.util.Properties();
    config.put("StrictHostKeyChecking", "no");
    if (!JunitHelperContext.getInstance().isJunit()) {
        config.put("PreferredAuthentications", "publickey");
        jsch.setKnownHosts("~/.ssh/known_hosts");
        jsch.addIdentity("~/.ssh/id_rsa");
    }
    session.setConfig(config);
    session.connect();
    return session;
}
My requirement is to go out, read a file, and process it. I can build the kit fine using a non-docker template: the file is found and processed. Running it inside a docker container, though, I get this error when I try to connect:
Invalid Key Exception: the security strength of SHA-1 digest algorithm is not sufficient for this key size.
com.jcraft.jsch.JSchException: Session.connect: java.io.IOException: End of IO Stream Read
So this seems like a security issue. In production, the certificates are on the server and they can be read in that /.ssh directory. But this is a mock JSch server, and I shouldn't need to authenticate.
Is there a piece I am missing here? Some configuration in the Dockerfile? Thanks in advance.
You probably need to enable Java's JCE unlimited strength stuff in the docker container:
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
There are export restrictions on this cryptography stuff and I'll bet your docker container has the weak strength exportable jars.
Copy local_policy.jar and US_export_policy.jar into the docker container with the Dockerfile and overwrite what's there.
Follow the instructions at that link.
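One way to verify inside the container whether the restricted policy files are the problem is a small diagnostic sketch like this (128 means the limited export policy is active, 2147483647 means unlimited strength is enabled):
import javax.crypto.Cipher;

public class JcePolicyCheck {
    public static void main(String[] args) throws Exception {
        // 128 -> limited "export" policy files; Integer.MAX_VALUE -> unlimited strength enabled
        System.out.println("Max AES key length: " + Cipher.getMaxAllowedKeyLength("AES"));
    }
}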
I currently work with NetBeans on a Windows machine to develop topologies. When I deploy in local mode:
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
everything works just fine, but when I try to:
StormSubmitter.submitTopology("word", conf, builder.createTopology());
it obviously tries to deploy the topology in cluster mode and fails, since I don't have a Storm nimbus running on my local computer. I do have Storm deployed on a Digital Ocean droplet, but my current (and not convenient) solution is to copy the JAR file and use the storm jar ... command to deploy.
My question is: is there a way to tell NetBeans what my nimbus IP address is, so it can deploy it remotely (and save me the time)? Thank you in advance!
Check this link.
Now I can develop topologies in NetBeans, test them locally, and eventually deploy them to my Nimbus on the cluster. This solution works great for me!
Add this to your conf:
conf.put(Config.NIMBUS_HOST, "123.456.789.101"); // your Nimbus's IP
conf.put(Config.NIMBUS_THRIFT_PORT, 6627); // an int is expected here
Also, add the following:
System.setProperty("storm.jar", <path-to-jar>); // point to the exact file location (built with dependencies)
to avoid the following error:
[main] INFO backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...
Exception in thread "main" java.lang.RuntimeException: Must submit topologies using the 'storm' client script so that StormSubmitter knows which jar to upload.
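Putting those pieces together, a minimal sketch of how the lines fit around the submit call (the IP, port, and jar path are placeholders for your own values):
Config conf = new Config();
conf.put(Config.NIMBUS_HOST, "123.456.789.101");   // your Nimbus's IP
conf.put(Config.NIMBUS_THRIFT_PORT, 6627);         // int is expected here
// point storm.jar at the jar built with dependencies so StormSubmitter can upload it
System.setProperty("storm.jar", "/path/to/topology-jar-with-dependencies.jar"); // placeholder path

StormSubmitter.submitTopology("word", conf, builder.createTopology());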
Cheers!
Yes, you can definitely tell your topology your nimbus IP. The following is example code to submit a topology to a remote cluster.
Map storm_conf = Utils.readStormConfig();
storm_conf.put("nimbus.host", "<Nimbus Machine IP>");
Client client = NimbusClient.getConfiguredClient(storm_conf)
.getClient();
String inputJar = "C:\\workspace\\TestStormRunner\\target\\TestStormRunner-0.0.1-SNAPSHOT-jar-with-dependencies.jar";
NimbusClient nimbus = new NimbusClient(storm_conf, "<Nimbus Machine IP>",
<Nimbus Machine Port>);
// upload topology jar to Cluster using StormSubmitter
String uploadedJarLocation = StormSubmitter.submitJar(storm_conf,
inputJar);
String jsonConf = JSONValue.toJSONString(storm_conf);
nimbus.getClient().submitTopology("testtopology",
    uploadedJarLocation, jsonConf, builder.createTopology());
Here is the working example: Submitting a topology to Remote Storm Cluster
You can pass that information using the conf map parameters; you can pass key/value pairs as per your requirements.
For a list of accepted parameters, check this page.
I've got a simple task for now: connect to a remote server and get a list of files and their info (in particular, the date of creation).
I tried JSch, but it's like writing a unix app 20 years ago. I would like to switch to sshj, so if it's possible, please provide some code on how to achieve at least a file listing with file info (ideally, I would like to get an array of File objects).
So how can I achieve the goal?
Thanks in advance.
NOTE: AFAIU it's only possible by running ls on the server side and parsing its output, isn't it?
They have examples bundled with their source distribution. Did you look at them? I found this in 2 minutes: sshj: how to execute remote command example
Edit:
OK, you could execute, for instance (based on the example I linked):
final Command cmd = session.exec("ls -l /some/interesting/dir");
String lsOutput = cmd.getOutputAsString();
// parse lsOutput and extract required information
...
There is no simpler way if you want to do it over ssh, because ssh has no notion of files etc.; it is just a remote shell. Maybe sftp could provide a better interface here, but I am no expert with sftp.
Here is the code for SFTP (JSch):
ChannelSftp sftp = (ChannelSftp)session.openChannel("sftp");
sftp.connect();
sftp.cd(DIRECTORY);
Vector v = sftp.ls("*.txt"); // txt files only
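The entries returned by ls() carry the file attributes, so the modification time can be read from each one; a small sketch continuing from the Vector above:
for (Object obj : v) {
    ChannelSftp.LsEntry entry = (ChannelSftp.LsEntry) obj;
    // getMTime() is the modification time in seconds since the epoch
    System.out.println(entry.getFilename() + " modified "
            + new java.util.Date(entry.getAttrs().getMTime() * 1000L));
}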
Use with keyfile:
JSch jsch = new JSch();
jsch.setKnownHosts(myKonfig.getKnownHostsFile());
String privKeyFile = myKonfig.getPrivateKeyFile();
jsch.addIdentity(privKeyFile);
Oops, just saw that it doesn't return the creation time, just the modification time.
If you're just looking to get file information from the remote system, I would recommend using the SFTPClient class that's provided within sshj.
Use
SFTPClient.ls(directory)
to find all the remote files, then use
SFTPClient.stat(file)
to get all the information about the remote files, including the modification date.
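A minimal sketch of that with sshj (the host, username, and directory below are placeholders, and it assumes key-based auth with keys already in ~/.ssh):
import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.sftp.FileAttributes;
import net.schmizz.sshj.sftp.RemoteResourceInfo;
import net.schmizz.sshj.sftp.SFTPClient;

public class SftpListing {
    public static void main(String[] args) throws Exception {
        SSHClient ssh = new SSHClient();
        ssh.loadKnownHosts();
        ssh.connect("example.com");        // placeholder host
        try {
            ssh.authPublickey("user");     // placeholder user, uses keys from ~/.ssh
            try (SFTPClient sftp = ssh.newSFTPClient()) {
                for (RemoteResourceInfo file : sftp.ls("/some/dir")) {   // placeholder directory
                    FileAttributes attrs = file.getAttributes();
                    // mtime is in seconds since the epoch; sftp normally exposes modification, not creation, time
                    System.out.println(file.getName() + " mtime=" + attrs.getMtime());
                }
            }
        } finally {
            ssh.disconnect();
        }
    }
}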
We are using Commons Net FTPClient to retrieve files from an FTP server. Our code is similar to:
FTPClient ftpClient= new FTPClient();
ftpClient.connect(server);
ftpClient.login(username, password);
FileOutputStream out = new FileOutputStream(localFile);
ftpClient.retrieveFile(remoteFile, out);
When we run this code, the file is moved from the FTP server instead of copied. Just wondering, is this expected behavior?
If this is expected behavior, what is the best approach to retrieve a copy of the file from the server while leaving a copy on the server? (We do not have write access to the FTP server, so we cannot write the file back to it.)
Any help appreciated,
Thanks
That is very strange behavior. I have just examined the code of FTPClient and did not see anything that might remove the remote file. I believe this is a configuration issue on your FTP server.
To check it, I'd recommend you try another FTP client, for example the unix command-line utility ftp, or fget, or a regular web browser.
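As a quick way to rule out the client side, you could also run a minimal standalone download with Commons Net and list the directory again afterwards (the host, credentials, and paths below are placeholders):
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class FtpCopyCheck {
    public static void main(String[] args) throws Exception {
        FTPClient ftpClient = new FTPClient();
        ftpClient.connect("ftp.example.com");      // placeholder server
        ftpClient.login("username", "password");   // placeholder credentials
        ftpClient.enterLocalPassiveMode();
        ftpClient.setFileType(FTP.BINARY_FILE_TYPE);

        try (OutputStream out = new FileOutputStream("local.dat")) {
            boolean ok = ftpClient.retrieveFile("/remote/file.dat", out);
            System.out.println("retrieveFile returned: " + ok);
        }

        // If the file still shows up here, the download did not remove it
        for (FTPFile f : ftpClient.listFiles("/remote")) {
            System.out.println(f.getName());
        }

        ftpClient.logout();
        ftpClient.disconnect();
    }
}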
I wish you good luck.