Complete Sqoop Installation Tutorial (WSL2 + Ubuntu 24.04)

Published: 2026/5/26 6:50:19

Summary: this tutorial walks through installing and configuring Sqoop 1.4.7 on WSL2 with Ubuntu 24.04.

- Environment: Java 8, Hadoop 3.3.6 and MySQL 8.0.45 are already installed. Verify with `java -version`, `hadoop version` and `mysql --version`.
- Installation: download Sqoop 1.4.7 and extract it to /usr/local, set `SQOOP_HOME` and `PATH`, put the MySQL JDBC driver into Sqoop's lib directory, and resolve missing dependencies (commons-lang and related jars).
- Verification: test the MySQL connection with `sqoop list-databases`, then test an import (MySQL → HDFS) and an export (HDFS → MySQL).
- Common problems: `NoClassDefFoundError` means commons-lang and similar jars must be added; for MySQL connection problems, check the service status and user privileges; warnings about missing components such as HBase can be ignored.
- Next steps: incremental imports and conditional filtering, Hive integration, parallelism control and data format handling.

Once the installation is done, you can move data between Hadoop and a relational database. The tutorial includes a complete one-shot install script and a troubleshooting guide.

Related guides: "Complete guide to installing Hadoop on Windows 11 WSL Ubuntu" and "Complete tutorial on installing and using Hive on Windows 11 WSL Ubuntu".

This tutorial is compiled from an earlier installation session and applies to Windows 11 + WSL2 + Ubuntu 24.04.

I. Prerequisites

Before installing Sqoop, make sure the following are in place:

| Component | Check command | Version requirement |
| --- | --- | --- |
| Java | `java -version` | Java 8 or later (this walkthrough uses Java 8) |
| Hadoop | `hadoop version` | Hadoop 2.x or 3.x |
| MySQL | `mysql --version` | MySQL 5.x or 8.x |

```bash
# Check the version of each component
java -version
hadoop version
mysql --version
```

```text
mumu@MuJinqiu:~$ hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T08:22Z
Compiled on platform linux-x86_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.6.jar
```

Hadoop 3.3.6 is installed.

II. Download Sqoop

```bash
# Go to the home directory
cd ~

# Download Sqoop 1.4.7 (stable release; binary package built against Hadoop 2.6.0)
wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
```

If wget is slow, download the file in a browser and drag it into the WSL directory with MobaXterm.

III. Install Sqoop

```bash
# 1. Extract into /usr/local
sudo tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/

# 2. Go to /usr/local
cd /usr/local

# 3. Rename the folder to something shorter
sudo mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop

# 4. Change the owner to the current user ($(whoami) expands to your username)
sudo chown -R $(whoami):$(whoami) sqoop
```

IV. Configure environment variables

```bash
# Edit .bashrc
nano ~/.bashrc
```

Append the following at the end of the file:

```bash
# Sqoop environment variables
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
```

Save and exit (Ctrl+X → Y → Enter), then run:

```bash
source ~/.bashrc
```

V. Verify the installation

```bash
sqoop version
```

Expected output:

```text
Sqoop 1.4.7
...
```

Warnings about HBase, HCatalog and similar components are normal and do not affect the core functionality.

VI. Configure Sqoop

```bash
# 1. Go to the configuration directory
cd $SQOOP_HOME/conf

# 2. Copy the configuration template
cp sqoop-env-template.sh sqoop-env.sh

# 3. Edit the configuration file
nano sqoop-env.sh
```

Find the following settings, uncomment them and fill in the paths:

```bash
# Hadoop installation path
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop

# Uncomment the next line if you later need to connect to Hive
# export HIVE_HOME=/usr/local/hive
```

VII. Install the MySQL JDBC driver

Sqoop needs a JDBC driver to connect to MySQL.

```bash
# 1. Download the JDBC driver
cd ~
wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar

# 2. Copy it into Sqoop's lib directory
sudo cp mysql-connector-j-8.0.33.jar /usr/local/sqoop/lib/
```

The MySQL client installed through apt (the `mysql` command) and the MySQL JDBC driver that Sqoop needs are two separate things and do not affect each other.

```text
mumu@MuJinqiu:~$ mysql --version
mysql  Ver 8.0.45-0ubuntu0.24.04.1 for Linux on x86_64 ((Ubuntu))
```

In short:

| Item | What it is | Status |
| --- | --- | --- |
| MySQL client (`mysql` command) | Command-line tool for connecting to and managing databases | Installed ✅ No need to reinstall |
| MySQL JDBC driver | A `.jar` file that lets Java programs such as Sqoop connect to MySQL | Not installed ⏳ Must be downloaded |

Conclusion: you do not need to reinstall the MySQL client, but you do need to download the JDBC driver and place it in Sqoop's lib directory.

VIII. Resolving missing dependencies

Running a connection test at this point fails:

```text
mumu@MuJinqiu:~$ sqoop list-databases \
    --connect jdbc:mysql://localhost:3306/ \
    --username root \
    --password 123456
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2026-05-25 17:16:05,910 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2026-05-25 17:16:05,937 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2026-05-25 17:16:06,020 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
    at org.apache.sqoop.manager.MySQLManager.initOptionDefaults(MySQLManager.java:73)
    at org.apache.sqoop.manager.SqlManager.<init>(SqlManager.java:89)
    at com.cloudera.sqoop.manager.SqlManager.<init>(SqlManager.java:33)
    at org.apache.sqoop.manager.GenericJdbcManager.<init>(GenericJdbcManager.java:51)
    at com.cloudera.sqoop.manager.GenericJdbcManager.<init>(GenericJdbcManager.java:30)
    at org.apache.sqoop.manager.CatalogQueryManager.<init>(CatalogQueryManager.java:46)
    at com.cloudera.sqoop.manager.CatalogQueryManager.<init>(CatalogQueryManager.java:31)
    at org.apache.sqoop.manager.InformationSchemaManager.<init>(InformationSchemaManager.java:38)
    at com.cloudera.sqoop.manager.InformationSchemaManager.<init>(InformationSchemaManager.java:31)
    at org.apache.sqoop.manager.MySQLManager.<init>(MySQLManager.java:65)
    at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
    at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
    at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:272)
    at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:44)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 20 more
```

The error occurs because Sqoop is missing the commons-lang library. Sqoop needs it at runtime, but it is not on the classpath after the installation above. In the session shown here, adding `commons-lang-2.6.jar` alone was enough. If you then hit a similar error for another package (for example `NoClassDefFoundError: org/apache/commons/xxx`), Sqoop may need further commons jars.

One-step fix (recommended): to avoid downloading jars one at a time, fetch the commonly needed dependencies in a single pass:

```bash
cd ~

# Download the commonly needed dependencies
wget https://repo1.maven.org/maven2/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
# commons-logging (logging)
wget https://repo1.maven.org/maven2/commons-logging/commons-logging/1.2/commons-logging-1.2.jar
# commons-configuration (configuration)
wget https://repo1.maven.org/maven2/commons-configuration/commons-configuration/1.10/commons-configuration-1.10.jar
wget https://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar

# Copy them all into Sqoop's lib directory
sudo cp commons-*.jar /usr/local/sqoop/lib/

# Test again
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456
```
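To see at a glance which of the jars named in this tutorial are already in Sqoop's lib directory, a small check like the one below may help. It is only a sketch: `check_sqoop_jars` is a hypothetical helper written for this tutorial (it is not part of Sqoop), and the list of jar prefixes is simply the set downloaded above, not an official dependency list.

```shell
#!/bin/sh
# Hypothetical helper: report which of the jars used in this tutorial
# exist in the given lib directory. Returns non-zero if any is missing.
check_sqoop_jars() {
  lib_dir=$1
  missing=0
  for prefix in commons-lang commons-logging commons-configuration commons-cli mysql-connector-j; do
    # If the glob matches nothing, ls fails and the jar counts as missing.
    if ls "$lib_dir"/"$prefix"-*.jar >/dev/null 2>&1; then
      echo "OK: $prefix"
    else
      echo "MISSING: $prefix"
      missing=1
    fi
  done
  return "$missing"
}

# Assumes the install location used in this tutorial when SQOOP_HOME is unset.
check_sqoop_jars "${SQOOP_HOME:-/usr/local/sqoop}/lib" \
  || echo "Some jars are missing from the lib directory."
```

Because the pattern is `commons-lang-*.jar`, a jar named `commons-lang3-*.jar` does not count as commons-lang, which matches the class the stack trace asks for (`org.apache.commons.lang.StringUtils`).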
Output of `sqoop version` in the test environment:

```text
mumu@MuJinqiu:~$ sqoop version
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2026-05-25 17:14:10,892 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
```

✅ Sqoop is installed, version 1.4.7. The warnings are normal: this environment has no HBase, HCatalog, Accumulo or Zookeeper. They do not affect Sqoop's core functionality (importing and exporting data between MySQL and HDFS/Hive) and can be ignored for now.

```text
mumu@MuJinqiu:~$ sudo service mysql status
● mysql.service - MySQL Community Server
     Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; preset: enabled)
     Active: active (running) since Mon 2026-05-18 14:23:37 CST; 1 week 0 days ago
   Main PID: 666735 (mysqld)
     Status: Server is operational
      Tasks: 58 (limit: 19181)
     Memory: 426.4M (peak: 447.5M)
        CPU: 16min 15.889s
     CGroup: /system.slice/mysql.service
             └─666735 /usr/sbin/mysqld

May 18 14:23:37 MuJinqiu systemd[1]: Starting mysql.service - MySQL Community Server...
May 18 14:23:37 MuJinqiu systemd[1]: Started mysql.service - MySQL Community Server.
```

✅ The MySQL service is running; its state is `active (running)`.

IX. Test the Sqoop connection to MySQL

```bash
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306/ \
  --username root \
  --password YOUR_MYSQL_PASSWORD
```

Expected output: a list of all databases in MySQL, such as `mysql`, `information_schema` and `test`.

```text
2026-05-25 17:17:09 (377 KB/s) - ‘commons-lang-2.6.jar’ saved [284220/284220]

mumu@MuJinqiu:~$ sudo cp commons-lang-2.6.jar /usr/local/sqoop/lib/
mumu@MuJinqiu:~$ sqoop list-databases \
    --connect jdbc:mysql://localhost:3306/ \
    --username root \
    --password 123456
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2026-05-25 17:17:28,793 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2026-05-25 17:17:28,824 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2026-05-25 17:17:28,910 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
mysql
information_schema
performance_schema
sys
lee
A2608
metastore
```

Sqoop can now connect to MySQL: the output lists every database on the server.

The messages you see fall into two groups, and neither affects the core functionality:

| Message | Meaning | Action needed? |
| --- | --- | --- |
| `Warning: /usr/local/sqoop/../hbase does not exist` | HBase is not installed | ❌ No, unless you plan to use HBase |
| `Loading class com.mysql.jdbc.Driver...` | The old driver class name is being used | ❌ No, it is only a notice; the driver is registered automatically |

X. Complete install script (copy and run)

If Hadoop and MySQL are already configured, you can run the following commands in one go:

```bash
#!/bin/bash
# One-shot Sqoop install script

cd ~

# Download Sqoop
wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

# Extract and install
sudo tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/
cd /usr/local
sudo mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
sudo chown -R $(whoami):$(whoami) sqoop

# Configure environment variables
# (single quotes keep $PATH and $SQOOP_HOME unexpanded inside ~/.bashrc)
echo 'export SQOOP_HOME=/usr/local/sqoop' >> ~/.bashrc
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> ~/.bashrc
# ~/.bashrc may return early in a non-interactive shell,
# so also export the variables for the rest of this script
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

# Configure Sqoop
cd $SQOOP_HOME/conf
cp sqoop-env-template.sh sqoop-env.sh

# Download and copy the dependencies
cd ~
wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar
wget https://repo1.maven.org/maven2/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
wget https://repo1.maven.org/maven2/commons-logging/commons-logging/1.2/commons-logging-1.2.jar
wget https://repo1.maven.org/maven2/commons-configuration/commons-configuration/1.10/commons-configuration-1.10.jar
wget https://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar

sudo cp mysql-connector-j-*.jar /usr/local/sqoop/lib/
sudo cp commons-*.jar /usr/local/sqoop/lib/

# Verify the installation
sqoop version

echo "Sqoop installation complete"
```

XI. Testing import and export

Importing MySQL data into HDFS

```bash
# 1. Prepare test data in MySQL
mysql -u root -pYOUR_MYSQL_PASSWORD -e "CREATE DATABASE IF NOT EXISTS test; USE test; CREATE TABLE IF NOT EXISTS users (id INT, name VARCHAR(20)); INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');"

# 2. Import into HDFS with Sqoop
sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password YOUR_MYSQL_PASSWORD \
  --table users \
  --target-dir /user/mumu/users \
  -m 1

# 3. View the imported data
hdfs dfs -ls /user/mumu/users
hdfs dfs -cat /user/mumu/users/part-m-00000
```

Expected output:

```text
1,Alice
2,Bob
```

In the test environment, preparing the data looked like this:

```text
mumu@MuJinqiu:~/bigdata/2026$ mysql
ERROR 1045 (28000): Access denied for user 'mumu'@'localhost' (using password: NO)
mumu@MuJinqiu:~/bigdata/2026$ mysql -u root -p123456 -e "CREATE DATABASE IF NOT EXISTS test; USE test; CREATE TABLE IF NOT EXISTS users (id INT, name VARCHAR(20)); INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');"
mysql: [Warning] Using a password on the command line interface can be insecure.
```

The second command succeeded. Both messages are expected:

- The `Access denied for user 'mumu'` error: typing `mysql` without a username makes the client log in as the current WSL user (`mumu`), who has no MySQL privileges. This is not a problem; just log in with `mysql -u root -p` as usual.
- The `Using a password on the command line...` warning on the second command is only a reminder that putting the password on the command line is not secure. It does not affect execution.

To confirm the data was created, query the new table:

```bash
mysql -u root -p123456 -e "USE test; SELECT * FROM users;"
```

You should see:

```text
+------+-------+
| id   | name  |
+------+-------+
|    1 | Alice |
|    2 | Bob   |
+------+-------+
```

Then run the Sqoop import from step 2. If it succeeds, you will see the MapReduce job log, ending with a line that reports how many records were imported.
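Beyond reading the job log, one way to double-check an import is to compare the row count in MySQL with the number of lines written to HDFS. The sketch below assumes the `test.users` table and the `/user/mumu/users` target directory used in this tutorial; `compare_counts` and the `RUN_LIVE_CHECK` switch are hypothetical names introduced here, and the live queries only run when that switch is set.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the two counts are equal.
compare_counts() {
  if [ "$1" -eq "$2" ]; then
    echo "match: $1 rows"
  else
    echo "mismatch: MySQL=$1 HDFS=$2"
    return 1
  fi
}

# Opt-in live check: needs the mysql and hdfs clients plus a running cluster.
if [ "${RUN_LIVE_CHECK:-0}" = "1" ]; then
  # -N drops the column header, -B gives plain batch output; -p prompts for the password
  mysql_rows=$(mysql -u root -p -N -B -e "SELECT COUNT(*) FROM test.users")
  # Text imports write one line per row, so the line count is the row count
  hdfs_rows=$(hdfs dfs -cat '/user/mumu/users/part-m-*' | wc -l)
  compare_counts "$mysql_rows" "$hdfs_rows"
fi
```

Run it as `RUN_LIVE_CHECK=1 sh check_counts.sh` (the file name is arbitrary). Counting lines only works for the plain-text format used here; rows containing embedded newlines would make the line count differ from the row count.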
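Viewing the imported file in the test environment:

```text
mumu@MuJinqiu:~/bigdata/2026$ hdfs dfs -cat /user/mumu/users/part-m-00000
1,Alice
2,Bob
```

The import worked. This is the first data transfer between MySQL and Hadoop in this setup, and it confirms that the environment is configured and usable.

Common Sqoop commands

| Operation | Command |
| --- | --- |
| List databases | `sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456` |
| List tables | `sqoop list-tables --connect jdbc:mysql://localhost:3306/<database> --username root --password 123456` |
| Import a table into HDFS | `sqoop import --connect jdbc:mysql://localhost:3306/<database> --username root --password 123456 --table <table> --target-dir /<path> -m 1` |
| Export to MySQL | `sqoop export --connect jdbc:mysql://localhost:3306/<database> --username root --password 123456 --table <table> --export-dir /<hdfs-data-path>` |
| Run a SQL query | `sqoop eval --connect jdbc:mysql://localhost:3306/<database> --username root --password 123456 --query "SELECT * FROM <table>"` |

Trying Hive integration: if you need Hive later, install it and configure Sqoop to import directly into Hive tables.

XII. Common problems and fixes

| Problem | Fix |
| --- | --- |
| `command not found: sqoop` | Check the environment variables and run `source ~/.bashrc` |
| `NoClassDefFoundError: org/apache/commons/lang/StringUtils` | Download `commons-lang-2.6.jar` and copy it into Sqoop's lib directory |
| `Access denied for user` | Confirm the MySQL username and password are correct |
| `Connection refused` | Confirm the MySQL service is running: `sudo service mysql status` |
| HBase/HCatalog warnings | They do not affect core functionality; ignore them, or install the components to silence them |

XIII. How to tell the installation is complete

When `sqoop list-databases` runs successfully and prints the database list, Sqoop is installed and configured.

My big data environment at a glance

| Component | Version | Status |
| --- | --- | --- |
| Operating system | Ubuntu 24.04 (WSL2) | ✅ |
| Java | OpenJDK 8 | ✅ |
| Hadoop | 3.3.6 | ✅ |
| MySQL | 8.0.45 | ✅ |
| Sqoop | 1.4.7 | ✅ |
| SSH service | OpenSSH | ✅ |

What you can do now

1. Import MySQL data into HDFS (already done)

```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --table users \
  --target-dir /user/mumu/users \
  -m 1
```

2. Export HDFS data back to MySQL

```bash
# Create an empty table in MySQL first
mysql -u root -p123456 -e "USE test; CREATE TABLE IF NOT EXISTS users_backup (id INT, name VARCHAR(20));"

# Export
sqoop export \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --table users_backup \
  --export-dir /user/mumu/users \
  --input-fields-terminated-by ','
```

3. Incremental import (only new rows)

```bash
# Suppose a row with id=3 has been added
mysql -u root -p123456 -e "USE test; INSERT INTO users VALUES (3, 'Charlie');"

# Incremental import: --check-column and --last-value require an --incremental mode
sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --table users \
  --target-dir /user/mumu/users_incremental \
  -m 1 \
  --incremental append \
  --check-column id \
  --last-value 2
```

4. Import into Hive (if you install Hive later)

```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --table users \
  --hive-import \
  --hive-table default.users
```

Summary of my environment

| Component | Status | How to verify |
| --- | --- | --- |
| Java | ✅ | `java -version` |
| Hadoop | ✅ | `hadoop version`, `hdfs dfs -ls /` |
| MySQL | ✅ | `mysql -u root -p` |
| Sqoop | ✅ | `sqoop version` plus the import above |
| MySQL → HDFS import | ✅ | Tested successfully |

Suggestions for further study

- Get familiar with Sqoop's parameters: `--where` for row filters, `--columns` to select columns, `--query` for custom SQL.
- Learn to control parallelism: `-m` sets the number of mappers, and for large tables it can be set higher than 1.
- Watch out for format issues: `--fields-terminated-by`, and `--null-string` for handling NULL values.
- Learn job management options such as `--hadoop-home` and `--mapreduce-job-name`.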
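As a starting point for the options listed above, here is a hedged sketch that combines several of them against the `test.users` table from this tutorial. The `run` wrapper, the `RUN` switch and the two target directories are assumptions made for illustration; by default the script only prints the commands, so you can review them before executing anything.

```shell
#!/bin/sh
# Dry run by default: print the command. Set RUN=1 to actually execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

CONNECT="jdbc:mysql://localhost:3306/test"

# Table import with a column list, a row filter and two mappers.
# The users table in this tutorial has no primary key, so --split-by names
# the column Sqoop uses to divide the rows between mappers.
# -P prompts for the password instead of putting it on the command line.
run sqoop import \
  --connect "$CONNECT" \
  --username root -P \
  --table users \
  --columns "id,name" \
  --where "id > 1" \
  --split-by id \
  -m 2 \
  --fields-terminated-by '\t' \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --target-dir /user/mumu/users_filtered

# Free-form query import: the query must contain the literal token $CONDITIONS
# (hence the single quotes) and --target-dir is required.
run sqoop import \
  --connect "$CONNECT" \
  --username root -P \
  --query 'SELECT id, name FROM users WHERE id > 1 AND $CONDITIONS' \
  --target-dir /user/mumu/users_query \
  -m 1
```

Once the printed commands look right, rerun with `RUN=1`. Keeping `-m 1` on the query import avoids having to choose a split column for it.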
