
Installing Spark on CentOS


Pre-installation system and user configuration

  1. Update the CentOS 7 system

    yum install epel-release
    yum update
  2. Create a user and switch to it

    adduser spark
    passwd spark

    vi /etc/sudoers  # or, more safely, visudo, which validates the file before saving

    ## Allow root to run any commands anywhere
    root ALL=(ALL) ALL
    spark ALL=(ALL) ALL
    su - spark
    mkdir /home/spark/Downloads

    Installing Spark

  3. Install Python 3.6

    sudo yum install python36 python36-devel
  4. Install Spark

    Download and extract (the version number must be consistent across the URL, the archive name, and the extracted directory):

    wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
    tar -zxvf spark-2.2.1-bin-hadoop2.7.tgz
    cd spark-2.2.1-bin-hadoop2.7
    cd ./conf
    cp spark-env.sh.template spark-env.sh
  5. Edit the configuration file

    vi ./conf/spark-env.sh

    export JAVA_HOME=/usr/lib/jvm/java
    SPARK_MASTER_HOST=192.168.231.131  # bind the master to this address so remote clients can connect
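For reference, a fuller spark-env.sh for this setup might look like the sketch below. The IP address is the example address from step 5; the JDK path and Python interpreter name depend on your system, so adjust them as needed.

```shell
# spark-env.sh -- sourced by Spark's launch scripts (example values)
export JAVA_HOME=/usr/lib/jvm/java   # JDK installation root
SPARK_MASTER_HOST=192.168.231.131    # address the master binds to
SPARK_MASTER_PORT=7077               # default master RPC port
SPARK_WORKER_MEMORY=1g               # memory each worker may use for executors
export PYSPARK_PYTHON=python3.6      # interpreter used by PySpark jobs
```

All of these variables are listed in the spark-env.sh.template shipped with Spark; unset variables simply fall back to Spark's defaults.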
  6. Start Spark

    ./sbin/start-all.sh  # start the master and workers
  7. Usage

  • Create a SparkSession

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.some.config.option", "some-value").getOrCreate()

    Possible problems

  • export: `/usr/lib/jvm/java': not a valid identifier

    In export JAVA_HOME=/usr/lib/jvm/java there must be no spaces around the = sign!
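The failure is easy to reproduce in any bash shell: with spaces, bash passes `=` and the path to export as separate arguments, and rejects them because they are not valid variable names.

```shell
# Wrong: bash splits this into three arguments to `export`,
# so it complains that they are not valid identifiers.
export JAVA_HOME = /usr/lib/jvm/java   # bash: export: `=': not a valid identifier

# Right: no whitespace around `=`.
export JAVA_HOME=/usr/lib/jvm/java
echo "$JAVA_HOME"                      # prints /usr/lib/jvm/java
```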

  • java.net.ConnectException: Connection refused

    This usually means the master is not running, or is listening on a different host or port than the client expects; by default a standalone master listens for connections on port 7077 and serves a web UI on port 8080.

  • metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

    Check that the Hive services are running:

    sudo service hive-metastore status
    sudo service hive-server2 status
  • org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://localhost:9000/user/spark/examples/src/main/resources/people.json;

    When Hadoop is configured, relative paths resolve against HDFS rather than the local filesystem; either upload the file into HDFS with hdfs dfs -put, or reference the local copy with an explicit file:// URI.
