Installing Spark on CentOS

System and user configuration before installation

  1. Update the CentOS 7 system

    yum install epel-release
    yum update
  2. Create a user and switch to it

    adduser spark
    passwd spark

    vi /etc/sudoers

    ## Allow root to run any commands anywhere
    root ALL=(ALL) ALL
    spark ALL=(ALL) ALL
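
    A safer way to edit this file is visudo, which validates the sudoers syntax before saving:

    visudo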
    su - spark
    mkdir /home/spark/Downloads

Installing Spark

  1. Install Python 3.6

    sudo yum install python36 python36-devel
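
    Python 3.6 is only needed if you plan to use PySpark. In that case, point Spark at this interpreter via the PYSPARK_PYTHON variable (a minimal sketch; the interpreter name assumes the CentOS python36 package):

    # make PySpark use the Python 3.6 interpreter installed above
    export PYSPARK_PYTHON=python3.6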
  2. Install Spark

    Download and unpack:

    wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
    tar -zxvf spark-2.2.1-bin-hadoop2.7.tgz
    cd spark-2.2.1-bin-hadoop2.7
    cd ./conf
    cp spark-env.sh.template spark-env.sh
  3. Edit the configuration file

    vi ./conf/spark-env.sh

    export JAVA_HOME=/usr/lib/jvm/java
    SPARK_MASTER_HOST=192.168.231.131  # bind the master to this address so remote clients can connect
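
    Other knobs commonly set in spark-env.sh (optional; the values below are illustrative assumptions, not recommendations):

    SPARK_WORKER_CORES=2     # cores each worker is allowed to use
    SPARK_WORKER_MEMORY=2g   # memory each worker is allowed to use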
  4. Start Spark

    From the Spark installation directory:

    ./sbin/start-all.sh  # start the Spark master and worker daemons
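
    To check that the cluster actually came up, look for the Master and Worker JVMs and probe the master web UI (port 8080 is the standalone UI's default; the address is the SPARK_MASTER_HOST from step 3):

    jps                               # should list Master and Worker processes
    curl http://192.168.231.131:8080  # or open this URL in a browser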
  5. Usage

    • Create a SparkSession

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.some.config.option", "some-value").getOrCreate()
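    To run the snippet above interactively against the standalone master started in step 4, launch spark-shell pointed at it (7077 is the standalone master's default port; the address is the one from step 3):

    ./bin/spark-shell --master spark://192.168.231.131:7077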

Possible problems

  • export: `/usr/lib/jvm/java': not a valid identifier

    In export JAVA_HOME=/usr/lib/jvm/java there must be no spaces on either side of the = sign!

  • java.net.ConnectException: Connection refused

    Usually nothing is listening at the address being contacted: check that the Spark master (or, for HDFS paths, the NameNode) is actually running and that the host and port match what was configured in spark-env.sh.

  • metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

    Check whether the Hive metastore and HiveServer2 services are running:

    sudo service hive-metastore status
    sudo service hive-server2 status
  • org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://localhost:9000/user/spark/examples/src/main/resources/people.json;

    When Hadoop is configured, Spark resolves relative paths against HDFS rather than the local filesystem, so the example file must either be uploaded to HDFS or referenced with an explicit file:// URI.
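
    A minimal fix, assuming HDFS is running and the commands are issued from the Spark installation directory:

    # copy the example file into HDFS so the relative path in the error resolves
    hdfs dfs -mkdir -p /user/spark/examples/src/main/resources
    hdfs dfs -put examples/src/main/resources/people.json /user/spark/examples/src/main/resources/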