The overall procedure is described here:
http://qindongliang.iteye.com/blog/2224797
(Secondary reference: http://blog.csdn.net/wind520/article/details/43458925)
The additional details I needed are listed below:
1)
vi /etc/profile. My settings are:
JAVA_HOME='/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64'
HADOOP_HOME='/root/hadoop2.6'
SCALA_HOME='/root/scala2.10.4'
SPARK_HOME='/root/spark1.4.0'
MASTER='local-cluster[3,2,1024]' # 3-nodes cluster
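A quick sanity check can catch a mistyped path before Spark is started. The sketch below is only an illustration: check_home_dirs is a hypothetical helper (not part of Spark or Hadoop) that reports whether each named variable is set and points at an existing directory.

```shell
# check_home_dirs NAME...  (hypothetical helper, not shipped with Spark)
# For each variable name, report whether it is set and names an existing
# directory. Returns non-zero if any variable fails the check.
check_home_dirs() {
    rc=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"
            rc=1
        else
            echo "$name=$value ok"
        fi
    done
    return "$rc"
}

# Possible usage, in the same shell session:
#   . /etc/profile
#   check_home_dirs JAVA_HOME HADOOP_HOME SCALA_HOME SPARK_HOME
```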
2)
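Configure spark-env.sh: give JAVA_HOME the actual path rather than a reference to another variable, and give SPARK_MASTER_IP an IP address rather than a host name.
export JAVA_HOME='/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64' (the referenced article has export JAVA_HOME=$JAVA_HOME)
export SPARK_MASTER_IP=192.168.22.250 (the referenced article has SPARK_MASTER_IP=master)
My final configuration is: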
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64
export SCALA_HOME=$SCALA_HOME
export HADOOP_HOME=/root/hadoop2.6
export HADOOP_CONF_DIR=/root/hadoop2.6/etc/hadoop
export SPARK_MASTER_IP=192.168.22.250
export SPARK_DRIVER_MEMORY=1G
3)
The slaves file does not need to be changed.
4)
Start the master: sbin/start-master.sh (the referenced article uses sbin/start-all.sh)
Check the log under /root/spark1.4.0/logs:
15/08/25 14:01:16 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@192.168.22.250:7077]
15/08/25 14:01:16 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
15/08/25 14:01:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/08/25 14:01:17 INFO server.AbstractConnector: Started SelectChannelConnector@mk-vm:6066
15/08/25 14:01:17 INFO util.Utils: Successfully started service on port 6066.
15/08/25 14:01:17 INFO rest.StandaloneRestServer: Started REST server for submitting applications on port 6066
15/08/25 14:01:17 INFO master.Master: Starting Spark master at spark://192.168.22.250:7077
15/08/25 14:01:17 INFO master.Master: Running Spark version 1.4.0
15/08/25 14:01:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/08/25 14:01:17 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080
15/08/25 14:01:17 INFO util.Utils: Successfully started service 'MasterUI' on port 8080.
15/08/25 14:01:17 INFO ui.MasterWebUI: Started MasterWebUI at http://192.168.22.250:8080
15/08/25 14:01:17 INFO master.Master: I have been elected leader! New state: ALIVE
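Rather than reading the log by eye, the two lines that matter (the master URL and the ALIVE state) can be pulled out with sed and grep. This is a minimal sketch: master_url_from_log and master_is_alive are hypothetical helper names, and the log file name in the usage comment is an assumption, since it depends on the user and host name.

```shell
# master_url_from_log FILE  (hypothetical helper)
# Print the first spark:// URL announced by a
# "Starting Spark master at ..." line in the given log file.
master_url_from_log() {
    sed -n 's#.*Starting Spark master at \(spark://[^ ]*\).*#\1#p' "$1" | head -n 1
}

# master_is_alive FILE  (hypothetical helper)
# Succeed if the log reports that the master reached the ALIVE state.
master_is_alive() {
    grep -q 'New state: ALIVE' "$1"
}

# Possible usage (file name is an assumption; adjust to what is in the logs directory):
#   log=/root/spark1.4.0/logs/<master log file>.out
#   master_is_alive "$log" && master_url_from_log "$log"
```

The URL printed this way is the one the workers need in order to connect to the master.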
5) Open port 8080
/sbin/iptables -I INPUT -p tcp --dport 8080 -j ACCEPT
/etc/init.d/iptables save
service iptables restart
6)
http://192.168.22.250:8080
7)
Start the workers, e.g.: sbin/start-slaves.sh spark://192.168.22.250:7077
Check the web UI again: the workers now appear in the list.
8) Stop and restart
./sbin/stop-master.sh
After rebooting the server, restart Spark as follows:
cd spark1.4.0/
sbin/start-all.sh
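Verify: http://192.168.22.250:8080/
9) Reduce INFO log output
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties (this copies the template and drops the ".template" extension).
Edit the new file and replace every occurrence of INFO with WARN. The pyspark output will then be much terser!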
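The copy and the edit can also be done in one pass with sed. The sketch below is illustrative only: quiet_log4j is a hypothetical helper name, and it replaces every INFO in the file, exactly as described above, so it is worth reviewing the result if the template contains INFO in places you did not intend to change.

```shell
# quiet_log4j TEMPLATE TARGET  (hypothetical helper)
# Write a copy of the log4j template to TARGET with every
# occurrence of INFO replaced by WARN. The template itself is left untouched.
quiet_log4j() {
    sed 's/INFO/WARN/g' "$1" > "$2"
}

# Possible usage:
#   quiet_log4j "$SPARK_HOME/conf/log4j.properties.template" \
#               "$SPARK_HOME/conf/log4j.properties"
```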