log4j2+flume+hadoop
Published: 2019-06-22


Big Data Collection Architecture

Our applications print a great deal of log output in daily operation, and we often need to extract useful information from those logs. The architecture below implements this; my choice of stack is log4j2 + flume + hadoop. The overall architecture is shown in the figure:

[Figure: overall architecture]

Question 1: why log4j2?
1. Traditional log4j carries a heavy performance cost; Apache claims that for concurrent logging, log4j2 is up to 18 times faster than log4j.
2. log4j2 ships a dedicated Flume appender, which makes data collection with flume convenient.
3. log4j2 provides JsonLayout, which can emit logs as JSON; this format makes the second-stage parsing much easier, as the sample event after this list shows.
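
For reference, a JsonLayout event is a self-describing JSON object roughly like the one below (the exact field set depends on the layout options; the values here are illustrative):

{"timeMillis":1561190400000,"thread":"main","level":"INFO","loggerName":"LaoutTest","message":"1561190400000","endOfBatch":false}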
Question 2: why flume?
1. flume is written in Java, which is the language I work in day to day, so customization is easy, and flume exposes flexible extension points; see the interceptor sketch after this list.
2. flume can do some data cleansing while collecting, filtering out what we don't want.
3. flume itself is lightweight and runs stably at up to roughly 1,000,000 events per day; beyond that, consider integrating it with kafka.
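
On points 1 and 2: filtering during collection is typically done with a custom Flume interceptor. Below is a minimal sketch; the class name and the keyword filter condition are made up for illustration:

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical interceptor that keeps only events whose body contains a keyword.
public class KeywordFilterInterceptor implements Interceptor {
    private static final String KEYWORD = "INFO"; // illustrative filter condition

    @Override
    public void initialize() { }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody());
        return body.contains(KEYWORD) ? event : null; // returning null drops the event
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> kept = new ArrayList<>();
        for (Event e : events) {
            Event result = intercept(e);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    @Override
    public void close() { }

    // Flume instantiates interceptors through a nested Builder class.
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() { return new KeywordFilterInterceptor(); }

        @Override
        public void configure(Context context) { }
    }
}

It would be registered on a source by the fully qualified name of its Builder, e.g. agent1.sources.r1.interceptors.i1.type = com.example.KeywordFilterInterceptor$Builder (package name assumed).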
Question 3: why hadoop?
The company's requirement is to collect and store user data, analyze it, and use the results to improve the user experience. hadoop's strengths are that it demands little of the hardware, is highly fault-tolerant, and supports offline analysis of the data; these characteristics fit the requirements well.
IP allocation

Host  IP             flume       hadoop
m1    192.168.1.111  agent1      NameNode
s2    192.168.1.112  collector2  DataNode1
s3    192.168.1.113  collector3  DataNode2
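
Assuming the short host names above are used in scripts and configuration, each node would map them in /etc/hosts along these lines:

192.168.1.111 m1
192.168.1.112 s2
192.168.1.113 s3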

I. log4j2

1. Create a new Maven project; the directory structure is shown in the figure.

[Figure: Maven project directory structure]

2. Configure the pom file

<properties>
    <log4j.version>2.8.2</log4j.version>
    <slf4j.version>2.8.2</slf4j.version>
    <log4j-flume-ng.version>2.8.2</log4j-flume-ng.version>
    <flume-ng.version>1.7.0</flume-ng.version>
    <jackson.version>2.7.0</jackson.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>${log4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>${slf4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-flume-ng</artifactId>
        <version>${log4j-flume-ng.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flume.flume-ng-clients</groupId>
        <artifactId>flume-ng-log4jappender</artifactId>
        <version>${flume-ng.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>${jackson.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>${jackson.version}</version>
    </dependency>
</dependencies>
3. Configure log4j2.xml
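
The exact file depends on the project; a minimal sketch that wires a Flume Avro appender to agent1 (m1:41414, per the flume configuration below), uses JsonLayout, and defines the custom FLUME level referenced by the test class might look like this (the intLevel value and the console appender are assumptions):

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <CustomLevels>
        <!-- intLevel 350 sits between WARN (300) and INFO (400); chosen for illustration -->
        <CustomLevel name="FLUME" intLevel="350"/>
    </CustomLevels>
    <Appenders>
        <Flume name="FlumeAppender" type="Avro" compress="false">
            <Agent host="192.168.1.111" port="41414"/>
            <JsonLayout compact="true" eventEol="true"/>
        </Flume>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="FlumeAppender"/>
            <AppenderRef ref="Console"/>
        </Root>
    </Loggers>
</Configuration>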

4. LaoutTest.java

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.util.Date;

/**
 * Created by hadoop on 2017/7/28.
 */
public class LaoutTest {
    static Logger logger = LogManager.getLogger(LaoutTest.class);

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // log the current system timestamp every 100 ms
            Thread.sleep(100);
            logger.info(String.valueOf(new Date().getTime()));
            logger.log(Level.getLevel("FLUME"), "another diagnostic message");
            try {
                throw new Exception("exception msg");
            } catch (Exception e) {
                logger.error("error:" + e.getMessage());
            }
        }
    }
}

II. flume

I use flume version 1.7, downloaded from the Apache flume site.
Install it under /usr and rename the directory to flume; do the same on all three machines. All of the configuration files below are created under /usr/flume/conf.
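
Assuming the standard binary tarball, the installation on each machine is roughly:

cd /usr
tar -xzf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume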
1. agent1: configuration file avro-mem-hdfs-collector.properties
# Name the components on this agent
agent1.sources = r1
agent1.sinks = k1 k2 k3
agent1.channels = c1 c2 c3

# Wire sources, channels and sinks together; the replicating
# selector copies every event into all three channels
agent1.sources.r1.channels = c1 c2 c3
agent1.sources.r1.selector = replicating
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c2
agent1.sinks.k3.channel = c3

# source
agent1.sources.r1.type = avro
agent1.sources.r1.bind = 0.0.0.0
agent1.sources.r1.port = 41414
agent1.sources.r1.fileHeader = false
agent1.sources.r1.interceptors = i1
agent1.sources.r1.interceptors.i1.type = timestamp

# channel c1
agent1.channels.c1.type = memory
agent1.channels.c1.keep-alive = 30
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 1000

# sink k1: everything goes to HDFS under /all
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://192.168.1.111:9000/all/%Y-%m-%d/%H
agent1.sinks.k1.hdfs.filePrefix = logs
agent1.sinks.k1.hdfs.inUsePrefix = .
agent1.sinks.k1.hdfs.fileType = DataStream
agent1.sinks.k1.hdfs.rollInterval = 0
agent1.sinks.k1.hdfs.rollSize = 16777216
agent1.sinks.k1.hdfs.rollCount = 0
agent1.sinks.k1.hdfs.batchSize = 1000
agent1.sinks.k1.hdfs.writeFormat = text
agent1.sinks.k1.hdfs.callTimeout = 10000

# channel c2
agent1.channels.c2.type = memory
agent1.channels.c2.keep-alive = 30
agent1.channels.c2.capacity = 10000
agent1.channels.c2.transactionCapacity = 1000

# sink k2: forward to collector2 on s2 over avro
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = 192.168.1.112
agent1.sinks.k2.port = 41414

# channel c3
agent1.channels.c3.type = memory
agent1.channels.c3.keep-alive = 30
agent1.channels.c3.capacity = 10000
agent1.channels.c3.transactionCapacity = 1000

# sink k3: forward to collector3 on s3 over avro
agent1.sinks.k3.type = avro
agent1.sinks.k3.hostname = 192.168.1.113
agent1.sinks.k3.port = 41414
2. collector2 (on s2): configuration file avro-mem-hdfs.properties
# Name the components on this agent
collector2.sources = r1
collector2.sinks = k1
collector2.channels = c1

# source
collector2.sources.r1.channels = c1
collector2.sources.r1.type = avro
collector2.sources.r1.bind = 0.0.0.0
collector2.sources.r1.port = 41414
collector2.sources.r1.fileHeader = false
collector2.sources.r1.interceptors = i1
collector2.sources.r1.interceptors.i1.type = timestamp

# channel
collector2.channels.c1.type = memory
collector2.channels.c1.keep-alive = 30
collector2.channels.c1.capacity = 30000
collector2.channels.c1.transactionCapacity = 3000

# sink: write business1 events to HDFS
collector2.sinks.k1.channel = c1
collector2.sinks.k1.type = hdfs
collector2.sinks.k1.hdfs.path = hdfs://192.168.1.111:9000/business1/%Y-%m-%d/%H
collector2.sinks.k1.hdfs.filePrefix = logs
collector2.sinks.k1.hdfs.inUsePrefix = .
collector2.sinks.k1.hdfs.fileType = DataStream
collector2.sinks.k1.hdfs.rollInterval = 0
collector2.sinks.k1.hdfs.rollSize = 16777216
collector2.sinks.k1.hdfs.rollCount = 0
collector2.sinks.k1.hdfs.batchSize = 1000
collector2.sinks.k1.hdfs.writeFormat = text
collector2.sinks.k1.hdfs.callTimeout = 10000
3. collector3 (on s3): configuration file avro-mem-hdfs.properties
# Name the components on this agent
collector3.sources = r1
collector3.sinks = k1
collector3.channels = c1

# source
collector3.sources.r1.channels = c1
collector3.sources.r1.type = avro
collector3.sources.r1.bind = 0.0.0.0
collector3.sources.r1.port = 41414
collector3.sources.r1.fileHeader = false
collector3.sources.r1.interceptors = i1
collector3.sources.r1.interceptors.i1.type = timestamp

# channel
collector3.channels.c1.type = memory
collector3.channels.c1.keep-alive = 30
collector3.channels.c1.capacity = 30000
collector3.channels.c1.transactionCapacity = 3000

# sink: write business2 events to HDFS
collector3.sinks.k1.channel = c1
collector3.sinks.k1.type = hdfs
collector3.sinks.k1.hdfs.path = hdfs://192.168.1.111:9000/business2/%Y-%m-%d/%H
collector3.sinks.k1.hdfs.filePrefix = logs
collector3.sinks.k1.hdfs.inUsePrefix = .
collector3.sinks.k1.hdfs.fileType = DataStream
collector3.sinks.k1.hdfs.rollInterval = 0
collector3.sinks.k1.hdfs.rollSize = 16777216
collector3.sinks.k1.hdfs.rollCount = 0
collector3.sinks.k1.hdfs.batchSize = 1000
collector3.sinks.k1.hdfs.writeFormat = text
collector3.sinks.k1.hdfs.callTimeout = 10000
4. Go into /usr/flume and start the agent and the collectors. Note that the name passed with -n must match the property prefix in the corresponding file, otherwise the agent starts with no components.

1. Start agent1 (on m1): bin/flume-ng agent -c ./conf/ -f conf/avro-mem-hdfs-collector.properties -Dflume.root.logger=INFO,console -n agent1
2. Start collector2 (on s2): bin/flume-ng agent -c ./conf/ -f conf/avro-mem-hdfs.properties -Dflume.root.logger=INFO,console -n collector2
3. Start collector3 (on s3): bin/flume-ng agent -c ./conf/ -f conf/avro-mem-hdfs.properties -Dflume.root.logger=INFO,console -n collector3
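
Once the test class is producing events, you can confirm that files are landing in HDFS; the date and hour in the paths below are placeholders for the current ones:

hdfs dfs -ls /all/2019-06-22/10
hdfs dfs -cat /all/2019-06-22/10/logs.*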

Reprinted from: http://qpisa.baihongyu.com/
